The Case for Speech Recognition
For The Record, April 2005, Vol. 17 No. 9, P. 20
A sturdy hardware platform, a
comprehensive pilot program, and professional training can help
ensure a tidy return on investment.
Traditionally viewed as simply a
means of dictating text into a personal computer, today’s
speech-recognition software can play a far more significant role in
the healthcare environment. In addition to pure dictation,
speech-recognition software can be used to manage e-mail, streamline
repetitive tasks on the PC, reduce transcription and charting costs,
speed up information turnaround, and protect employees from
repetitive stress injuries (RSIs).
The software can be integrated
with most electronic medical record (EMR) applications to make those
programs more effective and easier to use. Rapid hardware
advancements and improvements in the technology itself have
increased its utility, accuracy, speed, and ease of use. These advances
have brought the cost of ownership within reach of medical offices and
clinics of any size, departments within healthcare organizations, and
even entire hospitals. When properly implemented,
speech-recognition software can increase productivity for every
employee who works with a computer.
As with any technology, the
deployment of a speech-recognition program should be carefully
planned to achieve the full benefit of the software and
maximize the return on investment. This article provides an overview
of the basics of how the software works, what medical offices or
departments can do with speech recognition, examples of savings, and
recommendations for implementation.
Why Use Speech-Recognition Software?
The reason is simple: most people can speak much faster than they can
type. A relatively fast typist who can type 50 net words per minute
(1) can produce a 300-word e-mail in six minutes. Using
speech-recognition software, a person dictating 140 to 160 words per
minute without any errors can produce the same 300-word e-mail in
roughly two minutes. This does not include the additional time the
person can save by using voice commands to open the e-mail program,
look up an e-mail address in their contact management software,
and send the e-mail.
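The comparison above is simple division, and a short sketch makes it easy to rerun with other rates. The function name is illustrative only; the word count and rates are the ones quoted in the paragraph.

```python
def minutes_to_produce(word_count: int, words_per_minute: float) -> float:
    """Return the minutes needed to produce word_count words at a steady rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


# Figures quoted in the article: a 300-word e-mail,
# typed at 50 net words per minute or dictated at 140 to 160.
typing_minutes = minutes_to_produce(300, 50)
dictation_slow = minutes_to_produce(300, 140)
dictation_fast = minutes_to_produce(300, 160)

print(f"Typing:    {typing_minutes:.1f} minutes")
print(f"Dictation: {dictation_fast:.1f} to {dictation_slow:.1f} minutes")
```

The dictation figures assume error-free recognition, as the article does; time spent on corrections would narrow the gap.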
How Does Speech-Recognition Software Work?
Speech-recognition software uses the human voice as the main
communication mechanism between the user and the computer. While
relatively simple to use, speech-recognition software is
sophisticated technology that uses “language modeling” to recognize
and differentiate among the millions of human utterances that make
up any language.(2)
The software enables users to
input text and data into virtually any Microsoft Windows-based
application by voice, as well as to navigate the computer desktop
with little or no use of their hands. Users speak naturally into a
noise-cancelling microphone connected to the computer.(3) The
software “recognizes” the spoken words, converts them into text, and
displays them on the screen for review.
Most speech-recognition programs
also allow users to speak a standard command that prompts the
computer to perform an action. For example, the user says, “Start
WordPerfect.” The more advanced speech-recognition programs also
enable users to create customized commands (macros), such as “Send
an e-mail to Doug Z,” which will open an e-mail addressed to Doug Z.
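Conceptually, a command vocabulary can be pictured as a lookup table from recognized phrases to actions, with anything that does not match treated as dictation. The sketch below is a simplified illustration of that idea, not the mechanism of any particular product; the function names and the returned strings are hypothetical stand-ins for real actions.

```python
from typing import Callable, Dict


def start_wordperfect() -> str:
    # Stand-in for launching an application.
    return "launch: WordPerfect"


def email_doug() -> str:
    # Stand-in for opening a pre-addressed e-mail.
    return "new e-mail to: Doug Z"


# Hypothetical command table: recognized phrase -> action to run.
COMMANDS: Dict[str, Callable[[], str]] = {
    "start wordperfect": start_wordperfect,
    "send an e-mail to doug z": email_doug,
}


def handle_utterance(text: str) -> str:
    """Run a matching command; otherwise treat the words as dictation."""
    key = text.strip().rstrip(".").lower()
    action = COMMANDS.get(key)
    if action is not None:
        return action()
    return f"dictate: {text}"


print(handle_utterance("Start WordPerfect."))
print(handle_utterance("The patient is stable."))
```

Adding a customized command in this picture is just adding another entry to the table.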
Configuring the software during
set-up is referred to as “enrollment.” After installing the program,
each user must read aloud from a choice of prepared texts for
approximately five minutes. Based on the dictation the application
captures, the software analyzes how the user pronounces each word
and stores the data to prepare a unique user profile for that
individual.
As an individual uses the software
and corrects recognition errors, the software becomes increasingly
accurate by learning his or her particular speaking style. Most
medical recognition programs enable users to add new words or
customize the vocabulary for their particular practice or specialty.
Using specialty vocabularies can improve accuracy even further. Some
speech-recognition software programs include a medical
vocabulary—incorporating diseases, medications, procedures, and
acronyms in addition to the standard business vocabulary—and can
automatically recognize and format prescriptions and patient
encounters. For certain programs, specialty medical vocabularies can
also be created in-house or purchased from third-party sources.
How Is Speech Recognition Used to Replace Traditional Transcription?
There are many different ways to implement a speech-recognition
solution. Most organizations have individuals dictate directly
into their own PCs and review the transcription as it appears,
correcting any errors. Another method has users dictate into a handheld
digital recorder. The user or an assistant then downloads the audio
file onto a PC, where the software transcribes it. Instead of
transcribing from scratch, the assistant listens to the recorded
dictation while reading the text on screen and makes corrections or
edits as necessary.
Speech Recognition Uses
Many different types of healthcare workers can benefit from using
speech recognition. Individual uses for speech recognition can vary
for each employee depending on their responsibilities, workflow,
preferences, and other applications they use as part of their daily
routine. Today, speech recognition is successfully used by a wide
array of healthcare professionals, including doctors, nurses,
physician assistants, pharmacists, administrators, and
transcriptionists.
Dictation is the most versatile
and widespread use for speech-recognition software. Some individuals
can’t or prefer not to type because they are untrained as
typists, have a disability, or wish to prevent RSIs. Many practices
have decreased the number of support staff and require physicians to
generate their own records. Even doctors who typically dictate
documents for others to transcribe may use speech recognition
occasionally, such as when they need to produce a document on the
spot or after hours or when they are responding to e-mail.
Doctors who wish to maintain their
traditional workflow can dictate into a handheld recorder (4) or
save their recorded dictation (5) with their documents for someone
else to transcribe or correct at a later time. This can
substantially reduce turnaround time compared with traditional
transcription. If transcription is produced in-house, using
speech-recognition software frees up support staff for more
productive tasks. If the work is outsourced, sending out drafts for
correction rather than full transcription can significantly reduce an
organization’s overhead costs.
Navigate the Windows Desktop by Voice
Speech-recognition software enables users to “command and control”
the computer desktop simply by using their voice. Virtually any menu
item or dialog box can be controlled for hands-free operation. Users
can edit and format their work, launch applications and open files,
cut and paste, and insert standard blocks of text or even their
scanned signature.
Create, Manage, and Send e-Mail
Managing e-mail takes up an increasing amount of everyone’s day.
Speech-recognition software can be customized so users can create,
navigate, respond to, and send e-mail, all by voice, using their
preferred e-mail program. In addition, some speech-recognition
programs contain text-to-speech technology that allows users to have
their e-mail documents read aloud, which enables them to complete
other tasks while reviewing their e-mail.
Mastering the Mundane
Repetitive tasks, such as data entry or form filling, can be
accelerated using speech. In many cases, users who are unfamiliar
with complex software programs are more comfortable “telling” the
computer what to do than trying to master the interface. Macros can
be created to enable users to go from field to field by voice, or to
perform a sequence of keystrokes or mouse movements. The software
can even be configured so a patient’s EMR can be created and edited
using only voice commands.
Create a Paperless Office
Many practices seek to convert all their paper documents into
electronic files to facilitate a secure archive and provide remote
access to staff or patients. Most Windows-based applications can be
navigated by voice using speech-recognition software. The software
can help facilitate the move to a paperless office by making it
easier for anyone to create, format, dictate into, search, and
manage electronic documents by voice.
Increase Productivity Outside the Office
Healthcare professionals can increase their productivity during
travel time or whenever they are away from the office by dictating
into a portable handheld recorder for transcription later. In
addition, some software programs enable users to easily export their
user file via the network or portable storage device for use on
another computer or laptop so they can use speech recognition
anywhere—at the office, at home, or even on the road.
Work on the Web by Voice
Speech-recognition software enables users to search the Web, access
information, and navigate Web pages by speaking URLs and links.
EMR Applications
Many EMR applications can be more effective and easier to use when
deployed in conjunction with a speech-recognition solution.
Searches, queries, and form filling are all faster to perform by
voice than using a keyboard. Charting, prescription writing,
aftercare instructions, order entry, database searches, document
assembly/automation, and patient record management software programs
are all highly conducive to control by speech. Tasks such as text
and data entry can be completed by voice in most of the programs
without any customization. Other functions can easily be performed
using macros or by speech-enabling the application using a software
development kit.
Avoid RSIs
Musculoskeletal disorders (MSDs), including RSIs, are the single
largest category of job-related injury in the United States. According
to the Occupational Safety and Health Administration (OSHA), 1.8 million
U.S. workers experience work-related MSDs annually.(6)
RSIs, which are often incurred by
employees working at computers, are the most common MSD. RSIs occur
when muscles or tendons are repeatedly overused or forced into an
unnatural position. Keyboarding, clicking, and maneuvering the mouse
strain and damage muscles and tendons in the fingers, hands,
wrists, and arms.
The widespread use of computers in
the workplace has contributed to the ubiquity of RSI pain and
discomfort. OSHA has identified repetition, such as using a keyboard
and/or mouse steadily for more than four hours daily, as a risk
factor that could cause an RSI or MSD. “Intensive computer use
accounts for a significant number of MSDs each year, and
occupational computer use is growing,” according to OSHA reports.(7)
While most RSI sufferers are able
to find appropriate treatment and return to their positions, some
become permanently disabled and are never able to use their hands to
operate a computer again, which can prevent them from returning to
their jobs.
Speech-recognition software can
minimize or eliminate the repetitive keyboarding and mouse movements
that strain and damage muscles, tendons, and nerves.
By giving employees who use computers intensively access to
speech-recognition software, you can prevent an injury before
problems arise or help employees return to work sooner, reducing
workers’ compensation, medical, and replacement labor costs. A
recent study on RSIs in the workplace puts the average cost of
this type of injury at $20,000 per affected employee.
Assisting with ADA Compliance Strategies
Title I of the Americans with Disabilities Act (ADA) of 1990
prohibits employers from discriminating against qualified
individuals with disabilities. The workforce includes many qualified
individuals with disabilities who can productively use computers
when equipped with speech-recognition software and supporting
hardware and software. Hiring and retaining qualified workers with
disabilities is not only a smart employment practice for most
employers, it’s the law.
Since speech-recognition software
can help employers hire and retain qualified workers with RSIs and
other disabilities, this technology plays an important role in
employers’ ADA compliance strategies.(8)
Return on Investment
Speech-recognition software can help healthcare organizations save a
significant amount of money. The benefits can be realized by
providers ranging from a solo practitioner’s office to a
hospital with several hundred doctors and nurses on staff.
Typically, a single doctor or
nurse who uses an outside transcription service spends between
$10,000 and $30,000 per year on transcription, depending on the
individual’s workload. For example, a private practice doctor in San
Diego replaced outsourced transcription with a speech-recognition
solution and saved more than $10,000 per year.
In addition, he now has time to see more
patients each day because he completes the paperwork for each
patient during their visit.
The savings potential in larger
organizations can be tremendous. A large medical group in Seattle
saved $90,000 the first year it deployed speech recognition and
$240,000 the next year as it rolled out the solution to all its
doctors and eliminated the need for an in-house transcription staff.
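To turn reported savings like these into a return-on-investment estimate, an organization would subtract its own rollout costs. The sketch below uses the two savings figures from the Seattle example; the rollout cost is a purely hypothetical placeholder, since the article gives no cost data, and should be replaced with real quotes for licenses, hardware, and training.

```python
# Savings reported in the article for the Seattle medical group.
yearly_savings = [90_000, 240_000]

# Hypothetical figure for illustration only; not from the article.
assumed_rollout_cost = 100_000


def cumulative_net_savings(savings, cost):
    """Return the running net position (savings minus cost) after each year."""
    net = -cost
    running = []
    for amount in savings:
        net += amount
        running.append(net)
    return running


net_by_year = cumulative_net_savings(yearly_savings, assumed_rollout_cost)
total_savings = sum(yearly_savings)

for year, net in enumerate(net_by_year, start=1):
    print(f"End of year {year}: net position ${net:,}")
```

A negative value for a given year means the assumed cost has not yet been recovered by that point.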
Basics of Implementing a Speech-Recognition Solution
Successful implementation of a speech-recognition software program
requires careful attention to hardware, user training, and
customization. Some healthcare organizations manage their own speech
recognition installation, customization, and training, but most
prefer to outsource this work to the software manufacturer, a system
integrator, or a speech recognition value-added reseller (VAR).
Hardware Recommendations
Most organizations develop a standard hardware platform for speech
recognition users, with alternative options for employees who use
speech recognition on a laptop, dictate into a handheld digital
recorder, or have special needs. System requirements for
speech-recognition software vary by software manufacturer. Minimum
needs will also vary by the type and number of applications that
users deploy. Most speech-recognition programs run on PC systems,
although some Macintosh-based products are available.
Although speech-recognition
programs will automatically adjust to the processor and memory of
your computer to provide the best combination of accuracy and speed
possible, most users will be happier with systems that exceed the
software manufacturer’s minimum requirements. Speech-recognition
software is processor-intensive, and in general, the faster the
processor, the better the performance. Users who wish to have
multiple applications running at the same time will also benefit
from having more RAM on their system than the minimum.
A computer’s sound card is another
factor that can affect performance. Speech-recognition programs
require a sound card that will accurately process the electrical
signal that the microphone produces when you speak.
Static or electrical interference will make it difficult or
impossible to achieve good speech recognition accuracy. Because of
this, speech programs require a high-quality 16-bit sound card.
Check with the software manufacturer to verify which sound cards are
certified to work with the program.
The software performance can also
be affected by the quality of the microphone. Speech recognition
requires a high-quality, high-level speech signal. Noise-cancelling
microphones help block out high ambient noise levels. Most
speech-recognition programs are sold with a high-quality,
noise-cancelling headset microphone that is specifically tuned to the
software. Users who do not like wearing a headset may prefer an
array microphone; others may opt for a wireless headset. Combined
dictation/telephone headsets are also available. Most laptop users
achieve high performance with a regular headset microphone, but
users who are unable to achieve satisfactory sound quality from
their laptop’s built-in sound hardware may wish to use a USB
(universal serial bus) microphone that processes their voice signal
before sending it to the computer. Check with the software
manufacturer to verify which microphones are certified to work with
its program.
User Expectations and Training
Setting realistic expectations has a critical impact on the success
or failure of a speech-recognition program. Although the software
itself is easy to install and operate, users who are not accustomed
to dictating their thoughts may need practice. Most physicians who
are familiar with dictation will find it easy to adopt
speech-recognition software. However, they may be used to mumbling
or garbling words and expecting the transcriptionist to interpret
what they are saying. The quality of the “human sound signal” is
just as important as the sound card’s quality.
Although users can begin dictating
and using the software after completing their initial five-minute
enrollment session, most people increase their productivity when
they receive training. Training shortens the learning curve, instills
confidence in users, reduces support costs, promotes the success of
a pilot program, and maximizes return on investment.
Program Customization
Users who are dictating documents just for others to transcribe and
correct may not need program customization, but virtually everyone
can benefit from customizing the product to complete routine tasks
faster.
Customization may be as simple as
the creation of a macro that inserts your name and title at the end
of a letter when you say “my signature” or as complex as a macro
that executes a series of keystrokes and mouse actions with a
spoken command. Macro creation tools are typically included in
high-end speech-recognition software systems. Although simple macros
are easy for users to create, in most cases organizations will achieve
better results if an IT (information technology) staff member or a
speech recognition consultant works with each user to analyze their
workflow and customize the program to their needs.
Creating a custom vocabulary
that includes the names of patients, staff, and other physicians will
increase accuracy. Many speech-recognition programs permit custom
vocabularies and macros to be exported and shared by multiple users,
which decreases the time and cost associated with customization.
Individual users can increase their accuracy by running a feature
contained in most speech programs that analyzes the user’s written
documents to learn their writing style and the words they use most
often.
Conducting a Pilot
Most healthcare organizations find it valuable to
conduct an on-site evaluation with a small number of users before
deploying a full-scale speech-recognition program. The vendor or a
VAR can help set up a pilot, but it is important that you determine
your own criteria for evaluating productivity and participant
satisfaction before the pilot begins.
For best results, select four to
eight computer-savvy employees who want to use speech recognition
and are likely to have the time to use the software on a daily basis
during the pilot period. A typical pilot, from initial assessment
through final evaluation, lasts one to three months. Before the
pilot begins, someone from IT or the training department, the
vendor, consultant, or VAR should sit down with each participant to
analyze his or her daily routine. This analysis allows custom vocabularies
and macros to be developed to enhance productivity. After the
software has been customized for each participant’s needs, group or
one-on-one training should be provided.
Conclusion
A growing number of large healthcare organizations, hospitals,
clinics, and solo medical practices have adopted speech-recognition
software programs to increase productivity, reduce costs, and
protect against RSIs. Although implementing a speech-recognition
program requires careful planning, the cost and time savings can be
substantial.
— Matt Revis is the senior
product marketing manager for dictation products at ScanSoft. He has
an MBA from Columbia Business School and has been working in speech
technology marketing for five years.
Resources
1. Net words per minute are determined by measuring a person’s
average gross speed in words per minute and subtracting the number
of errors made per minute.
2. How do speech-recognition
software programs understand speech?
Speech-recognition software
programs are based on statistical probability. The software analyzes
an incoming stream of sounds and interprets those sounds as commands
and dictation. This process of interpretation is called speech
recognition, and its success is measured by the percentage of
correct interpretations or recognition accuracy.
The software relies on three
sources of information to achieve high recognition accuracy:
• Acoustic model — a
mathematical model of the sound patterns used by the speaker’s
language.
• Vocabulary — a list of
words the program can recognize. Each word in the vocabulary has a
text representation and pronunciation.
• Language model —
statistical information associated with a vocabulary that describes
the likelihood of words and sequences of words occurring in the
user’s speech.
When you create and train a user
profile, you start with a standard set of models and then customize
them for the way you speak (acoustic model) and the way you use
words (vocabulary and associated language model). The software
employs your customized user files to determine the words you spoke.
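How the three sources work together can be sketched with a toy example. All numbers below are invented for illustration, and real recognizers are far more elaborate. Here the acoustic scores slightly favor “doze,” but the language model knows that “daily dose” is a far more likely word sequence, so the combined score selects “dose.” The word “those” is rejected because it is not in the vocabulary.

```python
import math

# Vocabulary: the words this toy recognizer is allowed to output.
VOCABULARY = {"dose", "doze"}

# Hypothetical acoustic scores: how closely each word matches the sound heard.
ACOUSTIC_SCORES = {"dose": 0.48, "doze": 0.52, "those": 0.40}

# Hypothetical language model: probability of a word given the previous word.
BIGRAM_PROBS = {("daily", "dose"): 0.30, ("daily", "doze"): 0.001}


def pick_word(previous_word: str) -> str:
    """Choose the in-vocabulary word with the best combined log score."""
    def combined(word: str) -> float:
        # Small floor probability for word pairs the model has never seen.
        lm = BIGRAM_PROBS.get((previous_word, word), 1e-6)
        return math.log(ACOUSTIC_SCORES[word]) + math.log(lm)

    candidates = [w for w in ACOUSTIC_SCORES if w in VOCABULARY]
    return max(candidates, key=combined)


best = pick_word("daily")
print(best)
```

Training a user profile, in these terms, amounts to adjusting the acoustic scores to the speaker and the word-sequence probabilities to the way that person writes.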
3. The quality and type of
noise-cancelling microphone are critical success factors in
implementing speech recognition.
4. The handheld recorder is
typically a digital recorder. Not all recorders work with all
speech-recognition software programs. Check with the software
manufacturer to confirm whether a recorder is approved for use with
its product.
5. Some speech-recognition
programs enable users to save their recorded dictation with their
text file so they or a third party can correct or edit the file
while listening to or periodically checking the original dictation.
Check with the software manufacturer to confirm whether this feature
is available.
6. OSHA Fact Sheet. Ergonomics By
the Numbers.
7. OSHA Ergonomics Program.
Federal Register. 2000;65(220):68343.
8. The information contained in this
article does not constitute legal advice. If you have any questions
regarding the Americans with Disabilities Act or any other law, you
should contact a qualified attorney.