Re-Interfacing Human Abilities examines the design, construction, and use of an augmentative communication aide for the speech impaired. It details the steps in building a custom aide that enables a specific person to communicate. This person suffers from a disorder affecting the fine motor control center of the brain, leaving him unable to speak, gesture, or operate complex switches.
RE-INTERFACING HUMAN ABILITIES
by
Michael T. Kadie
A thesis submitted to the Department of Computer Science
and The Graduate School of The University of Wyoming
in partial fulfillment of the requirements
for the degree of
MASTER OF SCIENCE
in
COMPUTER SCIENCE
Laramie, Wyoming
December, 1994
ACKNOWLEDGMENTS
The author would like to thank the thesis committee members, Ghasem Alijani, William Gavin, Stanley Petrick, and Thomas Bailey, for their guidance, advice, and help in completing this project and thesis paper. Special thanks to George Janack for his help and advice on the electrical engineering aspects of the project and for locating affordable parts to build the aide. Thanks for supplying technical expertise and experience with augmentative devices to Kathy Bodine (Easter Seals of Colorado), Tracy Kovach (Denver Children's Hospital), Patty Prats (LCCC Learning Center), the entire staff of the Developmental Preschool (Laramie, WY), the Special Education Department at Spring Creek Elementary School, the Speech Pathology and Audiology Department of the University of Wyoming, and Sand Huburt (executive of Parents of Children With Disabilities, Laramie, WY). Thanks to the people and organizations whose donations of equipment helped make this aide possible, including Mariah Associates, Inc., Donald McKeith, Mary Anne, Keybank of Laramie, Tom Jones, Sandy and David Gaddis, and Ghasem Alijani.
Table of Contents
List of Figures
Equation 1: Formula For Calculating Battery Life
Figure 1: Set of Icons
Figure 2: Arcade Button Wiring Diagram
Figure 3: LCD to Amiga Parallel Port Wiring Diagram
Figure 4: Simplified Diagram of Project Hardware
Figure 5: Power Supplies
Figure 6: Layout of Client Interface of Project
Figure 7: Main Menu Screen
Figure 8: Editing Screen From Quick Keys Selection
Figure 9: Michael Submenu Screen
Figure 10: Infinite List Selection Screen
Figure 11: Infinite List Mode Screen
Figure 12: Editor Screen From Infinite List Mode
Figure 13: Client Interface Mapping
Figure 14: Sine Wave Shown Within an Arbitrary Measuring Framework
Figure 15: Sine Wave With Intersection Points Marked
Figure 16: Graph of Digital Rendition of Original Waveform
Figure 17: Final Rendition of Waveform That the Computer Will Use
Figure 18: Digitized Graphical Representation of the Word A
Table 1: Power Supply Comparison
Table 2: Quick Keys Selection Formation Based on Client Interface
Table 3: Infinite List Selection
Table 4: Conversion from Alphabetical List into Matrix
Table 5: Button Assignment Within Infinite List Mode
Table 6: LCD Display General Layout
This project started as the construction of a device/aide that would speak for a fully cognitive person trapped inside a body that could not readily communicate. This person suffered severe trauma to his fine-motor control center; as a result he is unable to speak (though he can make a single sound), lacks the coordination required to operate anything but the largest buttons, and tends to have muscle spasms when excited. This person's family is not wealthy and cannot afford the price tags of the commercially available augmentative communication aides that would help him communicate with the world at large.
Given my background in computers and electronics, I decided to build a communication aide. Realizing that this was a large project, I went to other people throughout the university to get their input and take advantage of their backgrounds. Together we decided the aide should be based on the following set of goals:
This project led us into the very interesting and fulfilling area of augmentative aides for the disabled. We learned many important things about this part of our society, which is mostly ignored out of ignorance, self-absorption, or a loss of humanity and caring.
There are two key factors in communication: understanding the thoughts expressed by others and expressing one's own thoughts. "Voice is a preferred means for person-to-person communication. The acoustic signal conveys a sequence of audible sounds (speech) whose meanings (within cultural groups) have been agreed upon a priori. This acoustic code constitutes language" [JF90]. Therefore, communication in a preferred way occurs within the realms of speech and hearing (speech recognition). The concept of real-time is a key issue when we consider interaction with the general population. The American population tends to lack the patience required to interact with disabled persons for extended periods of time if the interaction is not on a real-time or near real-time basis [KB93] (near real-time refers to conversation held at 20-120 words per minute [IF87]). For completeness, a possible augmentative "hearing" system will be looked at first, followed by augmentative communication systems (where this project makes its contribution).
One method of regaining "hearing" for a disabled person is through a speech recognition system. The applications for speech recognition extend beyond hearing and have great potential in other mainstream and augmentative areas, ". . . the scope of which covers information distribution, entertainment, voice messaging, transactions, and education" [DF90]. A great deal of research has gone into this area, and some conclusions have been reached. It is important that there be a large vocabulary (1000+ words), speaker independence (the ability to recognize multiple accents), continuous speech recognition (recognizing speech taken from fluent sentences), and word spotting (focusing on the key words in casual speech) [AS91][CV89][JW90]. Once we have conquered voice we are more than half finished. All that remains is "displaying" the information in a form that the disabled person can understand. This part needs to be customized for the individual; some examples of "displays" include:
The other aspect of communication aides is regaining the ability to "speak", the area in which this project makes its contributions. This "speech" is regained by means of an aide whereby disabled persons can communicate their thoughts to the public at large. The aide can be broken down into three subparts: interface, feedback, and output. The interface provides a method for clients to select the words or phrases that express their thoughts. It must be customized to the client needing the device and that client's individual abilities. Some selection interfaces include:
Feedback to the client is often provided by pictures on the selection items, called icons (see Communication Device Basics), by text displays, or by the output of the device itself. Ideally, speech is the output of the device or aide (see Basic Communications above). It usually takes the form of a synthesized (computer-generated) or digitized (recorded from human speech) voice. These three parts (the interface, feedback method, and output) are put together in a communication device/aide that will help clients communicate their thoughts and needs.
There are three important people involved in the design and use of a communication device: the client, the facilitator, and the designer. The client is the person for whom the device is being built. This is the person who, due to some existing condition, is unable to communicate in an effective manner. The facilitator is the person who helps the client function in society [KB93][TK93][IF87][GV84], and is the person who will program/reprogram the device to meet the client's needs and desires. Finally, there is the designer. This is the person building the device that will allow the client to communicate.
There are some basics common to communication devices/aides that allow the disabled client to communicate with the non-disabled population, in both the high-tech (electronic) and low-tech (non-electronic) realms [GV84]. The object is to come up with a system whereby a client can communicate thoughts and ideas. The first part of developing such a system is to decide upon a set of useful ideas that the client might want to communicate. It is desirable that this set be very large and as all-encompassing as possible, because this brings the client closer to the norm of society [AS91][CV89]. These ideas are often expressed using icons (simple pictures that try to clearly illustrate an idea) [IF87]. After selecting a set of icons, the next step is to arrange them in a logical fashion, usually in a square or matrix where related ideas are located in groups (see Figure 1).
Figure 1: Set of Icons
Source: "The Picture Communication Symbols"
After the matrix has been established, the next step is the client selection method (i.e., how the client will select entries from the matrix). The two methods in common use are direct selection and scanning. In direct selection, the client points at the desired icon; pointing is not limited to those individuals who can use their hands or fingers, but can also be accomplished by looking at the desired selection. The other method, scanning, is for clients with insufficient control to make direct selection feasible. It usually involves row-column scanning [IF87]: a pointer first scrolls down the rows, waiting for the client to signal (by some agreed-upon means) when the desired row has been reached, and then scrolls across the columns of that row until the client signals that the desired column has been reached. These two commands target the desired entry from a matrix of choices. These design decisions give enough information to build a low-tech communication aide: simply put the icons in a logical order on a board (often something as simple as cardboard). For high-tech (i.e., electronic) communication aides, this information would be programmed into the aide. A sketch of row-column scanning follows.
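The following is a minimal Python sketch of row-column scanning, included only to illustrate the technique; the dwell timing of a real scanner is abstracted into a get_signal() callback (True means the client activates a switch while the current row or cell is highlighted), and the icon matrix and simulated presses are hypothetical.

def row_column_scan(matrix, get_signal):
    """Return the entry the client selects with two switch activations."""
    while True:
        for row in matrix:                 # highlight each row in turn
            if get_signal():               # first signal picks the row
                for item in row:           # then scan across its columns
                    if get_signal():       # second signal picks the item
                        return item
                break                      # row exhausted; restart the scan

# Simulated client: signals on the 2nd row, then on its 3rd column.
presses = iter([False, True, False, False, True])
icons = [["eat", "drink", "play"], ["yes", "no", "help"], ["mom", "dad", "home"]]
print(row_column_scan(icons, lambda: next(presses)))   # -> help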
This project was designed for a 14-year-old boy named Michael, who will be referred to, in generalizations, as "the client". He is confined to a wheelchair and is unable to communicate in an effective manner. He suffered damage to the fine-motor control area of his brain shortly after birth. This prevents him from being able to speak or to operate complex input devices such as keyboards. He is at or above normal intelligence, as determined by the public school system and the government aid agencies that supply him with special books on tape. Unfortunately, it is far too easy to dismiss him because of his inability to communicate his ideas, and this fact as much as any inspired this project. I spent a week with him and his family and learned a lot about them. During this time we (myself, with a lot of input from his parents) came up with a good idea of what his abilities and limitations are as they relate to a communication aide. From this knowledge we made the following list of design criteria for the aide we would be building:
We also received information about his previous experience with a communication aide called a Touch Talker™ (see Notes From Existing Communication Aides). With this information we began to design an aide for Michael.
Notes From Existing Communication Aides
It would be unrealistic to assume that what we were doing was entirely new, and therefore it would be inexcusable not to learn from the existing aides. In the course of research we explored two communication aides, the Touch Talker™ and the Liberator™. The information on the Touch Talker™ comes from Michael's experiences with it. The information on the Liberator™ comes from Denver Children's Hospital, where we were allowed some "hands-on" use and direct observation of actual clients using the aide.
Michael had been using a Touch Talker™ for eight years, and we were given some very good information about his experiences with the aide by his family and his speech therapist [BH93]. The Touch Talker™ is a box with recessed button areas, a speaker, a liquid crystal display (LCD), and other necessary support equipment. The button areas are grouped into an ON/OFF area and a grid area used for selection (all buttons are recessed). The ON/OFF area consists of two regular-sized (½" square) recessed buttons, with no other provision for turning the machine on or off. The grid area is set up for two overlays (stiff plastic sheets controlling the number of buttons and the symbols above those buttons). The first overlay consists of eight large areas (with pictures pasted over them) which are supposed to act like large buttons for Michael's use. The other overlay is a full keyboard for the facilitator to use in programming the aide. The LCD lies flat on the box above the button areas. The speaker faces out the back, and there are additional controls for volume and viewing angle located in convenient places on the box. Several problems with the system were pointed out to us by the family:
The machine is extremely difficult to program because it requires words to be entered as their phonemes (the sounds that make up words); this and the voice quality problem have been corrected in the current versions of the Touch Talker™. The LCD provided for feedback to the operator is angled in such a way that a wheelchair-confined operator cannot read it (regardless of the viewing angle adjustment). To make matters worse, the text scrolls across the screen in a fashion the client, though literate, cannot read due to his disorder. Another problem with the LCD is that it scrolls the phonemes, not words; this has also been corrected in later versions of the aide. The complete text scrolls across the LCD slowly and uninterruptibly, which means there is often as much as a five-minute delay waiting for the LCD to catch up with what was just said. During this time no other communication can take place. This greatly reduces the aide's usefulness in real-time situations such as spontaneous conversation. Note: for Michael to upgrade his Touch Talker™ to the current version, which alleviates the noted problems, would cost $3000.
Another aide we investigated was the Liberator™ [TK93]. It consists of 128 keys with the same kind of overlays as the Touch Talker™. It has the same problems with dead zones, ON/OFF button sizing, and touch typing as the Touch Talker™ (see above). However, it offers a choice of three quite natural-sounding voices, and its output is based on a combinative symbol system (e.g., ME + WALKING → "I want to go for a walk"). It is programmed in a fairly intuitive manner; to add a new phrase you type in the words you would like said, spelled normally, and then assign the phrase to a key combination. It has a non-scrolling, fast, multi-line LCD display and a miniature printer (we could not ascertain the usefulness of the printer). The aide is designed to interface to MAC™ computers, supports many different client physical capabilities, and has a price tag of around $8000.
To build the aide we wanted for Michael, we needed to focus on the ultimate goal and the abilities of the individual. The ultimate goal was to provide as normal a method of communication as possible for the client. Ideally, that would be a device that communicates the client's thoughts at speeds upwards of 180 words per minute, the estimated speed at which people are capable of speaking [TC84][GV83]. It was also important that the communication medium be clear human or human-like speech. Further, it would be extremely useful for the individual to be able to communicate any idea, including those not preprogrammed into the machine, so a large vocabulary had to be ensured [AS91][CV89]. Other considerations were durability, portability, and feedback to the user, information given to us by Kathy Bodine, augmentative specialist at Easter Seals of Colorado. Specific client information came from Michael's physical therapist, speech therapist, and family, and from his experiences with other aides.
With this data, and remembering the limiting factor of cost, we were ready to go on to the next step: the creation of the system outline.
From these goals and ability specifications, a system outline/parts list was created. The primary output would be a digitized human voice (with as large a vocabulary as was feasible), for "Practical implementation of speech technologies depends on digital signal processing" [JF90]. It was not possible to digitize beforehand all of the words a client might want to use, and there were also storage space considerations in determining the vocabulary. To supplement the incomplete digitized vocabulary, a clearly intelligible synthesized voice with text-to-speech (the ability to "read aloud" text that has been put into the system) was made available. For the eight buttons the client would use, we selected arcade buttons. These are the buttons used in arcade games and vending machines, and they are both durable and reliable. For the ON/OFF switch, a large heavy-duty toggle switch was selected, as it uses a different type of movement than the buttons and could be easily manipulated by the client. An inexpensive 7/11-bit LCD was selected for client feedback. It has a wide viewing angle (60°) and can be positioned so that the center of the viewing angle is aimed at the client's normal viewing position. For portability, it was decided to make the system fit onto the client's wheelchair lap tray and run off batteries.
With this in mind, we needed a computer to bring it all together. The necessary capabilities included:
- a parallel interface for the LCD
- buffered inputs for the eight buttons
- built-in sound, with text-to-speech software available
- power requirements compatible with battery operation
- a small footprint and low cost
- good real-time capabilities
Given these criteria, we decided to use an Amiga 600 system because it fulfilled our requirements well. It has an 8-bit parallel interface for the LCD. It has fourteen buffered momentary-switch detectors for the buttons to be hooked into (the joystick ports). It has built-in sound, and text-to-speech software is readily available. The system needs fairly standard ±12VDC, +5VDC, and GROUND voltages to run (see the Power Supply section). It has a footprint of 14"x10"x2" and costs just under $200. It is a multiprocessor computer and has very good capabilities for real-time applications (see the Hardware section).
The work of the project fell into two distinct categories, hardware and software, and each will be covered in its own section. The hardware section gives an overview of the components of the system and how they were interfaced; the software section gives an overview of the software's construction and details its use.
After finalizing the computer we would be using and finishing the system outline, we began selecting and interfacing the component parts. The first hardware task was interfacing the client input. We arranged the arcade buttons in a four-by-two matrix, the configuration the client was familiar with (see Figure 6 in the Software section). Then we connected the switches to four of the joystick inputs on the computer in an eight-position "direction" simulation, as shown in Figure 2 (a decoding sketch follows the figure). The switches we used were all from arcade games, and the diodes were standard germanium diodes.
Figure 2: Arcade Button Wiring Diagram
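Figure 2's diode matrix lets eight buttons share the four direction lines of one joystick port. The Python sketch below shows the decoding idea only; the bit assignments are assumptions for illustration, not the Amiga's actual register layout.

UP, DOWN, LEFT, RIGHT = 1, 2, 4, 8      # assumed direction-line bits

# Each button grounds one line or, through the diodes, a pair of adjacent
# lines, yielding eight distinguishable "directions".
BUTTON_OF_PATTERN = {
    UP: 1, UP | RIGHT: 2, RIGHT: 3, DOWN | RIGHT: 4,
    DOWN: 5, DOWN | LEFT: 6, LEFT: 7, UP | LEFT: 8,
}

def read_button(port_bits):
    """Map the current direction-line pattern to a button number (or None)."""
    return BUTTON_OF_PATTERN.get(port_bits)

print(read_button(UP | RIGHT))          # -> 2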
The next step was to interface the client feedback device, the LCD. This proved more difficult, and in the process of attempting to interface the LCDs, several were destroyed. The LCD we selected was an intelligent 4/8-bit data, 40-character by 2-line display. It needs an additional one to three control bits depending on the application; for our application it needed one additional control bit, giving us a net need of five or nine data bits. We decided to use the 4-bit option because we would be hooking the LCD to an 8-bit interface. We found that the timing control line corresponds to the STROBE line on the 8-bit interface. From this, after some more trial and tribulation, we came up with an easy way to interface these displays to the Amiga 600, shown in Figure 3. The 10k pot on the LCD side adjusts the center of the viewing angle. A sketch of the resulting 4-bit write sequence follows Figure 3.
Figure 3: LCD to Amiga Parallel Port Wiring Diagram
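As a rough illustration of the 4-bit interfacing described above, here is a Python sketch of nibble-mode writes, assuming an HD44780-compatible controller (typical for 40x2 character displays); the parallel-port hardware and the register-select wiring are stand-ins, not the project's actual code.

sent = []                                # stand-in for the parallel-port lines
def write_parallel(byte): sent.append(byte)
def pulse_strobe(): pass                 # real STROBE edge latches the nibble

RS = 0x10                                # assumed wiring: register-select bit

def send_nibble(nibble, rs):
    write_parallel((RS if rs else 0) | (nibble & 0x0F))
    pulse_strobe()

def send_byte(byte, rs):
    send_nibble(byte >> 4, rs)           # high nibble first, then low nibble
    send_nibble(byte & 0x0F, rs)

def display_text(text):
    for ch in text:
        send_byte(ord(ch), rs=True)      # rs=True selects the data register

display_text("HELLO")                    # queues 10 nibble writes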
The next step was to build an audio amplifier. After much experimenting, we found that a $12 amp from Radio Shack™ worked as well as the amps with wiring diagrams in the Engineer's Mini-Notebook series [FM87]; this was due to an oversight in the schematics, which we later fixed by putting a matching transformer in front of the speaker. With that modification the sound quality was well above that of the $12 amp, but before implementing this in a system we found another, better solution. Slight modification of a $20 computer speaker system resulted in very high quality output, and this modified system has become the standard for future models. Unfortunately, we found these improvements after the initial device was sent to the client, but these changes will be kept in mind for all future models.
With the completion of the speaker system, our core computer system was finished. It has input in the form of buttons, output in the form of sound, and processing facilities suitable for our client. The computer was set up in a convenient, friendly environment in which to write the software. A simplified diagram of the main parts is shown in Figure 4.
Figure 4: Simplified Diagram of Project Hardware
This configuration gives us many useful characteristics, especially in helping ensure real-time responses. It has four processors that we take advantage of: the 68000 CPU, the 8364 I/O processor, the 8361 A/V processor, and the 8620 port processor. The 8364 handles the client button input in a buffered fashion, so the client can continue using the buttons while the system is performing other functions (like speaking). The 8361 is very useful because, once activated, it generates the sound and video without CPU overhead (the machine can be processing information while speaking). The 8620 allows file I/O and LCD output to take place without CPU overhead, so the system can continue processing while files are being loaded and saved or while LCD output is being displayed. This multiprocessor environment gives us good resources for the real-time demands of spontaneous conversation.
Figure 5: Power Supplies
The last challenge was the power supply. We decided that 12VDC would be the most convenient battery source; its commercial applications in camcorders, motorcycles, and cars make it by far the cheapest power source. According to early research, the only absolutely necessary voltages were +12VDC, +5VDC, and Ground. This is reflected in Power Supply #1, shown in Figure 5, a modification of a circuit for use with logic ICs published in the Engineer's Mini-Notebook series [FM87]. It worked quite well except that there was a considerable amount of noise coming out of the audio channels, which we found to be caused by the need for an additional -12VDC supply. Many variations were tried, including the five major versions shown in Figure 5. After we shipped the aide with Power Supply #4 to Michael, we found a better alternative: a surplus laptop power supply. This is Power Supply #5 in Figure 5; it performed magnificently and we will use it in all future models. Table 1 provides an empirical comparison of the various power supplies.
Table 1: Power Supply Comparison
Power Supply Number | Minimum Efficiency | Cost | Time to Build/Incorporate
1 | 45% | $6 | 60 mins
2 | 40% | $25 | 75 mins
3 | 50% | $12 | 75 mins
4 | 40% | $23 | 45 mins
5 | 80% | $15 | 30 mins
For calculating battery life, the complete machine draws 13 watts, or 1.1 amps at 12VDC, which gives the formula in Equation 1.
Equation 1: Formula For Calculating Battery Life
battery life (minutes) = (battery capacity (amp-hours) / 1.1 amps) * 60 * efficiency
So at 100% efficiency with the 2.4 amp hour camcorder battery, the aide would give approximately 130 minutes of use.
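As a worked check of Equation 1, here is a short Python sketch using the stated 1.1 amp draw:

def battery_life_minutes(capacity_ah, efficiency, draw_amps=1.1):
    """Minutes of use from a 12VDC battery at a given supply efficiency."""
    return capacity_ah / draw_amps * 60 * efficiency

print(battery_life_minutes(2.4, 1.0))    # ~130 minutes at 100% efficiency
print(battery_life_minutes(2.4, 0.8))    # ~105 minutes with Power Supply #5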
The final step in the hardware category was the input provision for speech. The project uses digitized speech samples to guarantee a natural-sounding voice. We accomplished this by using an audio digitizer designed to work with the Amiga computer and a lapel microphone. With this setup, a voice saying a list of common words needed to be digitized only once; it was then stored on disk for playback by the software. For more information about digitized speech, see Appendix A.
The hardware by itself was clearly not enough for the aide we were building; there also needed to be driver software to integrate the aide for the client's use. We decided to write that software in an authoring system called CanDo™. It allows programming in a high-level language and provides full support for hardware interrupts, multitasking, and process communication. This gave us an extremely flexible environment and greatly eased our integration tasks. CanDo™ is preprogrammed with information about the hardware on the Amiga computer and has provisions for both interrupt-driven and buffered handling of external devices. It is also able to start and communicate with external processes in a multitasking environment.
We decided to make the user interface a menuing system based on the eight buttons. The software goal was to maximize capability without sacrificing ease of use. We wanted to make the output of the aide as normal and comfortable (as compared to human speech) as possible. We also wanted a non-computer-literate facilitator to be able to easily modify and program this communication aide.
The first step in developing the software for the aide was to break down the selection strategy for efficient use with the eight buttons (see Figure 6). In keeping with the familiar machine, Michael's Touch Talker™, we decided to have the same forty-nine base selections made by pressing two-key combinations, referred to as Quick Keys; see Table 2 (we used a system of denoting selections based on the button layout to increase ease of use and allow for a consistent methodology). We copied the initial contents of these Quick Keys from the setup of the Touch Talker™ system.
Figure 6: Speech Device Layout
Table 2: Quick Keys Selection Formation Based on Client Interface
First Button Goes From Main Menu to:
Submenu 1 | Submenu 2 | Submenu 3 | Infinite Menu |
Submenu 4 | Submenu 5 | Submenu 6 | Submenu 7 |
Second Button Goes From Submenu N to Selection:
Selection [N,1] | Selection [N,2] | Selection [N,3] | Goto Main Menu |
Selection [N, 4] | Selection [N,5] | Selection [N,6] | Selection [N,7] |
Note: to convert Selection [N,M] to a number 1-49:
number = (N-1)*7 + M
This means that the first button pressed takes the client to a submenu, and the second button finalizes a selection, which is then spoken. The upper right button defaults to returning to the main menu. The only time the upper right button does not return the user to the main menu is when the user is already on the main menu, in which case it takes the user to the Infinite Menu (see the next paragraph for details; a sketch of this mapping follows).
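The project's software was written in CanDo™; the following Python sketch merely illustrates the two-press Quick Keys mapping of Table 2. The button numbering (1-8 in reading order) and the PHRASES dictionary are assumptions for illustration.

PHRASES = {(1, 1): "Hello"}   # filled by the facilitator: (N, M) -> spoken text

def quick_key_number(n, m):
    """Convert Selection [N,M] to a selection number 1-49."""
    return (n - 1) * 7 + m

def select(first, second):
    """Resolve two button presses (1-8 in reading order) to an action."""
    if first == 4:
        return "goto Infinite Menu"        # upper right on the main menu
    n = first if first < 4 else first - 1  # submenus 1-7 skip position 4
    if second == 4:
        return "goto Main Menu"            # upper right on a submenu
    m = second if second < 4 else second - 1
    return PHRASES.get((n, m))             # Quick Keys phrase (N-1)*7+M

print(select(1, 1))                        # -> Hello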
The Infinite Menu is provided for communication beyond the limited 49 selections available from the Quick Keys. The Infinite Menu consists of six infinite-list selections (corresponding to six lists of letters/words/phrases), a return to the main menu, and a key reserved for a client editing function (left for later implementation), as shown in Table 3.
Table 3: Infinite List Selection
Letters | Words | MWords | Goto Main Menu |
Numbers | Phrases | MPhrases | reserved |
The six choices are defined by what they build:
Each of these choices takes the client to a different list, of arbitrary length, for further selection. Having created a new extension to the existing selection method, it was necessary to have a new methodology for selecting entries from the list and building them into the desired words/phrases/ideas. The first step is to transform the list, which is alphabetically sorted, into a matrix, as demonstrated in Table 4 (a sketch of the conversion follows the table). The matrix is N columns by N rows, where N is the ceiling (the integer formed by rounding up if there is any fraction) of the square root of the number of entries. The client is placed in the "center" of the list: the currently selected entry is set to the one at position (length of list)/2 ("lady" in Table 4). The client's interface is then configured so that four of his buttons navigate in one of four directions within the matrix. Each time one of these buttons is pressed, it "moves" the cursor in the corresponding direction. The buttons are assigned as shown in Table 5.
Table 4: List To Matrix Conversion
Initial List: big, dad, happy, home, lady, man, mom, sad, small

Transformed List:
big | dad | happy
home | lady | man
mom | sad | small
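A minimal Python sketch of this list-to-matrix conversion and centering (the real implementation was written in CanDo™):

import math

def list_to_matrix(entries):
    """Pack an alphabetized list into rows of N = ceil(sqrt(len)) entries."""
    n = math.ceil(math.sqrt(len(entries)))
    return [entries[i:i + n] for i in range(0, len(entries), n)]

words = ["big", "dad", "happy", "home", "lady", "man", "mom", "sad", "small"]
matrix = list_to_matrix(words)             # the 3x3 matrix of Table 4
center = len(words) // 2                   # index 4: "lady"
row, col = divmod(center, len(matrix[0]))  # cursor starts at the center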
Table 5: Button Assignment Within Infinite List Mode
right 1 word | up 1 row of words | add current to output | return main menu |
left 1 word | down 1 row of words | say current output | add output to list |
After having navigated to a desired entry, the client presses the "add current" button. This allows him to add items one at a time in a building fashion. When he has finished adding selections to the statement he wishes to have said, he presses the "say" button and the communication aide says whatever statement he has built. If he wishes to use the statement again, he can press the "add output" button and it will be stored in either the MWords or MPhrases list. A sketch of this navigate-and-build loop follows.
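Below is a minimal Python sketch of the Table 5 button handling. The say() and save_to_list() stubs stand in for the digitized-speech playback and list storage of the real system; the wrap-around at the matrix edges is an assumption (the aide's actual edge behavior is not specified here), and a full, padded matrix is assumed.

def say(text): print("SPEAKING:", text)          # stand-in for speech playback
def save_to_list(text): print("SAVED:", text)    # stand-in for MWords/MPhrases

class InfiniteList:
    def __init__(self, matrix, row, col):
        self.matrix, self.row, self.col = matrix, row, col
        self.output = []                         # the statement being built

    def press(self, button):
        rows, cols = len(self.matrix), len(self.matrix[0])
        if button == "right":  self.col = (self.col + 1) % cols
        elif button == "left": self.col = (self.col - 1) % cols
        elif button == "up":   self.row = (self.row - 1) % rows
        elif button == "down": self.row = (self.row + 1) % rows
        elif button == "add current":
            self.output.append(self.matrix[self.row][self.col])
        elif button == "say":
            say(" ".join(self.output))
        elif button == "add output":
            save_to_list(" ".join(self.output))

board = [["big", "dad", "happy"], ["home", "lady", "man"], ["mom", "sad", "small"]]
client = InfiniteList(board, row=1, col=1)       # centered on "lady"
for b in ("add current", "say"):                 # speaks "lady"
    client.press(b)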
The next part of the software we considered was the facilitator interface. We began with the main menu, shown in Figure 7. We made this menu fully configurable: the names of the submenus, the number of seconds that a spoken phrase will appear on the LCD screen (0 seconds causes the LCD not to display spoken phrases), and whether or not a beep will sound after a button is pressed are all programmable by the facilitator on this menu.
Figure 7: Main Menu Screen
We made the facilitator interface so that activating the button portion of the menu takes the facilitator to the applicable submenu (i.e., pressing the left mouse button within the box above Michael in the main menu takes the facilitator to the Michael submenu, Figure 9).
Figure 8: Editing Screen From Quick Keys Selection
The submenus corresponding to Quick Keys selections allow editing of the response phrases (via a special editing screen, see Figure 8) when the appropriate button areas are selected.
Figure 9: Michael Submenu Screen
This enables the facilitator to bring up the editing screen for any of the seven button combinations under a submenu simply by selecting the corresponding button area. To edit the Hello message from the Michael submenu (Figure 9), the facilitator would press the mouse button in the box above Hello, and the editing screen shown in Figure 8 would appear. The editing screen has two windowed areas: one for what will be said and one for what will be displayed on the LCD. This is particularly handy when saving disk space by using a single digitized word for multiple words that have the same pronunciation (e.g., two, to, and too), or when spelling phonetically. Additional boxes provide for non-digitized speech, saving changes, and other convenient functions.
When the client or facilitator selects the Infinite Menu from the main menu, they are taken to the screen shown in Figure 10.
Figure 10: Infinite List Selection Screen
They then select the list they wish to work with (Letters, Numbers, Words, Phrases, MWords, or MPhrases). This puts them on the Infinite Options screen (Figure 11) with the previously selected list loaded in.
Figure 11: Infinite List Mode Screen
To modify a list entry, the facilitator selects that entry with the mouse and is taken to the list entry editing screen (Figure 12). The editing function here is nearly identical to that of the Quick Keys, except that it has a delete-entry button. Note that it is also possible to put anything into any list (e.g., put a phrase into the Letters list), in keeping with maximum flexibility. Also on this screen are eight boxes corresponding to the eight buttons, which allow the facilitator to simulate client navigation and use. Figure 12 shows an example in which the facilitator has built the statement "Michael is a good boy".
Figure 12: Editor Screen From Infinite List Mode
We designed a system that has forty-nine Quick Keys phrases and six Infinite Lists to select from, mapped onto our eight arcade buttons. From the main menu, a button selection takes the client to a submenu or to the Infinite Menu. On a submenu, a button selection results in a Quick Keys phrase being selected and spoken. In the Infinite Menu, the second button selects the list the client will be working with. These two key mappings are shown in Figure 13.
Figure 13: Client Interface Mapping
The numbered circles correspond to the Quick Keys selections (1-49), and L1-L6 correspond to the six Infinite Lists. Every entry is fully configurable by a facilitator. The system is simple to use: training time among non-disabled test subjects (from an informal study of 10 people at various demonstrations) was under 5 minutes for the client interface and under 15 minutes for the facilitator interface. We consider this sufficient evidence to claim that the system is easy to use (a training time of 20 minutes cannot be achieved on a system that is not easy to use). It is fast for the client, requiring only two button selections to activate one of the Quick Keys phrases. The delay between selection and vocalization is no greater than a few seconds. We maintained capability through the Infinite Lists, which require a reasonably small number of button selections and are capable of saying anything the client wishes.
The LCD continuously displays feedback information. It has two display modes: one showing the current button definitions and one showing the phrase that was just spoken. In the Quick Keys section, this means breaking the display into 10 areas, eight corresponding to the buttons and two informing the client of the current submenu on which they are working. In the Infinite Lists, this means dividing the screen into 7 sections (4 on top and 3 on bottom) which show:
When the aide is displaying the current statement, it shows the text assigned to the display box of the selected entry. This produces a display layout like the following:
Table 6: LCD Display General Layout
Quick Keys Display
Item 1 | Item 2 | Item 3 | MainMenu || Current
Item 4 | Item 5 | Item 6 | Item 7 || menu

Infinite List Mode Display
phrase right | phrase up | current phrase || cur list
phrase left | phrase down | currently built-up super-phrase

Displayed Speech Window
The last thing said by the speech device
Examples taken from the testing of the communication aide include:
Michael Submenu
Want | Remember | Mess | MainMenu || Michael
Bathroom | Warm | Walk | Walk ||

Infinite Words List
birth | a | boy || words
cold | happy | Michael is a good boy

Last Phrase
I have to go to the bathroom.
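The fixed cell widths make this layout mechanical to produce. Here is a minimal Python sketch, assuming 8-character cells across the 40-column display (the cell width is an illustrative choice, not documented project behavior):

def quick_keys_lines(labels, status):
    """labels: 8 button texts in reading order; status: 2 right-hand fields."""
    rows = []
    for half, stat in zip((labels[:4], labels[4:]), status):
        cells = "".join(f"{text[:7]:<8}" for text in half)  # 4 cells x 8 chars
        rows.append(f"{cells}{stat:>8}"[:40])               # pad to 40 columns
    return rows

for line in quick_keys_lines(
        ["Want", "Remember", "Mess", "MainMenu",
         "Bathroom", "Warm", "Walk", "Walk"], ("Michael", "")):
    print(line)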
The speed of conversation varies greatly, depending on how it is calculated. The factors involved are:
A 1-second delay per button push, to allow the LCD to display the new information, will be used for the calculations. This means that Quick Keys selection is much faster than Infinite Lists selection because it requires fewer button presses. Quick Keys require 2 button presses, so a 2-second delay is used for button presses. Infinite Lists require 3 button presses plus as many as the square root of (number of entries) + 1 (the "add" button) per selection. Based on a target list length of 200-225 entries, the aide requires a maximum of 15 button presses per selection and an average of 8. This is an efficient number of button presses, but we feel that further research should be able to cut this number in half. The disk access delay is between 0 seconds (words in memory) and 15 seconds (the time to load the speech synthesizer), with an average delay of .4 seconds per unique word loaded. The words are spoken at full speed with no additional delay, with an average word length of .5 seconds. A selection can be as many words as desired, but the average phrase length was 7 words. Putting it all together, we get 50 words per minute from the Quick Keys, 25 words per minute for the first selection from the Infinite Lists (phrase selections), and 31 words per minute for each subsequent utterance from the list. The speed of Infinite Lists word selection (based on 7-word statements) is 4.3 words per minute, and letter selection (based on 5-letter words) is 4.5 words per minute. These figures represent the capabilities of the system; they do not count the time it takes the client to press the buttons (if greater than 1 second). We feel that these results are good, but that there is still room for improvement. By rearranging the selection method and adding a word-fill algorithm, we should be able to improve letter selection to about 20 words per minute (a worked check of the Quick Keys figure follows).
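As a check on the Quick Keys rate, a short Python sketch under the stated assumptions (1 second per press, .4 seconds average load and .5 seconds of speech per word, 7-word phrases):

def words_per_minute(presses, words, load=0.4, speak=0.5, press=1.0):
    """Rate for one utterance: press delays plus per-word load and speech."""
    return words / (presses * press + words * (load + speak)) * 60

print(words_per_minute(presses=2, words=7))   # Quick Keys: ~50 wpm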
File space was a very big consideration, as the whole system had to fit onto a single 880k floppy disk (although by adding $200 to the system cost we could increase the storage of the aide by a factor of 100). With this in mind, we needed some supplementary programs to conserve space. The first space-saving program was a lossy sound compression program (the output is similar to, but not exactly the same as, the original input). Many techniques were tried, but the two that worked were decreasing the resolution and transforming the sound into a series of exponential differences; both gave a two-to-one compression factor. The resolution of the sound is the number of bits that make up each sound sample (music CDs have a resolution between 16 and 18 bits). The words were initially digitized with 8 bits of resolution and then reduced to 4 bits by ignoring the least significant 4 bits. This produced an acceptable result in which the words were clearly recognizable, but there was some noise that did not occur in the original 8-bit sample. The noise took the form of static and was much more pronounced on poorer amplifiers. The second method is based on the fact that the samples, when put together, make a waveform (see Appendix A), so the current value tends to be a "small" offset from the previous value. This method takes the difference between the current sample and the previous one and applies the log2 function to the result; small differences are portrayed very accurately and large differences are not. The net effect is that exponential differences are slightly better than decreased resolution at low sampling rates (i.e., the 8,000 samples per second we were using) and produce nearly perfect results at higher sampling rates (such as the 44,000 samples per second that CDs use). Because of its performance, exponential differences was the method used in the aide delivered to the client. A sketch of both schemes follows.
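Here is a minimal Python sketch of the two 2:1 schemes for 8-bit samples (0-255); the exact bit packing and rounding used by the original program are not documented, so this encoding is illustrative only.

def reduce_resolution(samples):
    """Keep only the top 4 bits of each 8-bit sample."""
    return [s >> 4 for s in samples]

def encode_deltas(samples):
    """4-bit codes: a sign bit plus ~log2 of the sample-to-sample difference."""
    codes, prev = [], 0
    for s in samples:
        diff = s - prev
        mag = min(abs(diff).bit_length(), 7)     # 3-bit magnitude, capped
        codes.append((8 if diff < 0 else 0) | mag)
        step = (1 << mag) - 1                    # what the decoder reconstructs
        prev = max(0, min(255, prev + (-step if diff < 0 else step)))
    return codes

print(encode_deltas([128, 131, 130, 60]))        # one 4-bit code per sample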
We found other useful programs in the public domain and shareware fields. These included Playsound, a flexible program that plays a list of digitized sounds based on a command line of the form "Playsound sample1, sample2, ..., sampleN"; Cache-Disk, a program that speeds up repetitive drive access by setting aside a space in memory to keep track of previous disk accesses; Nuke, a program that transparently compresses and decompresses files very quickly; and Turbo Imploder, another compression program that decompresses in a fast, transparent fashion. We would like to thank the authors of those programs for permission to use them, all free of charge, and to congratulate them on the overall quality of their programs.
This project started out with a single primary goal: to build an aide that would enable the client to express his thoughts in intelligible speech. There were numerous secondary goals:
In the course of building this device we learned a lot, and we managed to come up with an aide that met not only the primary goal of communication but also all of our secondary goals:
Building the device so that the client could operate it efficiently involved looking at the client's capabilities and trying to take full advantage of them. We had learned that Michael was quite capable of operating 8 large buttons, especially if they were spaced fairly far apart. We also learned that his fine-motor control problems extended to not being able to control the pressure he applies. Thus our eight buttons needed to be sensitive to light pressure as well as able to withstand abuse. For these reasons we decided to use buttons from video games, as they are both sensitive and durable by necessity of their market. We also installed a large, durable, lever-arm on-off switch. This was chosen because it enabled the client to easily turn the machine on and off, and because it used a different movement type (a swing instead of a press), which doesn't detract from his ability to operate the buttons.
To provide feedback to the client, we took advantage of the fact that he can read and hooked up an LCD to the parallel port. We then divided the screen into several sections, corresponding to the information most useful to the client at that particular point. We angled the screen so that it would be aimed at the client's head when he is sitting in his wheelchair, and we avoided scrolling. We also gave the facilitator the ability to select the text displayed on the LCD independently of the words that are programmed to be said.
To be comfortable for the people conversing with the client, the project needed to provide natural-sounding speech at natural speeds. We accomplished this with a combination of digitized human speech and a good speech synthesizer. The speech must also occur at natural speeds, which means there cannot be any significant delay between selection and execution of speech items, and the selection process must be fast for the client. We achieved this by using a combination of hardware appropriate for a real-time environment and efficient software coding. To keep maximum flexibility and ensure minimal key presses, we adopted a two-phase approach: Quick Keys for the most common needs and Infinite Lists for everything else.
Ease of use was provided by having all of the needed information visible to the client or facilitator when they needed it, and by keeping the whole system as simple as possible without sacrificing power. For the client this meant having text describing his current options displayed in a manner corresponding to the buttons. For the facilitator this meant having a friendly window-based environment, as shown in Figures 7-12.
Flexibility was achieved by applying a building function to lists of letters/words/phrases. The client is able to put items from the lists together to form new items, and because one of the lists is all letters, he can build any word or phrase he wants. Being able to save these built-up words or phrases means that he can personalize the system to his own conversational styling.
The project was kept affordable by using or modifying existing technology for our purposes. We started with a computer that had the sound and input capabilities we needed at a price we could afford. The computer needed to be portable, so we made it run off batteries with a built-in display and sound. The finished size was 14" wide x 14" long x 1-5" deep, with a total weight of 23 pounds. It is constructed of 3/4" plywood, and everything is strapped down to ensure durability. When completed, the project cost was under $800, including the parts that were destroyed or scrapped because better parts were found. To build another similarly configured system would cost around $450.
In the process of building this system we consulted with a large number of experts in the field. These included Easter Seals of Colorado, Denver Children's Hospital, the Speech Pathology Department of the University of Wyoming, the Speech Pathology group from Laramie Public Schools, the Developmental Preschool (Laramie, Wyoming), Educational Resources at Laramie County Community College (Cheyenne, Wyoming), and the parents of three children with severe speech impairment. We showed them our work in progress to get their feedback, which was very favorable and which has caused us to investigate further marketing of the product.
The client has now had the device for 4 months, and we are told that he likes it very much and uses it daily. The family has told us that it is truly wonderful and that they greatly appreciate our building it for him. His speech therapist is very impressed with the device and has inquired about securing another for one of her other clients. We did get one bit of negative feedback: the glue we used to secure the 2" keycaps on the buttons did not hold up over extended periods.
We have been reluctant to do any real advertising of the device until we had a chance to review the system we built, but word has started to get out and demand for additional systems has started coming in. We have built a second system for a child in Laramie who has fine-motor control; his system uses the full keyboard in a combinative manner. His preschool teachers have given us great feedback and have hinted that there is another child in the school who could use a device similar to the one described in this discussion. Additionally, there are three other requests for systems: two more from the original family (they have two adopted 4-year-olds who also cannot speak), and one for a young woman in Oklahoma.
Our systems are useful for literate, speech-disabled persons with some method of manipulating buttons or sensors. This represents 3%-6% of the non-speaking population [KB93], or about 40-80 cases per year at Easter Seals of Colorado. Future research can be conducted in many areas to improve either the system's performance or the number of people it is useful to, including:
There are many factors involved in choosing and using digitized speech samples. To begin, we will look at what digitized speech is and how one gets it. Speech is an instance of sound to which society has given meaning. Sound is a wave distortion through a medium, usually air. The wave may consist of multiple frequencies of sound, as shown in the digitized speech example (see Figure 18). It is possible to look at a pure tone (a single frequency of sound) as a sinusoidal wave [DC93][DS86] with a period of 1/frequency. The process of taking a wave and converting it into a sequence of representative numbers can be demonstrated in an easily understood fashion. The first step is to take the waveform and place it within a time and amplitude framework (see Figure 14).
Figure 14: Sine Wave Shown Within an Arbitrary Measuring Framework
The next step is to select the sampling rate (e.g., CDs sample at 44,000 samples per second) [DS86]. The sampling rate is converted to a sampling period (1/rate), which is used as the time interval on the X axis. The Y axis is some arbitrary unit of measure, often from -1 to 1. This is incorporated into Figure 14. We next mark the points on the wave that intersect the time interval marks (see Figure 15).
Figure 15: Sine Wave With Intersection Points Marked
These sample values form an instantaneous mapping of the waveform, with each value taken to represent the whole time interval (1/rate). This gives us a new graph of the waveform under consideration (see Figure 16).
Figure 16: Graph of Digital Rendition of Original Waveform
We then apply a scale function to convert these real numbers into an integer range. In this case the function applied was New Value = INTEGER((Original Value + 1) * 128), shown in Figure 17.
Figure 17: Final Rendition of Waveform That the Computer Will Use
These numbers are then taken sequentially and stored with the sampling rate into a file for future playback.
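The whole pipeline of Figures 14-17 can be summarized in a few lines. Below is a minimal Python sketch using a pure 440 Hz tone as the input waveform; the clamp to 255 is an addition of this sketch to keep the result within 8 bits.

import math

RATE = 8000                                      # samples per second

def digitize(freq, seconds):
    """Sample a sine tone at RATE, then scale to integers as in Figure 17."""
    n = int(RATE * seconds)
    wave = (math.sin(2 * math.pi * freq * i / RATE) for i in range(n))
    return [min(255, int((s + 1) * 128)) for s in wave]   # clamp 256 -> 255

data = digitize(440, 0.01)                       # 80 samples of a 440 Hz tone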
Now that we know what digital sampling is, we need to select a sampling rate. Clearly the 44,000 samples per second of a CD player is sufficient; however, we wanted a more space-efficient solution. Some research led us to the value of 8,000 samples per second, because it is the rate of the American telephone standard [WG89] and a rate that speech pathologists use in the detection and correction of speech disorders [DC93]. This rate was used; a graphical representation of the word A is shown in Figure 18.
Figure 18: Digitized Graphical Representation of the Word A
[KB93] Interview with Kathy Bodine and associates, Easter Seals of Colorado, October 25, 1993.
[TK93] Interview with Tracy Kovach and associates, Denver Children's Hospital, October 27, 1993.
[IF87] Iris Fishman, Electronic Communication Aids: Selection and Use (Boston/Toronto/San Diego: Little, Brown and Company, 1987).
[RJ81] Roxanna Mayer Johnson, The Picture Communication Symbols (Stillwater, Minnesota: Mayer-Johnson Co., 1981).
[TC84] Trace Center, "Engineering for People with Disabilities: Breaking Down Communication Barriers," Waisman Center Interactions, January 1984.
[CH84] Carolyn Hurd, "Computers: Help or Hindrance to the Disabled," Center Lifelines, March 1984.
[ASHA91] American Speech-Language-Hearing Association, "Augmentative and Alternative Communication: A Report of the Committee on Augmentative Communication," report to the Trace Center Communication Developments conference, 1991.
[AN89] "I Can Even Stutter Now!", Keyhole Communiqué, May 1989, p. 3.
[GV84] Gregg C. Vanderheiden, "High and Low Technology Approaches in the Development of Communication Systems for Severely Physically Handicapped Persons," Exceptional Educational Quarterly, 4 (4), 1984, pp. 40-56.
[GV83] G. C. Vanderheiden, "Non-Conversational Communication Technology Needs of Individuals with Handicaps," Rehabilitation World, Summer 1983, Vol. 7, No. 2, pp. 8-11.
[SS87] Sue Simpson, "Do-It-Yourself Communication," Communication Outlook, Volume 8, Number 3, Winter 1987, pp. 6-7.
[YN87] Yolanda Nieuwesteeg, "Creating Scanning Arrays," Communication Outlook, Volume 8, Number 3, Winter 1987, p. 10.
[NC87] Nicki Conway, "A Lifetime of Communication Methods," Communication Outlook, Volume 8, Number 3, Winter 1987, pp. 11-13.
[FM87] Forrest M. Mims III, Engineer's Mini-Notebook, 8 vols., 1987.
[FM92] Forrest M. Mims III, Getting Started in Electronics, 1992.
[DL91] David Lines, Building Power Supplies (Richardson, Texas: Master Publishing, Inc., 1991).
[GJ93] Conversations with George Janack regarding electrical engineering problems.
[BH93] Interviews with the Bernhart family, for whom this device was built.
[JF90] James L. Flanagan and Charles J. Del Riesgo, "Speech Processing: A Perspective on the Science and its Applications," AT&T Technical Journal, September/October 1990.
[DF90] David R. Flachell, et al., "Interactive Voice Technology Applications," AT&T Technical Journal, September/October 1990.
[JW90] Jay G. Wilpon, et al., "Speech Recognition: From the Laboratory to the Real World," AT&T Technical Journal, September/October 1990.
[AS91] Anton Stölzle, et al., "Integrated Circuits for a Real-Time Large-Vocabulary Continuous Speech Recognition System," IEEE Journal of Solid-State Circuits, Vol. 26, No. 1, January 1991.
[DS86] Daniel Sweeney, Demystifying Compact Discs: A Guide to Digital Audio (Blue Ridge Summit, PA: TAB Books Inc., 1986), pp. 56-80.
[WG89] Winston D., Telephone Voice Transmission Standards and Measurements (Englewood Cliffs, New Jersey: Prentice Hall, 1989), pp. 174-199.
[DC93] David Crystal and Rosemary Varley, Introduction to Language Pathology (San Diego, CA: Singular Publishing Group, Inc., 1993), pp. 131-137.
[CV89] C. Vicenzi, et al., "Large Vocabulary Isolated Word Recognition: A Real-Time Implementation," IEEE Proceedings in Solid-State and Electronic Circuits, Vol. 136, No. 2, April 1989, p. 3.