This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier’s computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs.
* This work was sponsored in part by the Advanced Research Projects Agency and in part by the Department of the Air Force. The views expressed are those of the author and do not reflect the official policy or position of the U.S. government.
This paper describes a broad range of opportunities for military and government applications of human-machine communication by voice and discusses issues to be addressed in bringing the technology into real applications. The paper draws on many visits and contacts by the author with personnel at a variety of current and potential user organizations in the United States. The paper focuses on opportunities and on what is needed to develop real applications, because, despite the many opportunities that were identified and the high user interest, the military and government organizations contacted were generally not using human-machine communication by voice in operational systems (exceptions included an application in air traffic controller training and voice entry of ZIP codes by the U.S. Postal Service). Furthermore, the visits and discussions clearly identified a number of applications that today’s state-of-the-art technology could support, as well as other applications that require major research advances.
Background for this paper is provided by a number of previous assessments of military applications of speech technology (Beek et al., 1977; Cupples and Beek, 1990; Flanagan et al., 1984; Makhoul et al., 1989; Proceedings of the NATO AGARD Lecture Series, 1990; Woodard and Cupples, 1983; Weinstein, 1991), including prior National Research Council studies (Flanagan et al., 1984; Makhoul et al., 1989) and studies conducted in association with the NATO RSG10 Speech Research Study Group (Beek et al., 1977; Cupples and Beek, 1990; Proceedings of the NATO AGARD Lecture Series, 1990; Weinstein, 1991). Those prior studies provide reviews of the state of the art at the time, and each outlines a number of programs in which prototype speech recognition systems were tested in application environments, including fighter aircraft, helicopters, and ship-based command centers. These efforts, as described in the references but not detailed further here, generally yielded promising technical results but have not yet been followed by operational applications. This paper focuses on users and applications in the United States, but the general trends and conclusions could apply elsewhere as well.
This paper is organized to combine reports on the military and government visits and contacts with descriptions of target applications most closely related to each organization. However, it is important to note that many of the applications pertain to a number of user organizations, as well as having dual use in the civilian and commercial areas. (Other papers in this volume describe applications of speech technology in general consumer products, telecom-
munications, and aids for people with physical and sensory disabilities.) A summary relating the classes of applications to the interests of the various military and government users is provided near the end of the paper. The paper concludes with an outline of a strategy for technology transfer to bring the technology into real applications.
TECHNOLOGY TRENDS AND NEEDS
A thorough discussion of technology trends and needs would be beyond the scope of this paper; hence, the focus here is on description of the applications. But the underlying premise is that both the performance of algorithms and the capability to implement them in real time, in off-the-shelf or compact hardware, have advanced greatly beyond what was tested in prior prototype applications. The papers and demonstrations at a recent DARPA (Defense Advanced Research Projects Agency) Speech and Natural Language Workshop (1992) provide a good representation of the state of current technology for human-machine communication by voice. Updated overviews of the state of the art in speech recognition technology are presented elsewhere in this volume.
With respect to technological needs, military applications often place higher demand on robustness to acoustic noise and user stress than do civilian applications (Weinstein, 1991). But military applications can often be carried out in constrained task domains, where, for example, the vocabulary and grammar for speech recognition can be limited.
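To make concrete how a constrained task domain limits what a recognizer must handle, the sketch below enumerates every legal phrase in a toy command grammar. The vocabulary and phrase structure are invented purely for illustration; they are not drawn from any system described in this paper.

```python
from itertools import product

# A toy command grammar: each phrase is ACTION + OBJECT [+ VALUE].
# All words here are hypothetical, chosen only for illustration.
ACTIONS = ["select", "display", "update"]
OBJECTS = ["radio", "map", "status"]
VALUES = ["one", "two", "three"]

def legal_phrases():
    """Enumerate every word string the grammar permits."""
    phrases = [f"{a} {o}" for a, o in product(ACTIONS, OBJECTS)]
    phrases += [f"{a} {o} {v}" for a, o, v in product(ACTIONS, OBJECTS, VALUES)]
    return phrases

def in_grammar(hypothesis):
    """A recognizer constrained to this task accepts only these strings."""
    return hypothesis in set(legal_phrases())

print(len(legal_phrases()))          # → 36 (9 two-word + 27 three-word phrases)
print(in_grammar("select radio two"))  # → True
print(in_grammar("launch missile"))    # → False
```

Discriminating among a few dozen closed alternatives is a far easier problem than open-vocabulary recognition, which is why constrained military task domains can often be served by technology that would fail on unrestricted speech.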
SUMMARY OF VISITS AND CONTACTS
The broad range of military and government organizations that were contacted is shown in Figure 1. There was broad-based interest in speech recognition technology across all these organizations. The range of interests was also deep, in the sense that most organizations were interested in applications over a range of technical difficulties, including some applications that today’s state-of-the-art technology could support and others that would require major research advances. Also, many organizations had tested speech recognition systems in prototype applications but had not integrated them into operational systems, generally because of a perception that “the technology wasn’t ready yet.” But major speech recognition tests, such as the Air Force’s F-16 fighter tests (Howard, 1987) and the Army’s helicopter tests (Holden, 1989), were conducted a number of years ago. In general, tests such as these have not been performed with systems that approach today’s state-of-the-art recognition technology (Weinstein, 1991).
ARMY APPLICATIONS
The Army visits and contacts (see Figure 1) pointed out many applications of human-machine communication by voice, of which three will be highlighted here: (1) Command and Control on the Move (C2OTM); (2) the Soldier’s Computer; and (3) voice control of radios and other auxiliary systems in Army helicopters. In fact, the Army recognizes that applications for voice-actuated user interfaces pervade its engineering development programs (E. Mettala, DARPA, unpublished presentation, Feb. 1992).
In Desert Storm the allied troops moved farther and faster than troops in any other war in history, and extraordinary efforts were needed to make command and control resources keep pace with the troops. C2OTM is an Army program aimed at ensuring the mobility of command and control for potential future needs. Figure 2 illustrates some of the mobile force elements requiring C2OTM and some of the potential applications for speech-based systems. Typing is often a very poor input medium for mobile users, whose eyes and hands are busy with pressing tasks. Referring to Figure 2, a foot
soldier acting as a forward observer could use speech recognition to enter a stylized report that would be transmitted to command and control headquarters over a very low-rate, jam-resistant channel. Repair and maintenance in the field can be facilitated by voice access to repair information and helmet-mounted displays to show the information. In a mobile command and control vehicle, commanders need convenient access to battlefield information and convenient means for entering and updating plans. Integrated multimodal input/output (voice, text, pen, pointing, graphics) will facilitate meeting these requirements. Other applications suggested in Figure 2 include simple voice translation (e.g., of forward observer reports), access to battlefield situation information, and weapons system selection.
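The stylized report idea can be made concrete: because every field of such a report is drawn from a small closed vocabulary, a recognized report can be packed into a handful of bits for transmission over a very low-rate, jam-resistant channel. The field names and vocabularies below are hypothetical, chosen only to illustrate the encoding.

```python
import math

# Hypothetical closed vocabularies for a stylized observer report.
FIELDS = {
    "unit_type": ["infantry", "armor", "artillery"],
    "count":     ["squad", "platoon", "company", "battalion"],
    "activity":  ["moving", "stationary", "digging in"],
    "direction": ["north", "south", "east", "west"],
}

def encode(report):
    """Pack the report's field indices into a single small integer."""
    code = 0
    for name, vocab in FIELDS.items():
        code = code * len(vocab) + vocab.index(report[name])
    return code

def decode(code):
    """Invert encode(): unpack indices in reverse field order."""
    report = {}
    for name, vocab in reversed(list(FIELDS.items())):
        code, idx = divmod(code, len(vocab))
        report[name] = vocab[idx]
    return report

report = {"unit_type": "armor", "count": "platoon",
          "activity": "moving", "direction": "east"}
code = encode(report)
bits = math.ceil(math.log2(3 * 4 * 3 * 4))  # 144 combinations fit in 8 bits
print(decode(code) == report)  # → True
```

Here a four-field report fits in a single byte, which suggests why stylized, recognizer-friendly reports pair naturally with low-rate, jam-resistant links.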
The Soldier’s Computer is an Army Communications and Electronics Command (CECOM) program that responds to the information needs of the modern soldier. The overall system concept is shown in Figure 3. Voice will be a crucial input mode, since carrying and using a keyboard would be very inconvenient for the foot soldier. Functions of the Soldier’s Computer are similar to those mentioned above for C2OTM. Technical issues include robust speech recognition in noise and smooth integration of the various input/output modes. The technology for both the Soldier’s Computer and C2OTM has many dual-use, peacetime applications, both for everyday use and in crises such as fires or earthquakes.
Speech recognition for control of radios and other devices in Army helicopters is an application that has been addressed in test and evaluation programs by the Army Avionics Research and Development Activity (AVRADA), as well as by groups in the United Kingdom and France. Feasibility has been demonstrated, but operational use has not been established. The Army AVRADA people I met described a tragic helicopter collision in which the fact that both pilots were tuning radios may have been the major cause of the crash. Although voice control was considered to be a viable solution, it was not established as a requirement (and therefore not implemented) because of the Army’s view that speaker-independent recognition was necessary and was not yet sufficiently robust. But the state of
the art of speaker-independent recognition, particularly for small vocabularies, has advanced a great deal and is now likely to be capable of meeting the needs for control of radios and similar equipment in a military helicopter.
NAVY APPLICATIONS
My Navy visits and contacts uncovered a wide range of important applications of speech technology, with support at very high levels in the Navy. Applications outlined here will be (1) aircraft carrier flight deck control and information management, (2) SONAR supervisor command and control, and (3) combat team tactical training.
The goal in the carrier flight deck control application is to provide speech recognition for updates to aircraft launch, recovery, weapon status, and maintenance information. At the request of Vice-Admiral Jerry O. Tuttle (Assistant Chief of Naval Operations for Space and Electronic Warfare), the Naval Ocean Systems Center (NOSC)1 undertook to develop a demonstration system on board the USS Ranger. Recognition requirements included open microphone; robust, noise-resistant recognition with out-of-vocabulary word rejection; and easy integration into the PC-based onboard system. An extremely successful laboratory demonstration, using a commercially available recognizer, was performed at NOSC for Admiral Tuttle in November 1991. Subsequent tests on board the Ranger in February 1992 identified a number of problems and needed enhancements in the overall human-machine interface system, but correction of these problems seemed to be well within the current state of the art.
The SONAR supervisor on board a surface ship needs to control displays, direct resources, and send messages while moving about the command center and looking at command and control displays. This situation creates an opportunity for application of human-machine communication by voice, and the Naval Underwater Systems Center (NUSC) has sponsored development of a system demonstrating voice activation of command and control displays at a land-based integrated test site in New London, Connecticut. The system would be used first for training of SONAR supervisors at the test site and later for shipboard applications. Initial tests with ex-SONAR supervisors were promising, but the supervisors expressed dissatisfaction at having to train the speaker-dependent recognizer that was used.

1The Naval Ocean Systems Center has subsequently reorganized as the Naval Research and Development Organization.
A scenario for application of speech-and-language-based technology to Navy combat team tactical training, based on a proposal by the Navy Personnel Research and Development Center, is illustrated in Figure 4. The training scenario includes a mix of real forces and force elements simulated by using advanced simulation technology. Personnel in the Combat Information Center (either at sea or in a land-based test environment) must respond to developing combat scenarios using voice, typing, trackballs, and other modes and must communicate both with machines and with each other. As suggested in the figure, speech-based and language-based technology, and fusion of language with multiple data sources, can be used to correlate and analyze the data from a combat training exercise, to allow rapid feedback (e.g., what went wrong?) for debriefing, training, and replanning. These language-based technologies, first developed and applied in training applications where risk is not a major issue, can
later be extended to operational applications, including detection of problems and alerting users, and also to development of improved human-machine interfaces in the Combat Information Center.
The approach of first developing and using a system with human-machine communication by voice in a training application and then extending to an operational application is a very important general theme. The training application is both useful in itself and provides essential data (including, for example, language models and speech data characterizing the human-machine interaction) for developing a successful operational application.
AIR FORCE APPLICATIONS
The Air Force continues its long-term interest in speech input/output for the cockpit and has proposed to include human-machine communication by voice in the future Multi-Role Fighter. Fighter cockpit applications, ranging from voice control of radio frequency settings to an intelligent Pilot’s Associate system, have been discussed elsewhere (Weinstein, 1991; Howard, 1987) and will not be detailed further here. However, it is likely that the kinds of applications that were tested in the AFTI/F-16 program, with promising results but not complete success, would be much more successful with today’s robust speech recognition technology. Voice control of radio frequencies, displays, and gauges could have a significant effect on mission effectiveness and safety. A somewhat more advanced but technically feasible application is the use of voice recognition for entering reconnaissance reports; such a system is currently under development at the Defence Research Agency in the United Kingdom (Russell et al., 1990). Other potential Air Force applications include human-machine voice communication in airborne command posts, similar to Army and Navy command and control applications. In particular, entry of data and log information by voice could provide significant workload reduction in a large variety of command and control center operations.
AIR TRAFFIC CONTROL APPLICATIONS
The air traffic controller is taught to use constrained phraseology to communicate with pilots. This provides an opportunity, which is currently being exploited at the Federal Aviation Administration (FAA) Academy in Oklahoma City, at the Naval Air Technical Training Center in Orlando, Florida, and elsewhere, to apply speech recognition and synthesis to emulate pseudo-pilots in the training of air traffic
controllers. This application, illustrated in Figure 5, is an excellent example of military and government application of human-machine communication by voice that is currently in regular use. Advances in speech and language technology will extend the range and effectiveness of these training applications (Weinstein, 1991). As in the Naval Combat Team Tactical training application, speech recognition technology and data fusion could be used to automate training session analysis and to provide rapid feedback to trainees.
A number of automation aid applications in air traffic control are also possible via speech technology, as indicated in Figure 5. Again, the experience can be used to help build operational automation applications. An application of high current interest is on-line recognition of flight identification information from a controller’s speech to quickly access information on that flight (Austin et al., 1992). More advanced potential applications include processing and fusion of multimodal data to evaluate the effectiveness of new automation aids for air traffic control and gisting (Rohlicek et al., 1992) of pilot/controller communications to detect potential air space conflicts.
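The flight-identification application can be sketched as a slot-extraction step over the controller’s constrained phraseology. The airline names and spoken-digit vocabulary below are illustrative placeholders, not the actual system of Austin et al.

```python
# Spoken digits as they appear in ATC phraseology (illustrative subset).
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "niner": "9"}
AIRLINES = {"united": "UAL", "delta": "DAL"}  # hypothetical subset

def flight_id(transcript):
    """Extract an 'airline + spoken digits' flight ID from a recognized transcript."""
    words = transcript.lower().split()
    for i, w in enumerate(words):
        if w in AIRLINES:
            digits = ""
            for d in words[i + 1:]:
                if d in DIGITS:
                    digits += DIGITS[d]
                else:
                    break  # digit string ends at the first non-digit word
            if digits:
                return AIRLINES[w] + digits
    return None

print(flight_id("united three four five descend and maintain eight thousand"))
# → UAL345
```

Because controllers are trained to use fixed phraseology, even a simple pattern of this kind covers much of the utterance stream; a deployed system would of course operate on recognizer output with associated confidence scores.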
LAW ENFORCEMENT APPLICATIONS
Discussions with Federal Bureau of Investigation (FBI) personnel revealed numerous potential applications of speech and language technology in criminal investigations and law enforcement. For example, the Agent’s Computer is envisioned as a portable device, with some similarity to the Soldier’s Computer but specialized to the agent’s needs. Functions of particular interest to agents include (1) voice check-in, (2) data or report entry, (3) rapid access to license plate or description-based data, (4) covert communication, (5) rapid access to map and direction information, and (6) simple translation of words or phrases. Fast access to, and fusion of, multimedia data, some language based and some image based (e.g., fingerprints and photos), was identified as a major need for aiding investigations. Voice-controlled database access could be used to facilitate this data access. As with the Navy and FAA training applications mentioned above, the FBI had high interest in training using simulation in combination with language-based technology, for both mission execution and mission diagnosis. Criminal investigations put a major burden on agents in terms of reporting and documentation; the use of human-machine communication by voice to rapidly prepare reports, ranging from structured forms to free text, was identified as an application of major interest to agents.
SUMMARY OF USERS AND APPLICATIONS
The matrix shown in Figure 6 relates the classes of applications that have been described to the interests of the various military and government users. All the applications have dual use in the civilian area. Looking across the rows, it is evident that all the users have interest in a wide range of applications with varying technical difficulty. In fact, when shown this matrix, potential users generally wanted to fill in all the boxes in their rows. The most pervasive near-term application is voice data entry, which can range from entering numerical data, to creating formatted military messages, to free-form report entry. Current speech recognition technology is capable of performing these functions usefully in a number of military environments, particularly for reducing operator workload in command and control centers.
A key conclusion of this study is that there is now a great opportunity for military and government applications of human-machine
communication by voice, which will have real impact both on the users and on the development of the technology. This opportunity is due both to technical advances and to very high user interest; there has been a big increase in user interest just within the past few years (i.e., since the study reported in Weinstein, 1991).
The strategy of the technologists should be to select and push applications with a range of technical challenges, so that meaningful results can be shown soon, while researchers continue to advance the technology to address the harder problems. It is essential that technologists work with the users to narrow the gap between the user and the state of the art. Too often, users have tested speech recognition systems that are off the shelf but well behind the state of the art, and have been discouraged by the results.
With today’s software-based recognition technology, and with the increased computing power in PCs, workstations, and digital signal-processing chips, it is now possible to develop and test applications with recognition algorithms that run in real time, in software, on commercially available general-purpose processors and that perform very close to the state of the art. Technologists must work with users
to understand the user requirements and provide appropriate technology. For effective technology transfer, software and hardware must be portable and adaptable to new domains or to unforeseen variations in the user’s needs. Eventually the user should be able to take over and continue adapting the technology to the changing needs, with little support from the technologists. Meanwhile, the technologists, having learned from each generation of operational applications, can be working to develop the research advances that will enable the next generation of operational applications.
ACKNOWLEDGMENTS
I would like to acknowledge the contributions to this study of the following individuals: Victor Zue (MIT), Allen Sears (MITRE), Janet Baker (Dragon Systems), Charles Wayne (DARPA), Erik Mettala (DARPA), George Doddington (DARPA), Deborah Dahl (Paramax), David Ruppe (Army), Jim Schoening (Army), Christine Dean (Navy), Steve Nunn (Navy), Walter Rudolph (Navy), Jim Cupples (Air Force), Tim Anderson (Air Force), Dave Williamson (Air Force), Joe Kielman (FBI), John Hoyt (FBI), and Peter Sielman (Analysis and Technology, Inc.). A special acknowledgment goes to Victor Zue for many helpful discussions and contributions.
REFERENCES
Austin, S., et al. 1992. BBN Real-Time Speech Recognition Demonstrations. In Proceedings of the February 1992 DARPA Speech and Natural Language Workshop. Morgan Kaufmann Publishers, pp. 250-251.
Beek, B., E. P. Neuburg, and D. C. Hodge. 1977. An Assessment of the Technology of Automatic Speech Recognition for Military Applications. IEEE Trans. Acoust., Speech, Signal Process., ASSP-25:310-321.
Cupples, E. J., and B. Beek. 1990. Applications of Audio/Speech Recognition for Military Applications. In Proceedings of the NATO/AGARD Lecture Series No. 170, Speech Analysis and Synthesis and Man-Machine Speech Communications for Air Operations, pp. 8-1-8-10.
Flanagan, J. L., et al. 1984. Automatic Speech Recognition in Severe Environments. National Research Council, Committee on Computerized Speech Recognition Technologies. Washington, D.C.: National Academy Press.
Holden, J. M. 1989. Testing Voice in Helicopter Cockpits. In Proceedings of the American Voice Input/Output Society (AVIOS) Conference, Sept.
Howard, J. A. 1987. Flight Testing of the AFTI/F-16 Voice Interactive Avionics System. In Proceedings of the Military Speech Tech 1987. Arlington, Va.: Media Dimensions, pp. 76-82.
Makhoul, J., T. H. Crystal, D. M. Green, D. Hogan, R. J. McAulay, D. B. Pisoni, R. D. Sorkin, and T. G. Stockham, Jr. 1989. Removal of Noise from Noise-Degraded
Speech Signals. National Research Council, Committee on Hearing, Bioacoustics, and Biomechanics. Washington, D.C.: National Academy Press.
Proceedings of the February 1992 DARPA Speech and Natural Language Workshop. 1992. Morgan Kaufmann Publishers.
Proceedings of the NATO AGARD Lecture Series No. 170, Speech Analysis and Synthesis and Man-Machine Speech Communications for Air Operations, 1990.
Rohlicek, J. R., et al. 1992. Gisting Conversational Speech. In Proceedings of the ICASSP’92. San Francisco, pp. II-113–II-116.
Russell, M. J., et al. 1990. The ARM Continuous Speech Recognition System. In Proceedings of the ICASSP’90. Albuquerque, N.Mex., April.
Weinstein, C. J. 1991. Opportunities for Advanced Speech Processing in Military Computer-Based Systems. Proc. IEEE, 79(11):1626-1641.
Woodard, J. P., and E. J. Cupples. 1983. Selected Military Applications of Automatic Speech Recognition. IEEE Commun. Mag., 21(9):35-44.