The OASIS Speech Recognition System for Dictating Medical Reports

András Kocsor*, András Bánhalmi*, Dénes Paczolay*, János Csirik*, László Pávics+

* Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged
+ University of Szeged, Medical Faculty, Department of Nuclear Medicine

Owing to the remarkable recent progress in speech recognition technology, the medical community has turned its attention to speech-based clinical systems. The acceptance of such a system depends on the speed of recognition, the quality of the medical linguistic knowledge applied, the recognition accuracy, and the degree of speaker independence achieved. The more specific the linguistic environment of the speech recognizer, the better the results. For this reason, developers of speech technology currently regard medical applications as especially suitable target applications. For widely spoken languages such systems already exist, but for smaller languages with specific linguistic features, few software packages for dictating medical reports have been developed so far. We developed a general speech recognition core module for the Hungarian language (OASIS), intended for dictating medical reports in nuclear medicine and radiology. The core module consists of a so-called acoustic model, which is capable of recognizing the phoneme set of the Hungarian language and modeling it representatively. The module was built using a large speech corpus. So far, we have built a language module for dictating thyroid scintigraphy reports on the basis of 9231 written thyroid medical reports. The reports contain over 2500 words and 11000 different word pairs. We modeled the sentences of the reports with a rule system consisting of several hundred rules. Together, the core module and the language module provided the basis for an easy-to-use, Windows-based software package for dictating thyroid gland medical reports. The general speech technology features of the software are: speaker-independent speech recognition, optional speaker adaptation, interchangeable linguistic/grammatical models, language adaptation, and memorization of individual phrases.
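As a rough illustration of how corpus figures such as the number of words and of different word pairs can be obtained from a collection of written reports, the following Python sketch counts distinct words and distinct adjacent word pairs. It is not the OASIS implementation: the tokenization rule, the sentence splitting on punctuation, the function names and the toy input strings are all assumptions made for this example, and it assumes that the word figure refers to distinct words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def corpus_statistics(reports):
    """Count words and adjacent word pairs over a list of report strings.

    Pairs are formed only inside a sentence; sentences are split on
    '.', '!' and '?' (an assumption made for this sketch).
    """
    words = Counter()
    pairs = Counter()
    for report in reports:
        for sentence in re.split(r"[.!?]+", report):
            tokens = tokenize(sentence)
            words.update(tokens)
            pairs.update(zip(tokens, tokens[1:]))
    return words, pairs


# Toy placeholder input, not real medical reports.
toy_reports = ["A b c. A b d.", "b c"]
words, pairs = corpus_statistics(toy_reports)
print(len(words), "distinct words;", len(pairs), "distinct word pairs")
```

Pair counts of this kind are the usual raw material for a bigram-style language model; the rule system described above is a separate, hand-built component.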
The adaptation capability and the recognition performance of the system were tested on a speech test corpus consisting of utterances from five speakers. First, each speaker read a one-page text, which was used to adapt the acoustic model. After the adaptation phase, the improvement in recognition performance was measured; recognition was over 90% correct.
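The abstract does not state how the percentage of correct recognition was computed. One common convention, shown here purely as an assumption, is word-level accuracy based on the edit distance between the reference text and the recognizer output. The function names and the toy strings below are hypothetical.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words).

    The distance counts substitutions, deletions and insertions
    (Levenshtein distance computed over word tokens).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(utterances):
    """Pooled word accuracy over (reference, hypothesis) pairs."""
    errors = 0
    total = 0
    for reference, hypothesis in utterances:
        e, n = word_errors(reference, hypothesis)
        errors += e
        total += n
    return 1.0 - errors / total


# Toy placeholder utterances, not data from the evaluation.
toy = [("a b c d", "a x c"), ("e f", "e f")]
print(f"word accuracy: {word_accuracy(toy):.3f}")
```

Computing this measure per speaker before and after adaptation would give one way to quantify the improvement attributable to the adaptation phase.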