The OASIS Speech Recognition System for
Dictating Medical Reports
András Kocsor*, András Bánhalmi*, Dénes Paczolay*,
János Csirik*, László Pávics+
* Research Group on Artificial Intelligence of the Hungarian Academy of
Sciences and University of Szeged
+ University of Szeged, Medical Faculty, Department of Nuclear Medicine
Owing to the remarkable recent development of speech recognition
technologies, the attention of the medical community has been drawn to
speech-based clinical systems. The acceptance of such a system depends
on the speed of recognition, the quality of the applied medical
linguistic knowledge, the recognition accuracy, and the degree of
speaker independence achieved. The more specific the speech recognizer’s
linguistic environment is, the better the achieved results are. At the
present state of research, developers of speech technologies have
therefore found medical applications especially suitable as target
applications. For widely used languages such systems already exist, but
for smaller languages with specific linguistic features, few software
packages for dictating medical reports have been developed
so far. We developed a general speech recognition
core module for the Hungarian language (OASIS), intended for dictating
medical reports in nuclear medicine and radiology. The core module
consists of a so-called acoustic model, which is capable of recognizing
and representatively modeling the phoneme set of the Hungarian language.
The module was built from a large speech corpus. Currently, we have
built a language module for dictating thyroid scintigraphy reports on
the basis of 9231 written thyroid medical reports. The reports contain
over 2500 words and 11000 different word pairs. We modeled the sentences
of the reports with a rule system consisting of several hundred rules.
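
The corpus figures quoted above (words and different word pairs) can be
illustrated with a minimal sketch. It assumes that the counts refer to
distinct words and distinct adjacent word pairs, and that tokenization is
plain lowercasing and whitespace splitting; neither detail is stated in
the abstract. The function name and the two sample sentences are
hypothetical and are not taken from the report corpus.

```python
from collections import Counter


def corpus_statistics(reports):
    """Count words and adjacent word pairs over a list of report strings.

    Hypothetical helper: tokenization by lowercasing and whitespace
    splitting is an assumption, not the method used for OASIS.
    """
    vocabulary = Counter()
    word_pairs = Counter()
    for report in reports:
        words = report.lower().split()
        vocabulary.update(words)
        # Pair each word with its successor inside the same report.
        word_pairs.update(zip(words, words[1:]))
    return vocabulary, word_pairs


# Invented toy data, used only to show the shape of the result.
sample_reports = [
    "the thyroid gland is enlarged",
    "the thyroid gland is normal",
]
vocabulary, word_pairs = corpus_statistics(sample_reports)
print(len(vocabulary), "distinct words")
print(len(word_pairs), "distinct word pairs")
```

The lengths of the two counters correspond to the kind of vocabulary and
word-pair totals reported for the thyroid corpus.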
The core and the language module together provided the basis for the
development of an easy-to-use Windows-based software package for
dictating thyroid gland medical reports. The
general speech technology features of the software are: speaker-independent
speech recognition, an option for speaker adaptation, an interchangeable
linguistic/grammatical model, language adaptation, and memorization of
individual phrases. The
adaptation capability and the recognition performance of the system
were tested using a speech test corpus consisting of utterances from
five persons. First, each person spoke a one-page text, which was used
to adapt the acoustic model. After the adaptation phase, the
improvement in recognition performance was measured, resulting in
over 90% correct recognition.
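
The abstract does not state how correct recognition was scored. One
common choice, shown here only as a sketch under that assumption, is
word-level accuracy: one minus the word edit distance between the
reference text and the recognizer output, divided by the number of
reference words. The function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two word lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_accuracy(reference_text, recognized_text):
    """Word accuracy = 1 - (word edit distance / reference length).

    Assumed metric; whitespace tokenization is also an assumption.
    """
    reference = reference_text.split()
    hypothesis = recognized_text.split()
    if not reference:
        raise ValueError("reference text must contain at least one word")
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)
```

With this definition, a five-word reference recognized with one
substituted word scores 0.8, and the value can become negative when the
output contains many inserted words.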