The magyarlanc toolkit performs the basic linguistic processing of Hungarian texts. It consists solely of Java modules, with no wrappers for other programming languages. This makes it platform independent and easy to integrate into larger systems (e.g. web servers).
The modules of the toolkit are:
The UIMA (Unstructured Information Management Architecture) framework supports the development of software architectures that process large amounts of unstructured data. Apache UIMA is an open-source implementation of the UIMA specification and is especially well suited to processing textual documents.
The UIMA framework is platform independent and relies on standard solutions wherever possible. Its main goals are to make each processing module easy to integrate into processing chains ("just download and use") and to let users choose the most appropriate component, since components that fulfil the same role are interchangeable.
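To illustrate what "interchangeable components" means in practice, here is a minimal plain-Java sketch. It does not use the UIMA API, and the names `Tokenizer`, `WhitespaceTokenizer` and `PunctuationAwareTokenizer` are hypothetical. Both classes fulfil the same role through one shared interface, so the client method `countTokens` works unchanged with either of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterchangeableTokenizers {

    // Hypothetical role interface: every tokenizer maps a text to a list of tokens.
    interface Tokenizer {
        List<String> tokenize(String text);
    }

    // Simplest implementation: split on runs of whitespace.
    static class WhitespaceTokenizer implements Tokenizer {
        public List<String> tokenize(String text) {
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                return new ArrayList<>();
            }
            return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
        }
    }

    // Alternative implementation: also separates trailing punctuation marks.
    static class PunctuationAwareTokenizer implements Tokenizer {
        public List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String chunk : new WhitespaceTokenizer().tokenize(text)) {
                int end = chunk.length();
                List<String> trailing = new ArrayList<>();
                // Peel punctuation characters off the end of the chunk, keeping their order.
                while (end > 0 && ".,!?;:".indexOf(chunk.charAt(end - 1)) >= 0) {
                    trailing.add(0, String.valueOf(chunk.charAt(end - 1)));
                    end--;
                }
                if (end > 0) {
                    tokens.add(chunk.substring(0, end));
                }
                tokens.addAll(trailing);
            }
            return tokens;
        }
    }

    // Client code depends only on the interface, so either implementation can be plugged in.
    static int countTokens(Tokenizer tokenizer, String text) {
        return tokenizer.tokenize(text).size();
    }

    public static void main(String[] args) {
        String text = "Szeged is a city in Hungary.";
        System.out.println(countTokens(new WhitespaceTokenizer(), text));       // 6
        System.out.println(countTokens(new PunctuationAwareTokenizer(), text)); // 7
    }
}
```

Swapping one component for another only changes the constructor call. The code that uses the component stays the same.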
The framework makes it possible to divide a complex problem into smaller subproblems, such as sentence splitting, tokenization and named entity recognition. Each processing unit implements a specific interface (in Java or C++). The framework supervises how the processing chain is built and run, and it is also responsible for the data flow between units, for measuring the performance of the system, and so on.
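The following sketch shows how such a decomposition might look in plain Java. It is illustrative only and not the Apache UIMA API. `Document`, `ProcessingUnit` and `Pipeline` are hypothetical names that loosely mirror the roles of UIMA's annotators and its shared analysis structure (the CAS). Each unit reads the results of earlier units from one shared `Document`. The pipeline runs the units in order and records the wall-clock time spent in each one. The sentence splitter and tokenizer are deliberately naive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MiniPipeline {

    // Hypothetical shared container that all units read from and write to.
    static class Document {
        final String text;
        final List<String> sentences = new ArrayList<>();
        final List<List<String>> tokens = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }
    }

    // Hypothetical interface implemented by every processing unit.
    interface ProcessingUnit {
        String name();
        void process(Document doc);
    }

    // Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace.
    static class SentenceSplitter implements ProcessingUnit {
        public String name() { return "sentence-splitter"; }
        public void process(Document doc) {
            for (String s : doc.text.trim().split("(?<=[.!?])\\s+")) {
                if (!s.isEmpty()) doc.sentences.add(s);
            }
        }
    }

    // Naive tokenizer: consumes the splitter's output and splits each sentence on whitespace.
    static class WhitespaceTokenizer implements ProcessingUnit {
        public String name() { return "tokenizer"; }
        public void process(Document doc) {
            for (String sentence : doc.sentences) {
                doc.tokens.add(List.of(sentence.split("\\s+")));
            }
        }
    }

    // Runs the units in order over one shared Document and times each unit.
    static class Pipeline {
        private final List<ProcessingUnit> units = new ArrayList<>();
        final Map<String, Long> nanosPerUnit = new LinkedHashMap<>();

        Pipeline add(ProcessingUnit unit) {
            units.add(unit);
            return this;
        }

        Document run(String text) {
            Document doc = new Document(text);
            for (ProcessingUnit unit : units) {
                long start = System.nanoTime();
                unit.process(doc);
                nanosPerUnit.merge(unit.name(), System.nanoTime() - start, Long::sum);
            }
            return doc;
        }
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline()
                .add(new SentenceSplitter())
                .add(new WhitespaceTokenizer());
        Document doc = pipeline.run("Szeged is in Hungary. It has a university.");
        System.out.println(doc.sentences);     // [Szeged is in Hungary., It has a university.]
        System.out.println(doc.tokens.get(1)); // [It, has, a, university.]
        System.out.println(pipeline.nanosPerUnit.keySet()); // [sentence-splitter, tokenizer]
    }
}
```

The order of the units matters here, because the tokenizer expects the sentence splitter to have filled `doc.sentences`. The pipeline owns this ordering, the data hand-over and the timing, so the individual units stay small and replaceable.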
The toolkit can be used free of charge under the Creative Commons Attribution-ShareAlike licence.
Zsibrita, János; Nagy, István; Farkas, Richárd 2009: Magyar nyelvi elemző modulok az UIMA keretrendszerhez [Hungarian language processing modules for the UIMA framework]. In: Tanács, Attila; Szauter, Dóra; Vincze, Veronika (eds.): VI. Magyar Számítógépes Nyelvészeti Konferencia [6th Hungarian Conference on Computational Linguistics]. Szeged: Szegedi Tudományegyetem, pp. 394-395.
Zsibrita, János; Vincze, Veronika; Farkas, Richárd 2010: Ismeretlen kifejezések és a szófaji egyértelműsítés [Unknown expressions and part-of-speech disambiguation]. In: Tanács, Attila; Vincze, Veronika (eds.): VII. Magyar Számítógépes Nyelvészeti Konferencia [7th Hungarian Conference on Computational Linguistics]. Szeged: Szegedi Tudományegyetem, pp. 275-283.
For further information, please contact Richárd Farkas (rfarkas AT inf.u-szeged.hu).