Research

  • Multiword expressions
  • Uncertainty and negation detection
  • Morphological and syntactic parsing
  • Corpus building
  • Ontologies

Multiword expressions

Multiword expressions are lexical units that consist of two or more words (tokens) yet exhibit special syntactic, semantic, pragmatic or statistical features. From an NLP point of view, their treatment poses two problems. On the one hand, the system should recognize that they count as one lexical unit (rather than two or more connected words), so it is advisable to store them as one unit in the lexicon. On the other hand, special rules for their treatment should also be included in the system.
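As an illustration of storing multiword expressions as single lexicon entries, the following minimal Python sketch merges tokens by greedy longest match. The lexicon contents, the category labels and the function name merge_mwes are hypothetical and serve only as an example; a real system would also need rules for inflected and discontinuous expressions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: each multiword expression is stored as ONE entry,
# keyed by its lowercased token sequence. Entries and labels are illustrative.
MWE_LEXICON: Dict[Tuple[str, ...], str] = {
    ("take", "a", "decision"): "light_verb_construction",
    ("new", "york"): "named_entity",
    ("kick", "the", "bucket"): "idiom",
}
MAX_LEN = max(len(key) for key in MWE_LEXICON)


def merge_mwes(tokens: List[str]) -> List[Tuple[str, str]]:
    """Greedy longest match: join tokens that form a lexicon entry into one unit."""
    units: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for length in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + length])
            if candidate in MWE_LEXICON:
                units.append((" ".join(tokens[i:i + length]), MWE_LEXICON[candidate]))
                i += length
                break
        else:
            # No multiword entry starts here: keep the token as a single word.
            units.append((tokens[i], "word"))
            i += 1
    return units


if __name__ == "__main__":
    print(merge_mwes("They moved to New York".split()))
```

Matching the longest span first keeps a shorter entry from splitting a longer expression that contains it.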

Uncertainty and negation detection

In information extraction and retrieval, it is highly important to distinguish uncertain and/or negated propositions from factual information. In most cases, the user needs factual information, so uncertain or negated propositions should be treated in a special way. Depending on the exact task, the system should either discard such texts or separate them from factual information, so that the user can decide later whether they are needed.
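The separation described above can be sketched with a simple cue-word filter in Python. The cue lists and the function names label_sentence and separate are assumptions made for this example. The sketch only checks whether a cue is present; it does not determine the scope of the cue or disambiguate cues in context, both of which a real detector would need.

```python
import re
from typing import Dict, List

# Illustrative cue lists (assumptions, far from complete).
NEGATION_CUES = {"not", "no", "never", "without"}
UNCERTAINTY_CUES = {"may", "might", "possibly", "probably", "suggest", "suggests"}


def label_sentence(sentence: str) -> str:
    """Label a sentence as 'negated', 'uncertain' or 'factual' by cue presence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    if words & NEGATION_CUES:
        return "negated"
    if words & UNCERTAINTY_CUES:
        return "uncertain"
    return "factual"


def separate(sentences: List[str]) -> Dict[str, List[str]]:
    """Keep non-factual sentences apart so the user can still inspect them."""
    groups: Dict[str, List[str]] = {"factual": [], "uncertain": [], "negated": []}
    for sentence in sentences:
        groups[label_sentence(sentence)].append(sentence)
    return groups


if __name__ == "__main__":
    for label, items in separate([
        "The patient has pneumonia.",
        "The patient may have pneumonia.",
        "The patient does not have pneumonia.",
    ]).items():
        print(label, items)
```

Returning all three groups instead of dropping the non-factual ones corresponds to the second option in the text: the user decides later whether the uncertain or negated sentences are needed.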

Morphological and syntactic parsing

For higher-level language technology research and development in Hungarian, it is essential to have a basic language resource kit for segmentation, morphological and syntactic parsing, and POS tagging of texts. To unify the available tools, we harmonized the MSD and KR coding systems and integrated the morphological parser based on this new coding system into our toolchain, magyarlanc, which is going to be extended with the dependency parser currently under development.

Corpus building

In order to develop algorithms for NLP problems, there is an immense need for domain- or task-specific annotated corpora (databases). Thus, building corpora is an essential part of creating NLP applications.

Some corpora I helped to build:

Ontologies

Ontologies are typically large hierarchical datasets in which words and their relations are stored. They can contribute to the performance of several NLP applications; for instance, hypernymy and hyponymy relations can be usefully exploited in information extraction and retrieval.
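To illustrate how hypernymy and hyponymy relations might be exploited, the Python sketch below stores a miniature hierarchy and walks it in both directions. The hierarchy and the function names hypernym_chain and hyponyms are hypothetical. Real wordnets link synsets rather than bare words and allow multiple hypernyms per concept.

```python
from typing import Dict, List, Set

# Hypothetical miniature hierarchy: each word maps to its direct hypernym.
HYPERNYM_OF: Dict[str, str] = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}


def hypernym_chain(word: str) -> List[str]:
    """Return the hypernyms of `word`, from the most specific to the most general."""
    chain: List[str] = []
    seen = {word}
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        if word in seen:  # guard against accidental cycles in the data
            break
        seen.add(word)
        chain.append(word)
    return chain


def hyponyms(word: str) -> Set[str]:
    """Return every word that has `word` somewhere in its hypernym chain."""
    return {w for w in HYPERNYM_OF if word in hypernym_chain(w)}


if __name__ == "__main__":
    print(hypernym_chain("poodle"))
    print(sorted(hyponyms("mammal")))
```

In a retrieval setting, a query for "mammal" could be expanded with its hyponyms so that documents mentioning only "dog" or "cat" are also found.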

Some ontologies I helped to build:

  • Hungarian WordNet
  • Hungarian financial domain ontology
  • Hungarian customs law wordnet (TaXWN)

Related publications

  • Vincze, Veronika; Almási, Attila 2014: Non-Lexicalized Concepts in Wordnets: A Case Study of English and Hungarian. In: Proceedings of the 7th International Global WordNet Conference, pp. 118-126.
  • Vincze, Veronika; Almási, Attila; Csirik, János 2012: Multiword Verbs in WordNets. In: Proceedings of the 6th International Global WordNet Conference, pp. 377-381.
  • Alexin, Zoltán; Csirik, János; Almási, Attila; Vincze, Veronika 2010: Domain Specific Wordnet on Customs Law. In: Proceedings of the Fifth Global WordNet Conference, GWC2010, January 31-February 4 2010, Mumbai, India, pp. 234-239.
  • Vincze, Veronika; Almási, Attila; Szauter, Dóra 2008: Comparing WordNet Relations to Lexical Functions. In: Tanács, Attila; Csendes, Dóra; Vincze, Veronika; Fellbaum, Christiane; Vossen, Piek (eds.): Proceedings of the Fourth Global WordNet Conference. GWC 2008. Szeged, University of Szeged, Department of Informatics, pp. 462-473.
  • Vincze, Veronika; Szarvas, György; Csirik, János 2008: Why are wordnets important? In: Cepisca, Costin; Kouzaev, Guennadi A.; Mastorakis, Nikos M. (eds.): New Aspects on Computing Research. Proceedings of the 2nd European Computing Conference (ECC'08), WSEAS Press, pp. 316-322.