Acta Cybernetica, Volume 19, Number 2, 2009 (Institute of Informatics, University of Szeged)

Statistical Language Models within the Algebra of Weighted Rational Languages

  Thomas Hanneforth and Kay-Michael Würzner


Abstract (in LaTeX format)

  Statistical language models are an important tool in natural language processing. They represent prior knowledge about a certain language which is usually gained from a set of samples called a \emph{corpus}. In this paper, we present a novel way of creating $N$-gram language models using weighted finite automata. The construction of these models is formalised within the algebra underlying weighted finite automata and expressed in terms of weighted rational languages and transductions. Besides the algebra we make use of five special constant weighted transductions which rely only on the alphabet and the model parameter \emph{N}. In addition, we discuss efficient implementations of these transductions in terms of \emph{virtual constructions}.

  Keywords: computational linguistics, weighted rational transductions, statistical language modeling, N-gram models, weighted finite-state automata.
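
  Illustration (not taken from the paper): the abstract describes N-gram models represented as weighted finite automata. The following is a minimal sketch of that general idea only, assuming plain maximum-likelihood estimation without smoothing and weights stored as negative log probabilities (as in the tropical semiring). It does not reproduce the authors' construction via weighted rational transductions. The function names `ngram_arcs` and `string_weight` and the boundary symbols `<s>` and `</s>` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def ngram_arcs(corpus, n=2, bos="<s>", eos="</s>"):
    """Estimate arcs of an N-gram automaton from a corpus.

    States are histories of n-1 tokens; each arc (history, word) carries
    the weight -log P(word | history), estimated by relative frequency.
    """
    ngram_counts = Counter()
    history_counts = Counter()
    for sentence in corpus:
        # Pad with n-1 start symbols and one end symbol.
        tokens = [bos] * (n - 1) + list(sentence) + [eos]
        for i in range(n - 1, len(tokens)):
            history = tuple(tokens[i - n + 1:i])
            ngram_counts[(history, tokens[i])] += 1
            history_counts[history] += 1
    return {
        (history, word): -math.log(count / history_counts[history])
        for (history, word), count in ngram_counts.items()
    }


def string_weight(arcs, sentence, n=2, bos="<s>", eos="</s>"):
    """Sum arc weights along the path for a sentence.

    Returns math.inf if the sentence needs an arc that was never observed
    (this unsmoothed sketch assigns such strings probability zero).
    """
    tokens = [bos] * (n - 1) + list(sentence) + [eos]
    total = 0.0
    for i in range(n - 1, len(tokens)):
        key = (tuple(tokens[i - n + 1:i]), tokens[i])
        if key not in arcs:
            return math.inf
        total += arcs[key]
    return total


# Toy corpus of two sentences, purely for illustration.
corpus = [["a", "b"], ["a", "c"]]
arcs = ngram_arcs(corpus, n=2)
weight_ab = string_weight(arcs, ["a", "b"])
weight_ba = string_weight(arcs, ["b", "a"])
```

  In this toy corpus, `weight_ab` corresponds to a probability of 1/2 for the sentence "a b", and `weight_ba` is infinite because "b" never starts a sentence. A practical model would add smoothing or back-off, which this sketch omits.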


Full text

 Available electronic editions: PDF.

 Note that full text is available only for papers that are at least 3 years old. For more recent papers only the first page of the paper is provided.


BibTeX entry

@article{Hanneforth:2009:ActaCybernetica,
author = {Thomas Hanneforth and Kay-Michael W\"{u}rzner},
title = {Statistical Language Models within the Algebra of Weighted Rational Languages},
journal = {Acta Cybernetica},
volume = {19},
number = {2},
pages = {313--356},
year = {2009},
abstract = {Statistical language models are an important tool in natural language processing. They represent prior knowledge about a certain language which is usually gained from a set of samples called a \emph{corpus}. In this paper, we present a novel way of creating $N$-gram language models using weighted finite automata. The construction of these models is formalised within the algebra underlying weighted finite automata and expressed in terms of weighted rational languages and transductions. Besides the algebra we make use of five special constant weighted transductions which rely only on the alphabet and the model parameter \emph{N}. In addition, we discuss efficient implementations of these transductions in terms of \emph{virtual constructions}.},
keywords = {computational linguistics, weighted rational transductions, statistical language modeling, N-gram models, weighted finite-state automata}
}

 
