Computational Linguistics

Quarterly (March, June, September, December)
160 pp. per issue
6 3/4 x 10
Founded: 1974
ISSN 0891-2017
E-ISSN 1530-9312
2008 ISI Impact Factor: 2.656

December 2006, Vol. 32, No. 4, Pages 485-525
Posted Online November 21, 2006.
(doi:10.1162/coli.2006.32.4.485)
© 2006 Massachusetts Institute of Technology
Unsupervised Multilingual Sentence Boundary Detection

Tibor Kiss*, Jan Strunk**

* Sprachwissenschaftliches Institut, Ruhr-Universität Bochum, 44780 Bochum, Germany.

** Sprachwissenschaftliches Institut, Ruhr-Universität Bochum, 44780 Bochum, Germany.

In this article, we present a language-independent, unsupervised approach to sentence boundary detection. It is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified. Instead of relying on orthographic clues, the proposed system is able to detect abbreviations with high accuracy using three criteria that only require information about the candidate type itself and are independent of context: Abbreviations can be defined as a very tight collocation consisting of a truncated word and a final period, abbreviations are usually short, and abbreviations sometimes contain internal periods. We also show the potential of collocational evidence for two other important subtasks of sentence boundary disambiguation, namely, the detection of initials and ordinal numbers. The proposed system has been tested extensively on eleven different languages and on different text genres. It achieves good results without any further amendments or language-specific resources. We evaluate its performance against three different baselines and compare it to other systems for sentence boundary detection proposed in the literature.
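The three context-independent criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' scoring function: the function name `abbreviation_scores`, the use of a plain relative frequency in place of a proper collocation statistic, the exponential length factor, and the way the three factors are multiplied are all assumptions made for illustration only.

```python
import math
from collections import Counter


def abbreviation_scores(tokens):
    """Score each candidate type (a token stripped of its final period).

    Illustrative only: each factor is a simplified stand-in for one of the
    three criteria listed in the abstract, not the published formula.
    """
    with_period = Counter()
    without_period = Counter()
    for tok in tokens:
        if tok.endswith(".") and len(tok) > 1:
            with_period[tok[:-1].lower()] += 1
        else:
            without_period[tok.lower()] += 1

    scores = {}
    for cand, n_with in with_period.items():
        n_without = without_period[cand]
        # Criterion 1 (assumed proxy): how tightly the type is bound to a
        # final period, measured as the share of its occurrences with one.
        tightness = n_with / (n_with + n_without)
        # Criterion 2 (assumed form): shorter candidates score higher.
        letters = sum(1 for ch in cand if ch != ".")
        length_factor = math.exp(-letters)
        # Criterion 3 (assumed form): internal periods raise the score.
        internal_factor = cand.count(".") + 1
        scores[cand] = tightness * length_factor * internal_factor
    return scores


if __name__ == "__main__":
    sample = ["Dr.", "Smith", "saw", "the", "end.", "Dr.", "Jones",
              "saw", "the", "end", "e.g.", "today."]
    for cand, score in sorted(abbreviation_scores(sample).items(),
                              key=lambda kv: -kv[1]):
        print(f"{cand!r}: {score:.4f}")
```

On the toy sample, candidates that always carry a period, are short, or contain internal periods rank above ordinary words that merely happen to end a sentence. Any decision threshold on these scores would have to be chosen separately; none is implied by the abstract.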
