CoNLL-2010 Shared Task
Learning to detect hedges and their scope in natural language text

News

  • Programme Committee member list is available. (Apr 20)
  • Cue-level statistics are available (along with an extended scorer which reports these statistics). (Apr 14)
  • Gold-standard annotation for the evaluation datasets is available. (Apr 8)
  • Paper submission site is open. (Apr 7)
  • Results are available. (Apr 7)
  • Submission closed. Thank you for your efforts and the numerous submissions.
  • Evaluation datasets are released and submission is open. (March 26)
  • A newly revised version of the training dataset is available (segmentation errors concerning uncertainty cues and character encoding errors have been corrected in this revision). (March 22)
  • A revision of the training data (mainly token and sentence segmentation bug fixes) is released. (Feb 25)
  • Scorer and uCompare reader/writer tools are available. (Feb 24)
  • Preprocessed unlabeled data (for sampling) are available. (Feb 17)
  • The training data for Wikipedia-based uncertainty detection are also available. (Feb 2)
  • Training data are available. (Feb 1)
  • FAQ page set up. (Jan 20)
  • Registration is open. It is required for downloading training data. (Jan 18)
  • Trial data are available. (Jan 11)

Introduction

In Natural Language Processing (NLP), and in Information Extraction (IE) in particular, many applications aim to extract factual information from text. To distinguish facts from unreliable or uncertain information, linguistic devices such as hedges (which indicate that authors do not or cannot back up their opinions or statements with facts) have to be identified. Applications should then treat the detected speculative parts differently from factual statements.

Hedge detection has recently received considerable interest in the biomedical NLP community, including research papers on detecting hedge devices in biomedical texts and some recent work on detecting the in-sentence scope of hedge cues. Exploiting the BioScope corpus, which is annotated for hedge scope, and publicly available Wikipedia weasel annotations, the goals of the Shared Task are:

Task 1: learning to detect sentences containing uncertainty and
Task 2: learning to resolve the in-sentence scope of hedge cues.
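
To make the Task 1 setting concrete, here is a minimal sketch of a keyword-matching baseline that labels a sentence as uncertain if it contains a known cue. The cue list, the function names and the example sentences are all assumptions made for illustration. They are not taken from the shared task data, and the F1 computation below is a generic one, not the official scorer.

```python
import re

# Hypothetical cue lexicon for illustration only; NOT the shared task's cue list.
HEDGE_CUES = {"may", "might", "suggest", "suggests", "possibly", "likely", "appears"}


def is_uncertain(sentence: str) -> bool:
    """Label a sentence as uncertain if any token matches a cue in the lexicon."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token in HEDGE_CUES for token in tokens)


def f1_score(gold: list, predicted: list) -> float:
    """Generic F1 on the 'uncertain' class (an assumed metric, not the official scorer)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example sentences with invented gold labels.
    sentences = [
        "These results suggest that the protein may bind DNA.",
        "The protein binds DNA.",
    ]
    gold = [True, False]
    predicted = [is_uncertain(s) for s in sentences]
    print(predicted, f1_score(gold, predicted))
```

A lexicon lookup of this kind ignores context, so it will mislabel sentences where a cue word is used non-speculatively. A system that learns from the training data would need to address this.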

The shared task will be part of the CoNLL conference to be held in conjunction with ACL 2010 in Uppsala, Sweden, July 15-16, 2010.

For more information please visit the FAQ site or contact: conll2010st(AT)inf(DOT)u-szeged(DOT)hu.

References

Veronika Vincze, György Szarvas, Richárd Farkas, György Móra, and János Csirik: The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9, 2008.

Viola Ganter and Michael Strube: Finding hedges by chasing weasels: Hedge detection using Wikipedia tags and shallow linguistic features. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 173-176, Suntec, Singapore, August 2009. Association for Computational Linguistics.

Roser Morante and Walter Daelemans: Learning the scope of hedge cues in biomedical texts. In Proceedings of the BioNLP 2009 Workshop, pages 28-36, Boulder, Colorado, June 2009. Association for Computational Linguistics.

Dates

The important dates for the shared task are as follows (please note that the dates are tentative for now):

  • January 11, 2010: trial datasets and scorer
  • January 18, 2010: registration for the task opens
  • February 1, 2010: training and development sets available
  • March 26, 2010: test set available
  • April 2, 2010: systems' outputs collected
  • April 18, 2010: deadline for paper submission
  • May 2, 2010: notification of acceptance
  • May 9, 2010: deadline for camera ready paper submission
  • July 15 or 16, 2010: Uppsala

Organisers

The CoNLL-2010 Shared Task is organised by the Human Language Technology Group, University of Szeged.

Organising team:

Richárd Farkas, Human Language Technology Group, University of Szeged

Veronika Vincze, Human Language Technology Group, University of Szeged

György Szarvas, Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt

György Móra, Human Language Technology Group, University of Szeged

János Csirik, Research Group of Artificial Intelligence, Hungarian Academy of Sciences

Programme Committee

  • Ekaterina Buyko, University of Jena
  • Kevin Cohen, University of Colorado
  • Hercules Dalianis, Stockholm University
  • Maria Georgescul, University of Geneva
  • Filip Ginter, University of Turku
  • Henk Harkema, University of Pittsburgh
  • Shen Jianping, Harbin Institute of Technology
  • Yoshinobu Kano, University of Tokyo
  • Jin-Dong Kim, Database Center for Life Science, Japan
  • Ruy Milidiu, Pontifícia Universidade Católica do Rio de Janeiro
  • Roser Morante, University of Antwerp
  • Lilja Ovrelid, University of Potsdam
  • Arzucan Ozgur, University of Michigan
  • Vinodkumar Prabhakaran, Columbia University
  • Sampo Pyysalo, University of Tokyo
  • Marek Rei, Cambridge University
  • Buzhou Tang, Harbin Institute of Technology
  • Erik Tjong Kim Sang, University of Groningen
  • Katrin Tomanek, University of Jena
  • Erik Velldal, University of Oslo
  • Andreas Vlachos, University of Wisconsin-Madison
  • Xinglong Wang, University of Manchester
  • Torsten Zesch, University of Darmstadt
  • Qi Zhao, Harbin Institute of Technology
  • HuiWei Zhou, Dalian University of Technology