CoNLL-2010 Shared Task
Learning to detect hedges and their scope in natural language text

News

  • The training data for Wikipedia-based uncertainty detection are also available. (Feb 2)
  • Training data are available. (Feb 1)
  • FAQ page set up. (Jan 20)
  • Registration is open. It is required for downloading training data. (Jan 18)
  • Trial data are available. (Jan 11)

Introduction

In Natural Language Processing (NLP) - in particular, in Information Extraction (IE) - many applications aim at extracting factual information from text. In order to distinguish facts from unreliable or uncertain information, linguistic devices such as hedges (indicating that authors do not or cannot back up their opinions/statements with facts) have to be identified. Applications should then treat the detected speculative parts differently from factual statements.

Hedge detection has recently received considerable interest in the biomedical NLP community, including research papers on detecting hedge devices in biomedical texts and some recent work on detecting the in-sentence scope of hedge cues. Exploiting the BioScope corpus, which is annotated for hedge scope, and publicly available Wikipedia weasel annotations, the goals of the Shared Task are:

Task 1: learning to detect sentences containing uncertainty and
Task 2: learning to resolve the in-sentence scope of hedge cues.
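
To make the two tasks concrete, here is a minimal illustrative sketch in Python. It is not the organisers' baseline and does not use the task data format: the cue list HEDGE_CUES and the functions find_cues, is_uncertain and naive_scope are hypothetical names introduced only for this example. Task 1 is approximated by keyword lookup, and Task 2 by the crude assumption that a cue's scope runs from the cue to the end of the sentence. The shared task asks for these decisions to be learned from the annotated training data instead.

```python
import re

# Hypothetical cue list, for illustration only; it is not taken from the task data.
HEDGE_CUES = ("may", "might", "suggest", "suggests", "possibly", "likely", "appears")

# Match any cue as a whole word, ignoring case.
_CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in HEDGE_CUES) + r")\b",
    re.IGNORECASE,
)


def find_cues(sentence):
    """Return (start, end, text) for every hedge cue found in the sentence."""
    return [(m.start(), m.end(), m.group(0)) for m in _CUE_PATTERN.finditer(sentence)]


def is_uncertain(sentence):
    """Task 1 sketch: a sentence is flagged as uncertain if it contains any cue."""
    return bool(find_cues(sentence))


def naive_scope(sentence):
    """Task 2 sketch: assume each scope runs from its cue to the end of the
    sentence, excluding trailing spaces and sentence-final punctuation."""
    end = len(sentence.rstrip(" .!?"))
    return [sentence[start:end] for start, _, _ in find_cues(sentence)]


if __name__ == "__main__":
    example = "These results suggest that the protein may bind DNA."
    print(is_uncertain(example))
    for scope in naive_scope(example):
        print(scope)
```

A fixed cue list both misses cues and flags non-speculative uses of words such as "may", and real scopes often end before the end of the sentence, which is why both tasks are framed as learning problems.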

The shared task will be part of the CoNLL conference to be held in conjunction with ACL 2010 in Uppsala, Sweden, July 15-16, 2010.

For more information please visit the FAQ site or contact: conll2010st(AT)inf(DOT)u-szeged(DOT)hu.

References

Veronika Vincze, György Szarvas, Richárd Farkas, György Móra, and János Csirik: The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9, 2008.

Viola Ganter and Michael Strube: Finding hedges by chasing weasels: Hedge detection using Wikipedia tags and shallow linguistic features. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 173-176, Suntec, Singapore, August 2009. Association for Computational Linguistics.

Roser Morante and Walter Daelemans: Learning the scope of hedge cues in biomedical texts. In Proceedings of the BioNLP 2009 Workshop, pages 28-36, Boulder, Colorado, June 2009. Association for Computational Linguistics.

Dates

The important dates for the shared task are as follows (please note that the dates are tentative for now):

  • January 11, 2010: trial datasets and scorer
  • January 18, 2010: registration for the task opens
  • February 1, 2010: training and development sets available
  • March 28, 2010: test set available
  • April 2, 2010: systems' outputs collected
  • April 18, 2010: deadline for paper submission
  • May 2, 2010: notification of acceptance
  • May 9, 2010: deadline for camera ready paper submission
  • July 15 or 16, 2010: CoNLL conference in Uppsala, Sweden

Organisers

The CoNLL-2010 Shared Task is organised by the Human Language Technology Group, University of Szeged.

Organising team:

Richárd Farkas, Human Language Technology Group, University of Szeged

Veronika Vincze, Human Language Technology Group, University of Szeged

György Szarvas, Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt

György Móra, Human Language Technology Group, University of Szeged

János Csirik, Research Group on Artificial Intelligence, Hungarian Academy of Sciences