
CoNLL-2010 Shared Task: Learning to detect hedges and their scope in natural language text

In Natural Language Processing (NLP), and in Information Extraction (IE) in particular, many applications aim to extract factual information from text. To distinguish facts from unreliable or uncertain information, linguistic devices such as hedges (which indicate that authors do not or cannot back up their opinions or statements with facts) have to be identified. Applications should then handle the detected speculative parts differently from factual ones.

Hedge detection has recently received considerable interest in the biomedical NLP community, including research papers on detecting hedge devices in biomedical texts and some recent work on detecting the in-sentence scope of hedge cues. Using the BioScope corpus, which is annotated for hedge scope, and publicly available Wikipedia texts, the goals of the Shared Task are:

Task 1: learning to detect hedge cues in natural language texts and
Task 2: learning to resolve the in-sentence scope of hedge cues.
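
To make the two tasks concrete, here is a minimal Python sketch. It is not the shared task's method, baseline, or data format. The cue lexicon `HEDGE_CUES`, the functions `detect_cues` and `naive_scope`, and the "cue to end of sentence" scope rule are all assumptions introduced purely for illustration; participating systems are expected to learn cues and scopes from the annotated corpora.

```python
import re

# Hypothetical toy lexicon, for illustration only (not an official cue list).
HEDGE_CUES = {"may", "might", "suggest", "suggests", "possibly", "likely"}


def detect_cues(sentence):
    """Task 1 (toy version): find tokens that match the illustrative lexicon.

    Returns the token list and a list of (token_index, token) pairs.
    """
    # Split into word tokens and single punctuation tokens.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    cues = [(i, t) for i, t in enumerate(tokens) if t.lower() in HEDGE_CUES]
    return tokens, cues


def naive_scope(tokens, cue_index):
    """Task 2 (toy version): guess the scope of the cue at cue_index.

    Assumed heuristic: the scope runs from the cue itself up to, but not
    including, sentence-final punctuation.
    """
    end = len(tokens)
    if tokens and tokens[-1] in {".", "!", "?"}:
        end -= 1
    return tokens[cue_index:end]


if __name__ == "__main__":
    tokens, cues = detect_cues(
        "These results suggest that the protein may bind DNA."
    )
    for index, cue in cues:
        print(cue, "->", " ".join(naive_scope(tokens, index)))
```

A fixed lexicon cannot tell hedging uses of a word from non-hedging ones, and real scopes do not always extend to the end of the sentence, which is why both tasks are framed as learning problems.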

The shared task will be part of the CoNLL conference to be held in conjunction with ACL 2010 in Uppsala, Sweden, July 15-16, 2010.

References

Veronika Vincze, György Szarvas, Richárd Farkas, György Móra, and János Csirik: The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes. BMC Bioinformatics, 9(Suppl 11):S9, 2008.

Viola Ganter and Michael Strube: Finding hedges by chasing weasels: Hedge detection using Wikipedia tags and shallow linguistic features. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 173-176, Suntec, Singapore, August 2009. Association for Computational Linguistics.

Roser Morante and Walter Daelemans: Learning the scope of hedge cues in biomedical texts. In Proceedings of the BioNLP 2009 Workshop, pages 28-36, Boulder, Colorado, June 2009. Association for Computational Linguistics.

Dates

The important dates for the shared task are as follows (please note that this is a tentative timeline):

  • January 15, 2010: trial datasets and scorer
  • February 1, 2010: registration for the task opens
  • February 15, 2010: training and development sets available
  • April 15, 2010: test set available
  • April 20, 2010: systems' outputs collected
  • May 3, 2010: deadline for paper submission
  • May 24, 2010: notification of acceptance
  • June 1, 2010: deadline for camera-ready paper submission
  • July 15 or 16, 2010: Uppsala

Organisers

Richárd Farkas, Human Language Technology Group, University of Szeged (contact: rfarkas AT inf.u-szeged.hu)

Veronika Vincze, Human Language Technology Group, University of Szeged

György Szarvas, Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt

György Móra, Human Language Technology Group, University of Szeged

János Csirik, Research Group of Artificial Intelligence, Hungarian Academy of Sciences