The BioScope corpus consists of medical and biological texts annotated for negation, speculation and their linguistic scope. The annotation is intended to support the development and comparison of systems for negation/hedge detection and scope resolution. The corpus is publicly available for research purposes.
BioNLP-2008 paper on BioScope (please cite if you make use of the corpus):
Veronika Vincze, György Szarvas, Richárd Farkas, György Móra, and János Csirik: The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts, BMC Bioinformatics 2008, 9(Suppl 11):S9
The annotation guidelines: pdf
Annotation principles are also discussed in the following paper:
Vincze, Veronika 2010: Speculation and negation annotation in natural language texts: what the case of BioScope might (not) reveal. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010), Uppsala, Sweden, pp. 28-31.
The corpus was also used as the training database of the CoNLL-2010 Shared Task, "Learning to detect hedges and their scope in natural language text".
The corpus consists of texts taken from 3 different sources in order to ensure that it captures the heterogeneity of language use in the biomedical domain. Here is the DTD for the XML files containing the annotations: DTD
Abstracts of the Genia corpus: xml v1.1 (In version 1.1 the Genia UIDs were replaced by PMIDs)
Clinical free-texts: The radiology report corpus that was used for the CMC clinical coding challenge. Due to licensing issues, the negation/hedge annotated version of the corpus has to be obtained by downloading the original 'ICD-9-CM coding' corpus from the Cincinnati Children's Hospital site and merging it with our annotation: readme, merger software.
The full corpus and the evaluation code in one file: zip
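As a rough illustration of how the annotated XML files might be read, the following sketch pairs each cue with the text of its scope. The element and attribute names (`sentence`, `xcope`, `cue`, `type`, `ref`, `id`) and the sample sentence are assumptions made for this example and are not taken from this page; the DTD is the authoritative description of the format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The element and attribute names used here are
# assumptions and should be checked against the DTD before use.
SAMPLE = """<Document>
  <sentence id="S1">These results <xcope id="X1"><cue type="speculation" ref="X1">suggest</cue> that the gene is <xcope id="X2"><cue type="negation" ref="X2">not</cue> expressed</xcope></xcope>.</sentence>
</Document>"""


def extract_scopes(xml_text):
    """Return (cue type, cue text, scope text) triples, one per cue."""
    root = ET.fromstring(xml_text)
    # Map each scope id to the full text covered by that scope element,
    # including the text of any nested scopes and cues.
    scopes = {x.get("id"): "".join(x.itertext()) for x in root.iter("xcope")}
    return [
        (cue.get("type"), cue.text, scopes.get(cue.get("ref")))
        for cue in root.iter("cue")
    ]


for row in extract_scopes(SAMPLE):
    print(row)
```

Scopes may be nested, so the sketch looks up each scope through the cue's `ref` attribute instead of relying on the cue's parent element.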
The BioScope corpus was annotated by two independent linguists following guidelines written by our linguist expert before annotation began. The guidelines were refined throughout the annotation process, as the annotators often encountered problematic cases. The annotators were not allowed to communicate with each other about the annotation, but they could consult the expert when needed, and regular meetings were held between the annotators and the linguist expert to discuss recurring problematic issues. When the two annotations for one subcorpus were finalized, differences between the two were resolved by the linguist expert, yielding the gold standard labeling of the subcorpus.
We measured the consistency level of the annotation using inter-annotator agreement analysis. The inter-annotator agreement rate is defined as the F-measure of one annotation, treating the second one as the gold standard. The evaluation has two levels: first keyword F-measures are calculated, then left/right/full scope F-measures are gathered around the true positive keyword matches.
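The two-level evaluation described above can be sketched as follows. This is a minimal illustration and not the evaluation code distributed with the corpus. The data layout (a dictionary from a keyword span to its scope span, with token offsets) and the function names are assumptions, as is the choice to count a scope boundary as a match only when the offsets are identical.

```python
def f_measure(tp, n_pred, n_gold):
    """F-measure from a true-positive count and the two annotation sizes."""
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def agreement(pred, gold):
    """Compare two annotations, treating `gold` as the gold standard.

    Both arguments map a keyword span (sentence id, start, end) to the
    (start, end) span of its scope. This layout is hypothetical.
    """
    # Level 1: keywords marked identically in both annotations.
    matched = pred.keys() & gold.keys()
    n = len(matched)
    # Level 2: scope boundaries, counted only around matched keywords.
    left = sum(pred[k][0] == gold[k][0] for k in matched)
    right = sum(pred[k][1] == gold[k][1] for k in matched)
    full = sum(pred[k] == gold[k] for k in matched)
    return {
        "keyword": f_measure(n, len(pred), len(gold)),
        "left scope": f_measure(left, n, n),
        "right scope": f_measure(right, n, n),
        "full scope": f_measure(full, n, n),
    }


# Toy annotations from two hypothetical annotators.
annotator_1 = {("S1", 3, 4): (3, 8), ("S2", 0, 1): (0, 5), ("S3", 2, 3): (2, 6)}
annotator_2 = {("S1", 3, 4): (3, 8), ("S2", 0, 1): (0, 4), ("S4", 1, 2): (1, 3)}

for level, score in agreement(annotator_1, annotator_2).items():
    print(f"{level}: {100 * score:.2f}")
```

Because each matched keyword carries exactly one scope in this simplified layout, the scope-level precision and recall coincide; an annotation scheme that allows keywords without scopes would need separate denominators.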
In the table below, agreement rates are provided in the following format: the first number in each cell is the agreement rate between the two annotators, whereas the second and third numbers give the agreement rates between each of the two annotators and the chief annotator:
| type | clinical records | abstracts | full articles |
|---|---|---|---|
| negation | | | |
| keyword | 90.70 / 94.56 / 95.81 | 91.46 / 91.71 / 98.05 | 79.42 / 86.77 / 91.71 |
| left scope | 86.27 / 86.86 / 97.95 | 97.78 / 97.90 / 100 | 83.44 / 82.42 / 95.87 |
| right scope | 88.88 / 91.26 / 97.39 | 94.56 / 95.17 / 99.42 | 84.36 / 88.19 / 95.09 |
| full scope | 76.29 / 79.32 / 95.35 | 92.46 / 93.07 / 99.42 | 70.86 / 73.35 / 91.21 |
| speculation | | | |
| keyword | 84.01 / 89.86 / 92.37 | 79.12 / 83.92 / 92.05 | 77.60 / 81.49 / 90.81 |
| left scope | 89.36 / 88.90 / 97.60 | 87.52 / 88.37 / 97.58 | 75.49 / 80.13 / 92.15 |
| right scope | 91.28 / 92.64 / 97.90 | 87.13 / 89.92 / 96.16 | 82.40 / 83.28 / 96.97 |
| full scope | 81.90 / 82.88 / 95.54 | 76.72 / 80.07 / 94.04 | 62.50 / 66.72 / 89.67 |