Protein/Gene Coreference Task

Online submission closed. Thank you very much for your participation!

The Protein Coreference (COREF) task is one of the supporting tasks in the BioNLP Shared Task 2011.

One lesson from BioNLP-ST'09 is that anaphoric expressions pose a non-trivial obstacle to further improvement of event extraction. The COREF task addresses the problem of finding anaphoric references to proteins or genes. We expect that addressing this task has the potential to significantly improve event extraction performance. Below is an example of text involving coreferring expressions: the spans highlighted in red are anaphoric expressions, whose referents are indicated by arrows.

In the example, the definite noun phrase "this transcription factor" (T32) refers to "NF-kappa B p65" (T31) or "p65" (T10). Knowing this connection should help in finding the event, the localization of p65 (out of the nucleus), as expressed in "nuclear exclusion of this transcription factor".

In this task, we concentrate on finding anaphoric expressions that refer to proteins (or genes). Following the tradition of BioNLP-ST, we begin with protein annotations, i.e. the gold protein annotations will be given, e.g. those highlighted in purple in the above example.

The first step is then to find candidate anaphoric expressions that may refer to proteins. In this task, pronouns (e.g. it or they) and definite noun phrases that may refer to proteins (e.g. the transcription factor or the inhibitor) are regarded as candidate anaphoric protein references.

The next step is to find the antecedents of such anaphoric expressions. The training and test materials of this task include annotations that link candidate anaphoric protein references to their antecedents, if these exist in the text.

Note that an anaphoric expression, e.g. "which" (T29), is sometimes connected to more than one protein reference, e.g. "p65" (T4) and "p50" (T5). Sometimes, coreference structures do not involve any specific protein references, e.g. T30 and T27.

In order to establish a stable evaluation, we only focus on coreferencing structures that involve specific protein references, e.g. T29 and T28, and T32 and T31.

R1    Coref Anaphora:T29 Antecedent:T28    [T4, T5]
R2    Coref Anaphora:T30 Antecedent:T27
R3    Coref Anaphora:T32 Antecedent:T31    [T10]

The coreference relations are represented in predicate-argument structure as above. Among the three, only two, R1 and R3, involve specific protein references: T4 and T5 for R1, and T10 for R3. Thus R2 will be ignored in the evaluation. However, relations not involving specific protein references are provided in the training data to help system development.

Task Definition

The participants will be given gold annotations for protein references, e.g. the purple ones in the above example. The participants then have to find expressions that have a coreference relation with the protein mentions, e.g. R1 and R3. Note that the boundary of the span T28 or T31 does not need to be precise: it is sufficient that they contain T4 and T5, or T10, respectively. Correctly finding R1 will be credited with 2 points, while finding R3 will be credited with 1 point.


The *.a1 files include annotations for specific protein/gene mentions. These files will be given to the participants. In other words, the participants will begin this task with gold annotations of proteins/genes. The following are the protein/gene annotations corresponding to the above example:

T4    Protein 275 278    p65
T5    Protein 294 297    p50
T6    Protein 367 372    v-rel
T7    Protein 406 409    p65
T8    Protein 597 600    p50
T9    Protein 843 848    MAD-3
T10  Protein 879 882    p65

The *.a2 files include annotations for coreferencing expressions.
T27    Exp 179 222        the NF-kappa B transcription factor complex    215 222 complex
T28    Exp 264 297        NF-kappa B p65 and NF-kappa B p50
T29    Exp 307 312        which
T30    Exp 459 471        this complex    464 471 complex
T31    Exp 868 882        NF-kappa B p65
T32    Exp 1022 1047    this transcription factor    1027 1047       transcription factor
R1      Coref Anaphora:T29 Antecedent:T28    [T5, T4]
R2      Coref Anaphora:T30 Antecedent:T27
R3      Coref Anaphora:T32 Antecedent:T31    [T10]

The expressions that may participate in coreference relations are annotated with Exp labels. They include the following:
  1. Anaphoric expressions that may refer to proteins/genes ("protein markables"; T27, T29, T30, T32).
    • definite noun phrases, pronouns, ...
    • Note that the expression "the molecular basis" is a definite noun phrase, but it is not annotated because it is unlikely to be a protein reference.
  2. Expressions that include protein/gene names and are antecedents of the anaphoric expressions (T28, T31).
    • These are the targets of evaluation in the atom link evaluation mode (see below).
  3. Antecedents of the anaphoric expressions that are not linked to expressions including protein/gene names.
    • These are included in the annotation to support machine learning-based approaches.
    • These are included in the targets of evaluation in the surface link evaluation mode (see below).
The coreference relations are annotated with Coref labels, connecting anaphora-antecedent pairs. The protein/gene IDs appearing in square brackets indicate the specific proteins/genes that are related to the coreference relations. Participants do not need to produce the protein/gene IDs, as they are clear from the corresponding *.a1 files. Note that the boundary of an Exp annotation can be arbitrary to some extent. For example, the definite article "the" can be omitted from the expression T27. However, since the minimal span of T27 is "complex", at least that span must be included.
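
To make the *.a2 layout concrete, here is a minimal parsing sketch in Python. It assumes the fields are separated by whitespace exactly as in the listing above (the distributed files may use tabs, in which case splitting on tabs would be more robust). The names `Exp`, `Coref`, and `parse_a2` are illustrative and not part of any official tool.

```python
import re
from typing import NamedTuple, Optional


class Exp(NamedTuple):
    id: str
    begin: int
    end: int
    text: str
    min_begin: Optional[int]  # start of the minimal span, if one is given
    min_end: Optional[int]    # end of the minimal span, if one is given


class Coref(NamedTuple):
    id: str
    anaphor: str
    antecedents: tuple  # one or more Exp IDs
    proteins: tuple     # protein IDs from the square brackets, possibly empty


# "T27  Exp 179 222  <text>  [215 222 <minimal text>]"
EXP_RE = re.compile(
    r"^(T\S+)\s+Exp\s+(\d+)\s+(\d+)\s+(.*?)(?:\s+(\d+)\s+(\d+)\s+.+)?$")
# "R1  Coref Anaphora:T29 Antecedent:T28  [T5, T4]"
COREF_RE = re.compile(r"^(R\S+)\s+Coref\s+Anaphora:(\S+)\s+(.*)$")


def parse_a2(text):
    """Return (dict of Exp by ID, list of Coref) for *.a2-style text."""
    exps, corefs = {}, []
    for line in text.splitlines():
        line = line.strip()
        m = EXP_RE.match(line)
        if m:
            tid, begin, end, surface, mb, me = m.groups()
            exps[tid] = Exp(tid, int(begin), int(end), surface,
                            int(mb) if mb else None,
                            int(me) if me else None)
            continue
        m = COREF_RE.match(line)
        if m:
            rid, anaphor, rest = m.groups()
            antecedents = tuple(re.findall(r"Antecedent\d*:(\S+)", rest))
            bracket = re.search(r"\[(.*?)\]", rest)
            proteins = ()
            if bracket:
                proteins = tuple(p.strip()
                                 for p in bracket.group(1).split(",")
                                 if p.strip())
            corefs.append(Coref(rid, anaphor, antecedents, proteins))
    return exps, corefs


SAMPLE = """\
T27    Exp 179 222        the NF-kappa B transcription factor complex    215 222 complex
T28    Exp 264 297        NF-kappa B p65 and NF-kappa B p50
T29    Exp 307 312        which
R1      Coref Anaphora:T29 Antecedent:T28    [T5, T4]
R2      Coref Anaphora:T30 Antecedent:T27
"""

exps, corefs = parse_a2(SAMPLE)
# Relations without a bracketed protein list (such as R2) are the ones
# ignored in the protein-oriented evaluation.
evaluated = [c.id for c in corefs if c.proteins]
```

In this sketch, `evaluated` keeps only the relations that carry protein IDs, which mirrors the rule that relations such as R2 are not scored.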

The following annotations are equivalent to the annotations T28 and R1:
T28-1    Exp 264 278        NF-kappa B p65
T28-2    Exp 283 297        NF-kappa B p50
R1      Coref Anaphora:T29 Antecedent:T28-1 Antecedent2:T28-2   [T5, T4]


The evaluation is carried out in two steps: evaluation of mention detection, and evaluation of mention linking to produce coreference links.
  • Evaluation of mention detection
According to the task definition, a gene/protein mention in this task can be:
            (Type1) an expression that contains gene/protein name annotations; these are called name-containing mentions. Note that not all expressions containing gene/protein name annotations refer to the gene/protein entities.
            (Type2) an apposition of a Type1 mention
            (Type3) an anaphoric expression, coreferring with a Type1, Type2, or Type3 mention
All of the mentions are represented by 'Exp' in the '.a2' files of the corpus. Mention detection is the detection of these mentions, which include both anaphors and antecedents. The evaluation is based on standard precision, recall, and F-score, calculated as below:
            P = number of correctly detected mentions/number of detected mentions
            R = number of correctly detected mentions/ number of gold mentions
            F = 2PR/(P + R)
Recall is sometimes called the coverage rate of detected mentions. While low coverage can be a bottleneck for the next step of linking mentions, high coverage raises the complexity of the next step, since the number of antecedent candidates increases.
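
The three formulas above can be written as a short Python sketch. The function name `precision_recall_f` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration; the task description does not specify how empty outputs are handled.

```python
def precision_recall_f(num_correct, num_detected, num_gold):
    """Compute P, R and F from raw mention counts.

    P = correct / detected, R = correct / gold, F = 2PR / (P + R).
    Zero denominators yield 0.0 (an assumption, not an official rule).
    """
    p = num_correct / num_detected if num_detected else 0.0
    r = num_correct / num_gold if num_gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


# 3 of 4 detected mentions are correct, out of 6 gold mentions.
p, r, f = precision_recall_f(3, 4, 6)
```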
In order to provide different views for the results, we use different criteria to judge whether a detected mention is correct or not. They are:
            (1) Exact match:
               begin(detected mention)= begin(gold mention) & end(detected mention)= end(gold mention)
            (2) Partial match based on minimal and maximal boundaries of gold mentions:
               begin(detected mention)>=begin(maximal boundary) & end(detected mention)<=end(maximal boundary) &
               begin(detected mention)<=begin(minimal boundary) & end(detected mention)>=end(minimal boundary)
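
The two matching criteria can be sketched in Python as follows. Spans are (begin, end) character offsets; the function names are illustrative, and the sketch assumes that both partial-match conditions must hold together, i.e. the detected span lies inside the maximal boundary and covers the minimal boundary.

```python
def exact_match(detected, gold):
    """Criterion (1): both boundaries coincide."""
    return detected[0] == gold[0] and detected[1] == gold[1]


def partial_match(detected, maximal, minimal):
    """Criterion (2): the detected span stays within the maximal
    boundary and fully covers the minimal boundary."""
    begin, end = detected
    inside_max = begin >= maximal[0] and end <= maximal[1]
    covers_min = begin <= minimal[0] and end >= minimal[1]
    return inside_max and covers_min


# T27: maximal span 179-222, minimal span 215-222 ("complex").
t27_max, t27_min = (179, 222), (215, 222)

# Dropping the article "the " gives 183-222: not exact, but a partial match.
without_article = (183, 222)
# 179-214 stops before "complex", so it fails the minimal boundary test.
without_head = (179, 214)
```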
  • Evaluation of mention linking
Evaluation of the mention linking task is reported using precision (P), recall (R), and F-score (F).
A response coreference link is correct when:
- the antecedent and anaphor mentions of the link are correct, following one of the above criteria for mention detection.
- there is a gold coreference link between the corresponding gold mentions.

We calculate evaluation scores from two perspectives: surface coreference links and atom coreference links. A surface coreference link is represented by the 'Coref' type in the '.a2' files of the corpus. An atom coreference link is a link from an anaphoric expression to a name-containing mention (Type3->Type1) or (Type3->(Type2)*->Type1). Atom links are generated from surface links. While surface coreference links give a general view of the problem, a successful atom coreference link lets us trace from an anaphoric expression to a gene/protein name. Such atom links may help increase the recall of information extraction systems.

For evaluation purposes, atom coreference links can also be considered equivalent to links between anaphoric expressions and the genes/proteins included in their antecedents. A third evaluation perspective, called protein coreference links, has been added in order to loosen the expression boundary matching criteria.

The following are examples of the above three evaluation perspectives for evaluating mention linking.

Example 1:
                R1    Coref Anaphora:T29 Antecedent:T28    [T4, T5]
                R2    Coref Anaphora:T30 Antecedent:T27
                R3    Coref Anaphora:T32 Antecedent:T31    [T10]
        Surface coreference links = {(T29->T28)/1 score, (T30->T27)/1 score, (T32->T31)/1 score}
        Atom coreference links  = {(T29->T28)/2 score, (T32->T31)/1 score}
        Protein coreference links = {(T29->T4)/1 score, (T29->T5)/1 score, (T32->T10)/1 score}

Example 2:

                R1    Coref Anaphora:T29 Antecedent:T28    [T4, T5]
                R2    Coref Anaphora:T30 Antecedent:T27
                R3    Coref Anaphora:T32 Antecedent:T31    [T10]
                R4    Coref Anaphora:T33 Antecedent:T32
        Surface coreference links = {(T29->T28)/1 score, (T30->T27)/1 score, (T32->T31)/1 score, (T33->T32)/1 score}
        Atom coreference links  = {(T29->T28)/2 score, (T32->T31)/1 score, (T33->T31)/1 score}
        Protein coreference links = {(T29->T4)/1 score, (T29->T5)/1 score, (T32->T10)/1 score, (T33->T10)/1 score}
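
The derivation of atom and protein links from surface links in Example 2 can be sketched in Python as below. This is an illustrative reading of the examples, not the official evaluation script: it assumes each anaphor has exactly one surface link, and that an antecedent counts as a name-containing mention when its relation carries a bracketed protein list. The function name `derive_links` is hypothetical.

```python
def derive_links(surface_links):
    """surface_links: list of (anaphor, antecedent, proteins) triples.

    Returns (atom, protein) where atom maps (anaphor, antecedent) to its
    score (the number of proteins) and protein is a list of
    (anaphor, protein) pairs, each worth 1.
    """
    link_of = {a: (ante, prots) for a, ante, prots in surface_links}
    atom, protein = {}, []
    for anaphor in link_of:
        current, seen = anaphor, set()
        # Follow the chain of antecedents until a name-containing
        # mention is reached, or the chain ends.
        while current in link_of and current not in seen:
            seen.add(current)
            ante, prots = link_of[current]
            if prots:
                atom[(anaphor, ante)] = len(prots)
                protein.extend((anaphor, p) for p in prots)
                break
            current = ante
    return atom, protein


example2 = [
    ("T29", "T28", ["T4", "T5"]),  # R1
    ("T30", "T27", []),            # R2
    ("T32", "T31", ["T10"]),       # R3
    ("T33", "T32", []),            # R4
]
atom, protein = derive_links(example2)
```

Here T33 reaches T31 through T32, and T30 produces no atom link because T27 has no protein list and no further antecedent.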

Recall is calculated in two ways:
>> to evaluate the coreference resolution algorithm:
- R = total correct links / total gold links after removing broken links caused by failure of mention detection

>> to evaluate the coreference resolution system as a whole:
- R = total correct links / total gold links
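
The difference between the two recall figures can be sketched in Python. The interpretation of a "broken link" here is an assumption: a gold link is treated as broken when its anaphor or its antecedent was not correctly detected in the mention detection step. The function name `recall_variants` is hypothetical.

```python
def recall_variants(gold_links, correct_links, detected_mentions):
    """gold_links, correct_links: sets of (anaphor, antecedent) pairs.
    detected_mentions: set of gold mention IDs that were detected.

    Returns (algorithm_recall, system_recall).
    """
    # Gold links that survive mention detection (assumed definition).
    intact = {(a, b) for a, b in gold_links
              if a in detected_mentions and b in detected_mentions}
    system_recall = len(correct_links) / len(gold_links) if gold_links else 0.0
    algorithm_recall = len(correct_links) / len(intact) if intact else 0.0
    return algorithm_recall, system_recall


gold = {("T29", "T28"), ("T30", "T27"), ("T32", "T31")}
detected = {"T28", "T29", "T30", "T31", "T32"}  # T27 was missed
correct = {("T29", "T28")}
algorithm_recall, system_recall = recall_variants(gold, correct, detected)
```

With T27 missed, the link T30->T27 is excluded from the first denominator only, so the algorithm-level recall is higher than the system-level recall.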


The coreference annotations for BioNLP-ST'11 were produced based on the GENIA-MedCo coreference corpus, which is a product of collaboration between the GENIA project and the MedCo Annotation Project. For BioNLP-ST'11, annotations relevant to proteins or genes were selected, cleaned and augmented. Two other sets of annotations, the GENIA event annotation and the GENIA syntactic tree annotation, were referenced for the polishing.

* We declare that the use of the GENIA-MedCo coreference corpus in any way for this coreference task is prohibited.

Task Results

The CO supporting task is completed. Final submissions were received from six teams; their evaluation results are summarized in the following table (protein coreference link perspective):

Team                        Recall   Precision   F-score
University of Utah          22.18    73.26       -
University of Zurich        21.48    -           -
Concordia University        19.37    63.22       -
University of Turku         14.44    67.21       23.77
University of Szeged         3.17     3.47        3.31
University College Dublin    0.70     0.25        0.37

* The protein coreference link evaluation was chosen as the primary evaluation perspective, as it reflects the task definition faithfully.
* For example, the performance of the first-ranked system may be interpreted as follows:
   "22.18% of hidden protein references can be found at the precision of 73.26%."

The primary performance metric is overall F-score, shown in bold in the table above.
Jin-Dong Kim,
Nov 7, 2010, 7:19 PM