Epigenetics and Post-translational Modifications Task (EPI)

The Epigenetics and Post-translational Modifications (EPI) task is a main task in the BioNLP 2011 Shared Task on Event Extraction.

This task focuses on events relating to epigenetic change, including DNA methylation and histone modification, as well as other common post-translational protein modifications. The core task follows the task definition for Phosphorylation event extraction in the BioNLP'09 shared task, which the full task extends with two additional arguments. The training and test data are drawn without subdomain restrictions from relevant PubMed abstracts, with additional training data derived from evidence sentences of relevant databases such as PIR and PubMeth.

(Visualization generated using the BioNLP ST visualization tool)

The Epigenetics and Post-translational Modifications Task does not define explicit subtasks. However, it specifies minimal "core" extraction targets in addition to the "full" task targets. Results for this task will be reported separately for the "core" targets and the "full" task, and participants can choose to extract only the "core" targets. Full task results are considered the primary evaluation for the task.

The EPI task is completed. The task received final submissions from seven teams. Thank you for your participation!

The final results are summarized below.

There is also a tool for the visualization and comparison of detailed results.

Primary EPI task evaluation results

Results for FULL task, main test data, primary evaluation criteria

------------------------------------------------------------------------------------
Team      Event Class      gold (match)  answer (match)  recall  prec.  fscore
------------------------------------------------------------------------------------
UTurku    ====[TOTAL]====  1378 ( 726)   1343 ( 725)     52.69   53.98  53.33
FAUST     ====[TOTAL]====  1378 ( 398)    892 ( 397)     28.88   44.51  35.03
MSR-NLP   ====[TOTAL]====  1378 ( 383)    857 ( 383)     27.79   44.69  34.27
UMass     ====[TOTAL]====  1378 ( 387)    929 ( 386)     28.08   41.55  33.52
Stanford  ====[TOTAL]====  1378 ( 366)    967 ( 366)     26.56   37.85  31.22
CCP-BTMG  ====[TOTAL]====  1378 ( 323)    849 ( 322)     23.44   37.93  28.97
ConcordU  ====[TOTAL]====  1378 ( 287)    681 ( 287)     20.83   42.14  27.88
------------------------------------------------------------------------------------

These are the primary evaluation results for the EPI task.

We request that participants include these results when reporting on the performance and ranking of their system in the task. (You are also free to report any other results, but these results should be included and identified as primary.)

Additional EPI task evaluation results

Results for CORE task, main test data, primary evaluation criteria

------------------------------------------------------------------------------------
Team      Event Class      gold (match)  answer (match)  recall  prec.  fscore
------------------------------------------------------------------------------------
UTurku    ====[TOTAL]====  1194 ( 818)   1182 ( 818)     68.51   69.20  68.86
FAUST     ====[TOTAL]====  1194 ( 715)    891 ( 715)     59.88   80.25  68.59
MSR-NLP   ====[TOTAL]====  1194 ( 665)    857 ( 665)     55.70   77.60  64.85
UMass     ====[TOTAL]====  1194 ( 681)    929 ( 681)     57.04   73.30  64.15
Stanford  ====[TOTAL]====  1194 ( 679)    967 ( 679)     56.87   70.22  62.84
ConcordU  ====[TOTAL]====  1194 ( 481)    627 ( 481)     40.28   76.71  52.83
CCP-BTMG  ====[TOTAL]====  1194 ( 538)    849 ( 538)     45.06   63.37  52.67
------------------------------------------------------------------------------------

These results are provided to give additional perspective into the performance of systems. Please see above for the primary results.
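The recall, precision and F-score columns in the result tables can be reproduced from the four counts in each row. The sketch below assumes that recall is the number of matched gold events divided by the number of gold events, that precision is the number of matched answer events divided by the number of answer events, and that the F-score is their harmonic mean. The function name prf is illustrative and is not part of the official evaluation tools.

```python
def prf(gold, gold_match, answer, answer_match):
    """Return (recall, precision, fscore) as percentages.

    Assumed definitions: recall = matched gold / gold,
    precision = matched answers / answers, fscore = harmonic mean.
    """
    recall = 100.0 * gold_match / gold if gold else 0.0
    precision = 100.0 * answer_match / answer if answer else 0.0
    total = recall + precision
    fscore = 2 * recall * precision / total if total else 0.0
    return recall, precision, fscore


# UTurku row of the FULL task table: gold 1378 (726), answer 1343 (725)
r, p, f = prf(1378, 726, 1343, 725)
print(f"{r:.2f} {p:.2f} {f:.2f}")  # 52.69 53.98 53.33
```

The printed values agree with the UTurku row of the FULL task table, which suggests the assumed definitions match those used by the evaluation.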

Task Definition

Entities

The core entities of the task are genes and gene products (RNA and proteins), identified in the data simply as "Protein" annotations. The gold standard annotations (correct, human-created annotations) for these entities are provided to participants for both training and test data. Named entity recognition is thus not necessary to address the core task.

Additional arguments in the full task identify the modification sites of proteins and the side chains attached in glycosylation events (sugars). These are identified in the data generically as "Entity" annotations. These additional entities are only provided for training data, and systems addressing the full task will need to incorporate detection for these entities.

Events

The core of the event extraction task targets six types of protein post-translational modification (PTM) events, DNA methylation events, and their reverse reactions (e.g. dephosphorylation for phosphorylation), that is, 14 event types. Additionally, catalysis of these modification events by proteins is captured by a separate event type.

All PTM events and DNA methylation events require a Theme argument identifying the modified Protein. Catalysis events require a Theme argument identifying the catalysed event (either a PTM event or DNA methylation) and a Cause argument identifying the catalyst (Protein). The core task consists of the identification of these events with their Themes and Causes.

In the full task, all PTM and DNA methylation events take an additional Site argument identifying the site of modification (Entity). The full task also defines two event-specific arguments: Glycosylation events take an additional Sidechain argument identifying the sugar attached in the event (Entity), and Methylation and Acetylation events take an additional Contextgene argument identifying the gene that is affected by the PTM event. While Themes are mandatory (i.e. all events must identify a Theme), Site, Sidechain and Contextgene arguments should be omitted when the corresponding information is not stated in the text.

The following table summarizes the events targeted in the task and their arguments (for example, Phosphorylation takes one core Theme argument, a Protein, and optionally one additional Site argument, an Entity). The reverses of the event types shown are also defined with identical arguments (e.g. Deacetylation for Acetylation) for all types except Catalysis.

(Note regarding terminology: the entity and event naming follows the BioNLP'09 Shared Task naming scheme. Here, both genes and their products are simply, if somewhat inaccurately, marked as "Protein" entities. Additionally, event types such as "Methylation" should be understood as abbreviations for "protein methylation" and thus do not include DNA methylation.)
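The argument rules described in this section can be sketched as a small lookup. The following is an illustration only: it covers just the event types named in the text (not all 15), the identifier DNA_methylation and the function names are assumptions, and treating Cause as required for Catalysis follows the wording "require" above.

```python
# Event types named in the text; this is NOT the complete list of types.
# "DNA_methylation" is an assumed identifier for DNA methylation events.
MODIFICATION_TYPES = {"Phosphorylation", "Acetylation", "Methylation",
                      "Glycosylation", "DNA_methylation"}
# Reverse types take the same arguments as their forward counterparts.
REVERSE_OF = {"Dephosphorylation": "Phosphorylation",
              "Deacetylation": "Acetylation"}
SIDECHAIN_TYPES = {"Glycosylation"}
CONTEXTGENE_TYPES = {"Methylation", "Acetylation"}


def allowed_roles(event_type, full_task=True):
    """Return the set of argument roles an event type may take."""
    base = REVERSE_OF.get(event_type, event_type)
    if base == "Catalysis":
        return {"Theme", "Cause"}
    if base not in MODIFICATION_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    roles = {"Theme"}
    if full_task:
        roles.add("Site")
        if base in SIDECHAIN_TYPES:
            roles.add("Sidechain")
        if base in CONTEXTGENE_TYPES:
            roles.add("Contextgene")
    return roles


def check_event(event_type, args, full_task=True):
    """Return a list of problems for an event given as {role: id}."""
    required = {"Theme", "Cause"} if event_type == "Catalysis" else {"Theme"}
    allowed = allowed_roles(event_type, full_task)
    problems = [f"missing required argument: {role}"
                for role in sorted(required) if role not in args]
    problems += [f"unexpected argument: {role}"
                 for role in sorted(args) if role not in allowed]
    return problems


print(sorted(allowed_roles("Glycosylation")))
print(check_event("Phosphorylation", {"Site": "T4"}))
```

Optional arguments (Site, Sidechain, Contextgene) are allowed but never required, mirroring the rule that they are omitted when the text does not state them.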

Event Modifications

The extraction targets extend to all statements concerning the targeted events, even if these are discussed merely as a possibility or their occurrence is explicitly denied. In the full task, any event may be further specified as being stated speculatively (e.g. "may acetylate") or in a negative context (e.g. "is not acetylated"). These event modifications are specified simply by their type (Speculation or Negation) and the modified event.

Format

The format of the files provided for the task follows the general BioNLP Shared Task 2011 main task file format. In brief, this is a standoff format where the original text of each document is provided in one file (.txt) and the annotations in two files, one containing the given Protein entities (.a1) and the other the event annotations (.a2).
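As a rough illustration of the standoff layout, the sketch below reads text-bound annotations (T), events (E) and event modifications (M) from .a1 and .a2 lines. It assumes the tab-separated line layout of the BioNLP Shared Task files; the example sentence, IDs and offsets are invented for illustration and are not taken from the task data, and the function name parse_standoff is hypothetical.

```python
def parse_standoff(lines):
    """Parse standoff annotation lines into entities, events, modifications."""
    entities, events, modifications = {}, {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ann_id, _, rest = line.partition("\t")
        if ann_id.startswith("T"):
            # Assumed layout: T1<TAB>Protein 0 10<TAB>surface text
            header, _, surface = rest.partition("\t")
            ann_type, start, end = header.split(" ")
            entities[ann_id] = (ann_type, int(start), int(end), surface)
        elif ann_id.startswith("E"):
            # Assumed layout: E1<TAB>Acetylation:T2 Theme:T1 Site:T3
            trigger, *args = rest.split()
            event_type, trigger_id = trigger.split(":")
            events[ann_id] = {
                "type": event_type,
                "trigger": trigger_id,
                "args": [tuple(arg.split(":")) for arg in args],
            }
        elif ann_id.startswith("M"):
            # Assumed layout: M1<TAB>Negation E1
            mod_type, target = rest.split()
            modifications[ann_id] = (mod_type, target)
        # Any other line type is ignored in this sketch.
    return entities, events, modifications


# Invented example document and annotations (not from the task data).
text = "Histone H3 is not acetylated at K9."
a1_lines = ["T1\tProtein 0 10\tHistone H3"]
a2_lines = [
    "T2\tAcetylation 18 28\tacetylated",
    "T3\tEntity 32 34\tK9",
    "E1\tAcetylation:T2 Theme:T1 Site:T3",
    "M1\tNegation E1",
]
entities, events, modifications = parse_standoff(a1_lines + a2_lines)
for ann_id, (ann_type, start, end, surface) in entities.items():
    # Offsets index into the .txt file, so the slice should equal the surface text.
    print(ann_id, ann_type, text[start:end] == surface)
```

Because the annotations refer to the text only through character offsets, the .txt file must be read without altering whitespace or line endings for the offsets to remain valid.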

Example annotations

A small sample of annotations for the task was released on 23.8.2010 (see attachment below). Please note that this data is not intended to serve for training or testing systems: the full training and development test data are available from the download page.