Par4Sem

Dataset

The Learning_to_Rank datasets collected in each iteration can be found here.

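File format: each file is TAB-separated, with one candidate substitute per line. Going by the example below, the columns are the HIT ID, a numeric field (3 in the example), the sentence, the target word, the candidate substitute, the start and end character offsets of the target in the sentence, and a score for the candidate.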

Example:

3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        appointments    56      64      0
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        blending        56      64      1
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        clashing        56      64      1
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        dealing 56      64      1
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        interacting     56      64      2
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        interruptions   56      64      1
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        involvement     56      64      3
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        meddlesome      56      64      1
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        opponents       56      64      0
3A520CCNWN1XLHTKZR9HJXTKN5BAEO  3       Others oppose Assad and accuse Damascus of heavy-handed meddling in Lebanese politics.  meddling        upheaval        56      64      0
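A minimal sketch of reading one of these files in Python, assuming the eight-column layout shown above. The field names, the reading of the last column as a relevance score, and the file name in the comment are our own assumptions; the second column is kept as an integer because its meaning is not documented here. Rows are grouped by HIT ID and each group is sorted by descending score, which is a common shape for learning-to-rank input.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, NamedTuple, TextIO


class Candidate(NamedTuple):
    """One row of a Learning_to_Rank file. Field names are our own labels."""
    hit_id: str       # annotation task ID shared by all candidates of one target
    column2: int      # second column; its meaning is not documented here
    sentence: str
    target: str       # the word to be replaced
    substitute: str   # one candidate replacement
    start: int        # start character offset of the target in the sentence
    end: int          # end character offset (exclusive)
    score: int        # last column, assumed to be the candidate's relevance score


def read_l2r(stream: TextIO) -> Dict[str, List[Candidate]]:
    """Group rows by HIT ID and sort each group by descending score."""
    groups: Dict[str, List[Candidate]] = defaultdict(list)
    # QUOTE_NONE: quote characters inside sentences are kept literally.
    for row in csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # tolerate blank lines
        hit_id, col2, sentence, target, sub, start, end, score = row
        cand = Candidate(hit_id, int(col2), sentence, target, sub,
                         int(start), int(end), int(score))
        # Sanity check: the offsets should select exactly the target word.
        if sentence[cand.start:cand.end] != target:
            raise ValueError(f"offsets do not match target in row: {row}")
        groups[hit_id].append(cand)
    for cands in groups.values():
        cands.sort(key=lambda c: c.score, reverse=True)
    return dict(groups)


# Demo on three rows from the example (a real file would be opened with
# something like open("iteration1.tsv", encoding="utf-8", newline=""),
# where the file name is hypothetical).
sentence = ("Others oppose Assad and accuse Damascus of heavy-handed "
            "meddling in Lebanese politics.")
sample = "".join(
    "\t".join(["3A520CCNWN1XLHTKZR9HJXTKN5BAEO", "3", sentence, "meddling",
               sub, "56", "64", score]) + "\n"
    for sub, score in [("blending", "1"), ("involvement", "3"), ("upheaval", "0")]
)
ranked = read_l2r(io.StringIO(sample))
print([c.substitute for c in ranked["3A520CCNWN1XLHTKZR9HJXTKN5BAEO"]])
# prints ['involvement', 'blending', 'upheaval']
```

The offset check doubles as a format test: in the example, characters 56 to 64 of the sentence are exactly "meddling".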

Difficult unit identification dataset: the dataset used to train the classification model for difficult unit identification is based on the CWI dataset available here.

The training data are in the following format:

<ID> Both China and the Philippines flexed their muscles on Wednesday. 31 51 flexed their muscles 10 10 3 2 1 0.25
<ID> Both China and the Philippines flexed their muscles on Wednesday. 31 37 flexed 10 10 2 6 1 0.4
<ID> Both China and the Philippines flexed their muscles on Wednesday. 44 51 muscles 10 10 0 0 0 0.0

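Each line represents a sentence with one complex word annotation and its associated information, in TAB-separated fields: the HIT ID, the sentence, the start and end character offsets of the target word or phrase, the target itself, the number of native and of non-native annotators, the number of native and of non-native annotators who marked the target as difficult, and the gold labels for the binary and the probabilistic classification tasks.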
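A minimal parsing sketch, assuming the 11-column CWI 2018 layout: ID, sentence, start offset, end offset, target, native and non-native annotator counts, native and non-native "difficult" counts, binary label, probabilistic label. The class and attribute names are our own, not part of the dataset.

```python
from dataclasses import dataclass


@dataclass
class CWIInstance:
    """One line of a CWI 2018-style file; attribute names are our own."""
    hit_id: str
    sentence: str
    start: int                 # start character offset of the target
    end: int                   # end character offset (exclusive)
    target: str                # target word or phrase
    native_annotators: int
    nonnative_annotators: int
    native_difficult: int      # native annotators who marked the target difficult
    nonnative_difficult: int   # non-native annotators who marked it difficult
    binary_label: int          # gold label, binary task
    prob_label: float          # gold label, probabilistic task


def parse_cwi_line(line: str) -> CWIInstance:
    """Parse one TAB-separated line with the 11 CWI 2018 columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 TAB-separated fields, got {len(fields)}")
    hit_id, sentence, start, end, target = fields[:5]
    counts = [int(x) for x in fields[5:10]]  # four annotator counts + binary label
    inst = CWIInstance(hit_id, sentence, int(start), int(end), target,
                       *counts, float(fields[10]))
    # Sanity check: the offsets should select exactly the target.
    if sentence[inst.start:inst.end] != target:
        raise ValueError(f"offsets {start}-{end} do not match {target!r}")
    return inst


line = "\t".join(["<ID>", "Both China and the Philippines flexed their muscles on Wednesday.",
                  "31", "51", "flexed their muscles", "10", "10", "3", "2", "1", "0.25"])
print(parse_cwi_line(line).nonnative_difficult)  # prints 2
```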

The label for the probabilistic classification task is the number of annotators who marked the target as difficult divided by the total number of annotators (for the first example line, 5 of 20 annotators, giving 0.25).
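The label computation is a single division; this sketch adds range checks. The function name is hypothetical. The call below assumes, as in the CWI 2018 files, that the first example line records 3 native and 2 non-native "difficult" marks out of 10 + 10 annotators.

```python
def probabilistic_label(num_difficult: int, num_annotators: int) -> float:
    """Fraction of annotators who marked the target as difficult."""
    if num_annotators <= 0:
        raise ValueError("need at least one annotator")
    if not 0 <= num_difficult <= num_annotators:
        raise ValueError("difficult count must lie between 0 and the annotator total")
    return num_difficult / num_annotators


# First example line: (3 + 2) difficult marks out of (10 + 10) annotators.
print(probabilistic_label(3 + 2, 10 + 10))  # prints 0.25
```

If the stored labels are rounded (for example, for fractions such as 1/3), compare recomputed values against the file with a tolerance, e.g. `math.isclose`, rather than exact equality.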

Details about the dataset are available in the CWI Shared Task 2018 dataset section.