How to use the Stanford CoreNLP Java implementation for coreference resolution

I am trying to understand the output of the CoreNLP coreference resolution system.

Here is an example input and output pair I obtained with the rule-based system (dcoref):

Input text:

His maternal great-grandfather was Henry Percy, 4th Earl of Northumberland, whose wife was Maud Herbert, Countess of Northumberland. His maternal grandmother was a daughter of Sir Robert Spencer and Eleanor Beaufort. Eleanor was a daughter of Edmund Beaufort, 2nd Duke of Somerset and Eleanor Beauchamp. She was a granddaughter of Richard de Beauchamp, 13th Earl of Warwick and Elizabeth Berkeley.

The command I use to get the output:

./corenlp.sh -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt -outputFormat json
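To work with this output from Python (as the question does with `json_output`), the file written by the command can be read with the standard `json` module. This is a minimal sketch: `load_corefs` is a hypothetical helper, not part of CoreNLP, and it assumes the file was produced with `-outputFormat json`.

```python
import json
from pathlib import Path


def load_corefs(path):
    """Hypothetical helper: return the 'corefs' mapping of a CoreNLP JSON file.

    JSON object keys are always strings, which is why the cluster keys come
    back as '1', '2', ... rather than as ints. The 'corefs' key is only
    expected when a coreference annotator (dcoref here) ran, so an empty
    dict is returned if it is missing.
    """
    with Path(path).open(encoding="utf-8") as f:
        json_output = json.load(f)
    return json_output.get("corefs", {})
```

For example, `corefs = load_corefs("input.txt.json")`, where the file name is an assumption: CoreNLP normally names its output after the input file plus the format extension, written to the current directory unless an output directory is given.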

First of all, I don't understand what the keys mean. What do these numbers represent? Is this documented somewhere? I was only able to find information about the XML output format.

> json_output['corefs'].keys()

dict_keys(['1', '2', '3', '4', '6', '7', '9', '10', '11', '12', '15', '16', '17', '18', '19', '20', '22', '23', '24', '25', '26', '29', '30', '31'])

Secondly, does each value in the dictionary above represent a different cluster found in the input? In other words, can I say that len(json_output['corefs'].keys()) clusters were found in the input?
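One way to check this empirically is to count clusters by size, separating singletons (a cluster with one mention, i.e. nothing was linked) from clusters that actually connect several mentions. A minimal sketch, assuming `corefs` is the `json_output['corefs']` dict; `cluster_stats` is a hypothetical helper name.

```python
def cluster_stats(corefs):
    """Hypothetical helper: summarize a 'corefs' dict, one key per cluster."""
    sizes = {key: len(mentions) for key, mentions in corefs.items()}
    return {
        "clusters": len(sizes),
        # A singleton cluster contains one mention and links nothing.
        "singletons": sum(1 for n in sizes.values() if n == 1),
        # Keys are strings, so sort them numerically rather than lexically.
        "multi_mention": sorted((k for k, n in sizes.items() if n > 1), key=int),
        "mentions": sum(sizes.values()),
    }
```

On the output in the question this reports 24 clusters holding 31 mentions, of which only 4 clusters ('3', '6', '12' and '17') contain more than one mention.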

EDIT

If you want to see the output, here it is.

Output (I set -outputFormat to json; below is only the 'corefs' key of the full output):

> json_output['corefs']

{'1': [{'id': 1, 'text': 'Henry Percy', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 5, 'endIndex': 7, 'headIndex': 6, 'sentNum': 1, 'position': [1, 4], 'isRepresentativeMention': True}],
 '2': [{'id': 2, 'text': '4th', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'UNKNOWN', 'animacy': 'UNKNOWN', 'startIndex': 8, 'endIndex': 9, 'headIndex': 8, 'sentNum': 1, 'position': [1, 5], 'isRepresentativeMention': True}],
 '3': [{'id': 3, 'text': 'Northumberland', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'NEUTRAL', 'animacy': 'INANIMATE', 'startIndex': 11, 'endIndex': 12, 'headIndex': 11, 'sentNum': 1, 'position': [1, 6], 'isRepresentativeMention': True}, {'id': 5, 'text': 'Northumberland', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'NEUTRAL', 'animacy': 'INANIMATE', 'startIndex': 21, 'endIndex': 22, 'headIndex': 21, 'sentNum': 1, 'position': [1, 10], 'isRepresentativeMention': False}],
 '4': [{'id': 4, 'text': 'Maud Herbert', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'FEMALE', 'animacy': 'ANIMATE', 'startIndex': 16, 'endIndex': 18, 'headIndex': 17, 'sentNum': 1, 'position': [1, 9], 'isRepresentativeMention': True}],
 '6': [{'id': 6, 'text': 'His maternal great-grandfather', 'type': 'NOMINAL', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 1, 'endIndex': 4, 'headIndex': 3, 'sentNum': 1, 'position': [1, 1], 'isRepresentativeMention': False}, {'id': 8, 'text': 'Henry Percy , 4th Earl of Northumberland , whose wife was Maud Herbert , Countess of Northumberland', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 5, 'endIndex': 22, 'headIndex': 9, 'sentNum': 1, 'position': [1, 3], 'isRepresentativeMention': True}, {'id': 13, 'text': 'His', 'type': 'PRONOMINAL', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 1, 'endIndex': 2, 'headIndex': 1, 'sentNum': 2, 'position': [2, 2], 'isRepresentativeMention': False}],
 '7': [{'id': 7, 'text': 'His', 'type': 'PRONOMINAL', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 1, 'endIndex': 2, 'headIndex': 1, 'sentNum': 1, 'position': [1, 2], 'isRepresentativeMention': True}],
 '9': [{'id': 9, 'text': 'Northumberland , whose wife was Maud Herbert , Countess of Northumberland', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'NEUTRAL', 'animacy': 'INANIMATE', 'startIndex': 11, 'endIndex': 22, 'headIndex': 11, 'sentNum': 1, 'position': [1, 7], 'isRepresentativeMention': True}],
 '10': [{'id': 10, 'text': 'Maud Herbert , Countess of Northumberland', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'FEMALE', 'animacy': 'ANIMATE', 'startIndex': 16, 'endIndex': 22, 'headIndex': 19, 'sentNum': 1, 'position': [1, 8], 'isRepresentativeMention': True}],
 '11': [{'id': 11, 'text': 'Robert Spencer', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 9, 'endIndex': 11, 'headIndex': 10, 'sentNum': 2, 'position': [2, 6], 'isRepresentativeMention': True}],
 '12': [{'id': 12, 'text': 'His maternal grandmother', 'type': 'NOMINAL', 'number': 'SINGULAR', 'gender': 'FEMALE', 'animacy': 'ANIMATE', 'startIndex': 1, 'endIndex': 4, 'headIndex': 3, 'sentNum': 2, 'position': [2, 1], 'isRepresentativeMention': True}, {'id': 14, 'text': 'a daughter of Sir Robert Spencer and Eleanor Beaufort', 'type': 'NOMINAL', 'number': 'SINGULAR', 'gender': 'FEMALE', 'animacy': 'ANIMATE', 'startIndex': 5, 'endIndex': 14, 'headIndex': 6, 'sentNum': 2, 'position': [2, 3], 'isRepresentativeMention': False}],
 '15': [{'id': 15, 'text': 'Sir Robert Spencer and Eleanor Beaufort', 'type': 'LIST', 'number': 'PLURAL', 'gender': 'UNKNOWN', 'animacy': 'ANIMATE', 'startIndex': 8, 'endIndex': 14, 'headIndex': 13, 'sentNum': 2, 'position': [2, 4], 'isRepresentativeMention': True}],
 '16': [{'id': 16, 'text': 'Sir', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'INANIMATE', 'startIndex': 8, 'endIndex': 9, 'headIndex': 8, 'sentNum': 2, 'position': [2, 5], 'isRepresentativeMention': True}],
 '17': [{'id': 17, 'text': 'Eleanor', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'FEMALE', 'animacy': 'ANIMATE', 'startIndex': 1, 'endIndex': 2, 'headIndex': 1, 'sentNum': 3, 'position': [3, 1], 'isRepresentativeMention': True}, {'id': 21, 'text': 'a daughter of Edmund Beaufort , 2nd Duke of Somerset and Eleanor Beauchamp', 'type': 'NOMINAL', 'number': 'SINGULAR', 'gender': 'FEMALE', 'animacy': 'ANIMATE', 'startIndex': 3, 'endIndex': 16, 'headIndex': 4, 'sentNum': 3, 'position': [3, 2], 'isRepresentativeMention': False}, {'id': 27, 'text': 'She', 'type': 'PRONOMINAL', 'number': 'SINGULAR', 'gender': 'FEMALE', 'animacy': 'ANIMATE', 'startIndex': 1, 'endIndex': 2, 'headIndex': 1, 'sentNum': 4, 'position': [4, 1], 'isRepresentativeMention': False}, {'id': 28, 'text': 'a granddaughter of Richard de Beauchamp , 13th Earl of Warwick and Elizabeth Berkeley', 'type': 'NOMINAL', 'number': 'SINGULAR', 'gender': 'FEMALE', 'animacy': 'ANIMATE', 'startIndex': 3, 'endIndex': 17, 'headIndex': 4, 'sentNum': 4, 'position': [4, 2], 'isRepresentativeMention': False}],
 '18': [{'id': 18, 'text': 'Edmund Beaufort', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 6, 'endIndex': 8, 'headIndex': 7, 'sentNum': 3, 'position': [3, 4], 'isRepresentativeMention': True}],
 '19': [{'id': 19, 'text': '2nd', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'UNKNOWN', 'animacy': 'UNKNOWN', 'startIndex': 9, 'endIndex': 10, 'headIndex': 9, 'sentNum': 3, 'position': [3, 5], 'isRepresentativeMention': True}],
 '20': [{'id': 20, 'text': 'Somerset', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'NEUTRAL', 'animacy': 'INANIMATE', 'startIndex': 12, 'endIndex': 13, 'headIndex': 12, 'sentNum': 3, 'position': [3, 7], 'isRepresentativeMention': True}],
 '22': [{'id': 22, 'text': 'Edmund Beaufort , 2nd Duke of Somerset and Eleanor Beauchamp', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'NEUTRAL', 'animacy': 'ANIMATE', 'startIndex': 6, 'endIndex': 16, 'headIndex': 10, 'sentNum': 3, 'position': [3, 3], 'isRepresentativeMention': True}],
 '23': [{'id': 23, 'text': 'Somerset and Eleanor Beauchamp', 'type': 'LIST', 'number': 'PLURAL', 'gender': 'UNKNOWN', 'animacy': 'ANIMATE', 'startIndex': 12, 'endIndex': 16, 'headIndex': 15, 'sentNum': 3, 'position': [3, 6], 'isRepresentativeMention': True}],
 '24': [{'id': 24, 'text': 'Richard de Beauchamp', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 6, 'endIndex': 9, 'headIndex': 8, 'sentNum': 4, 'position': [4, 3], 'isRepresentativeMention': True}],
 '25': [{'id': 25, 'text': '13th', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'UNKNOWN', 'animacy': 'UNKNOWN', 'startIndex': 10, 'endIndex': 11, 'headIndex': 10, 'sentNum': 4, 'position': [4, 6], 'isRepresentativeMention': True}],
 '26': [{'id': 26, 'text': 'Warwick', 'type': 'PROPER', 'number': 'UNKNOWN', 'gender': 'UNKNOWN', 'animacy': 'INANIMATE', 'startIndex': 13, 'endIndex': 14, 'headIndex': 13, 'sentNum': 4, 'position': [4, 8], 'isRepresentativeMention': True}],
 '29': [{'id': 29, 'text': 'Richard de Beauchamp , 13th Earl of Warwick and Elizabeth Berkeley', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 6, 'endIndex': 17, 'headIndex': 8, 'sentNum': 4, 'position': [4, 4], 'isRepresentativeMention': True}],
 '30': [{'id': 30, 'text': '13th Earl of Warwick and Elizabeth Berkeley', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 10, 'endIndex': 17, 'headIndex': 11, 'sentNum': 4, 'position': [4, 5], 'isRepresentativeMention': True}],
 '31': [{'id': 31, 'text': 'Warwick and Elizabeth Berkeley', 'type': 'LIST', 'number': 'PLURAL', 'gender': 'UNKNOWN', 'animacy': 'ANIMATE', 'startIndex': 13, 'endIndex': 17, 'headIndex': 16, 'sentNum': 4, 'position': [4, 7], 'isRepresentativeMention': True}]}

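The lists represent mention clusters: each key identifies one cluster (coreference chain), and each entry in its list is a distinct mention. So yes, len(json_output['corefs']) is the number of clusters, but most of the clusters in your output are singletons containing a single mention. In your output, each key also equals the id of one of the mentions in its cluster (cluster '6' holds mentions 6, 8 and 13), and the numbers missing from the key sequence (5, 8, 13, 14, 21, 27, 28) are the ids of mentions that ended up in another cluster. I would not expect even current state-of-the-art coreference systems to perform well on your example. I would suggest running on a simpler example like "Joe Smith ate his lunch.", which should hopefully show a link between the two mentions.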
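To read the clusters more easily, each one can be printed as its representative mention followed by the other mentions linked to it. A minimal sketch: `describe_chains` is a hypothetical helper, and it relies on the `isRepresentativeMention` flag, which in the outputs shown here is set on exactly one mention per cluster.

```python
def describe_chains(corefs, include_singletons=False):
    """Hypothetical helper: (key, representative text, other texts) per cluster."""
    rows = []
    for key in sorted(corefs, key=int):  # keys are strings such as '17'
        mentions = corefs[key]
        if len(mentions) < 2 and not include_singletons:
            continue  # a singleton cluster links nothing
        # Fall back to the first mention in case no mention carries the flag.
        rep = next((m for m in mentions if m["isRepresentativeMention"]), mentions[0])
        others = [m["text"] for m in mentions if m is not rep]
        rows.append((key, rep["text"], others))
    return rows
```

On the question's output, cluster '17' would come back with 'Eleanor' as its representative and 'She' among its other mentions.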

EDIT: I just ran this example and got this JSON (showing a link between "Joe Smith" and "his"):

{'1': [{'id': 1, 'text': 'Joe Smith', 'type': 'PROPER', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 1, 'endIndex': 3, 'headIndex': 2, 'sentNum': 1, 'position': [1, 1], 'isRepresentativeMention': True}, {'id': 3, 'text': 'his', 'type': 'PRONOMINAL', 'number': 'SINGULAR', 'gender': 'MALE', 'animacy': 'ANIMATE', 'startIndex': 4, 'endIndex': 5, 'headIndex': 4, 'sentNum': 1, 'position': [1, 3], 'isRepresentativeMention': False}], '2': [{'id': 2, 'text': 'his lunch', 'type': 'NOMINAL', 'number': 'SINGULAR', 'gender': 'UNKNOWN', 'animacy': 'INANIMATE', 'startIndex': 4, 'endIndex': 6, 'headIndex': 5, 'sentNum': 1, 'position': [1, 2], 'isRepresentativeMention': True}]}
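This output also shows how the offsets line up: "Joe Smith" has startIndex 1 and endIndex 3, and "his" has 4 and 5, so startIndex/endIndex appear to be 1-based token positions within the sentence, with an exclusive end. The sketch below uses that reading to substitute each pronoun with its cluster's representative mention. Both function names are hypothetical, and the tokens are passed in as a plain list of strings.

```python
def pronoun_antecedents(corefs):
    """Hypothetical helper: map (sentNum, startIndex, endIndex) of each
    pronoun to the text of its cluster's representative mention."""
    resolved = {}
    for mentions in corefs.values():
        rep = next((m for m in mentions if m["isRepresentativeMention"]), mentions[0])
        for m in mentions:
            if m["type"] == "PRONOMINAL" and m is not rep:
                resolved[(m["sentNum"], m["startIndex"], m["endIndex"])] = rep["text"]
    return resolved


def substitute(tokens, sent_num, resolved):
    """Hypothetical helper: rebuild one sentence with resolved pronouns replaced.

    tokens[i] is assumed to be token number i + 1 (1-based, as in the JSON).
    """
    spans = {start: (end, text) for (s, start, end), text in resolved.items()
             if s == sent_num}
    out, i = [], 1
    while i <= len(tokens):
        if i in spans:
            end, text = spans[i]
            out.append(text)
            i = end  # endIndex is exclusive, so resume at the next token
        else:
            out.append(tokens[i - 1])
            i += 1
    return " ".join(out)
```

With the tokens of "Joe Smith ate his lunch." this yields "Joe Smith ate Joe Smith lunch .": possessives are not adjusted. In the full JSON output, the tokens of sentence n should be in json_output['sentences'][n - 1]['tokens'], each with a 'word' field.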
