Parsing CoNLL-U for parents and grandparents

I'm new to NLP and have a task to create a table of the form:

grandparent:parent:child

For example, for this text in CoNLL-U format:

1 В в ADP _ _ 3 case 3:case _
2 советский советский ADJ _ Animacy=Inan|Case=Acc|Degree=Pos|Gender=Masc|Number=Sing 3 amod 3:amod _
3 период период NOUN _ Animacy=Inan|Case=Acc|Gender=Masc|Number=Sing 11 obl 11:obl _
4 времени время NOUN _ Animacy=Inan|Case=Gen|Gender=Neut|Number=Sing 3 nmod 3:nmod _
5 число число NOUN _ Animacy=Inan|Case=Acc|Gender=Neut|Number=Sing 11 obj 11:obj _
6 ИТ ит PROPN _ Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing 8 compound 8:compound SpaceAfter=No
7 - - PUNCT _ _ 6 punct 6:punct _
8 специалистов специалист NOUN _ Animacy=Anim|Case=Gen|Gender=Masc|Number=Plur 5 nmod 5:nmod _
9 в в ADP _ _ 10 case 10:case _
10 Армении армения PROPN _ Animacy=Inan|Case=Loc|Gender=Fem|Number=Sing 5 nmod 5:nmod _
11 составляло составлять VERB _ Aspect=Imp|Gender=Neut|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act 0 root 0:root _
12 около около ADP _ _ 14 case 14:case _
13 десяти десять NUM _ Case=Gen 14 nummod 14:nummod _
14 тысяч тысяча NOUN _ Animacy=Inan|Case=Gen|Gender=Fem|Number=Plur 11 nsubj 11:nsubj SpaceAfter=No
15 . . PUNCT _ _ 14 punct 14:punct _

The output should be:

0;11;5
0;11;14
11;5;8
11;5;10
11;14;12
11;14;13
11;14;15
5;8;6
8;6;7

Is there a way or an algorithm to extract these triples from such text automatically? This is what I have so far:

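from conllu import parse

with open('conllu_text', 'r', encoding='utf-8') as txt:
    data = txt.read()

# Parse the file once and take the first sentence.
sentence = parse(data)[0]

# Map each token id to its head id (a head of 0 marks the root).
# Multiword tokens and empty nodes have non-integer ids and are skipped.
heads = {tok['id']: tok['head'] for tok in sentence if isinstance(tok['id'], int)}

# Build (grandparent, parent, child) triples. The root token is skipped:
# its head is 0, which is not a real token and has no head of its own.
result = [(heads[parent], parent, child)
          for child, parent in heads.items() if parent != 0]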
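For comparison, here is a minimal sketch of the same idea using only the standard library, so it can be tried without installing anything. It assumes a single sentence, basic dependencies only (integer HEAD values in the 7th column), and columns separated by tabs or, as in the listing above, by plain whitespace. The function name `grandparent_triples` and the three-token sample sentence are made up for illustration and are not part of any library.

```python
def grandparent_triples(conllu_text):
    """Return (grandparent, parent, child) id triples for one sentence."""
    heads = {}
    for line in conllu_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# text = ...".
        if not line or line.startswith('#'):
            continue
        # Prefer tabs (the CoNLL-U separator); fall back to whitespace.
        cols = line.split('\t') if '\t' in line else line.split()
        # Skip multiword tokens ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        heads[int(cols[0])] = int(cols[6])  # column 1 = ID, column 7 = HEAD
    # The root (head 0) has no grandparent, so it produces no triple.
    return [(heads[parent], parent, child)
            for child, parent in heads.items() if parent != 0]


# Hypothetical sample: "The dog barks" with barks as root.
sample = "\n".join([
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_",
])

for triple in grandparent_triples(sample):
    print(';'.join(map(str, triple)))
```

Note that this emits one row for every non-root token. Run on the sentence in the question, it would therefore also produce rows such as 0;11;3 and 11;3;1, which the expected output above does not list.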
