
Converting pandas dataframe to CoNLL

I have a processed dataframe that is used as input to train an NLP model:

 sentence_id    words   labels
0   0            a      B-ORG
1   0            b      I-ORG
2   0            c      I-ORG
5   1            d      B-ORG
6   1            e      I-ORG
7   2            f      B-PER
8   2            g      I-PER

I need to convert this into CoNLL text format, as below:

a B-ORG
b I-ORG
c I-ORG

d B-ORG
e I-ORG

f B-PER
g I-PER

The CoNLL format is a text file with one word per line, with sentences separated by an empty line. The first token on a line should be the word and the last token should be the label.

Does anyone have any idea how to do that?

First join both columns with a space, then group with DataFrame.groupby, add a final empty line after each group, and write the result to a file:

import pandas as pd

# Join word and label into one "word label" string per row
df['join'] = df['words'] + ' ' + df['labels']
# alternative
# df['join'] = df['words'].str.cat(df['labels'], sep=' ')

with open('file.txt', 'w', encoding='utf-8') as f:
    for _, g in df.groupby('sentence_id')['join']:
        # One token per line, then an empty line to end the sentence
        f.write('\n'.join(g) + '\n\n')
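
For a quick self-contained check, the same idea can be sketched as a small function that returns the CoNLL text as a string. The function name `df_to_conll`, its parameters, and the way the sample dataframe is built here are illustrative assumptions, not part of the original answer. The sketch also assumes the rows are already in token order within each sentence.

```python
import pandas as pd


def df_to_conll(df, sent_col="sentence_id", word_col="words", label_col="labels"):
    """Return CoNLL-style text: one 'word label' line per token,
    with an empty line between sentences (hypothetical helper)."""
    blocks = []
    # sort=False keeps sentences in order of first appearance
    for _, group in df.groupby(sent_col, sort=False):
        lines = group[word_col].astype(str) + " " + group[label_col].astype(str)
        blocks.append("\n".join(lines))
    # Sentences are separated by one empty line; the text ends with a newline
    return "\n\n".join(blocks) + "\n"


# Sample data mirroring the dataframe in the question
df = pd.DataFrame({
    "sentence_id": [0, 0, 0, 1, 1, 2, 2],
    "words": list("abcdefg"),
    "labels": ["B-ORG", "I-ORG", "I-ORG", "B-ORG", "I-ORG", "B-PER", "I-PER"],
})

conll_text = df_to_conll(df)
print(conll_text)
```

The returned string can then be written with an ordinary file handle, for example `with open('file.txt', 'w', encoding='utf-8') as f: f.write(conll_text)`. Building the text in memory first makes it easy to inspect before anything touches disk.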
