
Save NLTK tagger output to a CSV file

I'm trying to analyze a text to find all the words tagged 'NN' and 'NNP'. So far the code works well, but when I save the output to a CSV file I haven't been able to get the format I want, which is one row per word with three columns: Word, Tag, Question Analyzed.

This is the code:

import csv
import nltk

training_set = []

text = 'I want to analyze this text'
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
# Keep only the words tagged as nouns (NN) or proper nouns (NNP).
result = [(word, tag) for word, tag in tagged if tag in ('NN', 'NNP')]

for i in result:
    training_set.append(i)
    training_set.append([text])
    print(training_set)

listFile2 = open('sample.csv', 'w', newline='')
writer2 = csv.writer(listFile2, quoting=csv.QUOTE_ALL, lineterminator='\n', delimiter=',')
for item in training_set:
    writer2.writerow(item)
listFile2.close()

The outcome is the following:

[image not preserved]

Any idea how I can keep all the information on the same line, like this:

[image not preserved]

I have changed the code to use two lists and then zip() to write both to the CSV file. This seems to work; however, everything ends up enclosed in "" and ().

import csv
import nltk

training_set = []
question = []

text = 'I want to analyze this text'
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
result = [(word, tag) for word, tag in tagged if tag in ('NN', 'NNP')]
for i in result:
    training_set.append(i)
    question.append([text])

listFile2 = open('sample.csv', 'w', newline='')
writer2 = csv.writer(listFile2, quoting=csv.QUOTE_ALL, lineterminator='\n', delimiter=',')
# Each item is a ((word, tag), [text]) pair, so the row has two nested cells.
for item in zip(training_set, question):
    writer2.writerow(item)
listFile2.close()

Result:

[image not preserved]
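
A likely explanation for the quotes and parentheses, sketched here without NLTK: `csv.writer.writerow` converts every non-string item with `str()`, so a row made of a tuple and a list is written as two cells holding their Python representations, and `csv.QUOTE_ALL` then wraps each cell in double quotes. The `row` value below is a hand-written stand-in for one item produced by `zip(training_set, question)`, not real tagger output.

```python
import csv
import io

# Hand-written stand-in for one item from zip(training_set, question).
row = (('text', 'NN'), ['I want to analyze this text'])

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writerow(row)  # the tuple and the list are each converted with str()

nested_output = buffer.getvalue()
print(nested_output)
```

The row therefore has two cells instead of three, which suggests flattening each row into plain strings before writing it.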

You can try something like this to get your data into the desired format before writing it to the CSV file:

[tag + (text,) for tag in result]

Output:

[('text', 'NN', 'I want to analyze this text')]

It will essentially give you a list of tuples in the format you need, which you can then write to your CSV file.
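
Putting this together with the writing step, a minimal sketch might look like the following. To keep it runnable without NLTK and its data files, `tagged` is a hard-coded stand-in for the output of `nltk.pos_tag(nltk.word_tokenize(text))`; the actual tags may differ. The header row is an assumption based on the column names mentioned in the question and can be dropped.

```python
import csv

text = 'I want to analyze this text'

# Assumed stand-in for nltk.pos_tag(nltk.word_tokenize(text)); real tags may differ.
tagged = [('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'),
          ('analyze', 'VB'), ('this', 'DT'), ('text', 'NN')]

result = [(word, tag) for word, tag in tagged if tag in ('NN', 'NNP')]

# Append the analyzed text to each (word, tag) tuple: one flat row per noun.
rows = [pair + (text,) for pair in result]

with open('sample.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(['Word', 'Tag', 'Question Analyzed'])  # optional header row
    writer.writerows(rows)
```

Because every row is now a flat tuple of three strings, each value lands in its own column. The double quotes around each field come from `csv.QUOTE_ALL`; leaving that argument out makes the writer quote only fields that need it.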
