
Splitting lines in Python with sentences and labels

I have a sample of a file in which each line holds a sentence followed by a label. How can it be split into a list of sentences and a list of labels?

A very, very, very slow-moving, aimless movie about a distressed, drifting young man. 0

Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out. 0

Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. 0

Very little music or anything to speak of. 0

Output
List of sentences:
['A very, very, very slow-moving, aimless movie about a distressed, drifting young man','Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out']

Corresponding labels:
['0','0']

Assuming that the number after the last "." (dot) on each line is the label:

For the given example, stored in a file 'yourdata.txt', the following code should produce two lists, sentence_list and label_list. You can then write the data in these lists to separate files, as you requested.

sentence_list = []
label_list = []
# Open the file in a context manager so it is closed automatically.
with open('yourdata.txt', 'r') as fmov:
    for line in fmov:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        # Split on the last period only: the text before it is the
        # sentence, the text after it is the label.
        sentence, _, label = line.rpartition('.')
        sentence_list.append(sentence)
        label_list.append(label.strip())
print(sentence_list)
print(label_list)

OUT:
['A very, very, very slow-moving, aimless movie about a distressed, drifting young man', 'Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out', 'Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent', 'Very little music or anything to speak of']
['0', '0', '0', '0']
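
Writing the two lists to separate files, as mentioned in this answer, could look roughly like the sketch below. The helper name `write_lines` and the output file names `sentences.txt` and `labels.txt` are assumptions made for illustration; they do not come from the question or the answer.

```python
def write_lines(path, items):
    """Write each item on its own line (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as out:
        for item in items:
            out.write(item + '\n')

# Small example lists in the same shape the code above produces.
sentence_list = ['Very little music or anything to speak of']
label_list = ['0']

write_lines('sentences.txt', sentence_list)  # assumed output file name
write_lines('labels.txt', label_list)        # assumed output file name
```

Because both files are written in the same order, line N of `labels.txt` stays the label for line N of `sentences.txt`.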

Is '0' the label? If each line holds only one sentence, you can call string.split('.') to split it, using the period as the delimiter. This can give wrong results if a sentence contains an abbreviation such as 'Mr.' or 'Mrs.', so you might need to add some if statements to handle those cases.
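
One possible way around the abbreviation problem is to split on the trailing label rather than on a period. The sketch below assumes every line ends with whitespace followed by an integer label; the function name `split_line`, the regular expression, and the sample lines are illustrative assumptions, not part of the answers above.

```python
import re

# Sentence text, then whitespace, then a trailing integer label.
# Assumption: the label is always the last whitespace-separated token.
LINE_RE = re.compile(r'^(?P<sentence>.*\S)\s+(?P<label>\d+)\s*$')

def split_line(line):
    """Return (sentence, label), or None if no trailing label is found."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group('sentence'), match.group('label')

# Made-up sample lines; the first contains an abbreviation with a period.
samples = [
    "Mr. Smith walked out. 0\n",
    "Very little music or anything to speak of.\t0\n",
]
pairs = [split_line(s) for s in samples]
sentence_list = [p[0] for p in pairs if p is not None]
label_list = [p[1] for p in pairs if p is not None]
print(sentence_list)
print(label_list)
```

Unlike splitting on the last period, this keeps the sentence's final punctuation and also handles sentences that end in "!" or "?". Lines without a trailing number return None and are skipped here.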
