Splitting lines in Python with sentences and labels
I have a sample of a file with sentences and labels. How can it be split into sentences and labels?
A very, very, very slow-moving, aimless movie about a distressed, drifting young man. 0
Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out. 0
Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. 0
Very little music or anything to speak of. 0
Desired output
List of sentences:
['A very, very, very slow-moving, aimless movie about a distressed, drifting young man', 'Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out']
Corresponding labels:
['0', '0']
Assuming that the number after the last "." (dot) is the label: for the given example stored in a file 'yourdata.txt', the following code should produce two lists, sentence_list and label_list. You can then write the data in these lists to separate files, as you requested.
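sentence_list = []
label_list = []
# Open the file in a with-block so it is closed automatically
with open('yourdata.txt', 'r') as fmov:
    for line in fmov:
        # Split on every period; the last piece holds the label
        lineinfo = line.split('.')
        # Re-join everything before the last period to rebuild the sentence
        sentence_list.append('.'.join(lineinfo[:-1]))
        # Remove the newline from the label (leading whitespace is kept)
        label_list.append(lineinfo[-1].replace('\n', ''))
print(sentence_list)
print(label_list)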
OUT:
['A very, very, very slow-moving, aimless movie about a distressed, drifting young man', 'Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out', 'Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent', 'Very little music or anything to speak of']
[' 0', ' 0', ' 0', ' 0']
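The labels printed above keep a leading space because only the newline is removed. Here is a minimal sketch of a variant, assuming every data line ends with a period followed by the label: str.rpartition('.') splits once at the last period, and strip() cleans up the label. The names split_lines and write_list are hypothetical helpers introduced for illustration, not part of the original answer.

```python
def split_lines(lines):
    """Split each 'sentence. label' line at its last period."""
    sentences, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sentence, sep, label = line.rpartition('.')
        if not sep:
            continue  # no period: line does not match the assumed format
        sentences.append(sentence)
        labels.append(label.strip())
    return sentences, labels


def write_list(path, items):
    """Write one item per line to the given file (hypothetical helper)."""
    with open(path, 'w') as out:
        for item in items:
            out.write(item + '\n')


# Two lines from the question's sample, as they would be read from a file
sample = [
    "Very little music or anything to speak of. 0\n",
    "Not sure who was more lost - the flat characters or the audience, "
    "nearly half of whom walked out. 0\n",
]
sentences, labels = split_lines(sample)
print(sentences)
print(labels)
```

Since split_lines accepts any iterable of lines, an open file object can be passed to it directly, and write_list can then save each list to its own file (the output file names are up to you).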
Is '0' the label? If it's only one sentence, you can do a string.split('.') using a period as the delimiter. This might cause errors, though, if a sentence contains something like 'Mr.' or 'Mrs.', so you might need to add some if statements to handle those.
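One possible way to avoid special-casing abbreviations is to stop splitting on periods altogether. The sketch below assumes the label is always a run of digits at the very end of the line, separated from the sentence by whitespace, and matches it with a regular expression anchored at the end. The names LINE_PATTERN and parse_line are hypothetical, chosen for illustration.

```python
import re

# Sentence text, then whitespace, then a digits-only label at the end of the line
LINE_PATTERN = re.compile(r'^(.*\S)\s+(\d+)\s*$')


def parse_line(line):
    """Return (sentence, label), or None if there is no trailing numeric label."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    sentence, label = match.groups()
    # Drop the final period so the result matches the format asked for in the question
    if sentence.endswith('.'):
        sentence = sentence[:-1]
    return sentence, label


print(parse_line("I saw it with Mr. Smith and Mrs. Jones. 0\n"))
# prints ('I saw it with Mr. Smith and Mrs. Jones', '0')
```

Because the pattern only looks at the end of the line, periods inside the sentence never affect where the split happens.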