How to save tokenization result for each file in a new separate text file?
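I have a folder with several files that I want to open and read. From each file I extract some Persian words, then join each group of words into a sentence. Finally, I want to save each sentence to a separate .txt file. The problem is that the last sentence ends up saved in all of the files. How can I fix this?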
import os
import codecs
###opening the files from a folder in a directory
matches=[]
for root, dirs, files in os.walk("C:\\Users\\Maryam\\Desktop\\New Folder"):
    for file in files:
        if file.endswith(".pts"):
            matches.append(os.path.join(root, file))
print(matches)
print(len(matches))
###reading files
for f in matches:
    with codecs.open(f, "r", "utf-8") as fp:
        text=fp.read().split('\n')
        #print(text)
        #print (len(text))
    ###splitting each line into words
    for line in text:
        line_list=line.split()
        #print (line_list)
        ###extracting the Persian words and removing the parentheses
        list_persian_letters=['ا','آ', 'ب','پ','ت','ث','ج','چ','ح','خ','د','ذ','ر','ز','ژ','س','ش','ص','ض','ط','ظ','ع','غ','ف','ق','ک','گ','ل','م','ن','و','ه','ی','.','؟','،',':','!']
        output_words = [word for word in line_list if (word[0] in list_persian_letters)]
        output=[s.replace(')', '') for s in output_words]
        #print (output)
        ###joining the words as a sentence
        sentence=' '.join(output)
        ###saving each sentence in a separate file
        for i in range(1,16):
            with codecs.open ("F:\\New folder\\output%i.txt" %i, "w","utf-8") as text_file:
                text_file.writelines(sentence)
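On every iteration of the line loop, the inner for i in range(1,16) loop overwrites all 15 files, so you only see the result of the last iteration.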
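A minimal reproduction of the problem may make this clearer. It is a sketch that assumes three hypothetical sentences in place of the Persian extraction and a temporary folder in place of F:\\New folder. Because the numbered-file loop runs once per sentence, every sentence is written to every file, and opening a file in "w" mode truncates it first, so only the last sentence survives.

```python
import os
import tempfile

# Hypothetical stand-ins for the sentences built in the question's line loop.
sentences = ["first sentence", "second sentence", "third sentence"]

# Temporary folder standing in for F:\New folder (assumption for illustration).
out_dir = tempfile.mkdtemp()

for sentence in sentences:
    # Buggy pattern from the question: for each sentence, (re)write ALL files.
    for i in range(1, 4):
        path = os.path.join(out_dir, "output%i.txt" % i)
        # "w" mode truncates the file, discarding whatever was written before.
        with open(path, "w", encoding="utf-8") as text_file:
            text_file.write(sentence)

for i in range(1, 4):
    with open(os.path.join(out_dir, "output%i.txt" % i), encoding="utf-8") as fh:
        print(fh.read())  # prints "third sentence" for every file
```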
Change the outer loop to:
for i, f in enumerate(matches):
and the inner loop to:
for j, line in enumerate(text):
Then get rid of the 1..16 loop:
for i in range(1,16):
and change the open call to:
with codecs.open ("F:\\New folder\\output%i_%i.txt" % (i,j), "w","utf-8") as text_file:
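Each output file name now combines the index of the input file (i) and the line number (j), so no two sentences share a file. I hope you can adapt the code to get what you need.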
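Putting those changes together, a minimal sketch of the corrected script might look like the following. It is not the answerer's exact code: the logic is wrapped in two hypothetical helper functions (extract_sentence and save_sentences) so it can be run against any folder, the matched paths are sorted so the numbering is reproducible (os.walk does not guarantee an order), and the built-in open(..., encoding="utf-8") is used in place of codecs.open. The output folder is assumed to already exist.

```python
import os

# Same character list as the question: Persian letters plus a few punctuation marks.
PERSIAN_CHARS = set(['ا','آ', 'ب','پ','ت','ث','ج','چ','ح','خ','د','ذ','ر','ز','ژ','س','ش','ص','ض','ط','ظ','ع','غ','ف','ق','ک','گ','ل','م','ن','و','ه','ی','.','؟','،',':','!'])


def extract_sentence(line):
    """Keep words whose first character is in PERSIAN_CHARS, strip ')' and join them."""
    words = [w for w in line.split() if w[0] in PERSIAN_CHARS]
    return " ".join(w.replace(")", "") for w in words)


def save_sentences(input_dir, output_dir, ext=".pts"):
    """Write one output file per (input file, line) pair and return the paths written."""
    matches = []
    for root, dirs, files in os.walk(input_dir):
        for name in files:
            if name.endswith(ext):
                matches.append(os.path.join(root, name))
    matches.sort()  # deterministic numbering across runs

    written = []
    for i, f in enumerate(matches):
        with open(f, "r", encoding="utf-8") as fp:
            text = fp.read().split("\n")
        for j, line in enumerate(text):
            sentence = extract_sentence(line)
            # i identifies the input file and j the line, so every sentence gets its own file.
            path = os.path.join(output_dir, "output%i_%i.txt" % (i, j))
            with open(path, "w", encoding="utf-8") as text_file:
                text_file.write(sentence)
            written.append(path)
    return written
```

With the question's folders this would be called as save_sentences("C:\\Users\\Maryam\\Desktop\\New Folder", "F:\\New folder"). Like the original, it also writes a file for an empty line (for example a trailing newline at the end of an input file); add a check on sentence before writing if those files are unwanted.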
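Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.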