Remove some words and replace some other words in a txt file
I have a txt file (myText.txt) containing many lines of text.
I would like to know:
For instance, if myText.txt is:
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.
You could always use a regex:
import re

st = '''\
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.'''
deletions = ('and', 'in', 'the')
repl = {"ancient": "old", "month": "years", "centuries": "years"}

# Delete whole words only: \b keeps 'in' from matching inside 'Latin'.
tgt = '|'.join(r'\b{}\b'.format(re.escape(e)) for e in deletions)
st = re.sub(tgt, '', st)

# Replace whole words, one pattern per dictionary entry.
for word in repl:
    tgt = r'\b{}\b'.format(re.escape(word))
    st = re.sub(tgt, repl[word], st)

print(st)
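The regex above leaves a doubled space wherever a word was deleted, and it runs one substitution per replacement word. As a rough sketch of one way to tidy that up, the deletions and replacements can share a single lookup table and a single pass. The helper name `apply_edits` is made up for illustration, and matching stays case-sensitive, so the leading "The" is left alone.

```python
import re

def apply_edits(text, deletions, replacements):
    """Delete and replace whole words in one regex pass (illustrative helper)."""
    # Map every deleted word to the empty string so one table serves both jobs.
    table = dict.fromkeys(deletions, "")
    table.update(replacements)
    # \b on both sides restricts matches to whole words.
    pattern = re.compile(r"\b(?:{})\b".format("|".join(re.escape(w) for w in table)))
    result = pattern.sub(lambda m: table[m.group(0)], text)
    # Collapse the runs of spaces left behind by deleted words.
    return re.sub(r" {2,}", " ", result)

sample = "The ancient Romans influenced countries and civilizations in the following centuries."
edited = apply_edits(sample, ("and", "in", "the"),
                     {"ancient": "old", "month": "years", "centuries": "years"})
print(edited)  # The old Romans influenced countries civilizations following years.
```

Because every match is looked up in the table exactly once, a replacement word is never re-scanned and replaced a second time, which can happen when substitutions are chained.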
This should do the trick. You use a list to store the strings you want to delete, then loop through the list and remove each one from the contents string. Then you use a dictionary to map the words you have now to the words you want to replace them with, loop over it as well, and swap each current word for its replacement.
def replace():
    # Substrings to delete; the trailing space also removes the gap after the word.
    deleteWords = ["the ", "and ", "in "]
    replaceWords = {"ancient": "old", "month": "years", "centuries": "years"}
    with open("myText.txt") as f:
        contents = f.read()
    for word in deleteWords:
        contents = contents.replace(word, "")
    # items() works on Python 2 and 3; iteritems() is Python 2 only.
    for key, value in replaceWords.items():
        contents = contents.replace(key, value)
    return contents
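One thing to watch with plain substring replacement: "the " and "in " also occur at the end of longer words, so those get clipped too. The short comparison below is only a sketch on a made-up sentence (not the question's text) and shows the difference between `str.replace` and a whole-word regex.

```python
import re

delete_words = ["the", "and", "in"]
text = "Breathe in the Latin air"

# Plain substring removal: also hits "the " inside "Breathe "
# and "in " inside "Latin ".
naive = text
for word in delete_words:
    naive = naive.replace(word + " ", "")

# Whole-word removal: \b stops a match from starting or ending mid-word,
# and \s* swallows the whitespace that followed the deleted word.
pattern = re.compile(r"\b(?:{})\b\s*".format("|".join(map(re.escape, delete_words))))
careful = pattern.sub("", text)

print(naive)    # BreaLatair
print(careful)  # Breathe Latin air
```

Whether this matters depends on the input; for the two sample lines in the question, both approaches happen to delete the same words.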
Use a list for deletion and a dictionary for replacement. It should look something like this:
def processTextFile(filename_in, filename_out, delWords, repWords):
    # Open both files once; "w" starts the output fresh on every run.
    with open(filename_in, "r") as sourcefile, open(filename_out, "w") as outfile:
        for line in sourcefile:
            for item in delWords:
                line = line.replace(item, "")
            for key, value in repWords.items():
                line = line.replace(key, value)
            outfile.write(line)

if __name__ == "__main__":
    delWords = ["the ", "and ", "in "]
    repWords = {"ancient": "old", "month": "years", "centuries": "years"}
    processTextFile("myText.txt", "myOutText.txt", delWords, repWords)
Just a note, this is written for Python 3.3.2, which is why I am using items(). On Python 2.x you could use iteritems() instead: it iterates over the pairs without first building a list of them. The saving depends on the size of the dictionary rather than the size of the text file, so with only a few replacement words it is negligible.
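To try the line-by-line approach without creating myText.txt by hand, here is a minimal sketch that first writes the question's sample text into a temporary directory. The function name `process_file` and the whole-word regex are my own illustrative choices rather than part of the answer above, and matching is case-sensitive, so "The" and "Their" are untouched.

```python
import os
import re
import tempfile

def process_file(path_in, path_out, del_words, rep_words):
    """Stream path_in to path_out, deleting and replacing whole words (hypothetical helper)."""
    # Compile once; the optional trailing space removes the gap after a deleted word.
    delete = re.compile(r"\b(?:{})\b ?".format("|".join(map(re.escape, del_words))))
    with open(path_in) as src, open(path_out, "w") as dst:
        for line in src:
            line = delete.sub("", line)
            for old, new in rep_words.items():
                line = re.sub(r"\b{}\b".format(re.escape(old)), new, line)
            dst.write(line)

# Set up a throwaway input file holding the sample text from the question.
workdir = tempfile.mkdtemp()
path_in = os.path.join(workdir, "myText.txt")
path_out = os.path.join(workdir, "myOutText.txt")
with open(path_in, "w") as f:
    f.write("The ancient Romans influenced countries and civilizations in the following centuries.\n"
            "Their language, Latin, became the basis for many other European languages. "
            "They stayed in Roma for 3 month.\n")

process_file(path_in, path_out, ["the", "and", "in"],
             {"ancient": "old", "month": "years", "centuries": "years"})
with open(path_out) as f:
    output = f.read()
print(output)
```

Reading and writing one line at a time keeps memory use flat, so the same sketch should cope with files far larger than the two-line sample.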