
Remove some words and replace some other words in a txt file

I have a txt file (myText.txt) containing many lines of text.

I would like to know:

  • How to create a list of words that need to be deleted (I want to set up the words myself)
  • How to create a list of words that need to be replaced

For instance, if myText.txt is:

    The ancient Romans influenced countries and civilizations in the following centuries.
    Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.

  • I would like to remove "the", "and", "in"
  • I would like to replace "ancient" by "old"
  • I would like to replace "month" and "centuries" by "years"

You could always use a regex:

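import re

st = '''\
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.'''

# Words to delete, and a mapping from each word to its replacement.
deletions = ('and', 'in', 'the')
repl = {"ancient": "old", "month": "years", "centuries": "years"}

# Build one alternation that matches any deletion word as a whole word.
# \b is a word boundary, so "in" does not match inside "Latin".
# Matching is case-sensitive, so the capitalized "The" is left alone.
tgt = '|'.join(r'\b{}\b'.format(re.escape(e)) for e in deletions)
st = re.sub(tgt, '', st)

# Replace each remaining word, again matching whole words only.
for word in repl:
    tgt = r'\b{}\b'.format(re.escape(word))
    st = re.sub(tgt, repl[word], st)

print(st)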
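If you also want to catch capitalized words such as "The" and tidy up the spaces that deletions leave behind, here is a minimal sketch of one way to do both in a single pass. The helper name clean_text and the shortened sample sentence are my own, not from the question, and the sketch assumes the replacement keys are written in lowercase.

```python
import re

def clean_text(text, deletions, replacements):
    """Delete and replace whole words in one regex pass (illustrative helper)."""
    words = list(deletions) + list(replacements)
    # re.escape guards against words that contain regex metacharacters.
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b',
        re.IGNORECASE,
    )

    def substitute(match):
        # Words with a replacement map to it; all other matches are deleted.
        return replacements.get(match.group(0).lower(), '')

    result = pattern.sub(substitute, text)
    # Collapse the runs of spaces left behind by deleted words.
    return re.sub(r' {2,}', ' ', result).strip()

sample = "The ancient Romans stayed in Roma for 3 month."
print(clean_text(sample, ('and', 'in', 'the'),
                 {"ancient": "old", "month": "years", "centuries": "years"}))
```

Note that the replacement text is inserted as-is, so a capitalized "Ancient" would become a lowercase "old"; preserving case would need extra logic.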

This should do the trick. You use a list to store the words you want to delete, then loop through the list and remove each one from the contents string. Then you use a dictionary that maps each word you have now to the word you want to replace it with, loop over it, and replace each key with its value.

def replace():
    # Each entry includes its trailing space, so the space goes with the word.
    deleteWords = ["the ", "and ", "in "]
    replaceWords = {"ancient": "old", "month": "years", "centuries": "years"}

    with open("myText.txt") as f:
        contents = f.read()

    for word in deleteWords:
        contents = contents.replace(word, "")

    # items() works on Python 2 and 3; iteritems() exists only on Python 2.
    for key, value in replaceWords.items():
        contents = contents.replace(key, value)
    return contents
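One caveat with plain str.replace: it matches substrings, so "and " also matches the end of a longer word such as "Roland ". The small comparison below is only a sketch; the sample sentence and the names naive and bounded are made up for illustration.

```python
import re

text = "Roland and Martin stayed in the inn."

# Plain substring replacement also hits the ends of longer words.
naive = text
for word in ["the ", "and ", "in "]:
    naive = naive.replace(word, "")

# Word-boundary matching only removes standalone words
# (plus one following space, if there is one).
pattern = re.compile(r'\b(?:the|and|in)\b ?')
bounded = pattern.sub('', text)

print(naive)    # RolMartstayed inn.
print(bounded)  # Roland Martin stayed inn.
```

If your word list can collide with longer words like this, the word-boundary version is probably the safer choice.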

Use a list for deletion and a dictionary for replacement. It should look something like this:

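def processTextFile(filename_in, filename_out, delWords, repWords):
    # Open both files once; "w" overwrites any output from a previous run.
    with open(filename_in, "r") as sourcefile, open(filename_out, "w") as outfile:
        for line in sourcefile:
            # Remove every word in the deletion list from this line.
            for item in delWords:
                line = line.replace(item, "")
            # Swap each key for its replacement value.
            for key, value in repWords.items():
                line = line.replace(key, value)
            outfile.write(line)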



if __name__ == "__main__":
    delWords = []
    repWords = {}

    delWords.extend(["the ", "and ", "in "])
    repWords["ancient"] = "old"
    repWords["month"] = "years"
    repWords["centuries"] = "years"

    processTextFile("myText.txt", "myOutText.txt", delWords, repWords)

Just a note: this is written for Python 3.3.2, which is why I am using items(). Use iteritems() if using Python 2.x, as I think it is more efficient there: Python 2's items() builds a full list of pairs first, while iteritems() iterates lazily. The difference only matters for large dictionaries.
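To illustrate the items() point for Python 3: items() returns a view object over the dictionary and does not copy the pairs into a list, so there is no iteritems() to switch to. A quick sketch, with rep_words and pairs as illustrative names:

```python
rep_words = {"ancient": "old", "month": "years"}

# In Python 3, items() returns a view: no list of pairs is built up front.
pairs = rep_words.items()

# The view reflects later changes to the dictionary.
rep_words["centuries"] = "years"

print(len(pairs))     # 3
print(sorted(pairs))
```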
