python: replace words in file with words from other file

I have a large text file containing words I want to replace. I put those words in a CSV file, because I'm constantly adding and changing words and don't want to put them in the Python script itself. Each line holds a word I want to replace, followed by the word I want to replace it with. Like this:

A_old,A_new
another word,another new word
something old,something new
hello,bye

I know how to replace single words in files with Python's string replace function, but I don't know how to do this when the words are listed in a different file. I tried my best, but I can't wrap my head around how to work with dictionaries/lists/tuples. I am rather new to Python, and until now I managed with examples from around the internet, but this is beyond my capabilities. I got all kinds of errors like 'unhashable type: list' and 'expected a character buffer object'. The last thing I tried was the most successful in that I didn't get any errors, but then nothing happened either. This is the code. I'm sure it's ugly, but I hope it's not entirely hopeless.

reader = csv.reader(open('words.csv', 'r'))
d = {}
for row in reader:
    key, value = row
    d[key] = value

newwords = str(d.keys())
oldwords = str(d.values())

with open('new.txt', 'wt') as outfile:
    with open('old.txt', 'rt') as infile:
        for line in infile:
            outfile.write(line.replace(oldwords,newwords))

The reason I am doing this is that I'm working on a cookbook with an ingredient-based index, and I don't want an index with both 'carrot' and 'carrots'. Instead I want to change 'carrot' into 'carrots', and so on for all the other ingredients. Thanks a bunch for a nudge in the right direction!

First, make a list of (old_word, new_word) pairs from 'words.csv':

old_new = [i.strip().split(',') for i in open('words.csv')]

Then you can replace line by line:

with open('new.txt', 'w') as outfile, open('old.txt') as infile:
    for line in infile:
        for oldword, newword in old_new:
            line = line.replace(oldword, newword)
        outfile.write(line)

or in the whole file at once:

with open('new.txt', 'w') as outfile, open('old.txt') as infile:
    txt = infile.read()
    for oldword, newword in old_new:
        txt = txt.replace(oldword, newword)    
    outfile.write(txt)

but you have to replace one word at a time.
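
One caveat with chained `str.replace` calls is that they match substrings, so replacing 'carrot' with 'carrots' would also turn an existing 'carrots' into 'carrotss'. If whole-word matching is what you want, something along these lines might help. It is only a sketch: the function name `replace_words` and the sample pairs are made up for illustration, and it assumes "word" means "not directly next to a letter, digit or underscore".

```python
import re

def replace_words(text, pairs):
    """Replace whole words in a single pass; pairs is a list of (old, new)."""
    mapping = dict(pairs)
    if not mapping:
        return text
    # Try longer keys first so a multi-word entry wins over a shorter one.
    keys = sorted(mapping, key=len, reverse=True)
    alternatives = '|'.join(re.escape(k) for k in keys)
    # The lookarounds reject matches that sit inside a longer word.
    pattern = re.compile(r'(?<!\w)(?:' + alternatives + r')(?!\w)')
    return pattern.sub(lambda m: mapping[m.group(0)], text)

pairs = [('carrot', 'carrots'), ('hello', 'bye')]
print(replace_words('carrot, carrots and hello', pairs))
# carrots, carrots and bye
```

Because everything happens in one pass, a replacement is never itself replaced by a later pair, which can happen with the nested loop when one pair's new word matches another pair's old word.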

In your code example you read the replacement word pairs into a dictionary, and then into two lists with keys and values. I'm not sure why.
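
That conversion is probably also why your attempt ran without errors but changed nothing. Calling `str()` on the keys or values produces one long string containing the printed form of the whole collection, and that string never occurs in your text. (The names are also swapped: the keys are the old words, not the new ones.) A small demonstration, using a made-up dictionary and line:

```python
d = {'A_old': 'A_new', 'hello': 'bye'}

# str() gives ONE string holding the printed representation of the
# whole collection, not the individual words.
newwords = str(d.keys())
oldwords = str(d.values())

line = 'hello world, A_old stuff\n'
# That long string never appears in the line, so replace() is a no-op.
unchanged = line.replace(oldwords, newwords)
print(unchanged == line)  # True
```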

I propose to read the replacement words into a list of tuples.

# Text mode, so the lines are str and can be split on ','
with open('words.csv', 'r') as rep_words:
    rep_list = []
    for rep_line in rep_words:
        rep_list.append(tuple(rep_line.strip().split(',')))

Then you can open the old.txt and new.txt files and perform the replacement using a nested for loop:

with open('old.txt', 'r') as old_text:
    with open('new.txt', 'w') as new_text:
        for read_line in old_text:
            new_line = read_line
            # Apply every (old, new) pair to the current line
            for old_word, new_word in rep_list:
                new_line = new_line.replace(old_word, new_word)
            new_text.write(new_line)
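
Since you were already using the `csv` module, here is a rough sketch of the same idea wrapped into two functions. It assumes Python 3; the helper names `load_pairs` and `replace_in_file` are my own invention, and skipping rows that do not have exactly two columns is just one possible way to deal with blank or malformed lines.

```python
import csv

def load_pairs(csv_path):
    """Return a list of (old, new) tuples from a two-column CSV file."""
    pairs = []
    with open(csv_path, newline='') as f:
        for row in csv.reader(f):
            if len(row) != 2:  # skip blank or malformed lines
                continue
            old, new = (cell.strip() for cell in row)
            pairs.append((old, new))
    return pairs

def replace_in_file(src_path, dst_path, pairs):
    """Copy src to dst, applying every replacement to each line."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            for old, new in pairs:
                line = line.replace(old, new)
            dst.write(line)
```

With the file names from the question, this would be called as `replace_in_file('old.txt', 'new.txt', load_pairs('words.csv'))`.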
