
Using CSV columns to Search and Replace in a text file

Background

I have a two-column CSV file like this:

Find    Replace
is      was
A       one
b       two

etc.

The first column is the text to find and the second is the text to replace it with.

I have a second file with some text like this:

"This is A paragraph in a text file." (Please note the case sensitivity.)

My requirement:

I want to use that CSV file to search and replace in the text file, with three conditions:

  1. Whole-word replacement.
  2. Case-sensitive replacement.
  3. Replace all instances of each entry in the CSV.

Script tried:

import csv
import re

with open('CSV_file.csv', mode='r') as infile:
    reader = csv.reader(infile)
    # Requires attention: \b is added to both the pattern and the replacement
    mydict = {(r'\b' + rows[0] + r'\b'): (r'\b' + rows[1] + r'\b') for rows in reader}
    print(mydict)

with open('find.txt') as infile, open(r'resul_out.txt', 'w') as outfile:
    for line in infile:
        for src, target in mydict.items():
            line = re.sub(src, target, line)  # Requires attention
            # line = line.replace(src, target)
        outfile.write(line)

Description of script: I have loaded my CSV into a Python dictionary and use a regex to find whole words.

Problems

I used r'\b' to mark word boundaries for whole-word replacement, but the printed dictionary shows "\\b" instead of '\b'.

Using the replace function instead gives:

"Thwas was one paragraph in a text file."

Secondly, I don't know how to make the replacement case-sensitive in a regex pattern.

Does anyone know a better solution than this script, or can anyone improve it?

Thanks for any help.

I'd just put pure strings into mydict so it looks like

{'is': 'was', 'A': 'one', ...}

and replace this line:

# line = re.sub(src, target, line) # old
line = re.sub(r'\b' + src + r'\b', target, line) # new
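
Put together, the change might look like the sketch below. `load_pairs` and `replace_words` are hypothetical helper names, and the in-memory CSV stands in for the real file, which is assumed here to be comma-separated with no header row. The `re.escape` call and the function-style replacement are additions to the suggestion above: they keep a CSV entry containing regex metacharacters or backslashes from being interpreted specially.

```python
import csv
import io
import re

def load_pairs(csv_file):
    """Read (find, replace) pairs from an open CSV file as plain strings."""
    return {row[0]: row[1] for row in csv.reader(csv_file) if len(row) >= 2}

def replace_words(line, pairs):
    """Apply each pair as a case-sensitive, whole-word replacement."""
    for src, target in pairs.items():
        # \b goes only in the search pattern; re.escape keeps src literal.
        pattern = r'\b' + re.escape(src) + r'\b'
        # A function replacement inserts target as-is, with no escape processing.
        line = re.sub(pattern, lambda match, text=target: text, line)
    return line

# In-memory stand-in for the CSV file (assumed comma-separated, no header row).
pairs = load_pairs(io.StringIO('is,was\nA,one\nb,two\n'))
print(replace_words('This is A paragraph in a text file.', pairs))
# This was one paragraph in a text file.
```

With `line.replace(src, target)` in place of the `re.sub` call, the same input comes out as `Thwas was one paragraph in a text file.`, because `str.replace` also matches inside words. Pairs are applied one after another, so a replacement that happens to equal a later "find" entry would be replaced again.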

Note that you don't need \b in the replacement pattern. Regarding your other questions,

  • regular expressions are case-sensitive by default,
  • changing '\b' to '\\b' is exactly what the r'' does. You can omit the r and write '\\b', but that quickly gets ugly with more complex regexes.
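
A quick check can make these points concrete. The snippet below is only an illustration with made-up sample strings: it shows that `r'\b'` and `'\\b'` are the same two-character string, that printing a dictionary displays the `repr` form with a doubled backslash, that matching is case-sensitive by default, and what happens when `\b` is left in the replacement string.

```python
import re

# A raw string and an escaped backslash spell the same two characters.
assert r'\b' == '\\b'
assert len(r'\b') == 2

# Printing a dict shows repr() of its keys, which doubles the backslash
# for display only; the string itself still holds a single backslash.
key = r'\b' + 'is' + r'\b'
print({key: 'was'})   # {'\\bis\\b': 'was'}
print(key)            # \bis\b

# Matching is case-sensitive unless re.IGNORECASE is passed.
result = re.sub(r'\bA\b', 'one', 'A cat and a dog')
print(result)         # one cat and a dog

# In the replacement string, \b is not a word boundary:
# re.sub turns it into a backspace character.
noisy = re.sub(r'\bis\b', r'\bwas\b', 'it is')
print(repr(noisy))    # 'it \x08was\x08'
```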

Here's a more cumbersome approach (more code), but one that is easier to read and does not rely on regular expressions. In fact, given the very simple nature of your CSV control file, I wouldn't normally bother using the csv module at all:

import csv

with open('temp.csv', newline='') as c:
    reader = csv.DictReader(c, delimiter=' ')
    D = {}
    for row in reader:
        D[row['Find']] = row['Replace']
    with open('input.txt', newline='') as infile:
        with open('output.txt', 'w') as outfile:
            for line in infile:
                tokens = line.split()
                for i, t in enumerate(tokens):
                    if t in D:
                        tokens[i] = D[t]
                outfile.write(' '.join(tokens)+'\n')
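
One thing to watch with `line.split()` is that punctuation stays attached to its token, so `file.` would not match a CSV entry `file`, and runs of whitespace are collapsed to single spaces on output. If that matters, a possible variant is sketched below. `replace_tokens` is a hypothetical helper and the mapping is hard-coded for illustration; it strips leading and trailing punctuation before the lookup and leaves that punctuation in place around the replacement.

```python
import string

def replace_tokens(line, mapping):
    """Whole-token, case-sensitive replacement that tolerates attached punctuation."""
    out = []
    for token in line.split():
        # Look up the token without surrounding punctuation, e.g. 'file.' -> 'file'.
        core = token.strip(string.punctuation)
        if core and core in mapping:
            # Swap only the core, keeping any quotes, commas or full stops around it.
            token = token.replace(core, mapping[core], 1)
        out.append(token)
    return ' '.join(out)

# Hard-coded mapping for illustration; 'file' is an extra entry not in the question.
mapping = {'is': 'was', 'A': 'one', 'file': 'document'}
print(replace_tokens('This is A paragraph in a text file.', mapping))
# This was one paragraph in a text document.
```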
