
Using CSV columns to Search and Replace in a text file

Background

I have a two column CSV file like this:

Find Replace
is was
A one
b two

etc.

The first column is the text to find and the second is the text to replace it with.

I have a second file with some text like this:

"This is A paragraph in a text file." (Please note the case sensitivity)

My requirement:

I want to use that CSV file to search and replace in the text file, with three conditions:

  1. Whole-word replacement.
  2. Case-sensitive replacement.
  3. Replace all instances of each entry in the CSV.
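Taken together, the three conditions can be sketched with Python's re module; the rules dict below hard-codes the sample CSV rows purely for illustration:

```python
import re

rules = {"is": "was", "A": "one", "b": "two"}  # Find -> Replace, from the sample CSV

line = "This is A paragraph in a text file."
for find, replace in rules.items():
    # \b anchors restrict matching to whole words; re.sub is case-sensitive by default,
    # so the lowercase 'a' in 'paragraph' is never touched by the 'A' rule.
    line = re.sub(r'\b' + re.escape(find) + r'\b', replace, line)

print(line)
```

Note that `re.escape` guards against CSV entries that happen to contain regex metacharacters.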

Script tried:

import csv
import re

with open('CSV_file.csv', mode='r') as infile:
    reader = csv.reader(infile)
    mydict = {(r'\b' + rows[0] + r'\b'): (r'\b' + rows[1] + r'\b') for rows in reader}  # <-- Requires attention
    print(mydict)

with open('find.txt') as infile, open(r'resul_out.txt', 'w') as outfile:
    for line in infile:
        for src, target in mydict.items():
            line = re.sub(src, target, line)  # <-- Requires attention
            # line = line.replace(src, target)
        outfile.write(line)

Description of script: I load my CSV into a Python dictionary and use a regex to find whole words.

Problems

I used r'\b' to create a word boundary in order to get whole-word replacement, but the printed dictionary shows '\\b' instead of '\b'. Why?

Using the replace function (the commented-out line) gives output like:

"Thwas was one paragraph in a text file."

Secondly, I don't know how to make the replacement case-sensitive in the regex pattern.

Does anyone know a better solution than this script, or can anyone improve it?

Thanks for any help.

I'd just put pure strings into mydict so it looks like

{'is': 'was', 'A': 'one', ...}

and replace this line:

# line = re.sub(src, target, line) # old
line = re.sub(r'\b' + src + r'\b', target, line) # new

Note that you don't need \\b in the replacement pattern. Regarding your other questions,

  • regular expressions are case-sensitive by default,
  • changing '\b' to '\\b' is exactly what the r'' prefix does. You can omit the r and write '\\b', but that quickly gets ugly with more complex regexes.
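You can see this in the interpreter: printing a string shows its raw characters, while a dict display uses repr, which escapes backslashes:

```python
s = r'\b'            # a 2-character string: one backslash, then 'b'
print(s)             # prints \b  -- the string really contains a single backslash
print(repr(s))       # prints '\\b' -- repr escapes it; this is what a dict display shows
print(len(s))        # 2
print(s == '\\b')    # True: r'\b' and '\\b' denote the same string
```

So the '\\b' you saw in the printed dictionary is just display escaping, not a bug in the pattern.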

Here's a more cumbersome approach (more code), but it is easier to read and does not rely on regular expressions. In fact, given the very simple nature of your CSV control file, I wouldn't normally bother using the csv module at all:

import csv

with open('temp.csv', newline='') as c:
    reader = csv.DictReader(c, delimiter=' ')
    D = {}
    for row in reader:
        D[row['Find']] = row['Replace']
    with open('input.txt', newline='') as infile:
        with open('output.txt', 'w') as outfile:
            for line in infile:
                # split() yields whole words, and dict lookup is case-sensitive,
                # so conditions 1 and 2 fall out naturally
                tokens = line.split()
                for i, t in enumerate(tokens):
                    if t in D:
                        tokens[i] = D[t]
                outfile.write(' '.join(tokens) + '\n')
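One caveat with split/join is that it normalizes whitespace and treats "file." (word plus punctuation) as a single token. A variant that keeps the regex approach but compiles a single alternation pattern preserves the original spacing and punctuation; the sample files are written inline here so the sketch is self-contained, and the filenames mirror the ones above:

```python
import csv
import re

# Sample data mirroring the question (written here only to make the sketch runnable)
with open('temp.csv', 'w', newline='') as c:
    c.write('Find Replace\nis was\nA one\nb two\n')
with open('input.txt', 'w') as f:
    f.write('This is A paragraph in a text file.\n')

with open('temp.csv', newline='') as c:
    mapping = {row['Find']: row['Replace'] for row in csv.DictReader(c, delimiter=' ')}

# One alternation pattern, longest keys first so shorter keys cannot shadow longer
# ones, wrapped in \b for whole words; matching is case-sensitive by default.
pattern = re.compile(
    r'\b(' + '|'.join(re.escape(k) for k in sorted(mapping, key=len, reverse=True)) + r')\b'
)

with open('input.txt') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        # the callback looks up the replacement for whichever word matched
        outfile.write(pattern.sub(lambda m: mapping[m.group(1)], line))

print(open('output.txt').read())
```

This makes a single pass per line regardless of how many rows the CSV has, rather than one re.sub pass per rule.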

