Replace a pattern in CSV file Python

Question

I have multiple CSV files that can represent the same thing in several ways. For instance, 15 years can be written as age: 15, age (years): 15, or age: 15 years (these are all the patterns I've seen so far). I'd like to replace all of those with 15 years. I know how to do it when I know the actual age or the column number, but the age is different for each occurrence and the column is not fixed. The CSV files could look like this:

CSV1:

h1,h2,h3
A1,age:15,hh
B3,age:10,fg

Desired CSV1

h1,h2,h3
A1,15 years,hh
B3,10 years,fg

Whenever it's just age: 15, it's definitely years and not months or any other unit.

Answer 1

Use re.sub as shown below:

re.sub(r'(,|^)(?:age\s*(?:\(years\))?:\s*(\d+)\s*(?:years)?)(?=,|$)',
       r'\1\2 years', string)

Example:

import re
import csv
with open('file') as f:
    reader = csv.reader(f)
    for i in reader:
        print(re.sub(r'(,|^)(?:age\s*(?:\(years\))?:\s*(\d+)\s*(?:years)?)(?=,|$)', r'\1\2 years', ','.join(i)))

Output:

h1,h2,h3
A1,15 years,hh
B3,10 years,fg

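Or, with a looser pattern that rewrites any field containing age followed by a colon and a number:

for i in reader:
    print(re.sub(r'(,|^)[^,\n]*age[^,\n]*:[^,\n]*\b(\d+)\b[^,\n]*', r'\1\2 years', ','.join(i)))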
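Joining each row back with ',' and running the regex on the whole line works for the sample data, but it drops any quoting that the csv module parsed. Below is a minimal sketch of a per-field variant, assuming the three spellings from the question are the only ones. The names AGE_RE, normalize_field and normalize_csv are made up for illustration, and io.StringIO stands in for real files.

```python
import csv
import io
import re

# Matches "age: 15", "age (years): 15" and "age: 15 years" as a whole field.
AGE_RE = re.compile(
    r'\s*age\s*(?:\(years\))?\s*:\s*(\d+)\s*(?:years)?\s*', re.IGNORECASE
)


def normalize_field(value):
    """Return 'N years' for a recognised age field, else the value unchanged."""
    match = AGE_RE.fullmatch(value)
    return match.group(1) + " years" if match else value


def normalize_csv(text):
    """Rewrite every field of a CSV document given as a string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([normalize_field(field) for field in row])
    return out.getvalue()


sample = 'h1,h2,h3\nA1,age:15,hh\nB3,age (years): 10,fg\nC2,age: 7 years,"x, y"\n'
print(normalize_csv(sample))
```

Because csv.writer re-quotes fields on the way out, a value such as "x, y" should survive the round trip, which the join-based version cannot guarantee.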

Answer 2

Use the string translation methods, str.maketrans and str.translate.

import csv
from string import ascii_uppercase, ascii_lowercase

# Characters to delete: every ASCII letter and the colon.
delete = ascii_uppercase + ascii_lowercase + ":"
tran = str.maketrans("", "", delete)

with open("infile.csv", newline="") as infile, open("output.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    # Copy the header row unchanged so that "h2" is not rewritten.
    header = next(reader, None)
    if header is not None:
        writer.writerow(header)
    for row in reader:
        # assuming the age is in the second field
        row[1] = row[1].translate(tran).strip() + " years"
        writer.writerow(row)

I generally prefer string.translate over regex where applicable as it's easier to follow and debug.
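To see what the translate step does to each spelling from the question, here is a small Python 3 sketch. The helper name strip_to_number is hypothetical. As an assumption, the delete set adds parentheses and the space to the letters and colon used in the answer, so that "age (years): 15" also reduces to the bare number.

```python
from string import ascii_letters

# Letters and the colon come from the answer; "()" and the space are an
# assumed extension to cover the "age (years): 15" spelling.
DELETE = ascii_letters + ":() "
TABLE = str.maketrans("", "", DELETE)


def strip_to_number(value):
    """Delete every character in DELETE and append the unit."""
    return value.translate(TABLE) + " years"


for raw in ("age:15", "age (years): 15", "age: 15 years"):
    print(repr(raw), "->", strip_to_number(raw))
```

Note that this deletes characters blindly: any field that mixes letters and digits, such as h2, would become "2 years", so it only suits a column known to hold ages.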

Answer 3

It's a guessing game, but if the rule is to convert any field that contains the word "year" or "age" together with a whole number, this should do.

import csv
import os
import re

_is_age_search = re.compile(r"year|age", re.IGNORECASE).search
_find_num_search = re.compile(r"(\d+)").search

outdir = '/some/dir'
# csv_filenames is assumed to be a list of input file names defined elsewhere
for filename in csv_filenames:
    with open(filename, newline='') as f_in, open(os.path.join(outdir, filename), 'w', newline='') as f_out:
        writer = csv.writer(f_out)
        for row in csv.reader(f_in):
            for i, val in enumerate(row):
                if _is_age_search(val):
                    search = _find_num_search(val)
                    if search:
                        # group(1) is a string, so format with %s rather than %d
                        row[i] = "%s years" % search.group(1)
            writer.writerow(row)

3 answers

solution1
1 ACCEPTED 2015-01-16 00:54:33

solution2
1 2015-01-16 01:21:48

solution3
0 2015-01-16 01:23:14
