Python find matching string in each line

I would like to read each row of a CSV file and match each word in the row against a list of strings. If any of the strings appears in the row, write that string at the end of the line, separated by a comma. The code below doesn't give me what I want.

file = 'test.csv'    
read_files = open(file)
lines=read_files.read()
text_lines = lines.split("\n")
name=''
with open('testnew2.csv','a') as f:
    for line in text_lines:
        line=str(line)
        #words = line.split()
        with open('names.csv', 'r') as fd:
            reader = csv.reader(fd, delimiter=',')
            for row in reader:
                if row[0] in line:
                   name=row
                   print(name)
                   f.write(line+","+name[0]+'\n')

A sample of test.csv would look like this:

A,B,C,D
ABCD,,,
Total,Robert,,
Name,Annie,,
Total,Robert,,

And the names.csv would look:

Robert
Annie
Amanda

The output I want is:

A,B,C,D,
ABCD,,,,
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert

Currently the code drops every line that doesn't produce a match, so I get:

Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert

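Process each row by testing row[1], appending the result as a fifth column, and then writing the row out. This assumes the name always sits in the second column, as it does in your sample. The name list isn't really a CSV (it is just one name per line), so read it as plain text. If the list is long, use a set for fast lookup, and read it only once instead of once per row. The file names below (names.txt, input.csv, output.csv) are stand-ins for your own.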

import csv

with open('names.txt') as f:
    names = set(f.read().strip().splitlines())

# newline='' per Python 3 csv documentation...
with open('input.csv',newline='') as inf:
    with open('output.csv','w',newline='') as outf:
        r = csv.reader(inf)
        w = csv.writer(outf)
        for row in r:
            row.append(row[1] if row[1] in names else '')
            w.writerow(row)

Output:

A,B,C,D,
ABCD,,,,
Total,Robert,,,Robert
Name,Annie,,,Annie
Total,Robert,,,Robert
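
If the name is not guaranteed to be in the second column, the same idea can be extended to check every cell of the row. This is only a minimal sketch: it works on in-memory text instead of the real files so it can be run as-is, and `annotate_rows` is a hypothetical helper name, not something from the answer above.

```python
import csv
import io


def annotate_rows(csv_text, names):
    """Append the first cell that is a known name (or '') to every row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # lineterminator='\n' keeps the output easy to compare in this sketch
    writer = csv.writer(out, lineterminator='\n')
    for row in reader:
        # next() yields the first matching cell, or '' when nothing matches
        match = next((cell for cell in row if cell in names), '')
        writer.writerow(row + [match])
    return out.getvalue()


# Sample data taken from the question
sample = "A,B,C,D\nABCD,,,\nTotal,Robert,,\nName,Annie,,\n"
result = annotate_rows(sample, {"Robert", "Annie", "Amanda"})
print(result, end="")
```

With real files you would pass file objects opened with newline='' to csv.reader and csv.writer, as in the answer above.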

I think the problem is that you only write a line when a name is found in it. To fix that, move the write call out of the if block, and out of the loop over the names, so that every line is written exactly once:

for row in reader:
    if row[0] in line:
        name=row[0]
        print(name)
f.write(line+","+name+'\n')

I'm guessing that print statement is for testing purposes?

EDIT: On second thought, you may also need to move name='' inside the loop so that it is reset for each line; that way names from matched rows don't bleed into unmatched rows.

EDIT: Changed name=row and the name[0] call in f.write() to name=row[0] and a plain name in f.write().

EDIT: Decided to show an implementation that should avoid the (possible) problem of two matched names in a row:

import csv

file = 'test.csv'
with open(file) as read_files:
    # splitlines() avoids the empty trailing entry that split("\n") produces
    text_lines = read_files.read().splitlines()
with open('testnew2.csv','a') as f:
    for line in text_lines:
        name=''  # reset for every line so an earlier match does not carry over
        with open('names.csv', 'r') as fd:
            reader = csv.reader(fd, delimiter=',')
            for row in reader:
                # skip blank rows in names.csv and stop at the first match
                if row and row[0] in line:
                    name=row[0]
                    print(name)
                    break
        # write every line exactly once, whether or not a name matched
        f.write(line+","+name+'\n')
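
One thing to be aware of: `row[0] in line` is a substring test on the raw text, so a short name can match inside a longer one. The sketch below illustrates the difference; the extra name "Ann" is made up for the example and is not in the question's names.csv.

```python
line = "Name,Annie,,"
names = ["Ann", "Annie"]  # "Ann" is a hypothetical extra name

# Substring test, as in `row[0] in line`: "Ann" is found inside "Annie"
substring_hits = [n for n in names if n in line]

# Whole-cell test: split the line into cells first, then compare exactly
cells = line.split(",")
cell_hits = [n for n in names if n in cells]

print(substring_hits)  # ['Ann', 'Annie']
print(cell_hits)       # ['Annie']
```

The plain split(",") here is enough for the sample data; for input with quoted fields, csv.reader is the safer way to get the cells.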

Try this as well:

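import csv

# Read the names once, dropping newlines and blank lines.
with open('names.csv') as names_file:
    reader2 = [line.strip() for line in names_file if line.strip()]

# newline='' is what the Python 3 csv documentation recommends.
with open('test.csv', newline='') as File1, \
        open('testnew2.csv', 'w', newline='') as myFile:
    reader1 = csv.reader(File1)
    writer = csv.writer(myFile)
    for row in reader1:
        name = ""
        for record in reader2:
            if record in row:  # exact match against a whole cell
                name = record
                break
        # append the match (or an empty cell) and write every row once
        row.append(name)
        writer.writerow(row)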
