
How to delete and replace columns in a csv file by comparing it to other csv files in python?

I am writing Python code to search, delete and replace columns in a CSV file. I have 3 files:

input.csv:

aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
mmmmmmmm,nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx

delete.csv:

aaaaaaaa
eeeeeeee
uuuuuuuu

replace.csv:

iiiiiiii,11111111,22222222
mmmmmmmm,33333333,44444444

Here is my code:

input_file='input.csv'
delete='delete.csv'
replace='replace.csv'
new_array=[]
for line in open(input_file):
    data=line.split(',')
    a=data[0]
    b=data[1]
    c=data[2]
    d=data[3]
    for line2 in open(delete):
        if a in line2:
            break
        else:
            for line1 in open(replace):
                data1=line1.split(',')
                aa=data1[0]
                replaced_a=data1[1]
                replaced_b=data1[2]

            if (data[0]==data1[0]):

                data[0]=data1[1]
                data[1]=data1[2]
                new_array=data
                print(new_array)

            else:
                new_array=data

My logic is:

1) Open input.csv and read it line by line.
2) Load the elements of each line into an array.
3) Compare the first element with every entry in delete.csv.
4) If it is found in delete.csv, do nothing and move on to the next line.
5) If it is not found in delete.csv, compare it with replace.csv.
6) If the first element is found in the first column of replace.csv, replace it with the corresponding value from the second column of replace.csv, and replace the second element with the corresponding value from the third column of replace.csv.
7) Load this array into a bigger 10-element array.

So my desired output is:

11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt

So right now I am facing the following problems:

1) Lines that are present in neither replace.csv nor delete.csv don't get printed.
2) My input.csv may contain newlines within one entry, so reading line by line is a problem. However, it is certain that data spread over several lines is enclosed in quotes, e.g.:

aaaaa,bbbb,ccccc,"ddddddddddd
ddddddd"
11111,2222,3333,4444

Any help in bringing the code and my logic together is appreciated.
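Problem 2 comes from treating every physical line as one record and splitting it on commas. A small comparison on the two-record example above, using only the standard `csv` and `io` modules (the `sample` string is just that example inlined), shows the difference: the naive split tears the quoted field in two, while `csv.reader` keeps the embedded newline inside one field.

```python
import csv
import io

# The two-record example from the question: the last field of the first
# record is quoted and spans two physical lines.
sample = 'aaaaa,bbbb,ccccc,"ddddddddddd\nddddddd"\n11111,2222,3333,4444\n'

# Naive approach: one "record" per physical line, split on commas.
naive = [line.split(",") for line in sample.splitlines()]
print(len(naive))  # 3, because the quoted field was torn apart

# csv.reader tracks the quotes, so the newline stays inside the field.
records = list(csv.reader(io.StringIO(sample, newline="")))
print(len(records))            # 2
print(repr(records[0][3]))     # 'ddddddddddd\nddddddd'
```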

I would suggest changing this up a bit:

  • read the things you want to replace into a dictionary
    • use the value in your data's 0th column as the key, and what to replace the 0th and 1st columns with as the value
  • read the things you want to delete into a set
    • if a data row starts with one of these values, skip it; otherwise add it to the output

Loop over your data and use both lookups to "do the right thing".
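A minimal sketch of the two lookups, run on in-memory copies of the question's original sample files (the constants `INPUT`, `DELETE`, `REPLACE` and the helpers `rows` and `transform` are illustrative names, not part of the answer). The final `return row` is what fixes problem 1: rows found in neither lookup are kept unchanged.

```python
import csv
import io

# Illustrative in-memory copies of the question's input.csv, delete.csv, replace.csv.
INPUT = """aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
mmmmmmmm,nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx
"""
DELETE = "aaaaaaaa\neeeeeeee\nuuuuuuuu\n"
REPLACE = "iiiiiiii,11111111,22222222\nmmmmmmmm,33333333,44444444\n"

def rows(text):
    """Parse CSV text into lists of fields, skipping blank rows."""
    return [row for row in csv.reader(io.StringIO(text, newline="")) if row]

delete = {row[0] for row in rows(DELETE)}             # set: fast membership test
replace = {row[0]: row[1:] for row in rows(REPLACE)}  # key -> new first two fields

def transform(row):
    """Return None to drop the row, otherwise the (possibly replaced) row."""
    key = row[0]
    if key in delete:
        return None
    if key in replace:
        return replace[key] + row[2:]
    return row  # in neither lookup: keep it as is

result = [new for new in map(transform, rows(INPUT)) if new is not None]
for row in result:
    print(",".join(row))
# prints:
# 11111111,22222222,kkkkkk,lllllll
# 33333333,44444444,oooooo,ppppppp
# qqqqqqqq,rrrrrr,ssssss,ttttttt
```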

I changed your data a bit to incorporate the "escaped" data you mentioned, including newlines:

File creation:

with open("i.csv","w") as f: 
    f.write("""
aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
"mmmm
mmmm",nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx""")

with open ("d.csv","w") as f: 
    f.write("""
aaaaaaaa
eeeeeeee
uuuuuuuu""")

with open ("r.csv","w") as f: 
    f.write("""
iiiiiiii,11111111,22222222
"mmmm
mmmm",33333333,44444444""")

Program:

import csv

def read_file(fn):
    rows = [] 
    with open(fn, newline="") as f:   # newline="" lets csv handle newlines inside quoted fields
        reader = csv.reader(f, quotechar='"', delimiter=",")
        for row in reader:
            if row:                     # eliminate empty rows from data read
                rows.append(row)
    return rows

# create a dict for the replace stuff
replace = {x[0]:x[1:] for x in read_file("r.csv")}

# create a set for the delete stuff
delete = {row[0] for row in read_file("d.csv")}

# collect what we need to write back
result = []

# https://docs.python.org/3/library/csv.html
with open("i.csv", newline="") as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[0] in delete:
                continue                                   # skip data row
            elif row[0] in replace:
                # replace with mapping, add rest of row
                result.append(replace[row[0]] + row[2:])   # replace data
            else:
                result.append(row)                         # use as is

# write result back into file
with open ("done.csv", "w", newline="") as f:
    w = csv.writer(f,quotechar='"', delimiter= ",")
    w.writerows(result)

Check result:

with open ("done.csv") as f:
    print(f.read()) 

Output:

11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
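The program above collects all rows in `result` before writing them. A minimal sketch of a streaming alternative, assuming you would rather write each row as soon as it is read (the function name `filter_csv` and the sample data are hypothetical), also shows that `csv.writer` quotes a field containing a newline again on output, so multi-line entries survive the round trip.

```python
import csv
import io

def filter_csv(src, dst, delete, replace):
    """Stream rows from src to dst: drop keys in `delete`, rewrite keys in `replace`.

    `src` and `dst` are text file objects, ideally opened with newline="".
    """
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if not row or row[0] in delete:
            continue  # blank line or deleted key
        if row[0] in replace:
            row = replace[row[0]] + row[2:]
        writer.writerow(row)  # fields with commas, quotes or newlines get quoted

# Hypothetical sample: the kept row has a quoted field with an embedded newline.
src = io.StringIO('aaaa,b,c\nkeep,"two\nlines",z\n', newline="")
dst = io.StringIO(newline="")
filter_csv(src, dst, delete={"aaaa"}, replace={})
print(repr(dst.getvalue()))  # 'keep,"two\nlines",z\r\n'
```

With real files you would pass `open("i.csv", newline="")` and `open("done.csv", "w", newline="")` instead of the `StringIO` objects; memory use then stays constant regardless of input size.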

