I am writing Python code to search, delete, and replace columns in a CSV file. I have 3 files.
Input.csv:
aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
mmmmmmmm,nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx
delete.csv:
aaaaaaaa
eeeeeeee
uuuuuuuu
replace.csv:
iiiiiiii,11111111,22222222
mmmmmmmm,33333333,44444444
Here is my code:
input_file='input.csv'
new_array=[]
for line in open(input_file):
    data=line.split(',')
    a==data[0]
    b=data[1]
    c=data[2]
    d=data[3]
    for line2 in open(delete):
        if (name in line2)==True:
            break
    else:
        for line1 in open(replace):
            data1=line1.split(',')
            aa=data1[0]
            replaced_a=data1[1]
            repalced_b=data1[2]
            if (data[0]==data1[0]):
                data[0]=data1[1]
                data[2]=data1[2]
                new_array=data
                print(new_array)
            else:
                new_array=data
My logic is:
1) open input.csv and read it line by line
2) load the elements into an array
3) compare the first element with the entire delete.csv
4) if it is found in delete.csv, do nothing and take the next line
5) if it is not found in delete.csv, compare it with replace.csv
6) if the first element is found in the first column of replace.csv, replace it with the corresponding second column of replace.csv, and replace the second element with the corresponding third column of replace.csv
7) load this array into a bigger 10-element array
So my desired output is:
11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
Right now I am facing the following problems:
1) lines that are not present in either replace.csv or delete.csv don't get printed
2) my input.csv may contain newlines within one entry, so reading line by line is a problem. However, data that is spread over several lines is always enclosed in quotes, e.g.:
aaaaa,bbbb,ccccc,"ddddddddddd
ddddddd"
11111,2222,3333,4444
Any help in bringing the code and my logic together is appreciated.
I would suggest changing this up a bit:
1) read the replace data into a dictionary
2) read the delete data into a set
Then loop over your data and use both lookups to "do the right thing".
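The idea can be sketched on in-memory rows before any file handling is involved. This is only an illustration: the helper name `apply_rules` and the sample values are made up for this sketch, and it assumes the replacement always covers the first two columns of a row.

```python
def apply_rules(rows, delete, replace):
    """Filter and rewrite rows using a set and a dict lookup (hypothetical helper)."""
    result = []
    for row in rows:
        key = row[0]
        if key in delete:
            continue  # first column is marked for deletion: skip the row
        if key in replace:
            # swap in the two replacement columns, keep the rest of the row
            result.append(replace[key] + row[2:])
        else:
            result.append(row)  # untouched rows are kept as they are
    return result


rows = [
    ["aaaaaaaa", "bbbbbb", "cccccc", "ddddddd"],
    ["iiiiiiii", "jjjjjj", "kkkkkk", "lllllll"],
    ["qqqqqqqq", "rrrrrr", "ssssss", "ttttttt"],
]
delete = {"aaaaaaaa"}
replace = {"iiiiiiii": ["11111111", "22222222"]}

cleaned = apply_rules(rows, delete, replace)
```

Both `key in delete` and `key in replace` are hash lookups, so the delete and replace files are read only once instead of being reopened for every input line.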
I changed your data a bit to incorporate the mentioned "escaped" data including newlines:
File creation:
with open("i.csv", "w") as f:
    f.write("""
aaaaaaaa,bbbbbb,cccccc,ddddddd
eeeeeeee,ffffff,gggggg,hhhhhhh
iiiiiiii,jjjjjj,kkkkkk,lllllll
"mmmm
mmmm",nnnnnn,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
uuuuuuuu,vvvvvv,wwwwww,xxxxxxx""")

with open("d.csv", "w") as f:
    f.write("""
aaaaaaaa
eeeeeeee
uuuuuuuu""")

with open("r.csv", "w") as f:
    f.write("""
iiiiiiii,11111111,22222222
"mmmm
mmmm",33333333,44444444""")
Program:
import csv

def read_file(fn):
    """Return all non-empty rows of the CSV file fn as lists of strings."""
    rows = []
    # newline="" lets the csv module handle newlines inside quoted fields itself
    with open(fn, newline="") as f:
        reader = csv.reader(f, quotechar='"', delimiter=",")
        for row in reader:
            if row:  # eliminate empty rows from data read
                rows.append(row)
    return rows

# create a dict for the replace stuff: first column -> remaining columns
replace = {x[0]: x[1:] for x in read_file("r.csv")}

# create a set for the delete stuff
delete = {row[0] for row in read_file("d.csv")}

# collect what we need to write back
result = []

# https://docs.python.org/3/library/csv.html
with open("i.csv", newline="") as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[0] in delete:
                continue  # skip data row
            elif row[0] in replace:
                # replace the first two columns with the mapping, add rest of row
                result.append(replace[row[0]] + row[2:])
            else:
                result.append(row)  # use as is

# write result back into file
with open("done.csv", "w", newline="") as f:
    w = csv.writer(f, quotechar='"', delimiter=",")
    w.writerows(result)
Check result:
with open("done.csv") as f:
    print(f.read())
Output:
11111111,22222222,kkkkkk,lllllll
33333333,44444444,oooooo,ppppppp
qqqqqqqq,rrrrrr,ssssss,ttttttt
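To see why the csv module takes care of the second problem (newlines inside quoted entries), here is a small self-contained sketch that parses the example from the question out of a string instead of a file. The variable names are only for illustration.

```python
import csv
import io

# the sample from the question: the fourth field of the first record
# contains a newline, but it is enclosed in quotes
raw = 'aaaaa,bbbb,ccccc,"ddddddddddd\nddddddd"\n11111,2222,3333,4444\n'

# csv.reader keeps reading until the closing quote, so the embedded
# newline stays inside one field instead of starting a new record
parsed = list(csv.reader(io.StringIO(raw)))

# writing the rows back quotes the field with the newline again,
# so the data survives a round trip
buffer = io.StringIO()
csv.writer(buffer).writerows(parsed)
reparsed = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
```

Here `parsed` holds two records of four fields each, whereas splitting the same text on line breaks would yield three pieces.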
Documentation: https://docs.python.org/3/library/csv.html