I have two CSV files and I want to create a third CSV by merging the two. Here's how my files look:
Num | status
1213 | closed
4223 | open
2311 | open
and another file has this:
Num | code
1002 | 9822
1213 | 1891
4223 | 0011
So, here is my little code that I was trying to loop through, but it does not print the output with the third column added and matched to the correct values.
def links():
    first = open('closed.csv')
    csv_file = csv.reader(first)
    second = open('links.csv')
    csv_file2 = csv.reader(second)
    for row in csv_file:
        for secrow in csv_file2:
            if row[0] == secrow[0]:
                print row[0]+"," +row[1]+","+ secrow[0]
                time.sleep(1)
So what I want is something like:
Num | status | code
1213 | closed | 1891
4223 | open | 0011
2311 | open | blank no match
If you decide to use pandas, you can do it in only five lines.
import pandas as pd
# dtype=str keeps codes such as 0011 as text instead of parsing them as numbers
first = pd.read_csv('closed.csv', dtype=str)
second = pd.read_csv('links.csv', dtype=str)
merged = pd.merge(first, second, how='left', on='Num')
merged.to_csv('merged.csv', index=False)
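You could read the values of the second file into a dictionary and then add them to the rows of the first.
# map each Num in the second file to its code
Code = {}
for row in csv_file2:
    Code[row[0]] = row[1]
# append the matching code (or a placeholder) to each row of the first file
for row in csv_file:
    row.append(Code.get(row[0], "blank no match"))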
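The dictionary idea above can be sketched as a complete script. This is only a minimal sketch under some assumptions: the files are comma-separated with a header row, Python 3 is used, and the in-memory strings `closed_text` and `links_text` plus the helper name `merge_rows` are hypothetical stand-ins, not part of the original question.

```python
import csv
import io

# Hypothetical in-memory stand-ins for closed.csv and links.csv (assumed comma-separated).
closed_text = "Num,status\n1213,closed\n4223,open\n2311,open\n"
links_text = "Num,code\n1002,9822\n1213,1891\n4223,0011\n"


def merge_rows(first_file, second_file, missing="blank no match"):
    """Return the rows of first_file with the matching code appended to each."""
    # Build the lookup once: Num -> code. The header row maps "Num" -> "code",
    # which conveniently produces the header of the new column as well.
    code_by_num = {row[0]: row[1] for row in csv.reader(second_file) if row}
    merged = []
    for row in csv.reader(first_file):
        if row:  # skip empty lines
            merged.append(row + [code_by_num.get(row[0], missing)])
    return merged


rows = merge_rows(io.StringIO(closed_text), io.StringIO(links_text))

# Write the merged rows as CSV text. To produce a real third file, the StringIO
# could be replaced with open('merged.csv', 'w', newline='').
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

Each row of the first file costs one dictionary lookup, so the second file is read only once instead of once per row.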
The problem is that you can iterate over a csv reader only once, so csv_file2 yields nothing after the first pass of the outer loop. To solve that, save the rows from csv_file2 in a list and iterate over the saved list. It could look like this:
import csv
import time

def links():
    first = open('closed.csv')
    csv_file = csv.reader(first, delimiter="|")
    second = open('links.csv')
    csv_file2 = csv.reader(second, delimiter="|")
    # a reader can be consumed only once, so keep its rows in a list
    saved_rows = []
    for row in csv_file2:
        saved_rows.append(row)
    for row in csv_file:
        match = False
        for secrow in saved_rows:
            # strip the spaces around the "|" delimiter before comparing
            if row[0].replace(" ", "") == secrow[0].replace(" ", ""):
                print(row[0] + "," + row[1] + "," + secrow[1])
                match = True
        if not match:
            print(row[0] + "," + row[1] + ", blank no match")
        time.sleep(1)
Output:
Num , status, code
1213 , closed, 1891
4223 , open, 0011
2311 , open, blank no match
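The one-pass behaviour of a csv reader described above can be seen in isolation. This is a small sketch assuming Python 3; the in-memory `data` object is a hypothetical stand-in for links.csv.

```python
import csv
import io

# Hypothetical in-memory stand-in for links.csv.
data = io.StringIO("Num,code\n1002,9822\n1213,1891\n")
reader = csv.reader(data)

first_pass = [row for row in reader]   # consumes every row of the reader
second_pass = [row for row in reader]  # the reader is exhausted, nothing is left

print(len(first_pass))   # 3
print(len(second_pass))  # 0

# Rewinding the underlying file and creating a new reader is an alternative
# to saving the rows in a list.
data.seek(0)
third_pass = list(csv.reader(data))
print(len(third_pass))   # 3
```

This is why the nested loop in the question only ever compares the first row of closed.csv against links.csv.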
This code will do it for you:
import csv

def links():
    # open both files
    with open('closed.csv') as closed, open('links.csv') as links:
        # use DictReader so that fields can be accessed by column name
        csv_closed = csv.DictReader(closed)
        csv_links = csv.DictReader(links)
        # create dictionaries out of the two CSV files using dictionary comprehensions
        num_dict = {row['num']: row['status'] for row in csv_closed}
        link_dict = {row['num']: row['code'] for row in csv_links}
        # print header, each column has a width of 8 characters
        print("{0:8} | {1:8} | {2:8}".format("Num", "Status", "Code"))
        # print the information
        for num, status in num_dict.items():
            # note this call to link_dict.get() - we are getting values out of the link dictionary,
            # but specifying a default return value of an empty string if num is not found in it
            # to avoid an exception
            print("{0:8} | {1:8} | {2:8}".format(num, status, link_dict.get(num, '')))

links()
In it, I'm taking advantage of dictionaries, which let you access information by keys. I'm also using implicit loops (the dictionary comprehensions) which tend to be faster and require less code.
There are two quirks of this code that you should be aware of, that your example suggests are fine:
Last note: I made some assumptions about how your input files are formatted since you called them "CSV" files. This is what my input files looked like for this code:
closed.csv
num,status
1213,closed
4223,open
2311,open
links.csv
num,code
1002,9822
1213,1891
4223,0011
Given those input files, the result looks like this:
Num | Status | Code
1213 | closed | 1891
2311 | open |
4223 | open | 0011
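If the result should end up in a third CSV file rather than on the screen, a variant along the following lines might work. It is a sketch, not the code above: it assumes Python 3 and the same lowercase `num`, `status` and `code` headers, and it uses hypothetical in-memory strings in place of the real files. It streams the rows of closed.csv instead of storing them in a dictionary, so the file order is kept and repeated `num` values in closed.csv are not collapsed into one entry.

```python
import csv
import io

# Hypothetical in-memory stand-ins for closed.csv and links.csv.
closed_text = "num,status\n1213,closed\n4223,open\n2311,open\n"
links_text = "num,code\n1002,9822\n1213,1891\n4223,0011\n"

# Only the second file goes into a dictionary: num -> code.
link_dict = {row["num"]: row["code"]
             for row in csv.DictReader(io.StringIO(links_text))}

# For a real file, the StringIO could be replaced with
# open('merged.csv', 'w', newline='').
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["num", "status", "code"],
                        lineterminator="\n")
writer.writeheader()
for row in csv.DictReader(io.StringIO(closed_text)):
    # an empty string marks a num that has no match in links.csv
    row["code"] = link_dict.get(row["num"], "")
    writer.writerow(row)

print(out.getvalue())
```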