Merge Two CSV files in Python
I have two CSV files and I want to create a third CSV from a merge of the two. Here's how my files look:
Num | status
1213 | closed
4223 | open
2311 | open
and another file has this:
Num | code
1002 | 9822
1213 | 1891
4223 | 0011
So, here is my little code that I was trying to loop through, but it does not print the output with the third column added and matched to the correct values.
def links():
    first = open('closed.csv')
    csv_file = csv.reader(first)
    second = open('links.csv')
    csv_file2 = csv.reader(second)
    for row in csv_file:
        for secrow in csv_file2:
            if row[0] == secrow[0]:
                print row[0]+"," +row[1]+","+ secrow[0]
                time.sleep(1)
So what I want is something like:
Num | status | code
1213 | closed | 1891
4223 | open | 0011
2311 | open | blank no match
If you decide to use pandas, you can do it in only five lines.
import pandas as pd
first = pd.read_csv('closed.csv')
second = pd.read_csv('links.csv')
merged = pd.merge(first, second, how='left', on='Num')
merged.to_csv('merged.csv', index=False)
You could read the values of the second file into a dictionary and then add them to the rows of the first.
Code = {}
for row in csv_file2:
    Code[row[0]] = row[1]
for row in csv_file1:
    row.append(Code.get(row[0], "blank no match"))
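As a fuller sketch of the dictionary approach, the snippet below reads two small samples from in-memory strings and writes the merged rows with csv.writer. It assumes comma-delimited data with a header row; the sample strings stand in for closed.csv and links.csv, and the function name merge_rows is made up for the illustration.

```python
import csv
import io

# Sample data standing in for closed.csv and links.csv (assumed comma-delimited).
CLOSED = "Num,status\n1213,closed\n4223,open\n2311,open\n"
LINKS = "Num,code\n1002,9822\n1213,1891\n4223,0011\n"


def merge_rows(first_file, second_file, default="blank no match"):
    """Return the rows of first_file with the matching code appended."""
    second_rows = csv.reader(second_file)
    next(second_rows)  # skip the header row of the second file
    codes = {row[0]: row[1] for row in second_rows}  # Num -> code

    first_rows = csv.reader(first_file)
    merged = [next(first_rows) + ["code"]]  # extended header row
    for row in first_rows:
        merged.append(row + [codes.get(row[0], default)])
    return merged


merged = merge_rows(io.StringIO(CLOSED), io.StringIO(LINKS))

# Write the result as CSV text; a real script would open 'merged.csv' instead.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue())
```

Because the lookup table is built once, each row of the first file costs a single dictionary lookup rather than a scan of the second file.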
The problem is that you can iterate over a csv reader only once, so csv_file2 yields nothing after the first pass of the outer loop. To solve that, save the rows of csv_file2 in a list and iterate over the saved list. It could look like this:
import csv
import time

def links():
    first = open('closed.csv')
    csv_file = csv.reader(first, delimiter="|")
    second = open('links.csv')
    csv_file2 = csv.reader(second, delimiter="|")
    # Save the rows of the second file so they can be scanned more than once.
    saved_rows = []
    for row in csv_file2:
        saved_rows.append(row)
    for row in csv_file:
        match = False
        for secrow in saved_rows:
            # Ignore the spaces around the "|" delimiter when comparing keys.
            if row[0].replace(" ", "") == secrow[0].replace(" ", ""):
                print(row[0] + "," + row[1] + "," + secrow[1])
                match = True
        if not match:
            print(row[0] + "," + row[1] + ", blank no match")
        time.sleep(1)
Output:
Num , status, code
1213 , closed, 1891
4223 , open, 0011
2311 , open, blank no match
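The one-pass behaviour described above is easy to check in isolation. A minimal sketch, using an in-memory string in place of links.csv (the sample data is made up for the demo):

```python
import csv
import io

data = "1002,9822\n1213,1891\n"

reader = csv.reader(io.StringIO(data))
first_pass = [row for row in reader]
second_pass = [row for row in reader]  # the reader is already exhausted here

print(len(first_pass))   # 2
print(len(second_pass))  # 0

# Materialising the rows once gives a list that can be scanned repeatedly.
saved = list(csv.reader(io.StringIO(data)))
for _ in range(2):
    print([row[0] for row in saved])
```

This is why the nested loop in the question only ever compares the first row of the outer file against the second file.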
This code will do it for you:
import csv

def links():
    # open both files
    with open('closed.csv') as closed, open('links.csv') as links:
        # using DictReader instead, to access information by num more easily
        csv_closed = csv.DictReader(closed)
        csv_links = csv.DictReader(links)
        # create dictionaries out of the two CSV files using dictionary comprehensions
        num_dict = {row['num']: row['status'] for row in csv_closed}
        link_dict = {row['num']: row['code'] for row in csv_links}
    # print header, each column has width of 8 characters
    print("{0:8} | {1:8} | {2:8}".format("Num", "Status", "Code"))
    # print the information
    for num, status in num_dict.items():
        # note this call to link_dict.get() - we are getting values out of the link dictionary,
        # but specifying a default return value of an empty string if num is not found in it
        # to avoid an exception
        print("{0:8} | {1:8} | {2:8}".format(num, status, link_dict.get(num, '')))

links()
In it, I'm taking advantage of dictionaries, which let you access information by key. I'm also using implicit loops (the dictionary comprehensions), which tend to be faster and require less code.
There are two quirks of this code that you should be aware of, that your example suggests are fine:
Last note: I made some assumptions about how your input files are formatted, since you called them "CSV" files. This is what my input files looked like for this code:
closed.csv
num,status
1213,closed
4223,open
2311,open
links.csv
num,code
1002,9822
1213,1891
4223,0011
Given those input files, the result looks like this:
Num      | Status   | Code
1213     | closed   | 1891
2311     | open     |
4223     | open     | 0011
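Since the question asks for a third CSV file rather than printed output, the same idea can be extended with csv.DictWriter. This is only a sketch under the input format assumed above (comma-delimited files with lowercase num, status and code headers); the in-memory strings stand in for the real files, and iterating over the first file directly keeps its row order.

```python
import csv
import io

# In-memory stand-ins for closed.csv and links.csv.
closed = io.StringIO("num,status\n1213,closed\n4223,open\n2311,open\n")
links = io.StringIO("num,code\n1002,9822\n1213,1891\n4223,0011\n")

# Lookup table from num to code, built from the second file.
link_dict = {row["num"]: row["code"] for row in csv.DictReader(links)}

# A real script would use open('merged.csv', 'w', newline='') here.
merged = io.StringIO()
writer = csv.DictWriter(
    merged, fieldnames=["num", "status", "code"], lineterminator="\n"
)
writer.writeheader()
for row in csv.DictReader(closed):
    # Add the code column; rows without a match get an empty string.
    row["code"] = link_dict.get(row["num"], "")
    writer.writerow(row)

print(merged.getvalue())
```

Reading every field as a string also keeps values such as 0011 intact, leading zeros included.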