Merge Two CSV files in Python

I have two CSV files and I want to create a third CSV by merging the two. Here's how my files look:

Num | status
1213 | closed
4223 | open
2311 | open

and another file has this:

Num | code
1002 | 9822
1213 | 1891
4223 | 0011

Here is the code I tried. It loops through both files, but it does not print the output with the third column added and matched to the correct values.

def links():
    first = open('closed.csv')
    csv_file = csv.reader(first)

    second = open('links.csv')
    csv_file2 = csv.reader(second)

    for row in csv_file:  
        for secrow in csv_file2:                             
            if row[0] == secrow[0]:
                print row[0]+"," +row[1]+","+ secrow[0]
                time.sleep(1)

What I want is something like:

Num | status | code
1213 | closed | 1891
4223 | open | 0011
2311 | open | blank no match

This is definitely a job for pandas. You can read both CSV files in as DataFrames and combine them with either merge or concat. It will usually be much faster than nested Python loops on large files, and it takes only a few lines of code.

If you decide to use pandas, you can do it in only five lines.

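import pandas as pd

# dtype=str keeps codes such as 0011 as text instead of parsing them as the integer 11
first = pd.read_csv('closed.csv', dtype=str)
second = pd.read_csv('links.csv', dtype=str)

# how='left' keeps every row of closed.csv, whether or not links.csv has a match
merged = pd.merge(first, second, how='left', on='Num')
merged.to_csv('merged.csv', index=False)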
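
A left merge leaves NaN where closed.csv has no partner row, so the "blank no match" placeholder from the question needs one extra step. The following is a minimal sketch rather than a definitive version: it assumes comma-separated files that share a Num header, and it uses in-memory stand-ins (closed_csv, links_csv) in place of the real files so that it runs on its own.

```python
import io

import pandas as pd

# In-memory stand-ins for closed.csv and links.csv (assumed comma-separated)
closed_csv = io.StringIO("Num,status\n1213,closed\n4223,open\n2311,open\n")
links_csv = io.StringIO("Num,code\n1002,9822\n1213,1891\n4223,0011\n")

# dtype=str keeps "0011" as text instead of parsing it as the integer 11
first = pd.read_csv(closed_csv, dtype=str)
second = pd.read_csv(links_csv, dtype=str)

# Left merge: every row of the first frame survives, in its original order
merged = pd.merge(first, second, how="left", on="Num")

# Rows without a match have NaN in "code"; swap in the placeholder text
merged["code"] = merged["code"].fillna("blank no match")

print(merged.to_csv(index=False))
```

With real files, pass the file names to pd.read_csv and a path to to_csv instead of printing.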

You could read the values of the second file into a dictionary and then add them to the first.

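# map each Num in the second file to its code
code = {}
for row in csv_file2:
    code[row[0]] = row[1]

# append the matching code (or a placeholder) to each row of the first file
for row in csv_file:
    row.append(code.get(row[0], "blank no match"))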
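
For completeness, here is one way the dictionary idea could be turned into a script that actually writes the third file. Treat it as a sketch under assumptions: merge_by_num and its parameters are hypothetical names, both files are assumed to be comma-separated with a header row, and the key is assumed to be the first column.

```python
import csv


def merge_by_num(first_path, second_path, out_path, missing="blank no match"):
    """Left-join second_path onto first_path, keyed on the first column."""
    # Build the lookup table once: key -> second column of the second file
    with open(second_path, newline="") as second:
        reader = csv.reader(second)
        second_header = next(reader)  # assumes the file has a header row
        codes = {row[0]: row[1] for row in reader if row}

    with open(first_path, newline="") as first, \
            open(out_path, "w", newline="") as out:
        reader = csv.reader(first)
        writer = csv.writer(out)
        # Output header: the first file's columns plus the second file's value column
        writer.writerow(next(reader) + [second_header[1]])
        for row in reader:
            if row:  # skip blank lines
                writer.writerow(row + [codes.get(row[0], missing)])
```

Calling merge_by_num('closed.csv', 'links.csv', 'merged.csv') would then produce the merged file. Each key is looked up in the dictionary directly, so the second file is read only once.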

The problem is that a csv reader can be iterated over only once, so csv_file2 yields nothing after the first pass of the outer loop. To solve that, save the rows from csv_file2 in a list and iterate over the saved list. It could look like this:

import csv


def links():
    # Read links.csv into a list once; a csv reader is exhausted after one pass
    with open('links.csv') as second:
        link_rows = list(csv.reader(second, delimiter="|"))

    with open('closed.csv') as first:
        for row in csv.reader(first, delimiter="|"):
            match = False
            for secrow in link_rows:
                # Compare the keys with the padding spaces removed
                if row[0].replace(" ", "") == secrow[0].replace(" ", ""):
                    print(row[0] + "," + row[1] + "," + secrow[1])
                    match = True
            if not match:
                print(row[0] + "," + row[1] + ", blank no match")

Output:

Num , status, code
1213 , closed, 1891
4223 , open, 0011
2311 , open, blank no match
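
The single-pass behaviour of a csv reader can be checked in isolation. The snippet below is only an illustration: it uses an in-memory io.StringIO object with made-up contents in place of links.csv, and shows that a second pass over the same reader yields nothing unless the underlying file is rewound and a new reader is created.

```python
import csv
import io

# In-memory stand-in for links.csv (illustrative data, pipe-delimited)
data = io.StringIO("Num|code\n1002|9822\n1213|1891\n")
reader = csv.reader(data, delimiter="|")

first_pass = list(reader)   # consumes every row, header included
second_pass = list(reader)  # the reader is already exhausted

print(len(first_pass))   # 3
print(len(second_pass))  # 0

# Rewinding the file and building a new reader allows another pass
data.seek(0)
third_pass = list(csv.reader(data, delimiter="|"))
print(third_pass == first_pass)  # True
```

This is why the asker's nested loop only ever compared the first row of closed.csv against links.csv.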

This code will do it for you:

import csv

def links():

    # open both files
    with open('closed.csv') as closed, open('links.csv') as links:

        # use DictReader so that values can be looked up by column name
        csv_closed = csv.DictReader(closed)
        csv_links = csv.DictReader(links)

        # create dictionaries out of the two CSV files using dictionary comprehensions
        num_dict = {row['num']:row['status'] for row in csv_closed}
        link_dict = {row['num']:row['code'] for row in csv_links}   

    # print header, each column has width of 8 characters
    print("{0:8} | {1:8} | {2:8}".format("Num", "Status", "Code"))

    # print the information
    for num, status in num_dict.items():

        # note this call to link_dict.get() - we are getting values out of the link dictionary,
        # but specifying a default return value of an empty string if num is not found in it
        # to avoid an exception
        print("{0:8} | {1:8} | {2:8}".format(num, status, link_dict.get(num, '')))

links()

In it, I'm taking advantage of dictionaries, which let you access information by keys. I'm also using implicit loops (the dictionary comprehensions) which tend to be faster and require less code.

There are two quirks of this code that you should be aware of, although your example suggests both are acceptable:

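  1. Order is not preserved (because we're using dictionaries); note that from Python 3.7 onward dictionaries keep insertion order, so there the rows come out in file order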
  2. Num entries that are in links.csv but not closed.csv are not included in the printout
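
If either quirk matters, both can be worked around with the standard library alone. This is a sketch, not the only way to do it: it assumes Python 3.7 or later (where dictionaries keep insertion order), comma-separated input with num, status and code headers, and in-memory stand-ins (closed_csv, links_csv) instead of the real files.

```python
import csv
import io

# In-memory stand-ins for closed.csv and links.csv (assumed comma-separated)
closed_csv = io.StringIO("num,status\n1213,closed\n4223,open\n2311,open\n")
links_csv = io.StringIO("num,code\n1002,9822\n1213,1891\n4223,0011\n")

status_by_num = {row["num"]: row["status"] for row in csv.DictReader(closed_csv)}
code_by_num = {row["num"]: row["code"] for row in csv.DictReader(links_csv)}

# Keys of closed.csv first, in file order, then keys found only in links.csv.
# dict.fromkeys drops duplicates while keeping first-seen order.
all_nums = list(dict.fromkeys(list(status_by_num) + list(code_by_num)))

# Missing values on either side become empty strings
merged = [
    (num, status_by_num.get(num, ""), code_by_num.get(num, ""))
    for num in all_nums
]

for record in merged:
    print(",".join(record))
```

Here 1002 appears at the end with an empty status, and the closed.csv rows keep their original order.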

Last note: I made some assumptions about how your input files are formatted since you called them "CSV" files. This is what my input files looked like for this code:

closed.csv

num,status
1213,closed
4223,open
2311,open

links.csv

num,code
1002,9822
1213,1891
4223,0011

Given those input files, the result looks like this:

Num      | Status   | Code  
1213     | closed   | 1891  
2311     | open     |  
4223     | open     | 0011  
