
How to merge multiple CSV files into one file with specific columns using pandas in Python?

I have 4 different csv files.

csv1:

ID      Fruit
1001    Apple
1002    Banana
1003    Kiwi

csv2:

ID      Color
1001    Green
1005    Red
1006    Orange
1007    Yellow

csv3:

ID      Size
1001    Large
1008    Small
1009    Medium
1010    Large

csv4:

ID      Price
1002    20
1009    40
1010    30
1011    50

And this is the master CSV file I want to create:

Number  ID      Fruit   Color   Size    Price

1       1001    Apple   Green   Large   
2       1002    Banana                  20
3       1003    Kiwi            
4       1005            Red     
5       1006            Orange      
6       1007            Yellow      
7       1008            Small   
8       1009            Medium          40
9       1010            Large           30
10      1011                            50

I think pandas would make this easier, but I have no experience with Python.

Since each CSV file has different columns, how can I select the columns and combine them all into the master CSV file? Where there is no information, I want the cell to be NULL or N/A.


I have already spent six hours on this and still have no idea how to do it.

Thank you in advance.

reduce + combine_first

The key is to set 'ID' as the index; that way we get proper alignment across both axes. I've assumed all DataFrames are already in memory; if not, you can read them into a list, or do the reading in the reduce step.

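from functools import reduce
import pandas as pd

# Index every DataFrame by 'ID' so rows line up across all of them
my_dfs = [df.set_index('ID') for df in [df1, df2, df3, df4]]
# Or read them straight from disk:
#my_dfs = [pd.read_csv(file).set_index('ID') for file in your_list_of_files]

# combine_first keeps the union of rows and columns; missing cells become NaN
reduce(lambda l, r: l.combine_first(r), my_dfs)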

       Color   Fruit  Price    Size
ID                                 
1001   Green   Apple    NaN   Large
1002     NaN  Banana   20.0     NaN
1003     NaN    Kiwi    NaN     NaN
1005     Red     NaN    NaN     NaN
1006  Orange     NaN    NaN     NaN
1007  Yellow     NaN    NaN     NaN
1008     NaN     NaN    NaN   Small
1009     NaN     NaN   40.0  Medium
1010     NaN     NaN   30.0   Large
1011     NaN     NaN   50.0     NaN
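The result above still differs from the master file in the question: the columns come out in alphabetical order, 'ID' is the index, there is no 'Number' column, and 'Price' became float because of the missing values. A minimal sketch of those remaining steps, with the question's four tables built inline so it runs without files; the output name 'master.csv' and the 'N/A' placeholder are assumptions:

```python
from functools import reduce

import pandas as pd

# The question's four tables, built inline instead of read from disk
df1 = pd.DataFrame({'ID': [1001, 1002, 1003], 'Fruit': ['Apple', 'Banana', 'Kiwi']})
df2 = pd.DataFrame({'ID': [1001, 1005, 1006, 1007],
                    'Color': ['Green', 'Red', 'Orange', 'Yellow']})
df3 = pd.DataFrame({'ID': [1001, 1008, 1009, 1010],
                    'Size': ['Large', 'Small', 'Medium', 'Large']})
df4 = pd.DataFrame({'ID': [1002, 1009, 1010, 1011], 'Price': [20, 40, 30, 50]})

my_dfs = [df.set_index('ID') for df in [df1, df2, df3, df4]]
combined = reduce(lambda l, r: l.combine_first(r), my_dfs).sort_index()

# Restore the question's column order and move 'ID' back out of the index
master = combined[['Fruit', 'Color', 'Size', 'Price']].reset_index()
# Nullable integer dtype keeps 20 instead of 20.0 while still allowing missing values
master['Price'] = master['Price'].astype('Int64')
# Running row number starting at 1, placed as the first column
master.insert(0, 'Number', range(1, len(master) + 1))

# na_rep sets the text written for missing cells ('N/A' is an assumption; '' or 'NULL' also work)
master.to_csv('master.csv', index=False, na_rep='N/A')
```

Using the nullable 'Int64' dtype is optional; without it the prices are simply written as 20.0, 40.0 and so on.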

Something like this should work:

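import pandas as pd

list_of_csv_filenames = ['csv1.csv', 'csv2.csv', 'csv3.csv', 'csv4.csv']
all_dfs = []
for i, filename in enumerate(list_of_csv_filenames, start=1):
    temp = pd.read_csv(filename)
    temp['Number'] = i  # tags each row with the number of the file it came from
    all_dfs.append(temp)
# Stacks the files vertically: rows are appended, not aligned on 'ID'
full_df = pd.concat(all_dfs)
full_df.to_csv('output_filename.csv', index=False)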
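Concatenating with the default axis puts the files on top of each other, so an ID that appears in several files ends up on several rows. If you want pd.concat itself to build the aligned master table, one option (a different technique from the loop above) is to index each frame by 'ID' and concatenate along the columns with axis=1, which performs an outer join on the index by default. A minimal sketch, assuming every file has an 'ID' column with no duplicate IDs inside a single file; the inline frames stand in for the hypothetical csv1.csv to csv4.csv:

```python
import pandas as pd

# Stand-ins for [pd.read_csv(name) for name in list_of_csv_filenames]
frames = [
    pd.DataFrame({'ID': [1001, 1002, 1003], 'Fruit': ['Apple', 'Banana', 'Kiwi']}),
    pd.DataFrame({'ID': [1001, 1005, 1006, 1007],
                  'Color': ['Green', 'Red', 'Orange', 'Yellow']}),
    pd.DataFrame({'ID': [1001, 1008, 1009, 1010],
                  'Size': ['Large', 'Small', 'Medium', 'Large']}),
    pd.DataFrame({'ID': [1002, 1009, 1010, 1011], 'Price': [20, 40, 30, 50]}),
]

# axis=1 lines rows up by index, so index every frame by 'ID' first;
# sort_index makes the row order explicit instead of relying on concat's union order
aligned = (pd.concat([f.set_index('ID') for f in frames], axis=1)
             .sort_index()
             .rename_axis('ID')
             .reset_index())
# Running row number, as in the question's master file
aligned.insert(0, 'Number', range(1, len(aligned) + 1))
```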

Do it in three simple steps, as below.

**Step 1: Import packages and set the working directory**

Change “/mydir” to your desired working directory.

import os
import glob
import pandas as pd
os.chdir("/mydir")

**Step 2: Use glob to match the pattern '*.csv'**

Match the pattern ('*.csv') and save the list of file names in the 'all_filenames' variable. Note that glob uses Unix shell-style wildcards (*, ?, [...]), not regular expressions.

extension = 'csv'
# glob.glob already returns a list, so no list comprehension is needed
all_filenames = glob.glob('*.{}'.format(extension))
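Two details are worth knowing about this step: glob.glob returns names in whatever order the file system gives them, and if 'combined_csv.csv' from an earlier run is still in the folder, the '*.csv' pattern picks it up as an input. A minimal sketch that sorts the matches and skips the output file; the temporary folder and the file names in it are hypothetical, created only so the example runs anywhere:

```python
import glob
import os
import tempfile

output_name = 'combined_csv.csv'

# Hypothetical folder: two inputs, a leftover output from a previous run, and a non-CSV file
workdir = tempfile.mkdtemp()
for name in ['b.csv', 'a.csv', output_name, 'notes.txt']:
    with open(os.path.join(workdir, name), 'w') as fh:
        fh.write('ID\n1\n')

# '*.csv' is a shell-style wildcard, so notes.txt is not matched
matches = glob.glob(os.path.join(workdir, '*.csv'))
# Sort for a reproducible order and drop the output file so re-runs do not re-read it
all_filenames = sorted(p for p in matches if os.path.basename(p) != output_name)
print([os.path.basename(p) for p in all_filenames])  # ['a.csv', 'b.csv']
```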

**Step 3: Combine all files in the list and export as CSV**

Use pandas to concatenate all files in the list and export the result as CSV. The output file, “combined_csv.csv”, is written to your working directory.

# Combine all files in the list (rows are stacked, not aligned on 'ID')
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
# Export to CSV
combined_csv.to_csv("combined_csv.csv", index=False, encoding='utf-8-sig')

encoding='utf-8-sig' is added to avoid garbled characters when the exported file contains non-English text.
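What 'utf-8-sig' changes is that the file starts with the UTF-8 byte-order mark (the bytes EF BB BF); programs such as Excel typically use that mark to recognise the file as UTF-8 instead of a legacy code page. A small check, using a hypothetical one-row table and a temporary file:

```python
import os
import tempfile

import pandas as pd

# Hypothetical row containing non-ASCII text
df = pd.DataFrame({'ID': [1001], 'Fruit': ['Pomme (苹果)']})

path = os.path.join(tempfile.mkdtemp(), 'combined_csv.csv')
df.to_csv(path, index=False, encoding='utf-8-sig')

with open(path, 'rb') as fh:
    raw = fh.read()

# 'utf-8-sig' prefixes the file with the UTF-8 byte-order mark
print(raw[:3] == b'\xef\xbb\xbf')  # True
# Reading back with the same encoding strips the mark again
print(pd.read_csv(path, encoding='utf-8-sig').columns.tolist())  # ['ID', 'Fruit']
```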

And…it's done!

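Alternatively, without pandas, you can append the files line by line, keeping only the first file's header (this assumes every file has the same header):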

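# Python 3: f.next() no longer exists, use next(f); "with" closes the files
with open("out.csv", "a") as fout:
    # first file: copy everything, including its header
    with open("sh1.csv") as f:
        for line in f:
            fout.write(line)
    # now the rest: skip each file's header line
    for num in range(2, 201):
        with open("sh" + str(num) + ".csv") as f:
            next(f)  # skip the header
            for line in f:
                fout.write(line)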
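Appending line by line stacks the files; it does not line rows up by ID. If you want the question's master table without pandas, a minimal sketch with the standard csv module could look like this. It assumes each file is comma-separated with a header row containing an 'ID' column; the in-memory strings are hypothetical stand-ins for the question's four files:

```python
import csv
import io

# Stand-ins for open('csv1.csv'), ... ; contents mirror the question's tables
sources = [
    "ID,Fruit\n1001,Apple\n1002,Banana\n1003,Kiwi\n",
    "ID,Color\n1001,Green\n1005,Red\n1006,Orange\n1007,Yellow\n",
    "ID,Size\n1001,Large\n1008,Small\n1009,Medium\n1010,Large\n",
    "ID,Price\n1002,20\n1009,40\n1010,30\n1011,50\n",
]

rows = {}     # ID -> {column: value}
columns = []  # value columns in first-seen order
for text in sources:
    reader = csv.DictReader(io.StringIO(text))
    for name in reader.fieldnames:
        if name != 'ID' and name not in columns:
            columns.append(name)
    for record in reader:
        # Merge this file's values into the row for the same ID
        rows.setdefault(record['ID'], {}).update(
            {k: v for k, v in record.items() if k != 'ID'}
        )

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(['Number', 'ID'] + columns)
# IDs are strings here, so sort them numerically; fill gaps with 'N/A'
for number, id_ in enumerate(sorted(rows, key=int), start=1):
    writer.writerow([number, id_] + [rows[id_].get(c, 'N/A') for c in columns])

print(out.getvalue())
```

To write a real file, replace the StringIO with open('master.csv', 'w', newline='').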

You can use merge:

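import pandas as pd

df1 = pd.read_csv('1.csv')
df2 = pd.read_csv('2.csv')
df3 = pd.read_csv('3.csv')
df4 = pd.read_csv('4.csv')
# how='outer' keeps IDs that are missing from some files (their cells become NaN)
df = (df1.merge(df2, on='ID', how='outer')
         .merge(df3, on='ID', how='outer')
         .merge(df4, on='ID', how='outer'))
df.to_csv('result.csv', index=False)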
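merge joins on the columns the frames have in common (here only 'ID') and defaults to how='inner', so only IDs present in every file survive. With the question's data no ID appears in all four files, so a plain merge chain returns an empty table. A quick check, with the question's tables built inline (the variable names are illustrative):

```python
from functools import reduce

import pandas as pd

frames = [
    pd.DataFrame({'ID': [1001, 1002, 1003], 'Fruit': ['Apple', 'Banana', 'Kiwi']}),
    pd.DataFrame({'ID': [1001, 1005, 1006, 1007],
                  'Color': ['Green', 'Red', 'Orange', 'Yellow']}),
    pd.DataFrame({'ID': [1001, 1008, 1009, 1010],
                  'Size': ['Large', 'Small', 'Medium', 'Large']}),
    pd.DataFrame({'ID': [1002, 1009, 1010, 1011], 'Price': [20, 40, 30, 50]}),
]

# Default how='inner': keeps only IDs found in every frame
inner = frames[0].merge(frames[1]).merge(frames[2]).merge(frames[3])
print(len(inner))  # 0

# how='outer' keeps every ID; reduce folds the merge over any number of frames
outer = reduce(lambda l, r: l.merge(r, on='ID', how='outer'), frames)
outer = outer.sort_values('ID', ignore_index=True)
print(len(outer))  # 10
```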
