Merge specific column data from multiple csv files

Question

I have multiple (large) CSV files; call them 1.csv and 2.csv. Both have the same unique identifier column. For example, with the identifier name:

1.csv                     2.csv

name,age,height           name,gender
john,34,176               john,male
mary,19,183               kim,female
kim,27,157

From these CSV files I create two dataframes, df1 and df2.

The goal is to merge some of the data (not all columns), keeping only the people who exist in both CSV files:

result

name,age,gender
john,34,male
kim,27,female

To achieve this, I did the following:

names = df1['name'].tolist()

result_rows = []
for name_iter in names:
    # Look up this person's age in df1 and gender in df2
    age_df = df1[df1['name'] == name_iter][['age']]
    gender_df = df2[df2['name'] == name_iter][['gender']]

    # Skip people who are missing from the second file
    if gender_df.empty:
        continue

    age = age_df.values[0][0]
    gender = gender_df.values[0][0]
    row = [name_iter, age, gender]

    result_rows.append(row)

After that I have a list of lists (result_rows), which I write to a CSV file with Python's built-in csv module.

I think the code is hard to read and understand. Is there a simpler solution, i.e. one that avoids copying the data from the dataframes into a list structure for this task?

Answer 1

Consider using the pandas merge function.

import pandas as pd

# If 'name' is the only identifier in both DFs:
df3 = df1.merge(df2, on="name")

# Else, if 'name', 'age', and 'gender' are all present in both DFs, join on all of them:
df3 = df1.merge(df2, on=["name", "age", "gender"])
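Applied to the sample data from the question, the merge can be combined with a column selection so that only the wanted columns reach the result. This is a minimal sketch, assuming the two frames hold exactly the rows shown in the question; the name result is only illustrative. how="inner" is the default for merge and is spelled out here to make the "person exists in both files" condition explicit.

```python
import pandas as pd

# Sample frames mirroring 1.csv and 2.csv from the question
df1 = pd.DataFrame({
    "name": ["john", "mary", "kim"],
    "age": [34, 19, 27],
    "height": [176, 183, 157],
})
df2 = pd.DataFrame({
    "name": ["john", "kim"],
    "gender": ["male", "female"],
})

# Keep only the needed columns, then inner-join on the shared identifier.
# Rows whose name is missing from either frame (here: mary) are dropped.
result = df1[["name", "age"]].merge(df2[["name", "gender"]], on="name", how="inner")
print(result)
```

Selecting the columns before merging means height never enters the result, so nothing has to be deleted afterwards.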

Answer 2

import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'mary', 'kim'], 'age': [34, 19, 27], 'height': [176, 183, 157]})
df2 = pd.DataFrame({'name': ['john', 'kim'], 'gender': ['male', 'female']})
df = df2.merge(df1, on='name')
del df['height']  # drop the column that is not wanted in the result

Edit: if you don't want to delete this specific column, just select the columns you want to keep:

df=df[['gender','name','age']]
print(df)
   gender  name  age
0    male  john   34
1  female   kim   27
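Since the question mentions large CSV files and writing the result back out, the whole round trip can probably stay inside pandas. The sketch below is an assumption-laden illustration rather than the answerers' code: it uses in-memory io.StringIO buffers in place of 1.csv and 2.csv so it runs as-is, reads only the needed columns with usecols, and lets to_csv produce the output instead of the csv module.

```python
import io
import pandas as pd

# Stand-ins for 1.csv and 2.csv; with real files, pass the paths instead
csv1 = io.StringIO("name,age,height\njohn,34,176\nmary,19,183\nkim,27,157\n")
csv2 = io.StringIO("name,gender\njohn,male\nkim,female\n")

# usecols loads only the listed columns, which saves memory on large files
df1 = pd.read_csv(csv1, usecols=["name", "age"])
df2 = pd.read_csv(csv2, usecols=["name", "gender"])

# Inner join (the default) keeps only names present in both inputs
merged = df1.merge(df2, on="name")

# With no path argument, to_csv returns the CSV text as a string;
# pass a filename such as "result.csv" to write a file instead
csv_text = merged.to_csv(index=False)
print(csv_text)
```

index=False keeps the pandas row index out of the output, so the header matches the name,age,gender layout asked for in the question.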

solution1
2 ACCEPTED 2019-06-26 12:11:12

solution2
2 2019-06-26 12:11:43
