Combining csv files using pandas (merging and duplication)

The task I want to do is a bit complicated, so I'll try to explain it as clearly as I can.

I have two CSV files in the following format:

CSV1:

     Name     Var2 Var3
     John      6    7
     John      7    8
     Mike      5    6

CSV2:

    Name     Var4 Var5
    John      8    8
    John      9    9
    Mike      1    1
    Mike      2    2

What I essentially want to do is merge the files but in the following format:

    Name    Var2 Var3 Var4 Var5
    John      6    7   8    8
    John      6    7   9    9
    John      7    8   8    8
    John      7    8   9    9
    Mike      5    6   1    1
    Mike      5    6   2    2

In other words, each row in the first CSV is repeated once for every row in the second CSV that has the same Name, and the columns from that second-CSV row are appended to it.

I can check whether a name in CSV1 matches a name in CSV2, but I'm not sure how to proceed from there.

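You can use pd.merge (or the equivalent DataFrame.merge method). Merging on Name pairs every row of df1 with every row of df2 that has the same name, which is exactly the duplication you describe. With how='right', as in the session here, the rows come out grouped by df2's rows rather than in the order shown in the question, but the contents are the same.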

In [19]: df1
Out[19]: 
   Name  Var2  Var3
0  John     6     7
1  John     7     8
2  Mike     5     6

In [20]: df2
Out[20]: 
   Name  Var4  Var5
0  John     8     8
1  John     9     9
2  Mike     1     1
3  Mike     2     2

In [21]: df1.merge(df2, how='right', on='Name')
Out[21]: 
   Name  Var2  Var3  Var4  Var5
0  John     6     7     8     8
1  John     7     8     8     8
2  John     6     7     9     9
3  John     7     8     9     9
4  Mike     5     6     1     1
5  Mike     5     6     2     2
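
For completeness, here is a minimal end-to-end sketch of the same idea starting from CSV text rather than from ready-made DataFrames. It is an illustration, not part of the original answer: the in-memory strings stand in for the two files, the commented-out file names are hypothetical, and how="inner" plus an explicit sort are my own choices to get the row order shown in the question.

```python
import io

import pandas as pd

# In-memory stand-ins for the two files from the question.
# With real files you would call pd.read_csv("csv1.csv") and
# pd.read_csv("csv2.csv") instead (hypothetical file names).
csv1 = io.StringIO("Name,Var2,Var3\nJohn,6,7\nJohn,7,8\nMike,5,6\n")
csv2 = io.StringIO("Name,Var4,Var5\nJohn,8,8\nJohn,9,9\nMike,1,1\nMike,2,2\n")

df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2)

# Merge on Name: every df1 row is paired with every df2 row
# that carries the same Name (a many-to-many join).
merged = df1.merge(df2, on="Name", how="inner")

# Sort explicitly so the row order does not depend on the join type.
merged = merged.sort_values(
    ["Name", "Var2", "Var3", "Var4", "Var5"]
).reset_index(drop=True)

print(merged)

# To write the result back out (hypothetical file name):
# merged.to_csv("merged.csv", index=False)
```

Note that how="inner" drops names that appear in only one of the files. If rows without a match should be kept, how="left", how="right", or how="outer" would be the options to look at; the unmatched columns are then filled with NaN.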
