Merge dataframes on multiple columns - Python

I have two dataframes. df1:

    ID  Date                       Name       Volume Up
0   1   2019-02-01 to 2019-03-15   Call       50
1   1   2019-02-01 to 2019-03-15   Email      60.5
2   1   2019-02-01 to 2019-03-15   Radio      20
3   2   2019-02-01 to 2019-03-15   Call       5.5
4   2   2019-02-01 to 2019-03-15   Email      6.4
5   2   2019-02-01 to 2019-03-15   Radio      15

df2:

    ID  Date                       Name       Volume Down
0   1   2019-02-01 to 2019-03-15   Call       66
1   1   2019-02-01 to 2019-03-15   Email      50
2   1   2019-02-01 to 2019-03-15   Radio      40
3   2   2019-02-01 to 2019-03-15   Call       10
4   2   2019-02-01 to 2019-03-15   Email      12.2
5   2   2019-02-01 to 2019-03-15   Radio      7

I would like to merge both dataframes on the ID, Date and Name columns.

Currently, I am using:

merged = df1.merge(df2, on=['ID', 'Date', 'Name'])

But it only returns the 'Call' rows.

Expected output:

    ID  Date                       Name       Volume Up  Volume Down
0   1   2019-02-01 to 2019-03-15   Call       50         66
1   1   2019-02-01 to 2019-03-15   Email      60.5       50
2   1   2019-02-01 to 2019-03-15   Radio      20         40
3   2   2019-02-01 to 2019-03-15   Call       5.5        10
4   2   2019-02-01 to 2019-03-15   Email      6.4        12.2
5   2   2019-02-01 to 2019-03-15   Radio      15         7

What is the best way to go about this?

Update: Your new sample data still merges fine for me, so it's possible your real columns have spurious whitespace. Try to strip() them before merging:

df1.Date = df1.Date.str.strip()
df2.Date = df2.Date.str.strip()
df1.Name = df1.Name.str.strip()
df2.Name = df2.Name.str.strip()

merged = df1.merge(df2, on=['ID', 'Date', 'Name'])

#    ID                      Date   Name  Volume Up  Volume Down
# 0   1  2019-02-01 to 2019-03-15   Call       50.0         66.0
# 1   1  2019-02-01 to 2019-03-15  Email       60.5         50.0
# 2   1  2019-02-01 to 2019-03-15  Radio       20.0         40.0
# 3   2  2019-02-01 to 2019-03-15   Call        5.5         10.0
# 4   2  2019-02-01 to 2019-03-15  Email        6.4         12.2
# 5   2  2019-02-01 to 2019-03-15  Radio       15.0          7.0
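If it is unclear which key values fail to match, an outer merge with indicator=True can show them. The following is a minimal sketch on made-up data: the trailing spaces in df2's Name column are an assumption chosen to reproduce the "only 'Call' is returned" symptom, not something confirmed about the real data.

```python
import pandas as pd

# Hypothetical data: two Name values in df2 carry a trailing space.
df1 = pd.DataFrame({
    "ID": [1, 1, 1],
    "Date": ["2019-02-01 to 2019-03-15"] * 3,
    "Name": ["Call", "Email", "Radio"],
    "Volume Up": [50, 60.5, 20],
})
df2 = pd.DataFrame({
    "ID": [1, 1, 1],
    "Date": ["2019-02-01 to 2019-03-15"] * 3,
    "Name": ["Call", "Email ", "Radio "],  # note the trailing spaces
    "Volume Down": [66, 50, 40],
})

keys = ["ID", "Date", "Name"]

# The default inner merge silently drops rows whose keys do not match,
# so only the 'Call' row survives here.
inner = df1.merge(df2, on=keys)

# An outer merge with indicator=True adds a '_merge' column that says
# whether each row came from the left frame, the right frame, or both.
check = df1.merge(df2, on=keys, how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

# repr() makes otherwise invisible whitespace visible.
print(unmatched["Name"].map(repr).tolist())

# Strip the string key columns in both frames, then merge again.
for df in (df1, df2):
    for col in ("Date", "Name"):
        df[col] = df[col].str.strip()

fixed = df1.merge(df2, on=keys)
print(fixed)
```

If the unmatched rows differ in some other way (for example ID stored as a string in one frame and as an integer in the other), comparing df1.dtypes with df2.dtypes is a reasonable next check.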

You can give merge() a list of columns to merge on:

out = df1.merge(df2, on=['ID', 'Date', 'Name'])

#    ID        Date     Name  Volume Up  Volume Down
# 0   1  2019-01-01   Amazon       50.0         33.0
# 1   1  2019-02-01  Netflix       60.5         67.0
# 2   2  2019-01-01   Amazon        5.5          3.5
# 3   2  2019-02-01  Netflix       20.0         47.0
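For a self-contained check, the frames can be rebuilt from the sample data in the question and merged on the same three keys. This is only a sketch: the values are typed in by hand from the question, and validate="one_to_one" is an optional extra (not part of the original answer) that makes pandas raise an error if a key combination appears more than once in either frame.

```python
import pandas as pd

date_range = "2019-02-01 to 2019-03-15"

# Sample data copied by hand from the question.
df1 = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "Date": [date_range] * 6,
    "Name": ["Call", "Email", "Radio"] * 2,
    "Volume Up": [50, 60.5, 20, 5.5, 6.4, 15],
})
df2 = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "Date": [date_range] * 6,
    "Name": ["Call", "Email", "Radio"] * 2,
    "Volume Down": [66, 50, 40, 10, 12.2, 7],
})

# Merge on all three key columns; how="inner" is the default and is
# spelled out here only for clarity.
out = df1.merge(
    df2,
    on=["ID", "Date", "Name"],
    how="inner",
    validate="one_to_one",  # optional: fail loudly on duplicate keys
)
print(out)
```

With clean keys like these, every row finds its partner and the result has one row per (ID, Date, Name) combination.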
