How to combine two pandas dataframes of different lengths based on their row values

I have the following two pandas dataframes:

dataframe #1:

    user_id       animals    
0         1         'dog'
1         1         'cat'
2         1         'cow'
3         2         'dog'
4         2         'cat'
5         2         'cow'
...

dataframe #2: (column_D is not important in this task)

    location     column_D
0       'CA'            1
1       'MA'            1
2       'AZ'            1
3       'CT'            1
...

I hope to create a new dataframe #3 based on #1 and #2:

dataframe #3:

    user_id       animals       location
0         1         'dog'           'MA'
1         1         'cat'           'MA'
2         1         'cow'           'MA'
3         2         'dog'           'AZ'
4         2         'cat'           'AZ'
5         2         'cow'           'AZ'
...

The first and second columns of dataframe #3 are identical to dataframe #1. For the third column, I hope to assign a location based on its user_id and the index in dataframe #2. For example, for row 0 in dataframe #3, since its user_id = 1, I will check the location in dataframe #2 with index = 1, then assign that location ('MA' in this example) to the user.

I've searched for examples that use functions such as concat, map, and merge, but couldn't find one that matches this case. Is there a way to achieve this?

Thank you so much!

Try map, which looks up each user_id in the index of df2.location:

df1["location"] = df1.user_id.map(df2.location)

    user_id animals location
0      1    'dog'   'MA'
1      1    'cat'   'MA'
2      1    'cow'   'MA'
3      2    'dog'   'AZ'
4      2    'cat'   'AZ'
5      2    'cow'   'AZ'
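
For reference, here is a self-contained sketch of the same map approach, assuming the frames are named df1 and df2 as in the question. Passing a Series to map makes pandas look each user_id up in that Series' index, which is why user_id = 1 picks up the location at index 1. The last row (user_id = 7) is a hypothetical addition, not part of the original data, included only to show that an id with no matching index in df2 comes back as NaN rather than raising an error. The result is written to a copy so that df1 stays unchanged.

```python
import pandas as pd

# Frames from the question; the user_id 7 row is a hypothetical extra
# added to show what happens when df2 has no matching index.
df1 = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 7],
    "animals": ["dog", "cat", "cow", "dog", "cat", "cow", "dog"],
})
df2 = pd.DataFrame({
    "location": ["CA", "MA", "AZ", "CT"],
    "column_D": [1, 1, 1, 1],
})

# Work on a copy so df1 itself is left untouched.
df3 = df1.copy()

# Each user_id is treated as a label and looked up in df2's index.
df3["location"] = df3["user_id"].map(df2["location"])

print(df3)
```

If ids without a match should be filled with a default instead of NaN, chaining .fillna(...) after map is one option.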

Based on your question, I'm assuming that you want to drop column_D after the join. The following code works.

#Import Pandas
import pandas as pd

#Create dataframes: df1 and df2
df1 = pd.DataFrame({'user_id':{0: 1, 1:1, 2:1, 3:2, 4:2, 5:2}, 'animals':{0:'dog', 1:'cat', 2:'cow', 3:'dog', 4:'cat', 5:'cow'}})
df2 = pd.DataFrame({'location':{0:'CA', 1:'MA', 2:'AZ', 3:'CT'}, 'column_D':{0:1, 1:1, 2:1, 3:1}})

#Match each 'user_id' in df1 against the index of df2, then drop column_D
#(the positional form drop('column_D', 1) no longer works in pandas 2.x)
df3 = df1.join(df2, on='user_id').drop(columns='column_D')

#Display new dataframe
df3
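
Since the question also mentions merge, here is a minimal sketch of the same lookup written with merge, assuming the df1 and df2 frames from the question. It is an alternative to the join above, not a replacement for it: left_on names the key column in df1, right_index=True tells pandas to match against df2's index, and selecting only the location column beforehand avoids having to drop column_D afterwards.

```python
import pandas as pd

# Frames as given in the question.
df1 = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "animals": ["dog", "cat", "cow", "dog", "cat", "cow"],
})
df2 = pd.DataFrame({
    "location": ["CA", "MA", "AZ", "CT"],
    "column_D": [1, 1, 1, 1],
})

# Keep only the column we need from df2, then match df1.user_id
# against df2's index. how="left" keeps every row of df1 in order.
df3 = df1.merge(
    df2[["location"]],
    left_on="user_id",
    right_index=True,
    how="left",
)

print(df3)
```

The how="left" argument keeps rows of df1 whose user_id has no match in df2 (with NaN as the location); the default inner merge would drop them.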
