How to combine two pandas dataframes with different length based on their row values

I have the following two pandas dataframes:

dataframe #1:

    user_id       animals    
0         1         'dog'
1         1         'cat'
2         1         'cow'
3         2         'dog'
4         2         'cat'
5         2         'cow'
...

dataframe #2: (column_D is not important in this task)

    location     column_D
0       'CA'            1
1       'MA'            1
2       'AZ'            1
3       'CT'            1
...

I hope to create a new dataframe #3 based on #1 and #2:

dataframe #3:

    user_id       animals       location
0         1         'dog'           'MA'
1         1         'cat'           'MA'
2         1         'cow'           'MA'
3         2         'dog'           'AZ'
4         2         'cat'           'AZ'
5         2         'cow'           'AZ'
...

The first and second columns of dataframe #3 are identical to dataframe #1. For the third column, I hope to assign a location based on the row's user_id and the index of dataframe #2. For example, for row 0 in dataframe #3, since its user_id = 1, I will look up the location in dataframe #2 at index = 1, then assign that location ('MA' in this example) to the user.

I've searched for examples that use functions such as concat, map, and merge, but couldn't find any that match this case. Is there a way to achieve this task?

Thank you so much!

Try map:

df1["location"] = df1.user_id.map(df2.location)

    user_id animals location
0      1    'dog'   'MA'
1      1    'cat'   'MA'
2      1    'cow'   'MA'
3      2    'dog'   'AZ'
4      2    'cat'   'AZ'
5      2    'cow'   'AZ'
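To make the lookup explicit, here is a minimal self-contained sketch of the same idea. The frames are rebuilt from the tables in the question, and writing the result into a copy named df3 is a choice made here, not part of the answer above. Passing a Series to map looks each user_id up in that Series' index, so an id with no matching index label comes back as NaN; the id 99 below is a made-up value used only to show that.

```python
import pandas as pd

# Data rebuilt from the tables in the question
df1 = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "animals": ["dog", "cat", "cow", "dog", "cat", "cow"],
})
df2 = pd.DataFrame({
    "location": ["CA", "MA", "AZ", "CT"],
    "column_D": [1, 1, 1, 1],
})

# Work on a copy so df1 itself is left unchanged
df3 = df1.copy()

# Each user_id is used as a label into df2["location"]'s index
df3["location"] = df3["user_id"].map(df2["location"])

# Hypothetical id 99 has no matching index label in df2, so it maps to NaN
missing = pd.Series([1, 99]).map(df2["location"])
```

If every user_id is expected to have a location, checking df3["location"].isna().any() after the map is a cheap way to catch ids that fell outside df2's index.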

Based on your question, I've assumed that you want to drop column_D after the merge. The following code works.

# Import pandas
import pandas as pd

# Create dataframes: df1 and df2
df1 = pd.DataFrame({'user_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}, 'animals': {0: 'dog', 1: 'cat', 2: 'cow', 3: 'dog', 4: 'cat', 5: 'cow'}})
df2 = pd.DataFrame({'location': {0: 'CA', 1: 'MA', 2: 'AZ', 3: 'CT'}, 'column_D': {0: 1, 1: 1, 2: 1, 3: 1}})

# Match each 'user_id' value in df1 against df2's index, then drop column_D
# (drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts)
df3 = df1.join(df2, on='user_id').drop(columns='column_D')

# Display new dataframe
df3
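Since the question also mentions merge, the same result can probably be reached that way too. The sketch below is an alternative, not the method used in the answer above: it selects only the location column from df2 up front, so there is nothing to drop afterwards, and it matches df1's user_id column against df2's index via left_on and right_index. The frames are rebuilt from the tables in the question.

```python
import pandas as pd

# Data rebuilt from the tables in the question
df1 = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "animals": ["dog", "cat", "cow", "dog", "cat", "cow"],
})
df2 = pd.DataFrame({
    "location": ["CA", "MA", "AZ", "CT"],
    "column_D": [1, 1, 1, 1],
})

# Keep only the column we need from df2, then match df1.user_id
# against df2's index; how="left" keeps every row of df1
df3 = df1.merge(
    df2[["location"]],
    left_on="user_id",
    right_index=True,
    how="left",
)
```

With how="left", a user_id that has no matching index label in df2 stays in the result with NaN as its location rather than being dropped.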
