Pandas: Add a new column in a data frame based on a value in another data frame

I have two data frames: one with a userId and gender, and another with the online activities of these users.

First data frame (df1)

userId, gender
001, F
002, M
003, F
004, M
005, M
006, M

Second data frame (df2)

userId, itemClicked, ItemBought, date
001, 123182, 123212, 02/02/2016
003, 234256, 123182, 05/02/2016
005, 986834, 234256, 04/19/2016
004, 787663, 787663, 05/12/2016
020, 465738, 465738, 03/20/2016
004, 787223, 787663, 07/12/2016

I want to add a gender column to the second data frame by looking up each userId in the first data frame. df2 might have multiple rows per user, since it is click data where the same user may have clicked multiple items.

This is very easy to do in MySQL, but I am trying to figure out how to do it in pandas.

for index, row in df2.iterrows():
    user_id = row['userId']
    # `in` on a Series tests the index labels, so compare against the values
    if user_id in df1['userId'].values:
        t = df1.loc[df1['userId'] == user_id]
        pdb.set_trace()

Is this the pandas way to do such a task?

print (df1)
   userId gender
0       1      F
1       2      M
2       3      F
3       4      M
4       5      M
5       6      M

print (df2)
   userId  itemClicked  ItemBought        date
0       1       123182      123212  02/02/2016
1       3       234256      123182  05/02/2016
2       5       986834      234256  04/19/2016
3       4       787663      787663  05/12/2016
4      20       465738      465738  03/20/2016
5       4       787223      787663  07/12/2016

You can use map:

df2['gender'] = df2.userId.map(df1.set_index('userId')['gender'].to_dict())

print (df2)
   userId  itemClicked  ItemBought        date gender
0       1       123182      123212  02/02/2016      F
1       3       234256      123182  05/02/2016      F
2       5       986834      234256  04/19/2016      M
3       4       787663      787663  05/12/2016      M
4      20       465738      465738  03/20/2016    NaN
5       4       787223      787663  07/12/2016      M
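
For reference, here is a self-contained sketch of the map approach that rebuilds the two sample frames from the question. It passes the Series returned by set_index straight to map, which should give the same result as the dict version above. The name gender_by_user is only illustrative, and the sketch assumes userId values are unique in df1.

```python
import pandas as pd

# Sample data from the question (userId read as integers).
df1 = pd.DataFrame({
    'userId': [1, 2, 3, 4, 5, 6],
    'gender': ['F', 'M', 'F', 'M', 'M', 'M'],
})
df2 = pd.DataFrame({
    'userId': [1, 3, 5, 4, 20, 4],
    'itemClicked': [123182, 234256, 986834, 787663, 465738, 787223],
    'ItemBought': [123212, 123182, 234256, 787663, 465738, 787663],
    'date': ['02/02/2016', '05/02/2016', '04/19/2016',
             '05/12/2016', '03/20/2016', '07/12/2016'],
})

# Lookup Series indexed by userId (hypothetical name, not from the answer).
gender_by_user = df1.set_index('userId')['gender']

# map looks up each userId of df2 in the lookup index;
# ids that are missing from df1 (here 20) become NaN.
df2['gender'] = df2['userId'].map(gender_by_user)
print(df2)
```

If a placeholder is preferred over NaN for unknown users, chaining .fillna('unknown') after map is one option.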

Another solution uses merge with a left join. The on parameter can be omitted if userId is the only column name the two DataFrames have in common:

df = pd.merge(df2, df1, how='left')

print (df)
   userId  itemClicked  ItemBought        date gender
0       1       123182      123212  02/02/2016      F
1       3       234256      123182  05/02/2016      F
2       5       986834      234256  04/19/2016      M
3       4       787663      787663  05/12/2016      M
4      20       465738      465738  03/20/2016    NaN
5       4       787223      787663  07/12/2016      M
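
A slightly more defensive variant of the merge, as a sketch: it names the key explicitly with on, asks pandas to check that df1 has at most one row per userId with validate, and uses indicator to flag rows that found no match. The variable names merged and unmatched are illustrative, and df2 is trimmed to two columns to keep the example short.

```python
import pandas as pd

df1 = pd.DataFrame({
    'userId': [1, 2, 3, 4, 5, 6],
    'gender': ['F', 'M', 'F', 'M', 'M', 'M'],
})
# Trimmed version of the question's df2.
df2 = pd.DataFrame({
    'userId': [1, 3, 5, 4, 20, 4],
    'itemClicked': [123182, 234256, 986834, 787663, 465738, 787223],
})

merged = pd.merge(
    df2, df1,
    on='userId',             # explicit join key
    how='left',              # keep every row of df2
    validate='many_to_one',  # raise if userId is duplicated in df1
    indicator=True,          # adds a '_merge' column describing each row's origin
)

# Rows of df2 whose userId does not exist in df1.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'userId']
print(merged)
print(unmatched.tolist())
```

Naming the key explicitly also guards against the two frames later gaining another column with the same name, which would silently change what an implicit merge joins on.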

Timings:

#len(df2) = 600k
df2 = pd.concat([df2]*100000).reset_index(drop=True)

def f(df1,df2):
    df2['gender'] = df2.userId.map(df1.set_index('userId')['gender'].to_dict())
    return df2


In [43]: %timeit f(df1,df2)
10 loops, best of 3: 34.2 ms per loop

In [44]: %timeit (pd.merge(df2, df1, how='left'))
10 loops, best of 3: 102 ms per loop
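
The %timeit magic only works in IPython. Outside of it, a rough comparison could be sketched with the standard timeit module as below. The helper names with_map and with_merge are hypothetical, the frame is trimmed to the userId column, and the absolute numbers will depend on the machine and pandas version, so none are quoted here.

```python
import timeit
import pandas as pd

df1 = pd.DataFrame({
    'userId': [1, 2, 3, 4, 5, 6],
    'gender': ['F', 'M', 'F', 'M', 'M', 'M'],
})
small = pd.DataFrame({'userId': [1, 3, 5, 4, 20, 4]})
# Repeat the six sample rows to get 600k rows, as in the timings above.
big = pd.concat([small] * 100000, ignore_index=True)

def with_map(lookup, clicks):
    # Work on a copy so repeated runs start from the same input.
    out = clicks.copy()
    out['gender'] = out['userId'].map(lookup.set_index('userId')['gender'])
    return out

def with_merge(lookup, clicks):
    return pd.merge(clicks, lookup, on='userId', how='left')

# Sanity check: both approaches should produce the same gender column.
same = with_map(df1, big)['gender'].equals(with_merge(df1, big)['gender'])

# Total seconds for 3 runs of each approach.
map_seconds = timeit.timeit(lambda: with_map(df1, big), number=3)
merge_seconds = timeit.timeit(lambda: with_merge(df1, big), number=3)
print(same, map_seconds, merge_seconds)
```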

If userId is the index of both data frames, you can use:

df2.join(df1)
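
In the question userId is an ordinary column, so the join needs a little setup. A minimal sketch of two ways this might be done, with illustrative variable names and df2 trimmed to two columns:

```python
import pandas as pd

df1 = pd.DataFrame({
    'userId': [1, 2, 3, 4, 5, 6],
    'gender': ['F', 'M', 'F', 'M', 'M', 'M'],
})
# Trimmed version of the question's df2.
df2 = pd.DataFrame({
    'userId': [1, 3, 5, 4, 20, 4],
    'itemClicked': [123182, 234256, 986834, 787663, 465738, 787223],
})

# Option 1: make userId the index of both frames, then join index to index.
# join performs a left join by default, so all rows of df2 are kept.
joined = df2.set_index('userId').join(df1.set_index('userId'))

# Option 2: keep userId as a column in df2 and match it
# against the index of df1 via the `on` parameter.
joined_on = df2.join(df1.set_index('userId'), on='userId')

print(joined)
print(joined_on)
```

Option 2 keeps the original row index and column layout of df2, which is usually closer to what the question asks for.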

You can try this:

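for index, row in df1.iterrows():
    for ind, r in df2.iterrows():
        if r['userId'] == row['userId']:
            # set_value was removed in pandas 1.0; .loc also creates the column if needed
            df2.loc[ind, 'gender'] = row['gender']
            # no break here: a user can have several rows in df2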
