Assign values to a pandas column based on data in another dataframe

Question

I have 2 dataframes:

df1
ID ID2 NUMBER
1 2 null

df2
ID ID2 NUMBER 
1 2 1
1 2 2
1 2 3

So when I merge df1 and df2 on ID and ID2, I get duplicated rows because the df1 row has 3 matches in df2. I'd like to assign a random number to df1 and use it for merging, so that I always get a 1-to-1 merge. The problem is that my dataset is rather big: sometimes there is only 1 matching row in df2 (so the merge works properly) and sometimes there are 10+ rows in df2. I'd like to assign a number to df1 using something like:

random.randint(1, len(df1[(df1.ID == 1) & (df1.ID2 == 2)]))
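For reference, here is a minimal vectorized sketch of the idea described above, not a definitive implementation. It assumes the random number should range from 1 to the number of matching rows in df2, and that every key pair in df1 has at least one match in df2. The names `keys`, `n_matches`, `pick`, `picked` and `df2_numbered` are made up for illustration, and the fixed seed is only there to make the sketch repeatable.

```python
import numpy as np
import pandas as pd

# Sample frames from the question; NUMBER in df1 starts out empty.
df1 = pd.DataFrame({"ID": [1], "ID2": [2], "NUMBER": [np.nan]})
df2 = pd.DataFrame({"ID": [1, 1, 1], "ID2": [2, 2, 2], "NUMBER": [1, 2, 3]})

keys = ["ID", "ID2"]

# A plain merge on the keys returns one row per match in df2 (3 rows here).
plain = df1[keys].merge(df2, on=keys)

# Count the df2 rows for every key pair (hypothetical column name n_matches).
counts = df2.groupby(keys).size().rename("n_matches").reset_index()

# Attach the count to df1 and draw one integer in [1, n_matches] per row.
# Assumption: every df1 key pair exists in df2, so n_matches is never missing.
rng = np.random.default_rng(0)  # fixed seed only for repeatability
picked = df1[keys].merge(counts, on=keys, how="left")
upper = picked["n_matches"].to_numpy().astype("int64") + 1  # upper bound is exclusive
picked["pick"] = rng.integers(1, upper)

# Number the df2 rows 1..n inside each key pair so "pick" has a counterpart.
df2_numbered = df2.assign(pick=df2.groupby(keys).cumcount() + 1)

# Merging on the keys plus "pick" yields exactly one df2 row per df1 row.
result = picked.merge(df2_numbered, on=keys + ["pick"], how="left")
```

Because the counting, the random draw and the row numbering are all done column-wise, this avoids calling a Python function once per group, which may matter on a large dataset.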

Answer 1

I think I found a solution. I'm posting it here so others can tell me if there is a better way.

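import random

def select_random_row(grp):
    # grp.name holds this group's (ID, ID2) key
    ID, ID2 = grp.name
    # random integer from 1 to the number of df1 rows with the same key
    return random.randint(1, len(df1[(df1.ID == ID) & (df1.ID2 == ID2)]))

# apply() returns one value per (ID, ID2) group, so join it back on the keys
picks = df2.groupby(['ID', 'ID2']).apply(select_random_row).rename('g')
df2 = df2.join(picks, on=['ID', 'ID2'])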

EDIT: This is way too slow to do on a large dataset. I decided to just use drop_duplicates before merging and keep the first record. It isn't random like I wanted, but it is better than nothing.
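The drop_duplicates workaround might look roughly like the sketch below. It also shows one possible way to get a random row per key without a per-group apply: shuffle df2 once and then keep the first row per key pair. This is an illustration under assumptions, not the code actually used: the names `keys`, `first_rows`, `shuffled` and `random_rows` are made up, NUMBER is left out of df1 to avoid suffixed columns after the merge, and `random_state` is fixed only for repeatability.

```python
import pandas as pd

# Sample frames modeled on the question (df1 without its empty NUMBER column).
df1 = pd.DataFrame({"ID": [1], "ID2": [2]})
df2 = pd.DataFrame({"ID": [1, 1, 1], "ID2": [2, 2, 2], "NUMBER": [1, 2, 3]})
keys = ["ID", "ID2"]

# Workaround from the edit: keep the first df2 row per key pair, then merge.
first_rows = df2.drop_duplicates(subset=keys, keep="first")
merged_first = df1.merge(first_rows, on=keys, how="left")

# Random variant: shuffle df2 once, then keep the first row per key pair.
shuffled = df2.sample(frac=1, random_state=0)
random_rows = shuffled.drop_duplicates(subset=keys, keep="first")
merged_random = df1.merge(random_rows, on=keys, how="left")
```

Both merges return one row per df1 row; the second simply lets the shuffle decide which df2 row survives.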

Answer 1
0 2018-07-18 08:08:32
