Assign value to pandas column based on data in another dataframe
I have 2 dataframes:
df1
ID ID2 NUMBER
1 2 null
df2
ID ID2 NUMBER
1 2 1
1 2 2
1 2 3
When I merge df1 and df2 using ID and ID2, I get duplicated rows because the single df1 row has 3 matches in df2. I'd like to assign a random number to df1 and use it for merging, so that I always get a 1-to-1 merge. The problem is that my dataset is rather big: sometimes there is only 1 matching row in df2 (so the merge works properly) and sometimes there are 10+ rows in df2. I'd like to assign a number to df1 using something like:
random.randint(1, len(df1[(df1.ID == 1) & (df1.ID2 == 2)]))
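To make the problem concrete, here is a minimal reproduction using the sample frames from the question. It is only a sketch: the `None` placeholder for the null NUMBER and the choice to drop that column before merging are assumptions made for illustration.

```python
import pandas as pd

# Sample frames from the question; None stands in for the null NUMBER in df1
df1 = pd.DataFrame({"ID": [1], "ID2": [2], "NUMBER": [None]})
df2 = pd.DataFrame({"ID": [1, 1, 1], "ID2": [2, 2, 2], "NUMBER": [1, 2, 3]})

# Merge on the keys only; drop df1's empty NUMBER so the column names do not clash
merged = df1.drop(columns="NUMBER").merge(df2, on=["ID", "ID2"], how="left")

# The single df1 row fans out into one row per match in df2
print(len(df1), len(merged))  # prints: 1 3
```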
I think I found a solution. I'm posting it here so others can tell me if there is a better way.
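import random

def select_random_row(grp):
    # Both key columns are constant within a group, so read them from the first row
    ID = grp.ID.iloc[0]
    ID2 = grp.ID2.iloc[0]
    # Draw an integer between 1 and the number of df1 rows sharing these keys
    return random.randint(1, len(df1[(df1.ID == ID) & (df1.ID2 == ID2)]))

df2['g'] = df2.groupby(['ID', 'ID2']).apply(select_random_row)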
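A vectorized version of the same idea might look like the sketch below. It is not the code above: it numbers the df2 rows within each key group with `cumcount`, draws one random index per df1 row with NumPy, and merges on the keys plus that index. The sample data, the helper columns `g` and `n`, the seed, and the assumption that every df1 key has at least one match in df2 are all illustrative.

```python
import numpy as np
import pandas as pd

keys = ["ID", "ID2"]
# Illustrative data: one key with 3 matches in df2, one key with a single match
df1 = pd.DataFrame({"ID": [1, 5], "ID2": [2, 6]})
df2 = pd.DataFrame({"ID": [1, 1, 1, 5], "ID2": [2, 2, 2, 6], "NUMBER": [1, 2, 3, 9]})

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Number the df2 rows 0..n-1 within each key group
df2 = df2.assign(g=df2.groupby(keys).cumcount())

# Count the matches per key and attach that count to df1
# (assumes every df1 key occurs in df2, so n is never missing)
counts = df2.groupby(keys).size().rename("n").reset_index()
df1 = df1.merge(counts, on=keys, how="left")

# Draw one integer in [0, n) for each df1 row
df1["g"] = rng.integers(0, df1["n"].to_numpy())

# Merging on the keys plus g now matches exactly one df2 row per df1 row
merged = df1.drop(columns="n").merge(df2, on=keys + ["g"], how="left")
```

Because there is no Python-level function call per group, this should scale better than `groupby(...).apply(...)` with a filter on the other frame inside, though that has not been measured on the dataset in question.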
EDIT: This is way too slow on a large dataset... I decided to just use drop_duplicates before merging and keep the 1st record. It isn't random like I wanted, but it is better than nothing.
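The drop_duplicates approach could be sketched as follows, together with one possible way to make the kept row random: shuffle df2 with `sample(frac=1)` before dropping duplicates. The sample data and the `random_state` value are assumptions for illustration.

```python
import pandas as pd

keys = ["ID", "ID2"]
df1 = pd.DataFrame({"ID": [1], "ID2": [2]})
df2 = pd.DataFrame({"ID": [1, 1, 1], "ID2": [2, 2, 2], "NUMBER": [1, 2, 3]})

# Deterministic: keep the first df2 row per key, then merge one-to-one
first = df2.drop_duplicates(subset=keys, keep="first")
merged_first = df1.merge(first, on=keys, how="left")

# Random: shuffle df2 first, so the "first" row per key is a random one
shuffled = df2.sample(frac=1, random_state=0)  # arbitrary seed
random_pick = shuffled.drop_duplicates(subset=keys)
merged_random = df1.merge(random_pick, on=keys, how="left")
```

Note that with the shuffled variant, all df1 rows sharing a key receive the same randomly chosen df2 row, since only one row per key survives.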