Assign value to pandas column based on data in another dataframe

I have two dataframes:

df1
ID ID2 NUMBER
1 2 null

df2
ID ID2 NUMBER 
1 2 1
1 2 2
1 2 3
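The two tables above can be rebuilt in pandas to reproduce the problem. This is a minimal sketch that assumes df1's null NUMBER is stored as NaN. A plain merge on ID and ID2 turns df1's single row into three rows, one per matching df2 row.

```python
import numpy as np
import pandas as pd

# Sample data from the question; df1's NUMBER is missing (NaN)
df1 = pd.DataFrame({"ID": [1], "ID2": [2], "NUMBER": [np.nan]})
df2 = pd.DataFrame({"ID": [1, 1, 1], "ID2": [2, 2, 2], "NUMBER": [1, 2, 3]})

# One df1 row matches three df2 rows, so the merge produces three rows
merged = df1.merge(df2, on=["ID", "ID2"], suffixes=("_df1", "_df2"))
print(merged)
print(len(merged))
```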

So when I merge df1 and df2 on ID and ID2, I get duplicated rows, because the single df1 row has 3 matches in df2. I'd like to assign a random number to each df1 row and use it as an extra merge key, so that I always get a one-to-one merge. The problem is that my dataset is rather big: sometimes there is only 1 matching row in df2 (so the merge works properly) and sometimes there are 10+ rows in df2. I'd like to assign the number to df1 using something like:

random.randint(1, len(df2[(df2.ID == 1) & (df2.ID2 == 2)]))
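Evaluating that expression separately for every df1 row means filtering df2 once per row, which gets slow on a large dataset. Below is a vectorized sketch of the same idea. It assumes NumPy's random `Generator` in place of `rand` and that every df1 key also appears in df2. The helper columns `n` and `g` are illustrative names, not from the original post. The sketch counts matching df2 rows per key, draws one random number in 1..n per df1 row, numbers df2's rows 1..n with `cumcount`, and merges on the keys plus that number.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [1], "ID2": [2], "NUMBER": [np.nan]})
df2 = pd.DataFrame({"ID": [1, 1, 1], "ID2": [2, 2, 2], "NUMBER": [1, 2, 3]})
keys = ["ID", "ID2"]
rng = np.random.default_rng(0)  # seeded only so the example is repeatable

# Count the df2 rows for each (ID, ID2) pair and attach that count to df1
counts = df2.groupby(keys).size().rename("n").reset_index()
df1 = df1.merge(counts, on=keys, how="left")

# Draw one random number in 1..n per df1 row (assumes every df1 key exists in df2)
df1["g"] = rng.integers(1, df1["n"].to_numpy() + 1)

# Number the df2 rows 1..n within each (ID, ID2) group
df2["g"] = df2.groupby(keys).cumcount() + 1

# Merging on the keys plus g matches at most one df2 row per df1 row
result = df1.drop(columns="n").merge(
    df2, on=keys + ["g"], how="left", suffixes=("_df1", "_df2")
)
print(result)
```

Because `high` in `Generator.integers` is exclusive and broadcasts over an array, `n + 1` gives each row its own upper bound in a single call.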

I think I found a solution. I'm posting it here so others can tell me if there is a better way.

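import random

def select_random_row(row):
    # Count the df2 rows that share this df1 row's (ID, ID2) pair
    n = len(df2[(df2.ID == row.ID) & (df2.ID2 == row.ID2)])
    # Pick one of them at random (1-based); no match means no number
    return random.randint(1, n) if n else None

# One random pick per df1 row, plus a 1-based counter per (ID, ID2) in df2
df1['g'] = df1.apply(select_random_row, axis=1)
df2['g'] = df2.groupby(['ID', 'ID2']).cumcount() + 1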
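If the real goal is one random df2 match per df1 row, pandas can pick the random row directly. This avoids calling a Python function for each row or group. The sketch below assumes pandas 1.1 or later, where `DataFrameGroupBy.sample` is available. `validate="many_to_one"` makes pandas raise an error if the right side still has duplicate keys.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [1], "ID2": [2], "NUMBER": [np.nan]})
df2 = pd.DataFrame({"ID": [1, 1, 1], "ID2": [2, 2, 2], "NUMBER": [1, 2, 3]})
keys = ["ID", "ID2"]

# Keep exactly one randomly chosen df2 row per (ID, ID2) pair
# (random_state only makes the example repeatable)
one_per_key = df2.groupby(keys).sample(n=1, random_state=0)

# The right side is now unique on the keys, so each df1 row gets at most one match
result = df1.merge(
    one_per_key,
    on=keys,
    how="left",
    suffixes=("_df1", "_df2"),
    validate="many_to_one",
)
print(result)
```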

EDIT: This is way too slow on a large dataset, so I decided to just use drop_duplicates before merging and keep the first record. It isn't random like I wanted, but it is better than nothing.
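The drop_duplicates workaround from the edit can be sketched with the question's sample data. One option keeps randomness at nearly the same cost: shuffle df2 with `DataFrame.sample(frac=1)` before dropping duplicates, so the "first" row kept for each key is a random one. That shuffle is an addition, not part of the original post.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [1], "ID2": [2], "NUMBER": [np.nan]})
df2 = pd.DataFrame({"ID": [1, 1, 1], "ID2": [2, 2, 2], "NUMBER": [1, 2, 3]})
keys = ["ID", "ID2"]

# Deterministic: keep the first df2 row for each (ID, ID2) pair
first = df2.drop_duplicates(subset=keys, keep="first")
result_first = df1.merge(first, on=keys, how="left", suffixes=("_df1", "_df2"))

# Random: shuffle df2 first, so the kept "first" row is a random one per pair
shuffled = df2.sample(frac=1, random_state=0)
random_pick = shuffled.drop_duplicates(subset=keys, keep="first")
result_random = df1.merge(random_pick, on=keys, how="left", suffixes=("_df1", "_df2"))

print(result_first)
print(result_random)
```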
