
How to assign a value to each group by a groupby query on another dataframe?

Consider the following DataFrames:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "k1": [1, 1, 2, 2, 3, 3, 4, 4, 4],
})

df2 = pd.DataFrame({
    "k2": [1, 1, 2, 2, 3, 4, 4],
    "v2": np.random.rand(7)
})

print(df1)
print("_______")
print(df2)
print("_______")

Output:

   k1
0   1
1   1
2   2
3   2
4   3
5   3
6   4
7   4
8   4
_______
   k2        v2
0   1  0.260026
1   1  0.474951
2   2  0.695962
3   2  0.158575
4   3  0.396015
5   4  0.740344
6   4  0.293410
_______

I want to create a new column in df1 that assigns a value to every key k1. The value comes from df2: for the group in df2 whose key k2 equals k1, it is a function (say, max) of that group's v2.

Required output for the above case:

   k1    result
0   1  0.474951
1   1  0.474951
2   2  0.695962
3   2  0.695962
4   3  0.396015
5   3  0.396015
6   4  0.740344
7   4  0.740344
8   4  0.740344

It can be assumed that all keys present in k1 are also present in k2.


This can probably be done with two groupby operations, one for the query and one for the assignment, but I can't figure out how to tie the output of one to the input of the other.


Edit:
Please note that k1 and k2 are sorted in the example for clarity, but they are not guaranteed to be. I also don't want to sort, because that takes O(n log n) time and this can be done in O(n).

We can try map together with groupby:

df1['result'] = df1['k1'].map(df2.groupby('k2')['v2'].max())

   k1    result
0   1  0.474951
1   1  0.474951
2   2  0.695962
3   2  0.695962
4   3  0.396015
5   3  0.396015
6   4  0.740344
7   4  0.740344
8   4  0.740344
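
As a self-contained sketch of the same idea, the snippet below replaces np.random.rand with the v2 values printed in the question and shuffles both key columns, on the assumption that this reflects the unsorted case mentioned in the edit. The intermediate name group_max and the sort=False argument are additions for illustration and are not part of the answer above.

```python
import pandas as pd

# Keys deliberately out of order; v2 values copied from the question's printout.
df1 = pd.DataFrame({"k1": [4, 1, 3, 2, 4, 1, 2, 3, 4]})
df2 = pd.DataFrame({
    "k2": [4, 2, 1, 3, 1, 4, 2],
    "v2": [0.740344, 0.695962, 0.260026, 0.396015, 0.474951, 0.293410, 0.158575],
})

# One value per key: a Series indexed by k2 holding the max of v2 in each group.
# sort=False (an optional tweak, not in the answer above) skips sorting the group keys.
group_max = df2.groupby("k2", sort=False)["v2"].max()

# Look up each k1 in that Series; the row order of df1 is preserved.
df1["result"] = df1["k1"].map(group_max)
print(df1)
```

Because map looks keys up by index label, neither df1 nor df2 has to be sorted beforehand.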

First, sort df2 on the k2 and v2 columns (k2 ascending, v2 descending) so that the largest v2 in each group comes first. Then drop duplicates on k2, keeping the first row of each group, which holds the max. Finally, map the remaining v2 values, indexed by k2, onto df1. The output below presumably comes from a different random draw of v2 than the one printed in the question.

lookup = (df2.sort_values(['k2', 'v2'], ascending=[True, False])
             .drop_duplicates('k2', keep='first')
             .set_index('k2')['v2'])
df1['result'] = df1['k1'].map(lookup)
print(df1)

   k1    result
0   1  0.303764
1   1  0.303764
2   2  0.026024
3   2  0.026024
4   3  0.213834
5   3  0.213834
6   4  0.757031
7   4  0.757031
8   4  0.757031
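
One way to check this approach against a groupby-based lookup is sketched below, using the fixed v2 values from the question instead of random ones so the comparison is repeatable. The names lookup_sorted, lookup_grouped, via_sort and via_groupby are hypothetical and introduced only for this comparison.

```python
import pandas as pd

df1 = pd.DataFrame({"k1": [1, 1, 2, 2, 3, 3, 4, 4, 4]})
df2 = pd.DataFrame({
    "k2": [1, 1, 2, 2, 3, 4, 4],
    "v2": [0.260026, 0.474951, 0.695962, 0.158575, 0.396015, 0.740344, 0.293410],
})

# Sort so the largest v2 comes first within each k2, then keep that first row.
lookup_sorted = (
    df2.sort_values(["k2", "v2"], ascending=[True, False])
       .drop_duplicates("k2", keep="first")
       .set_index("k2")["v2"]
)

# The same lookup table computed with groupby, for comparison.
lookup_grouped = df2.groupby("k2")["v2"].max()

via_sort = df1["k1"].map(lookup_sorted)
via_groupby = df1["k1"].map(lookup_grouped)

# Compare the two mapped columns element by element.
print(via_sort.tolist() == via_groupby.tolist())
```

Note that sort_values makes this variant O(n log n), which is the cost the question's edit wanted to avoid.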

