
How to enforce 2nd level of pandas dataframe to add up to 1st level?

I'm trying to do something very similar to this question. The difference is that I have a pre-defined rollup value (total_by_metric_A) and values in metric that may or may not add up to total_by_metric_A.

What I want to do is create something that distributes any "residual" (total_by_metric_A minus the sum of metric within each group of A) across the metric values so that the rollup works.

I have not figured out a way to do this besides looping through and comparing the sum of metric for each group to the total_by_metric_A value. I am hoping to find a way that does not rely on looping. Does anyone have any thoughts on this? I have modified the example used in that question to fit mine.

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2], "B": ["a", "b", "c"], "metric": [4, 5, 2], "total_by_metric_A": [10, 10, 2]})
print(df)

Output:

| A | B | metric | total_by_metric_A |
| 1 | a | 4      | 10                |
| 1 | b | 5      | 10                |
| 2 | c | 2      | 2                 |

Desired output (forcing a/b to distribute the remaining 1):

| A | B | metric | total_by_metric_A |
| 1 | a | 4.5    | 10                |
| 1 | b | 5.5    | 10                |
| 2 | c | 2      | 2                 |

You only need GroupBy.transform:

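# Group the metric column by A
g = df.groupby('A')['metric']
# Per-row residual = predefined total minus the current group sum,
# divided equally among the rows of that group
df['metric'] += (df['total_by_metric_A'].sub(g.transform('sum'))
                                        .div(g.transform('size')))
print(df)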

Output:

   A  B  metric  total_by_metric_A
0  1  a     4.5                 10
1  1  b     5.5                 10
2  2  c     2.0                  2
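
To check that the rollup really holds afterwards, the same idea can be wrapped in a small helper and verified. This is only a sketch: the function name distribute_residual and its parameter names are made up for illustration, and it assumes an equal split of the residual across the rows of each group, as in the desired output above.

```python
import pandas as pd


def distribute_residual(df, group_col="A", value_col="metric",
                        total_col="total_by_metric_A"):
    """Hypothetical helper: spread each group's residual
    (predefined total minus the group's current sum) equally over its rows."""
    out = df.copy()  # leave the caller's frame untouched
    g = out.groupby(group_col)[value_col]
    residual = out[total_col] - g.transform("sum")
    out[value_col] = out[value_col] + residual / g.transform("size")
    return out


df = pd.DataFrame({
    "A": [1, 1, 2],
    "B": ["a", "b", "c"],
    "metric": [4, 5, 2],
    "total_by_metric_A": [10, 10, 2],
})

result = distribute_residual(df)

# Verify: every group's metric now sums to its predefined total
# (compared with a small tolerance because the values are floats).
group_sums = result.groupby("A")["metric"].transform("sum")
rollup_ok = bool((group_sums - result["total_by_metric_A"]).abs().lt(1e-9).all())

print(result)
print(rollup_ok)
```

Because transform returns a result aligned to the original rows, no explicit loop over the groups is needed; the check at the end uses the same trick to compare each row's group sum with its total.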
