How to enforce 2nd level of pandas dataframe to add up to 1st level?

I'm trying to do something very similar to this question. The difference is that I have a pre-defined rollup value (total_by_metric_A) for each group in A, and the values in metric may or may not add up to that total.

What I want to do is distribute any "residual" (total_by_metric_A minus the sum of metric within the group) across the metric values so that the rollup works.

I have not figured out a way to do this besides looping through the groups and comparing the sum of metric in each group to its total_by_metric_A value. I am hoping to find a way that does not rely on looping. Does anyone have any thoughts on this? Below is the example from that question, modified to fit my case.

import pandas as pd
df=pd.DataFrame({"A":[1,1,2],"B":["a","b","c"],"metric":[4,5,2], "total_by_metric_A": [10, 10, 2]})

output:

| A | B | metric | total_by_metric_A |
| 1 | a | 4      | 10                |
| 1 | b | 5      | 10                |
| 2 | c | 2      | 2                 |

desired output (forcing a/b to split the remaining 1 equally, 0.5 each):

| A | B | metric | total_by_metric_A |
| 1 | a | 4.5    | 10                |
| 1 | b | 5.5    | 10                |
| 2 | c | 2      | 2                 |

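You only need GroupBy.transform. It broadcasts each group's sum and size back onto the rows, so you can compute the residual per group and split it equally across that group's rows without a loop: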

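g = df.groupby('A')['metric']
# Residual per row: predefined group total minus the current group sum.
residual = df['total_by_metric_A'].sub(g.transform('sum'))
# Split the residual equally across the rows of each group.
df['metric'] += residual.div(g.transform('size'))
print(df)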

Output

   A  B  metric  total_by_metric_A
0  1  a     4.5                 10
1  1  b     5.5                 10
2  2  c     2.0                  2
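
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name distribute_residual and its parameters are made up for illustration, it assumes total_by_metric_A is constant within each group of A, and it splits the residual equally (not proportionally). It works on a copy so the original frame is left untouched, and the final assert checks that each group now rolls up to its total.

```python
import pandas as pd


def distribute_residual(df, group_col, value_col, total_col):
    """Return a copy of df in which value_col sums to total_col within each group.

    Hypothetical helper: assumes total_col holds the same value on every row
    of a group, and splits the residual equally across the group's rows.
    """
    out = df.copy()
    grouped = out.groupby(group_col)[value_col]
    # Residual per row: predefined total minus the current group sum.
    residual = out[total_col] - grouped.transform("sum")
    # Equal share of the residual for every row in the group.
    out[value_col] = out[value_col] + residual / grouped.transform("size")
    return out


df = pd.DataFrame({
    "A": [1, 1, 2],
    "B": ["a", "b", "c"],
    "metric": [4, 5, 2],
    "total_by_metric_A": [10, 10, 2],
})
result = distribute_residual(df, "A", "metric", "total_by_metric_A")

# Check that every group now rolls up to its predefined total.
group_sums = result.groupby("A")["metric"].transform("sum")
assert (group_sums - result["total_by_metric_A"]).abs().max() < 1e-9
print(result)
```

Groups whose metric already matches the total (like A == 2 here) get a residual of 0 and are left unchanged apart from the conversion to float.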
