
Get sum of positive values and sum of negative values

I have a DataFrame as follows.

import random

import pandas as pd

# 2 models x 4 support slots x 3 group slots x 5 cases = 120 rows
d = {}
d['Model'] = ['M1', 'M2'] * 4 * 3 * 5
d['Support'] = ['S1', 'S1', 'S2', 'S2'] * 2 * 3 * 5
d['Group'] = ['G1', 'G2', 'G2'] * 2 * 4 * 5
d['Case'] = ['C11', 'C21', 'C22', 'C31', 'C32'] * 2 * 4 * 3
val1 = []
val2 = []
random.seed(0)  # fixed seed so the values are reproducible
for i in range(2 * 4 * 3 * 5):
    val1.append(random.randrange(-10, 11))
    val2.append(random.randrange(-10, 11))
d['val1'] = val1
d['val2'] = val2
df = pd.DataFrame(d)


I am looking for the maximum value of the sum of the positive values for each Support, Group and Case and with reference to the Model and the sum of the minimum values of the sum of the negative values.

This is my attempt:

df1 = df.groupby(['Model', 'Support', 'Group', 'Case'])[['val1', 'val2']].sum()

df2 = df1.groupby(['Model', 'Support', 'Group', 'Case']).agg([
    ('max' , lambda x : x[x > 0].sum()),
    ('min' , lambda x : x[x < 0].sum())
])

df3 = df2.groupby(['Support', 'Group', 'Case']).agg([
    ('max' , max),
    ('Model', lambda x: x.idxmax()[0]),
    ('min' , min),
    ('Model', lambda x: x.idxmin()[0]),
])

(See the edit of 14 Jan 2021.)


The results of the last DataFrame df3 are fine, but this is not the output format I want. I need to filter the last DataFrame df3 to get the results this way:

[image: desired output format]

Edited 14 Jan 2021

Reviewing the @aneroid results, I have seen that the df3 values are not as expected.

For Support S1, Group G1, Case C11 this will be the result:

df_M1S1G1_1 = df.loc[df['Model'] == 'M1']
df_M1S1G1_2 = df_M1S1G1_1.loc[df_M1S1G1_1['Support'] == 'S1']
df_M1S1G1_3 = df_M1S1G1_2.loc[df_M1S1G1_2['Group'] == 'G1']

    Model   Support Group   Case    val1    val2
0   M1      S1      G1      C11     2       3
12  M1      S1      G1      C22     -8      0
24  M1      S1      G1      C32     5       0
36  M1      S1      G1      C21     7       -4
48  M1      S1      G1      C31     -6      -9
60  M1      S1      G1      C11     10      0
72  M1      S1      G1      C22     10      -4
84  M1      S1      G1      C32     -8      -3
96  M1      S1      G1      C21     -5      0
108 M1      S1      G1      C31     -9      7

df_M2S1G1_1 = df.loc[df['Model'] == 'M2']
df_M2S1G1_2 = df_M2S1G1_1.loc[df_M2S1G1_1['Support'] == 'S1']
df_M2S1G1_3 = df_M2S1G1_2.loc[df_M2S1G1_2['Group'] == 'G1']

    Model   Support Group   Case    val1    val2
9   M2      S1      G1      C32     -2      7
21  M2      S1      G1      C21     -10     -8
33  M2      S1      G1      C31     -1      7
45  M2      S1      G1      C11     9       -2
57  M2      S1      G1      C22     -8      0
69  M2      S1      G1      C32     10      7
81  M2      S1      G1      C21     -10     7
93  M2      S1      G1      C31     5       8
105 M2      S1      G1      C11     -6      7
117 M2      S1      G1      C22     -8      -10

Therefore:

0  M1 S1 G1 C11 val1 = 2  val2 = 3
60 M1 S1 G1 C11 val1 = 10 val2 = 0

val1_sum_pos = 12
val1_sum_neg = 0

val2_sum_pos = 3
val2_sum_neg = 0

45  M2 S1 G1 C11 val1 = 9  val2 = -2
105 M2 S1 G1 C11 val1 = -6 val2 = 7

val1_sum_pos = 9
val1_sum_neg = -6

val2_sum_pos = 7
val2_sum_neg = -2

And as a result:

                       val1                               val2
                       max  Model_max  min  Model_min     max  Model_max  min  Model_min
Support  Group  Case
S1       G1     C11    12   M1         -6   M2            7    M2         -2   M2

These results are in line with the solution proposed by @aneroid.
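As a quick cross-check of the hand calculation above, here is a minimal sketch that uses only the four C11 rows listed for Support S1 and Group G1. The names rows, sum_pos, sum_neg and check are made up for this illustration, and clip() is used here as a shortcut for filtering by sign; it is not the method used in the answers.

```python
import pandas as pd

# The four C11 rows for Support S1 / Group G1, copied from the listings above
rows = pd.DataFrame(
    {
        "Model": ["M1", "M1", "M2", "M2"],
        "val1": [2, 10, 9, -6],
        "val2": [3, 0, -2, 7],
    },
    index=[0, 60, 45, 105],
)

vals = rows[["val1", "val2"]]
# clip() zeroes out values of the unwanted sign, so a plain sum per Model
# gives the positive-only and negative-only totals
sum_pos = vals.clip(lower=0).groupby(rows["Model"]).sum()
sum_neg = vals.clip(upper=0).groupby(rows["Model"]).sum()

# One row per value column: the extreme total and the Model it belongs to
check = pd.DataFrame(
    {
        "max": sum_pos.max(),
        "Model_max": sum_pos.idxmax(),
        "min": sum_neg.min(),
        "Model_min": sum_neg.idxmin(),
    }
)
print(check)
```

The printed frame has one row for val1 and one for val2, which can be compared against the single S1 / G1 / C11 row shown above.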

Here's a different approach from my previous answer. It requires a little more "prep code" but fewer steps to massage the dataframe afterwards.

As before, this starts at your step 2 (skipping step 1):

df1 = df.groupby(['Model', 'Support', 'Group', 'Case']).agg([
    ('sum_pos', lambda x: x[x > 0].sum()),
    ('sum_neg', lambda x: x[x < 0].sum())
])

First, a template dataframe for the results:

res = pd.DataFrame(
    columns=pd.MultiIndex.from_product(
        [('val1', 'val2'),
         ('sum_pos_max', 'Model_max', 'sum_neg_min', 'Model_min'),
]))

This function will be used with apply (again), with the difference that it starts from the empty template DataFrame, which already has the column structure needed. (Optionally, it can be modified for use in a for group in grouped loop, without using apply, where each record is appended to the result.)

def model_minmax_opt(gr):
    # start with a deep copy of the result template
    tf = res.copy(deep=True)
    for val in ['val1', 'val2']:
        max_pos = gr[(val, 'sum_pos')].idxmax()
        min_neg = gr[(val, 'sum_neg')].idxmin()
        tf.loc[0, [(val, 'sum_pos_max'), (val, 'Model_max')]] = gr.loc[max_pos, [(val, 'sum_pos'), ('Model', '')]].values
        tf.loc[0, [(val, 'sum_neg_min'), (val, 'Model_min')]] = gr.loc[min_neg, [(val, 'sum_neg'), ('Model', '')]].values
    return tf

Then create a grouping and apply it:

group = df1.reset_index().groupby(['Support', 'Group', 'Case'])
df2 = group.apply(model_minmax_opt)
df2.reset_index(level=3, drop=True, inplace=True)  # get rid of the added 0-index

Result df2, same as the other one, with better column ordering:

                                                          val1                                        val2
                   sum_pos_max Model_max sum_neg_min Model_min sum_pos_max Model_max sum_neg_min Model_min
Support Group Case                                                                                        
S1      G1    C11           12        M1          -6        M2           7        M2          -2        M2
              C21            7        M1         -20        M2           7        M2          -8        M2
              C22           10        M1         -16        M2           0        M1         -10        M2
              C31            5        M2         -15        M1          15        M2          -9        M1
              C32           10        M2          -8        M1          14        M2          -3        M1
        G2    C11            9        M1         -23        M1          14        M1          -9        M1
              C21           22        M1         -13        M2          21        M1          -9        M1
              C22           24        M2          -6        M1          14        M2         -12        M2
              C31           30        M2         -28        M1          28        M2          -6        M1
              C32           11        M2         -10        M2          11        M2         -16        M1
S2      G1    C11            9        M1          -8        M1           1        M2         -15        M1
              C21            6        M1          -9        M1          15        M2         -13        M1
              C22           11        M2          -3        M1           0        M1          -8        M2
              C31            5        M1          -7        M2           4        M1          -5        M1
              C32           17        M1           0        M1           0        M1         -13        M2
        G2    C11           20        M1         -10        M2          11        M1         -14        M1
              C21            5        M1         -19        M1           6        M2         -18        M2
              C22           11        M1         -23        M2          24        M1         -15        M2
              C31           26        M1         -22        M2          10        M2         -12        M1
              C32           11        M2         -20        M1          13        M2         -11        M1
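The loop-based variant mentioned in passing above (iterating over the groups instead of calling apply) might look roughly like the sketch below. To keep it short it assumes the per-Model sums have been flattened into plain columns for a single value column; the frame sums, the list records and the result out are hypothetical names, and the numbers are the val1 sums for S1 / G1 / C11.

```python
import pandas as pd

# Hypothetical per-Model sums for one (Support, Group, Case) combination,
# flattened to plain columns to keep the sketch short
sums = pd.DataFrame(
    {
        "Support": ["S1", "S1"],
        "Group": ["G1", "G1"],
        "Case": ["C11", "C11"],
        "Model": ["M1", "M2"],
        "sum_pos": [12, 9],
        "sum_neg": [0, -6],
    }
)

records = []
for key, gr in sums.groupby(["Support", "Group", "Case"]):
    best = gr.loc[gr["sum_pos"].idxmax()]   # row with the largest positive sum
    worst = gr.loc[gr["sum_neg"].idxmin()]  # row with the most negative sum
    records.append(
        {
            "Support": key[0],
            "Group": key[1],
            "Case": key[2],
            "sum_pos_max": best["sum_pos"],
            "Model_max": best["Model"],
            "sum_neg_min": worst["sum_neg"],
            "Model_min": worst["Model"],
        }
    )

# Build the result once at the end instead of appending to a DataFrame
out = pd.DataFrame(records).set_index(["Support", "Group", "Case"])
print(out)
```

Collecting plain dicts and constructing the DataFrame once tends to be simpler than growing a DataFrame row by row inside the loop.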

Here are two more solutions, which are more comparable to your original code and the final dataframe in your post.

The first step has a slight modification to the order of the groupby. The minmax() function determines whether it's been given a column of positive or negative sums, and applies idxmin/idxmax as required. Whether the return should be the value or the Model is passed in as a parameter (y). Using agg can generate additional columns under the existing column index, as another level. This gives the same format as the one in your post.

df1 = df.groupby(['Support', 'Group', 'Case', 'Model']).agg([
    ('sum_pos', lambda x: x[x > 0].sum()),
    ('sum_neg', lambda x: x[x < 0].sum())
])

# Model is the last level of the index, hence `[-1]`
def minmax(x, y):
    """x is the series, y='val' or 'Model'"""
    x_min = x.min()  # named x_min to avoid shadowing the builtin `min`
    if x_min >= 0:  # or check x.name
        # positive column (or all sums are 0): return the max's
        max_pos = x.idxmax()
        return x[max_pos] if y == 'val' else max_pos[-1]
    # negative column: return the min's
    return x_min if y == 'val' else x.idxmin()[-1]

# use `minmax` to generate columns in an aggregate:
df1.groupby(['Support', 'Group', 'Case']).agg([
    ('val', lambda x: minmax(x, 'val')),
    ('Model', lambda x: minmax(x, 'Model')),
])

Result is the same as my previous two answers, with an identical structure to your post (3-level-index columns):

                                          val1                        val2
                         sum_pos       sum_neg       sum_pos       sum_neg
                       val Model     val Model     val Model     val Model
Support Group Case                                                        
S1      G1    C11       12    M1      -6    M2       7    M2      -2    M2
              C21        7    M1     -20    M2       7    M2      -8    M2
              C22       10    M1     -16    M2       0    M1     -10    M2
              C31        5    M2     -15    M1      15    M2      -9    M1
              C32       10    M2      -8    M1      14    M2      -3    M1
        G2    C11        9    M1     -23    M1      14    M1      -9    M1
              C21       22    M1     -13    M2      21    M1      -9    M1
              C22       24    M2      -6    M1      14    M2     -12    M2
              C31       30    M2     -28    M1      28    M2      -6    M1
              C32       11    M2     -10    M2      11    M2     -16    M1
S2      G1    C11        9    M1      -8    M1       1    M2     -15    M1
              C21        6    M1      -9    M1      15    M2     -13    M1
              C22       11    M2      -3    M1       0    M1      -8    M2
              C31        5    M1      -7    M2       4    M1      -5    M1
              C32       17    M1       0    M1       0    M1     -13    M2
        G2    C11       20    M1     -10    M2      11    M1     -14    M1
              C21        5    M1     -19    M1       6    M2     -18    M2
              C22       11    M1     -23    M2      24    M1     -15    M2
              C31       26    M1     -22    M2      10    M2     -12    M1
              C32       11    M2     -20    M1      13    M2     -11    M1
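The `[-1]` indexing in minmax() relies on idxmax/idxmin returning the full index tuple when the Series has a MultiIndex. A minimal sketch of that behaviour, using the val1 sums for S1 / G1 / C11 (the variable names are made up for the illustration):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("S1", "G1", "C11", "M1"), ("S1", "G1", "C11", "M2")],
    names=["Support", "Group", "Case", "Model"],
)
sum_pos = pd.Series([12, 9], index=idx)
sum_neg = pd.Series([0, -6], index=idx)

label = sum_pos.idxmax()             # full index tuple of the largest entry
model_of_max = label[-1]             # Model is the last index level
model_of_min = sum_neg.idxmin()[-1]  # same idea for the most negative sum

print(label, model_of_max, model_of_min)
```

This is also why the groupby order was changed so that Model comes last: the position of Model within the tuple is then fixed at `-1`.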

An even shorter solution, but it generates a tuple of (min/max val, Model) instead of having them in different columns:

df1 = df.groupby(['Support', 'Group', 'Case', 'Model']).agg([
    ('sum_pos', lambda x: x[x > 0].sum()),
    ('sum_neg', lambda x: x[x < 0].sum())
])

df1.groupby(['Support', 'Group', 'Case']).agg({
    ('val1', 'sum_pos'): lambda x: (x.max(), x.idxmax()[-1]),
    ('val1', 'sum_neg'): lambda x: (x.min(), x.idxmin()[-1]),
    ('val2', 'sum_pos'): lambda x: (x.max(), x.idxmax()[-1]),
    ('val2', 'sum_neg'): lambda x: (x.min(), x.idxmin()[-1]),
})

Result sample:

                                   val1                 val2
                     sum_pos    sum_neg   sum_pos    sum_neg
Support Group Case                                          
S1      G1    C11   (12, M1)   (-6, M2)   (7, M2)   (-2, M2)
              C21    (7, M1)  (-20, M2)   (7, M2)   (-8, M2)
              C22   (10, M1)  (-16, M2)   (0, M1)  (-10, M2)
              C31    (5, M2)  (-15, M1)  (15, M2)   (-9, M1)
...
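If separate columns are wanted later, a tuple-valued column can be unpacked after the fact. This is only a sketch of one possible way, not part of the original answer; the Series tuples is a hand-copied excerpt of the val1 / sum_pos column from the sample above, and split is a made-up name.

```python
import pandas as pd

# Hand-copied excerpt of one tuple-valued column from the sample above
tuples = pd.Series(
    [(12, "M1"), (7, "M1"), (10, "M1")],
    index=pd.Index(["C11", "C21", "C22"], name="Case"),
    name="sum_pos",
)

# Each tuple becomes one row with two columns
split = pd.DataFrame(tuples.tolist(), index=tuples.index, columns=["val", "Model"])
print(split)
```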

You've said "looking for the maximum value of the sum of the positive values" and "sum of the minimum values of the sum of the negative values", but in your first step you've applied sum() to the entire groupby without distinguishing between +ve and -ve values. To me, your second step df2 = df1.groupby(...).agg(...) should actually be your first step.
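To see why the order matters, here is a small sketch with made-up numbers for a single group (the Series s and the other names are hypothetical): once the values have been summed, the positive and negative parts have already cancelled, so filtering by sign afterwards only sees the net value.

```python
import pandas as pd

# Hypothetical values belonging to one (Model, Support, Group, Case) group
s = pd.Series([10, -6, 3, -2])

# Filter by sign first, then sum
sum_pos = s[s > 0].sum()
sum_neg = s[s < 0].sum()

# Sum first: positives and negatives cancel into one net value
net = s.sum()
net_series = pd.Series([net])
pos_after_sum = net_series[net_series > 0].sum()
neg_after_sum = net_series[net_series < 0].sum()

print(sum_pos, sum_neg)              # sign-split totals
print(pos_after_sum, neg_after_sum)  # what is left after summing first
```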

Setting up df as per your original code, then:

# doing your 2nd step as the first step
df1 = df.groupby(['Model', 'Support', 'Group', 'Case']).agg([
    ('sum_pos', lambda x: x[x > 0].sum()),
    ('sum_neg', lambda x: x[x < 0].sum())
])

# Btw, some sum's are `0` in `df1`

# stacking `val1` and `val2` into a column
df2 = df1.stack(level=0)
df2.index.names = df2.index.names[:-1] + ['val']
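The effect of stack(level=0) can be seen on a toy frame. The sketch below assumes a two-row frame holding the S1 / G1 / C11 sums for each Model; wide and tall are made-up names. Depending on the pandas version, this call may emit a FutureWarning about the stack implementation and may order rows or columns differently, so the sketch does not rely on ordering.

```python
import pandas as pd

cols = pd.MultiIndex.from_product([["val1", "val2"], ["sum_pos", "sum_neg"]])
wide = pd.DataFrame(
    [[12, 0, 3, 0], [9, -6, 7, -2]],
    index=pd.Index(["M1", "M2"], name="Model"),
    columns=cols,
)

# Move the top column level (val1/val2) into the row index
tall = wide.stack(level=0)
# The new index level has no name yet, so give it one
tall.index.names = list(tall.index.names[:-1]) + ["val"]
print(tall)
```

After stacking, each (Model, val) pair is one row with sum_pos and sum_neg as plain columns, which is the shape the grouping step below works on.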

Create a function which will calculate the Model associated with the "max of positive" and "min of negative" values, per val, and include that value in the return:

def model_minmax(gr):
    """modifies both rows of the group object passed in"""
    gr[['sum_pos_max', 'Model_max']] = gr.loc[gr['sum_pos'].idxmax(), ['sum_pos', 'Model']]
    gr[['sum_neg_min', 'Model_min']] = gr.loc[gr['sum_neg'].idxmin(), ['sum_neg', 'Model']]
    return gr

Create the groups and apply() the function above:

group = df2.reset_index().groupby(['Support', 'Group', 'Case', 'val'])
df3 = group.apply(model_minmax)
# sort values and drop every alternate row
df3 = df3.sort_values(['Support', 'Group', 'Case', 'val', 'Model'])[::2]

What df3.head(4) looks like:

   Model Support Group Case   val  sum_neg  sum_pos  sum_pos_max Model_max  sum_neg_min Model_min
0     M1      S1    G1  C11  val1        0       12           12        M1           -6        M2
1     M1      S1    G1  C11  val2        0        3            7        M2           -2        M2
2     M1      S1    G1  C21  val1       -5        7            7        M1          -20        M2
3     M1      S1    G1  C21  val2       -4        0            7        M2           -8        M2

Drop the 'Model', 'sum_neg', and 'sum_pos' columns, then take a few more steps to get the data in a format similar to the one you have:

df3.drop(['Model', 'sum_neg', 'sum_pos'], axis=1, inplace=True)

df4 = df3.pivot(index=['Support', 'Group', 'Case'],
                columns='val',
                values=['sum_pos_max', 'Model_max', 'sum_neg_min', 'Model_min'],
)
df4 = df4.swaplevel(0, 1, axis=1).sort_index(axis=1, level=0)

Result:

               val                                        val1                                        val2
                   Model_max Model_min sum_neg_min sum_pos_max Model_max Model_min sum_neg_min sum_pos_max
Support Group Case                                                                                        
S1      G1    C11         M1        M2          -6          12        M2        M2          -2           7
              C21         M1        M2         -20           7        M2        M2          -8           7
              C22         M1        M2         -16          10        M1        M2         -10           0
              C31         M2        M1         -15           5        M2        M1          -9          15
              C32         M2        M1          -8          10        M2        M1          -3          14
        G2    C11         M1        M1         -23           9        M1        M1          -9          14
              C21         M1        M2         -13          22        M1        M1          -9          21
              C22         M2        M1          -6          24        M2        M2         -12          14
              C31         M2        M1         -28          30        M2        M1          -6          28
              C32         M2        M2         -10          11        M2        M1         -16          11
S2      G1    C11         M1        M1          -8           9        M2        M1         -15           1
              C21         M1        M1          -9           6        M2        M1         -13          15
              C22         M2        M1          -3          11        M1        M2          -8           0
              C31         M1        M2          -7           5        M1        M1          -5           4
              C32         M1        M1           0          17        M1        M2         -13           0
        G2    C11         M1        M2         -10          20        M1        M1         -14          11
              C21         M1        M1         -19           5        M2        M2         -18           6
              C22         M1        M2         -23          11        M1        M2         -15          24
              C31         M1        M2         -22          26        M2        M1         -12          10
              C32         M2        M1         -20          11        M2        M1         -11          13

Note that these values will differ from yours due to your first step, as mentioned above. If you're sure that step is correct, it can be added before the ones I've given.
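For reference, the pivot and swaplevel steps used above can be tried on a toy frame. This is a minimal sketch with made-up names (tall, wide) and only two of the four value columns, using the S1 / G1 / C11 numbers; passing a list to values in pivot is assumed to be supported by the installed pandas version.

```python
import pandas as pd

tall = pd.DataFrame(
    {
        "Case": ["C11", "C11"],
        "val": ["val1", "val2"],
        "sum_pos_max": [12, 7],
        "Model_max": ["M1", "M2"],
    }
)

# pivot puts the value names on the top column level and val1/val2 below them
wide = tall.pivot(index="Case", columns="val", values=["sum_pos_max", "Model_max"])
# swap the two column levels so val1/val2 is on top, then sort the columns
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1, level=0)
print(wide)
```

Because sort_index also sorts the remaining level by default, the inner columns come out alphabetically, which is why Model_max precedes sum_pos_max in the result shown above.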


 