
How to perform a row-wise sum based on a column condition and add a class-specific value as a column?

    Cluster  Class  Value
0         0     10      1
1         0     11      1
2         0     14      3
3         0     18      1
4         0     26      1
5         0     29      1
6         0     30      1
7         1      0      2
8         1     19      1
9         1     20      1
10        1     21      2
11        1     36      1
12        1     26      1
13        1     27      1
14        1     37      2
15        1     33      1

This table shows which class falls into which cluster. For example, Classes 10, 11, 14, and so on fall into Cluster 0. The Value column gives the number of members of that class in the cluster; for example, 3 members of Class 14 fall into Cluster 0.

Now my desired output is like this:

    Cluster  Class  Value  Cluster_Sum
0         0     10      1            9
1         0     11      1            9
2         0     14      3            9
3         0     18      1            9
4         0     26      1            9
5         0     29      1            9
6         0     30      1            9

The same applies to the other clusters. My final aim is to add a 'Precision' column, computed for each row as
df['Precision'] = df['Value'] / df['Cluster_Sum']

How can I do that using Python?

EDIT: It works perfectly. Thanks for your help.


Ultimately, this is my goal. Each class has a fixed total number of members, e.g. Class 1: 10, Class 2: 12, and so on. I need to add a column, 'Class_Sum', that holds the total for each row's class. Then I can compute the recall with

`df['Recall'] = df['Value']/ df['Class_Sum']`

But my question is: how can I add this information

Class 1     10
Class 2     12
Class 3     23
Class 4     11
Class 5     17
Class 6     13
Class 7     16
Class 8     15
Class 9     14
Class 10    18
Class 11    09
Class 12    07
Class 13    16
Class 14    21
Class 15    17
Class 16    23
Class 17    10
Class 18    21
Class 19    12
Class 20    45
Class 21    12
Class 22    12
Class 23    15
Class 24    11
Class 25    09
Class 26    11
Class 27    08
Class 28    10
Class 29    11
Class 30    19
Class 31    17
Class 32    15
Class 33    12
Class 34    07
Class 35    06
Class 36    14
Class 37    13
Class 38    16

to my DataFrame like this

 Cluster   Class          Class_Sum  Value Cluster_Sum Precision Recall
          10                  18
          11                  09
          14                  21
          18                  21
          26                  11
          29                  11
          30                  19

How can it be done?

Try with groupby:

df["Cluster_Sum"] = df.groupby("Cluster")["Value"].transform("sum")

>>> df
    Cluster  Class  Value  Cluster_Sum
0         0     10      1            9
1         0     11      1            9
2         0     14      3            9
3         0     18      1            9
4         0     26      1            9
5         0     29      1            9
6         0     30      1            9
7         1      0      2           12
8         1     19      1           12
9         1     20      1           12
10        1     21      2           12
11        1     36      1           12
12        1     26      1           12
13        1     27      1           12
14        1     37      2           12
15        1     33      1           12
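To see why transform is used here rather than a plain aggregation, here is a minimal sketch. It assumes the sample data from the question, rebuilt by hand; the name per_cluster is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Cluster": [0] * 7 + [1] * 9,
    "Class": [10, 11, 14, 18, 26, 29, 30, 0, 19, 20, 21, 36, 26, 27, 37, 33],
    "Value": [1, 1, 3, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1],
})

# A plain aggregation returns one row per cluster ...
per_cluster = df.groupby("Cluster")["Value"].sum()

# ... while transform broadcasts each group's sum back onto every row,
# so the result shares df's index and can be assigned as a new column.
df["Cluster_Sum"] = df.groupby("Cluster")["Value"].transform("sum")
```

per_cluster has two entries (one per cluster), whereas Cluster_Sum has sixteen, one for each original row.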

groupby + transform("sum") is your friend here:

df['Precision'] = df["Value"] / df.groupby("Cluster")["Value"].transform("sum")

Output:

>>> df
    Cluster  Class  Value  Precision
0         0     10      1   0.111111
1         0     11      1   0.111111
2         0     14      3   0.333333
3         0     18      1   0.111111
4         0     26      1   0.111111
5         0     29      1   0.111111
6         0     30      1   0.111111
7         1      0      2   0.166667
8         1     19      1   0.083333
9         1     20      1   0.083333
10        1     21      2   0.166667
11        1     36      1   0.083333
12        1     26      1   0.083333
13        1     27      1   0.083333
14        1     37      2   0.166667
15        1     33      1   0.083333
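Neither answer covers the follow-up about Class_Sum and Recall. One possible approach, sketched here as an assumption rather than a confirmed solution, is to hold the per-class totals in a dict and look them up with Series.map. The names totals and class_totals are hypothetical; the numbers are copied from the list in the question (Class 1 to Class 38).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Cluster": [0] * 7 + [1] * 9,
    "Class": [10, 11, 14, 18, 26, 29, 30, 0, 19, 20, 21, 36, 26, 27, 37, 33],
    "Value": [1, 1, 3, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1],
})

# Totals for Class 1..38, in order, as listed in the question.
totals = [10, 12, 23, 11, 17, 13, 16, 15, 14, 18,
          9, 7, 16, 21, 17, 23, 10, 21, 12, 45,
          12, 12, 15, 11, 9, 11, 8, 10, 11, 19,
          17, 15, 12, 7, 6, 14, 13, 16]
class_totals = {cls: n for cls, n in enumerate(totals, start=1)}

# Look up each row's class total; classes missing from the dict become NaN.
df["Class_Sum"] = df["Class"].map(class_totals)

df["Cluster_Sum"] = df.groupby("Cluster")["Value"].transform("sum")
df["Precision"] = df["Value"] / df["Cluster_Sum"]
df["Recall"] = df["Value"] / df["Class_Sum"]
```

Note that the sample data contains a Class 0 row, while the list of totals starts at Class 1, so that row ends up with NaN in Class_Sum and Recall under this sketch. If the totals live in a file or another DataFrame instead, a merge on the Class column would be an alternative.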

