简体   繁体   中英

How to calculate percentage of column using pandas pivot_table()

I am attempting to get the frequency of objects per version expressed as a percentage.

Input dfsof = pd.read_clipboard()

file     version     object
path1    1.0         name
path1    1.0         session
path1    1.0         sequence
path2    2.01        name
path2    2.01        session
path2    2.01        sequence
path3    2.01        name
path3    2.01        session
path3    2.01        earthworm

Using the following, I am able to get frequency of each file .

dfsof.pivot_table(index=['object'], values=['file'], columns=['version'], aggfunc=len, fill_value=0, margins=True)

             file
version      1.0    2.01    All
object          
 earthworm   0      1       1
 name        1      2       3
 sequence    1      1       2
 session     1      2       3
All          3      6       9

I want to divide each count per object/version by the total number of distinct files for that version. Using the expected return table as an example, earthworms shows up in the input only once for version 2.01 , so I expect 0% for version 1.0 and 50% for version 2.01 since only one of the files has that value.

Using dfsof.groupby('version')['file'].nunique() returns the frequency of files per version, which is the denominator for each of object/version in the table above. What I am struggling with is how to apply the denominator values to the pivot_table. I have seen examples of this using grand totals and subtotals but I can't seem to find nor figure out how to divide by the unique number of files per version. Any help would be greatly appreciated.

version
1.00    1
2.01    2

Expected return

             path
version      1.0    2.01    All
object          
 earthworm   0%     50%     1
 name        100%   100%    3
 sequence    100%   50%     2
 session     100%   100%    3
All          3      6       9

IIUC, you need to aggregate as 'nunique' , then perform some recomputation:

# aggregate using nunique
out = dfsof.pivot_table(index=['object'], values=['file'], columns=['version'],
                        aggfunc='nunique', fill_value=0, margins=True)

# compute and save sum of nunique
All = out.iloc[:-1].sum()
# update values as percentage
out.update(out.iloc[:-1, :-1].div(out.iloc[-1, :-1]).mul(100))

# assign sum of nunique
out.loc['All'] = All

print(out)

Output:

            file           
version      1.0   2.01 All
object                     
earthworm    0.0   50.0   1
name       100.0  100.0   3
sequence   100.0   50.0   2
session    100.0  100.0   3
All          3.0    6.0   9

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM