
How to use pivot table to show percent of common values between multiple columns?

I have 6 columns, each representing a different company, and 600+ rows of 1s and 0s in those columns indicating whether or not a part is contracted to each company (i.e., column). A part can be shared across all companies (i.e., every column can have a value of 1, meaning there is 100% commonality for that part across all companies).

How do I visually represent this in Excel? I'm new to pivot tables and essentially want to break down each row and see how the common parts are distributed among the companies.

I have summed each row (a 'Total' column) and computed the percentage of hits ('1') for each company. For each company, I want to see which of its parts are shared with the other companies.

import pandas as pd
df=pd.DataFrame({'Comp_A':[1,1,1,1,0,1],
                 'Comp_B':[1,1,1,1,1,1],
                 'Comp_C':[1,1,1,1,1,1],
                 'Comp_D':[0,1,1,1,0,1],
                 'Comp_E':[1,0,1,1,0,1],
                 'Comp_F':[1,1,0,1,1,0]})
# Row total: number of companies each part is contracted to
df['Sum']=df.sum(axis=1)

For each row across the 6 companies, I want to visually represent the number of 1s and 0s found. This will tell me how many parts are 100% common across all companies, how many are only in Comp_B, C, and D, etc.

I am open to either Excel or Python.

Sample DataFrame

import pandas as pd
df=pd.DataFrame({'Comp_A':[1,1,1,1,0,1],
                 'Comp_B':[1,1,1,1,1,1],
                 'Comp_C':[1,1,1,1,1,1],
                 'Comp_D':[0,1,1,1,0,1],
                 'Comp_E':[1,0,1,1,0,1],
                 'Comp_F':[1,1,0,1,1,0],
                 })
print(df)

   Comp_A  Comp_B  Comp_C  Comp_D  Comp_E  Comp_F
0       1       1       1       0       1       1
1       1       1       1       1       0       1
2       1       1       1       1       1       0
3       1       1       1       1       1       1
4       0       1       1       0       0       1
5       1       1       1       1       1       0

Using DataFrame.apply + Series.value_counts:

count_df=df.apply(lambda x: x.value_counts(),axis=1).fillna(0)
print(count_df)

     0    1
0  1.0  5.0
1  1.0  5.0
2  1.0  5.0
3  0.0  6.0
4  3.0  3.0
5  1.0  5.0

import matplotlib.pyplot as plt
# IPython/Jupyter magic; omit this line in a plain Python script
%matplotlib inline
count_df.plot(kind='bar')

Output image:

(image not available: bar chart of the 0 and 1 counts per row)


As you can see, row 3 is common to all companies.
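
The question also asks how many parts share each exact combination of companies. Here is a minimal sketch of one way to get that, assuming the sample DataFrame above. The names combo, combo_counts, and combo_pct and the comma-joined label format are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'Comp_A': [1, 1, 1, 1, 0, 1],
                   'Comp_B': [1, 1, 1, 1, 1, 1],
                   'Comp_C': [1, 1, 1, 1, 1, 1],
                   'Comp_D': [0, 1, 1, 1, 0, 1],
                   'Comp_E': [1, 0, 1, 1, 0, 1],
                   'Comp_F': [1, 1, 0, 1, 1, 0]})

# Label each part (row) with the companies it is contracted to,
# e.g. "Comp_B, Comp_C, Comp_F". The label format is an arbitrary choice.
combo = df.apply(lambda row: ', '.join(row[row == 1].index), axis=1)

# Number of parts per combination, and the same as a percentage of all parts.
combo_counts = combo.value_counts()
combo_pct = combo_counts / len(df) * 100
print(combo_counts)
```

combo_counts.plot(kind='barh') would then chart the combinations in the same style as the plots here. A part contracted to no company gets an empty label.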


Percentages:

percentages_comun=(df.sum(axis=1)/len(df.columns))*100
print(percentages_comun)

0     83.333333
1     83.333333
2     83.333333
3    100.000000
4     50.000000
5     83.333333
dtype: float64

percentages_comun.plot(kind='bar')

Output image:

(image not available: bar chart of the percentage of companies sharing each row)
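
For the per-company view the question mentions (how much of one company's parts are shared with each other company), a pairwise overlap table may help. This is a sketch under the same sample data. The names shared, totals, and pct are hypothetical, and the approach (a matrix product of the 0/1 table with itself) is one option among several:

```python
import pandas as pd

df = pd.DataFrame({'Comp_A': [1, 1, 1, 1, 0, 1],
                   'Comp_B': [1, 1, 1, 1, 1, 1],
                   'Comp_C': [1, 1, 1, 1, 1, 1],
                   'Comp_D': [0, 1, 1, 1, 0, 1],
                   'Comp_E': [1, 0, 1, 1, 0, 1],
                   'Comp_F': [1, 1, 0, 1, 1, 0]})

# shared.loc[X, Y] = number of parts contracted to both X and Y.
# The diagonal holds each company's own part count.
shared = df.T.dot(df)

# Divide each row by its diagonal entry, so pct.loc[X, Y] is the
# percentage of X's parts that Y also has.
totals = pd.Series(shared.to_numpy().diagonal(), index=shared.index)
pct = shared.div(totals, axis=0) * 100
print(pct.round(1))
```

The table is not symmetric: all of Comp_D's parts are also Comp_A's, but only 80% of Comp_A's parts are Comp_D's. A company with no parts at all would produce a division by zero in its row, so such columns would need to be dropped or handled first.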
