How to use pivot table to show percent of common values between multiple columns?

I have 6 columns, each representing a different company. I then have 600+ rows of 1s and 0s in those 6 columns, indicating whether or not a part is contracted to each company (i.e., column). A part can be shared across all companies (i.e., every column can have a value of 1, meaning that part has 100% commonality across all companies).

How do I represent this visually in Excel? I'm new to pivot tables and essentially want to break down each row and see how the common parts are distributed among the companies.

I have summed each row (the 'Total' column) and computed the percentage of hits ('1') for each company. I want to see, per company, which parts it has in common with the other companies.

import pandas as pd
df=pd.DataFrame({'Comp_A':[1,1,1,1,0,1],
                 'Comp_B':[1,1,1,1,1,1],
                 'Comp_C':[1,1,1,1,1,1],
                 'Comp_D':[0,1,1,1,0,1],
                 'Comp_E':[1,0,1,1,0,1],
                 'Comp_F':[1,1,0,1,1,0]})
# df must exist before it can be summed, so add the row total afterwards
df['Sum']=df.sum(axis=1)

For each row across the 6 companies, I want to visually represent the number of 1s and 0s found. This will tell me how many parts are 100% common across all companies, how many are only in Comp_B, C, and D, etc.

I am open to either Excel or Python.

Sample DataFrame

import pandas as pd
df=pd.DataFrame({'Comp_A':[1,1,1,1,0,1],
                 'Comp_B':[1,1,1,1,1,1],
                 'Comp_C':[1,1,1,1,1,1],
                 'Comp_D':[0,1,1,1,0,1],
                 'Comp_E':[1,0,1,1,0,1],
                 'Comp_F':[1,1,0,1,1,0],
                 })
print(df)

   Comp_A  Comp_B  Comp_C  Comp_D  Comp_E  Comp_F
0       1       1       1       0       1       1
1       1       1       1       1       0       1
2       1       1       1       1       1       0
3       1       1       1       1       1       1
4       0       1       1       0       0       1
5       1       1       1       1       1       0

Using DataFrame.apply + Series.value_counts:

count_df=df.apply(lambda x: x.value_counts(),axis=1).fillna(0)
print(count_df)

     0    1
0  1.0  5.0
1  1.0  5.0
2  1.0  5.0
3  0.0  6.0
4  3.0  3.0
5  1.0  5.0
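
Because every cell is either 0 or 1, the same counts can also be obtained without a row-wise apply, which may matter on the 600+ row data set. The following is only a sketch under that 0/1 assumption; the names ones, zeros and count_df_fast are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Comp_A': [1, 1, 1, 1, 0, 1],
                   'Comp_B': [1, 1, 1, 1, 1, 1],
                   'Comp_C': [1, 1, 1, 1, 1, 1],
                   'Comp_D': [0, 1, 1, 1, 0, 1],
                   'Comp_E': [1, 0, 1, 1, 0, 1],
                   'Comp_F': [1, 1, 0, 1, 1, 0]})

# Assumes every cell is 0 or 1, so the row sum equals the number of 1s
ones = df.sum(axis=1)
# The remaining columns in each row must be 0s
zeros = df.shape[1] - ones

# Same layout as count_df (columns 0 and 1), but with integer counts
count_df_fast = pd.DataFrame({0: zeros, 1: ones})
print(count_df_fast)
```

The result holds integers rather than floats, since no fillna step is needed.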

import matplotlib.pyplot as plt
# IPython/Jupyter magic; drop this line when running as a plain script
%matplotlib inline
count_df.plot(kind='bar')

Output image: (image not preserved)


As you can see, row 3 is common to all companies.
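
The question also asks how many parts fall into each combination of companies (for example, all six, or only Comp_B, Comp_C and Comp_F). One possible way to get that, sketched here with illustrative names (companies, pattern, pattern_counts, pattern_pct) that are not part of the original answer, is to label each row with the companies holding the part and count the labels:

```python
import pandas as pd

df = pd.DataFrame({'Comp_A': [1, 1, 1, 1, 0, 1],
                   'Comp_B': [1, 1, 1, 1, 1, 1],
                   'Comp_C': [1, 1, 1, 1, 1, 1],
                   'Comp_D': [0, 1, 1, 1, 0, 1],
                   'Comp_E': [1, 0, 1, 1, 0, 1],
                   'Comp_F': [1, 1, 0, 1, 1, 0]})

companies = list(df.columns)

# Label each part (row) with the companies that have a 1 for it
pattern = df.apply(
    lambda row: ", ".join(c for c in companies if row[c] == 1),
    axis=1,
)

# Number of parts per combination, and the same as a share of all parts
pattern_counts = pattern.value_counts()
pattern_pct = pattern_counts / len(df) * 100

print(pattern_counts)
print(pattern_pct.round(1))
```

pattern_counts.plot(kind='barh') would then chart the combinations, in the same way as the bar plots in this answer.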


Percentages:

percentages_comun=(df.sum(axis=1)/len(df.columns))*100
print(percentages_comun)
0     83.333333
1     83.333333
2     83.333333
3    100.000000
4     50.000000
5     83.333333
dtype: float64

percentages_comun.plot(kind='bar')

Output image: (image not preserved)
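
For the per-company view asked about in the question (which parts each company shares with the others), a pairwise overlap table is one option. This is a sketch rather than part of the original answer, and the names shared, own and pct_shared are assumptions chosen for illustration. It relies on the cells being 0 or 1, so that the product of the transposed frame with itself counts the rows where two companies both have a 1.

```python
import pandas as pd

df = pd.DataFrame({'Comp_A': [1, 1, 1, 1, 0, 1],
                   'Comp_B': [1, 1, 1, 1, 1, 1],
                   'Comp_C': [1, 1, 1, 1, 1, 1],
                   'Comp_D': [0, 1, 1, 1, 0, 1],
                   'Comp_E': [1, 0, 1, 1, 0, 1],
                   'Comp_F': [1, 1, 0, 1, 1, 0]})

# shared.loc[i, j] = number of parts contracted to both company i and company j
shared = df.T.dot(df)

# Number of parts each company holds on its own (also the diagonal of shared)
own = df.sum()

# Row i, column j: percent of company i's parts that company j also holds
pct_shared = shared.div(own, axis=0) * 100

print(shared)
print(pct_shared.round(1))
```

The percentage table is not symmetric: every Comp_D part is also a Comp_A part, but only 80% of Comp_A parts are Comp_D parts. A company with no parts at all would produce NaN in its row because of the division by zero.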
