Counting unique values within a group, then splitting the count by two categories

Question

I have the following dataset:


    Package Document    bool
0   Pkg1    DocumentA   True
1   Pkg1    DocumentA   True
2   Pkg1    DocumentB   True
3   Pkg1    DocumentC   True
4   Pkg2    DocumentA   False
5   Pkg2    DocumentB   True
6   Pkg2    DocumentB   True
7   Pkg2    DocumentC   True
8   Pkg3    DocumentA   False
9   Pkg3    DocumentB   True
10  Pkg3    DocumentD   False
11  Pkg3    DocumentD   True
12  Pkg4    DocumentB   True
13  Pkg4    DocumentC   True
14  Pkg5    DocumentB   False
15  Pkg5    DocumentC   False

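I need to count the number of packages in which each document appears, and then split that count into True and False. The condition is that if a document has even one False row within a package, it counts as False for that package. For example, DocumentA appears in 3 packages: it is True in one and False in two. The expected output is: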

Document Count True False
DocumentA 3 1 2
DocumentB 5 4 1
DocumentC 4 3 1
DocumentD 1 0 1

I am able to get the per-group count, but I cannot get the True and False columns, using

df.groupby("Document")["Package"].nunique()

This gives me

Document
DocumentA    3
DocumentB    5
DocumentC    4
DocumentD    1

But I also need those additional columns.

Answer 1

First use GroupBy.transform with GroupBy.all, then pivot with DataFrame.pivot_table, and finally add the count column with DataFrame.insert:

print (df.dtypes)
Package     object
Document    object
bool          bool
dtype: object

# Mark each (Document, Package) pair True only if all of its rows are True
df["bool"] = df.groupby(["Document", "Package"])["bool"].transform("all")
print(df)
   Package   Document   bool
0     Pkg1  DocumentA   True
1     Pkg1  DocumentA   True
2     Pkg1  DocumentB   True
3     Pkg1  DocumentC   True
4     Pkg2  DocumentA  False
5     Pkg2  DocumentB   True
6     Pkg2  DocumentB   True
7     Pkg2  DocumentC   True
8     Pkg3  DocumentA  False
9     Pkg3  DocumentB   True
10    Pkg3  DocumentD  False
11    Pkg3  DocumentD  False
12    Pkg4  DocumentB   True
13    Pkg4  DocumentC   True
14    Pkg5  DocumentB  False
15    Pkg5  DocumentC  False
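
Row 11 is the only row that changes in the output above. As a minimal sketch of why, the snippet below isolates rows 10 and 11 from the question (the variable names `pair`, `collapsed`, and `broadcast` are illustrative, not from the original answer) and compares a plain `all()` aggregation with `transform("all")`:

```python
import pandas as pd

# Rows 10 and 11 from the question: the same (Document, Package) pair,
# once False and once True.
pair = pd.DataFrame({
    "Package": ["Pkg3", "Pkg3"],
    "Document": ["DocumentD", "DocumentD"],
    "bool": [False, True],
})
grouped = pair.groupby(["Document", "Package"])["bool"]

# Aggregation: one value per group.
collapsed = grouped.all()

# Transform: the group result is broadcast back to every row of the group,
# so the output has the same length and index as the input.
broadcast = grouped.transform("all")

print(collapsed.tolist())  # [False]
print(broadcast.tolist())  # [False, False]
```

Because the transformed column keeps the original row index, it can be assigned straight back to `df["bool"]`, which is what the answer does.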

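# Count unique packages per document, split by the per-package flag
df = df.pivot_table(index="Document",
                    columns="bool",
                    values="Package",
                    aggfunc="nunique",
                    fill_value=0)
# Put the total number of packages in front as the first column
df.insert(0, "count", df.sum(axis=1))
print(df)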
bool       count  False  True
Document                     
DocumentA      3      2     1
DocumentB      5      1     4
DocumentC      4      1     3
DocumentD      1      1     0
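
For anyone who wants to run the steps end to end, here is a self-contained sketch. The sample rows are typed in from the table in the question; the names `rows`, `flags`, and `summary` are my own and do not appear in the original answer, which overwrites `df` in place instead.

```python
import pandas as pd

# Sample data copied from the question's table.
rows = [
    ("Pkg1", "DocumentA", True),
    ("Pkg1", "DocumentA", True),
    ("Pkg1", "DocumentB", True),
    ("Pkg1", "DocumentC", True),
    ("Pkg2", "DocumentA", False),
    ("Pkg2", "DocumentB", True),
    ("Pkg2", "DocumentB", True),
    ("Pkg2", "DocumentC", True),
    ("Pkg3", "DocumentA", False),
    ("Pkg3", "DocumentB", True),
    ("Pkg3", "DocumentD", False),
    ("Pkg3", "DocumentD", True),
    ("Pkg4", "DocumentB", True),
    ("Pkg4", "DocumentC", True),
    ("Pkg5", "DocumentB", False),
    ("Pkg5", "DocumentC", False),
]
df = pd.DataFrame(rows, columns=["Package", "Document", "bool"])

# A (Document, Package) pair is True only if every one of its rows is True.
flags = df.groupby(["Document", "Package"])["bool"].transform("all")

# Count distinct packages per document for each flag value.
summary = df.assign(bool=flags).pivot_table(
    index="Document",
    columns="bool",
    values="Package",
    aggfunc="nunique",
    fill_value=0,
)

# Total packages per document, inserted as the first column.
summary.insert(0, "count", summary.sum(axis=1))
print(summary)
```

Using `assign` keeps the original `df` untouched, which is convenient when the raw flags are still needed afterwards.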
