Count unique values per group across multiple columns with a condition in Pandas

Question

Updated the sample dataset.

I have the following data:

location ID  Value
A        1   1 
A        1   1
A        1   1 
A        1   1
A        1   2 
A        1   2
A        1   2 
A        1   2
A        1   3 
A        1   4 
A        2   1 
A        2   2 
A        3   1 
A        3   2
B        4   1 
B        4   2 
B        5   1
B        5   1 
B        5   2
B        5   2 
B        6   1 
B        6   1
B        6   1
B        6   1 
B        6   1
B        6   2
B        6   2
B        6   2   
B        7   1

I want to count the unique Values (only where Value equals 1 or 2) for each location and each ID, to get the following output:

location ID_Count  Value_Count
A        3         6
B        4         7

I tried df.groupby(['location'])[['ID', 'Value']].nunique(), but that only gives the number of distinct values per location: I get a Value_Count of 4 for A and 2 for B.

Answer 1

Try agg on the boolean mask, using the index of its True values to slice ID.

For your updated sample, you just need to drop duplicates before processing. The rest is the same.

df = df.drop_duplicates(['location', 'ID', 'Value'])

df_agg = (df.Value.isin([1,2]).groupby(df.location)
                              .agg(ID_count=lambda x: df.loc[x[x].index, 'ID'].nunique(), 
                                   Value_count='sum'))

Out[93]:
          ID_count  Value_count
location
A                3            6
B                4            7
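To unpack the lambda: inside agg, x is the boolean sub-Series for one location, x[x] keeps only its True entries, and their index labels are used to look the matching IDs up in df. Below is a minimal sketch on a small made-up frame (the data and the helper name ids_with_match are mine, not from the question). One consequence, as far as I can tell, is that an ID whose rows never have Value 1 or 2 is not counted.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the asker's sample).
df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "ID":       [1,   1,   2,   4,   7],
    "Value":    [1,   3,   2,   1,   5],
})

# True where Value is 1 or 2.
mask = df["Value"].isin([1, 2])

def ids_with_match(x):
    # x is the boolean sub-Series for one location; x[x] keeps the True rows,
    # and their index labels select the corresponding IDs from df.
    return df.loc[x[x].index, "ID"].nunique()

result = mask.groupby(df["location"]).agg(
    ID_count=ids_with_match,   # distinct IDs among matching rows only
    Value_count="sum",         # number of True entries in the mask
)
print(result)
```

In this toy frame, ID 7 in location B only has Value 5, so B ends up with ID_count 1 rather than 2.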

Answer 2

IIUC, you can try Series.isin with groupby.agg:

out = (df.assign(Value_Count=df['Value'].isin([1,2])).groupby("location",as_index=False)
                                   .agg({"ID":'nunique',"Value_Count":'sum'}))

print(out)

  location  ID  Value_Count
0        A   3          6.0
1        B   4          7.0

Answer 3

Roughly the same as anky's answer, but using Series.where and named aggregations, so we can rename the columns while creating them in the groupby.

grp = df.assign(Value=df['Value'].where(df['Value'].isin([1, 2]))).groupby('location')
grp.agg(
    ID_count=('ID', 'nunique'),
    Value_count=('Value', 'count')
).reset_index()

  location  ID_count  Value_count
0        A         3            6
1        B         4            7
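The reason 'count' works here is that Series.where replaces every entry failing the condition with NaN, and count skips NaN. A tiny sketch on a made-up Series (the values are only for illustration):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 1])

# Entries not in {1, 2} become NaN; the others are kept.
masked = s.where(s.isin([1, 2]))

# count() ignores NaN, so only the entries equal to 1 or 2 are counted.
n_kept = masked.count()
print(masked.tolist(), n_kept)
```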

Answer 4

Let's try an approach very similar to the other answers. This time we filter first:

(df[df['Value'].isin([1,2])]
   .groupby(['location'],as_index=False)
   .agg({'ID':'nunique', 'Value':'size'})
)

Output:

  location  ID  Value
0        A   3      6
1        B   4      7
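One caveat for the updated sample, which contains repeated (location, ID, Value) rows: 'size' counts every remaining row, so the counts of 6 and 7 appear only once duplicates are dropped, as Answer 1 points out. A sketch, assuming the question's table is rebuilt as below (the rows shorthand and the ID_count / Value_count names are mine):

```python
import pandas as pd

# Rebuild the updated sample from the question (29 rows).
rows = (
    [("A", 1, 1)] * 4 + [("A", 1, 2)] * 4 + [("A", 1, 3), ("A", 1, 4)]
    + [("A", 2, 1), ("A", 2, 2), ("A", 3, 1), ("A", 3, 2)]
    + [("B", 4, 1), ("B", 4, 2)]
    + [("B", 5, 1)] * 2 + [("B", 5, 2)] * 2
    + [("B", 6, 1)] * 5 + [("B", 6, 2)] * 3
    + [("B", 7, 1)]
)
df = pd.DataFrame(rows, columns=["location", "ID", "Value"])

# Keep only rows whose Value is 1 or 2.
filtered = df[df["Value"].isin([1, 2])]

# Without de-duplication, 'size' counts repeated rows too.
raw = filtered.groupby("location").agg(
    ID_count=("ID", "nunique"), Value_count=("Value", "size")
)

# Dropping duplicate (location, ID, Value) rows first counts each pair once.
dedup = (
    filtered.drop_duplicates(["location", "ID", "Value"])
    .groupby("location")
    .agg(ID_count=("ID", "nunique"), Value_count=("Value", "size"))
)
print(raw)
print(dedup)
```

Here raw gives Value_count 12 for A and 15 for B, while dedup gives the requested 6 and 7; ID_count is 3 and 4 either way.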
