
Count unique values for each group in multiple columns with criteria in Pandas

UPDATE: I have updated the sample dataset.

I have the following data:

location ID  Value
A        1   1 
A        1   1
A        1   1 
A        1   1
A        1   2 
A        1   2
A        1   2 
A        1   2
A        1   3 
A        1   4 
A        2   1 
A        2   2 
A        3   1 
A        3   2
B        4   1 
B        4   2 
B        5   1
B        5   1 
B        5   2
B        5   2 
B        6   1 
B        6   1
B        6   1
B        6   1 
B        6   1
B        6   2
B        6   2
B        6   2   
B        7   1 

I want to count the unique Values (only where Value equals 1 or 2) for each ID within each location, together with the number of IDs per location, to get the following output:

location ID_Count  Value_Count
A        3         6
B        4         7

I tried using df.groupby(['location'])[['ID', 'Value']].nunique(), but that only gives the number of distinct values in each column per location: for Value, I get 4 for A and 2 for B.
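To see why that attempt falls short, here is a minimal sketch (assuming pandas is installed; the names rows, naive, pairs and pair_counts are only for illustration) that rebuilds the sample data and compares the attempt with what the expected output actually counts. nunique counts distinct values per column within each location, so Value collapses to {1, 2, 3, 4} for A and {1, 2} for B, whereas the expected Value_Count corresponds to the number of distinct (ID, Value) pairs whose Value is 1 or 2.

```python
import pandas as pd

# Rebuild the question's updated sample as (location, ID, Value) rows.
rows = (
    [("A", 1, 1)] * 4 + [("A", 1, 2)] * 4 + [("A", 1, 3), ("A", 1, 4)]
    + [("A", 2, 1), ("A", 2, 2), ("A", 3, 1), ("A", 3, 2)]
    + [("B", 4, 1), ("B", 4, 2)] + [("B", 5, 1)] * 2 + [("B", 5, 2)] * 2
    + [("B", 6, 1)] * 5 + [("B", 6, 2)] * 3 + [("B", 7, 1)]
)
df = pd.DataFrame(rows, columns=["location", "ID", "Value"])

# The attempt: distinct values per column within each location.
naive = df.groupby("location")[["ID", "Value"]].nunique()
print(naive)

# What the expected output counts: distinct (ID, Value) pairs with Value 1 or 2.
pairs = df.loc[df["Value"].isin([1, 2]), ["location", "ID", "Value"]].drop_duplicates()
pair_counts = pairs.groupby("location").size()
print(pair_counts)
```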

Try agg on the boolean mask, slicing ID by the rows where the mask is True.

For your updated sample, you just need to drop duplicates before processing. The rest is the same:

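df = df.drop_duplicates(['location', 'ID', 'Value'])

# Group the boolean mask (Value is 1 or 2) by location:
# ID_count = distinct IDs among the True rows, Value_count = number of True rows
df_agg = (df.Value.isin([1,2]).groupby(df.location)
                              .agg(ID_count=lambda x: df.loc[x[x].index, 'ID'].nunique(),
                                   Value_count='sum'))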

Output:
          ID_count  Value_count
location
A                3            6
B                4            7
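A small sketch of why the drop_duplicates step matters for the updated sample (the helper value_count is hypothetical, and the data is the question's sample): summing the boolean mask counts rows, so repeated (ID, Value) rows inflate the totals unless they are removed first.

```python
import pandas as pd

# The question's updated sample as (location, ID, Value) rows.
rows = (
    [("A", 1, 1)] * 4 + [("A", 1, 2)] * 4 + [("A", 1, 3), ("A", 1, 4)]
    + [("A", 2, 1), ("A", 2, 2), ("A", 3, 1), ("A", 3, 2)]
    + [("B", 4, 1), ("B", 4, 2)] + [("B", 5, 1)] * 2 + [("B", 5, 2)] * 2
    + [("B", 6, 1)] * 5 + [("B", 6, 2)] * 3 + [("B", 7, 1)]
)
df = pd.DataFrame(rows, columns=["location", "ID", "Value"])


def value_count(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per location, count rows whose Value is 1 or 2 (each True sums as 1)."""
    return frame["Value"].isin([1, 2]).groupby(frame["location"]).sum()


raw = value_count(df)  # repeated rows are counted every time they appear
deduped = value_count(df.drop_duplicates(["location", "ID", "Value"]))
print(pd.DataFrame({"without_dedup": raw, "with_dedup": deduped}))
```

Without deduplication, A gets 12 and B gets 15; after it, the totals drop to the expected 6 and 7.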

If I understand correctly, you can try Series.isin with groupby.agg:

df = df.drop_duplicates(['location', 'ID', 'Value'])  # count each (ID, Value) pair once
out = (df.assign(Value_Count=df['Value'].isin([1,2])).groupby("location",as_index=False)
                                   .agg({"ID":'nunique',"Value_Count":'sum'}))

print(out)

  location  ID  Value_Count
0        A   3          6.0
1        B   4          7.0
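One detail the sample data does not expose: because ID is aggregated over the unfiltered frame here, nunique counts every ID in a location, including an ID whose Values are all outside {1, 2}. The sketch below adds a hypothetical row (A, 8, 3), which is not in the question's data, to contrast that with filtering first.

```python
import pandas as pd

# The question's updated sample as (location, ID, Value) rows.
rows = (
    [("A", 1, 1)] * 4 + [("A", 1, 2)] * 4 + [("A", 1, 3), ("A", 1, 4)]
    + [("A", 2, 1), ("A", 2, 2), ("A", 3, 1), ("A", 3, 2)]
    + [("B", 4, 1), ("B", 4, 2)] + [("B", 5, 1)] * 2 + [("B", 5, 2)] * 2
    + [("B", 6, 1)] * 5 + [("B", 6, 2)] * 3 + [("B", 7, 1)]
)
# Hypothetical extra row, NOT in the question's data: ID 8 only has Value 3.
rows.append(("A", 8, 3))
df = pd.DataFrame(rows, columns=["location", "ID", "Value"]).drop_duplicates()

# Aggregating ID over the unfiltered frame counts every ID, including ID 8.
all_ids = df.groupby("location")["ID"].nunique()

# Filtering first counts only IDs that have at least one Value of 1 or 2.
filtered_ids = df[df["Value"].isin([1, 2])].groupby("location")["ID"].nunique()

print(pd.DataFrame({"all_ids": all_ids, "filtered_ids": filtered_ids}))
```

Which behaviour is right depends on whether an ID with no 1 or 2 values should still count toward ID_Count.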

Roughly the same as anky's answer, but using Series.where and named aggregation, so we can name the columns while creating them in the groupby.

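df = df.drop_duplicates(['location', 'ID', 'Value'])  # count each (ID, Value) pair once
# Values other than 1 or 2 become NaN, which 'count' ignores
grp = df.assign(Value=df['Value'].where(df['Value'].isin([1, 2]))).groupby('location')
grp.agg(
    ID_count=('ID', 'nunique'),
    Value_count=('Value', 'count')
).reset_index()

Output:
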
  location  ID_count  Value_count
0        A         3            6
1        B         4            7
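A toy illustration of why this answer aggregates with 'count' rather than 'size' (the frame and its columns g and v are hypothetical): Series.where turns non-matching values into NaN, 'count' skips NaN, and 'size' counts every row in the group regardless.

```python
import pandas as pd

# Hypothetical toy frame: group "x" has one value outside {1, 2}, group "y" has none inside it.
frame = pd.DataFrame({"g": ["x", "x", "x", "y", "y"], "v": [1, 3, 2, 4, 4]})

# Series.where keeps v where the condition holds and puts NaN everywhere else.
masked = frame.assign(v=frame["v"].where(frame["v"].isin([1, 2])))

# 'count' skips NaN, 'size' counts every row in the group.
agg = masked.groupby("g")["v"].agg(["count", "size"])
print(agg)
```

After filtering rows instead of masking them (as in the next answer), no NaN remains, so 'size' and 'count' agree.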

Here is an approach very similar to the other answers, but this time we filter first:

df = df.drop_duplicates(['location', 'ID', 'Value'])  # count each (ID, Value) pair once

(df[df['Value'].isin([1,2])]
   .groupby(['location'],as_index=False)
   .agg({'ID':'nunique', 'Value':'size'})
)

Output:

  location  ID  Value
0        A   3      6
1        B   4      7
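If you want the exact column names from the question (ID_Count, Value_Count), the filter-first approach can be combined with named aggregation. A minimal sketch, assuming pandas 0.25 or later (when named aggregation was introduced) and the question's sample data:

```python
import pandas as pd

# The question's updated sample as (location, ID, Value) rows.
rows = (
    [("A", 1, 1)] * 4 + [("A", 1, 2)] * 4 + [("A", 1, 3), ("A", 1, 4)]
    + [("A", 2, 1), ("A", 2, 2), ("A", 3, 1), ("A", 3, 2)]
    + [("B", 4, 1), ("B", 4, 2)] + [("B", 5, 1)] * 2 + [("B", 5, 2)] * 2
    + [("B", 6, 1)] * 5 + [("B", 6, 2)] * 3 + [("B", 7, 1)]
)
df = pd.DataFrame(rows, columns=["location", "ID", "Value"])

out = (
    df.drop_duplicates(["location", "ID", "Value"])  # count each (ID, Value) pair once
    .loc[lambda d: d["Value"].isin([1, 2])]          # keep only Values 1 and 2
    .groupby("location", as_index=False)
    # No NaN remains after filtering, so 'count' equals the number of rows.
    .agg(ID_Count=("ID", "nunique"), Value_Count=("Value", "count"))
)
print(out)
```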
