Count unique values for each group in multiple columns with criteria in Pandas
UPDATED THE SAMPLE DATASET
I have the following data:
location ID Value
A 1 1
A 1 1
A 1 1
A 1 1
A 1 2
A 1 2
A 1 2
A 1 2
A 1 3
A 1 4
A 2 1
A 2 2
A 3 1
A 3 2
B 4 1
B 4 2
B 5 1
B 5 1
B 5 2
B 5 2
B 6 1
B 6 1
B 6 1
B 6 1
B 6 1
B 6 2
B 6 2
B 6 2
B 7 1
I want to count the unique Values (only where the value equals 1 or 2) for each location and for each ID, to get the following output:
location ID_Count Value_Count
A 3 6
B 4 7
I tried using df.groupby(['location'])[['ID', 'Value']].nunique(), but that only gives me the number of distinct values per location: I get a Value_Count of 4 for A and 2 for B.
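To make the problem reproducible, here is a minimal sketch that rebuilds the sample data from the table above and runs the nunique attempt. The variable names (rows, per_location) are my own, not from the question.

```python
import pandas as pd

# Rebuild the sample table from the question, one tuple per row.
rows = (
    [("A", 1, v) for v in [1, 1, 1, 1, 2, 2, 2, 2, 3, 4]]
    + [("A", 2, v) for v in [1, 2]]
    + [("A", 3, v) for v in [1, 2]]
    + [("B", 4, v) for v in [1, 2]]
    + [("B", 5, v) for v in [1, 1, 2, 2]]
    + [("B", 6, v) for v in [1, 1, 1, 1, 1, 2, 2, 2]]
    + [("B", 7, v) for v in [1]]
)
df = pd.DataFrame(rows, columns=["location", "ID", "Value"])

# nunique counts distinct values per location, ignoring which ID they belong to.
per_location = df.groupby("location")[["ID", "Value"]].nunique()
print(per_location)
# A: ID 3, Value 4 (the distinct values 1, 2, 3, 4)
# B: ID 4, Value 2 (the distinct values 1, 2)
```

So Value comes out as 4 and 2 because the values 3 and 4 are not excluded and the same value under different IDs is counted only once.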
Try agg, slicing ID on the True values of the mask.
For your updated sample, you just need to drop duplicates before processing. The rest is the same:
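# Keep one row per (location, ID, Value) combination
df = df.drop_duplicates(['location', 'ID', 'Value'])
# Boolean mask of rows whose Value is 1 or 2, grouped by location
df_agg = (df.Value.isin([1, 2]).groupby(df.location)
            .agg(ID_count=lambda x: df.loc[x[x].index, 'ID'].nunique(),
                 Value_count='sum'))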
Out[93]:
ID_count Value_count
location
A 3 6
B 4 7
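If the lambda is hard to read, the same steps can be spelled out with an explicit loop. This is only an illustrative sketch on the sample data; the names deduped, flags, kept and results are mine.

```python
import pandas as pd

rows = (
    [("A", 1, v) for v in [1, 1, 1, 1, 2, 2, 2, 2, 3, 4]]
    + [("A", 2, v) for v in [1, 2]]
    + [("A", 3, v) for v in [1, 2]]
    + [("B", 4, v) for v in [1, 2]]
    + [("B", 5, v) for v in [1, 1, 2, 2]]
    + [("B", 6, v) for v in [1, 1, 1, 1, 1, 2, 2, 2]]
    + [("B", 7, v) for v in [1]]
)
df = pd.DataFrame(rows, columns=["location", "ID", "Value"])

# One row per (location, ID, Value) combination.
deduped = df.drop_duplicates(["location", "ID", "Value"])
# True where Value is 1 or 2.
mask = deduped["Value"].isin([1, 2])

results = {}
for loc, flags in mask.groupby(deduped["location"]):
    kept = flags[flags].index                    # index labels of the True rows
    id_count = deduped.loc[kept, "ID"].nunique() # distinct IDs among those rows
    value_count = int(flags.sum())               # number of True rows
    results[loc] = (int(id_count), value_count)

print(results)  # {'A': (3, 6), 'B': (4, 7)}
```

Note that ID_count here only counts IDs that have at least one Value of 1 or 2; in the sample every ID does, so the result matches the expected output.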
IIUC, you can try Series.isin with groupby.agg (on the updated sample, drop the duplicate rows first, as in the answer above):
out = (df.assign(Value_Count=df['Value'].isin([1, 2]))
         .groupby("location", as_index=False)
         .agg({"ID": 'nunique', "Value_Count": 'sum'}))
print(out)
location ID Value_Count
0 A 3 6.0
1 B 4 7.0
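To see how much the duplicate rows matter for this approach, here is a quick check on the updated sample, with and without drop_duplicates. The helper count_with_isin is a hypothetical wrapper I added around the snippet above.

```python
import pandas as pd

rows = (
    [("A", 1, v) for v in [1, 1, 1, 1, 2, 2, 2, 2, 3, 4]]
    + [("A", 2, v) for v in [1, 2]]
    + [("A", 3, v) for v in [1, 2]]
    + [("B", 4, v) for v in [1, 2]]
    + [("B", 5, v) for v in [1, 1, 2, 2]]
    + [("B", 6, v) for v in [1, 1, 1, 1, 1, 2, 2, 2]]
    + [("B", 7, v) for v in [1]]
)
df = pd.DataFrame(rows, columns=["location", "ID", "Value"])


def count_with_isin(frame):
    """Hypothetical wrapper: flag rows with Value in {1, 2}, then aggregate."""
    return (frame.assign(Value_Count=frame["Value"].isin([1, 2]))
                 .groupby("location", as_index=False)
                 .agg({"ID": "nunique", "Value_Count": "sum"}))


# Every matching row is counted, duplicates included.
raw = count_with_isin(df)
# Only one row per (location, ID, Value) combination is counted.
deduped = count_with_isin(df.drop_duplicates(["location", "ID", "Value"]))

print(raw)      # Value_Count is 12 for A and 15 for B
print(deduped)  # Value_Count is 6 for A and 7 for B
```

The ID column is unaffected by the duplicates, since nunique already collapses repeats.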
Roughly the same as anky's answer, but using Series.where and named aggregations, so we can rename the columns while creating them in the groupby:
grp = df.assign(Value=df['Value'].where(df['Value'].isin([1, 2]))).groupby('location')
grp.agg(
    ID_count=('ID', 'nunique'),
    Value_count=('Value', 'count')
).reset_index()
location ID_count Value_count
0 A 3 6
1 B 4 7
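The reason 'count' works here is that Series.where replaces the non-matching entries with NaN, and count skips NaN. A tiny standalone sketch on made-up values (not the question's data):

```python
import pandas as pd

# Made-up values just to show the masking behaviour.
values = pd.Series([1, 2, 3, 4, 1])

# Entries that are not 1 or 2 become NaN; the rest are kept.
masked = values.where(values.isin([1, 2]))
print(masked.tolist())  # [1.0, 2.0, nan, nan, 1.0]

# count() ignores NaN, so only the kept entries are counted.
print(masked.count())   # 3
```

Like the isin/sum version, this counts every matching row, so duplicate rows in the input are counted more than once unless they are dropped first.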
Let's try a very similar approach to the other answers. This time we filter first:
(df[df['Value'].isin([1, 2])]
   .groupby(['location'], as_index=False)
   .agg({'ID': 'nunique', 'Value': 'size'})
)
Output:
location ID Value
0 A 3 6
1 B 4 7
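One difference between filtering first and masking is worth knowing: a location with no Value of 1 or 2 disappears entirely when you filter first, while the Series.where version keeps it with a count of 0. A small sketch, where location "C" and ID 8 are hypothetical rows I made up to show this:

```python
import pandas as pd

# Tiny made-up frame; location "C" (ID 8) has no Value equal to 1 or 2.
df = pd.DataFrame({
    "location": ["A", "A", "C"],
    "ID": [1, 1, 8],
    "Value": [1, 2, 3],
})
wanted = df["Value"].isin([1, 2])

# Filter first: rows for "C" are gone before the groupby ever sees them.
filtered = (df[wanted]
            .groupby("location")
            .agg(ID_Count=("ID", "nunique"), Value_Count=("Value", "size")))

# Mask instead: "C" stays as a group, its Value is NaN and counts as 0.
masked = (df.assign(Value=df["Value"].where(wanted))
            .groupby("location")
            .agg(ID_Count=("ID", "nunique"), Value_Count=("Value", "count")))

print(filtered)  # only location A
print(masked)    # locations A and C; C has ID_Count 1 and Value_Count 0
```

The masked version also counts IDs whose rows were all masked out (ID 8 here), so which one you want depends on whether such IDs should contribute to ID_Count.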