Check if all elements in a group are equal using pandas GroupBy
Is there a pythonic way to group by a field and check whether all elements of each resulting group have the same value?
Sample data:
datetime rating signal
0 2018-12-27 11:33:00 IG 0
1 2018-12-27 11:33:00 HY -1
2 2018-12-27 11:49:00 IG 0
3 2018-12-27 11:49:00 HY -1
4 2018-12-27 12:00:00 IG 0
5 2018-12-27 12:00:00 HY -1
6 2018-12-27 12:49:00 IG 0
7 2018-12-27 12:49:00 HY -1
8 2018-12-27 14:56:00 IG 0
9 2018-12-27 14:56:00 HY -1
10 2018-12-27 15:12:00 IG 0
11 2018-12-27 15:12:00 HY -1
12 2018-12-20 15:14:00 IG 0
13 2018-12-20 15:14:00 HY -1
14 2018-12-20 15:50:00 IG -1
15 2018-12-20 15:50:00 HY -1
16 2018-12-27 13:26:00 IG 0
17 2018-12-27 13:26:00 HY -1
18 2018-12-27 13:44:00 IG 0
19 2018-12-27 13:44:00 HY -1
20 2018-12-27 15:06:00 IG 0
21 2018-12-27 15:06:00 HY -1
22 2018-12-20 15:48:00 IG 0
23 2018-12-20 15:48:00 HY -1
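To try the answers below, the sample table can be rebuilt as a DataFrame. This is a minimal sketch: the loop is only a shortcut for typing the 24 rows, and it assumes the datetime column should be converted with pd.to_datetime, since the .dt accessor used in the answers needs a datetime64 column rather than strings.

```python
import pandas as pd

# Rebuild the sample data: every timestamp has one IG row followed by one HY row.
times = [
    "2018-12-27 11:33", "2018-12-27 11:49", "2018-12-27 12:00", "2018-12-27 12:49",
    "2018-12-27 14:56", "2018-12-27 15:12", "2018-12-20 15:14", "2018-12-20 15:50",
    "2018-12-27 13:26", "2018-12-27 13:44", "2018-12-27 15:06", "2018-12-20 15:48",
]
ig_signals = [0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0]  # HY is always -1

rows = []
for t, ig in zip(times, ig_signals):
    rows.append((t, "IG", ig))
    rows.append((t, "HY", -1))

df = pd.DataFrame(rows, columns=["datetime", "rating", "signal"])
# The .dt accessor requires a real datetime64 column, not strings.
df["datetime"] = pd.to_datetime(df["datetime"])
print(df.head())
```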
The grouping part can be done by
df.groupby([df.datetime.dt.date, 'rating'])
However, I'm sure there must be a simple way to leverage the grouper and use a transform statement to return 1 if all the values in signal are the same.
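If the goal really is a transform (one flag per original row rather than one per group), a sketch along these lines should work: transform broadcasts each group's nunique back onto that group's rows. It uses a small subset of the sample rows, and the all_same column name is hypothetical.

```python
import pandas as pd

# Subset of the sample rows, in the question's shape.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-12-20 15:14", "2018-12-20 15:14",
                                "2018-12-20 15:50", "2018-12-20 15:50",
                                "2018-12-27 11:33", "2018-12-27 11:33"]),
    "rating": ["IG", "HY", "IG", "HY", "IG", "HY"],
    "signal": [0, -1, -1, -1, 0, -1],
})

keys = [df["datetime"].dt.date, "rating"]
# transform("nunique") returns a Series aligned with df (one value per row),
# so it can be stored as a column; 1 means "all signals in my group are equal".
df["all_same"] = df.groupby(keys)["signal"].transform("nunique").eq(1).astype(int)
print(df)
```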
Desired output:
2018-12-20 HY True
IG False
2018-12-27 HY True
IG True
Use groupby and nunique, and check whether the result is 1:
df.groupby([df.datetime.dt.date, 'rating']).signal.nunique().eq(1)
datetime rating
2018-12-20 HY True
IG False
2018-12-27 HY True
IG True
Name: signal, dtype: bool
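One behavior worth knowing: nunique skips NaN by default, so a group such as [0, NaN] counts as having a single distinct value. A small sketch on hypothetical toy data (the key column and its values are made up) contrasting the default with dropna=False:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group "A" contains a missing signal.
df = pd.DataFrame({
    "key": ["A", "A", "B", "B"],
    "signal": [0.0, np.nan, -1.0, -1.0],
})

g = df.groupby("key")["signal"]
# Default dropna=True: NaN is ignored, so group A counts as uniform.
default = g.nunique().eq(1)
# dropna=False: NaN counts as its own value, so group A is not uniform.
strict = g.nunique(dropna=False).eq(1)
print(pd.DataFrame({"default": default, "strict": strict}))
```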
Or, similarly, using apply with set conversion:
(df.groupby([df.datetime.dt.date, 'rating']).signal
.apply(lambda x: len(set(x)) == 1))
datetime rating
2018-12-20 HY True
IG False
2018-12-27 HY True
IG True
Name: signal, dtype: bool
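The apply version calls a Python lambda once per group. A vectorized alternative, not part of the original answer, is to compare each group's min with its max. This is a sketch assuming signal is numeric; like nunique's default, min and max skip NaN.

```python
import pandas as pd

# Subset of the sample rows.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-12-20 15:14", "2018-12-20 15:14",
                                "2018-12-20 15:50", "2018-12-20 15:50",
                                "2018-12-27 11:33", "2018-12-27 11:33"]),
    "rating": ["IG", "HY", "IG", "HY", "IG", "HY"],
    "signal": [0, -1, -1, -1, 0, -1],
})

g = df.groupby([df["datetime"].dt.date, "rating"])["signal"]
# A group holds a single distinct value exactly when its min equals its max.
# Both are built-in aggregations, so no Python lambda runs per group.
result = g.min().eq(g.max())
print(result)
```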
PS: you don't need to assign a temporary column; groupby accepts arbitrary grouper arguments, such as the Series df.datetime.dt.date used above.
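To illustrate the PS, a sketch comparing the two styles on a subset of the sample rows. The temporary column name date is hypothetical; the only visible difference is the name of the first index level, which groupby takes from the Series name when a Series is passed.

```python
import pandas as pd

# Subset of the sample rows.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-12-20 15:14", "2018-12-20 15:14",
                                "2018-12-20 15:50", "2018-12-20 15:50",
                                "2018-12-27 11:33", "2018-12-27 11:33"]),
    "rating": ["IG", "HY", "IG", "HY", "IG", "HY"],
    "signal": [0, -1, -1, -1, 0, -1],
})

# Style 1: assign a temporary column, then group by column names.
tmp = df.assign(date=df["datetime"].dt.date)
via_column = tmp.groupby(["date", "rating"])["signal"].nunique().eq(1)

# Style 2: pass the derived Series directly as a grouping key.
via_series = df.groupby([df["datetime"].dt.date, "rating"])["signal"].nunique().eq(1)

print(via_column.index.names, via_series.index.names)
print(via_column.tolist() == via_series.tolist())
```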
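Just for fun, here is an alternative that avoids groupby. Note that it passes level= to sum and any, a keyword that was removed from these reductions in pandas 2.0, so it only runs on older pandas versions: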
df.datetime=df.datetime.dt.date
s=pd.crosstab(df.datetime,[df.rating,df.signal])
s.eq(s.sum(axis=1,level=0),1).any(level=0,axis=1).stack()
datetime rating
2018-12-20 HY True
IG False
2018-12-27 HY True
IG True
dtype: bool
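A sketch of the same groupby-free crosstab idea that avoids the level= keyword (removed from reductions in pandas 2.0). Instead of summing over a column level, it moves rating into the row index with stack and then checks that exactly one signal value has a nonzero count. This variant is not from the original answer, and on some pandas 2.x releases stack may emit a FutureWarning about its upcoming implementation change.

```python
import pandas as pd

# Subset of the sample rows.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-12-20 15:14", "2018-12-20 15:14",
                                "2018-12-20 15:50", "2018-12-20 15:50",
                                "2018-12-27 11:33", "2018-12-27 11:33"]),
    "rating": ["IG", "HY", "IG", "HY", "IG", "HY"],
    "signal": [0, -1, -1, -1, 0, -1],
})

# Count rows per combination: rows are dates, columns are (rating, signal).
s = pd.crosstab(df["datetime"].dt.date, [df["rating"], df["signal"]])

# Move "rating" from the columns into the row index: rows become
# (date, rating), columns are the distinct signal values, cells are counts
# (NaN or 0 where a combination never occurs).
counts = s.stack("rating")

# Uniform group <=> exactly one signal value has a nonzero count.
result = counts.gt(0).sum(axis=1).eq(1)
print(result)
```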