Applying groupby twice in pandas dataframe
I have some data as follows:
+---------+--------+----------+------------+-------+-----+
| Comment | Reason | Location | Date | Event | Key |
+---------+--------+----------+------------+-------+-----+
| a | c | i2 | 2019-03-02 | 1 | a |
| a | b | i2 | 2019-03-02 | 1 | a |
| c | b | i2 | 2019-03-02 | 1 | a |
| c | d | i2 | 2019-03-04 | 1 | a |
| a | c | i2 | 2019-03-15 | 2 | b |
| a | b | i9 | 2019-02-22 | 2 | c |
| c | b | i9 | 2019-03-10 | 3 | d |
| c | d | i9 | 2019-03-10 | 3 | d |
| a | c | s8 | 2019-04-22 | 1 | e |
| a | b | s8 | 2019-04-25 | 1 | e |
| c | b | s8 | 2019-04-28 | 1 | e |
| c | d | t14 | 2019-05-13 | 3 | f |
+---------+--------+----------+------------+-------+-----+
Now, I don't actually have the Key column yet. Whenever Location or Event (or both) changes, a new Key is created. I'm interested in counting the number of keys that have either Comment as a or Reason as b, or both. I am wondering if I need to apply groupby twice, once for each set of conditions. Or is there any other way?
Use the shift and cumsum trick with the two columns:
df[['Location', 'Event']].ne(df[['Location', 'Event']].shift()).any(axis=1).cumsum()
0 1
1 1
2 1
3 1
4 2
5 3
6 4
7 4
8 5
9 5
10 5
11 6
dtype: int64
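To see why this works, the one-liner can be split into its three steps. The following is a minimal sketch on a shortened, made-up frame (five rows, not the data from the question); the names changed, starts, and key_id are only illustrative.

```python
import pandas as pd

# Small illustrative frame (not the data from the question).
df = pd.DataFrame({
    "Location": ["i2", "i2", "i2", "i9", "i9"],
    "Event": [1, 1, 2, 2, 2],
})
cols = ["Location", "Event"]

# Step 1: compare every cell with the cell one row above.
# The first row is compared against NaN, so it always counts as a change.
changed = df[cols].ne(df[cols].shift())

# Step 2: a row starts a new key if either column changed.
starts = changed.any(axis=1)

# Step 3: the running total of the start flags numbers the groups 1, 2, 3, ...
key_id = starts.cumsum()

print(key_id.tolist())  # [1, 1, 2, 3, 3]
```

Because the comparison is only against the previous row, a Location/Event pair that reappears later gets a new key rather than its earlier one, which matches the rule stated in the question.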
If you need letters, offset the result by 96 and map each ASCII code to its character:
(df[['Location', 'Event']]
.ne(df[['Location', 'Event']].shift())
.any(axis=1)
.cumsum()
.add(96)
.map(chr))
0 a
1 a
2 a
3 a
4 b
5 c
6 d
7 d
8 e
9 e
10 e
11 f
dtype: object
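Note that chr(n + 96) only yields lowercase letters for the first 26 keys. If more keys are possible, one option is a spreadsheet-style label (a, ..., z, aa, ab, ...). The helper int_to_label below is a hypothetical function written for this sketch, not part of pandas.

```python
import pandas as pd

def int_to_label(n: int) -> str:
    """Convert 1, 2, ..., 26, 27, ... to 'a', 'b', ..., 'z', 'aa', ... (hypothetical helper)."""
    label = ""
    while n > 0:
        # Bijective base 26: shift by one so that 26 maps to 'z', not 'a' + carry.
        n, rem = divmod(n - 1, 26)
        label = chr(97 + rem) + label
    return label

# Example key numbers, including values past the single-letter range.
labels = pd.Series([1, 2, 26, 27, 28]).map(int_to_label)
print(labels.tolist())  # ['a', 'b', 'z', 'aa', 'ab']
```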
All together:
cols = ['Location', 'Event']
keys = df[cols].ne(df[cols].shift()).any(axis=1).cumsum().map(lambda x: chr(x + 96))
df['Key'] = keys
df
Comment Reason Location Date Event Key
0 a c i2 2019-03-02 1 a
1 a b i2 2019-03-02 1 a
2 c b i2 2019-03-02 1 a
3 c d i2 2019-03-04 1 a
4 a c i2 2019-03-15 2 b
5 a b i9 2019-02-22 2 c
6 c b i9 2019-03-10 3 d
7 c d i9 2019-03-10 3 d
8 a c s8 2019-04-22 1 e
9 a b s8 2019-04-25 1 e
10 c b s8 2019-04-28 1 e
11 c d t14 2019-05-13 3 f
And
df.eval('Comment == "a" or Reason == "b"').groupby(keys).sum()
a 3.0
b 1.0
c 1.0
d 1.0
e 3.0
f 0.0
dtype: float64
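The result above counts matching rows per key. If what is wanted is the number of keys with at least one matching row, a single groupby is still enough. The sketch below rebuilds the question's data and uses plain boolean masks instead of eval, which is a substitution made here for clarity; the names mask, per_key, and n_keys are illustrative.

```python
import pandas as pd

# Data from the question (Date omitted, since it is not used for the keys).
df = pd.DataFrame({
    "Comment": list("aaccaaccaacc"),
    "Reason": list("cbbdcbbdcbbd"),
    "Location": ["i2"] * 5 + ["i9"] * 3 + ["s8"] * 3 + ["t14"],
    "Event": [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
})

cols = ["Location", "Event"]
keys = df[cols].ne(df[cols].shift()).any(axis=1).cumsum()

# Rows where Comment is "a" or Reason is "b" (or both).
mask = df["Comment"].eq("a") | df["Reason"].eq("b")

# Matching rows per key, as in the answer above.
per_key = mask.groupby(keys).sum()

# Number of keys that contain at least one matching row.
n_keys = int(mask.groupby(keys).any().sum())
print(n_keys)  # 5
```

Here only the last key (t14, Event 3) has no matching row, so five of the six keys qualify.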