
Applying groupby twice in pandas dataframe

I have some data as follows:

+---------+--------+----------+------------+-------+-----+
| Comment | Reason | Location |    Date    | Event | Key |
+---------+--------+----------+------------+-------+-----+
| a       | c      | i2       | 2019-03-02 |     1 | a   |
| a       | b      | i2       | 2019-03-02 |     1 | a   |
| c       | b      | i2       | 2019-03-02 |     1 | a   |
| c       | d      | i2       | 2019-03-04 |     1 | a   |
| a       | c      | i2       | 2019-03-15 |     2 | b   |
| a       | b      | i9       | 2019-02-22 |     2 | c   |
| c       | b      | i9       | 2019-03-10 |     3 | d   |
| c       | d      | i9       | 2019-03-10 |     3 | d   |
| a       | c      | s8       | 2019-04-22 |     1 | e   |
| a       | b      | s8       | 2019-04-25 |     1 | e   |
| c       | b      | s8       | 2019-04-28 |     1 | e   |
| c       | d      | t14      | 2019-05-13 |     3 | f   |
+---------+--------+----------+------------+-------+-----+

Now, I don't actually have the Key column formed yet. Whenever Location or Event (or both) changes, a new Key is created. I'm interested in counting the number of keys that have either Comment equal to a, or Reason equal to b, or both. I am wondering if I need to apply groupby twice, once for each set of conditions, or whether there is another way.
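For reference, a sketch that rebuilds the sample data from the table above (the Key column is left out, since it is the thing being derived):

```python
import pandas as pd

# Rebuild the sample data; Key is intentionally omitted -- it is derived below.
df = pd.DataFrame({
    'Comment':  ['a', 'a', 'c', 'c', 'a', 'a', 'c', 'c', 'a', 'a', 'c', 'c'],
    'Reason':   ['c', 'b', 'b', 'd', 'c', 'b', 'b', 'd', 'c', 'b', 'b', 'd'],
    'Location': ['i2', 'i2', 'i2', 'i2', 'i2', 'i9', 'i9', 'i9',
                 's8', 's8', 's8', 't14'],
    'Date':     ['2019-03-02', '2019-03-02', '2019-03-02', '2019-03-04',
                 '2019-03-15', '2019-02-22', '2019-03-10', '2019-03-10',
                 '2019-04-22', '2019-04-25', '2019-04-28', '2019-05-13'],
    'Event':    [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
})
```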

Use the shift and cumsum trick with the two columns:

df[['Location', 'Event']].ne(df[['Location', 'Event']].shift()).any(axis=1).cumsum()  

0     1
1     1
2     1
3     1
4     2
5     3
6     4
7     4
8     5
9     5
10    5
11    6
dtype: int64
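The trick works because `ne(...shift())` flags every row where Location or Event differs from the previous row, and `cumsum` turns those boundary flags into a running group number. A minimal illustration on a single column:

```python
import pandas as pd

s = pd.Series(['x', 'x', 'y', 'y', 'x'])
starts = s.ne(s.shift())   # True wherever the value changes (first row compares to NaN, so True)
groups = starts.cumsum()   # running count of change points = labels for consecutive runs
print(groups.tolist())     # [1, 1, 2, 2, 3]
```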

If you need characters, map the result to their equivalent ASCII codes:

(df[['Location', 'Event']]
    .ne(df[['Location', 'Event']].shift())
    .any(axis=1)
    .cumsum()
    .add(96)
    .map(chr))                 

0     a
1     a
2     a
3     a
4     b
5     c
6     d
7     d
8     e
9     e
10    e
11    f
dtype: object 
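One caveat: `chr(x + 96)` yields lowercase letters only for group numbers 1 through 26; a 27th group maps past 'z' into punctuation, so this labelling is only safe when the number of groups is small:

```python
# chr(96 + n) is alphabetic only for n in 1..26
print(chr(96 + 1), chr(96 + 26), chr(96 + 27))  # a z {
```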

All together:

cols = ['Location', 'Event']
keys = df[cols].ne(df[cols].shift()).any(axis=1).cumsum().map(lambda x: chr(x + 96))
df['Key'] = keys

df

   Comment Reason Location        Date  Event Key
0        a      c       i2  2019-03-02      1   a
1        a      b       i2  2019-03-02      1   a
2        c      b       i2  2019-03-02      1   a
3        c      d       i2  2019-03-04      1   a
4        a      c       i2  2019-03-15      2   b
5        a      b       i9  2019-02-22      2   c
6        c      b       i9  2019-03-10      3   d
7        c      d       i9  2019-03-10      3   d
8        a      c       s8  2019-04-22      1   e
9        a      b       s8  2019-04-25      1   e
10       c      b       s8  2019-04-28      1   e
11       c      d      t14  2019-05-13      3   f

Finally, test the condition per row and count the matches within each key:

df.eval('Comment == "a" or Reason == "b"').groupby(keys).sum()

a    3.0
b    1.0
c    1.0
d    1.0
e    3.0
f    0.0
dtype: float64
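The per-key sums above count matching rows inside each key; the original question asks for the number of keys with at least one such row, which takes one more step. A self-contained sketch, repeating the setup so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Comment':  ['a', 'a', 'c', 'c', 'a', 'a', 'c', 'c', 'a', 'a', 'c', 'c'],
    'Reason':   ['c', 'b', 'b', 'd', 'c', 'b', 'b', 'd', 'c', 'b', 'b', 'd'],
    'Location': ['i2', 'i2', 'i2', 'i2', 'i2', 'i9', 'i9', 'i9',
                 's8', 's8', 's8', 't14'],
    'Event':    [1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 3],
})

cols = ['Location', 'Event']
keys = df[cols].ne(df[cols].shift()).any(axis=1).cumsum().map(lambda x: chr(x + 96))

# Rows matching the condition, summed per key, then count keys with at least one match
per_key = df.eval('Comment == "a" or Reason == "b"').groupby(keys).sum()
n_keys = (per_key > 0).sum()
print(n_keys)  # 5 -- keys a, b, c, d, e have a match; f does not
```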
