How do I count the number of changes in a pandas DataFrame by groupby?
I have data that looks like this:
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'DATE': ['1/1/2015', '1/2/2015', '1/3/2015', '1/4/2015', '1/5/2015', '1/6/2015', '1/7/2015', '1/8/2015',
             '1/9/2016', '1/2/2015', '1/3/2015', '1/4/2015', '1/5/2015', '1/6/2015', '1/7/2015'],
    'CD': ['A', 'A', 'A', 'A', 'B', 'B', 'A', 'A', 'C', 'A', 'A', 'A', 'A', 'A', 'A']})
I would like to count the number of changes in CD within each ID. How can I get the desired result? When I tried cumcount, it numbered the rows inside each group, so rows belonging to the same run got different numbers.
What I get is:
What I am expecting is:
The count column in your desired output is effectively a group number: it increases by one each time CD changes within an ID.
First, make a grouper that marks the row where each new group starts (converted from bool to int for ease of viewing):
col = ['ID', 'CD']
grouper = df[col].ne(df[col].shift(1)).any(axis=1).astype('int')
grouper
0 1
1 0
2 0
3 0
4 1
5 0
6 1
7 0
8 1
9 1
10 0
11 0
12 0
13 0
14 0
dtype: int32
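To see why the grouper comes out that way, the one-liner can be split into its three steps. This is only a sketch on a small made-up frame (the five-row data and the names prev, changed and starts are illustrative, not from the question):

```python
import pandas as pd

# Small illustrative frame, not the data from the question.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'CD': ['A', 'B', 'B', 'B', 'B'],
})

col = ['ID', 'CD']
prev = df[col].shift(1)        # each row's predecessor; the first row becomes NaN
changed = df[col].ne(prev)     # element-wise: does this value differ from the previous row?
starts = changed.any(axis=1)   # True where ID or CD (or both) changed

print(starts.tolist())         # [True, True, False, True, False]
```

The first row is always marked, because comparing anything with NaN counts as "not equal". Row 3 is marked even though CD is still 'B', because the ID changed there.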
Then take the cumulative sum of the grouper within each ID. (I assigned grouper to a count column because the count column has to be created anyway.)
df.assign(count=grouper).groupby('ID')['count'].cumsum()
output:
0 1
1 1
2 1
3 1
4 2
5 2
6 3
7 3
8 4
9 1
10 1
11 1
12 1
13 1
14 1
Name: count, dtype: int32
Finally, assign that output to the count column:
df.assign(count=df.assign(count=grouper).groupby('ID')['count'].cumsum())
result:
ID DATE CD count
0 1 1/1/2015 A 1
1 1 1/2/2015 A 1
2 1 1/3/2015 A 1
3 1 1/4/2015 A 1
4 1 1/5/2015 B 2
5 1 1/6/2015 B 2
6 1 1/7/2015 A 3
7 1 1/8/2015 A 3
8 1 1/9/2016 C 4
9 2 1/2/2015 A 1
10 2 1/3/2015 A 1
11 2 1/4/2015 A 1
12 2 1/5/2015 A 1
13 2 1/6/2015 A 1
14 2 1/7/2015 A 1
Update: full code
A simpler version of the full code, following the advice of @cottontail:
col = ['ID', 'CD']
grouper = df[col].ne(df[col].shift(1)).any(axis=1).astype('int')
df.assign(count=grouper.groupby(df['ID']).cumsum())
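For convenience, here is the same idea as a self-contained script that can be run as is. The helper name add_change_count and its parameters are my own invention for this sketch. It assumes the rows of each ID are contiguous and already in the order you care about (the sample data is).

```python
import pandas as pd


def add_change_count(df, id_col='ID', value_col='CD'):
    """Return a copy of df with a 'count' column numbering the runs of value_col within each id_col.

    Hypothetical helper for illustration; assumes rows are already ordered.
    """
    cols = [id_col, value_col]
    # 1 where the ID or the value differs from the previous row, else 0
    starts = df[cols].ne(df[cols].shift(1)).any(axis=1).astype('int')
    # running total of run starts, restarted for every ID
    return df.assign(count=starts.groupby(df[id_col]).cumsum())


df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'DATE': ['1/1/2015', '1/2/2015', '1/3/2015', '1/4/2015', '1/5/2015', '1/6/2015', '1/7/2015', '1/8/2015',
             '1/9/2016', '1/2/2015', '1/3/2015', '1/4/2015', '1/5/2015', '1/6/2015', '1/7/2015'],
    'CD': ['A', 'A', 'A', 'A', 'B', 'B', 'A', 'A', 'C', 'A', 'A', 'A', 'A', 'A', 'A']})

result = add_change_count(df)
print(result['count'].tolist())
# [1, 1, 1, 1, 2, 2, 3, 3, 4, 1, 1, 1, 1, 1, 1]
```

If the data is not sorted, sort it first (for example by ID and a parsed DATE); otherwise the shift compares rows that are not really neighbours.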
Let's group on the ID column, use shift on CD to check for changes, then use cumsum to create a sequential counter:
# group_keys=False keeps the original row index, so the result aligns with df (needed on pandas 2.0 and later)
df['count'] = df.groupby('ID', group_keys=False)['CD'].apply(lambda s: s.ne(s.shift()).cumsum())
Result
ID DATE CD count
0 1 1/1/2015 A 1
1 1 1/2/2015 A 1
2 1 1/3/2015 A 1
3 1 1/4/2015 A 1
4 1 1/5/2015 B 2
5 1 1/6/2015 B 2
6 1 1/7/2015 A 3
7 1 1/8/2015 A 3
8 1 1/9/2016 C 4
9 2 1/2/2015 A 1
10 2 1/3/2015 A 1
11 2 1/4/2015 A 1
12 2 1/5/2015 A 1
13 2 1/6/2015 A 1
14 2 1/7/2015 A 1
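The same per-group counter can also be written with transform, which returns a result aligned to the original index. The sketch below is my own variant rather than part of the answer above, and it uses only the ID and CD columns from the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'CD': ['A', 'A', 'A', 'A', 'B', 'B', 'A', 'A', 'C', 'A', 'A', 'A', 'A', 'A', 'A']})

# Within each ID: flag rows whose CD differs from the previous row, then take a running sum.
# The first row of every ID is compared with NaN, so each ID starts at 1.
df['count'] = df.groupby('ID')['CD'].transform(lambda s: s.ne(s.shift()).cumsum())

print(df['count'].tolist())
# [1, 1, 1, 1, 2, 2, 3, 3, 4, 1, 1, 1, 1, 1, 1]
```

Because the shift happens inside each group, there is no need to compare the ID column as well.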