How to count consecutive string values of one column grouped by column values of another in a dataframe?
Here you go:
import pandas as pd
from calendar import month_abbr
months = [month.lower() for month in month_abbr[1:]]
df = pd.DataFrame({
'RB': [2335] * 4 + [3567] * 4 + [1245] * 4,
'time': months[:4] * 3,
'value': ['good day'] * 2 + ['may be'] * 2 + ['what'] * 4 + ['sure'] * 3 + ['no']
})
df['time'] = pd.Categorical(df['time'], categories=months, ordered=True)
df = df.sort_values(['RB', 'time'])
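# Previous row's value; the first row of each RB group is fixed up below
df['previous'] = df['value'].shift()
# Row position where each RB group starts: cumulative size of all preceding groups
counts = df.groupby(['RB'], sort=False).size()
counts = counts.shift(fill_value=0).cumsum()
cols = df.columns.to_list()
# A group's first row has no predecessor within the group, so compare it with itself (no change)
df.iloc[counts, cols.index('previous')] = df.iloc[counts, cols.index('value')]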
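# 1 where the value differs from the previous value in the same RB group, else 0
df['change'] = (df['previous'] != df['value']).astype(int)
cols = ['previous', 'value']
# Count how often each (previous, value) transition occurs among the changed rows
change_counts = df[df['change'] == 1][cols].groupby(cols).size().reset_index()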
change_counts.columns = ['from', 'to', 'counts']
change_counts = change_counts.sort_values(by=['counts'], ascending=False)
df = df.drop('previous', axis=1)
print(df)
print()
print(change_counts)
Output:
      RB time     value  change
8   1245  jan      sure       0
9   1245  feb      sure       0
10  1245  mar      sure       0
11  1245  apr        no       1
0   2335  jan  good day       0
1   2335  feb  good day       0
2   2335  mar    may be       1
3   2335  apr    may be       0
4   3567  jan      what       0
5   3567  feb      what       0
6   3567  mar      what       0
7   3567  apr      what       0

       from      to  counts
0  good day  may be       1
1      sure      no       1
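
The same transition counts can probably be obtained more directly by shifting within each group, which avoids computing group start positions by hand. The following is only a sketch, not the approach used above: the helper name `count_transitions` and the small `example` frame are made up for illustration, and it assumes the rows are already sorted in the desired order within each group and that the value column contains no missing values.

```python
import pandas as pd

def count_transitions(df, group_col="RB", value_col="value"):
    """Count value changes between consecutive rows within each group.

    Hypothetical helper; assumes df is already sorted within each group.
    """
    # Previous value within the same group; NaN on each group's first row
    previous = df.groupby(group_col, sort=False)[value_col].shift()
    # A change is a row that has a predecessor whose value differs
    changed = previous.notna() & (previous != df[value_col])
    pairs = pd.DataFrame({
        "from": previous[changed],
        "to": df.loc[changed, value_col],
    })
    return (
        pairs.groupby(["from", "to"]).size()
        .reset_index(name="counts")
        .sort_values("counts", ascending=False, ignore_index=True)
    )

# Illustrative data with unequal group sizes (not from the question)
example = pd.DataFrame({
    "RB": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "value": ["a", "a", "b", "a", "b", "c", "c", "c", "c"],
})
result = count_transitions(example)
print(result)
```

Because the shift happens inside each group, the first row of a group never gets compared with the last row of the previous group, whatever the group sizes are. In the example, groups 1 and 2 each contain one change from `a` to `b` and group 3 contains none, so the result has a single row with a count of 2.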