How to compare string values in one column, one by one, grouped

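I have a data frame like this, and I want to track changes within each RB group: I want to add a column that is set to 1 where the value changed and 0 otherwise.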

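You can add a new column that flags changes by comparing each value with the previous one. Sort by RB first so that each group's rows are adjacent, and shift within each RB group so that the comparison never crosses a group boundary.

df = df.sort_values(['RB'], kind='stable')  # stable sort keeps each group's row order
prev = df.groupby('RB')['value'].shift()    # NaN on the first row of each group
df['change'] = (prev.notna() & (df['value'] != prev)).astype(int)
print(df)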
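
To make the effect of group boundaries concrete, here is a minimal sketch that contrasts a plain shift over the whole column with a shift inside each RB group. The sample data is hypothetical (it is not the data from the question), and the names naive, prev, and grouped are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: two RB groups of different sizes.
df = pd.DataFrame({
    'RB': [1, 1, 1, 2, 2],
    'value': ['a', 'a', 'b', 'c', 'c'],
})

# Plain shift: the very first row and every group boundary are
# compared with an unrelated (or missing) predecessor.
naive = (df['value'] != df['value'].shift()).astype(int)

# Grouped shift: the first row of each RB group has no predecessor
# (NaN), and those rows are explicitly excluded from the flag.
prev = df.groupby('RB')['value'].shift()
grouped = (prev.notna() & (df['value'] != prev)).astype(int)

print(naive.tolist())    # [1, 0, 1, 1, 0]
print(grouped.tolist())  # [0, 0, 1, 0, 0]
```

With the plain shift, row 0 and the first row of group 2 are both reported as changes even though nothing changed inside their groups. The grouped version reports only the a to b transition.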

Here you go:

import pandas as pd
from calendar import month_abbr

months = [month.lower() for month in month_abbr[1:]]
df = pd.DataFrame({
    'RB': [2335] * 4 + [3567] * 4 + [1245] * 4,
    'time': months[:4] * 3,
    'value': ['good day'] * 2 + ['may be'] * 2 + ['what'] * 4 + ['sure'] * 3 + ['no']
})

# Make the months sort in calendar order rather than alphabetically.
df['time'] = pd.Categorical(df['time'], categories=months, ordered=True)
df = df.sort_values(['RB', 'time'])
# Previous value within the same RB group; NaN on each group's first row.
# This also works when the groups have different sizes.
df['previous'] = df.groupby('RB')['value'].shift()
# A change needs a previous value that differs from the current one.
df['change'] = (df['previous'].notna() & (df['previous'] != df['value'])).astype(int)
cols = ['previous', 'value']
# Count how often each (from, to) transition occurs.
change_counts = df[df['change'] == 1][cols].groupby(cols).size().reset_index()
change_counts.columns = ['from', 'to', 'counts']
change_counts = change_counts.sort_values(by=['counts'], ascending=False)
df = df.drop('previous', axis=1)
print(df)
print()
print(change_counts)

Output:

      RB time     value  change
8   1245  jan      sure       0
9   1245  feb      sure       0
10  1245  mar      sure       0
11  1245  apr        no       1
0   2335  jan  good day       0
1   2335  feb  good day       0
2   2335  mar    may be       1
3   2335  apr    may be       0
4   3567  jan      what       0
5   3567  feb      what       0
6   3567  mar      what       0
7   3567  apr      what       0

       from      to  counts
0  good day  may be       1
1      sure      no       1
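
The code above converts the time column to an ordered categorical before sorting. A minimal sketch of why that matters, assuming English month abbreviations (month_abbr depends on the locale) and a hypothetical four-element series:

```python
import pandas as pd
from calendar import month_abbr

# Assumes the default English locale, giving 'jan', 'feb', ...
months = [m.lower() for m in month_abbr[1:]]
s = pd.Series(['mar', 'jan', 'apr', 'feb'])

# Plain strings sort alphabetically.
alphabetical = s.sort_values().tolist()

# An ordered categorical sorts by position in `categories`.
calendar_order = pd.Series(
    pd.Categorical(s, categories=months, ordered=True)
).sort_values().tolist()

print(alphabetical)    # ['apr', 'feb', 'jan', 'mar']
print(calendar_order)  # ['jan', 'feb', 'mar', 'apr']
```

Without the conversion, apr would sort before jan, and each row would be compared with the wrong predecessor.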

