Frequency of repeated positions in a pandas DataFrame

Question

Hi, I am trying to find the repeated positions in the following data frame:

import pandas as pd

data = pd.DataFrame()
data['league'] = ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B']
data['Team'] = ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z']
data['week'] = [1, 2, 3, 1, 2, 3, 1, 2, 3]
data['position'] = [1, 1, 2, 2, 2, 1, 2, 3, 4]

I want to compare each row's position with the position in the previous row. If it is the same, I will assign 0. If it is different from the previous row, I will assign 1.

My expected outcome is as follows:

That is, I will group by (league, Team and week) and work out the frequency. Can anyone advise how to do that in pandas?

Thanks,

Zep

Answer 1

Use diff and abs with fillna:

data['frequency'] = data['position'].diff().abs().fillna(0,downcast='infer')

print(data)
  league Team  week  position  frequency
0      A    X     1         1          0
1      A    X     2         1          0
2      A    X     3         2          1
3      A    Y     1         2          0
4      A    Y     2         2          0
5      A    Y     3         1          1
6      B    Z     1         2          1
7      B    Z     2         3          1
8      B    Z     3         4          1
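As far as I know, newer pandas releases deprecate the `downcast` argument of `fillna`, so the line above may raise a warning there. A minimal sketch of the same idea without it, assuming the `data` frame from the question and using an explicit `astype(int)` in place of `downcast='infer'` (that swap is my assumption, not part of the original answer):

```python
import pandas as pd

# Same frame as in the question
data = pd.DataFrame({
    'league': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 1, 2, 2, 2, 1, 2, 3, 4],
})

# diff() is NaN on the first row (no previous row), so fill it with 0
# and cast the float result back to integers explicitly
data['frequency'] = data['position'].diff().abs().fillna(0).astype(int)

print(data['frequency'].tolist())
```

Note that `diff().abs()` returns the size of the change, which only happens to be 0 or 1 here because every change in this data is a single step.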

Using groupby gives all zeros, because the comparison then happens within each group rather than across the whole DataFrame, and with league, Team and week as keys every group holds a single row:

data.groupby(['league', 'Team', 'week'])['position'].diff().fillna(0,downcast='infer')

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
Name: position, dtype: int64
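To see why the grouped result is all zeros, it may help to look at the group sizes. This is only an illustrative check on the question's data; the variable name `sizes` is mine:

```python
import pandas as pd

data = pd.DataFrame({
    'league': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 1, 2, 2, 2, 1, 2, 3, 4],
})

# Number of rows in each (league, Team, week) group
sizes = data.groupby(['league', 'Team', 'week']).size()
print(sizes)

# With one row per group there is never a previous row inside the group,
# so diff() is NaN everywhere and fillna(0) turns that into zeros
grouped_diff = data.groupby(['league', 'Team', 'week'])['position'].diff()
print(grouped_diff.isna().all())
```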

Answer 2

Use diff, and compare against 0:

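# df is the DataFrame from the question (named data there)
v = df.position.diff()
# the first row has no previous row, so diff() gives NaN; treat it as "no change"
v[0] = 0
df['frequency'] = v.ne(0).astype(int)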

print(df)
  league Team  week  position  frequency
0      A    X     1         1          0
1      A    X     2         1          0
2      A    X     3         2          1
3      A    Y     1         2          0
4      A    Y     2         2          0
5      A    Y     3         1          1
6      B    Z     1         2          1
7      B    Z     2         3          1
8      B    Z     3         4          1
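The `v[0] = 0` step matters because `diff` leaves NaN in the first row, and NaN compares as not equal to 0. A small sketch on just the position column (the names `without_fix` and `with_fix` are mine, for illustration only):

```python
import pandas as pd

position = pd.Series([1, 1, 2, 2, 2, 1, 2, 3, 4])

v = position.diff()

# Without the fix, the NaN in the first row counts as "different from 0"
without_fix = v.ne(0).astype(int)

# Overwrite the first entry so the first row is treated as unchanged
v[0] = 0
with_fix = v.ne(0).astype(int)

print(without_fix.tolist())
print(with_fix.tolist())
```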

For performance reasons, you should try to avoid the fillna call:

df = pd.concat([df] * 100000, ignore_index=True)

%timeit df['frequency'] = df['position'].diff().abs().fillna(0,downcast='infer')

%%timeit
v = df.position.diff()
v[0] = 0
df['frequency'] = v.ne(0).astype(int)

83.7 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
10.9 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To extend this answer to work with groupby, use:

import numpy as np

v = df.groupby(['league', 'Team', 'week']).position.diff()
v[np.isnan(v)] = 0  # the first row of each group has no predecessor

df['frequency'] = v.ne(0).astype(int)
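With league, Team and week all used as keys, each group in the sample data is a single row, so this produces zeros throughout. If the intent is to compare consecutive weeks within each team, one possible variation is to group by league and Team only. That choice of keys is my assumption, not something the question states, and it treats the first week of every team as unchanged:

```python
import pandas as pd

df = pd.DataFrame({
    'league': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 1, 2, 2, 2, 1, 2, 3, 4],
})

# Difference to the previous row within the same (league, Team) group;
# the first row of each group is NaN
v = df.groupby(['league', 'Team'])['position'].diff()

# Flag a change only where a previous row exists and the difference is non-zero
df['frequency'] = (v.notna() & v.ne(0)).astype(int)

print(df['frequency'].tolist())
```

This differs from the ungrouped result in row 6, where team Z starts and there is no earlier week to compare against.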


Answer 1
1 2018-11-12 09:03:50

Answer 2
1 (accepted) 2018-11-12 09:20:38
