pandas data frame Podium crosstab frequency
I have a DataFrame in pandas that looks like this:
Terrain Distance Rank
77 Dirt 100 1
15 Grass 120 1
82 Road 180 1
4 Rock 100 1
107 Rock 120 1
70 Rock 200 1
115 Rock 200 1
37 Snow 160 1
57 Snow 160 1
95 Snow 160 1
193 Track 100 1
32 Dirt 100 2
97 Grass 140 2
51 Road 160 2
125 Road 180 2
90 Rock 140 2
60 Snow 120 2
78 Track 100 2
205 Track 120 2
33 Dirt 100 3
17 Dirt 140 3
53 Grass 100 3
161 Grass 100 3
43 Grass 160 3
81 Grass 160 3
103 Road 120 3
208 Road 160 3
58 Road 180 3
44 Rock 120 3
66 Rock 140 3
101 Rock 140 3
88 Rock 180 3
122 Rock 180 3
119 Sand 140 3
5 Sand 160 3
84 Snow 140 3
21 Snow 160 3
111 Snow 180 3
140 Track 140 3
29 Track 180 3
39 Track 200 3
2 Dirt 100 4
31 Dirt 140 4
102 Grass 140 4
134 Grass 160 4
108 Road 120 4
118 Road 120 4
I can create a crosstab with this code:
# Frequency table using the crosstab() function
my_crosstab = pd.crosstab(index=df["Terrain"],
                          columns=df["Distance"],
                          margins=True)  # Include row and column totals
my_crosstab
My crosstab looks like this:
Distance 100 120 140 160 180 200 All
Terrain
Dirt 12 5 9 5 4 5 40
Grass 4 5 8 8 2 6 33
Road 6 5 4 7 6 5 33
Rock 8 4 6 2 10 6 36
Sand 4 4 4 7 5 2 26
Snow 5 10 11 11 5 4 46
Track 9 6 4 6 6 4 35
All 48 39 46 46 38 32 249
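Basically, I have 7 terrains and 6 distances. I want to fill each cell of the crosstab with the number of times I finished first (Rank 1) for that terrain and distance: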
Distance  100  120  140  160  180  200
Terrain
Dirt        1
Grass            1
Road                            1
Rock        1    1                   2
Sand
Snow                       3
Track       1
One approach is to filter both columns down to the rows where Rank equals (eq) 1:
mask = df['Rank'].eq(1)  # Reusable boolean index
my_crosstab = pd.crosstab(index=df.loc[mask, "Terrain"],
                          columns=df.loc[mask, "Distance"],
                          margins=True)
my_crosstab
Output:
Distance 100 120 160 180 200 All
Terrain
Dirt 1 0 0 0 0 1
Grass 0 1 0 0 0 1
Road 0 0 0 1 0 1
Rock 1 1 0 0 2 4
Snow 0 0 3 0 0 3
Track 1 0 0 0 0 1
All 3 2 3 1 2 11
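The filtered table has no 140 column and no Sand row, because no first-place row carries those labels. The following is a minimal sketch of how one might confirm this and sanity-check the grand total. It assumes a 14-row subset of the sample data (the 11 first-place rows plus three other rows taken from the table in the question), and the names wins, missing_terrains and missing_distances are illustrative only:

```python
import pandas as pd

# Subset of the question's rows: the 11 first-place rows plus three others.
df = pd.DataFrame({
    "Terrain": ["Dirt", "Grass", "Road", "Rock", "Rock", "Rock", "Rock",
                "Snow", "Snow", "Snow", "Track", "Sand", "Sand", "Grass"],
    "Distance": [100, 120, 180, 100, 120, 200, 200,
                 160, 160, 160, 100, 140, 160, 140],
    "Rank": [1] * 11 + [3, 3, 2],
})

mask = df["Rank"].eq(1)
wins = pd.crosstab(index=df.loc[mask, "Terrain"],
                   columns=df.loc[mask, "Distance"],
                   margins=True)

# The All/All cell should equal the number of first-place rows.
grand_total = int(wins.loc["All", "All"])
n_first = int(mask.sum())

# Labels that exist in the full data but are absent from the filtered table.
missing_terrains = sorted(set(df["Terrain"]) - set(wins.index))
missing_distances = sorted(set(df["Distance"]) - set(wins.columns))

print(grand_total, n_first, missing_terrains, missing_distances)
```

Any label listed as missing has to be restored separately if the table should show every terrain and distance.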
To restore every possible Distance and Terrain value, we can reindex both the index and the columns:
mask = df['Rank'].eq(1)
my_crosstab = pd.crosstab(
    index=df.loc[mask, "Terrain"],
    columns=df.loc[mask, "Distance"],
    margins=True
).reindex(
    index=[*df['Terrain'].unique(), 'All'],
    columns=[*np.sort(df['Distance'].unique()), 'All'],  # Sorting just for aesthetics
    fill_value=0
)
my_crosstab
Output:
Distance 100 120 140 160 180 200 All
Terrain
Dirt 1 0 0 0 0 0 1
Grass 0 1 0 0 0 0 1
Road 0 0 0 0 1 0 1
Rock 1 1 0 0 0 2 4
Snow 0 0 0 3 0 0 3
Track 1 0 0 0 0 0 1
Sand 0 0 0 0 0 0 0
All 3 2 0 3 1 2 11
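A possible alternative to reindexing, not part of the original answer, is to run crosstab on the unfiltered columns and sum a 0/1 first-place indicator through its values and aggfunc parameters. The sketch below assumes a 14-row subset of the sample data and omits the margins. The names first and wins are illustrative. A label is kept only if it occurs somewhere in df, and combinations that never occur come back as NaN, hence the fillna step:

```python
import pandas as pd

# Subset of the question's rows: the 11 first-place rows plus three others.
df = pd.DataFrame({
    "Terrain": ["Dirt", "Grass", "Road", "Rock", "Rock", "Rock", "Rock",
                "Snow", "Snow", "Snow", "Track", "Sand", "Sand", "Grass"],
    "Distance": [100, 120, 180, 100, 120, 200, 200,
                 160, 160, 160, 100, 140, 160, 140],
    "Rank": [1] * 11 + [3, 3, 2],
})

# 1 where the row is a first place, 0 otherwise.
first = df["Rank"].eq(1).astype(int)

# Sum the indicator per (Terrain, Distance) cell over ALL rows, so terrains
# and distances without any first place still get a row or column.
wins = (
    pd.crosstab(index=df["Terrain"], columns=df["Distance"],
                values=first, aggfunc="sum")
    .fillna(0)      # combinations that never occur in df are NaN
    .astype(int)
)
print(wins)
```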
If you want blanks instead of zeros, follow up with a call to mask:
mask = df['Rank'].eq(1)
my_crosstab = pd.crosstab(
    index=df.loc[mask, "Terrain"],
    columns=df.loc[mask, "Distance"],
    margins=True
).reindex(
    index=[*df['Terrain'].unique(), 'All'],
    columns=[*np.sort(df['Distance'].unique()), 'All'],
    fill_value=0
).mask(lambda df_: df_.eq(0), '')
my_crosstab
Output:
Distance  100  120  140  160  180  200  All
Terrain
Dirt        1                             1
Grass            1                        1
Road                            1         1
Rock        1    1                   2    4
Snow                       3              3
Track       1                             1
Sand
All         3    2         3    1    2   11
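One thing to keep in mind is that replacing zeros with '' mixes strings into the table, so the affected columns are no longer numeric. A minimal sketch, assuming you want to keep a numeric copy for further calculations and blank out only a copy used for display. It uses a 14-row subset of the sample data, and the names counts and blanked are hypothetical:

```python
import pandas as pd

# Subset of the question's rows: the 11 first-place rows plus three others.
df = pd.DataFrame({
    "Terrain": ["Dirt", "Grass", "Road", "Rock", "Rock", "Rock", "Rock",
                "Snow", "Snow", "Snow", "Track", "Sand", "Sand", "Grass"],
    "Distance": [100, 120, 180, 100, 120, 200, 200,
                 160, 160, 160, 100, 140, 160, 140],
    "Rank": [1] * 11 + [3, 3, 2],
})

mask = df["Rank"].eq(1)

# Numeric table: safe to sum, compare, or plot.
counts = pd.crosstab(
    index=df.loc[mask, "Terrain"],
    columns=df.loc[mask, "Distance"],
).reindex(
    index=sorted(df["Terrain"].unique()),
    columns=sorted(df["Distance"].unique()),
    fill_value=0,
)

# Display-only copy: zeros become empty strings.
blanked = counts.mask(counts.eq(0), "")

print(counts.sum(axis=1))  # totals still computed from the numeric table
print(blanked)
```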
Setup and versions:
import numpy as np  # version 1.23.1
import pandas as pd  # version 1.4.3
df = pd.DataFrame({
    'Terrain': ['Dirt', 'Grass', 'Road', 'Rock', 'Rock', 'Rock', 'Rock', 'Snow',
                'Snow', 'Snow', 'Track', 'Dirt', 'Grass', 'Road', 'Road',
                'Rock', 'Snow', 'Track', 'Track', 'Dirt', 'Dirt', 'Grass',
                'Grass', 'Grass', 'Grass', 'Road', 'Road', 'Road', 'Rock',
                'Rock', 'Rock', 'Rock', 'Rock', 'Sand', 'Sand', 'Snow', 'Snow',
                'Snow', 'Track', 'Track', 'Track', 'Dirt', 'Dirt', 'Grass',
                'Grass', 'Road', 'Road'],
    'Distance': [100, 120, 180, 100, 120, 200, 200, 160, 160, 160, 100, 100,
                 140, 160, 180, 140, 120, 100, 120, 100, 140, 100, 100, 160,
                 160, 120, 160, 180, 120, 140, 140, 180, 180, 140, 160, 140,
                 160, 180, 140, 180, 200, 100, 140, 140, 160, 120, 120],
    'Rank': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3,
             3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4,
             4, 4, 4]
})
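Another option is to convert Terrain and Distance to categoricals, so that categories with no first place are kept, and then count with pivot_table:
df.Terrain = df.Terrain.astype('category')
df.Distance = df.Distance.astype('category')
out = (df[df.Rank.eq(1)]
       .pivot_table(index='Terrain',
                    columns='Distance',
                    aggfunc='value_counts')
       .reset_index(-1, drop=True))
print(out)
Output: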
Distance 100 120 140 160 180 200
Terrain
Dirt 1 0 0 0 0 0
Grass 0 1 0 0 0 0
Road 0 0 0 0 1 0
Rock 1 1 0 0 0 2
Sand 0 0 0 0 0 0
Snow 0 0 0 3 0 0
Track 1 0 0 0 0 0
>>> out.replace(0, '')
Distance  100  120  140  160  180  200
Terrain
Dirt        1
Grass            1
Road                            1
Rock        1    1                   2
Sand
Snow                       3
Track       1
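The categorical idea can probably also be combined with crosstab itself: the pandas documentation shows that crosstab keeps unobserved categories when called with dropna=False. The sketch below is a variation on the answer above rather than the answerer's own code. It assumes a 14-row subset of the sample data, and cat_df and wins are illustrative names:

```python
import pandas as pd

# Subset of the question's rows: the 11 first-place rows plus three others.
df = pd.DataFrame({
    "Terrain": ["Dirt", "Grass", "Road", "Rock", "Rock", "Rock", "Rock",
                "Snow", "Snow", "Snow", "Track", "Sand", "Sand", "Grass"],
    "Distance": [100, 120, 180, 100, 120, 200, 200,
                 160, 160, 160, 100, 140, 160, 140],
    "Rank": [1] * 11 + [3, 3, 2],
})

# Convert BEFORE filtering so the categories cover every label in the data.
cat_df = df.astype({"Terrain": "category", "Distance": "category"})
mask = cat_df["Rank"].eq(1)

# Filtering a categorical Series keeps its full category list, and
# dropna=False asks crosstab to keep categories that have no rows.
wins = pd.crosstab(
    index=cat_df.loc[mask, "Terrain"],
    columns=cat_df.loc[mask, "Distance"],
    dropna=False,
)
print(wins)
```

The conversion has to happen before the Rank filter. Converting the already filtered columns would only register the labels that survive the filter.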