繁体   English   中英

熊猫数据框讲台交叉表频率

[英]pandas data frame Podium crosstab frequency

我在熊猫中有一个数据框,它看起来像这样

    Terrain  Distance  Rank
77     Dirt       100     1
15    Grass       120     1
82     Road       180     1
4      Rock       100     1
107    Rock       120     1
70     Rock       200     1
115    Rock       200     1
37     Snow       160     1
57     Snow       160     1
95     Snow       160     1
193   Track       100     1
32     Dirt       100     2
97    Grass       140     2
51     Road       160     2
125    Road       180     2
90     Rock       140     2
60     Snow       120     2
78    Track       100     2
205   Track       120     2
33     Dirt       100     3
17     Dirt       140     3
53    Grass       100     3
161   Grass       100     3
43    Grass       160     3
81    Grass       160     3
103    Road       120     3
208    Road       160     3
58     Road       180     3
44     Rock       120     3
66     Rock       140     3
101    Rock       140     3
88     Rock       180     3
122    Rock       180     3
119    Sand       140     3
5      Sand       160     3
84     Snow       140     3
21     Snow       160     3
111    Snow       180     3
140   Track       140     3
29    Track       180     3
39    Track       200     3
2      Dirt       100     4
31     Dirt       140     4
102   Grass       140     4
134   Grass       160     4
108    Road       120     4
118    Road       120     4

我可以使用此代码创建交叉表

### frequency table using crosstab()function
my_crosstab = pd.crosstab(index=df["Terrain"], 
                            columns=df["Distance"],
                             margins=True)   # Include row and column totals
my_crosstab

我的交叉表看起来像这样

Distance    100 120 140 160 180 200 All
Terrain                         
Dirt    12  5   9   5   4   5   40
Grass   4   5   8   8   2   6   33
Road    6   5   4   7   6   5   33
Rock    8   4   6   2   10  6   36
Sand    4   4   4   7   5   2   26
Snow    5   10  11  11  5   4   46
Track   9   6   4   6   6   4   35
All    48   39  46  46  38  32  249

基本上,我有 7 个地形和 6 个距离。 我想用我在表格中的每个单元格中获得第一名的次数填充交叉表

Distance    100 120 140 160 180 200 
Terrain                         
Dirt        1  
Grass            1
Road                         1
Rock        1    1               2                     
Sand    
Snow                    3 
Track       1

基于特定地形和距离的第一名频率

一种方法是根据Rank为 ( eq ) 1的位置过滤列:

mask = df['Rank'].eq(1)  # Reuseable boolean index
my_crosstab = pd.crosstab(index=df.loc[mask, "Terrain"],
                          columns=df.loc[mask, "Distance"],
                          margins=True)

my_crosstab

Distance  100  120  160  180  200  All
Terrain                               
Dirt        1    0    0    0    0    1
Grass       0    1    0    0    0    1
Road        0    0    0    1    0    1
Rock        1    1    0    0    2    4
Snow        0    0    3    0    0    3
Track       1    0    0    0    0    1
All         3    2    3    1    2   11

如果要恢复所有可能的距离和地形值,我们可以重新索引索引和列:

mask = df['Rank'].eq(1)
my_crosstab = pd.crosstab(
    index=df.loc[mask, "Terrain"],
    columns=df.loc[mask, "Distance"],
    margins=True
).reindex(
    index=[*df['Terrain'].unique(), 'All'],
    columns=[*np.sort(df['Distance'].unique()), 'All'],  # Sorting just for aesthetics
    fill_value=0
)

my_crosstab

Distance  100  120  140  160  180  200  All
Terrain                                    
Dirt        1    0    0    0    0    0    1
Grass       0    1    0    0    0    0    1
Road        0    0    0    0    1    0    1
Rock        1    1    0    0    0    2    4
Snow        0    0    0    3    0    0    3
Track       1    0    0    0    0    0    1
Sand        0    0    0    0    0    0    0
All         3    2    0    3    1    2   11

如果想要空格而不是零可以跟进掩码调用:

mask = df['Rank'].eq(1)
my_crosstab = pd.crosstab(
    index=df.loc[mask, "Terrain"],
    columns=df.loc[mask, "Distance"],
    margins=True
).reindex(
    index=[*df['Terrain'].unique(), 'All'],
    columns=[*np.sort(df['Distance'].unique()), 'All'],
    fill_value=0
).mask(lambda df_: df_.eq(0), '')

my_crosstab

Distance 100 120 140 160 180 200 All
Terrain                             
Dirt       1                       1
Grass          1                   1
Road                       1       1
Rock       1   1               2   4
Snow                   3           3
Track      1                       1
Sand                                
All        3   2       3   1   2  11

设置和版本:

import numpy as np  # version 1.23.1
import pandas as pd  # version 1.4.3

df = pd.DataFrame({
    'Terrain': ['Dirt', 'Grass', 'Road', 'Rock', 'Rock', 'Rock', 'Rock', 'Snow',
                'Snow', 'Snow', 'Track', 'Dirt', 'Grass', 'Road', 'Road',
                'Rock', 'Snow', 'Track', 'Track', 'Dirt', 'Dirt', 'Grass',
                'Grass', 'Grass', 'Grass', 'Road', 'Road', 'Road', 'Rock',
                'Rock', 'Rock', 'Rock', 'Rock', 'Sand', 'Sand', 'Snow', 'Snow',
                'Snow', 'Track', 'Track', 'Track', 'Dirt', 'Dirt', 'Grass',
                'Grass', 'Road', 'Road'],
    'Distance': [100, 120, 180, 100, 120, 200, 200, 160, 160, 160, 100, 100,
                 140, 160, 180, 140, 120, 100, 120, 100, 140, 100, 100, 160,
                 160, 120, 160, 180, 120, 140, 140, 180, 180, 140, 160, 140,
                 160, 180, 140, 180, 200, 100, 140, 140, 160, 120, 120],
    'Rank': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3,
             3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4,
             4, 4, 4]
})
df.Terrain = df.Terrain.astype('category')
df.Distance = df.Distance.astype('category')
out = (df[df.Rank.eq(1)]
         .pivot_table(index='Terrain', 
                      columns='Distance', 
                      aggfunc='value_counts')
         .reset_index(-1, drop=True))
print(out)

输出:

Distance  100  120  140  160  180  200
Terrain
Dirt        1    0    0    0    0    0
Grass       0    1    0    0    0    0
Road        0    0    0    0    1    0
Rock        1    1    0    0    0    2
Sand        0    0    0    0    0    0
Snow        0    0    0    3    0    0
Track       1    0    0    0    0    0
>>> out.replace(0, '')
Distance 100 120 140 160 180 200
Terrain
Dirt       1
Grass          1
Road                       1
Rock       1   1               2
Sand
Snow                   3
Track      1

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM