
Creating categorical column based on multiple column values in groupby

print(df.groupby(['Step1', 'Step2', 'Step3']).size().reset_index(name='Freq'))

    Step1  Step2   Step3  Freq
0     6.0   17.6   28.60   135
1     7.5   22.0   35.75   255
2    10.5   30.8   50.05   129
3    12.0   35.2   57.20   369
4    13.5   39.6   64.35   249
5    15.0   44.0   71.50   246
6    16.5   48.4   78.65   246
7    18.0   52.8   85.80   369
8    21.0   61.6  100.10   375
9    22.5   66.0  107.25   249
10   25.5   74.8  121.55   123

The 'Step1', 'Step2', and 'Step3' columns are constant input values. There are 11 unique combinations of input values from these columns (shown in the groupby output above). I want to drop the individual 'Step1', 'Step2', and 'Step3' columns and create a single "Step Type" column holding a letter that represents each unique combination of input values.

Desired output:

                     Step Type   Freq
0                      A         135
1                      B         255
2                      C         129
3                      D         369
4                      E         249
5                      F         246
6                      G         246
7                      H         369
8                      J         375
9                      L         249
10                     M         123

For example, Step Type A means Step1=6.0, Step2=17.6, Step3=28.60.

How do I do this?

Since each combination of the three steps is unique, I used the combinations as the keys of a dictionary that maps to the Step Type.

Here I predefined the category values, but they could be generated automatically by scanning df if needed.

# df
   Step1  Step2  Step3  Freq
0    6.0   17.6  28.60   135
1    7.5   22.0  35.75   255
2   10.5   30.8  50.05   129
3   12.0   35.2  57.20   369
4   13.5   39.6  64.35   249

# Map each unique (Step1, Step2, Step3) combination to a letter
category = {
    (6.0, 17.6, 28.60): 'A',
    (7.5, 22.0, 35.75): 'B',
    (10.5, 30.8, 50.05): 'C',
    (12.0, 35.2, 57.20): 'D',
    (13.5, 39.6, 64.35): 'E',
}

df['Step_Type'] = df.apply(lambda row: category[(row['Step1'], row['Step2'], row['Step3'])], axis=1)

df = df[['Step_Type', 'Freq']]
print(df)

#  Step_Type  Freq
#0         A   135
#1         B   255
#2         C   129
#3         D   369
#4         E   249
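
If the letters do not need to be chosen by hand, the mapping could probably be generated instead of typed out. The following is only a sketch, not part of the answer above: it assumes the raw df still has one row per observation, that consecutive letters in sorted order of the step values are acceptable, and that there are at most 26 combinations. The sample data is hypothetical.

```python
import string
import pandas as pd

# Hypothetical sample data with repeated step combinations (not from the post).
df = pd.DataFrame({
    'Step1': [6.0, 6.0, 7.5, 10.5, 7.5, 6.0],
    'Step2': [17.6, 17.6, 22.0, 30.8, 22.0, 17.6],
    'Step3': [28.60, 28.60, 35.75, 50.05, 35.75, 28.60],
})

steps = ['Step1', 'Step2', 'Step3']

# ngroup() numbers each unique combination 0, 1, 2, ... in sorted key order.
codes = df.groupby(steps).ngroup()

# Turn the group numbers into letters (assumes at most 26 combinations).
df['Step_Type'] = codes.map(lambda i: string.ascii_uppercase[i])

# Count rows per letter, mirroring the Freq column in the question.
result = df.groupby('Step_Type').size().reset_index(name='Freq')
print(result)
```

Because the letters come from group numbers rather than a hand-written dictionary, this avoids retyping float values as keys, but it cannot skip letters or assign them in a custom order.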
