Creating categorical column based on multiple column values in groupby
print(df.groupby(['Step1', 'Step2', 'Step3']).size().reset_index(name='Freq'))
Step1 Step2 Step3 Freq
0 6.0 17.6 28.60 135
1 7.5 22.0 35.75 255
2 10.5 30.8 50.05 129
3 12.0 35.2 57.20 369
4 13.5 39.6 64.35 249
5 15.0 44.0 71.50 246
6 16.5 48.4 78.65 246
7 18.0 52.8 85.80 369
8 21.0 61.6 100.10 375
9 22.5 66.0 107.25 249
10 25.5 74.8 121.55 123
The 'Step1', 'Step2', 'Step3' columns are constant input values. There are 11 unique combinations of input values from these columns (shown in the groupby output above). I want to drop the individual 'Step1', 'Step2', 'Step3' columns and create a single "Step Type" column whose letter identifies each unique combination of input values.
Desired output:
Step Type Freq
0 A 135
1 B 255
2 C 129
3 D 369
4 E 249
5 F 246
6 G 246
7 H 369
8 J 375
9 L 249
10 M 123
Step Type A: Step1=6.0, Step2=17.6, Step3=28.60
How do I do this?
As the combinations of the three steps are unique, I used each combination as a dictionary key that maps to its Step Type.
Here I predefined the category values, but they can be generated automatically by scanning df if needed.
# df (first five rows of the groupby result above)
Step1 Step2 Step3 Freq
0 6.0 17.6 28.60 135
1 7.5 22.0 35.75 255
2 10.5 30.8 50.05 129
3 12.0 35.2 57.20 369
4 13.5 39.6 64.35 249
category = {
    (6.0, 17.6, 28.60): 'A',
    (7.5, 22.0, 35.75): 'B',
    (10.5, 30.8, 50.05): 'C',
    (12.0, 35.2, 57.20): 'D',
    (13.5, 39.6, 64.35): 'E',
}
# Look up each row's (Step1, Step2, Step3) tuple in the dictionary
df['Step_Type'] = df.apply(lambda row: category[(row['Step1'], row['Step2'], row['Step3'])], axis=1)
# Keep only the new label column and the frequency
df = df[['Step_Type', 'Freq']]
print(df)
# Step_Type Freq
#0 A 135
#1 B 255
#2 C 129
#3 D 369
#4 E 249
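If typing the dictionary by hand is not practical, the labels can probably be generated from the data itself. The following is only a sketch under some assumptions: the frame name `freq` and the sample rows are rebuilt here from the first five rows of the question's groupby output, and letters are assigned in sorted order of the step values.

```python
import string

import pandas as pd

# Hypothetical sample: first five rows of the groupby output in the question
freq = pd.DataFrame({
    'Step1': [6.0, 7.5, 10.5, 12.0, 13.5],
    'Step2': [17.6, 22.0, 30.8, 35.2, 39.6],
    'Step3': [28.60, 35.75, 50.05, 57.20, 64.35],
    'Freq': [135, 255, 129, 369, 249],
})

steps = ['Step1', 'Step2', 'Step3']

# ngroup() numbers each unique (Step1, Step2, Step3) combination
# 0, 1, 2, ... in sorted key order
codes = freq.groupby(steps).ngroup()

# Turn the group numbers into letters (only valid for up to 26 combinations)
freq['Step Type'] = codes.map(lambda i: string.ascii_uppercase[i])

# Drop the individual step columns by keeping only the label and frequency
result = freq[['Step Type', 'Freq']]
print(result)
```

This assigns consecutive letters, whereas the desired output in the question skips I and K. If that exact lettering matters, a custom list of letters could be indexed in place of `string.ascii_uppercase`.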