Grouping and binning data for feature engineering

Question

I'm struggling to divide my data into bins for feature engineering. The data is SalePrice, which I want to group by a categorical column (Neighborhood).

What am I doing wrong? I get NaN values for all the rows. Thanks!

    pricy_location = train['SalePrice'].groupby(train['Neighborhood']).mean()
    label = ['rank1', 'rank2', 'rank3', 'rank4', 'rank5']
    train['Pricy_Loc'] = pd.qcut(pricy_location, 5, labels=label, precision=2)
    train['Pricy_Loc'].head()

Answer 1

I think the problem arises because you are creating a Series of mean prices grouped by neighborhood (only 25 rows long) and then trying to assign the categories created for that Series to a much longer dataframe that is 1460 rows long. The binned result is indexed by neighborhood name, while train has its own row index, so when pandas aligns the two on assignment no labels match and every row becomes NaN. You can simply put the summarized data in a new column of your train dataframe and then bin the result:

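    # Broadcast each neighborhood's mean SalePrice back onto every row of train
    train['Pricy_loc'] = train.groupby('Neighborhood')['SalePrice'].transform('mean')
    label = ['rank1', 'rank2', 'rank3', 'rank4', 'rank5']
    # Bin the per-row means into five quantile-based ranks
    train['Price_loc_cat'] = pd.qcut(train['Pricy_loc'], 5, labels=label, precision=2)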
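
The NaN result and an alternative fix can be reproduced on toy data. The sketch below is only an illustration: the train frame here is made-up data (10 neighborhoods named N0 to N9 with 5 sales each), and the names per_hood, hood_rank, and broken are hypothetical. It shows that assigning the per-neighborhood result straight into train yields NaN because pandas aligns on the index, and that Series.map can carry the per-neighborhood rank back to every row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's `train` frame:
# 10 neighborhoods, 5 sales each, with mean price 100000 * (i + 1).
offsets = np.array([-2000, -1000, 0, 1000, 2000])
train = pd.DataFrame({
    "Neighborhood": np.repeat([f"N{i}" for i in range(10)], 5),
    "SalePrice": np.concatenate([100_000 * (i + 1) + offsets for i in range(10)]),
})
labels = ["rank1", "rank2", "rank3", "rank4", "rank5"]

# One mean per neighborhood; the result is indexed by neighborhood name.
per_hood = train.groupby("Neighborhood")["SalePrice"].mean()
hood_rank = pd.qcut(per_hood, 5, labels=labels)

# Reproduce the problem: the index of hood_rank (names) never matches
# the integer row index of the frame, so alignment fills every row with NaN.
broken = train.copy()
broken["Pricy_Loc"] = hood_rank
print(broken["Pricy_Loc"].isna().all())  # True

# Alternative fix (an assumption, not the answer's method): look up each
# row's neighborhood in the per-neighborhood ranks.
train["Pricy_Loc"] = train["Neighborhood"].map(hood_rank)
print(train["Pricy_Loc"].isna().any())  # False
```

Note that this variant ranks neighborhoods into quantiles (each rank gets the same number of neighborhoods), whereas binning the transform('mean') column ranks rows, so on real data with uneven neighborhood sizes the two approaches can assign different ranks.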

Solution 1
0 2019-12-16 08:02:46