
Grouping and binning data for feature engineering

I am struggling to divide my data into bins for feature engineering. The data is the sale price, which I want to group by a categorical column (Neighborhood).

What am I doing wrong? I get NaN values for all the rows. Thanks!

    pricy_location = train['SalePrice'].groupby(train['Neighborhood']).mean()
    label = ['rank1', 'rank2', 'rank3', 'rank4', 'rank5']
    train['Pricy_Loc'] = pd.qcut(pricy_location, 5, labels=label, precision=2)
    train['Pricy_Loc'].head()

I think the problem arises because you are building a grouped result by neighborhood (which is only 25 rows long) and then trying to apply the categories created for it to a much longer dataframe that is 1460 rows long. The grouped result is indexed by neighborhood name, while train presumably keeps its default row index, so the assignment finds no matching index labels and fills every row with NaN. You can instead put the summarized data in a new column of your train dataframe and then bin the result:

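    # Broadcast each neighborhood's mean price back onto every row of train
    train['Pricy_loc'] = train.groupby('Neighborhood')['SalePrice'].transform('mean')
    label = ['rank1', 'rank2', 'rank3', 'rank4', 'rank5']
    # Bin the row-level means into five quantile-based categories
    train['Price_loc_cat'] = pd.qcut(train['Pricy_loc'], 5, labels=label, precision=2)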
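
To make the index mismatch concrete, here is a minimal sketch on a made-up toy frame (the neighborhood names `A` to `E` and the prices are invented for illustration, not taken from the real data). It first reproduces the all-NaN result, then shows one possible alternative that keeps the original idea of ranking the neighborhoods themselves: map each row's `Neighborhood` onto the per-neighborhood ranks.

```python
import pandas as pd

# Toy stand-in for the `train` frame; all values are invented for illustration.
train = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"],
    "SalePrice": [100, 120, 200, 220, 300, 320, 400, 420, 500, 520],
})
label = ["rank1", "rank2", "rank3", "rank4", "rank5"]

# One mean per neighborhood: the index holds neighborhood names, not row numbers.
pricy_location = train.groupby("Neighborhood")["SalePrice"].mean()
ranks = pd.qcut(pricy_location, 5, labels=label)

# Direct assignment aligns on index labels. 'A'..'E' never match 0..9,
# so every row ends up NaN (this mirrors the problem in the question).
broken = train.assign(Pricy_Loc=ranks)
print(broken["Pricy_Loc"].isna().all())

# Alternative: look up each row's neighborhood in the per-neighborhood ranks.
train["Pricy_Loc"] = train["Neighborhood"].map(ranks)
print(train[["Neighborhood", "Pricy_Loc"]])
```

The two approaches are not identical. Binning the per-neighborhood means puts roughly the same number of neighborhoods in each rank, whereas binning the `transform('mean')` column puts roughly the same number of rows in each rank, so large neighborhoods weigh more heavily on the bin edges.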
