Indexing issue when appending two pandas dataframes

I'm working on dummifying (one-hot encoding) a column of zipcodes in pandas so I can build a random forest model in sklearn. Here is my code:

forest_test_features = test_df[['sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade', 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated']]
forest_test_features.append(pd.get_dummies(test_df['zipcode']))
forest_test_target = test_df['price']

I get a runtime warning, and my model's R^2 score is then much lower than when I simply leave zipcode in the model without dummifying it, which suggests something went wrong. pd.get_dummies returns a dataframe, and I think the problem is that this dataframe and forest_test_features are in two different orders, but I am unsure how to proceed. The values themselves are still correct (zipcode 98144 maps to a '1' in the '98144' column of the get_dummies result).

I also get this warning:

RuntimeWarning: '<' not supported between instances of 'str' and 'int', sort order is undefined for incomparable objects
  result = result.union(other)
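For readers following along: DataFrame.append stacked rows rather than adding columns, returned a new frame instead of modifying the caller in place, and was removed in pandas 2.0. The warning most likely comes from the row-wise append trying to union the string column names with the integer zipcode labels. A minimal sketch of a column-wise join follows; the toy data and the "zip" prefix are assumptions for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for test_df; only the column names come from the question.
test_df = pd.DataFrame({
    "sqft_lot": [5000, 7200, 4100],
    "floors": [1.0, 2.0, 1.5],
    "zipcode": [98144, 98103, 98144],
    "price": [450000, 610000, 480000],
})

numeric = test_df[["sqft_lot", "floors"]]

# prefix= yields string column names such as "zip_98144", so the combined
# column index no longer mixes str and int labels.
dummies = pd.get_dummies(test_df["zipcode"], prefix="zip")

# axis=1 joins column-wise on the shared row index; the result must be assigned.
forest_test_features = pd.concat([numeric, dummies], axis=1)
forest_test_target = test_df["price"]

print(forest_test_features.columns.tolist())
```

Because both pieces are derived from the same test_df, their row indexes match and each dummy row lines up with its numeric row.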

You can specify the order. I guess you have a forest_train_features dataframe. You can do this:

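from sklearn.ensemble import RandomForestRegressor

# Use the test frame's column order for both fitting and predicting
feats = forest_test_features.keys()
model = RandomForestRegressor()
# Assumes forest_train_features still contains the 'price' column
model.fit(forest_train_features[feats], forest_train_features['price'])
prediction = model.predict(forest_test_features[feats])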

The columns should then be in the same order. You can also apply the same preprocessing to the train and test data in a single DataFrame and then split it.
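The second suggestion, preprocessing train and test together and then splitting, could look roughly like this. It is a sketch under assumptions: train_df, the toy values, and the "zip" prefix are hypothetical, and the key labels "train" and "test" are only one way to tell the two parts apart afterwards.

```python
import pandas as pd

# Hypothetical train and test frames sharing the same schema.
train_df = pd.DataFrame({
    "sqft_lot": [5000, 7200, 4100],
    "zipcode": [98144, 98103, 98115],
    "price": [450000, 610000, 480000],
})
test_df = pd.DataFrame({
    "sqft_lot": [6000, 3900],
    "zipcode": [98103, 98144],
    "price": [590000, 430000],
})

# Stack both frames; keys= adds an outer index level used to split them again.
combined = pd.concat([train_df, test_df], keys=["train", "test"])

# Encode zipcodes once, so both parts get the same dummy columns in the same order.
dummies = pd.get_dummies(combined["zipcode"], prefix="zip")
features = pd.concat([combined[["sqft_lot"]], dummies], axis=1)

# Split back using the outer index level.
forest_train_features = features.loc["train"]
forest_test_features = features.loc["test"]
forest_train_target = combined["price"].loc["train"]
forest_test_target = combined["price"].loc["test"]
```

A zipcode that appears only in the training rows (98115 here) still gets a column in the test features, filled with zeros, so the model sees the same columns at fit and predict time.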
