
Shuffling the rows of a dataframe in pandas (Python) brings about different regression results?

I am trying to randomise the rows of my dataframe, data, before applying linear regression, but I noticed that the regression results differ after the rows are randomised, which shouldn't be the case. The code I have tried:

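Without row randomisation:
from sklearn.linear_model import LinearRegression

# data is the original, unshuffled dataframe
X = data[feature_col]      # feature columns
y = data['median_price']   # regression target
lr = LinearRegression()
lr.fit(X, y)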

With row randomisation: 
Method 1: 
data = data.sample(frac=1)

Method 2:
data = data.sample(frac=1, axis=1)

Method 3: 
from sklearn.utils import shuffle
data = shuffle(data)

Method 4: 
data = data.sample(frac=1, axis=1).reset_index(drop=True)

Of the 4 row randomisation methods I have tried, only Method 4 gives the same results as the version with no randomisation. I thought row randomisation should not affect the regression results in any case?

Aren't Methods 2 and 4 identical?

Regression results should not differ if you apply the same type of regression to the same data, whether the rows are shuffled or not. To randomise the rows of a dataframe, use axis=0; axis=1 randomises the columns instead.
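
A minimal sketch to check this yourself, under some assumptions: the data below is synthetic, and the feature names rooms and area, the helper fit_coefs, and the fixed random_state values are made up for illustration (only median_price and feature_col come from the question). It fits the same model before and after a row shuffle with axis=0 and compares the coefficients with a floating-point tolerance, then shows that axis=1 leaves the row order alone.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the asker's dataframe; 'rooms' and 'area'
# are hypothetical feature names, not taken from the question.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "rooms": rng.normal(size=200),
    "area": rng.normal(size=200),
})
data["median_price"] = (
    3.0 * data["rooms"] - 2.0 * data["area"] + rng.normal(scale=0.1, size=200)
)
feature_col = ["rooms", "area"]


def fit_coefs(df):
    """Fit ordinary least squares and return (coefficients, intercept)."""
    lr = LinearRegression()
    lr.fit(df[feature_col], df["median_price"])
    return lr.coef_, lr.intercept_


coef_orig, icpt_orig = fit_coefs(data)

# axis=0 (the default) shuffles rows; each row keeps its X and y values together.
rows_shuffled = data.sample(frac=1, axis=0, random_state=42)
coef_rows, icpt_rows = fit_coefs(rows_shuffled)

# axis=1 reorders the columns only; the row order is untouched.
cols_shuffled = data.sample(frac=1, axis=1, random_state=42)

# Compare with a tolerance, since summation order can change the last few digits.
print(np.allclose(coef_orig, coef_rows), np.isclose(icpt_orig, icpt_rows))
print(cols_shuffled.index.equals(data.index))
```

If the coefficients differ by more than rounding error after a row shuffle in your own code, one thing worth checking is whether X and y are both taken from the same shuffled dataframe, so that each target still lines up with its features.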
