Shuffling the rows of a pandas DataFrame gives different regression results?
I am trying to randomise the rows of my DataFrame, data, before applying linear regression, but I noticed that the regression results differ after the rows are randomised. Shouldn't they be the same? Code I have tried:
Without row randomisation:
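from sklearn.linear_model import LinearRegression
# data: the original DataFrame, rows in their original order
X = data[feature_col]
y = data['median_price']
lr = LinearRegression()
lr.fit(X, y)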
With row randomisation:
Method 1:
data = data.sample(frac=1)
Method 2:
data = data.sample(frac=1, axis=1)
Method 3:
from sklearn.utils import shuffle
data = shuffle(data)
Method 4:
data = data.sample(frac=1, axis=1).reset_index(drop=True)
Of the 4 row-randomisation methods I have tried, only Method 4 gives the same results as the version without randomisation. I thought row randomisation should not affect the regression results in any case?
Aren't Methods 2 and 4 identical?
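To see what each of the four methods actually does, here is a minimal sketch that applies them to a small made-up DataFrame (the column names a and b and all values are hypothetical stand-ins for the real data) and prints the resulting row and column order. random_state is fixed only so the output is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical toy frame standing in for `data`; names and values are made up.
data = pd.DataFrame({
    "a": np.arange(8),
    "b": np.arange(8) * 10,
    "median_price": np.arange(8) * 1.5,
})

m1 = data.sample(frac=1, random_state=0)            # Method 1: permutes rows (axis=0 is the default)
m2 = data.sample(frac=1, axis=1, random_state=0)    # Method 2: permutes columns only
m3 = shuffle(data, random_state=0)                  # Method 3: permutes rows
m4 = data.sample(frac=1, axis=1, random_state=0).reset_index(drop=True)  # Method 4: permutes columns, then relabels rows 0..n-1

for name, df in [("Method 1", m1), ("Method 2", m2), ("Method 3", m3), ("Method 4", m4)]:
    print(name, "row order:", list(df.index), "column order:", list(df.columns))
```

Methods 1 and 3 permute the rows; Methods 2 and 4 permute only the columns. The reset_index(drop=True) in Method 4 just relabels rows that were never moved, so with the same seed Methods 2 and 4 produce the same frame.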
Regression results should not differ if you apply the same type of regression to the same data, randomized or not: for ordinary least squares, permuting the rows only reorders the equations being solved, so the fitted coefficients are the same up to floating-point rounding. To randomize the rows of a DataFrame you should be using axis = 0 (the default for DataFrame.sample); axis = 1 randomizes the columns.
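One way to check this is to fit the same model on the original rows, on row-shuffled data (axis=0) and on column-shuffled data (axis=1), and compare the coefficients. This is a sketch on synthetic data; the feature names rooms and age and the true coefficients are made up for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data; feature names and coefficients are illustrative only.
rng = np.random.default_rng(42)
n = 200
data = pd.DataFrame({"rooms": rng.normal(5, 1, n), "age": rng.normal(30, 10, n)})
data["median_price"] = 3.0 * data["rooms"] - 0.5 * data["age"] + rng.normal(0, 1, n)
feature_col = ["rooms", "age"]

def fit_ols(df):
    """Fit LinearRegression on the given frame and return (coefficients, intercept)."""
    lr = LinearRegression()
    lr.fit(df[feature_col], df["median_price"])
    return lr.coef_, lr.intercept_

coef0, int0 = fit_ols(data)                                                    # original row order
coef_rows, int_rows = fit_ols(data.sample(frac=1, random_state=0))            # axis=0: rows shuffled
coef_cols, int_cols = fit_ols(data.sample(frac=1, axis=1, random_state=0))    # axis=1: columns shuffled

# Compare with a tolerance, since summation order can change the last few bits.
print("rows shuffled, same fit:", np.allclose(coef0, coef_rows) and np.isclose(int0, int_rows))
print("columns shuffled, same fit:", np.allclose(coef0, coef_cols) and np.isclose(int0, int_cols))
```

Both comparisons should agree up to rounding. The column-shuffled fit matches as well because data[feature_col] selects columns by name, so reordering the DataFrame's columns does not change X. If a row-shuffled fit on the full data still gives different coefficients, presumably something other than the shuffle itself is changing what is passed to fit.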