I am trying to randomise the rows of my dataframe, data, before applying linear regression, but I realised the regression results differ after the rows are randomised, which shouldn't be the case? Code I have tried:
Without row randomisation:
from sklearn.linear_model import LinearRegression
data
X = data[feature_col]
y = data['median_price']
lr = LinearRegression()
lr.fit(X, y)
With row randomisation:
Method 1:
data = data.sample(frac=1)
Method 2:
data = data.sample(frac=1, axis=1)
Method 3:
from sklearn.utils import shuffle
data = shuffle(data)
Method 4:
data = data.sample(frac=1, axis=1).reset_index(drop=True)
Out of the 4 row randomisation methods I have tried, only Method 4 gives the same results as the one where no randomisation is applied. I thought row randomisation does not affect the regression results in any case?
Methods 2 and 4 are identical?
Regression results should not differ if you are applying the same type of regression to the same data (randomized or not). You should be using axis=0 to randomize the rows of a dataframe; axis=1 randomizes the columns.
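As a rough check of the point above, here is a minimal sketch on synthetic data. The column names rooms and area, the generated values, and the helper fit_coefs are made up for illustration; only median_price, feature_col, and the sample calls come from the question. It fits the same model on the original frame, on a row-shuffled copy (axis=0), and on a column-shuffled copy (axis=1), then compares the fitted coefficients.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the asker's dataframe (hypothetical columns and values).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "rooms": rng.normal(size=200),
    "area": rng.normal(size=200),
})
data["median_price"] = (
    3.0 * data["rooms"] - 2.0 * data["area"] + rng.normal(scale=0.1, size=200)
)
feature_col = ["rooms", "area"]


def fit_coefs(df):
    """Fit ordinary least squares and return (coefficients, intercept)."""
    lr = LinearRegression()
    lr.fit(df[feature_col], df["median_price"])
    return lr.coef_, lr.intercept_


coef_orig, intercept_orig = fit_coefs(data)

# axis=0 shuffles the rows; X and y are taken from the same shuffled frame,
# so each target value stays paired with its own feature row.
rows_shuffled = data.sample(frac=1, axis=0, random_state=42)
coef_rows, intercept_rows = fit_coefs(rows_shuffled)

# axis=1 shuffles the column order only; the row order is untouched.
cols_shuffled = data.sample(frac=1, axis=1, random_state=42)
coef_cols, intercept_cols = fit_coefs(cols_shuffled)

print("row shuffle matches original:", np.allclose(coef_orig, coef_rows))
print("column shuffle matches original:", np.allclose(coef_orig, coef_cols))
print("row order changed:", not rows_shuffled.index.equals(data.index))
print("column order after axis=1:", list(cols_shuffled.columns))
```

If the coefficients still differ in your own run after a row shuffle, the difference is probably coming from a later step that depends on row order rather than from the fit itself, though that is a guess without seeing the rest of the code.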