
Shuffling the rows of a pandas DataFrame in Python gives different regression results?

I am trying to randomise the rows of my DataFrame, data, before applying linear regression, but I realised that the regression results differ after the rows are randomised, which shouldn't be the case. The code I have tried:

Without row randomisation:
from sklearn.linear_model import LinearRegression

X = data[feature_col]      # feature columns
y = data['median_price']   # target column
lr = LinearRegression()
lr.fit(X, y)

With row randomisation: 
Method 1: 
data = data.sample(frac=1)

Method 2:
data = data.sample(frac=1, axis=1)

Method 3: 
from sklearn.utils import shuffle
data = shuffle(data)

Method 4: 
data = data.sample(frac=1, axis=1).reset_index(drop=True)

Of the four row randomisation methods I have tried, only Method 4 gives the same results as the fit with no randomisation. I thought row randomisation should not affect the regression results in any case?

Methods 2 and 4 are identical?

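Regression results should not differ when you fit the same type of regression to the same data, whether or not the rows are shuffled (apart from negligible floating-point differences). To shuffle the rows of a DataFrame, use axis=0, which is the default for sample; axis=1 shuffles the columns instead, so Methods 2 and 4 leave the row order unchanged.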
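
To check this yourself, here is a minimal sketch on synthetic data. The column names rooms and area, the helper fit_ols and the generated numbers are made up for illustration and are not from the question. It also uses np.linalg.lstsq with an explicit intercept column as a stand-in for LinearRegression, so that only numpy and pandas are needed. The fits are compared with np.allclose rather than exact equality, since a different row order can change the last few floating-point digits.

```python
import numpy as np
import pandas as pd

# Synthetic data; the column names and coefficients are hypothetical.
rng = np.random.default_rng(0)
feature_col = ["rooms", "area"]
data = pd.DataFrame(rng.normal(size=(200, 2)), columns=feature_col)
data["median_price"] = (
    3.0 * data["rooms"] - 2.0 * data["area"] + rng.normal(scale=0.1, size=200)
)

def fit_ols(df):
    """Return [intercept, coefficients...] of an ordinary least squares fit."""
    # Selecting by name restores the feature order even if columns were shuffled.
    X = np.column_stack([np.ones(len(df)), df[feature_col].to_numpy()])
    y = df["median_price"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

base = fit_ols(data)

# axis=0 permutes the rows; X and y move together because they share a frame.
rows_shuffled = data.sample(frac=1, axis=0, random_state=1)
# axis=1 permutes the columns; the row order is left untouched.
cols_shuffled = data.sample(frac=1, axis=1, random_state=1)

same_rows = np.allclose(base, fit_ols(rows_shuffled))
same_cols = np.allclose(base, fit_ols(cols_shuffled))
print(same_rows, same_cols)
```

If your own fits still differ after a row shuffle, one thing worth checking is whether X and y were both taken from the same shuffled frame. Pairing features from the shuffled data with a target extracted before the shuffle would break the row correspondence.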
