sklearn MinMaxScaler - ValueError: Expected 2D array, got 1D array instead - data as series objects

I want to use MinMaxScaler from sklearn to scale test and training data before analyzing it.

I've been following a tutorial (https://mc.ai/an-introduction-on-time-series-forecasting-with-simple-neura-networks-lstm/), but I get the error message ValueError: Expected 2D array, got 1D array instead.

I looked at the related question "Print predict ValueError: Expected 2D array, got 1D array instead", but the fix suggested there does not work for me: train = train.reshape(-1, 1) and test = test.reshape(-1, 1) both fail because train and test are Series objects (AttributeError: 'Series' object has no attribute 'reshape').

How do I best resolve this?

# Import libraries 
import pandas as pd 
from sklearn.preprocessing import MinMaxScaler 

# Create MWE dataset 
data = [['1981-11-03', 510], ['1982-11-03', 540], ['1983-11-03', 480],
   ['1984-11-03', 490], ['1985-11-03', 492], ['1986-11-03', 380],
   ['1987-11-03', 440], ['1988-11-03', 640], ['1989-11-03', 560], 
   ['1990-11-03', 660], ['1991-11-03', 610], ['1992-11-03', 480]] 

df = pd.DataFrame(data, columns = ['Date', 'Tickets']) 

# Set 'Date' to datetime data type 
df['Date'] = pd.to_datetime(df['Date'])

# Set 'Date' as the index
df = df.set_index(['Date'], drop=True)

# Split dataset into train and test  
split_date = pd.Timestamp('1989-11-03')
df =  df['Tickets']
train = df.loc[:split_date]
test = df.loc[split_date:]

# Scale train and test data 
scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train)
test_sc = scaler.transform(test)

X_train = train_sc[:-1]
y_train = train_sc[1:]
X_test = test_sc[:-1]
y_test = test_sc[1:]

# ERROR MESSAGE 
  ValueError: Expected 2D array, got 1D array instead:
  array=[510. 540. 480. 490. 492. 380. 440. 640. 560.].
  Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The line

df =  df['Tickets']

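selects a single column and therefore converts your data into a pd.Series, which is one-dimensional. MinMaxScaler expects two-dimensional input (rows by features), which is why it raises the ValueError.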
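To see the difference between the two indexing styles, here is a minimal sketch. It uses a small three-row frame made up for illustration rather than the full dataset from the question:

```python
import pandas as pd

# Illustrative three-row frame (not the full dataset from the question)
df = pd.DataFrame({"Tickets": [510, 540, 480]})

series = df["Tickets"]    # single brackets return a pd.Series
frame = df[["Tickets"]]   # double brackets return a pd.DataFrame

# A Series has one dimension, a single-column DataFrame has two
print(type(series).__name__, series.ndim, series.shape)
print(type(frame).__name__, frame.ndim, frame.shape)
```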

If you want to keep a DataFrame instead, select the column with double brackets:

df =  df[['Tickets']]

which should fix your problem: a single-column DataFrame is already two-dimensional, so it can be passed directly to the scaler's fit_transform and transform methods without any reshaping.
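For reference, here is a sketch of the question's example with only that one line changed. It assumes the same data and split date as in the question, and everything else is left as the asker wrote it:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same MWE dataset as in the question
data = [['1981-11-03', 510], ['1982-11-03', 540], ['1983-11-03', 480],
        ['1984-11-03', 490], ['1985-11-03', 492], ['1986-11-03', 380],
        ['1987-11-03', 440], ['1988-11-03', 640], ['1989-11-03', 560],
        ['1990-11-03', 660], ['1991-11-03', 610], ['1992-11-03', 480]]

df = pd.DataFrame(data, columns=['Date', 'Tickets'])
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

# Double brackets keep a 2D single-column DataFrame instead of a 1D Series
df = df[['Tickets']]

# Label-based slicing on a DatetimeIndex includes both endpoints
split_date = pd.Timestamp('1989-11-03')
train = df.loc[:split_date]
test = df.loc[split_date:]

# Fit the scaler on the training data only, then apply it to the test data
scaler = MinMaxScaler(feature_range=(-1, 1))
train_sc = scaler.fit_transform(train)
test_sc = scaler.transform(test)

print(train_sc.shape, test_sc.shape)
```

If the data has to stay a Series for other reasons, train.to_frame() or train.to_numpy().reshape(-1, 1) should give the scaler an equivalent 2D input.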

