I have the following function to load data in my Jupyter notebook:
# function to load data
def load_dataset(x_path, y_path):
    x = pd.read_csv(os.sep.join([DATA_DIR, x_path]),
                    dtype=DTYPES,
                    index_col="ID")
    y = pd.read_csv(os.sep.join([DATA_DIR, y_path]))
    return x, y
and the columns have the following dtypes defined:
DTYPES = {
    'ID': 'int64',
    'columnA': 'str',
    'columnB': 'float32',
    'columnC': 'float64',
    'columnD': 'datetime64[ns]'}
The header and first row of the CSV look like this:
ID columnA columnB columnC columnD
941215 SALE 15000 56 10/1/2018
When I call the function in my notebook:
from model import load_dataset
X_train, y_train = load_dataset("X_train.zip", "y_train.zip")
I get the following error:
2055 raise TypeError("data type '{}' not understood".format(dtype))
2057 # Any invalid dtype (such as pd.Timestamp) should raise an error.
TypeError: data type ' int64' not understood
I think you need to specify the dtypes as NumPy types:
import numpy as np

DTYPES = {
    'ID': np.int64,
    'columnA': 'str',
    'columnB': np.float32,
    'columnC': np.float64}
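The error message quotes the dtype as ' int64', with a leading space, so it may also be worth checking that every dtype string is spelled exactly. A minimal sketch of such a check, assuming the helper name find_bad_dtypes (hypothetical, not part of pandas) and using pd.api.types.pandas_dtype to validate each entry:

```python
import pandas as pd


def find_bad_dtypes(dtypes):
    """Return the column names whose dtype pandas cannot interpret.

    find_bad_dtypes is a hypothetical helper for illustration only.
    """
    bad = []
    for column, dtype in dtypes.items():
        try:
            # pandas_dtype raises TypeError for specs it does not understand.
            pd.api.types.pandas_dtype(dtype)
        except (TypeError, ValueError):
            bad.append(column)
    return bad


# Example mapping: the first entry has a stray leading space.
CANDIDATE_DTYPES = {
    "ID": " int64",
    "columnB": "float32",
    "columnC": "float64",
}

print(find_bad_dtypes(CANDIDATE_DTYPES))
```

Any column this reports can then be fixed by correcting the string or by switching to the NumPy type object.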
Datetimes need a different approach: leave the column out of dtype and use the parse_dates parameter of read_csv instead:
def load_dataset(x_path, y_path):
    x = pd.read_csv(os.sep.join([DATA_DIR, x_path]),
                    dtype=DTYPES,
                    index_col="ID",
                    parse_dates=["columnD"])  # parse_dates expects a list of column names
    y = pd.read_csv(os.sep.join([DATA_DIR, y_path]))
    return x, y
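To try this without the original files, here is a self-contained sketch that reads a small in-memory CSV. The CSV_TEXT content is an assumption built from the sample row in the question (comma-separated), not the real X_train data:

```python
import io

import numpy as np
import pandas as pd

# Assumed stand-in for the real file, based on the sample row in the question.
CSV_TEXT = """ID,columnA,columnB,columnC,columnD
941215,SALE,15000,56,10/1/2018
"""

# Numeric and string columns go in dtype; the datetime column is left out.
DTYPES = {
    "ID": np.int64,
    "columnA": str,
    "columnB": np.float32,
    "columnC": np.float64,
}

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype=DTYPES,
    index_col="ID",
    parse_dates=["columnD"],  # parsed to a datetime column by read_csv
)

print(df.dtypes)
```

Printing df.dtypes should show columnB as float32 and columnD as a datetime type, which confirms that the dtype mapping and parse_dates work together.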