ValueError: Some errors were detected when using np.genfromtxt
I'm trying to write a data-loading function that takes a file name as its argument and returns X and y as numpy arrays.
import csv
import numpy as np

# X_COLUMN_NAMES and Y_COLUMN_NAME are constants defined elsewhere in the module
def load_student_data(filename):
    X, y = None, None
    rows = []
    with open(filename) as file:
        data = csv.reader(file)
        for row in file:
            rows.append(row)
    print(rows)
    # finding index of X
    Xindex = ()
    for i in X_COLUMN_NAMES:
        Xindex += (rows[0].index(i),)
    # finding index of y
    yindex = rows[0].index(Y_COLUMN_NAME)
    print(yindex)
    X = np.array(np.genfromtxt(rows, delimiter=",", usecols=Xindex))
    y = np.array(np.genfromtxt(rows, delimiter=",", usecols=yindex))
    return X, y
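However, when I run this code, I get the following error:
I tried changing the delimiter to ";", but then I get 33 columns instead of 13. I also tried passing both a list and a tuple as the usecols argument, but that didn't help either. I don't know where the error is coming from.
The file I'm trying to read is student-mat.csv from https://archive.ics.uci.edu/ml/datasets/Student+Performance
Any help would be appreciated. Thanks in advance.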
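In the code above, data = csv.reader(file) is created but never used: for row in file iterates over raw text lines, so rows[0] is the whole header as one string and rows[0].index(i) returns a character offset, not a column number. Most of those offsets will be larger than the 33 fields on each line, and genfromtxt reports every line where it cannot pick out the requested columns as invalid; that is very likely where "got 33 columns instead of 13" comes from (assuming X_COLUMN_NAMES holds 13 names). A minimal sketch of the difference, using a hypothetical excerpt of the header line:

```python
import csv
import io

# Hypothetical excerpt: the first few header fields of student-mat.csv,
# which is separated by ";" rather than ",".
header_line = "school;sex;age;address;famsize;Pstatus;Medu;Fedu\n"

# Iterating over a file yields raw text lines, so str.index() on the
# first "row" returns a character offset inside the header line.
char_offset = header_line.index("age")

# csv.reader with delimiter=";" splits the line into fields, so
# list.index() returns the column number that usecols expects.
fields = next(csv.reader(io.StringIO(header_line), delimiter=";"))
col_index = fields.index("age")

print(char_offset, col_index)  # 11 2
```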
I'd suggest loading this file as a structured array:
In [73]: data = np.genfromtxt('student-mat.csv',delimiter=';',dtype=None,names=True, encoding=None)
In [74]: data.shape
Out[74]: (395,)
In [75]: data.dtype
Out[75]: dtype([('school', '<U4'), ('sex', '<U3'), ('age', '<i4'), ('address', '<U3'), ('famsize', '<U5'), ('Pstatus', '<U3'), ('Medu', '<i4'), ('Fedu', '<i4'), ('Mjob', '<U10'), ('Fjob', '<U10'), ('reason', '<U12'), ('guardian', '<U8'), ('traveltime', '<i4'), ('studytime', '<i4'), ('failures', '<i4'), ('schoolsup', '<U5'), ('famsup', '<U5'), ('paid', '<U5'), ('activities', '<U5'), ('nursery', '<U5'), ('higher', '<U5'), ('internet', '<U5'), ('romantic', '<U5'), ('famrel', '<i4'), ('freetime', '<i4'), ('goout', '<i4'), ('Dalc', '<i4'), ('Walc', '<i4'), ('health', '<i4'), ('absences', '<i4'), ('G1', '<U4'), ('G2', '<U4'), ('G3', '<i4')])
In [76]: data[:3]
Out[76]:
array([('"GP"', '"F"', 18, '"U"', '"GT3"', '"A"', 4, 4, '"at_home"', '"teacher"', '"course"', '"mother"', 2, 2, 0, '"yes"', '"no"', '"no"', '"no"', '"yes"', '"yes"', '"no"', '"no"', 4, 3, 4, 1, 1, 3, 6, '"5"', '"6"', 6),
('"GP"', '"F"', 17, '"U"', '"GT3"', '"T"', 1, 1, '"at_home"', '"other"', '"course"', '"father"', 1, 2, 0, '"no"', '"yes"', '"no"', '"no"', '"no"', '"yes"', '"yes"', '"no"', 5, 3, 3, 1, 1, 3, 4, '"5"', '"5"', 6),
('"GP"', '"F"', 15, '"U"', '"LE3"', '"T"', 1, 1, '"at_home"', '"other"', '"other"', '"mother"', 1, 2, 3, '"yes"', '"no"', '"yes"', '"no"', '"yes"', '"yes"', '"yes"', '"no"', 4, 3, 2, 2, 3, 3, 10, '"7"', '"8"', 10)],
dtype=[('school', '<U4'), ('sex', '<U3'), ('age', '<i4'), ('address', '<U3'), ('famsize', '<U5'), ('Pstatus', '<U3'), ('Medu', '<i4'), ('Fedu', '<i4'), ('Mjob', '<U10'), ('Fjob', '<U10'), ('reason', '<U12'), ('guardian', '<U8'), ('traveltime', '<i4'), ('studytime', '<i4'), ('failures', '<i4'), ('schoolsup', '<U5'), ('famsup', '<U5'), ('paid', '<U5'), ('activities', '<U5'), ('nursery', '<U5'), ('higher', '<U5'), ('internet', '<U5'), ('romantic', '<U5'), ('famrel', '<i4'), ('freetime', '<i4'), ('goout', '<i4'), ('Dalc', '<i4'), ('Walc', '<i4'), ('health', '<i4'), ('absences', '<i4'), ('G1', '<U4'), ('G2', '<U4'), ('G3', '<i4')])
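The same call can be tried without the real file. This sketch assumes a hypothetical in-memory excerpt (a few columns of the first three rows, copied from the output above) and shows that each field is then accessed by its header name:

```python
import io
import numpy as np

# Hypothetical excerpt: a few columns and the first three rows of
# student-mat.csv, copied from the output shown above.
SAMPLE = """school;sex;age;studytime;failures;absences;G1;G2;G3
"GP";"F";18;2;0;6;"5";"6";6
"GP";"F";17;2;0;4;"5";"5";6
"GP";"F";15;2;3;10;"7";"8";10
"""

# names=True takes the field names from the header line;
# dtype=None lets genfromtxt infer a type for each column.
data = np.genfromtxt(io.StringIO(SAMPLE), delimiter=";", dtype=None,
                     names=True, encoding=None)

print(data.dtype.names)
print(data["age"])     # integer field
print(data["school"])  # string field, quote characters included
```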
A subset of fields:
In [78]: data[['school','sex']][:3]
Out[78]:
array([('"GP"', '"F"'), ('"GP"', '"F"'), ('"GP"', '"F"')],
dtype={'names':['school','sex'], 'formats':['<U4','<U3'], 'offsets':[0,16], 'itemsize':480})
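With dtype=None, genfromtxt infers a type for each column, so some columns are loaded as strings and others as integers. Note that the double quotes from the file are kept in the string values, which is also why G1 and G2 come out as strings while G3 is an integer.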
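If you need the quoted grade columns as numbers, one possible approach, assuming the only problem is the surrounding quote characters, is to strip them with np.char.strip and then convert. The sample below is the same hypothetical excerpt of the file:

```python
import io
import numpy as np

# Hypothetical excerpt of student-mat.csv (first three rows, a few columns).
SAMPLE = """school;sex;age;studytime;failures;absences;G1;G2;G3
"GP";"F";18;2;0;6;"5";"6";6
"GP";"F";17;2;0;4;"5";"5";6
"GP";"F";15;2;3;10;"7";"8";10
"""

data = np.genfromtxt(io.StringIO(SAMPLE), delimiter=";", dtype=None,
                     names=True, encoding=None)

# G1 and G2 are quoted in the file, so they are inferred as strings
# such as '"5"'. Strip the quote characters, then convert to int.
g1 = np.char.strip(data["G1"], '"').astype(int)
g2 = np.char.strip(data["G2"], '"').astype(int)

print(g1, g2)  # [5 5 7] [6 5 8]
```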
Or load everything as strings:
In [79]: data1 = np.genfromtxt('student-mat.csv',delimiter=';',dtype=str, encoding=None)
In [80]: data1.shape
Out[80]: (396, 33)
In [81]: data1[:3,:10]
Out[81]:
array([['school', 'sex', 'age', 'address', 'famsize', 'Pstatus', 'Medu',
'Fedu', 'Mjob', 'Fjob'],
['"GP"', '"F"', '18', '"U"', '"GT3"', '"A"', '4', '4',
'"at_home"', '"teacher"'],
['"GP"', '"F"', '17', '"U"', '"GT3"', '"T"', '1', '1',
'"at_home"', '"other"']], dtype='<U12')
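In the all-strings version, row 0 holds the header as separate fields, so looking up a name in it gives a real column number, which is what the original code was trying to compute. A sketch on the same hypothetical excerpt:

```python
import io
import numpy as np

# Hypothetical excerpt of student-mat.csv (first three rows, a few columns).
SAMPLE = """school;sex;age;studytime;failures;absences;G1;G2;G3
"GP";"F";18;2;0;6;"5";"6";6
"GP";"F";17;2;0;4;"5";"5";6
"GP";"F";15;2;3;10;"7";"8";10
"""

# dtype=str keeps every cell, header included, as a string.
data1 = np.genfromtxt(io.StringIO(SAMPLE), delimiter=";", dtype=str,
                      encoding=None)

# Row 0 is the header, so list.index() returns a column number here.
header = list(data1[0])
age_col = header.index("age")

# The remaining rows are data; convert the selected column to int.
ages = data1[1:, age_col].astype(int)
print(age_col, ages)  # 2 [18 17 15]
```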
You can add usecols to either of these:
In [82]: data = np.genfromtxt('student-mat.csv',delimiter=';',dtype=None,names=True, encoding=None, usecols=[1,10,11,12])
In [83]: data.shape
Out[83]: (395,)
In [84]: data[:3]
Out[84]:
array([('"F"', '"course"', '"mother"', 2),
('"F"', '"course"', '"father"', 1),
('"F"', '"other"', '"mother"', 1)],
dtype=[('sex', '<U3'), ('reason', '<U12'), ('guardian', '<U8'), ('traveltime', '<i4')])
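Putting this together for the original goal, here is a sketch of load_student_data that returns X and y as float arrays. It assumes the selected columns are all unquoted numeric ones; the column lists below are hypothetical stand-ins for X_COLUMN_NAMES and Y_COLUMN_NAME, and an in-memory excerpt replaces the real file. The genfromtxt user guide notes that when the columns have names, usecols may also be given as names instead of indices.

```python
import io
import numpy as np

# Hypothetical excerpt of student-mat.csv (first three rows, a few columns).
SAMPLE = """school;sex;age;studytime;failures;absences;G1;G2;G3
"GP";"F";18;2;0;6;"5";"6";6
"GP";"F";17;2;0;4;"5";"5";6
"GP";"F";15;2;3;10;"7";"8";10
"""

# Hypothetical column choices standing in for the asker's constants.
X_COLUMN_NAMES = ["age", "studytime", "failures", "absences"]
Y_COLUMN_NAME = "G3"

def load_student_data(filename):
    # With names=True, usecols may list column names instead of indices.
    data = np.genfromtxt(filename, delimiter=";", dtype=None, names=True,
                         encoding=None,
                         usecols=X_COLUMN_NAMES + [Y_COLUMN_NAME])
    # Place the selected fields side by side as a 2-D float array.
    X = np.column_stack([data[name] for name in X_COLUMN_NAMES]).astype(float)
    y = data[Y_COLUMN_NAME].astype(float)
    return X, y

X, y = load_student_data(io.StringIO(SAMPLE))
print(X.shape, y)
```

With the real file, pass the path 'student-mat.csv' instead of the StringIO object.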