ValueError: Some errors were detected when using np.genfromtxt

Question

I am trying to write a data-loading function that takes a file name as an argument and returns X and y as NumPy arrays.

def load_student_data(filename):

    X, y = None, None

    rows = []
    with open(filename) as file:
        data = csv.reader(file)
        for row in file:
            rows.append(row)
        print(rows)

    # finding index of X
    Xindex = ()
    for i in X_COLUMN_NAMES:
        Xindex += (rows[0].index(i),)

    # finding index of y
    yindex = rows[0].index(Y_COLUMN_NAME)
    print(yindex)

    X = np.array(np.genfromtxt(rows, delimiter = ",", usecols = Xindex))
    y = np.array(np.genfromtxt(rows, delimiter = ",", usecols = yindex))

    return X, y

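However, when I run this code, I get the following error:

I tried changing the delimiter to ";", but then I get 33 columns instead of 13. I also tried passing both a list and a tuple as the usecols argument, but that did not help either. I don't know where the error is coming from.

The file I am trying to read is available here: student-mat.csv from https://archive.ics.uci.edu/ml/datasets/Student+Performance

Any help would be appreciated. Thanks in advance.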

Answer 1

I suggest loading this file as a structured array:

In [73]: data = np.genfromtxt('student-mat.csv',delimiter=';',dtype=None,names=True, encoding=None)

In [74]: data.shape
Out[74]: (395,)

In [75]: data.dtype
Out[75]: dtype([('school', '<U4'), ('sex', '<U3'), ('age', '<i4'), ('address', '<U3'), ('famsize', '<U5'), ('Pstatus', '<U3'), ('Medu', '<i4'), ('Fedu', '<i4'), ('Mjob', '<U10'), ('Fjob', '<U10'), ('reason', '<U12'), ('guardian', '<U8'), ('traveltime', '<i4'), ('studytime', '<i4'), ('failures', '<i4'), ('schoolsup', '<U5'), ('famsup', '<U5'), ('paid', '<U5'), ('activities', '<U5'), ('nursery', '<U5'), ('higher', '<U5'), ('internet', '<U5'), ('romantic', '<U5'), ('famrel', '<i4'), ('freetime', '<i4'), ('goout', '<i4'), ('Dalc', '<i4'), ('Walc', '<i4'), ('health', '<i4'), ('absences', '<i4'), ('G1', '<U4'), ('G2', '<U4'), ('G3', '<i4')])

In [76]: data[:3]
Out[76]: 
array([('"GP"', '"F"', 18, '"U"', '"GT3"', '"A"', 4, 4, '"at_home"', '"teacher"', '"course"', '"mother"', 2, 2, 0, '"yes"', '"no"', '"no"', '"no"', '"yes"', '"yes"', '"no"', '"no"', 4, 3, 4, 1, 1, 3,  6, '"5"', '"6"',  6),
       ('"GP"', '"F"', 17, '"U"', '"GT3"', '"T"', 1, 1, '"at_home"', '"other"', '"course"', '"father"', 1, 2, 0, '"no"', '"yes"', '"no"', '"no"', '"no"', '"yes"', '"yes"', '"no"', 5, 3, 3, 1, 1, 3,  4, '"5"', '"5"',  6),
       ('"GP"', '"F"', 15, '"U"', '"LE3"', '"T"', 1, 1, '"at_home"', '"other"', '"other"', '"mother"', 1, 2, 3, '"yes"', '"no"', '"yes"', '"no"', '"yes"', '"yes"', '"yes"', '"no"', 4, 3, 2, 2, 3, 3, 10, '"7"', '"8"', 10)],
      dtype=[('school', '<U4'), ('sex', '<U3'), ('age', '<i4'), ('address', '<U3'), ('famsize', '<U5'), ('Pstatus', '<U3'), ('Medu', '<i4'), ('Fedu', '<i4'), ('Mjob', '<U10'), ('Fjob', '<U10'), ('reason', '<U12'), ('guardian', '<U8'), ('traveltime', '<i4'), ('studytime', '<i4'), ('failures', '<i4'), ('schoolsup', '<U5'), ('famsup', '<U5'), ('paid', '<U5'), ('activities', '<U5'), ('nursery', '<U5'), ('higher', '<U5'), ('internet', '<U5'), ('romantic', '<U5'), ('famrel', '<i4'), ('freetime', '<i4'), ('goout', '<i4'), ('Dalc', '<i4'), ('Walc', '<i4'), ('health', '<i4'), ('absences', '<i4'), ('G1', '<U4'), ('G2', '<U4'), ('G3', '<i4')])

A subset of the fields:

In [78]: data[['school','sex']][:3]
Out[78]: 
array([('"GP"', '"F"'), ('"GP"', '"F"'), ('"GP"', '"F"')],
      dtype={'names':['school','sex'], 'formats':['<U4','<U3'], 'offsets':[0,16], 'itemsize':480})

This loads some columns as strings and others as integers.
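Note that the string fields keep their literal double quotes, and that G1 and G2 come back as strings because their values are quoted in the file. A minimal sketch of one way to clean this up, using a small in-memory sample with made-up rows in place of student-mat.csv (the sample data and the variable names school and g1 are assumptions for illustration):

```python
import io
import numpy as np

# Tiny stand-in for student-mat.csv: made-up rows, same ';' delimiter and quoting style.
sample = io.StringIO(
    'school;sex;age;G1;G3\n'
    '"GP";"F";18;"5";6\n'
    '"GP";"M";17;"7";10\n'
)

data = np.genfromtxt(sample, delimiter=';', dtype=None, names=True, encoding=None)

# Quoted fields arrive with their quote characters attached; strip them explicitly.
school = np.char.strip(data['school'], '"')

# A quoted numeric column can be converted once the quotes are gone.
g1 = np.char.strip(data['G1'], '"').astype(int)
```

Unquoted numeric columns such as age need no such treatment; they are already loaded as integers.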

Alternatively, load the whole file as strings:

In [79]: data1 = np.genfromtxt('student-mat.csv',delimiter=';',dtype=str, encoding=None)

In [80]: data1.shape
Out[80]: (396, 33)

In [81]: data1[:3,:10]
Out[81]: 
array([['school', 'sex', 'age', 'address', 'famsize', 'Pstatus', 'Medu',
        'Fedu', 'Mjob', 'Fjob'],
       ['"GP"', '"F"', '18', '"U"', '"GT3"', '"A"', '4', '4',
        '"at_home"', '"teacher"'],
       ['"GP"', '"F"', '17', '"U"', '"GT3"', '"T"', '1', '1',
        '"at_home"', '"other"']], dtype='<U12')
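With the all-string load, the header is simply row 0 of the array, so the column positions the question tried to look up can be found from it. A rough sketch, again on a made-up in-memory sample; the list wanted is a hypothetical stand-in for the question's X_COLUMN_NAMES, and this only works for columns whose values are unquoted numbers:

```python
import io
import numpy as np

# Made-up rows in the same format as student-mat.csv.
sample = io.StringIO(
    'school;sex;age;Medu;G3\n'
    '"GP";"F";18;4;6\n'
    '"GP";"F";17;1;6\n'
)
wanted = ['age', 'Medu']  # hypothetical feature columns

table = np.genfromtxt(sample, delimiter=';', dtype=str, encoding=None)

# Row 0 holds the column names; look up positions by name.
header = table[0].tolist()
cols = [header.index(name) for name in wanted]

# Skip the header row, pick the columns, and convert the strings to integers.
X = table[1:, cols].astype(int)
y = table[1:, header.index('G3')].astype(int)
```

Here the lookup is done on a list of column names, so index returns a column position rather than a character offset within the header line.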

You can also pass usecols to either of these genfromtxt calls:

In [82]: data = np.genfromtxt('student-mat.csv',delimiter=';',dtype=None,names=True, encoding=None, usecols=[1,10,11,12])

In [83]: data.shape
Out[83]: (395,)

In [84]: data[:3]
Out[84]: 
array([('"F"', '"course"', '"mother"', 2),
       ('"F"', '"course"', '"father"', 1),
       ('"F"', '"other"', '"mother"', 1)],
      dtype=[('sex', '<U3'), ('reason', '<U12'), ('guardian', '<U8'), ('traveltime', '<i4')])
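Putting this together, the function from the question could be sketched roughly as follows. This is one possible version, not the only way to do it: the contents of X_COLUMN_NAMES and Y_COLUMN_NAME are assumptions (the question does not list them), all selected columns are assumed to be unquoted numeric fields, and the sample rows are made up so the snippet runs without the real file.

```python
import io
import numpy as np

X_COLUMN_NAMES = ['age', 'Medu', 'Fedu']  # hypothetical; the question does not list them
Y_COLUMN_NAME = 'G3'                      # hypothetical target column


def load_student_data(source):
    """Return (X, y) from a ';'-delimited file path or file-like object."""
    data = np.genfromtxt(source, delimiter=';', dtype=None, names=True, encoding=None)
    # Select fields by name and stack them as the columns of a 2-D float array.
    X = np.column_stack([data[name] for name in X_COLUMN_NAMES]).astype(float)
    y = data[Y_COLUMN_NAME].astype(float)
    return X, y


# Made-up rows standing in for student-mat.csv.
sample = io.StringIO(
    'school;age;Medu;Fedu;G3\n'
    '"GP";18;4;4;6\n'
    '"GP";17;1;1;6\n'
    '"GP";15;1;1;10\n'
)
X, y = load_student_data(sample)
```

With the real data set, the call would be load_student_data('student-mat.csv'). Selecting fields by name avoids computing usecols indices by hand.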
