
ValueError: setting an array element with a sequence. on data read from csv

I am trying to load data from a CSV file row by row, then create a 2D array out of each row and store it inside an array:

Loading:

import csv

with open('data_more.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))

Parsing:

import numpy as np

def getTrainingData():
    label_data = []
    for i in range( 0 , len(data) - 1):
        y = list(data[i][1:41:1])
        y = list(map(lambda x: list(map(lambda z: int(z),x)),y))
        y = create2Darray(y)
        label_data.append(y)
    labelY = np.array(label_data,dtype=float)

The create2Darray function:

def create2Darray( arr ):
    final_arr = []
    index = 0
    while( index < len(arr)):
        temp = arr[index:index+4:1]
        final_arr.append(temp)
        index+=4
    return final_arr

This is a simple task, yet I keep receiving the error:

ValueError: setting an array element with a sequence.

I have read that it is related to situations where the elements do not all have the same shape. However, when I print the shape of all elements inside labelY, it outputs the same shape.

What is causing this problem, then? The problem occurs on this line:

labelY = np.array(label_data,dtype=float)

My CSV has the format:

number, number, number

Basically, N numbers per row separated by "," (example). Thanks for the help.

Let's start from the beginning:

  1. You seem to want to iterate through every line of your file to create an array. The iteration should be over range(0, len(data)), not range(0, len(data) - 1): the stop value of a range is exclusive, so you are currently skipping the last line. In fact, you can simply write range(len(data)), or, even more Pythonically, do

     for y in data:
         y = y[1:41]
  2. Based on what comes later, you want the 40 elements of y starting with the second element. In that case y[1:41] is correct (you don't need the trailing :1). If you didn't mean to skip the first element, use y[0:40], or more Pythonically y[:40]. Remember that indexing is zero-based and the stop index is exclusive.

  3. Each element of your y list is not a number. It is a string, which you read from a file. Normally, you could convert it to a list of numbers using

     y = [float(x) for x in y] 

    OR

     y = list(map(float, y)) 

    Your code is instead creating a nested list for each element, splitting it into its digits. Is this really what you intend? It certainly does not seem that way from the rest of the question.

  4. create2Darray seems to expect a list of 4n numbers and break it into a 2D list of size n-by-4. If you want to keep using pure Python at this point, you can shorten the code using range:

     def create2Darray(arr):
         return [arr[i:i + 4] for i in range(0, len(arr), 4)]
  5. The result of the 2D operation is appended to a 3D list with label_data.append(y). Currently, because of the digit splitting, label_data is a 4D list with a ragged 4th dimension. It is pretty inefficient to append to a list that way. You would do much better to move the statements in the body of your for loop into a small function and use that in a list comprehension.
  6. Finally, you convert your 4D list (which should probably be 3D) into a numpy array. This operation fails because your numbers don't all have the same number of digits, so the innermost lists have different lengths. Once you fix step 3, the error will go away. There still remains the question of why you want dtype=float when you explicitly converted everything to int, but that is for you to figure out.
  7. Don't forget to add a return value to getTrainingData!
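Putting steps 1 to 7 together, here is a minimal sketch of a corrected pipeline, not the asker's exact code. It assumes each row is a file name followed by at least 40 numeric fields (as implied by y[1:41]), passes data in as a parameter instead of reading a global, and uses a hypothetical helper named parse_row; the rows are made-up data. The first few lines reproduce the original failure on a made-up pair of values.

```python
import numpy as np

# Reproduce the original failure: converting "7" and "126" digit by digit
# gives [7] and [1, 2, 6], so the innermost lists have different lengths
ragged = [[list(map(int, x)) for x in ["7", "126"]]]
try:
    np.array(ragged, dtype=float)
except ValueError as err:
    print("ValueError:", err)


def create2Darray(arr):
    # Break a flat list of 4n items into n consecutive groups of 4
    return [arr[i:i + 4] for i in range(0, len(arr), 4)]


def parse_row(row):
    # Hypothetical helper: skip the first column, keep the next 40 fields,
    # and convert each field to one float (not to a list of digits)
    return create2Darray([float(x) for x in row[1:41]])


def getTrainingData(data):
    # Every row, including the last one, goes through parse_row
    label_data = [parse_row(row) for row in data]
    return np.array(label_data, dtype=float)


# Two made-up rows: a file name followed by 40 numbers of varying width
rows = [
    ["a.jpg"] + [str(n) for n in range(1, 41)],
    ["b.jpg"] + [str(n * 7) for n in range(1, 41)],
]
labelY = getTrainingData(rows)
print(labelY.shape)  # (2, 10, 4)
```

Because every field becomes exactly one float, each row yields a 10-by-4 list and NumPy can stack the rows into a regular 3D array.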

TL;DR

The simplest thing you can really do, though, is to do all the transformations after you convert the file to a 2D numpy array. Your program could be rewritten as

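import csv
import numpy as np

with open('data_more.csv', newline='') as file:
    reader = csv.reader(file)
    data = [[float(x) for x in line[1:]] for line in reader]  # skip the first column
data = np.array(data)
data = data.reshape(data.shape[0], -1, 4)  # group each row into blocks of 4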
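To try this idea without a file on disk, csv.reader can read from an io.StringIO object. The sketch below uses a nested comprehension, which builds one list of floats per CSV line. The two rows are made up and hold a file name plus 8 numbers each, so the reshape groups every row into 2 blocks of 4.

```python
import csv
import io

import numpy as np

# Made-up stand-in for data_more.csv: a file name followed by 8 numbers per row
csv_text = "a.jpg,1,2,3,4,5,6,7,8\nb.jpg,10,20,30,40,50,60,70,80\n"

reader = csv.reader(io.StringIO(csv_text))
# One inner list per line, skipping the file-name column
rows = [[float(x) for x in line[1:]] for line in reader]
data = np.array(rows)
# Each row of 8 numbers becomes 2 groups of 4
data = data.reshape(data.shape[0], -1, 4)
print(data.shape)  # (2, 2, 4)
```

If the number of values per row is not a multiple of 4, reshape raises a ValueError, which doubles as a quick sanity check on the file layout.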

With a copy-and-paste from your link:

In [367]: txt="""frame_video_02_0.jpg,126,37,147,112,100,41,126,116,79,34,96,92,
     ...: 68,31,77,88,1
     ...: """
In [368]: txt=txt.splitlines()
In [369]: data =np.genfromtxt(txt, delimiter=',')

data is a 2D array of floats:

In [370]: data.shape
Out[370]: (3, 401)
In [371]: data[0,:10]
Out[371]: array([ nan, 126.,  37., 147., 112., 100.,  41., 126., 116.,  79.])

The first column is nan, because it is text that can't be converted into a float. I could remove it with data = data[:, 1:].
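A minimal sketch of the same genfromtxt steps on made-up rows (a file name plus 8 numbers each, standing in for the real data): the name column is parsed as nan, slicing it off leaves only numbers, and reshape groups them in fours. Grouping by 4 is assumed from the question's create2Darray.

```python
import numpy as np

# Made-up rows in the same layout: file name, then numbers
txt = [
    "frame_a.jpg,1,2,3,4,5,6,7,8",
    "frame_b.jpg,9,10,11,12,13,14,15,16",
]
data = np.genfromtxt(txt, delimiter=",")
# The file-name column cannot be parsed as a float, so it becomes nan
print(np.isnan(data[:, 0]))
data = data[:, 1:]
# Group each row's numbers into blocks of 4
grouped = data.reshape(data.shape[0], -1, 4)
print(grouped.shape)  # (2, 2, 4)
```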

I can load the file names separately:

In [373]: labels = np.genfromtxt(txt, delimiter=',', usecols=[0],dtype=None,encoding=None)
In [374]: labels
Out[374]: 
array(['frame_video_02_0.jpg', 'frame_video_02_50.jpg',
       'frame_video_02_100.jpg'], dtype='<U22')
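Combining both calls, one possible sketch (again on made-up rows of a name plus 8 numbers) reads the names with usecols=[0] and the numbers with the remaining column indices, so the nan column never appears, then pairs each name with its grouped values. The column count n_cols = 9 is an assumption of this example layout, not of the real file.

```python
import numpy as np

# Made-up rows: file name, then 8 numbers
txt = [
    "frame_a.jpg,1,2,3,4,5,6,7,8",
    "frame_b.jpg,9,10,11,12,13,14,15,16",
]
n_cols = 9  # assumption: 1 name column + 8 numeric columns in this example

# File names only, kept as strings
labels = np.genfromtxt(txt, delimiter=",", usecols=[0], dtype=None, encoding=None)
# Numeric columns only, so no nan column is produced
numbers = np.genfromtxt(txt, delimiter=",", usecols=tuple(range(1, n_cols)))

grouped = numbers.reshape(len(labels), -1, 4)
for name, block in zip(labels, grouped):
    print(name, block.tolist())
```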

I haven't tried to debug your code, though with a file like this, reading the numbers into a Python list of lists shouldn't be hard.
