ValueError: setting an array element with a sequence. on data read from csv
I am trying to load data from a CSV file row by row, then create a 2D array out of each row and store it inside an array:
Loading:
with open('data_more.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))
Parsing:
def getTrainingData():
    label_data = []
    for i in range( 0 , len(data) - 1):
        y = list(data[i][1:41:1])
        y = list(map(lambda x: list(map(lambda z: int(z),x)),y))
        y = create2Darray(y)
        label_data.append(y)
    labelY = np.array(label_data,dtype=float)
The create2Darray function:
def create2Darray( arr ):
    final_arr = []
    index = 0
    while( index < len(arr)):
        temp = arr[index:index+4:1]
        final_arr.append(temp)
        index+=4
    return final_arr
This is a simple task, yet I keep receiving this error:
ValueError: setting an array element with a sequence.
I have read that it is related to the situation where the elements do not all have the same shape. However, when I print the shape of every element inside labelY, they all have the same shape.
What is causing this problem, then? The problem occurs on this line:
labelY = np.array(label_data,dtype=float)
My CSV has the format
number, number, number
Basically, each row is N numbers separated by ",". An example file was linked in the original post. Thanks for the help.
Let's start from the beginning:
You seem to want to iterate through every line of your file to create an array. The iteration should be over range(0, len(data)), not range(0, len(data) - 1): the last element of the range is exclusive, so you are currently skipping the last line. In fact, you can simply write range(len(data)) or, what is even more Pythonic, do
for y in data:
    y = y[1:41]
Based on what comes later, you want the 40 elements of y starting with the second element. In that case y[1:41] is correct (you don't need the trailing :1). If you didn't mean to skip the first element, use y[0:40], or more Pythonically y[:40]. Remember that the indexing is zero-based and the stop index is exclusive.
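A quick check of the exclusive-stop rule for both range and slices. The five-row list below is a made-up stand-in for the parsed CSV, not the asker's data:

```python
# Hypothetical stand-in for the parsed CSV: 5 rows, each a label plus 8 numbers
rows = [["name%d" % i] + [str(n) for n in range(8)] for i in range(5)]

# range() excludes its stop value, so len(rows) - 1 drops the last row
visited_short = list(range(0, len(rows) - 1))
visited_full = list(range(len(rows)))

# Slices exclude their stop index too: [1:5] yields indices 1, 2, 3, 4
first_row_slice = rows[0][1:5]

print(visited_short)    # [0, 1, 2, 3]
print(visited_full)     # [0, 1, 2, 3, 4]
print(first_row_slice)  # ['0', '1', '2', '3']
```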
Each element of your y list is not a number. It is a string, which you read from a file. Normally, you could convert it to a list of numbers using
y = [float(x) for x in y]
OR
y = list(map(float, y))
Your code is instead creating a nested list for each element, splitting it by its digits. Is this really what you intend? It certainly does not seem that way from the rest of the question.
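To see the digit splitting in isolation, here is a small sketch using three values from the sample row. The three-element list is only an illustration; the conversion line is copied from the question:

```python
import numpy as np

# Three strings as csv.reader would return them
y = ["126", "37", "147"]

# The question's conversion: the inner map iterates over the characters of each string
split = list(map(lambda x: list(map(lambda z: int(z), x)), y))
print(split)  # [[1, 2, 6], [3, 7], [1, 4, 7]]

# Sub-lists of different lengths cannot form a rectangular float array
try:
    np.array(split, dtype=float)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# Converting each string as a whole gives a flat list of numbers
flat = [float(x) for x in y]
print(flat)  # [126.0, 37.0, 147.0]
```

The exact wording of the ValueError message depends on the NumPy version, so it is printed rather than compared.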
create2Darray seems to expect a list of 4n numbers, and break it into a 2D list of size n-by-4. If you want to keep using pure Python at this point, you can shorten the code using range:
def create2Darray(arr):
    return [arr[i:i + 4] for i in range(0, len(arr), 4)]
The result is appended to a 3D list by label_data.append(y). Currently, because of the digit splitting, label_data is a 4D list with a ragged 4th dimension. Numbers with different digit counts become sub-lists of different lengths, which is why np.array fails on it. It is pretty inefficient to append to a list that way. You would do much better to put the statements in the body of your for loop into a small function, and use that in a list comprehension.
I am not sure why you want dtype=float when you explicitly converted everything to an int, but that is for you to figure out.
Don't forget to add a return value to getTrainingData!
TL;DR
The simplest thing you can really do, though, is to do all the transformations after you convert the file to a 2D numpy array. Your program could be rewritten as
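import csv
import numpy as np
with open('data_more.csv', newline='') as file:
    # one list of floats per row, skipping the first column
    data = np.array([[float(x) for x in line[1:]] for line in csv.reader(file)])
# -1 lets numpy infer n; each row needs a multiple of 4 numbers
data = data.reshape(data.shape[0], -1, 4)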
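For comparison, the pure-Python suggestions above can be combined roughly as follows. This is a sketch under assumptions, not the asker's final code: parse_row is a hypothetical helper name, the in-memory CSV text stands in for data_more.csv, and float conversion is assumed to be what is wanted.

```python
import csv
import io

import numpy as np


def create2Darray(arr):
    # Break a flat list into consecutive chunks of 4
    return [arr[i:i + 4] for i in range(0, len(arr), 4)]


def parse_row(row):
    # Hypothetical helper: skip the first column, convert the rest, chunk by 4
    return create2Darray([float(x) for x in row[1:]])


def getTrainingData(data):
    # One list comprehension replaces the append loop; the result is returned
    return np.array([parse_row(row) for row in data], dtype=float)


# Made-up two-row file: a name followed by 8 numbers per row
sample = "a.jpg,1,2,3,4,5,6,7,8\nb.jpg,10,20,30,40,50,60,70,80\n"
data = list(csv.reader(io.StringIO(sample)))
labelY = getTrainingData(data)
print(labelY.shape)  # (2, 2, 4)
```

Passing data in as an argument, rather than reading a global, is a design choice of this sketch that makes the function easier to test.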
With a copy-n-paste from your link:
In [367]: txt="""frame_video_02_0.jpg,126,37,147,112,100,41,126,116,79,34,96,92,
...: 68,31,77,88,1
...: """
In [368]: txt=txt.splitlines()
In [369]: data =np.genfromtxt(txt, delimiter=',')
data is a 2d array of floats:
In [370]: data.shape
Out[370]: (3, 401)
In [371]: data[0,:10]
Out[371]: array([ nan, 126., 37., 147., 112., 100., 41., 126., 116., 79.])
The first column is nan, because it is text that can't be made into a float. I could remove it with data = data[:, 1:]
I can load the file names separately:
In [373]: labels = np.genfromtxt(txt, delimiter=',', usecols=[0],dtype=None,encoding=None)
In [374]: labels
Out[374]:
array(['frame_video_02_0.jpg', 'frame_video_02_50.jpg',
'frame_video_02_100.jpg'], dtype='<U22')
I haven't tried to debug your code, though with a file like this, reading the numbers into a Python list of lists shouldn't be hard.
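The genfromtxt calls above could be put together roughly like this. It is only a sketch: the two-row sample is made up, and grouping the numbers in fours follows the question's create2Darray rather than anything tested on the real file.

```python
import io

import numpy as np

# Made-up two-row sample: a file name followed by 8 numbers
sample = "a.jpg,1,2,3,4,5,6,7,8\nb.jpg,10,20,30,40,50,60,70,80\n"

# Numeric columns only: usecols skips the text column, so no nan appears
values = np.genfromtxt(io.StringIO(sample), delimiter=",", usecols=range(1, 9))

# File names from column 0, read as strings
names = np.genfromtxt(io.StringIO(sample), delimiter=",", usecols=[0],
                      dtype=None, encoding=None)

# Group every 4 consecutive numbers; -1 lets NumPy infer the middle size
boxes = values.reshape(values.shape[0], -1, 4)
print(boxes.shape)  # (2, 2, 4)
print(names)
```

With the real file, the usecols range would need to match the actual number of numeric columns.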