Python numpy: array of arrays

Question

I'm trying to build a numpy array of arrays of arrays with the code below.

It gives me this error:

ValueError: setting an array element with a sequence.

My guess is that in numpy I need to declare the arrays as multi-dimensional from the beginning, but I'm not sure.

How can I fix the code below so that I can build an array of arrays of arrays?

from PIL import Image
import pickle
import os
import numpy

indir1 = 'PositiveResize'

trainimage = numpy.empty(2)
trainpixels = numpy.empty(80000)
trainlabels = numpy.empty(80000)
validimage = numpy.empty(2)
validpixels = numpy.empty(10000)
validlabels = numpy.empty(10000)
testimage = numpy.empty(2)
testpixels = numpy.empty(10408)
testlabels = numpy.empty(10408)

i=0
tr=0
va=0
te=0
for (root, dirs, filenames) in os.walk(indir1):
    print 'hello'
    for f in filenames:
            try:
                    im = Image.open(os.path.join(root,f))
                    Imv=im.load()
                    x,y=im.size
                    pixelv = numpy.empty(6400)
                    ind=0
                    for i in range(x):
                            for j in range(y):
                                    temp=float(Imv[j,i])
                                    temp=float(temp/255.0)
                                    pixelv[ind]=temp
                                    ind+=1
                    if i<40000:
                            trainpixels[tr]=pixelv
                            tr+=1
                    elif i<45000:
                            validpixels[va]=pixelv
                            va+=1
                    else:
                            testpixels[te]=pixelv
                            te+=1
                    print str(i)+'\t'+str(f)
                    i+=1
            except IOError:
                    continue

trainimage[0]=trainpixels
trainimage[1]=trainlabels
validimage[0]=validpixels
validimage[1]=validlabels
testimage[0]=testpixels
testimage[1]=testlabels

Answer 1

Refer to the examples in the numpy.empty documentation:

>>> np.empty([2, 2])
array([[ -9.74499359e+001,   6.69583040e-309],
       [  2.13182611e-314,   3.06959433e-309]])         #random

Allocate your image arrays with an N-dimensional shape:

testpixels = numpy.empty([96, 96])
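
To see why the shape matters, here is a minimal sketch (Python 3, with small made-up sizes instead of the question's 80000 images) that reproduces the error with a 1-D array and then avoids it with a 2-D one. The names n_images and n_pixels are illustrative and do not appear in the question.

```python
import numpy as np

n_images, n_pixels = 4, 6400   # illustrative sizes (assumed); 6400 = 80 * 80

pixelv = np.zeros(n_pixels)    # stands in for one flattened image

# 1-D array: each slot holds a single float, so assigning a whole vector fails.
flat = np.empty(n_images)
try:
    flat[0] = pixelv
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# 2-D array: each row has room for a full pixel vector.
trainpixels = np.empty((n_images, n_pixels))
trainpixels[0] = pixelv
```

With the second dimension declared up front, the row assignment copies the vector into place instead of trying to squeeze it into one element.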

Answer 2

Don't try to smash your entire object into a single numpy array. If you have distinct things, use a numpy array for each one, then use an appropriate data structure to hold them together.

For instance, if you want to do computations across images, then you probably want to store the pixels and labels in separate arrays.

trainpixels = np.empty([10000, 80, 80])
trainlabels = np.empty(10000)
for i in range(10000):
    trainpixels[i] = ...
    trainlabels[i] = ...
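
As a rough illustration of what could go in place of the `...` above, the sketch below fills the arrays from 8-bit grayscale data. It assumes Python 3 and uses a synthetic uint8 array as a stand-in for a loaded image; with Pillow, np.asarray(im) on an 80x80 grayscale image should give an array of the same kind. The small image count and the constant label are placeholders.

```python
import numpy as np

n_images = 3                                    # small illustrative count (assumed)
trainpixels = np.empty((n_images, 80, 80))
trainlabels = np.empty(n_images)

for i in range(n_images):
    # Stand-in for np.asarray(Image.open(path)) on an 80x80 8-bit grayscale file.
    raw = np.full((80, 80), 255, dtype=np.uint8)
    trainpixels[i] = raw.astype(float) / 255.0  # scale to [0, 1] without a pixel loop
    trainlabels[i] = 1.0                        # placeholder label
```

Converting the whole image at once also sidesteps the nested per-pixel loop from the question.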

To access an individual image's data:

imagepixels = trainpixels[253]
imagelabel = trainlabels[253]

And you can easily compute things like summary statistics over the images.

meanimage = np.mean(trainpixels, axis=0)
meanlabel = np.mean(trainlabels)
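
The axis argument decides what gets averaged away. A small sketch with two tiny 3x3 arrays standing in for images (the sizes and the name brightness are made up for illustration):

```python
import numpy as np

# Two tiny 3x3 "images" holding the values 0..8 and 9..17.
trainpixels = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)

meanimage = trainpixels.mean(axis=0)         # average over images -> shape (3, 3)
brightness = trainpixels.mean(axis=(1, 2))   # average within each image -> shape (2,)
```

Here axis=0 collapses the image index and leaves one averaged image, while axis=(1, 2) collapses the pixel grid and leaves one number per image.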

If you really want all the data to be in the same object, you should probably use a struct array, as Eelco Hoogendoorn suggests. Some example usage:

# Construction and assignment
# np.int was removed in NumPy 1.24; the builtin int behaves the same here
trainimages = np.empty(10000, dtype=[('label', int), ('pixel', int, (80,80))])
for i in range(10000):
    trainimages['label'][i] = ...
    trainimages['pixel'][i] = ...

# Summary statistics
meanimage = np.mean(trainimages['pixel'], axis=0)
meanlabel = np.mean(trainimages['label'])

# Accessing a single image
image = trainimages[253]
imagepixels, imagelabel = trainimages[['pixel', 'label']][253]
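
A small runnable variant of the struct array, as a sketch only. It assumes Python 3, keeps the field names from the answer, and switches the pixel field to float (an assumption, to match the question's values scaled to [0, 1]). The fill values are placeholders.

```python
import numpy as np

# Field names follow the answer; float pixels are an assumption.
image_dtype = np.dtype([('label', np.int64), ('pixel', np.float64, (80, 80))])
trainimages = np.zeros(4, dtype=image_dtype)

for i in range(len(trainimages)):
    trainimages['label'][i] = i
    trainimages['pixel'][i] = np.full((80, 80), i / 10.0)  # placeholder pixels

record = trainimages[2]            # one record carrying both fields
imagelabel = record['label']
imagepixels = record['pixel']
meanlabel = trainimages['label'].mean()
```

Indexing by field name gives an ordinary array over all records, so trainimages['pixel'] has shape (4, 80, 80) and works with the same reductions as a plain array.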

Alternatively, if you want to process each image separately, you could store each image's data in separate arrays, bind them together in a tuple or dictionary, and then store all of those in a list.

trainimages = []
for i in range(10000):
    pixels = ...
    label = ...
    image = (pixels, label)
    trainimages.append(image)

Now, to access a single image's data:

imagepixels, imagelabel = trainimages[253]

This makes it more intuitive to access a single image, but because the data is not all in one big numpy array, you don't get easy access to functions that work across images.
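
If cross-image functions are needed only occasionally, one possible workaround is to stack the per-image arrays on demand. This is a sketch under the assumption that every image has the same shape; allpixels and alllabels are names made up for the example, and the pixel data and labels are placeholders.

```python
import numpy as np

trainimages = []
for i in range(3):
    pixels = np.full((80, 80), float(i))   # placeholder pixel data
    label = i                              # placeholder label
    trainimages.append((pixels, label))

# Stack the per-image arrays into one (3, 80, 80) array when needed.
allpixels = np.stack([pixels for pixels, _ in trainimages])
alllabels = np.array([label for _, label in trainimages])
meanimage = allpixels.mean(axis=0)
```

Note that np.stack copies the data, so for large image sets the preallocated arrays shown earlier are likely the cheaper choice.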

Answer 1
1 2014-08-05 15:49:27

Answer 2
1 (accepted) 2014-08-05 16:30:43
