
Multidimensional Numpy array to Dataframe, Error: raise ValueError("Data must be 1-dimensional") ValueError: Data must be 1-dimensional

I want to train a neural network, and I have both the labels (one-hot encoded) and the images as numpy arrays. I want to add them to a DataFrame to use them as input for the training. I tried to recreate the problem in a small example, which looks something like this:

import pandas as pd
import numpy as np

label_onehot_example = np.asarray([[0, 0, 0, 0, 1], [0, 0, 0, 1, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0.], [1, 0, 0, 0, 0]])
images_example = np.random.randint(0, 1, (8, 10, 10, 3))

test_df = pd.DataFrame(data={'images': images_example, 'labels' : label_onehot_example})

The error I get is "raise ValueError("Data must be 1-dimensional") ValueError: Data must be 1-dimensional".

I guess it is due to the shape of the image input (in my example that's (8, 10, 10, 3)), but I don't know how to fix it. I thought of looping through the image array and adding the images and labels one by one to the DataFrame, but that seems very inefficient.

The values in your dictionary should be lists in this case, since (I think) pandas expects each column to be a one-dimensional iterable of values. You can use the built-in 'list()' constructor to change the BWHC numpy array into a list of B elements, each of shape WHC (the same goes for the labels).

import pandas as pd
import numpy as np

label_onehot_example = np.asarray([[0, 0, 0, 0, 1], [0, 0, 0, 1, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0.], [1, 0, 0, 0, 0]])
images_example = np.random.randint(0, 1, (8, 10, 10, 3))

test_df = pd.DataFrame(data={'images': list(images_example), 'labels' : list(label_onehot_example)})

print(test_df.head())
>>>                                               images                     labels
0  [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ...  [0.0, 0.0, 0.0, 0.0, 1.0]
1  [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ...  [0.0, 0.0, 0.0, 1.0, 0.0]
2  [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ...  [1.0, 0.0, 0.0, 0.0, 0.0]
3  [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ...  [1.0, 0.0, 0.0, 0.0, 0.0]
4  [[[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], ...  [1.0, 0.0, 0.0, 0.0, 0.0]

print(test_df.images[0].shape)
>>> (10, 10, 3)
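
Since the goal is to feed the data to a network, it can help to check how the columns are stored and how to get batched arrays back out. The following is only a minimal sketch under the same shapes as the example above. The helper name column_to_array is hypothetical and not part of pandas, and the labels are built with np.eye purely for brevity.

```python
import numpy as np
import pandas as pd

# Same shapes as the example: 8 images of shape (10, 10, 3), 8 one-hot labels of length 5.
images = np.zeros((8, 10, 10, 3), dtype=np.int64)
labels = np.eye(5)[[4, 3, 0, 0, 0, 1, 0, 0]]

# list() splits each array along its first axis into 8 separate arrays.
df = pd.DataFrame({"images": list(images), "labels": list(labels)})

# Each cell holds a whole array, so both columns have dtype "object".
print(df.dtypes)

# Hypothetical helper (not a pandas function): rebuild one batched array from a column.
def column_to_array(frame, name):
    return np.stack(frame[name].tolist())

x = column_to_array(df, "images")
y = column_to_array(df, "labels")
print(x.shape, y.shape)  # (8, 10, 10, 3) (8, 5)
```

Because the columns are of object dtype, pandas treats each image as an opaque Python object, so vectorised column operations will not reach inside the arrays. Stacking the column back into a single array, as sketched here, is one way to recover the original batch before training.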

PS: Which version of pandas did you use? When I first ran your code I got a different error than the one you reported (mine was "ValueError: If using all scalar values, you must pass an index"). I used pandas 1.1.3.
