How do I store a numpy array as an object in a pandas dataframe?

I have a series of images stored in a CSV file as one string per image; each string is a list of 9216 space-separated integers. I have a function that converts this string to a 96x96 NumPy array.

I wish to store this numpy array in a column of my dataframe instead of the string.

However, when I retrieve the item from the column, it is no longer usable as a NumPy array.

The data can be downloaded from here; the images are in the last column of the training.csv file.

https://www.kaggle.com/c/facial-keypoints-detection/data

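import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

df_train = pd.read_csv("training.csv")

def convert_to_np_arr(im_as_str):
    # Parse the space-separated pixel string into a 96x96 integer array.
    im = [int(i) for i in im_as_str.split()]
    im = np.asarray(im)
    im = im.reshape((96, 96))
    return im

# Store one array per row in a new (object-dtype) column.
df_train['Im_as_np'] = df_train.Image.apply(convert_to_np_arr)

im = df_train.Im_as_np[0]
plt.imshow(im, cmap=cm.Greys_r)
plt.show()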

If, instead of applying the function and storing the result, I run the same code directly, it works as expected:

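import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

df_train = pd.read_csv("training.csv")

# Convert only the first image, without going through apply().
im = df_train.Image[0]
im = [int(i) for i in im.split()]
im = np.asarray(im)
im = im.reshape((96, 96))

plt.imshow(im, cmap=cm.Greys_r)
plt.show()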

Pandas is generally not a suitable data structure for handling images. The usual assumption with pandas is that the number of columns is much smaller than the number of rows. This of course doesn't need to be true, and for DataFrames that are small in both dimensions it rarely matters. But for mathematical operations that are naturally spatial, the relational structure of the DataFrame is not appropriate, and this shows as the number of columns grows. Given this, I would suggest just using NumPy's CSV-reading abilities and working with the data as a 2D array or an image object, e.g. with scikit-image (formerly scikits.image).
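
As a rough illustration of keeping the images in NumPy rather than in DataFrame cells, the sketch below parses pixel strings into a single 3D array of shape (n_images, 96, 96). The helper name images_to_array and the two fake images are assumptions made for the example; in practice the strings would come from the Image column of training.csv, however that file is read.

```python
import numpy as np

def images_to_array(image_strings, side=96):
    """Parse space-separated pixel strings into an (n, side, side) uint8 array.

    Hypothetical helper: assumes every string holds side*side integers in 0..255.
    """
    rows = [
        np.array([int(p) for p in s.split()], dtype=np.uint8)
        for s in image_strings
    ]
    # Stack the flat rows into (n, side*side), then give each row its 2D shape.
    return np.stack(rows).reshape(-1, side, side)

# Two fake images (all black, all white) standing in for the real column.
fake_strings = [" ".join(["0"] * 96 * 96), " ".join(["255"] * 96 * 96)]
images = images_to_array(fake_strings)
print(images.shape, images.dtype)
```

Each image is then simply images[i], a regular 96x96 array that can be passed to plt.imshow or to an image library without any pandas indexing in between.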

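The way you store it should be correct; the data is just harder to access. Instead of im = df_train.Im_as_np[0], select the cell by row and column label. The original form of this answer used ix, which was removed in pandas 1.0; loc performs the same label-based lookup in current versions:

im = df_train.loc[0, 'Im_as_np']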
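
To check that an array stored this way comes back intact, here is a small self-contained sketch. It assumes a recent pandas version and uses 2x2 toy "images" in place of the 96x96 ones; the inline DataFrame stands in for training.csv.

```python
import numpy as np
import pandas as pd

# Toy stand-in for training.csv: each image is 4 pixels instead of 9216.
df = pd.DataFrame({"Image": ["0 1 2 3", "4 5 6 7"]})

def convert_to_np_arr(im_as_str, shape=(2, 2)):
    # Same conversion as in the question, with a configurable shape.
    return np.asarray([int(i) for i in im_as_str.split()]).reshape(shape)

# apply() yields an object-dtype column whose cells are ndarrays.
df["Im_as_np"] = df["Image"].apply(convert_to_np_arr)

# Label-based lookup of a single cell returns the stored array itself.
im = df.loc[0, "Im_as_np"]
print(type(im), im.shape)
```

The column has dtype object, so pandas treats each array as an opaque Python object; vectorised pandas operations will not reach inside the images.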
