2D array to rows in a dataframe column

Question

I have a numpy.ndarray as given below:

import numpy as np
import pandas as pd

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
labels = [1, 0]
df = pd.DataFrame({"a": x, "labels": labels})  # raises: Data must be 1-dimensional

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-458-79198b72cdcb> in <module>()
      1 x = np.array([[1, 2, 3], [4, 5, 6]], np.int32).reshape(-1,1)
      2 labels = [1,0,1,0]
----> 3 df = pd.DataFrame({"a":x,"labels":labels})

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional

I tried to reshape the np.ndarray with x.reshape(-1,1), but I get the same error. Each of the inner lists in the ndarray x should become one row of the dataframe. I'm expecting to get:

           a  labels
0  [1, 2, 3]       1
1  [4, 5, 6]       0

Answer 1

The problem is that x is a multidimensional, homogeneous array, and pandas doesn't know how to split it into the rows of a single column. In general, pandas does not support embedded structures. Consider a higher-dimensional array, say one of shape (3,4,2): how should that be dealt with?
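To make that ambiguity concrete, here is a minimal sketch. The array `arr` and the names `rows` and `s` are made up for illustration. Iterating over an ndarray only walks its first axis, so choosing what counts as a "row" is left to you:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D array of shape (3, 4, 2)
arr = np.arange(24).reshape(3, 4, 2)

# Iterating splits along the first axis only: 3 blocks of shape (4, 2)
rows = list(arr)
print(len(rows), rows[0].shape)

# A list of sub-arrays is accepted and stored in an object column
s = pd.Series(rows)
print(s.dtype, len(s))
```

Splitting along another axis would be an equally valid reading of the data, which is one way to see why pandas does not guess.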

Note that the dataframe columns are created through separate calls to the pd.Series constructor. Constructing a series directly from the ndarray raises the same explicit error:

pd.Series(x)
    ...
    480     elif subarr.ndim > 1:
    481         if isinstance(data, np.ndarray):
--> 482             raise Exception("Data must be 1-dimensional")
    483         else:
    484             subarr = com.asarray_tuplesafe(data, dtype=dtype)
Exception: Data must be 1-dimensional
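This also explains why x.reshape(-1,1) does not help: the result has shape (6, 1), which is still two-dimensional. A small sketch to check this follows. The helper name `try_build` is hypothetical. The exception class and message vary between pandas versions (newer releases raise a ValueError), so the code only catches the base Exception:

```python
import numpy as np
import pandas as pd

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)

def try_build(data):
    """Hypothetical helper: return True if pd.Series accepts `data`."""
    try:
        pd.Series(data)
        return True
    except Exception:  # plain Exception in older pandas, ValueError in newer
        return False

# reshape(-1, 1) gives shape (6, 1): still 2-D, so it is rejected as well
reshaped = x.reshape(-1, 1)
print(x.ndim, reshaped.ndim)

print(try_build(x))          # 2-D input
print(try_build(reshaped))   # 2-D input with a single column
print(try_build(x.ravel()))  # flattened, 1-D input
```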

So you have to turn the 2D array into a list whose elements are its rows; each element then becomes one row of the dataframe. For that, you can unpack the numpy array along its first axis (note that each element stays a 1D array rather than becoming a Python list):

df = pd.DataFrame({"a":[*x], "labels":labels}) # or .."a":list(x)..

print(df)
           a  labels
0  [1, 2, 3]       1
1  [4, 5, 6]       0
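If you need plain Python lists in the column rather than 1D arrays, x.tolist() is an alternative. The sketch below compares the two; the names `df_arrays` and `df_lists` are only illustrative:

```python
import numpy as np
import pandas as pd

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
labels = [1, 0]

# list(x) iterates over the first axis: each cell holds a 1-D ndarray
df_arrays = pd.DataFrame({"a": list(x), "labels": labels})

# x.tolist() builds nested Python lists: each cell holds a list
df_lists = pd.DataFrame({"a": x.tolist(), "labels": labels})

print(type(df_arrays["a"].iloc[0]))
print(type(df_lists["a"].iloc[0]))
```

Both frames print the same way, but the cell type matters later: for example, comparing a cell with == yields an element-wise array in the first case and a single bool in the second.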
