Assigning multi-dimensional Numpy Array to a Pandas Series

Question

Background

I have a numpy.ndarray of shape (95, 15) and a list of the desired Series index labels, with len(my_index) == 95. I want to create a Series in which each index label is associated with one row of the 95x15 array.

Variable Names

pfit : numpy.ndarray of shape (95, 15)
my_index : list of 95 str

Steps Taken

Passing the 2D array directly fails with the following error:

my_series = pd.Series(index=my_index, dtype="object", data=pfit)
Traceback (most recent call last):

  File "C:\Users\gford1\AppData\Local\Temp\1/ipykernel_22244/2329315457.py", line 1, in <module>
    my_series = pd.Series(index=my_index, dtype="object", data=pfit)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\series.py", line 439, in __init__
    data = sanitize_array(data, index, dtype, copy)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\construction.py", line 577, in sanitize_array
    subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)

  File "C:\Users\gford1\AppData\Local\Programs\Spyder\pkgs\pandas\core\construction.py", line 628, in _sanitize_ndim
    raise ValueError("Data must be 1-dimensional")

ValueError: Data must be 1-dimensional

I therefore have to iterate through my_index and add the pfit arrays, one-by-one:

my_series = pd.Series(index=my_index, dtype="object")
i = 0
for idx in my_series.index:
    my_series[idx] = pfit[i]
    i+=1

The second approach (the loop) works, but I believe there is a better or faster way that I am unaware of.

Answer 1

In [283]: pfit=np.arange(12).reshape(3,4)
In [284]: pfit
Out[284]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [285]: my_index=[1,2,3]

Your construct:

In [286]: my_series = pd.Series(index=my_index, dtype="object")
     ...: i = 0
     ...: for idx in my_series.index:
     ...:     my_series[idx] = pfit[i]
     ...:     i+=1
     ...: 
In [287]: my_series
Out[287]: 
1      [0, 1, 2, 3]
2      [4, 5, 6, 7]
3    [8, 9, 10, 11]
dtype: object
In [288]: my_series.values
Out[288]: 
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])],
      dtype=object)

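My suggestion produces the same thing. list(pfit) splits the 2D array into a list of 1D row arrays, which pandas accepts as 1-dimensional data: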

In [289]: list(pfit)
Out[289]: [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])]
In [290]: S = pd.Series(index=my_index, data=list(pfit))
In [291]: S
Out[291]: 
1      [0, 1, 2, 3]
2      [4, 5, 6, 7]
3    [8, 9, 10, 11]
dtype: object
In [292]: S.values
Out[292]: 
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8,  9, 10, 11])],
      dtype=object)
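
As a rough sketch of how this might look at the size described in the question, the script below builds a stand-in 95x15 array and wraps its rows in a Series. The array values and the label names (name_0, name_1, ...) are made up for illustration; only the shapes come from the question.

```python
import numpy as np
import pandas as pd

# Stand-in data with the shapes from the question (values are arbitrary).
pfit = np.arange(95 * 15, dtype=float).reshape(95, 15)
my_index = [f"name_{i}" for i in range(95)]  # hypothetical labels

# list(pfit) yields 95 one-dimensional arrays, so pandas sees 1-D data
# whose elements happen to be arrays.
my_series = pd.Series(list(pfit), index=my_index, dtype="object")

# Each label now maps to one 15-element row.
first_row = my_series["name_0"]
```

This replaces the explicit loop with a single constructor call; each element is still a separate array object, so the Series keeps dtype object.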

Recreating the 2D array from the Series:

In [293]: np.stack(S.values)
Out[293]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
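
A small check of that round trip, written as a standalone script. It assumes every element of the Series has the same length, since np.stack raises a ValueError when the arrays differ in shape.

```python
import numpy as np
import pandas as pd

pfit = np.arange(12).reshape(3, 4)
S = pd.Series(list(pfit), index=[1, 2, 3])

# Stack the per-row arrays back into a single 2-D array.
restored = np.stack(S.to_numpy())

# The round trip should reproduce the original shape and values.
same_shape = restored.shape == pfit.shape
same_values = np.array_equal(restored, pfit)
```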

Alternatively, a DataFrame accepts the 2D array directly:

In [294]: df = pd.DataFrame(index=my_index, data=pfit)
In [295]: df
Out[295]: 
   0  1   2   3
1  0  1   2   3
2  4  5   6   7
3  8  9  10  11
In [296]: df.values
Out[296]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
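
If the goal is still "one row per label", a DataFrame can serve that purpose too. The sketch below uses made-up string labels ("a", "b", "c") and pulls a single row back out as a 1D array; whether this fits better than an object Series depends on how the rows are used later.

```python
import numpy as np
import pandas as pd

pfit = np.arange(12).reshape(3, 4)
my_index = ["a", "b", "c"]  # hypothetical labels

df = pd.DataFrame(pfit, index=my_index)

# Look up one row by label and convert it to a 1-D NumPy array.
row_b = df.loc["b"].to_numpy()

# The whole table converts back to a 2-D array without stacking.
as_array = df.to_numpy()
```

Here the data stays in a single numeric block rather than as separate array objects, so no np.stack step is needed to get the 2D array back.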
