
How to assign the values of a pandas dataframe to a multidimensional numpy array

I have a pandas dataframe whose index holds the values 1, 2, 'NaN', 'NaN', 'NaN', 'NaN' and whose data are [10, 20, 30, 40, 50, 60]. I would like to build a numpy array of shape (3, 2) from it. The first entry along the first dimension should hold the first two values of the dataframe, the second entry should hold the 3rd and 4th values, and so on.

So the new array should look like this:

[image: enter image description here]

I tried the following code, but it does not work; it raises "KeyError: 0":

import pandas as pd
import numpy as np
d = {'col1': [1, 2, 'NaN', 'NaN', 'NaN', 'NaN'], 'col2': [10, 20, 30, 40, 50 , 60]}

df1 = pd.DataFrame(data=d)
df1 = df1.set_index('col1')

firstDimensionOfTheArray = 3
secondDimensionOfTheArray = 2

array = np.zeros((firstDimensionOfTheArray, secondDimensionOfTheArray))

for i in range(firstDimensionOfTheArray):
    for j in range(secondDimensionOfTheArray):
        array[i, j] = df1['col2'][i * secondDimensionOfTheArray + j]

Do you have any idea how to do that?

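Following the SO answer, you can fix your code by adding iloc. After set_index('col1'), df1['col2'][k] looks k up as an index label, and 0 is not one of the labels, hence the KeyError; iloc selects by position instead: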

array[i, j] = df1['col2'].iloc[i * secondDimensionOfTheArray + j]

Alternatively, use iat instead of iloc. Either way, your example gives:

[[10. 20.]
 [30. 40.]
 [50. 60.]]

You can then get the desired result by simply transposing the array:

array = array.T

output:

[[10. 30. 50.]
 [20. 40. 60.]]
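Put together, the steps above can be sketched as one self-contained script. It reuses the data and variable names from the question; the only change to the loop is the positional lookup with iloc, followed by the transpose.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, 'NaN', 'NaN', 'NaN', 'NaN'],
     'col2': [10, 20, 30, 40, 50, 60]}
df1 = pd.DataFrame(data=d).set_index('col1')

firstDimensionOfTheArray = 3
secondDimensionOfTheArray = 2

# np.zeros creates a float array, which is why the values print as 10., 20., ...
array = np.zeros((firstDimensionOfTheArray, secondDimensionOfTheArray))

for i in range(firstDimensionOfTheArray):
    for j in range(secondDimensionOfTheArray):
        # .iloc selects by position, so the labels in the index do not matter
        array[i, j] = df1['col2'].iloc[i * secondDimensionOfTheArray + j]

# Swap the two axes: shape (3, 2) becomes shape (2, 3)
transposed = array.T
print(transposed)
```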

To build the numpy array without a loop, use to_numpy and reshape:

df1['col2'].to_numpy().reshape((2,3), order='F')

output:

array([[10, 30, 50],
       [20, 40, 60]])
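As a rough illustration of what order='F' does here, the sketch below compares it with the default row-major reshape. The names values, row_major and col_major are only illustrative. Note that a plain reshape((3, 2)) gives the (3, 2) layout described in the question text, and its transpose matches the column-major (2, 3) result.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, 'NaN', 'NaN', 'NaN', 'NaN'],
     'col2': [10, 20, 30, 40, 50, 60]}
df1 = pd.DataFrame(data=d).set_index('col1')

values = df1['col2'].to_numpy()

# Default order='C' fills row by row: [[10, 20], [30, 40], [50, 60]]
row_major = values.reshape((3, 2))

# order='F' fills column by column: [[10, 30, 50], [20, 40, 60]]
col_major = values.reshape((2, 3), order='F')

# Transposing the row-major result gives the same array
same = np.array_equal(row_major.T, col_major)
print(row_major)
print(col_major)
print(same)
```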

To get a new dataframe instead, wrap the expression above in a DataFrame constructor:

import string
pd.DataFrame(df1['col2'].to_numpy().reshape((2,3), order='F'),
             # the two lines below are only needed if you want
             # the same indexes as in your image
             index=list(df1.index[:2]),
             columns=list(string.ascii_uppercase[:3])
            )

output:

    A   B   C
1  10  30  50
2  20  40  60
