
numpy array to pandas dataframe conversion - ValueError

I have the following numpy array called 'data' -

array([['ksr-usconeng101', 'C', '632.3', '1'],
       ['ksr-usconeng101', 'D', '242.9', '2'],
       ['ksr-usconeng158', 'C', '1044.5', '3'],
       ['ksr-usconeng158', 'D', '2771.2', '4'],
       ['ksr-usconeng158', 'G', '7.3', '5'],
       ['ksr-usconeng163', 'C', '1597.0', '6'],
       ['ksr-usconeng163', 'D', '1676.3', '7'],
       ['server', 'drive', 'size', '']],
      dtype='<U15')

I'm trying to convert it to a dataframe -

pd.DataFrame(data=data[0:-1,0:3],
                   index = data[0:-1,-1],
                   columns = data[-1:, 0:-1])

Data -

data[0:-1,0:3]
Out[145]: 
array([['ksr-usconeng101', 'C', '632.3'],
       ['ksr-usconeng101', 'D', '242.9'],
       ['ksr-usconeng158', 'C', '1044.5'],
       ['ksr-usconeng158', 'D', '2771.2'],
       ['ksr-usconeng158', 'G', '7.3'],
       ['ksr-usconeng163', 'C', '1597.0'],
       ['ksr-usconeng163', 'D', '1676.3']],
      dtype='<U15')

Index -

data[0:-1,-1]
Out[146]: 
array(['1', '2', '3', '4', '5', '6', '7'],
      dtype='<U15')

Columns -

data[-1:, 0:-1]
Out[147]: 
array([['server', 'drive', 'size']],
      dtype='<U15')

However, Python doesn't agree and responds with -

ValueError: Shape of passed values is (3, 7), indices imply (1, 7)

What am I missing?

The columns argument needs to be 1-D. Slicing the last row with -1: keeps it 2-D, so index it with a plain -1 instead:

df = pd.DataFrame(data=data[:-1,:3],
                  index=data[:-1,-1],
                  columns=data[-1, :-1])
print(df)

Output:

            server drive    size
1  ksr-usconeng101     C   632.3
2  ksr-usconeng101     D   242.9
3  ksr-usconeng158     C  1044.5
4  ksr-usconeng158     D  2771.2
5  ksr-usconeng158     G     7.3
6  ksr-usconeng163     C  1597.0
7  ksr-usconeng163     D  1676.3

You have:

>>> data[-1:, 0:-1].shape
(1, 3)

But need:

>>> data[-1, :-1].shape
(3,)
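
A small self-contained sketch of the same point, using a shortened copy of the array from the question (the names small, header_2d and header_1d are made up for illustration). A slice such as -1: keeps the row axis, while an integer index such as -1 drops it:

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the array in the question (illustrative only).
small = np.array([['ksr-usconeng101', 'C', '632.3', '1'],
                  ['ksr-usconeng101', 'D', '242.9', '2'],
                  ['server', 'drive', 'size', '']])

header_2d = small[-1:, :-1]   # slice on the row axis: result stays 2-D
header_1d = small[-1, :-1]    # integer on the row axis: result is 1-D

print(header_2d.shape)  # (1, 3)
print(header_1d.shape)  # (3,)

# The 1-D header matches the three data columns.
df = pd.DataFrame(data=small[:-1, :3],
                  index=small[:-1, -1],
                  columns=header_1d)
print(df)
```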

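Try converting the header row to a list. Index the last row with -1 rather than -1: so that tolist() returns a flat list instead of a nested one:

pd.DataFrame(data=data[0:-1,0:3],
             index=data[0:-1,-1],
             columns=data[-1, 0:-1].tolist())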
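
Alternatively, take the seven data rows and name the columns yourself:

import numpy as np
import pandas as pd

# data[0:7, 0:3] already has shape (7, 3), so no flatten/reshape is needed
df = pd.DataFrame(data[0:7, 0:3],
                  columns=["a", "b", "c"])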

                 a  b       c
0  ksr-usconeng101  C   632.3
1  ksr-usconeng101  D   242.9
2  ksr-usconeng158  C  1044.5
3  ksr-usconeng158  D  2771.2
4  ksr-usconeng158  G     7.3
5  ksr-usconeng163  C  1597.0
6  ksr-usconeng163  D  1676.3
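
One thing to keep in mind with any of the frames above: the source array has dtype '<U15', so every column holds strings, including the sizes. If arithmetic on that column is planned, a conversion step along these lines may help. This is only a sketch on a shortened copy of the array, and the name small is illustrative:

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the array in the question (illustrative only).
small = np.array([['ksr-usconeng101', 'C', '632.3', '1'],
                  ['ksr-usconeng101', 'D', '242.9', '2'],
                  ['server', 'drive', 'size', '']])

df = pd.DataFrame(data=small[:-1, :3],
                  index=small[:-1, -1],
                  columns=small[-1, :-1])

# The values arrive as strings; convert the size column to floats.
df['size'] = df['size'].astype(float)

print(df.dtypes)
print(df['size'].sum())
```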

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 