
NumPy array with a list field to pandas DataFrame

How do I convert the following NumPy array to a pandas DataFrame? At present I get a "Data must be 1-dimensional" exception. I tried data.flatten(), but that doesn't help.

array([ (1329865020L, 67, [84, 89, 80, 69, 32, 104, 116, 109, 108, 62, 10, 60, 104, 116, 109, 108, 32, 108, 97, 110, 103, 61, 34, 101]),
   (171844206L, 32, [32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 60, 104, 101, 97, 100, 62, 10, 32]),
   (1008738336L, 109, [101, 116, 97, 32, 105, 100, 61, 34, 98, 98, 45, 98, 111, 111, 116, 115, 116, 114, 97, 112, 34, 32, 100, 97]),
   ...,
   (573317693L, 97, [112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 73, 68, 34, 58, 34, 49, 56, 52, 49, 50, 56, 52, 34, 44]),
   (1920099618L, 111, [114, 66, 101, 97, 99, 111, 110, 34, 58, 34, 98, 97, 109, 46, 110, 114, 45, 100, 97, 116, 97, 46, 110, 101]),
   (573317748L, 97, [112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 84, 105, 109, 101, 34, 58, 49, 54, 57, 125, 60, 47, 115, 99])], 
  dtype=[('ldr', '<u4'), ('ver', 'u1'), ('dat', 'u1', (24,))])

Thank you. Best regards

Pandas sees your third column, dat, as a two-dimensional array (each record holds 24 values), hence the exception: Exception: Data must be 1-dimensional.
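To confirm this, you can inspect the shape of each field. The sketch below rebuilds a two-row array with the same dtype (values copied from the question) and prints the shapes: the scalar fields ldr and ver are 1-D, while dat is 2-D. It also shows why data.flatten() does not help: the structured array is already 1-D (one record per element), so flattening leaves its shape unchanged.

```python
import numpy as np

# Same dtype as in the question: uint32, uint8, and a 24-element uint8 subarray
dt = [('ldr', '<u4'), ('ver', 'u1'), ('dat', 'u1', (24,))]
a = np.array([
    (1329865020, 67, [84, 89, 80, 69, 32, 104, 116, 109, 108, 62, 10, 60,
                      104, 116, 109, 108, 32, 108, 97, 110, 103, 61, 34, 101]),
    (171844206, 32, [32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32,
                     32, 10, 32, 32, 60, 104, 101, 97, 100, 62, 10, 32]),
], dtype=dt)

# Scalar fields are 1-D: one value per record
print(a['ldr'].shape, a['ver'].shape)   # (2,) (2,)
# The subarray field is 2-D: 24 values per record, so it cannot be a plain column
print(a['dat'].shape)                   # (2, 24)
# The array itself is already 1-D, so flatten() returns the same shape
print(a.shape, a.flatten().shape)       # (2,) (2,)
```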

You can make Pandas (and NumPy) treat that column differently by converting it to a Python list with tolist(), so that each row holds a single list object. Transforming your structured array into a Pandas DataFrame then becomes a simple two-step process:

import numpy as np
import pandas as pd

a = np.array([ (1329865020, 67, [84, 89, 80, 69, 32, 104, 116, 109, 108, 62, 10, 60, 104, 116, 109, 108, 32, 108, 97, 110, 103, 61, 34, 101]),
   (171844206, 32, [32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 60, 104, 101, 97, 100, 62, 10, 32]),
   (1008738336, 109, [101, 116, 97, 32, 105, 100, 61, 34, 98, 98, 45, 98, 111, 111, 116, 115, 116, 114, 97, 112, 34, 32, 100, 97]),
   (573317693, 97, [112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 73, 68, 34, 58, 34, 49, 56, 52, 49, 50, 56, 52, 34, 44]),
   (1920099618, 111, [114, 66, 101, 97, 99, 111, 110, 34, 58, 34, 98, 97, 109, 46, 110, 114, 45, 100, 97, 116, 97, 46, 110, 101]),
   (573317748, 97, [112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 84, 105, 109, 101, 34, 58, 49, 54, 57, 125, 60, 47, 115, 99])],
  dtype=[('ldr', '<u4'), ('ver', 'u1'), ('dat', 'u1', (24,))])

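# Step 1: build the DataFrame from the 1-D (scalar) fields only
df = pd.DataFrame.from_records(a[['ldr', 'ver']])
# Step 2: add the 24-element subarray field as one Python list per row
df['dat'] = a['dat'].tolist()
print(df)
print(df.dtypes)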

Output:

          ldr  ver                                                dat
0  1329865020   67  [84, 89, 80, 69, 32, 104, 116, 109, 108, 62, 1...
1   171844206   32  [32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 3...
2  1008738336  109  [101, 116, 97, 32, 105, 100, 61, 34, 98, 98, 4...
3   573317693   97  [112, 112, 108, 105, 99, 97, 116, 105, 111, 11...
4  1920099618  111  [114, 66, 101, 97, 99, 111, 110, 34, 58, 34, 9...
5   573317748   97  [112, 112, 108, 105, 99, 97, 116, 105, 111, 11...
ldr    uint32
ver     uint8
dat    object
dtype: object
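If your real data has more fields, the same two-step idea can be generalized. The function below, structured_to_dataframe, is a hypothetical helper (not part of pandas or NumPy): it walks the dtype's field names, passes 1-D fields through unchanged, and converts any multi-dimensional subarray field to one list per row with tolist(). It is a sketch assuming a 1-D structured array like the one in the question; the sample values are placeholders.

```python
import numpy as np
import pandas as pd


def structured_to_dataframe(arr: np.ndarray) -> pd.DataFrame:
    """Hypothetical helper: convert a 1-D structured array to a DataFrame.

    Scalar fields keep their NumPy dtype; subarray fields (ndim > 1 once
    selected) become object columns holding one Python list per row.
    """
    columns = {}
    for name in arr.dtype.names:
        field = arr[name]
        if field.ndim > 1:
            # e.g. 'dat' with shape (n, 24) -> n lists of 24 ints
            columns[name] = field.tolist()
        else:
            columns[name] = field
    return pd.DataFrame(columns)


dt = [('ldr', '<u4'), ('ver', 'u1'), ('dat', 'u1', (24,))]
a = np.zeros(3, dtype=dt)              # placeholder records, all zeros
a['ldr'] = [1329865020, 171844206, 1008738336]
a['ver'] = [67, 32, 109]
a['dat'][0] = np.arange(24)            # illustrative payload for row 0 only

df = structured_to_dataframe(a)
print(df.dtypes)

# Round trip: stack the per-row lists back into an (n, 24) uint8 array
dat = np.array(df['dat'].tolist(), dtype=np.uint8)
print(dat.shape)                       # (3, 24)
```

The round trip at the end shows that the object column can be stacked back into a regular uint8 array whenever you need vectorized NumPy operations on the payload again.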
