简体   繁体   中英

Create pd.DataFrame from dictionary with multi-dimensional array

I've the following dictionary:

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4],],
         'C': [[4, 6, 0]]
        }

I want to convert it to a pd.DataFrame() , expecting this:

id       ColA        ColB        ColC
0         1           4           4
1         2           4           6
2         3           4           0
3         1           4           
4         2           4
5         3           4
6         1
7         2
8         3

How can I do that? I'm trying

pd.DataFrame(dictAll.items(), columns=['ColA', 'ColB', 'ColC'])

But it obviously doesn't work!

Here is how:

import pandas as pd
import numpy as np

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4],],
         'C': [[4, 6, 0]]}

df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
print(df)

Output:

   ColA ColB ColC
0     1    4    4
1     2    4    6
2     3    4    0
3     1    4     
4     2    4     
5     3    4     
6     1          
7     2          
8     3  

Now, let's have a look at the problem one step at a time.

  1. The first thing we might try is simply:

     df = pd.DataFrame(dictA) print(df)

    Which, of course, return this error:

     ValueError: arrays must all be same length
  2. So now we need a way to be able to create dataframes from a dict with arrays of different lengths. For that, we can:

     df = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in dictA.items()])) print(df)

    Output:

     ABC 0 [1, 2, 3] [4, 4, 4] [4, 6, 0] 1 [1, 2, 3] [4, 4, 4] NaN 2 [1, 2, 3] NaN NaN
  3. We want the dataframe to be vertical, so for each iteration, flatten out the lists with a list comprehension:

     df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()])) print(df)

    Output:

     ABC 0 1 4.0 4.0 1 2 4.0 6.0 2 3 4.0 0.0 3 1 4.0 NaN 4 2 4.0 NaN 5 3 4.0 NaN 6 1 NaN NaN 7 2 NaN NaN 8 3 NaN NaN
  4. Now we want to replace all the NaN s with blanks. For that, we need to import numpy as np , and do:

     df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()])).replace(np.nan, '') print(df)

    Output:

     ABC 0 1 4 4 1 2 4 6 2 3 4 0 3 1 4 4 2 4 5 3 4 6 1 7 2 8 3
  5. Finally use formatted string to convert the letters into "Col" letters:

     df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '') print(df)

    Output:

     ColA ColB ColC 0 1 4 4 1 2 4 6 2 3 4 0 3 1 4 4 2 4 5 3 4 6 1 7 2 8 3

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM