
Fastest way to convert a list of dictionaries (each having multiple sub-dictionaries) into a single dataframe

I currently have the list of dictionaries shown below:

temp_indices_ = [{0: {12: 11, 11: 12}}, {0: {14: 13, 13: 14}}, {0: {16: 15, 15: 16}},
                 {0: {20: 19, 19: 20}},
                 {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
                 {0: {28: 27, 27: 28, 26: 28}, 1: {28: 26, 27: 26, 26: 27}}]

To convert the list into a DataFrame, I currently use the following code:

  import pandas as pd

  temp_indices = pd.DataFrame()

  for d in temp_indices_:
      # every sub-dictionary (keys 0, 1, ...), not just d[0]
      for sub in d.values():
          temp_indices = pd.concat([temp_indices, pd.DataFrame(list(sub.items()))], axis=0)
  temp_indices = temp_indices.rename(columns={0: 'ind', 1: 'label_ind'})
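Part of the slowness of a loop like this is that each pd.concat call copies every row accumulated so far, so the total work grows quadratically with the number of sub-dictionaries. A minimal sketch of the usual fix, keeping the same per-sub-dictionary DataFrames: collect them in a plain list and call pd.concat once (the variable name frames is illustrative only).

```python
import pandas as pd

temp_indices_ = [{0: {12: 11, 11: 12}}, {0: {14: 13, 13: 14}}, {0: {16: 15, 15: 16}},
                 {0: {20: 19, 19: 20}},
                 {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
                 {0: {28: 27, 27: 28, 26: 28}, 1: {28: 26, 27: 26, 26: 27}}]

# Build one small DataFrame per sub-dictionary, but keep them in a plain list
frames = [pd.DataFrame(list(sub.items()), columns=['ind', 'label_ind'])
          for d in temp_indices_
          for sub in d.values()]

# A single concat copies the data once, instead of once per loop iteration
temp_indices = pd.concat(frames, axis=0)
print(temp_indices)
```

Each small frame keeps its own 0, 1, 2... index, so the result has the same repeating index as the loop version.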

The expected output in temp_indices, with all dictionaries concatenated into one DataFrame, looks like this:

   ind  label_ind
0   12         11
1   11         12
0   14         13
1   13         14
0   16         15
1   15         16
0   20         19
1   19         20
0   24         23
1   23         24
2   22         24
0   28         27
1   27         28
2   26         28
0   28         26
1   27         26
2   26         27

To improve speed, I have tried pd.Series(temp_indices_).explode().reset_index() as well as pd.DataFrame(map(lambda i: pd.DataFrame(i[0].items()), temp_indices_)), but I cannot drill down to the innermost dictionaries to convert them into a DataFrame.
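The innermost sub-dictionaries can be reached in plain Python before pandas is involved at all. A minimal sketch using itertools.chain.from_iterable (a swapped-in technique, not one of the attempts above): it flattens the two levels of nesting into a stream of (key, value) pairs and builds the DataFrame with one constructor call.

```python
from itertools import chain

import pandas as pd

temp_indices_ = [{0: {12: 11, 11: 12}}, {0: {14: 13, 13: 14}}, {0: {16: 15, 15: 16}},
                 {0: {20: 19, 19: 20}},
                 {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
                 {0: {28: 27, 27: 28, 26: 28}, 1: {28: 26, 27: 26, 26: 27}}]

# Flatten two levels: outer list -> sub-dictionaries -> (key, value) pairs
pairs = chain.from_iterable(
    sub.items() for d in temp_indices_ for sub in d.values()
)

# One DataFrame constructor call on plain tuples; no intermediate DataFrames
df = pd.DataFrame(list(pairs), columns=['ind', 'label_ind'])
print(df)
```

Unlike the concat-based version, this produces a fresh 0..n-1 index.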


Use a list comprehension for a speed-up:

  • The list comprehension uses three loops: the first iterates over the list of dictionaries, the second over the sub-dictionaries (the values of each dictionary), and the third over the key, value pairs of each sub-dictionary together with their position.
  • Build a DataFrame from the resulting list.
  • The column named 'label' contains (key, value) tuples, so split it into two columns using df['label'].tolist().
  • Finally, drop the 'label' column.
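import pandas as pd

# (position within sub-dictionary, (key, value)) for every entry of every sub-dictionary
data = [(ind, item) for i in temp_indices_ for value in i.values() for ind, item in enumerate(value.items())]
df = pd.DataFrame(data, columns=["Index", "label"])
# split the (key, value) tuples into two new columns
df[['ind', 'label_ind']] = pd.DataFrame(df['label'].tolist(), index=df.index)
df.drop(['label'], axis=1, inplace=True)
print(df)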

        Index  ind  label_ind
    0       0   12         11
    1       1   11         12
    2       0   14         13
    3       1   13         14
    4       0   16         15
    5       1   15         16
    6       0   20         19
    7       1   19         20
    8       0   24         23
    9       1   23         24
    10      2   22         24
    11      0   24         22
    12      1   23         22
    13      2   22         23
    14      0   28         27
    15      1   27         28
    16      2   26         28
    17      0   28         26
    18      1   27         26
    19      2   26         27
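The intermediate 'label' column is not strictly needed: the (key, value) pair can be unpacked directly in the comprehension. A minimal sketch of that variant, assuming the same temp_indices_ list; it yields the same three columns as the output above.

```python
import pandas as pd

temp_indices_ = [{0: {12: 11, 11: 12}}, {0: {14: 13, 13: 14}}, {0: {16: 15, 15: 16}},
                 {0: {20: 19, 19: 20}},
                 {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
                 {0: {28: 27, 27: 28, 26: 28}, 1: {28: 26, 27: 26, 26: 27}}]

# Unpack each (key, value) pair directly, so no tuple column has to be split later
data = [(pos, k, v)
        for d in temp_indices_
        for sub in d.values()
        for pos, (k, v) in enumerate(sub.items())]
df = pd.DataFrame(data, columns=['Index', 'ind', 'label_ind'])
print(df)
```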

This sounds like a problem that can be solved with recursion, using the final output to create a DataFrame.

import pandas as pd

def unpacker(data, parent_idx=None):
    """Flatten the nested dicts into (parent key, key, value) tuples."""
    final = []

    if isinstance(data, list):
        # Top level: walk each dictionary and recurse into its sub-dictionaries
        for row in data:
            for k, v in row.items():
                if isinstance(v, dict):
                    final.extend(unpacker(v, parent_idx=k))
    else:
        # Innermost level: emit each key, value pair tagged with its parent key
        for k1, v1 in data.items():
            final.append((parent_idx, k1, v1))

    return final

l = unpacker(temp_indices_)
df = pd.DataFrame(l, columns=["Index", "Ind", "Label_Ind"])
print(df)

    Index  Ind  Label_Ind
0       0   12         11
1       0   11         12
2       0   14         13
3       0   13         14
4       0   16         15
5       0   15         16
6       0   20         19
7       0   19         20
8       0   24         23
9       0   23         24
10      0   22         24
11      1   24         22
12      1   23         22
13      1   22         23
14      0   28         27
15      0   27         28
16      0   26         28
17      1   28         26
18      1   27         26
19      1   26         27
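The same recursion can also be written as a generator with yield from, which avoids building and extending a new list in every recursive call. A minimal sketch; unpack_rows is a hypothetical name, and for other input shapes it differs slightly from unpacker (non-dict values at the top level are kept with a parent key of None rather than dropped). For the data in the question it produces the same rows.

```python
import pandas as pd

temp_indices_ = [{0: {12: 11, 11: 12}}, {0: {14: 13, 13: 14}}, {0: {16: 15, 15: 16}},
                 {0: {20: 19, 19: 20}},
                 {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
                 {0: {28: 27, 27: 28, 26: 28}, 1: {28: 26, 27: 26, 26: 27}}]


def unpack_rows(data, parent_idx=None):
    """Yield (parent key, ind, label_ind) tuples; hypothetical generator variant of unpacker."""
    if isinstance(data, list):
        for row in data:
            yield from unpack_rows(row)
    else:
        for k, v in data.items():
            if isinstance(v, dict):
                # Descend one level, remembering the key of the sub-dictionary
                yield from unpack_rows(v, parent_idx=k)
            else:
                # Leaf pair: emit it together with the key of its parent dictionary
                yield (parent_idx, k, v)


df = pd.DataFrame(list(unpack_rows(temp_indices_)), columns=["Index", "Ind", "Label_Ind"])
print(df)
```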

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM