简体   繁体   中英

Build dict from list of tuples combining two multi index dfs and column index

I have two multi-index dataframes: mean and std

arrays = [['A', 'A', 'B', 'B'], ['Z', 'Y', 'X', 'W']]

mean=pd.DataFrame(data={0.0:[np.nan,2.0,3.0,4.0], 60.0: [5.0,np.nan,7.0,8.0], 120.0:[9.0,10.0,np.nan,12.0]}, 
         index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
mean.columns.name='Times'

std=pd.DataFrame(data={0.0:[10.0,10.0,10.0,10.0], 60.0: [10.0,10.0,10.0,10.0], 120.0:[10.0,10.0,10.0,10.0]}, 
         index=pd.MultiIndex.from_arrays(arrays, names=('id', 'comp')))
std.columns.name='Times'

My task is to combine them in a dictionary with '{id:' as first level, followed by second level dictionary with '{comp:' and then for each comp a list of tuples, which combines the (time-points, mean, std). So, the result should look like that :

{'A': {
     'Z': [(60.0,5.0,10.0),
            (120.0,9.0,10.0)],
      'Y': [(0.0,2.0,10.0),
            (120.0,10.0,10.0)]
       },
  'B': {
     'X': [(0.0,3.0,10.0),
            (60.0,7.0,10.0)],
      'W': [(0.0,4.0,10.0),
            (60.0,8.0,10.0),
            (120.0,12.0,10.0)]
       }
 }

Additionally, when there is NaN in data, the triplets are left out, so value A,Z at time 0, A,Y at time 60 B,X at time 120.

How do I get there? I constructed already a dict of dict of list of tuples for a single line:

iter=0
{mean.index[iter][0]:{mean.index[iter][1]:list(zip(mean.columns, mean.iloc[iter], std.iloc[iter]))}}
>{'A': {'Z': [(0.0, 1.0, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]}}

Now, I need to extend to a dictionary with a loop over each line {inner dict) and adding the ids each {outer dict}. I started with iterrows and dic comprehension, but here I have problems, indexing with the iter ('A','Z') which i get from iterrows(), and building the whole dict, iteratively.

{mean.index[iter[1]]:list(zip(mean.columns, mean.loc[iter[1]], std.loc[iter[1]])) for (iter,row) in mean.iterrows()}

creates errors, and I would only have the inner loop

KeyError: 'the label [Z] is not in the [index]'

Thanks!

EDIT : I exchanged the numbers to float in this example, because here integers were generated before which was not consistent with my real data, and which would fail in following json dump.

Here is a solution using a defaultdict :

from collections import defaultdict

mean_as_dict = mean.to_dict(orient='index')
std_as_dict = std.to_dict(orient='index')

mean_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in mean_as_dict.items()}
std_clean_sorted = {k: sorted([(i, j) for i, j in v.items()]) for k, v in std_as_dict.items()}

sol = {k: [j + (std_clean_sorted[k][i][1],) for i, j in enumerate(v) if not np.isnan(j[1])] for k, v in mean_clean_sorted.items()}

solution = defaultdict(dict)

for k, v in sol.items():
    solution[k[0]][k[1]] = v

Resulting dict will be defaultdict object that you can change to dict easily:

solution = dict(solution)
con = pd.concat([mean, std])
primary = dict()
for i in set(con.index.values):
    if i[0] not in primary.keys():
        primary[i[0]] = dict()
    primary[i[0]][i[1]] = list()
    for x in con.columns:
        primary[i[0]][i[1]].append((x, tuple(con.loc[i[0]].loc[i[1][0].values)))

Here is sample output

I found a very comprehensive way of putting up this nested dict:

mean_dict_items=mean.to_dict(orient='index').items()
{k[0]:{u[1]:list(zip(mean.columns, mean.loc[u], std.loc[u]))
      for u,v in mean_dict_items if (k[0],u[1]) == u} for k,l in mean_dict_items}

creates:

{'A': {'Y': [(0.0, 2.0, 10.0), (60.0, nan, 10.0), (120.0, 10.0, 10.0)],
  'Z': [(0.0, nan, 10.0), (60.0, 5.0, 10.0), (120.0, 9.0, 10.0)]},
 'B': {'W': [(0.0, 4.0, 10.0), (60.0, 8.0, 10.0), (120.0, 12.0, 10.0)],
  'X': [(0.0, 3.0, 10.0), (60.0, 7.0, 10.0), (120.0, nan, 10.0)]}}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM