
Create new dataframe columns based on lists of indices in a column and another dictionary

Given the following dataframe and list of dictionaries:

import pandas as pd
import numpy as np

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]},
])

designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]}, 
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]}, 
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

Dataframe output:

        id  key list_index
0  912SAFD    3        [0]
1  812SAFD    4     [0, 1]
2  712SAFD    5        [2]

Is it possible, ideally without explicit loops, to go through the lists in 'list_index' for each row, use each value as a positional index into the designs list, and create new columns from the values in the matching dictionaries?

Here is an example of the expected result:

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b    NaN    NaN         NaN
1  812SAFD    4     [0, 1]      b      c    NaN         NaN
2  712SAFD    5        [2]    NaN    NaN      d          fr

If 'lang' is not empty, it should also become a column, named by joining the color_id value and the key name with an underscore. For example: 609092_lang.

Any help would be much appreciated.

# Flatten the inner dictionaries so they can form a tidy DataFrame
designs = [info for design in designs for info in design['designs']]

df_designs = pd.DataFrame(designs)
# One row per list_index value, matched to df_designs by row position
df = df.explode('list_index').merge(df_designs, left_on='list_index', right_index=True)
# Passing a list as index requires pandas 1.1 or later
df = df.pivot(index=['id', 'key', 'lang'], columns='color_id', values='value').reset_index()

print(df)

Output:

color_id       id  key lang 609090 609091 609092
0         712SAFD    5   fr    NaN    NaN      d
1         812SAFD    4           b      c    NaN
2         912SAFD    3           b    NaN    NaN
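To see what the pivot is working from, it may help to print the long table produced by explode and merge. The sketch below rebuilds the question's data and stops before the pivot. The names flat, exploded and long_df are illustrative, and the cast of 'list_index' to int is an added precaution, not part of the answer. It assumes each entry in designs holds exactly one design, so a position in the flattened list matches a position in the original list.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]},
])
designs = [
    {'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
    {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
    {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]},
]

# Flatten the inner dictionaries (assumes one design per entry).
flat = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(flat)

# One row per (original row, list_index value); cast so the merge key is integer.
exploded = df.explode('list_index').astype({'list_index': int})

# Attach color_id, value and lang by matching list_index to df_designs' row position.
long_df = exploded.merge(df_designs, left_on='list_index', right_index=True)
print(long_df)
```

Each row of long_df is one (id, design) pair, which is why row 812SAFD appears twice before the pivot spreads its values across columns.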

First, starting from the original designs list, we need to pull out the relevant data and build a mapper from each list position to the values that should become columns. Use enumerate and dict.setdefault for that:

designs_dict = {}
for i, des in enumerate(designs):
    entry = des['designs'][0]  # only the first design of each entry is read
    color_id = entry['color_id']
    # value column, named after the color_id
    designs_dict.setdefault(i, []).append({color_id: entry['value']})
    # '<color_id>_lang' column, added only when 'lang' is non-empty
    if entry['lang'] != '':
        designs_dict.setdefault(i, []).append({'{}_lang'.format(color_id): entry['lang']})

Now designs_dict looks like this:

{0: [{609090: 'b'}], 
 1: [{609091: 'c'}], 
 2: [{609092: 'd'}, {'609092_lang': 'fr'}]}

Then

(i) explode "list_index" and for each index there, map "designs_dict" to it; then explode again to get rid of lists

(ii) construct a DataFrame from (i); group by the index and take first, which keeps the first non-null value in each column and so collapses the result back to one row per original row

(iii) join (ii) to df

s_from_designs = df['list_index'].explode().map(designs_dict).explode()
df_from_designs = pd.DataFrame(s_from_designs.tolist(), index=s_from_designs.index).groupby(level=0).first()
out = df.join(df_from_designs)
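To see what each link of that chain contributes, the steps can be run one at a time. This is only an inspection sketch: the names step1, step2 and step3 are illustrative, and designs_dict is written out literally as shown above.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]},
])
designs_dict = {0: [{609090: 'b'}],
                1: [{609091: 'c'}],
                2: [{609092: 'd'}, {'609092_lang': 'fr'}]}

# One element per list_index value; the original row label is repeated.
step1 = df['list_index'].explode()

# Each position is replaced by its list of single-key dicts.
step2 = step1.map(designs_dict)

# One dict per element; the row label is repeated once more where needed.
step3 = step2.explode()

print(step3)
```

Because the original row labels survive both explodes, groupby(level=0) can later fold the pieces back onto the rows they came from.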

Final output (print(out)):

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b   None   None        None
1  812SAFD    4     [0, 1]      b      c   None        None
2  712SAFD    5        [2]   None   None      d          fr
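A variant that may be worth considering builds a lookup DataFrame with one row per position in designs, then selects rows from it by position. This is a sketch under stated assumptions, not either answer's method: the helper flatten and the names lookup, exploded, picked and combined are hypothetical, column names are made strings throughout (the approach above mixes int and str names), every 'list_index' list is assumed non-empty, and an entry is allowed to hold more than one design.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]},
])
designs = [
    {'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
    {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
    {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]},
]

def flatten(entry):
    """Hypothetical helper: turn one entry of `designs` into {column name: value}."""
    row = {}
    for d in entry['designs']:
        row[str(d['color_id'])] = d['value']
        if d['lang']:  # add '<color_id>_lang' only when 'lang' is non-empty
            row['{}_lang'.format(d['color_id'])] = d['lang']
    return row

# Row i of `lookup` holds the new-column values for designs[i].
lookup = pd.DataFrame([flatten(entry) for entry in designs])

# One element per list_index value, keeping the original row labels.
exploded = df['list_index'].explode().astype(int)

# Select lookup rows by position, then relabel them with the original row labels.
picked = lookup.loc[exploded.to_numpy()]
picked.index = exploded.index

# Collapse to one row per original row, keeping the first non-null per column.
combined = picked.groupby(level=0).first()

out = df.join(combined)
print(out)
```

Keeping all column names as strings avoids a frame whose columns are partly integers, which can be awkward to select from later; whether that matters depends on how the result is used.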
