Given the following dataframe and list of dictionaries:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict([
{'id': '912SAFD', 'key': 3, 'list_index': [0]},
{'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
{'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
{'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
{'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]
Dataframe output:
        id  key list_index
0  912SAFD    3        [0]
1  812SAFD    4     [0, 1]
2  712SAFD    5        [2]
Without using explicit loops (if possible), is it feasible to iterate through the lists in 'list_index' for each row, extract the values, use them to access the list of dictionaries by index, and then create new columns based on the values in the dictionaries?
Here is an example of the expected result:
        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b    NaN    NaN         NaN
1  812SAFD    4     [0, 1]      b      c    NaN         NaN
2  712SAFD    5        [2]    NaN    NaN      d          fr
If 'lang' is not empty, it should be added as a column to the dataframe, using the color_id value combined with an underscore and its own name as the column name. For example: 609092_lang.
Any help would be much appreciated.
# Flatten the nested structure: one record per inner design dictionary
designs = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(designs)
# One row per list_index entry, matched to the design at that position
df = df.explode('list_index').merge(df_designs, left_on='list_index', right_index=True)
# Spread the color_id values into columns; 'lang' stays as an ordinary column
df = df.pivot(index=['id', 'key', 'lang'], columns='color_id', values='value').reset_index()
print(df)
Output:
color_id       id  key lang 609090 609091 609092
0         712SAFD    5   fr    NaN    NaN      d
1         812SAFD    4           b      c    NaN
2         912SAFD    3           b    NaN    NaN
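The pivot above keeps 'lang' as a regular column and drops 'list_index', so it does not quite match the layout requested in the question. As a rough sketch only (not part of the original answer), one way to get closer is to pivot the values and the non-empty langs separately and join both back onto the original frame. The names `exploded`, `long_df`, `values_wide` and `lang_wide` are made up for this illustration, and the sketch assumes that 'id' is unique per row and that no 'list_index' list is empty.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

# Tidy frame of the inner dictionaries; its index matches positions in `designs`
df_designs = pd.DataFrame([info for design in designs for info in design['designs']])

# Long format: one row per (id, list_index entry); cast so the merge keys are both integers
exploded = df[['id', 'list_index']].explode('list_index').astype({'list_index': int})
long_df = exploded.merge(df_designs, left_on='list_index', right_index=True)

# Wide table of values, one column per color_id
values_wide = long_df.pivot(index='id', columns='color_id', values='value')

# Wide table of the non-empty langs, columns renamed to '<color_id>_lang'
lang_wide = (long_df[long_df['lang'] != '']
             .pivot(index='id', columns='color_id', values='lang')
             .add_suffix('_lang'))

# Attach both tables to the untouched original frame, matching on 'id'
out = df.join(values_wide, on='id').join(lang_wide, on='id')
print(out)
```

Joining onto the original `df` keeps the row order and the 'list_index' column, which the pivot alone discards.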
First, we need to restructure the designs list to extract the relevant data and build a mapper from list positions to dict values. Use enumerate and dict.setdefault for that:
designs_dict = {}
for i, des in enumerate(designs):
    color_id = des['designs'][0]['color_id']
    designs_dict.setdefault(i, []).append({color_id : des['designs'][0]['value']})
    if des['designs'][0]['lang'] != '':
        designs_dict.setdefault(i, []).append({'{}_lang'.format(color_id) : des['designs'][0]['lang']})
Now designs_dict looks like this:
{0: [{609090: 'b'}],
1: [{609091: 'c'}],
2: [{609092: 'd'}, {'609092_lang': 'fr'}]}
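Note that the loop only reads the first entry of each inner 'designs' list. If an inner list could hold more than one dictionary (an assumption, since the sample data never does), a variant along these lines would cover every entry. The function name `build_designs_dict` is hypothetical and only used for this sketch.

```python
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]


def build_designs_dict(designs):
    """Map each position in `designs` to a list of single-key dicts."""
    designs_dict = {}
    for i, des in enumerate(designs):
        # Assumption: visit every inner entry, not only des['designs'][0]
        for info in des['designs']:
            color_id = info['color_id']
            designs_dict.setdefault(i, []).append({color_id: info['value']})
            # Only record the language when it is non-empty
            if info['lang'] != '':
                designs_dict.setdefault(i, []).append(
                    {'{}_lang'.format(color_id): info['lang']})
    return designs_dict


designs_dict = build_designs_dict(designs)
print(designs_dict)
```

For the sample data this produces the same mapper as the loop above, since every inner list has exactly one entry.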
Then:
(i) explode "list_index" and, for each index there, map "designs_dict" to it; then explode again to get rid of the lists
(ii) construct a DataFrame from (i); groupby the index and use first, which keeps the first non-null value per column and shrinks the DataFrame back to one row per original row
(iii) join (ii) to df
s_from_designs = df['list_index'].explode().map(designs_dict).explode()
df_from_designs = pd.DataFrame(s_from_designs.tolist(), index=s_from_designs.index).groupby(level=0).first()
out = df.join(df_from_designs)
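To see what each step contributes, the chained calls can be split into named intermediates. This is the same logic as above, just unrolled for inspection; the `step_*` names are made up for this sketch, and `designs_dict` is written out literally so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs_dict = {0: [{609090: 'b'}],
                1: [{609091: 'c'}],
                2: [{609092: 'd'}, {'609092_lang': 'fr'}]}

# (i) one row per list_index entry; the original row label is repeated
step_explode = df['list_index'].explode()
# each entry is replaced by the list of dicts stored under that key
step_map = step_explode.map(designs_dict)
# second explode: one dict per row
step_dicts = step_map.explode()

# (ii) dict keys become columns, then collapse rows sharing a row label;
# first() keeps the first non-null value per column within each group
step_frame = pd.DataFrame(step_dicts.tolist(), index=step_dicts.index)
df_from_designs = step_frame.groupby(level=0).first()

# (iii) align on the row label and attach to the original frame
out = df.join(df_from_designs)

for name, obj in [('explode', step_explode), ('map', step_map),
                  ('explode again', step_dicts), ('frame', step_frame),
                  ('grouped', df_from_designs), ('joined', out)]:
    print('--- {} ---'.format(name))
    print(obj)
```

The repeated row labels produced by explode are what let groupby(level=0) and join line everything back up with the original rows.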
Final output:
        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b   None   None        None
1  812SAFD    4     [0, 1]      b      c   None        None
2  712SAFD    5        [2]   None   None      d          fr
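The missing entries here show up as None, whereas the expected result in the question uses NaN. Both count as missing in pandas, but if NaN is preferred, a fillna call is one possible way to normalise them. The frame below is a hand-built stand-in for the joined result (an assumption made so the snippet runs by itself), and how missing values are displayed may vary with the pandas version.

```python
import numpy as np
import pandas as pd

# Stand-in for the joined result, with None marking the missing entries
out = pd.DataFrame({
    'id': ['912SAFD', '812SAFD', '712SAFD'],
    'key': [3, 4, 5],
    609090: ['b', 'b', None],
    609091: [None, 'c', None],
    609092: [None, None, 'd'],
    '609092_lang': [None, None, 'fr'],
})

# Fill every missing entry with NaN; non-missing values are left untouched
cleaned = out.fillna(np.nan)
print(cleaned)
```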