Create new dataframe columns based on lists of indices in a column and another dictionary
Given the following dataframe and list of dictionaries:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict([
{'id': '912SAFD', 'key': 3, 'list_index': [0]},
{'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
{'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
{'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
{'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]
Dataframe output:
id key list_index
0 912SAFD 3 [0]
1 812SAFD 4 [0, 1]
2 712SAFD 5 [2]
Without using explicit loops (if possible), is it feasible to go through the list in 'list_index' for each row, use each value as an index into the list of dictionaries, and then create new columns from the values in those dictionaries?
Here is an example of the expected result:
id key list_index 609090 609091 609092 609092_lang
0 912SAFD 3 [0] b NaN NaN NaN
1 812SAFD 4 [0, 1] b c NaN NaN
2 712SAFD 5 [2] NaN NaN d fr
If 'lang' is not empty, it should also be added as a column to the dataframe, named by the color_id value followed by an underscore and the word lang. For example: 609092_lang.
Any help would be much appreciated.
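The column-naming rule above can be written as a small helper. This is a sketch for illustration only: `design_columns` is a hypothetical name, not something from the question, and it assumes each inner design dict has exactly the keys `color_id`, `value` and `lang`.

```python
# Hypothetical helper (not from the question): turns one inner design dict
# into the columns it should contribute, following the naming rule above.
def design_columns(design):
    """Return {column_name: cell_value} for a single inner design dict."""
    # The value column is named by the color_id itself (kept as an int).
    cols = {design['color_id']: design['value']}
    # Only a non-empty 'lang' adds a '<color_id>_lang' column ('' is falsy).
    if design['lang']:
        cols[f"{design['color_id']}_lang"] = design['lang']
    return cols


print(design_columns({'color_id': 609090, 'value': 'b', 'lang': ''}))
# {609090: 'b'}
print(design_columns({'color_id': 609092, 'value': 'd', 'lang': 'fr'}))
# {609092: 'd', '609092_lang': 'fr'}
```

Note that the value column keeps the integer color_id as its label while the lang column is a string, which matches the header of the expected result.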
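# this is to get the inner dictionaries and make a tidy dataframe from them
# (row i of df_designs is design index i when each entry holds one inner design)
designs = [info for design in designs for info in design['designs']]
df_designs = pd.DataFrame(designs)
# one row per list element, then attach the matching design by its position
df = df.explode('list_index').merge(df_designs, left_on='list_index', right_index=True)
# one column per color_id (passing a list to pivot's index needs pandas >= 1.1)
df = df.pivot(index=['id', 'key', 'lang'], columns='color_id', values='value').reset_index()
print(df)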
Output:
>>>
color_id       id  key lang 609090 609091 609092
0         712SAFD    5   fr    NaN    NaN      d
1         812SAFD    4           b      c    NaN
2         912SAFD    3           b    NaN    NaN
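The approach above keeps 'lang' as a single column and drops 'list_index'. A minimal sketch of how the same explode/merge idea might also produce the `<color_id>_lang` columns from the question: pivot the values and the non-empty langs separately, then join both back onto the original df. It assumes every outer entry of `designs` holds exactly one inner dict and that no 'list_index' list is empty; the names `long`, `values`, `langs`, `row` and `design_idx` are illustrative.

```python
import pandas as pd

df = pd.DataFrame.from_dict([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

# Flatten the nested dicts; row i of df_designs is design index i
# (assumes one inner design per outer entry, as in the question).
df_designs = pd.DataFrame([d['designs'][0] for d in designs])

# One row per (original row, design index); the original row label is kept
# in the 'row' column. astype(int) would fail on empty lists (NaN after explode).
long = (df['list_index'].explode().astype(int)
        .rename('design_idx').rename_axis('row').reset_index()
        .merge(df_designs, left_on='design_idx', right_index=True))

# Value columns, named by color_id.
values = long.pivot(index='row', columns='color_id', values='value')

# Lang columns only where lang is non-empty, named '<color_id>_lang'.
langs = long[long['lang'] != ''].pivot(index='row', columns='color_id', values='lang')
langs.columns = [f'{c}_lang' for c in langs.columns]

# Align both pivots on the original row labels.
out = df.join(values).join(langs)
print(out)
```

Missing cells come out as NaN here, as in the question's expected result.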
First, we need to reshape the designs list to get the relevant data and create a mapper that maps each index to its dict values. Use enumerate and dict.setdefault for that:
designs_dict = {}
for i, des in enumerate(designs):
    color_id = des['designs'][0]['color_id']
    # the value is stored under the color_id itself
    designs_dict.setdefault(i, []).append({color_id: des['designs'][0]['value']})
    # a '<color_id>_lang' entry is added only when 'lang' is non-empty
    if des['designs'][0]['lang'] != '':
        designs_dict.setdefault(i, []).append({'{}_lang'.format(color_id): des['designs'][0]['lang']})
Now designs_dict looks like this:
{0: [{609090: 'b'}],
1: [{609091: 'c'}],
2: [{609092: 'd'}, {'609092_lang': 'fr'}]}
Then:
(i) explode "list_index", map "designs_dict" onto each index, then explode again to get rid of the lists;
(ii) construct a DataFrame from (i), then groupby the index and use first to shrink it back to one row per original row;
(iii) join (ii) to df.
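# (i) one row per (original row, one-key design dict)
s_from_designs = df['list_index'].explode().map(designs_dict).explode()
# (ii) dict keys become columns; rows sharing a label collapse to the first non-null value
df_from_designs = pd.DataFrame(s_from_designs.tolist(), index=s_from_designs.index).groupby(level=0).first()
# (iii) attach the new columns to the original rows by index
out = df.join(df_from_designs)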
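To make the repeated row labels visible, here is a sketch that runs steps (i) to (iii) one at a time and prints the intermediate index. It rebuilds df and designs_dict literally from the values shown above so it runs on its own; the names step_a, step_b, step_c, wide and collapsed are illustrative.

```python
import pandas as pd

df = pd.DataFrame.from_dict([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs_dict = {0: [{609090: 'b'}],
                1: [{609091: 'c'}],
                2: [{609092: 'd'}, {'609092_lang': 'fr'}]}

# (i) one row per list element; explode keeps the original row label
step_a = df['list_index'].explode()
print(step_a.index.tolist())   # [0, 1, 1, 2]
step_b = step_a.map(designs_dict)   # each index -> list of one-key dicts
step_c = step_b.explode()           # each one-key dict on its own row
print(step_c.index.tolist())   # [0, 1, 1, 2, 2]

# (ii) dict keys become columns, then rows sharing a label are collapsed
wide = pd.DataFrame(step_c.tolist(), index=step_c.index)
collapsed = wide.groupby(level=0).first()   # first non-null value per column
print(collapsed.shape)         # (3, 4)

# (iii) align on the original row index
out = df.join(collapsed)
print(out)
```

Row 2 appears twice after step (i) because designs_dict[2] holds two one-key dicts; the groupby in step (ii) folds them back into one row.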
Final output:
id key list_index 609090 609091 609092 609092_lang
0 912SAFD 3 [0] b None None None
1 812SAFD 4 [0, 1] b c None None
2 712SAFD 5 [2] None None d fr
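Because designs_dict maps each index to a list of one-key dicts, the code above needs a second explode plus groupby(level=0).first() to collapse the rows again. A minimal variant, not part of the original answer, merges each list into one flat dict first so each row's columns can be built in a single pass. The names flat and rows are illustrative, and df and designs_dict are rebuilt literally from the values shown above.

```python
import pandas as pd

df = pd.DataFrame.from_dict([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])
designs_dict = {0: [{609090: 'b'}],
                1: [{609091: 'c'}],
                2: [{609092: 'd'}, {'609092_lang': 'fr'}]}

# Merge each list of one-key dicts into a single flat dict per design index.
flat = {i: {k: v for d in dicts for k, v in d.items()}
        for i, dicts in designs_dict.items()}

# For each row, combine the flat dicts of all listed indices
# (a later index overwrites an earlier one if they share a column name).
rows = df['list_index'].map(lambda idx: {k: v for i in idx for k, v in flat[i].items()})

# Dict keys become columns; missing keys become NaN.
out = df.join(pd.DataFrame(rows.tolist(), index=df.index))
print(out)
```

Two trade-offs to be aware of: Series.map with a lambda still runs Python code once per row, and on a column-name clash this keeps the last value, whereas groupby(...).first() in the answer keeps the first non-null one.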