Create new dataframe columns from a column of index lists and a list of dictionaries

Question

Given the following dataframe and list of dictionaries:

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict([
                        {'id': '912SAFD', 'key': 3, 'list_index': [0]},
                        {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
                        {'id': '712SAFD', 'key': 5, 'list_index': [2]}])

designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]}, 
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]}, 
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

Dataframe output:

        id  key list_index
0  912SAFD    3        [0]
1  812SAFD    4     [0, 1]
2  712SAFD    5        [2]

Without using explicit loops (if possible), is it feasible to go through the lists in 'list_index' for each row, use the extracted values to index into the list of dictionaries, and then create new columns from the values in those dictionaries?

Here is an example of the expected result:

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b    NaN    NaN         NaN
1  812SAFD    4     [0, 1]      b      c    NaN         NaN
2  712SAFD    5        [2]    NaN    NaN      d          fr

If 'lang' is not empty, it should be added to the dataframe as its own column, named by joining the color_id value and the key name with an underscore. For example: 609092_lang.

Any help would be much appreciated.

Answer 1

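# Flatten the nested structure: one dict per design, in the original list order
designs = [info for design in designs for info in design['designs']]

# df_designs gets a default 0..n-1 index, which the merge matches against 'list_index'
df_designs = pd.DataFrame(designs)
# One row per referenced index, then attach the matching design row
df = df.explode('list_index').merge(df_designs, left_on='list_index', right_index=True)
# Spread the values into one column per color_id ('list_index' is dropped here)
df = df.pivot(index=['id', 'key', 'lang'], columns='color_id', values='value').reset_index()

print(df)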

Output:

>>>
color_id       id  key lang 609090 609091 609092
0         712SAFD    5   fr    NaN    NaN      d
1         812SAFD    4           b      c    NaN
2         912SAFD    3           b    NaN    NaN
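
The result above keeps 'lang' as a separate column and drops 'list_index', so it does not quite match the layout requested in the question. The following is a minimal sketch of one way to adapt the same explode-and-pivot idea; it is not part of the original answer. It assumes every list in 'list_index' is non-empty and every 'designs' entry holds exactly one dict. The names flat, long, values, langs and wide are illustrative, and column labels are cast to strings so that the plain and '_lang' labels can be sorted together.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]},
])
designs = [
    {'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
    {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
    {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]},
]

# One row per design; row position i matches index i used in 'list_index'.
# Assumption: each 'designs' list contains exactly one dict.
flat = pd.DataFrame([d['designs'][0] for d in designs])

# Long format: one row per (original row, referenced design).
# Assumption: no empty lists, so the exploded column can be cast to int.
long = (df[['list_index']]
        .explode('list_index')
        .astype({'list_index': int})
        .join(flat, on='list_index'))
long['column'] = long['color_id'].astype(str)

# Cells for the value columns, labelled by color_id.
values = long[['column', 'value']].rename(columns={'value': 'cell'})
# Cells for the '<color_id>_lang' columns, only where 'lang' is non-empty.
langs = (long.loc[long['lang'] != '', ['column', 'lang']]
             .rename(columns={'lang': 'cell'})
             .assign(column=lambda d: d['column'] + '_lang'))

# Pivot back to one row per original row, then attach to the untouched df.
wide = (pd.concat([values, langs])
          .rename_axis('row')
          .reset_index()
          .pivot(index='row', columns='column', values='cell'))

out = df.join(wide)
print(out)
```

Joining the pivoted frame back onto the original df, instead of pivoting df itself, is what keeps the 'list_index' column intact.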

Answer 2

First, we need to restructure the designs list to extract the relevant data and build a mapper from list positions to the column names and values they contribute. Use enumerate and dict.setdefault for that:

designs_dict = {} 
for i, des in enumerate(designs):
    color_id = des['designs'][0]['color_id']
    designs_dict.setdefault(i, []).append({color_id : des['designs'][0]['value']})
    if des['designs'][0]['lang'] != '':
        designs_dict.setdefault(i, []).append({'{}_lang'.format(color_id) : des['designs'][0]['lang']})

Now designs_dict looks like this:

{0: [{609090: 'b'}], 
 1: [{609091: 'c'}], 
 2: [{609092: 'd'}, {'609092_lang': 'fr'}]}

Then:

(i) explode "list_index" and map each index in it to its entry in "designs_dict"; then explode again to get rid of the lists

(ii) construct a DataFrame from (i); groupby the index and use first to collapse it back to one row per original row

(iii) join (ii) to df

s_from_designs = df['list_index'].explode().map(designs_dict).explode()
df_from_designs = pd.DataFrame(s_from_designs.tolist(), index=s_from_designs.index).groupby(level=0).first()
out = df.join(df_from_designs)

Final output:

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b   None   None        None
1  812SAFD    4     [0, 1]      b      c   None        None
2  712SAFD    5        [2]   None   None      d          fr
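
A possible simplification of the mapper approach, sketched here and not part of the original answer: store one flat dict per design position and turn it into a lookup table, so a single explode is enough. The names designs_map, lookup, refs, per_ref and new_cols are illustrative. The sketch assumes every list in 'list_index' is non-empty, and it uses string column labels throughout rather than the mixed int and string labels above.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]},
])
designs = [
    {'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
    {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
    {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]},
]

# One flat {column name: cell value} dict per position in designs.
designs_map = {}
for i, des in enumerate(designs):
    entry = des['designs'][0]
    cols = {str(entry['color_id']): entry['value']}
    if entry['lang']:
        cols['{}_lang'.format(entry['color_id'])] = entry['lang']
    designs_map[i] = cols

# Lookup table: one row per design position, one column per new column name.
lookup = pd.DataFrame.from_dict(designs_map, orient='index')

# One entry per referenced position; the index still points at the original row.
refs = df['list_index'].explode().astype(int)

# Pull the lookup row for every reference, then relabel with the original row index.
per_ref = lookup.reindex(refs.to_numpy())
per_ref.index = refs.index

# Collapse to one row per original row, keeping the first non-null cell per column.
new_cols = per_ref.groupby(level=0).first()
out = df.join(new_cols)
print(out)
```

Because each design contributes at most one non-null cell per column, taking the first non-null value per group loses nothing here; that would no longer hold if two referenced designs shared a color_id.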
