Create new dataframe columns based on lists of indices in a column and another dictionary

Given the following dataframe and list of dictionaries:

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict([
                        {'id': '912SAFD', 'key': 3, 'list_index': [0]},
                        {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
                        {'id': '712SAFD', 'key': 5, 'list_index': [2]}])

designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]}, 
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]}, 
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

Dataframe output:

        id  key list_index
0  912SAFD    3        [0]
1  812SAFD    4     [0, 1]
2  712SAFD    5        [2]

Is it feasible, preferably without explicit loops, to go through the list in 'list_index' for each row, use its values as positions in the list of dictionaries, and create new columns from the values in those dictionaries?

Here is an example of the expected result:

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b    NaN    NaN         NaN
1  812SAFD    4     [0, 1]      b      c    NaN         NaN
2  712SAFD    5        [2]    NaN    NaN      d          fr

If 'lang' is not empty, it should be added to the dataframe as its own column, named by combining the color_id value, an underscore, and the key name. For example: 609092_lang.

Any help would be much appreciated.

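# Flatten the nested structure: one inner dictionary per entry of 'designs'
designs = [info for design in designs for info in design['designs']]

# Tidy dataframe whose index (0, 1, 2) matches the values in 'list_index'
df_designs = pd.DataFrame(designs)
# One row per list element, then look up the matching design by position
df = df.explode('list_index').merge(df_designs, left_on='list_index', right_index=True)
# Wide format: one column per color_id ('list_index' is dropped here)
df = df.pivot(index=['id', 'key', 'lang'], columns='color_id', values='value').reset_index()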

print(df)

Output:

>>>
color_id       id  key lang 609090 609091 609092
0         712SAFD    5   fr    NaN    NaN      d
1         812SAFD    4           b      c    NaN
2         912SAFD    3           b    NaN    NaN
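
The output above keeps a single lang column and drops list_index, which differs from the layout requested in the question. The following is a minimal sketch of one way to adapt the same explode/merge/pivot idea so that the result has a `<color_id>_lang` column instead. It assumes the original df and designs from the question; the names `long`, `values` and `langs` are illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])

designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

# Flatten the inner dictionaries; row i of df_designs is design number i
df_designs = pd.DataFrame([info for d in designs for info in d['designs']])

# Long format: one row per (original row label, design position)
long = df['list_index'].explode().astype(int).rename('pos').reset_index()
long = long.merge(df_designs, left_on='pos', right_index=True)

# One column per color_id holding 'value'
values = long.pivot(index='index', columns='color_id', values='value')

# Same for the non-empty 'lang' entries, renamed to '<color_id>_lang'
langs = long[long['lang'] != ''].pivot(index='index', columns='color_id', values='lang')
langs.columns = ['{}_lang'.format(c) for c in langs.columns]

# Attach both wide tables to the untouched original dataframe
out = df.join(values).join(langs)
print(out)
```

Because the pivoted tables are joined back onto the original df by row label, list_index is preserved and rows without a matching design simply get missing values.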

First, we need to reshape the designs list to extract the relevant data and create a mapper that maps each list position to its dictionary values. Use enumerate and dict.setdefault for that:

designs_dict = {} 
for i, des in enumerate(designs):
    color_id = des['designs'][0]['color_id']
    designs_dict.setdefault(i, []).append({color_id : des['designs'][0]['value']})
    if des['designs'][0]['lang'] != '':
        designs_dict.setdefault(i, []).append({'{}_lang'.format(color_id) : des['designs'][0]['lang']})

Now designs_dict looks like this:

{0: [{609090: 'b'}], 
 1: [{609091: 'c'}], 
 2: [{609092: 'd'}, {'609092_lang': 'fr'}]}
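
The loop above reads only the first entry of each 'designs' list. If an entry could hold several inner dictionaries, a variant along the following lines might be easier to extend. This is only a sketch: `build_mapper` and `flat_mapper` are hypothetical names, and it stores one flat dictionary per position rather than the list of single-key dictionaries used in this answer, so it is not a drop-in replacement for designs_dict.

```python
designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]


def build_mapper(designs):
    """Map each position in `designs` to one flat {column_name: value} dict."""
    mapper = {}
    for i, des in enumerate(designs):
        row = {}
        for info in des['designs']:  # visit every inner dict, not only the first
            row[info['color_id']] = info['value']
            if info['lang']:  # empty strings are skipped
                row['{}_lang'.format(info['color_id'])] = info['lang']
        mapper[i] = row
    return mapper


flat_mapper = build_mapper(designs)
print(flat_mapper)
# {0: {609090: 'b'}, 1: {609091: 'c'}, 2: {609092: 'd', '609092_lang': 'fr'}}
```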

Then:

(i) explode "list_index" and map designs_dict onto each resulting index; then explode again to get rid of the lists

(ii) construct a DataFrame from (i); group by the index and use first to collapse it to one row per original row

(iii) join (ii) to df

s_from_designs = df['list_index'].explode().map(designs_dict).explode()
df_from_designs = pd.DataFrame(s_from_designs.tolist(), index=s_from_designs.index).groupby(level=0).first()
out = df.join(df_from_designs)

Final output:

        id  key list_index 609090 609091 609092 609092_lang
0  912SAFD    3        [0]      b   None   None        None
1  812SAFD    4     [0, 1]      b      c   None        None
2  712SAFD    5        [2]   None   None      d          fr
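
For convenience, the pieces of this answer can be put together into one self-contained script, sketched below. The final fillna call is an addition that is not part of the answer: it is one possible way to turn the None placeholders into NaN so the result matches the layout requested in the question, and it may be unnecessary depending on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'id': '912SAFD', 'key': 3, 'list_index': [0]},
    {'id': '812SAFD', 'key': 4, 'list_index': [0, 1]},
    {'id': '712SAFD', 'key': 5, 'list_index': [2]}])

designs = [{'designs': [{'color_id': 609090, 'value': 'b', 'lang': ''}]},
           {'designs': [{'color_id': 609091, 'value': 'c', 'lang': ''}]},
           {'designs': [{'color_id': 609092, 'value': 'd', 'lang': 'fr'}]}]

# Mapper: list position -> list of single-key dicts (value, optional lang)
designs_dict = {}
for i, des in enumerate(designs):
    info = des['designs'][0]
    entries = designs_dict.setdefault(i, [])
    entries.append({info['color_id']: info['value']})
    if info['lang'] != '':
        entries.append({'{}_lang'.format(info['color_id']): info['lang']})

# (i) one row per list element, mapped to its dicts, then one row per dict
s_from_designs = df['list_index'].explode().map(designs_dict).explode()

# (ii) dicts become columns; keep the first non-null value per original row
df_from_designs = (pd.DataFrame(s_from_designs.tolist(), index=s_from_designs.index)
                     .groupby(level=0)
                     .first())

# (iii) attach the new columns to the original dataframe
out = df.join(df_from_designs)

# Assumption, not in the answer: normalise None placeholders to NaN
out = out.fillna(np.nan)
print(out)
```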
