DataFrame: group by a column that holds dictionary values

Question

I have a dataframe in which one column contains dictionaries, and I need to group the rows by values inside those dictionaries. For example:

import pandas as pd

data = [
    {
        "name": "xx",
        "values": {
            "element": [
                {"path": "path1/id1"},
                {"path": "path2/id1"},
            ],
            "nonrequired": [{}],
        },
    },
    {
        "name": "yy",
        "values": {
            "element": [
                {"path": "path1/id2"},
                {"path": "path2/id2"},
            ],
            "nonrequired": [{}],
        },
    },
]

df = pd.DataFrame(data)

What I'm looking for:

I want to group the "values" column by a specific key inside it.
The grouping key should be values -> element -> path.
The grouping should be based on the partial path value. For example, if path="path1/id2", the grouping should be based on path="path1".
After grouping, I need to extract the result as a dictionary.
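
The partial-path rule above can be sketched as follows. The helper name `split_path` is hypothetical (it does not appear in the post), and the sketch assumes every path has the form "prefix/id" with a single meaningful split at the first slash.

```python
# Hypothetical helper, not from the original post: split a path such as
# "path1/id2" into the grouping key ("path1") and the remainder ("id2").
def split_path(path: str) -> tuple[str, str]:
    # partition splits at the first "/" only; if there is no slash,
    # the remainder is an empty string
    prefix, _, rest = path.partition("/")
    return prefix, rest


group_key, item_id = split_path("path1/id2")
```

Here `group_key` is the part the grouping would be based on, and `item_id` is what would be collected per group.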

Expected result:

result = {
            'path1': [
                        {
                            "name":'xx',
                            "renamecolumn":['id1','id2']
                        }
                    ],
            'path2': [
                        {
                            "name":'yy',
                            "renamecolumn":['id1','id2']
                        }
                    ]
        }

Answer 1

I'm still not 100% sure of the logic behind the final dictionary, as the example input and the expected output don't quite match up. However, here is how you can extract the values; you can create your desired dictionary from there.

# extract the path strings and split each one on the forward slash
df['split'] = df['values'].apply(lambda x: [item['path'].split('/') for item in x['element']])

# generate the path and ids columns
df['path'] = df['split'].apply(lambda x: [parts[0] for parts in x])
df['ids'] = df['split'].apply(lambda x: [parts[1] for parts in x])

# separate out all the lists into one row per value and drop repeated rows
result = df.drop(['values', 'split'], axis=1) \
  .explode('ids').explode('path').drop_duplicates()

Result is:

  name   path  ids
0   xx  path1  id1
0   xx  path2  id1
1   yy  path1  id2
1   yy  path2  id2
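
One possible way to turn a table like the one above into a dictionary keyed by path is sketched below. This is only an assumption about the desired shape: the key name "ids" and the variable names `long_df` and `grouped` are illustrative, and the output deliberately does not reproduce the question's expected result, since that does not match the sample input. The frame is rebuilt by hand so the sketch runs on its own.

```python
import pandas as pd

# Long-format frame with the same rows as the result table above,
# rebuilt here so that the sketch is self-contained.
long_df = pd.DataFrame(
    {
        "name": ["xx", "xx", "yy", "yy"],
        "path": ["path1", "path2", "path1", "path2"],
        "ids": ["id1", "id1", "id2", "id2"],
    }
)

# Assumed output shape: {path: [{"name": ..., "ids": [...]}, ...]}
# Outer groupby splits by path prefix; inner groupby collects ids per name.
grouped = {
    path: [
        {"name": name, "ids": list(rows["ids"])}
        for name, rows in sub.groupby("name")
    ]
    for path, sub in long_df.groupby("path")
}

print(grouped)
```

Grouping first by path and then by name keeps one entry per name under each path; a different nesting would only require swapping the two groupby keys.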

Answer 1: 0, accepted, 2021-02-08 15:36:10
