
DataFrame groupby a column which has dictionary values

I have a DataFrame in which one column holds dictionaries, and I need to group the rows by values inside those dictionaries. For example:

import pandas as pd

data = [
    {
        "name": "xx",
        "values": {
            "element": [
                {"path": "path1/id1"},
                {"path": "path2/id1"}
            ],
            "nonrequired": [
                {}
            ]
        }
    },
    {
        "name": "yy",
        "values": {
            "element": [
                {"path": "path1/id2"},
                {"path": "path2/id2"}
            ],
            "nonrequired": [
                {}
            ]
        }
    }
]

df = pd.DataFrame(data)

What I'm looking for:

  1. I want to group the "values" column by a specific key inside the dictionary.
  2. The grouping key should be values -> element -> path.
  3. The grouping should be based on the part of the path before the slash. For example, if path="path1/id2", the group key should be "path1".
  4. After grouping, I need to extract the result as a dictionary.

Expected result:

result = {
            'path1': [
                        {
                            "name":'xx',
                            "renamecolumn":['id1','id2']
                        }
                    ],
            'path2': [
                        {
                            "name":'yy',
                            "renamecolumn":['id1','id2']
                        }
                    ]
        }

I'm still not 100% sure of the logic behind the final dictionary, as the example input and expected output don't quite match up. However, here is how you can extract the values; you can build your desired dictionary from there.

# extract the paths and split each one on the forward slash
df['split'] = df['values'].apply(lambda x: [item['path'].split('/') for item in x['element']])

# generate the path and ids columns
df['path'] = df['split'].apply(lambda x: [parts[0] for parts in x])
df['ids'] = df['split'].apply(lambda x: [parts[1] for parts in x])

# explode both list columns together so each path stays paired with its own id
# (multi-column explode needs pandas >= 1.3)
result = df.drop(['values', 'split'], axis=1) \
  .explode(['path', 'ids']).drop_duplicates()

The result is:

  name   path  ids
0   xx  path1  id1
0   xx  path2  id1
1   yy  path1  id2
1   yy  path2  id2
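
From a flat name/path/ids table like the one above, one possible way to get a dictionary keyed by path is to group twice: first by (path, name) to collect the ids, then by path to build the records. This is only a sketch under one reading of the question. Since the expected output does not quite match the input, the exact shape is an assumption; the names `flat`, `per_name` and `grouped` are made up for illustration, and the `renamecolumn` key is taken from the question.

```python
import pandas as pd

# Flat table equivalent to the result shown above (rebuilt here so the
# snippet runs on its own)
flat = pd.DataFrame(
    {
        "name": ["xx", "xx", "yy", "yy"],
        "path": ["path1", "path2", "path1", "path2"],
        "ids": ["id1", "id1", "id2", "id2"],
    }
)

# For every (path, name) pair, collect the ids into a list
per_name = (
    flat.groupby(["path", "name"])["ids"]
    .agg(list)
    .reset_index()
    .rename(columns={"ids": "renamecolumn"})
)

# Turn each path group into a list of {"name": ..., "renamecolumn": [...]} records
grouped = {
    path: group[["name", "renamecolumn"]].to_dict("records")
    for path, group in per_name.groupby("path")
}

print(grouped)
# {'path1': [{'name': 'xx', 'renamecolumn': ['id1']}, {'name': 'yy', 'renamecolumn': ['id2']}],
#  'path2': [{'name': 'xx', 'renamecolumn': ['id1']}, {'name': 'yy', 'renamecolumn': ['id2']}]}
```

If the ids should instead be pooled across names within each path, aggregating with `flat.groupby("path")["ids"].agg(list)` would be the place to start.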
