How to flatten a list of dicts from a Pandas DataFrame into several columns?

I have a pandas DataFrame that looks like this:

User | Query|                                 Filters                 
----------------------------------------------------------------------------------------- 
1    |  abc | [{u'Op': u'and', u'Type': u'date', u'Val': u'1992'},{u'Op': u'and', u'Type': u'sex', u'Val': u'F'}]
1    |  efg | [{u'Op': u'and', u'Type': u'date', u'Val': u'2000'},{u'Op': u'and', u'Type': u'col', u'Val': u'Blue'}] 
1    |  fgs | [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'col', u'Val': u'Red'}]        
2    |  hij | [{u'Op': u'and', u'Type': u'date', u'Val': u'2002'}]  
2    |  dcv | [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'sex', u'Val': u'F'}]     
2    |  tyu | [{u'Op': u'and', u'Type': u'date', u'Val': u'1999'},{u'Op': u'and', u'Type': u'col', u'Val': u'Yellow'}]     
3    |  jhg | [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'sex', u'Val': u'M'}]    
4    |  mlh | [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'}]  

This is the result I expect:

User| Query |  date | sex | col
-------------------------------- 
1   | abc   | 1992  |  F  |
1   | efg   | 2000  |     | Blue
1   | fgs   | 2001  |     | Red
2   | hij   | 2002  |     |
2   | dcv   | 2001  |  F  |
2   | tyu   | 1999  |     | Yellow
3   | jhg   | 2001  |  M  |
4   | mlh   | 2001  |     |

I'm using pandas 0.21.0 with Python 2.7.

Example data:

df = pd.DataFrame([{'user': 1,'query': 'abc', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'1992'},{u'Op': u'and', u'Type': u'sex', u'Val': u'F'}]},
              {'user': 1,'query': 'efg', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2000'},{u'Op': u'and', u'Type': u'col', u'Val': u'Blue'}]},
              {'user': 1,'query': 'fgs', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'col', u'Val': u'Red'}]},
              {'user': 2 ,'query': 'hij', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2002'}]},
              {'user': 2 ,'query': 'dcv', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'sex', u'Val': u'F'}]},
              {'user': 2 ,'query': 'tyu', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'1999'},{u'Op': u'and', u'Type': u'col', u'Val': u'Yellow'}]},
              {'user': 3 ,'query': 'jhg', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'sex', u'Val': u'M'}]},
              {'user': 4 ,'query': 'mlh', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'2001'}]},
             ])

I tried many solutions.

Any suggestions would be much appreciated!

Assuming you have already loaded your data, as defined in your MCVE:

data = [{'user': 1,'query': 'abc', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'1992'},{u'Op': u'and', u'Type': u'sex', u'Val': u'F'}]},
              {'user': 1,'query': 'efg', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2000'},{u'Op': u'and', u'Type': u'col', u'Val': u'Blue'}]},
              {'user': 1,'query': 'fgs', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'col', u'Val': u'Red'}]},
              {'user': 2 ,'query': 'hij', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2002'}]},
              {'user': 2 ,'query': 'dcv', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'sex', u'Val': u'F'}]},
              {'user': 2 ,'query': 'tyu', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'1999'},{u'Op': u'and', u'Type': u'col', u'Val': u'Yellow'}]},
              {'user': 3 ,'query': 'jhg', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'sex', u'Val': u'M'}]},
              {'user': 4 ,'query': 'mlh', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'2001'}]},
             ]

Then you are looking for the pandas json_normalize function, which normalizes nested data:

# Import path for pandas < 1.0; on pandas 1.0 and later, call pd.json_normalize instead
from pandas.io.json import json_normalize
df = json_normalize(data, 'Filters', ['query', 'user'])

It returns a normalized DataFrame in which each dict of your JSON column becomes a row and each key becomes a column of the same name:

     Op  Type     Val  user query
0   and  date    1992     1   abc
1   and   sex       F     1   abc
2   and  date    2000     1   efg
3   and   col    Blue     1   efg
4   and  date    2001     1   fgs
5   and   col     Red     1   fgs
6   and  date    2002     2   hij
7   and  date    2001     2   dcv
8   and   sex       F     2   dcv
9   and  date    1999     2   tyu
10  and   col  Yellow     2   tyu
11  and  date    2001     3   jhg
12  and   sex       M     3   jhg
13  and  date    2001     4   mlh
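
The two positional arguments after the data are record_path and meta. A minimal sketch of what each one does, assuming pandas 1.0 or later (where the function is available as pd.json_normalize) and using two of the records above; the names records and flat are illustrative only:

```python
import pandas as pd

# Two records taken from the example data (illustrative subset).
records = [
    {'user': 1, 'query': 'abc', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '1992'},
        {'Op': 'and', 'Type': 'sex', 'Val': 'F'}]},
    {'user': 2, 'query': 'hij', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '2002'}]},
]

# record_path names the nested list that is expanded into one row per item;
# meta names the top-level keys that are repeated on every expanded row.
flat = pd.json_normalize(records, record_path='Filters', meta=['query', 'user'])
print(flat)
```

The first record holds two filters and the second holds one, so the result has three rows.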

Now pivot your DataFrame to turn the distinct values of Type into columns:

df = df.pivot_table(index=['user', 'query', 'Op'], columns='Type', aggfunc='first')

This gives:

                   Val            
Type               col  date   sex
user query Op                     
1    abc   and    None  1992     F
     efg   and    Blue  2000  None
     fgs   and     Red  2001  None
2    dcv   and    None  2001     F
     hij   and    None  2002  None
     tyu   and  Yellow  1999  None
3    jhg   and    None  2001     M
4    mlh   and    None  2001  None

Finally, you can drop the extra column level and reset the index, if they bother you:

df.columns = df.columns.droplevel(0)
df.reset_index(inplace=True)

This returns the output requested in your MCVE (with an extra Op column):

Type  user query   Op     col  date   sex
0        1   abc  and    None  1992     F
1        1   efg  and    Blue  2000  None
2        1   fgs  and     Red  2001  None
3        2   dcv  and    None  2001     F
4        2   hij  and    None  2002  None
5        2   tyu  and  Yellow  1999  None
6        3   jhg  and    None  2001     M
7        4   mlh  and    None  2001  None

Not a column

In this final DataFrame the first column seems to be called Type, but it is not. It is instead an unnamed integer index:

df.index
RangeIndex(start=0, stop=8, step=1)

The columns index is named Type, but none of its entries is called Type (so there is no column with this name):

df.columns
Index(['user', 'query', 'Op', 'col', 'date', 'sex'], dtype='object', name='Type')

This is why you cannot remove the column Type (the column used in pivot_table): it does not exist.
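
The difference between the label of the columns axis and an actual column can be checked directly. A small sketch, assuming a long-format frame shaped like the json_normalize output; long_df, wide and axis_label are illustrative names:

```python
import pandas as pd

# Long-format rows shaped like the normalized data (small subset).
long_df = pd.DataFrame({
    'user': [1, 1, 2],
    'query': ['abc', 'abc', 'hij'],
    'Type': ['date', 'sex', 'date'],
    'Val': ['1992', 'F', '2002'],
})

wide = long_df.pivot_table(index=['user', 'query'], columns='Type',
                           values='Val', aggfunc='first').reset_index()

# 'Type' is the label of the columns axis, not one of its entries.
axis_label = wide.columns.name
print(axis_label)
print('Type' in wide.columns)
print(list(wide.columns))

# Clearing the label removes the apparent header without touching the data.
wide.columns.name = None
```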

If you want to get rid of this fake column, set a new index for the rows:

df.set_index(['user', 'query'], inplace=True)

If the name of the columns index bothers you, you can reset it:

df.columns.name = None

This gives:

             Op     col  date   sex
user query                         
1    abc    and    None  1992     F
     efg    and    Blue  2000  None
     fgs    and     Red  2001  None
2    dcv    and    None  2001     F
     hij    and    None  2002  None
     tyu    and  Yellow  1999  None
3    jhg    and    None  2001     M
4    mlh    and    None  2001  None

When you create a new index, it is good practice to check that it is unique:

df.index.is_unique
True
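
The steps of this answer can also be chained into one short pipeline. A minimal sketch, assuming pandas 1.0 or later and a four-record subset of the example data. Passing values='Val' and leaving Op out of the index is a variation on the call above, chosen so that the columns come out single-level and match the layout asked for in the question:

```python
import pandas as pd

# Subset of the example data covering the three filter types.
data = [
    {'user': 1, 'query': 'abc', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '1992'},
        {'Op': 'and', 'Type': 'sex', 'Val': 'F'}]},
    {'user': 1, 'query': 'efg', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '2000'},
        {'Op': 'and', 'Type': 'col', 'Val': 'Blue'}]},
    {'user': 2, 'query': 'hij', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '2002'}]},
    {'user': 3, 'query': 'jhg', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '2001'},
        {'Op': 'and', 'Type': 'sex', 'Val': 'M'}]},
]

# One row per filter, with user and query repeated on each row.
flat = pd.json_normalize(data, record_path='Filters', meta=['user', 'query'])

# One row per (user, query), one column per distinct Type.
wide = flat.pivot_table(index=['user', 'query'], columns='Type',
                        values='Val', aggfunc='first').reset_index()
wide.columns.name = None
print(wide)
```

Filter types that a query does not use show up as missing values in the corresponding column.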

Data from file

If your data is in a file, first load it into a variable using the json module from the Python standard library:

import json
with open(path) as file:
    data = json.load(file)

This will do the trick; then go back to the beginning of this answer.
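
A self-contained sketch of that round trip, assuming the file holds a JSON array of records shaped like the example data and that pandas 1.0 or later is installed. A temporary file stands in for the real path, and the names records, fh and flat are illustrative:

```python
import json
import os
import tempfile

import pandas as pd

records = [
    {'user': 1, 'query': 'abc', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '1992'},
        {'Op': 'and', 'Type': 'sex', 'Val': 'F'}]},
    {'user': 2, 'query': 'hij', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '2002'}]},
]

# Write the sample to a temporary file so the example runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump(records, tmp)
    path = tmp.name

try:
    # Same loading step as above.
    with open(path) as fh:
        data = json.load(fh)
finally:
    os.remove(path)

# The loaded list can be passed to json_normalize as before.
flat = pd.json_normalize(data, record_path='Filters', meta=['user', 'query'])
print(flat)
```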

import pandas as pd

df = pd.DataFrame([{'user': 1,'query': 'abc', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'1992'},{u'Op': u'and', u'Type': u'sex', u'Val': u'F'}]},
              {'user': 1,'query': 'efg', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2000'},{u'Op': u'and', u'Type': u'col', u'Val': u'Blue'}]},
              {'user': 1,'query': 'fgs', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'col', u'Val': u'Red'}]},
              {'user': 2 ,'query': 'hij', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2002'}]},
              {'user': 2 ,'query': 'dcv', 'Filters': [{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'sex', u'Val': u'F'}]},
              {'user': 2 ,'query': 'tyu', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'1999'},{u'Op': u'and', u'Type': u'col', u'Val': u'Yellow'}]},
              {'user': 3 ,'query': 'jhg', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'2001'},{u'Op': u'and', u'Type': u'sex', u'Val': u'M'}]},
              {'user': 4 ,'query': 'mlh', 'Filters':[{u'Op': u'and', u'Type': u'date', u'Val': u'2001'}]},
             ])

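def func(x):
    # x is the list of filter dicts for one row; the first entry is assumed to be the date
    date = x[0]['Val']
    sex = ''
    col = ''
    # A second entry, if present, is treated as sex when it is 'F' or 'M', otherwise as color
    if len(x) > 1:
        if x[1]['Val'] in ['F','M']:
            sex = x[1]['Val']
        else:
            col = x[1]['Val']
    return pd.Series([date,sex,col])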

df[['date','sex','color']] = df['Filters'].apply(func)

df

Output (Filters column not shown):

  query  user  date sex   color
0   abc     1  1992   F        
1   efg     1  2000        Blue
2   fgs     1  2001         Red
3   hij     2  2002            
4   dcv     2  2001   F        
5   tyu     2  1999      Yellow
6   jhg     3  2001   M        
7   mlh     4  2001            
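
The function above relies on the date being the first filter and on sex values being 'F' or 'M'. A possible variation that keys on each filter's Type instead, sketched on a subset of the example data; expanded and result are illustrative names, and if a Type appeared twice in one row the later value would win:

```python
import pandas as pd

df = pd.DataFrame([
    {'user': 1, 'query': 'abc', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '1992'},
        {'Op': 'and', 'Type': 'sex', 'Val': 'F'}]},
    {'user': 1, 'query': 'efg', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '2000'},
        {'Op': 'and', 'Type': 'col', 'Val': 'Blue'}]},
    {'user': 4, 'query': 'mlh', 'Filters': [
        {'Op': 'and', 'Type': 'date', 'Val': '2001'}]},
])

# Build one {Type: Val} dict per row, then let pandas align the keys as columns.
expanded = pd.DataFrame(
    [{f['Type']: f['Val'] for f in filters} for filters in df['Filters']],
    index=df.index,
)

# Attach the new columns to the original rows and drop the nested column.
result = df.drop(columns='Filters').join(expanded)
print(result)
```

Here missing filters come out as NaN rather than empty strings; result.fillna('') would reproduce the blank cells shown above.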
