
Creating a Pandas DataFrame from a List of Dictionaries of Dictionaries

I have a list of dictionaries, where each dictionary represents a record. It is formatted as follows:

>>> ListOfData=[
... {'Name':'Andrew',
...  'number':4,
...  'contactinfo':{'Phone':'555-5555', 'Address':'123 Main St'}},
... {'Name':'Ben',
...  'number':6,
...  'contactinfo':{'Phone':'555-5554', 'Address':'124 2nd St'}},
... {'Name':'Cathy',
...  'number':1,
...  'contactinfo':{'Phone':'555-5556', 'Address':'126 3rd St'}}]
>>> 
>>> import pprint
>>> pprint.pprint(ListOfData)
[{'Name': 'Andrew',
  'contactinfo': {'Address': '123 Main St', 'Phone': '555-5555'},
  'number': 4},
 {'Name': 'Ben',
  'contactinfo': {'Address': '124 2nd St', 'Phone': '555-5554'},
  'number': 6},
 {'Name': 'Cathy',
  'contactinfo': {'Address': '126 3rd St', 'Phone': '555-5556'},
  'number': 1}]
>>> 

What is the best way to read this into a Pandas DataFrame with MultiIndex columns for the attributes in the sub-dictionaries?

For example, I'd ideally have 'Phone' and 'Address' columns nested under the 'contactinfo' column.

I can read in the data as follows, but would like the contact info column to be broken into sub-columns.

>>> pd.DataFrame.from_dict(ListOfData)
     Name                                        contactinfo  number
0  Andrew  {u'Phone': u'555-5555', u'Address': u'123 Main...       4
1     Ben  {u'Phone': u'555-5554', u'Address': u'124 2nd ...       6
2   Cathy  {u'Phone': u'555-5556', u'Address': u'126 3rd ...       1
>>> 

How about this?

Declare an empty data frame:

df = pd.DataFrame(columns=('Name', 'contactinfo', 'number'))

Then iterate over the list and add rows:

for row in ListOfData:
    df.loc[len(df)] = row

Complete code:

import pandas as pd

ListOfData=[
 {'Name':'Andrew',
  'number':4,
  'contactinfo':{'Phone':'555-5555', 'Address':'123 Main St'}},
 {'Name':'Ben',
  'number':6,
  'contactinfo':{'Phone':'555-5554', 'Address':'124 2nd St'}}]

df = pd.DataFrame(columns=('Name', 'contactinfo', 'number'))

for row in ListOfData:
    df.loc[len(df)] = row

print(df)

This prints:

     Name                                      contactinfo  number
0  Andrew  {'Phone': '555-5555', 'Address': '123 Main St'}       4
1     Ben   {'Phone': '555-5554', 'Address': '124 2nd St'}       6

Here is a pretty clunky workaround that got me what I need. I loop through the columns, find those that are made of dicts, split each one into multiple columns, and join them back onto the dataframe. I'd appreciate hearing any ways to improve this code. Ideally, I'd imagine the dataframe would be constructed from the get-go without having dictionaries as values.

>>> df=pd.DataFrame.from_dict(ListOfData)
>>> 
>>> for name,col in df.iteritems():
...     if any(isinstance(x, dict) for x in col.tolist()):
...         DividedDict=col.apply(pd.Series)
...         DividedDict.columns=pd.MultiIndex.from_tuples([(name,x) for x in DividedDict.columns.tolist()])
...         df=df.join(DividedDict)
...         df.drop(name,1, inplace=True)
... 
>>> print df
     Name  number (contactinfo, Address) (contactinfo, Phone)
0  Andrew       4            123 Main St             555-5555
1     Ben       6             124 2nd St             555-5554
2   Cathy       1             126 3rd St             555-5556
>>> 
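The loop above relies on `iteritems` and a positional `axis` argument to `drop`, which older pandas releases accepted, and it ends up with tuple-labelled columns rather than a true MultiIndex. A minimal sketch of the same idea for a recent pandas follows. It assumes every value in a nested column is a dict; the names `pieces` and `result` are only illustrative, and giving scalar columns an empty second-level label is one choice among several.

```python
import pandas as pd

ListOfData = [
    {'Name': 'Andrew', 'number': 4,
     'contactinfo': {'Phone': '555-5555', 'Address': '123 Main St'}},
    {'Name': 'Ben', 'number': 6,
     'contactinfo': {'Phone': '555-5554', 'Address': '124 2nd St'}},
    {'Name': 'Cathy', 'number': 1,
     'contactinfo': {'Phone': '555-5556', 'Address': '126 3rd St'}},
]

df = pd.DataFrame(ListOfData)

pieces = {}
for name in df.columns:
    col = df[name]
    if col.map(lambda x: isinstance(x, dict)).all():
        # Column of dicts: expand into a frame with one column per key.
        pieces[name] = pd.DataFrame(col.tolist(), index=df.index)
    else:
        # Scalar column: keep it, with an empty second-level label.
        pieces[name] = col.to_frame(name="")

# Passing a dict to concat uses its keys as the outer column level,
# so the result has two-level MultiIndex columns.
result = pd.concat(pieces, axis=1)
print(result)
```

With this layout, `result['contactinfo']` selects the 'Phone' and 'Address' sub-columns together, and `result[('contactinfo', 'Phone')]` selects a single one.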

I don't know whether it's the best way, but you could do it in two steps:

>>> df = pd.DataFrame(ListOfData)
>>> df = df.join(pd.DataFrame.from_records(df.pop("contactinfo")))
>>> df
     Name  number      Address     Phone
0  Andrew       4  123 Main St  555-5555
1     Ben       6   124 2nd St  555-5554
2   Cathy       1   126 3rd St  555-5556
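If flat column names are acceptable, `pd.json_normalize` (available at the top level in pandas 1.0 and later) can do the flattening in a single call. This is a sketch of an alternative, not part of the two-step answer above; the variable names `flat` and `flat_underscore` are only illustrative.

```python
import pandas as pd

ListOfData = [
    {'Name': 'Andrew', 'number': 4,
     'contactinfo': {'Phone': '555-5555', 'Address': '123 Main St'}},
    {'Name': 'Ben', 'number': 6,
     'contactinfo': {'Phone': '555-5554', 'Address': '124 2nd St'}},
    {'Name': 'Cathy', 'number': 1,
     'contactinfo': {'Phone': '555-5556', 'Address': '126 3rd St'}},
]

# Nested keys are joined to their parent key with "." by default,
# giving columns such as "contactinfo.Phone".
flat = pd.json_normalize(ListOfData)
print(flat.columns.tolist())

# `sep` changes the joiner, giving e.g. "contactinfo_Phone".
flat_underscore = pd.json_normalize(ListOfData, sep="_")
print(flat_underscore.columns.tolist())
```

Unlike the `join`/`pop` version, the flattened names keep the 'contactinfo' prefix, which helps when several nested columns share key names.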
