Creating a Pandas Dataframe from List of Dictionaries of Dictionaries
I have a list of dictionaries, where each dictionary represents a record. It is formatted as follows:
>>> ListOfData=[
... {'Name':'Andrew',
... 'number':4,
... 'contactinfo':{'Phone':'555-5555', 'Address':'123 Main St'}},
... {'Name':'Ben',
... 'number':6,
... 'contactinfo':{'Phone':'555-5554', 'Address':'124 2nd St'}},
... {'Name':'Cathy',
... 'number':1,
... 'contactinfo':{'Phone':'555-5556', 'Address':'126 3rd St'}}]
>>>
>>> import pprint
>>> pprint.pprint(ListOfData)
[{'Name': 'Andrew',
  'contactinfo': {'Address': '123 Main St', 'Phone': '555-5555'},
  'number': 4},
 {'Name': 'Ben',
  'contactinfo': {'Address': '124 2nd St', 'Phone': '555-5554'},
  'number': 6},
 {'Name': 'Cathy',
  'contactinfo': {'Address': '126 3rd St', 'Phone': '555-5556'},
  'number': 1}]
>>>
What is the best way to read this into a Pandas dataframe with multiindex columns for those attributes in the sub dictionaries?
For example, I'd ideally have 'Phone' and 'Address' columns nested under the 'contactinfo' column.
I can read in the data as follows, but would like the contact info column to be broken into sub columns.
>>> import pandas as pd
>>> pd.DataFrame.from_dict(ListOfData)
     Name                                        contactinfo  number
0  Andrew  {u'Phone': u'555-5555', u'Address': u'123 Main...       4
1     Ben  {u'Phone': u'555-5554', u'Address': u'124 2nd ...       6
2   Cathy  {u'Phone': u'555-5556', u'Address': u'126 3rd ...       1
>>>
How about this?
Declare an empty data frame:
df = pd.DataFrame(columns=('Name', 'contactinfo', 'number'))
Then iterate over the list and add rows:
for row in ListOfData:
    df.loc[len(df)] = row
Complete code:
import pandas as pd
ListOfData=[
{'Name':'Andrew',
'number':4,
'contactinfo':{'Phone':'555-5555', 'Address':'123 Main St'}},
{'Name':'Ben',
'number':6,
'contactinfo':{'Phone':'555-5554', 'Address':'124 2nd St'}}]
df = pd.DataFrame(columns=('Name', 'contactinfo', 'number'))
for row in ListOfData:
    df.loc[len(df)] = row
print(df)
This prints:
     Name                                      contactinfo number
0  Andrew  {'Phone': '555-5555', 'Address': '123 Main St'}      4
1     Ben   {'Phone': '555-5554', 'Address': '124 2nd St'}      6
Here is a pretty clunky workaround that gets me what I need. I loop through the columns, find those that are made of dicts, split each one into multiple columns, and join the result back onto the dataframe. I'd appreciate hearing any ways to improve this code. I'd imagine that ideally the dataframe would be constructed from the get-go without having dictionaries as values.
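>>> df = pd.DataFrame.from_dict(ListOfData)
>>>
>>> for name, col in df.items():
...     # Only expand columns whose cells hold dicts.
...     if any(isinstance(x, dict) for x in col.tolist()):
...         DividedDict = col.apply(pd.Series)
...         DividedDict.columns = pd.MultiIndex.from_tuples([(name, x) for x in DividedDict.columns.tolist()])
...         # pd.concat stands in for the original df.join, which newer pandas
...         # versions reject when the two frames' columns differ in level count.
...         df = pd.concat([df, DividedDict], axis=1)
...         df.drop(name, axis=1, inplace=True)
...
>>> print(df)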
     Name  number (contactinfo, Address) (contactinfo, Phone)
0  Andrew       4            123 Main St             555-5555
1     Ben       6             124 2nd St             555-5554
2   Cathy       1             126 3rd St             555-5556
>>>
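One possible way to build the frame without dict-valued cells from the start, sketched under a few assumptions: pd.json_normalize (available in pandas 1.0 and later) flattens nested dicts into dotted column names such as 'contactinfo.Phone', and those names can then be split into a two-level MultiIndex. The variable name flat and the use of an empty string as the second level for non-nested fields are illustrative choices, and the split assumes that no key contains a '.' itself.

```python
import pandas as pd

ListOfData = [
    {'Name': 'Andrew', 'number': 4,
     'contactinfo': {'Phone': '555-5555', 'Address': '123 Main St'}},
    {'Name': 'Ben', 'number': 6,
     'contactinfo': {'Phone': '555-5554', 'Address': '124 2nd St'}},
    {'Name': 'Cathy', 'number': 1,
     'contactinfo': {'Phone': '555-5556', 'Address': '126 3rd St'}},
]

# Flatten the nested dicts; nested keys become 'contactinfo.Phone', etc.
flat = pd.json_normalize(ListOfData)

# Split each flattened name on the first '.' into a (top, sub) pair.
# Fields that were not nested get an empty string as their second level
# (an illustrative convention, not the only option).
flat.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split('.', 1)) if '.' in c else (c, '') for c in flat.columns]
)

print(flat)
```

With this layout, flat['contactinfo'] selects the Phone and Address sub-columns together.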
I don't know whether it's the best way, but you could do it in two steps:
>>> df = pd.DataFrame(ListOfData)
>>> df = df.join(pd.DataFrame.from_records(df.pop("contactinfo")))
>>> df
     Name  number      Address     Phone
0  Andrew       4  123 Main St  555-5555
1     Ben       6   124 2nd St  555-5554
2   Cathy       1   126 3rd St  555-5556
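If the sub-columns should stay grouped under 'contactinfo' instead of being flattened, the two-step idea above might be extended roughly as follows. This is only a sketch: the names contact and nested are made up for illustration, the dict column is expanded with .tolist() rather than from_records, and padding the flat columns with an empty second level is one convention among several.

```python
import pandas as pd

ListOfData = [
    {'Name': 'Andrew', 'number': 4,
     'contactinfo': {'Phone': '555-5555', 'Address': '123 Main St'}},
    {'Name': 'Ben', 'number': 6,
     'contactinfo': {'Phone': '555-5554', 'Address': '124 2nd St'}},
    {'Name': 'Cathy', 'number': 1,
     'contactinfo': {'Phone': '555-5556', 'Address': '126 3rd St'}},
]

df = pd.DataFrame(ListOfData)

# Step 1: pop the dict column and expand it into its own frame,
# one column per dict key, keeping the original row index.
contact = pd.DataFrame(df.pop('contactinfo').tolist(), index=df.index)

# Step 2: give both frames two-level columns so they can sit side by side.
# The flat columns get an empty second level (illustrative choice).
df.columns = pd.MultiIndex.from_product([df.columns, ['']])
contact.columns = pd.MultiIndex.from_product([['contactinfo'], contact.columns])

nested = pd.concat([df, contact], axis=1)
print(nested)
```

Selecting nested['contactinfo'] then returns a frame holding just the Phone and Address columns.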