Pandas DataFrame from a list of dictionaries

Question

I have a list of dictionaries that I want to convert to a dataframe. Here's what I'm doing:

comments = getComments(submission) #returns list of dicts
tree = flattenTree(comments) #this just removes indentation from one of the text fields
df = pd.DataFrame(tree)['data']

df.head() returns:

0    {u'subreddit_id': u't5_2qj9g', u'banned_by': N...
1    {u'subreddit_id': u't5_2qj9g', u'banned_by': N...
2    {u'subreddit_id': u't5_2qj9g', u'banned_by': N...
3    {u'subreddit_id': u't5_2qj9g', u'banned_by': N...
4    {u'subreddit_id': u't5_2qj9g', u'banned_by': N...
Name: data, dtype: object

The raw data is a list of nested dictionaries:

[{u'data': {u'approved_by': None,
u'archived': False,
u'author': u'des-tal',
u'controversiality': 0,
...
u'user_reports': []},
u'kind': u't1'},
{u'data': {u'approved_by': None,
u'archived': False,
...

The format I'm looking for is:

which I can get by selecting rows from the dataframe like this:

...
df = pd.DataFrame(tree)['data']
inddf = pd.DataFrame([df[0],df[1],df[3]])
print(inddf)

How can I build the dataframe from all rows of my dataset without selecting each row manually? I was trying to iterate through the index, but I'm sure there's a better way.

Thanks

Answer 1

You can pass a list of dictionaries directly to pd.DataFrame. For example:

my_list = [

{u'data': {u'approved_by': None,
u'archived': False,
u'author': u'des-tal',
u'controversiality': 0,
u'user_reports': []},
u'kind': u't1'},

 {u'data': {u'approved_by': None,
u'archived': True,
u'author': u'des-tal',
u'controversiality': 0,
u'user_reports': []},
u'kind': u't1'}

]

import pandas as pd
df = pd.DataFrame([i['data'] for i in my_list])
print(df.head())

This results in:

  approved_by archived   author  controversiality user_reports
0        None    False  des-tal                 0           []
1        None     True  des-tal                 0           []
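
If the top-level kind field should stay alongside the flattened data fields, pd.json_normalize (available as a top-level function in pandas 1.0 and later) is one alternative to the list comprehension above. The following is a minimal sketch, not part of the original answer; the sample records are a shortened, hypothetical version of the data in the question.

```python
import pandas as pd

# Hypothetical sample shaped like the records in the question (shortened).
my_list = [
    {"data": {"author": "des-tal", "archived": False, "controversiality": 0},
     "kind": "t1"},
    {"data": {"author": "des-tal", "archived": True, "controversiality": 0},
     "kind": "t1"},
]

# Nested keys become dotted column names such as "data.author";
# top-level keys such as "kind" are kept as their own columns.
flat = pd.json_normalize(my_list)

# Optionally strip the "data." prefix so the names match the inner keys.
flat.columns = [
    c[len("data."):] if c.startswith("data.") else c for c in flat.columns
]

print(flat)
```

The list comprehension is simpler when only the inner dictionaries matter; json_normalize is mainly useful when fields from more than one nesting level are needed in the same frame.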

Answer 2

If every dictionary has the same keys, then this should work for what I think you're trying to do.

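import pandas as pd

# Column names come from the first (representative) dictionary.
cols = list(list_of_dicts[0]['data'].keys())
# DataFrame.append returned a new frame instead of modifying df in place
# (and was removed in pandas 2.0), so collect the rows and build the frame once.
rows = []
for d in list_of_dicts:
    rows.append(d['data'])
df = pd.DataFrame(rows, columns=cols)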

If not, make sure you use a representative dictionary to initialize the dataframe. A little slow because it's in a for loop, but should do the trick.
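
To illustrate the point about differing keys, here is a small sketch of how pd.DataFrame behaves when the dictionaries do not all share the same keys. The records and the score field are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical records: the second one lacks "archived" and adds "score".
list_of_dicts = [
    {"data": {"author": "a", "archived": False}, "kind": "t1"},
    {"data": {"author": "b", "score": 3}, "kind": "t1"},
]

rows = [d["data"] for d in list_of_dicts]

# With no columns argument, the frame gets the union of all keys;
# values missing from a record are filled with NaN.
df_all = pd.DataFrame(rows)

# Passing columns restricts the frame to the keys of one representative record.
df_first = pd.DataFrame(rows, columns=list(rows[0].keys()))

print(df_all)
print(df_first)
```

So a representative dictionary is only needed when the columns should be restricted to a known set; otherwise pandas collects every key it sees.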

Answer 1: 2 votes, accepted, 2017-03-22 23:57:51

Answer 2: 1 vote, 2017-03-22 23:48:47
