Converting a Pandas DataFrame to a Python list

Question

I have the following DataFrame:

In [137]: counts
Out[137]: 
SourceColumnID                    3029903181  3029903182  3029903183  3029903184  ResponseCount
ColID      QuestionID RowID                                                                    
3029903193 316923119  3029903189         773         788         778         803           3142
3029903194 316923119  3029903189         766         799         782         773           3120

[2 rows x 5 columns]

That works well for what I want when I access it via iloc:

In [138]: counts.iloc[0][3029903181]
Out[138]: 773

But when I convert it to a dict, the result is structured in a way that I can no longer access the same way:

In [139]: counts.to_dict()
Out[139]: 
{3029903181: {(3029903193, 316923119, 3029903189): 773,
  (3029903194, 316923119, 3029903189): 766},
 3029903182: {(3029903193, 316923119, 3029903189): 788,
  (3029903194, 316923119, 3029903189): 799},
 3029903183: {(3029903193, 316923119, 3029903189): 778,
  (3029903194, 316923119, 3029903189): 782},
 3029903184: {(3029903193, 316923119, 3029903189): 803,
  (3029903194, 316923119, 3029903189): 773},
 'ResponseCount': {(3029903193, 316923119, 3029903189): 3142,
  (3029903194, 316923119, 3029903189): 3120}}

In [140]: counts.to_dict('list')
Out[140]: 
{3029903181: [773, 766],
 3029903182: [788, 799],
 3029903183: [778, 782],
 3029903184: [803, 773],
 'ResponseCount': [3142, 3120]}

I need to convert this data structure to a standard Python object that I can return from an API for clients to consume.

Should I have created the table in a different format?

I started with this DataFrame:

In [141]: df
Out[141]: 
        ColID  QuestionID  ResponseCount       RowID  SourceColumnID
0  3029903193   316923119            773  3029903189      3029903181
1  3029903193   316923119            788  3029903189      3029903182
2  3029903193   316923119            778  3029903189      3029903183
3  3029903193   316923119            803  3029903189      3029903184
4  3029903194   316923119            766  3029903189      3029903181
5  3029903194   316923119            799  3029903189      3029903182
6  3029903194   316923119            782  3029903189      3029903183
7  3029903194   316923119            773  3029903189      3029903184

[8 rows x 5 columns]

and converted it to a pivot table like this:

counts = df.pivot_table(values='ResponseCount', index=['ColID', 'QuestionID', 'RowID'], columns='SourceColumnID', aggfunc='sum')

What I'm really looking for is a data structure that comes out like this:

[
  {
    'QuestionID': 316923119, 
    'RowID': 3029903189, 
    'ColID': 3029903193, 
    '3029903181': 773,
    '3029903182': 788,
    '3029903183': 778,
    '3029903184': 803,
    'ResponseCount': 3142
  },
  {
    'QuestionID': 316923119, 
    'RowID': 3029903189, 
    'ColID': 3029903194, 
    '3029903181': 766,
    '3029903182': 799,
    '3029903183': 782,
    '3029903184': 773,
    'ResponseCount': 3120
  },
]

Answer 1

I believe you want counts.reset_index().to_dict('records').

Passing 'records' to to_dict gives you a list of dicts, one dict per row, which is what you want. You need reset_index() to bring the index information in as columns, because 'records' throws away the index. Conceptually, the dicts you say you want don't distinguish between what's in the index of your pivot table and what's in the columns (you just want all index and column labels as keys in the dict), so you need reset_index to remove the index/column distinction.
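To make this concrete, here is a minimal sketch that rebuilds the data from the question and applies the suggestion. Some of it is assumption rather than something the thread shows: the pivot uses the `index=` and `columns=` keyword names of current pandas, the `ResponseCount` total is computed with `sum(axis=1)` (the question does not say how that column was added), and the names `without_index`, `flat` and `records` are only illustrative. The string conversion of the column labels is optional and is there only because the desired output in the question uses string keys such as '3029903181'.

```python
import pandas as pd

# Rebuild the long-format frame shown in the question.
df = pd.DataFrame({
    "ColID": [3029903193] * 4 + [3029903194] * 4,
    "QuestionID": [316923119] * 8,
    "ResponseCount": [773, 788, 778, 803, 766, 799, 782, 773],
    "RowID": [3029903189] * 8,
    "SourceColumnID": [3029903181, 3029903182, 3029903183, 3029903184] * 2,
})

# Pivot so that each SourceColumnID becomes its own column.
counts = df.pivot_table(
    values="ResponseCount",
    index=["ColID", "QuestionID", "RowID"],
    columns="SourceColumnID",
    aggfunc="sum",
)

# Assumption: the per-row total is the sum of the pivoted columns.
counts["ResponseCount"] = counts.sum(axis=1)

# 'records' alone drops the index, so ColID/QuestionID/RowID are missing.
without_index = counts.to_dict("records")

# reset_index() turns the index levels into ordinary columns first.
flat = counts.reset_index()

# Optional: make every key a string, as in the question's desired output.
flat.columns = [str(c) for c in flat.columns]

records = flat.to_dict("records")
print(records)
```

Each element of `records` is then a plain dict with both the former index levels and the pivoted columns as keys, while `without_index` contains only the pivoted columns.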

Answer 1 (2, accepted, 2014-02-13 05:11:23)