Convert Pandas DataFrame to Python list
I have the following DataFrame:
In [137]: counts
Out[137]:
SourceColumnID 3029903181 3029903182 3029903183 3029903184 ResponseCount
ColID QuestionID RowID
3029903193 316923119 3029903189 773 788 778 803 3142
3029903194 316923119 3029903189 766 799 782 773 3120
[2 rows x 5 columns]
This works well for what I want when I access it via iloc:
In [138]: counts.iloc[0][3029903181]
Out[138]: 773
but when I convert it to a dict, the result is laid out in a way that can no longer be accessed the same way:
In [139]: counts.to_dict()
Out[139]:
{3029903181: {(3029903193, 316923119, 3029903189): 773,
(3029903194, 316923119, 3029903189): 766},
3029903182: {(3029903193, 316923119, 3029903189): 788,
(3029903194, 316923119, 3029903189): 799},
3029903183: {(3029903193, 316923119, 3029903189): 778,
(3029903194, 316923119, 3029903189): 782},
3029903184: {(3029903193, 316923119, 3029903189): 803,
(3029903194, 316923119, 3029903189): 773},
'ResponseCount': {(3029903193, 316923119, 3029903189): 3142,
(3029903194, 316923119, 3029903189): 3120}}
In [140]: counts.to_dict('list')
Out[140]:
{3029903181: [773, 766],
3029903182: [788, 799],
3029903183: [778, 782],
3029903184: [803, 773],
'ResponseCount': [3142, 3120]}
I need to convert this data structure to a standard Python object that an API can return to its consumers.
Should I have created the table in a different format?
I started with this DataFrame:
In [141]: df
Out[141]:
ColID QuestionID ResponseCount RowID SourceColumnID
0 3029903193 316923119 773 3029903189 3029903181
1 3029903193 316923119 788 3029903189 3029903182
2 3029903193 316923119 778 3029903189 3029903183
3 3029903193 316923119 803 3029903189 3029903184
4 3029903194 316923119 766 3029903189 3029903181
5 3029903194 316923119 799 3029903189 3029903182
6 3029903194 316923119 782 3029903189 3029903183
7 3029903194 316923119 773 3029903189 3029903184
[8 rows x 5 columns]
and converted it to a pivot table like this:
counts = df.pivot_table(values='ResponseCount', rows=['ColID', 'QuestionID', 'RowID'], cols='SourceColumnID', aggfunc='sum')
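For readers on newer pandas releases: the rows and cols keywords used above were later replaced by index and columns. The sketch below rebuilds the starting DataFrame from the question and calls pivot_table with the newer keyword names. The question does not show how the ResponseCount total column in counts was produced, so the final line (a per-row sum) is only an assumption that happens to reproduce the totals shown.

```python
import pandas as pd

# The long-format DataFrame from the question.
df = pd.DataFrame({
    "ColID": [3029903193] * 4 + [3029903194] * 4,
    "QuestionID": [316923119] * 8,
    "ResponseCount": [773, 788, 778, 803, 766, 799, 782, 773],
    "RowID": [3029903189] * 8,
    "SourceColumnID": [3029903181, 3029903182, 3029903183, 3029903184] * 2,
})

# Same pivot as in the question, with index=/columns= instead of rows=/cols=.
counts = df.pivot_table(
    values="ResponseCount",
    index=["ColID", "QuestionID", "RowID"],
    columns="SourceColumnID",
    aggfunc="sum",
)

# Assumption: the ResponseCount column shown in the question is a row total.
counts["ResponseCount"] = counts.sum(axis=1)

print(counts)
```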
I'm really looking for the data structure to come out looking like this:
[
{
'QuestionID': 316923119,
'RowID': 3029903189,
'ColID': 3029903193,
'3029903181': 773,
'3029903182': 788,
'3029903183': 778,
'3029903184': 803,
'ResponseCount': 3142
},
{
'QuestionID': 316923119,
'RowID': 3029903189,
'ColID': 3029903194,
'3029903181': 766,
'3029903182': 799,
'3029903183': 782,
'3029903184': 773,
'ResponseCount': 3120
},
]
I believe you want counts.reset_index().to_dict('records').
Using 'records' with to_dict gives you a list of dicts, one dict per row, which is what you want. You need to use reset_index() to bring the index information in as columns, because 'records' throws away the index. Conceptually, the dicts you say you want don't distinguish between what's in the index of your pivot table and what's in the columns (you just want all index and column labels as keys in the dict), so you need reset_index to remove the index/column distinction.
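As a minimal sketch of the approach above, the snippet below builds a small frame shaped like counts directly (the MultiIndex construction is only for illustration, not how the question created it) and flattens it into a list of dicts. The desired output in the question uses string keys such as '3029903181', so the column labels are converted with str first. That conversion and the to_json call at the end are additions of this sketch, not part of the original answer.

```python
import json

import pandas as pd

# A frame shaped like `counts` in the question: three index levels,
# integer column labels plus a 'ResponseCount' total column.
index = pd.MultiIndex.from_tuples(
    [
        (3029903193, 316923119, 3029903189),
        (3029903194, 316923119, 3029903189),
    ],
    names=["ColID", "QuestionID", "RowID"],
)
counts = pd.DataFrame(
    {
        3029903181: [773, 766],
        3029903182: [788, 799],
        3029903183: [778, 782],
        3029903184: [803, 773],
        "ResponseCount": [3142, 3120],
    },
    index=index,
)

# Move the index levels into ordinary columns.
flat = counts.reset_index()

# Optional: make every key a string, as in the desired output.
flat.columns = [str(label) for label in flat.columns]

# One dict per row.
records = flat.to_dict("records")
print(records)

# If the API ultimately needs a JSON string, to_json can emit the same layout.
payload = flat.to_json(orient="records")
print(json.loads(payload) == records)
```

Skipping the str conversion keeps the SourceColumnID keys as integers, which is fine for plain Python consumers but may matter if the result is later serialized to a format that requires string keys.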