Convert ndarray to dict in python3
I have an ndarray that looks like this:
LABEL1 99 113 2010-04-26 20:12:23+00:00
LABEL1 29 143 2010-05-06 20:12:23+00:00
LABEL1 99 323 2010-02-12 20:12:23+00:00
LABEL1 23 223 2010-04-25 20:12:23+00:00
LABEL2 23 23 2010-01-21 20:12:23+00:00
LABEL1 234 123 2010-12-26 20:12:23+00:00
LABEL1 93 133 2010-02-23 20:12:23+00:00
LABEL4 19 1223 2010-07-24 20:12:23+00:00
I need to do some aggregation and return the result as a dict.
What I should get at the end is similar to this:
[
{ 'LABEL1': { 'COLA':577, 'COLB': 1058, 'LAST': '2010-12-26 20:12:23+00:00' } },
{ 'LABEL2': { 'COLA':23, 'COLB': 23, 'LAST': '2010-01-21 20:12:23+00:00' } },
{ 'LABEL4': { 'COLA':19, 'COLB':1223, 'LAST': '2010-07-24 20:12:23+00:00' } }
]
My plan was to convert it to a DataFrame and then do a groupby().agg(...):
aggr = select_df.groupby('LABELS').agg({'LABELS': [('LABELS', 'max')], 'COLA': [('COLA', 'sum'), ('COLB', 'count')], {'LAST': [('LAST', 'max')]})
I'm kind of new to Python, and I'm having a nightmare with all the data conversion required to do this.
The original structure is a list:
[
{ 'Label': 'xxxx', 'LABELS': 'xxxx', 'COLA': ##, 'COLB': ##, 'LAST': 'datetime' },...
]
Ideally I could aggregate this list directly and then combine the result with the next pass (the list is read in chunks) to end up with a final list like the one shown above.
First convert it into a DataFrame:
df:
0 1 2 3 4
0 LABEL1 29 143 2010-05-06 20:12:23+00:00
1 LABEL1 99 323 2010-02-12 20:12:23+00:00
2 LABEL1 23 223 2010-04-25 20:12:23+00:00
3 LABEL2 23 23 2010-01-21 20:12:23+00:00
4 LABEL1 234 123 2010-12-26 20:12:23+00:00
5 LABEL1 93 133 2010-02-23 20:12:23+00:00
6 LABEL4 19 1223 2010-07-24 20:12:23+00:00
df.columns = ['label','x','y','z','w']
df.set_index('label').T.to_dict('dict')
result:
{'LABEL1': {'x': 93, 'y': 133, 'z': '2010-02-23', 'w': '20:12:23+00:00'},
'LABEL2': {'x': 23, 'y': 23, 'z': '2010-01-21', 'w': '20:12:23+00:00'},
'LABEL4': {'x': 19, 'y': 1223, 'z': '2010-07-24', 'w': '20:12:23+00:00'}}
Edit: Then group by label and aggregate with sum and max:
df.groupby(["label"])\
.agg({"x": "sum", "y": "sum", "z": "max", "w": "max"}).T.to_dict('dict')
result:
{'LABEL1': {'x': 478, 'y': 945, 'z': '2010-12-26', 'w': '20:12:23+00:00'},
'LABEL2': {'x': 23, 'y': 23, 'z': '2010-01-21', 'w': '20:12:23+00:00'},
'LABEL4': {'x': 19, 'y': 1223, 'z': '2010-07-24', 'w': '20:12:23+00:00'}}
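As a variation on the steps above, here is a minimal sketch that keeps the timestamp in a single column instead of splitting it into a date and a time, so that max always returns a date and time taken from the same row. It assumes the ndarray is a 2-D object array with four columns; the names arr, label, x, y and last are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the question's rows as a 2-D object array,
# with the timestamp kept as one string per row.
arr = np.array(
    [
        ["LABEL1", 99, 113, "2010-04-26 20:12:23+00:00"],
        ["LABEL1", 29, 143, "2010-05-06 20:12:23+00:00"],
        ["LABEL1", 99, 323, "2010-02-12 20:12:23+00:00"],
        ["LABEL1", 23, 223, "2010-04-25 20:12:23+00:00"],
        ["LABEL2", 23, 23, "2010-01-21 20:12:23+00:00"],
        ["LABEL1", 234, 123, "2010-12-26 20:12:23+00:00"],
        ["LABEL1", 93, 133, "2010-02-23 20:12:23+00:00"],
        ["LABEL4", 19, 1223, "2010-07-24 20:12:23+00:00"],
    ],
    dtype=object,
)

df = pd.DataFrame(arr, columns=["label", "x", "y", "last"])

# Columns built from an object array are object-typed, so cast them explicitly.
df = df.astype({"x": int, "y": int})
df["last"] = pd.to_datetime(df["last"])

# Sum the numeric columns and keep the latest timestamp per label.
result = (
    df.groupby("label")
    .agg({"x": "sum", "y": "sum", "last": "max"})
    .to_dict("index")
)
print(result)
```

With this approach the "last" values come back as pandas Timestamp objects; call str() on them if plain strings are needed.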
Your attempt was pretty close.
Code:
import pandas as pd
input = [
{"LABELS": "LABEL1", "COLA": 99, "COLB": 113, "LAST": "2010-04-26 20:12:23+00:00"},
{"LABELS": "LABEL1", "COLA": 29, "COLB": 143, "LAST": "2010-05-06 20:12:23+00:00"},
{"LABELS": "LABEL1", "COLA": 99, "COLB": 323, "LAST": "2010-02-12 20:12:23+00:00"},
{"LABELS": "LABEL1", "COLA": 23, "COLB": 223, "LAST": "2010-04-25 20:12:23+00:00"},
{"LABELS": "LABEL2", "COLA": 23, "COLB": 23, "LAST": "2010-01-21 20:12:23+00:00"},
{"LABELS": "LABEL1", "COLA": 234, "COLB": 123, "LAST": "2010-12-26 20:12:23+00:00"},
{"LABELS": "LABEL1", "COLA": 93, "COLB": 133, "LAST": "2010-02-23 20:12:23+00:00"},
{"LABELS": "LABEL4", "COLA": 19, "COLB": 1223, "LAST": "2010-07-24 20:12:23+00:00"},
]
df = (
pd.DataFrame(input)
.groupby(["LABELS"])
.agg({"COLA": "sum", "COLB": "sum", "LAST": "max"})
)
print(df.to_dict("index"))
Output:
{'LABEL1': {'COLA': 577, 'COLB': 1058, 'LAST': '2010-12-26 20:12:23+00:00'}, 'LABEL2': {'COLA': 23, 'COLB': 23, 'LAST': '2010-01-21 20:12:23+00:00'}, 'LABEL4': {'COLA': 19, 'COLB': 1223, 'LAST': '2010-07-24 20:12:23+00:00'}}
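The question also mentions that the list is read in chunks and that the final result should be a list of single-key dicts. The following is a rough sketch of one way that might be handled, not something taken from the original answers. The helper aggregate_chunk and the split into chunk1 and chunk2 are hypothetical, and it assumes every LAST string uses the same ISO-style format and UTC offset, so that comparing the strings gives the same order as comparing the timestamps.

```python
import pandas as pd

AGG = {"COLA": "sum", "COLB": "sum", "LAST": "max"}


def aggregate_chunk(records):
    """Aggregate one chunk (a list of dicts) into one row per label."""
    return pd.DataFrame(records).groupby("LABELS").agg(AGG)


# Hypothetical chunks: the question's data split into two passes.
chunk1 = [
    {"LABELS": "LABEL1", "COLA": 99, "COLB": 113, "LAST": "2010-04-26 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 29, "COLB": 143, "LAST": "2010-05-06 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 99, "COLB": 323, "LAST": "2010-02-12 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 23, "COLB": 223, "LAST": "2010-04-25 20:12:23+00:00"},
]
chunk2 = [
    {"LABELS": "LABEL2", "COLA": 23, "COLB": 23, "LAST": "2010-01-21 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 234, "COLB": 123, "LAST": "2010-12-26 20:12:23+00:00"},
    {"LABELS": "LABEL1", "COLA": 93, "COLB": 133, "LAST": "2010-02-23 20:12:23+00:00"},
    {"LABELS": "LABEL4", "COLA": 19, "COLB": 1223, "LAST": "2010-07-24 20:12:23+00:00"},
]

# A sum of partial sums is the total sum, and a max of partial maxima is the
# overall max, so the same aggregation can be applied to the stacked partials.
partials = pd.concat([aggregate_chunk(chunk) for chunk in (chunk1, chunk2)])
totals = partials.groupby(level=0).agg(AGG)

# Reshape into one single-key dict per label, as shown in the question.
result = [{label: row} for label, row in totals.to_dict("index").items()]
print(result)
```

This only works because sum and max can be recombined from partial results; an aggregation such as a mean would need the per-chunk counts carried along as well.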