Creating a hierarchical dictionary from a DataFrame column
I have a DataFrame like this:
+------------+------------+-------+
| Date       | Total Cars | Lanes |
+------------+------------+-------+
| 2019-10-20 |          5 |     2 |
| 2019-10-23 |         15 |     3 |
| 2020-01-20 |         23 |     2 |
+------------+------------+-------+
I want to return a dictionary like this:
{
    "y2019": {
        "Year Total Cars": 20,
        "Year Total Lanes": 5,
        "M10": {
            "Month All Cars": 20,
            "Month All Lanes": 5,
            "Day 20": {
                "Day All Cars": 5,
                "Day All Lanes": 2
            },
            "Day 23": {
                "Day All Cars": 15,
                "Day All Lanes": 3
            }
        }
    },
    "y2020": {
        "Year Total Cars": 23,
        "Year Total Lanes": 2,
        "M01": {
            "Month All Cars": 23,
            "Month All Lanes": 2,
            "Day 20": {
                "Day All Cars": 23,
                "Day All Lanes": 2
            }
        }
    }
}
So far I have tried df.resample and then building a nested dict from its output, but without success. Is there a more elegant way to tackle this in Pandas?
I also tried unpacking the dates into year, month and day columns and adding them to the DataFrame. Then I created a groupby object to iterate over:
date_counter_df.loc[:, 'year'] = [x.year for x in date_counter_df.index]
date_counter_df.loc[:, 'month'] = [x.month for x in date_counter_df.index]
date_counter_df.loc[:, 'day'] = [x.day for x in date_counter_df.index]
result = {}
for to_unpack, df_res in date_counter_df.groupby(['year', 'month', 'day']):
    year, month, day = to_unpack
    try:
        result[year]
    except KeyError:
        result[year] = {}
    try:
        result[year][month]
    except KeyError:
        result[year][month] = {}
    try:
        result[year][month][day]
    except KeyError:
        result[year][month][day] = {}
    result[year][month][day] = df_res
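The three try/except KeyError blocks above can be collapsed with dict.setdefault, and the helper year/month/day columns are not strictly needed if you group on the index attributes directly. A minimal sketch, assuming date_counter_df has a DatetimeIndex; the sample values come from the small table at the top, not from the original dataset:

```python
import pandas as pd

# Sample data from the question, indexed by date (assumed layout of date_counter_df)
date_counter_df = pd.DataFrame(
    {"Total Cars": [5, 15, 23], "Lanes": [2, 3, 2]},
    index=pd.to_datetime(["2019-10-20", "2019-10-23", "2020-01-20"]),
)

idx = date_counter_df.index
result = {}
# Group on the index's year/month/day instead of adding helper columns
for (year, month, day), df_res in date_counter_df.groupby([idx.year, idx.month, idx.day]):
    # setdefault creates each nested level only when it is missing
    result.setdefault(year, {}).setdefault(month, {})[day] = df_res
```

The leaves are still one-row DataFrames, exactly as in the loop above; only the bookkeeping is shorter.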
With the original dataset, the result looks like this:
{2019: {10: {17: sensor-id Totalt i retning Kristiansand Totalt i retning Oslo \
Date
2019-10-17 11219V22151 0 0
1 2 3 4 Totalt Totalt i retning Fianex Rv 415 \
Date
2019-10-17 2702 2615 0 0 5317 2614
Totalt i retning Stølen X Rv 420 year month day
Date
2019-10-17 2703 2019 10 17 ,
30: sensor-id Totalt i retning Kristiansand Totalt i retning Oslo \
Date
2019-10-30 11219V22151 0 0
1 2 3 4 Totalt Totalt i retning Fianex Rv 415 \
Date
2019-10-30 2729 2589 0 0 5318 2589
Totalt i retning Stølen X Rv 420 year month day
Date
2019-10-30 2729 2019 10 30 },
12: {28: sensor-id Totalt i retning Kristiansand Totalt i retning Oslo \
Date
2019-12-28 61942V2809673 3134 3461
1 2 3 4 Totalt Totalt i retning Fianex Rv 415 \
Date
2019-12-28 333 494 2801 2967 6595 0
Totalt i retning Stølen X Rv 420 year month day
Date
2019-12-28 0 2019 12 28 }},
2020: {2: {19: sensor-id Totalt i retning Kristiansand Totalt i retning Oslo \
Date
2020-02-19 71445V2809674 5006 5202
1 2 3 4 Totalt Totalt i retning Fianex Rv 415 \
Date
2020-02-19 686 747 4320 4455 10208 0
Totalt i retning Stølen X Rv 420 year month day
Date
2020-02-19 0 2020 2 19 }}}
Now I only need to add the yearly and monthly totals as dict items.
Start by converting the Date column to datetime (if it is still of string type):
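One way to add the missing totals to a year/month/day dict like the one above is to compute them once with groupby and write them into each level. This is a sketch under assumptions: the data is the small sample table indexed by date, the key names mirror the desired output, and the nested dict is rebuilt with dict.setdefault only so that the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    {"Total Cars": [5, 15, 23], "Lanes": [2, 3, 2]},
    index=pd.to_datetime(["2019-10-20", "2019-10-23", "2020-01-20"]),
)
idx = df.index

# Nested year -> month -> day dict with one-row DataFrames at the leaves
result = {}
for (year, month, day), day_df in df.groupby([idx.year, idx.month, idx.day]):
    result.setdefault(year, {}).setdefault(month, {})[day] = day_df

# Yearly and monthly sums, computed once
year_totals = df.groupby(idx.year).sum()
month_totals = df.groupby([idx.year, idx.month]).sum()

for year, year_dict in result.items():
    # Take a snapshot of the month keys before adding the string total keys
    for month in list(year_dict):
        month_dict = year_dict[month]
        month_dict["Month All Cars"] = int(month_totals.loc[(year, month), "Total Cars"])
        month_dict["Month All Lanes"] = int(month_totals.loc[(year, month), "Lanes"])
    year_dict["Year Total Cars"] = int(year_totals.loc[year, "Total Cars"])
    year_dict["Year Total Lanes"] = int(year_totals.loc[year, "Lanes"])
```

Mixing integer keys (months, days) with string keys (totals) in the same dict works, but the string-keyed layout used in the answer is easier to read.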
df.Date = pd.to_datetime(df.Date)
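To check this step in isolation, here is a small sketch that builds the sample table with Date as strings (an assumption about how the data was loaded) and converts it. Once the column is datetime, the .dt accessor exposes the year, month and day used for grouping below:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-10-20", "2019-10-23", "2020-01-20"],
    "Total Cars": [5, 15, 23],
    "Lanes": [2, 3, 2],
})
df.Date = pd.to_datetime(df.Date)

# The .dt accessor only works on datetime-like columns
print(df.Date.dt.year.tolist())   # [2019, 2019, 2020]
print(df.Date.dt.month.tolist())  # [10, 10, 1]
print(df.Date.dt.day.tolist())    # [20, 23, 20]
```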
Then define three functions:
A function to process each row of a monthly group:
def rowFn(row):
    return {'Day All Cars': row['Total Cars'], 'Day All Lanes': row.Lanes}
A function to process each monthly group:
def monthGrpFun(grp):
    # Monthly totals first
    dct = {
        'Month All Cars': grp['Total Cars'].sum(),
        'Month All Lanes': grp.Lanes.sum()
    }
    # Then one entry per row (day) of this month
    for _, row in grp.iterrows():
        dct[f'Day {row.Date.day:02}'] = rowFn(row)
    return dct
A function to process each yearly group:
def yearGrpFun(grp):
    # Yearly totals first
    dct = {
        'Year Total Cars': grp['Total Cars'].sum(),
        'Year Total Lanes': grp.Lanes.sum()
    }
    # Then one nested dict per month of this year
    for key, grp2 in grp.groupby(grp.Date.dt.month):
        dct[f'M{key:02}'] = monthGrpFun(grp2)
    return dct
To get the result, run:
dct = {}
for key, grp in df.groupby(df.Date.dt.year):
    dct[f'y{key}'] = yearGrpFun(grp)
The result for your data (reformatted for readability) is:
{
'y2019': {
'Year Total Cars': 20,
'Year Total Lanes': 5,
'M10': {
'Month All Cars': 20,
'Month All Lanes': 5,
'Day 20': {'Day All Cars': 5, 'Day All Lanes': 2},
'Day 23': {'Day All Cars': 15, 'Day All Lanes': 3}
}
},
'y2020': {
'Year Total Cars': 23,
'Year Total Lanes': 2,
'M01': {
'Month All Cars': 23,
'Month All Lanes': 2,
'Day 20': {'Day All Cars': 23, 'Day All Lanes': 2}
}
}
}
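A possible variation, not part of the answer above: also group by day, so that several rows on the same date are summed (in the iterrows loop a later row with the same day overwrites the earlier one), and wrap each sum in int(), because Series.sum() returns NumPy integers that json.dumps cannot serialize. The helper names totals and build_hierarchy are hypothetical:

```python
import json

import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-10-20", "2019-10-23", "2020-01-20"]),
    "Total Cars": [5, 15, 23],
    "Lanes": [2, 3, 2],
})

def totals(frame, cars_key, lanes_key):
    # int() turns NumPy integers into plain Python ints (JSON-serializable)
    return {cars_key: int(frame["Total Cars"].sum()),
            lanes_key: int(frame["Lanes"].sum())}

def build_hierarchy(df):
    out = {}
    for year, ydf in df.groupby(df.Date.dt.year):
        ydict = totals(ydf, "Year Total Cars", "Year Total Lanes")
        for month, mdf in ydf.groupby(ydf.Date.dt.month):
            mdict = totals(mdf, "Month All Cars", "Month All Lanes")
            # Grouping by day sums duplicate dates instead of overwriting them
            for day, ddf in mdf.groupby(mdf.Date.dt.day):
                mdict[f"Day {day:02}"] = totals(ddf, "Day All Cars", "Day All Lanes")
            ydict[f"M{month:02}"] = mdict
        out[f"y{year}"] = ydict
    return out

result = build_hierarchy(df)
print(json.dumps(result, indent=2))
```

For the sample data this yields the same structure as the output shown above.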