Creating a hierarchical dictionary from DataFrame column

I have a dataframe like this:

+------------+------------+-------+
|    Date    | Total Cars | Lanes |  
+------------+------------+-------+
| 2019-10-20 |          5 |     2 |  
| 2019-10-23 |         15 |     3 |
| 2020-01-20 |         23 |     2 |
+------------+------------+-------+

I want to return a dictionary like this:

{
    "y2019":{
        "Year Total Cars":20,
        "Year Total Lanes":5,
        "M10":{
            "Month All Cars":20,
            "Month All Lanes":5,
            "Day 20":{
                "Day All Cars":5,
                "Day All Lanes":2
            },
            "Day 23":{
                "Day All Cars":15,
                "Day All Lanes":3
            }
        }
    },
    "y2020":{
        "Year Total Cars":23,
        "Year Total Lanes":2,
        "M01":{
            "Month All Cars":23,
            "Month All Lanes":2,
            "Day 20":{
                "Day All Cars":23,
                "Day All Lanes":2
            }
        }
    }
}

So far I have tried df.resample and tried to build a nested dict from the result, but without success. Is there a more elegant way to tackle this in pandas?

I also tried unpacking the dates into year, month, and day and adding these as columns to the dataframe. Then I created a groupby object to iterate through.

# Split the DatetimeIndex into separate year / month / day columns
date_counter_df['year'] = date_counter_df.index.year
date_counter_df['month'] = date_counter_df.index.month
date_counter_df['day'] = date_counter_df.index.day

result = {}
for (year, month, day), df_res in date_counter_df.groupby(['year', 'month', 'day']):
    # Create the year and month levels on first use, then store that day's rows
    result.setdefault(year, {}).setdefault(month, {})[day] = df_res

The result for the original dataset looks like this:

{2019: {10: {17:               sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                          
   2019-10-17  11219V22151                             0                     0   

                  1     2  3  4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                 
   2019-10-17  2702  2615  0  0   5317                           2614   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-10-17                             2703  2019     10   17  ,
   30:               sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                          
   2019-10-30  11219V22151                             0                     0   

                  1     2  3  4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                 
   2019-10-30  2729  2589  0  0   5318                           2589   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-10-30                             2729  2019     10   30  },
  12: {28:                 sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                            
   2019-12-28  61942V2809673                          3134                  3461   

                 1    2     3     4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                     
   2019-12-28  333  494  2801  2967   6595                              0   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-12-28                                0  2019     12   28  }},
 2020: {2: {19:                 sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                            
   2020-02-19  71445V2809674                          5006                  5202   

                 1    2     3     4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                     
   2020-02-19  686  747  4320  4455  10208                              0   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2020-02-19                                0  2020      2   19  }}}

Now I only need to add the totals for the years and months as dict items.
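One possible way to add those totals is sketched below. It is only an illustration under assumptions: `date_counter_df` is replaced by a small stand-in frame with the `Total Cars` and `Lanes` columns from the simplified table (the real sensor columns are named differently), and the total labels are borrowed from the desired output.

```python
import pandas as pd

# Stand-in for date_counter_df: a date index plus the two columns
# from the simplified table (assumed, not the real sensor columns)
date_counter_df = pd.DataFrame(
    {'Total Cars': [5, 15, 23], 'Lanes': [2, 3, 2]},
    index=pd.to_datetime(['2019-10-20', '2019-10-23', '2020-01-20']),
)
idx = date_counter_df.index

# Same nesting as above: year -> month -> day -> rows for that day
result = {}
for (year, month, day), df_res in date_counter_df.groupby([idx.year, idx.month, idx.day]):
    result.setdefault(int(year), {}).setdefault(int(month), {})[int(day)] = df_res

# Attach the yearly totals next to the month keys
for year, grp in date_counter_df.groupby(idx.year):
    result[int(year)]['Year Total Cars'] = int(grp['Total Cars'].sum())
    result[int(year)]['Year Total Lanes'] = int(grp['Lanes'].sum())

# Attach the monthly totals next to the day keys
for (year, month), grp in date_counter_df.groupby([idx.year, idx.month]):
    month_dict = result[int(year)][int(month)]
    month_dict['Month All Cars'] = int(grp['Total Cars'].sum())
    month_dict['Month All Lanes'] = int(grp['Lanes'].sum())
```

Note that this mixes integer keys (months, days) with string keys (the totals) at the same level, so code that walks the dict has to tell them apart.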

Start by converting the Date column to datetime (if it is still of string type):

df.Date = pd.to_datetime(df.Date)
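For anyone who wants to reproduce this step, here is a minimal sketch that builds the sample frame from the question and applies the conversion. The frame construction is an assumption based on the table shown in the question. Bracket access is used instead of attribute access, which is equivalent for an existing column.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'Date': ['2019-10-20', '2019-10-23', '2020-01-20'],
    'Total Cars': [5, 15, 23],
    'Lanes': [2, 3, 2],
})

# Parse the strings into datetime values so the .dt accessor works
df['Date'] = pd.to_datetime(df['Date'])

print(df.dtypes)
```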

Then define three functions:

  1. A function to process each row from a monthly group:

     def rowFn(row):
         return {
             'Day All Cars': row['Total Cars'],
             'Day All Lanes': row.Lanes
         }

  2. A function to process each monthly group:

     def monthGrpFun(grp):
         dct = {
             'Month All Cars': grp['Total Cars'].sum(),
             'Month All Lanes': grp.Lanes.sum()
         }
         for _, row in grp.iterrows():
             dct[f'Day {row.Date.day:02}'] = rowFn(row)
         return dct

  3. A function to process each yearly group:

     def yearGrpFun(grp):
         dct = {
             'Year Total Cars': grp['Total Cars'].sum(),
             'Year Total Lanes': grp.Lanes.sum()
         }
         for key, grp2 in grp.groupby(grp.Date.dt.month):
             dct[f'M{key:02}'] = monthGrpFun(grp2)
         return dct

To get the result, run:

dct = {}
for key, grp in df.groupby(df.Date.dt.year):
    dct[f'y{key}'] = yearGrpFun(grp)

The result for your data (reformatted for readability) is:

{
  'y2019': {
    'Year Total Cars': 20,
    'Year Total Lanes': 5,
    'M10': {
      'Month All Cars': 20,
      'Month All Lanes': 5,
      'Day 20': {'Day All Cars': 5, 'Day All Lanes': 2},
      'Day 23': {'Day All Cars': 15, 'Day All Lanes': 3}
    }
  },
  'y2020': {
    'Year Total Cars': 23,
    'Year Total Lanes': 2,
    'M01': {
      'Month All Cars': 23,
      'Month All Lanes': 2,
      'Day 20': {'Day All Cars': 23, 'Day All Lanes': 2}
    }
  }
}
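If the dictionary is meant to be serialized, note that the sums returned by pandas are NumPy integers, which `json.dumps` rejects. Below is a self-contained sketch of a variant that casts them to plain `int`. The helper names `totals` and `build_nested` are hypothetical, and unlike the row-by-row version above it sums per day, so several rows sharing a date are added together rather than overwriting each other.

```python
import json
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-10-20', '2019-10-23', '2020-01-20']),
    'Total Cars': [5, 15, 23],
    'Lanes': [2, 3, 2],
})

def totals(grp, prefix):
    # int() turns NumPy integers into plain Python ints for json
    return {
        f'{prefix} Cars': int(grp['Total Cars'].sum()),
        f'{prefix} Lanes': int(grp['Lanes'].sum()),
    }

def build_nested(frame):
    out = {}
    for year, ygrp in frame.groupby(frame['Date'].dt.year):
        ydict = totals(ygrp, 'Year Total')
        for month, mgrp in ygrp.groupby(ygrp['Date'].dt.month):
            mdict = totals(mgrp, 'Month All')
            for day, dgrp in mgrp.groupby(mgrp['Date'].dt.day):
                # Summing per day also covers several rows on the same date
                mdict[f'Day {int(day):02}'] = totals(dgrp, 'Day All')
            ydict[f'M{int(month):02}'] = mdict
        out[f'y{int(year)}'] = ydict
    return out

nested = build_nested(df)
print(json.dumps(nested, indent=2))
```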
