简体   繁体   中英

Creating a hierarchical dictionary from DataFrame column

I have a dataframe like this;

+------------+------------+-------+
|    Date    | Total Cars | Lanes |  
+------------+------------+-------+
| 2019-10-20 |          5 |     2 |  
| 2019-10-23 |         15 |     3 |
| 2020-01-20 |         23 |     2 |
+------------+------------+-------+

I want to return a dictionary like this;

{
    "y2019":{
        "Year Total Cars":20,
        "Year Total Lanes":5,
        "M10":{
            "Month All Cars":20,
            "Month All Lanes":5,
            "Day20":{
                "Day All Cars":5,
                "Day All Lanes":2
            },
            "Day 23":{
                "Day All Cars":15,
                "Day All Lanes":3
            }
        }
    },
    "y2020":{
        "Year Total Cars":23,
        "Year Total Lanes":2,
        "M10":{
            "Month Total Cars":23,
            "Month Total Lanes":2,
            "Day20":{
                "Day All Cars":23,
                "Day All Lanes":2
            }
        }
    }
}

So far Tried to use df.resample and tried to create a nested dict out of this but wasn't successful. Are there any other elegant ways to tackle this in Pandas?

I tried to unpack dates into YMD and added these as into the dataframe. Then I created a groupby object to iterate through.

date_counter_df.loc[:,'year'] = [x.year for x in date_counter_df.index]
date_counter_df.loc[:,'month'] = [x.month for x in date_counter_df.index]
date_counter_df.loc[:,'day'] = [x.day for x in date_counter_df.index]

result = {}
for to_unpack, df_res in date_counter_df.groupby(['year','month','day']):
    year, month, day = to_unpack
    try:
        result[year]
    except KeyError:
        result[year] = {}
    try:
        result[year][month]
    except KeyError:
        result[year][month] = {}
    try:
        result[year][month][day]
    except KeyError:
        result[year][month][day] = {}

    result[year][month][day] = df_res

Result looks like this from the original dataset;

{2019: {10: {17:               sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                          
   2019-10-17  11219V22151                             0                     0   

                  1     2  3  4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                 
   2019-10-17  2702  2615  0  0   5317                           2614   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-10-17                             2703  2019     10   17  ,
   30:               sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                          
   2019-10-30  11219V22151                             0                     0   

                  1     2  3  4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                 
   2019-10-30  2729  2589  0  0   5318                           2589   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-10-30                             2729  2019     10   30  },
  12: {28:                 sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                            
   2019-12-28  61942V2809673                          3134                  3461   

                 1    2     3     4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                     
   2019-12-28  333  494  2801  2967   6595                              0   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2019-12-28                                0  2019     12   28  }},
 2020: {2: {19:                 sensor-id Totalt i retning Kristiansand Totalt i retning Oslo  \
   Date                                                                            
   2020-02-19  71445V2809674                          5006                  5202   

                 1    2     3     4 Totalt Totalt i retning Fianex Rv 415  \
   Date                                                                     
   2020-02-19  686  747  4320  4455  10208                              0   

              Totalt i retning Stølen X Rv 420  year  month  day  
   Date                                                           
   2020-02-19                                0  2020      2   19  }}}

Now I only will need to add the totals for both Years and Months as dict items.

Start from conversion of Date column to Datetime (if till now it is of string type):

df.Date = pd.to_datetime(df.Date)

Then define 3 functions:

  1. A function to process each row, from a monthly group:

     def rowFn(row): return { 'Day All Cars': row['Total Cars'], 'Day All Lanes': row.Lanes }
  2. A function to process each monthly group:

     def monthGrpFun(grp): dct = { 'Month All Cars': grp['Total Cars'].sum(), 'Month All Lanes': grp.Lanes.sum() } for _, row in grp.iterrows(): dct[f'Day {row.Date.day:02}'] = rowFn(row) return dct
  3. A function to process each yearly group:

     def yearGrpFun(grp): dct = { 'Year Total Cars': grp['Total Cars'].sum(), 'Year Total Lanes': grp.Lanes.sum() } for key, grp2 in grp.groupby(grp.Date.dt.month): dct[f'M{key:02}'] = monthGrpFun(grp2) return dct

And to get the result, run:

dct = {}
for key, grp in df.groupby(df.Date.dt.year):
    dct[f'y{key}'] = yearGrpFun(grp)

The result for your data (reformatted for readability) is:

{
  'y2019': {
    'Year Total Cars': 20,
    'Year Total Lanes': 5,
    'M10': {
      'Month All Cars': 20,
      'Month All Lanes': 5,
      'Day 20': {'Day All Cars': 5, 'Day All Lanes': 2},
      'Day 23': {'Day All Cars': 15, 'Day All Lanes': 3}
    }
  },
  'y2020': {
    'Year Total Cars': 23,
    'Year Total Lanes': 2,
    'M01': {
      'Month All Cars': 23,
      'Month All Lanes': 2,
      'Day 20': {'Day All Cars': 23, 'Day All Lanes': 2}
    }
  }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM