
Pandas Multi Index: How to merge multi-index (YYYY, MM) into a single (YYYY-MM) index?

I've been working on this problem for a while and am really close. Essentially, I want to create a time series of event counts by type from an event database. Here's what I've done so far:

Starting with an abbreviated version of my dataframe:

   event_date  year time_precision event_type  \
0  2020-10-24  2020              1    Battles   
1  2020-10-24  2020              1      Riots   
2  2020-10-24  2020              1      Riots   
3  2020-10-24  2020              1    Battles   
4  2020-10-24  2020              2    Battles   

I want the time series to be by month and year, so first I convert the dates to datetime:

nga_df.event_date = pd.to_datetime(nga_df.event_date)

Then, I want to create a time series of events by type, so I one-hot encode them:

nga_df = pd.get_dummies(nga_df, columns=['event_type'], prefix='', prefix_sep='')
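To make the effect of the one-hot step concrete, here is a minimal sketch on a made-up three-row frame (the `toy` data is hypothetical, not taken from the real event database). It assumes only that `event_type` holds plain strings, as in the excerpt above.

```python
import pandas as pd

# Hypothetical toy data standing in for the real event table.
toy = pd.DataFrame({
    "event_date": ["2020-10-24", "2020-10-24", "2020-11-02"],
    "event_type": ["Battles", "Riots", "Riots"],
})

# One indicator column per event type; the empty prefix and separator
# keep the bare type names ("Battles", "Riots") as column names.
encoded = pd.get_dummies(toy, columns=["event_type"], prefix="", prefix_sep="")

print(encoded.columns.tolist())
print(encoded["Riots"].sum())  # number of rows whose type is "Riots"
```

Summing an indicator column counts the events of that type, which is what the later `groupby(...).sum()` relies on.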

Next, I need to extract the month, so that I can create monthly counts:

nga_df['month'] = nga_df.event_date.dt.month

Finally, and I am so close here, I group my data by year and month and take the transpose:

conflict_series = nga_df.groupby(['year','month']).sum()
conflict_series.T
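A small sketch of why the result ends up with a two-level index: grouping on two keys produces a `MultiIndex` of `(year, month)` pairs, and the transpose moves those two levels into the columns. The `toy` frame is hypothetical, and the indicator columns are selected explicitly here on the assumption that summing the `event_date` column is not wanted (recent pandas versions may refuse to sum datetime columns).

```python
import pandas as pd

# Hypothetical toy data that is already one-hot encoded.
toy = pd.DataFrame({
    "event_date": pd.to_datetime(["1997-01-05", "1997-01-20", "1997-02-11"]),
    "year": [1997, 1997, 1997],
    "Battles": [1, 0, 1],
    "Riots": [0, 1, 0],
})
toy["month"] = toy["event_date"].dt.month

# Group on two keys and sum only the indicator columns.
monthly = toy.groupby(["year", "month"])[["Battles", "Riots"]].sum()

print(monthly.index.names)  # the two index levels
print(monthly.T)            # after transposing, the columns carry both levels
```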

Which results in this lovely new dataframe:

year                       1997                       ...  2020             
month                        1   2   3    4   5   6   ...    5     6    7    
fatalities                   11  30  38  112  17  29  ...  1322  1015  619   
Battles                       4   4   5   13   2   2  ...    77    99   74   
Explosions/Remote violence    2   1   0    0   3   0  ...    38    28   17   
Protests                      1   0   0    1   0   1  ...    31    83   50   
Riots                         3   3   4    1   4   1  ...    27    14   18   
Strategic developments        1   0   0    0   0   0  ...     7     2    7   
Violence against civilians    3   5   7    3   2   1  ...   135   112   88 

So, I guess what I need to do is combine my index (columns after transpose) so that they are a single index. How do I do this?

The end goal is to combine this data with economic indicators to see if there is a trend, so I need both datasets to be in the same form, where the columns are monthly counts of different values.

Here's how I did it:

Step 1: Flatten the index:

# convert the multi-index to a flat set of tuples: (YYYY, MM)
index = conflict_series.index.to_flat_index().to_series()

Step 2: Add an arbitrary but required day of the month (28, which exists in every month) so the tuple can be converted to a date:

index = index.apply(lambda x: x + (28,))

Step 3: Convert the resulting 3-tuple to a date:

import datetime
index = index.apply(lambda x: datetime.date(*x))

Step 4: Reset the DataFrame index:

conflict_series.set_index(index, inplace=True)
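If a literal `YYYY-MM` label is preferred over a date with an arbitrary day, the same flattening can be sketched with a monthly `PeriodIndex`. This is an alternative to the four steps above, not part of them. The `monthly` frame is a hypothetical stand-in for `conflict_series`, filled with the first three `Battles` and `Riots` counts from the results shown in this post.

```python
import pandas as pd

# Hypothetical stand-in for conflict_series, indexed by (year, month).
monthly = pd.DataFrame(
    {"Battles": [4, 4, 5], "Riots": [3, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [(1997, 1), (1997, 2), (1997, 3)], names=["year", "month"]
    ),
)

# Build "YYYY-MM" strings from each (year, month) pair, then let pandas
# parse them as monthly periods so that no day has to be invented.
labels = [f"{year}-{month:02d}" for year, month in monthly.index]
flat = monthly.copy()
flat.index = pd.PeriodIndex(labels, freq="M")

print(flat.T)  # columns are now a single level of monthly periods
```

A `PeriodIndex` can be turned into timestamps later with `flat.index.to_timestamp()` if the other dataset is indexed by dates rather than periods.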

Results:

            fatalities  Battles  Explosions/Remote violence  Protests  Riots  \
1997-01-28          11        4                           2         1      3   
1997-02-28          30        4                           1         0      3   
1997-03-28          38        5                           0         0      4   
1997-04-28         112       13                           0         1      1   
1997-05-28          17        2                           3         0      4   

            Strategic developments  Violence against civilians  total_events  
1997-01-28                       1                           3            14  
1997-02-28                       0                           5            13  
1997-03-28                       0                           7            16  
1997-04-28                       0                           3            18  
1997-05-28                       0                           2            11  

And now the plot I was looking for:

[image]
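Another option, sketched here under the assumption that `event_date` is already a datetime column, is to avoid the two-level index altogether by grouping on the month period of each date. The `toy` frame is hypothetical and already one-hot encoded.

```python
import pandas as pd

# Hypothetical toy data; event_date is already datetime.
toy = pd.DataFrame({
    "event_date": pd.to_datetime(["1997-01-05", "1997-01-20", "1997-02-11"]),
    "Battles": [1, 0, 1],
    "Riots": [0, 1, 0],
})

# Group directly on the calendar month of each event, so the result
# has a single-level monthly index from the start.
month_key = toy["event_date"].dt.to_period("M")
by_month = toy.groupby(month_key)[["Battles", "Riots"]].sum()

print(by_month)
```

This skips the separate `year` and `month` columns, at the cost of producing periods rather than dates as index values.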

