I've been working on this problem for a while and am really close. Essentially, I want to create a time series of event counts by type from an event database. Here's what I've done so far:
Starting with an abbreviated version of my dataframe:
event_date year time_precision event_type \
0 2020-10-24 2020 1 Battles
1 2020-10-24 2020 1 Riots
2 2020-10-24 2020 1 Riots
3 2020-10-24 2020 1 Battles
4 2020-10-24 2020 2 Battles
I want the time series to be by month and year, so first I convert the dates to datetime:
nga_df.event_date = pd.to_datetime(nga_df.event_date)
Then, I want to create a time series of events by type, so I one-hot encode them:
nga_df = pd.get_dummies(nga_df, columns=['event_type'], prefix='', prefix_sep='')
Next, I need to extract the month, so that I can create monthly counts:
nga_df['month'] = nga_df.event_date.apply(lambda x: x.month)
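As a side note, the month can also be pulled out with the vectorized `.dt` accessor instead of `apply`. Here is a minimal sketch on a made-up toy frame (`toy` and its dates are assumptions for illustration, not the real `nga_df`):

```python
import pandas as pd

# Toy stand-in for nga_df; the dates are made up for illustration.
toy = pd.DataFrame({"event_date": ["2020-10-24", "2020-11-02", "2021-01-15"]})
toy["event_date"] = pd.to_datetime(toy["event_date"])

# Vectorized accessor; yields the same values as .apply(lambda x: x.month).
toy["month"] = toy["event_date"].dt.month
toy["year"] = toy["event_date"].dt.year
```

The accessor avoids a Python-level call per row, which tends to matter on larger frames.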
Finally, and I am so close here, I group my data by month and year and take the transpose:
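# numeric_only=True skips the event_date column, which newer pandas versions refuse to sum
conflict_series = nga_df.groupby(['year','month']).sum(numeric_only=True)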
conflict_series.T
Which results in this lovely new dataframe:
year 1997 ... 2020
month 1 2 3 4 5 6 ... 5 6 7
fatalities 11 30 38 112 17 29 ... 1322 1015 619
Battles 4 4 5 13 2 2 ... 77 99 74
Explosions/Remote violence 2 1 0 0 3 0 ... 38 28 17
Protests 1 0 0 1 0 1 ... 31 83 50
Riots 3 3 4 1 4 1 ... 27 14 18
Strategic developments 1 0 0 0 0 0 ... 7 2 7
Violence against civilians 3 5 7 3 2 1 ... 135 112 88
So, I guess what I need to do is combine my index (columns after transpose) so that they are a single index. How do I do this?
The end goal is to combine this data with economic indicators to see if there is a trend, so I need both datasets to be in the same form, where the columns are monthly counts of different values.
Here's how I did it:
Step 1 : flatten index:
# convert the multi-index to a flat set of tuples: (YYYY, MM)
index = conflict_series.index.to_flat_index().to_series()
Step 2 : Add an arbitrary but required day of the month (28, which exists in every month) for conversion to date-time:
index = index.apply(lambda x: x + (28,))
Step 3 : Convert resulting 3-tuple to date time:
import datetime
index = index.apply(lambda x: datetime.date(*x))
Step 4 : reset the DataFrame index:
conflict_series.set_index(index, inplace=True)
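The four steps above can probably be collapsed by letting `pd.to_datetime` assemble dates from `year`/`month`/`day` columns. This is only a sketch on a toy frame (`toy_series` and its counts are made up), and it produces a `DatetimeIndex` rather than the `datetime.date` objects used above:

```python
import pandas as pd

# Toy stand-in for conflict_series: a (year, month) MultiIndex with made-up counts.
toy_series = pd.DataFrame(
    {"Battles": [4, 4, 5], "Riots": [3, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [(1997, 1), (1997, 2), (1997, 3)], names=["year", "month"]
    ),
)

# Turn the index levels into columns, add the same placeholder day (28),
# and let pd.to_datetime assemble dates from the year/month/day columns.
parts = toy_series.index.to_frame(index=False).assign(day=28)
toy_series.index = pd.DatetimeIndex(pd.to_datetime(parts))
```

A real `DatetimeIndex` also makes later resampling or joining with other time series a bit easier.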
Results:
fatalities Battles Explosions/Remote violence Protests Riots \
1997-01-28 11 4 2 1 3
1997-02-28 30 4 1 0 3
1997-03-28 38 5 0 0 4
1997-04-28 112 13 0 1 1
1997-05-28 17 2 3 0 4
Strategic developments Violence against civilians total_events
1997-01-28 1 3 14
1997-02-28 0 5 13
1997-03-28 0 7 16
1997-04-28 0 3 18
1997-05-28 0 2 11
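For comparison, a table of the same shape can likely be built straight from the raw events, without one-hot encoding or a MultiIndex, by grouping on a monthly period. The sketch below uses a made-up `events` frame (an assumption for illustration) and only counts events, so it would not carry the `fatalities` sums:

```python
import pandas as pd

# Made-up events for illustration; the real data would come from nga_df.
events = pd.DataFrame(
    {
        "event_date": pd.to_datetime(
            ["2020-10-24", "2020-10-24", "2020-10-25", "2020-11-03"]
        ),
        "event_type": ["Battles", "Riots", "Riots", "Battles"],
    }
)

# Group by (monthly period, event type), count rows, and spread types into columns.
monthly = (
    events.groupby([events["event_date"].dt.to_period("M"), "event_type"])
    .size()
    .unstack(fill_value=0)
)

# Convert the PeriodIndex to timestamps at the start of each month.
monthly.index = monthly.index.to_timestamp()
```

With one row per month and one column per event type, the frame can be joined with other monthly series on the index.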
And now the plot I was looking for: