I have a DataFrame which looks like this:
            Col1    Col2        Col3    Col4
Datetime
2016-11-01     1    Male  01/11/2016  Durham
2016-11-01     2  Female  01/11/2016  Durham
2016-11-02     3  Female  02/11/2016     New
2016-11-02     4    Male  02/11/2016     Ips
2016-11-03     5    Male  03/11/2016  Durham
What I am trying to do is return the count of Col4 entries per day, giving output like:
              ColA  ColB
Datetime
2016-11-01  Durham     2
2016-11-02     New     1
2016-11-02     Ips     1
2016-11-03  Durham     1
I.e., Durham occurred twice on the 1st, so it has a count of 2. New and Ips both occurred once on the 2nd, so they both have a count of 1. Finally, Durham occurred once on the 3rd, so it is given a count of 1.
Ultimately I am trying to define a "frequency" so that I can define a "hotspot" by region. If something occurs at least once every day, then I'll call it a "hotspot".
You can use groupby on (Datetime, Col4) + count here.
df = df.groupby([df.index, df.Col4]).Col4.count().reset_index(level=1, name='ColB')
Or,
df = df.groupby([df.index, df.Col4]).size().reset_index(level=1)
Next, set the column names:
df.columns = ['ColA', 'ColB']
df
              ColA  ColB
Datetime
2016-11-01  Durham     2
2016-11-02     Ips     1
2016-11-02     New     1
2016-11-03  Durham     1
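For the "hotspot" part of the question, one possible follow-up is sketched below. It rebuilds the sample data from the question, repeats the groupby from this answer, and then checks which regions appear on every day. The names counts, n_days, days_per_region and hotspots are made up for illustration, and the rule used ("appears on every distinct day present in the index") is only one reading of "at least once every day". It also assumes the index is at daily resolution with no time-of-day component.

```python
import pandas as pd

# Rebuild the sample data from the question (values copied from the post).
df = pd.DataFrame(
    {
        "Col1": [1, 2, 3, 4, 5],
        "Col2": ["Male", "Female", "Female", "Male", "Male"],
        "Col3": ["01/11/2016", "01/11/2016", "02/11/2016",
                 "02/11/2016", "03/11/2016"],
        "Col4": ["Durham", "Durham", "New", "Ips", "Durham"],
    },
    index=pd.DatetimeIndex(
        ["2016-11-01", "2016-11-01", "2016-11-02",
         "2016-11-02", "2016-11-03"],
        name="Datetime",
    ),
)

# Count of each Col4 value per day, as in the answer above.
counts = df.groupby([df.index, df.Col4]).size().reset_index(level=1)
counts.columns = ["ColA", "ColB"]

# Number of distinct days present in the data (assumes a daily index).
n_days = df.index.nunique()

# counts has one row per (day, region), so the row count per region
# is the number of distinct days on which that region occurs.
days_per_region = counts.groupby("ColA").size()

# Hypothetical hotspot rule: the region occurs on every day in the data.
hotspots = days_per_region[days_per_region == n_days].index.tolist()
print(hotspots)
```

With the sample data, Durham occurs on only two of the three days, so no region meets this strict rule and the list is empty. If that is too strict, comparing days_per_region / n_days against a threshold would be one way to relax it.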