
Return the count of unique column entries per day in a datetime DataFrame

I have a DataFrame which looks like this:

            Col1    Col2        Col3    Col4
Datetime                                    
2016-11-01     1    Male  01/11/2016  Durham
2016-11-01     2  Female  01/11/2016  Durham
2016-11-02     3  Female  02/11/2016     New
2016-11-02     4    Male  02/11/2016     Ips
2016-11-03     5    Male  03/11/2016  Durham

What I am trying to do is return the count of each unique Col4 entry per day, so the result would contain information like:

            ColA    ColB
Datetime                
2016-11-01  Durham     2
2016-11-02  New        1
2016-11-02  Ips        1
2016-11-03  Durham     1

I.e. Durham occurred twice on the 1st, so it has a count of 2. New and Ips each occurred once on the 2nd, so they both have a count of 1. Finally, Durham occurred once on the 3rd, so it will be given a count of 1.

Ultimately I am trying to define a "frequency" so that I can identify a "hotspot" by region: if something occurs at least once every day, I'll call it a "hotspot".
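
For reference, the sample frame above can be reconstructed roughly like this (a minimal sketch; the exact dtypes and the choice of a DatetimeIndex are assumptions):

import pandas as pd

# Rebuild the sample data shown above with a DatetimeIndex named 'Datetime'
df = pd.DataFrame(
    {'Col1': [1, 2, 3, 4, 5],
     'Col2': ['Male', 'Female', 'Female', 'Male', 'Male'],
     'Col3': ['01/11/2016', '01/11/2016', '02/11/2016', '02/11/2016', '03/11/2016'],
     'Col4': ['Durham', 'Durham', 'New', 'Ips', 'Durham']},
    index=pd.to_datetime(['2016-11-01', '2016-11-01', '2016-11-02',
                          '2016-11-02', '2016-11-03']))
df.index.name = 'Datetime'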

You can use groupby on (Datetime, Col4) + count here.

df = df.groupby([df.index, df.Col4]).Col4.count().reset_index(level=1, name='ColB')

Or,

df = df.groupby([df.index, df.Col4]).size().reset_index(level=1)

Next, set the column names (with size() the count column comes back unnamed, so this rename covers both approaches):

df.columns = ['ColA', 'ColB']

df

              ColA  ColB
Datetime                
2016-11-01  Durham     2
2016-11-02     Ips     1
2016-11-02     New     1
2016-11-03  Durham     1
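
For the follow-on "hotspot" idea, here is a minimal sketch, assuming a "hotspot" means a ColA value that appears on every distinct day in the index, and that the index is still named 'Datetime' as in the example:

# Count how many distinct days each region appears on,
# then keep only the regions seen on every day in the data
days_per_region = df.reset_index().groupby('ColA')['Datetime'].nunique()
total_days = df.index.nunique()
hotspots = days_per_region[days_per_region == total_days].index.tolist()

On the small sample above no region appears on all three days, so this would return an empty list; over a longer date range it picks out the regions that show up daily.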
