
Computing daily occurrence for non-numeric column in pandas dataframe

I have the following dataframe (indexed by hourly timestamps):

                      relative_humidity                 condition   fid
2017-08-02 10:00:00               0.49  Chance of a Thunderstorm     1
2017-08-02 11:00:00               0.50  Chance of a Thunderstorm     1
2017-08-02 12:00:00               0.54             Partly Cloudy     1
2017-08-02 13:00:00               0.58             Partly Cloudy     2
2017-08-02 14:00:00               0.68             Partly Cloudy     2

How can I compute the condition that occurs most often each day and put it in a dataframe with the date as index? I also need to separate the results by fid.

I tried:

df.groupby(['fid', pd.Grouper(freq='D')])['condition']

Use value_counts and take index[0]. value_counts sorts its result by count in descending order, so the first index entry is the most frequent value:

# the daily Grouper level has no name, so reset_index() calls it 'level_1'
d = {'level_1': 'date'}
df1 = df.groupby(['fid', pd.Grouper(freq='D')])['condition'] \
       .apply(lambda x: x.value_counts().index[0]).reset_index().rename(columns=d)
print(df1)
   fid       date                 condition
0    1 2017-08-02  Chance of a Thunderstorm
1    2 2017-08-02             Partly Cloudy
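
For reference, here is a self-contained sketch of the value_counts approach that rebuilds the sample frame from the question. The helper name most_frequent and the use of rename_axis (instead of renaming 'level_1') are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question (hourly DatetimeIndex).
df = pd.DataFrame(
    {
        "relative_humidity": [0.49, 0.50, 0.54, 0.58, 0.68],
        "condition": [
            "Chance of a Thunderstorm",
            "Chance of a Thunderstorm",
            "Partly Cloudy",
            "Partly Cloudy",
            "Partly Cloudy",
        ],
        "fid": [1, 1, 1, 2, 2],
    },
    index=pd.to_datetime(
        [
            "2017-08-02 10:00:00",
            "2017-08-02 11:00:00",
            "2017-08-02 12:00:00",
            "2017-08-02 13:00:00",
            "2017-08-02 14:00:00",
        ]
    ),
)


def most_frequent(s):
    # value_counts() sorts by count, descending, so the first label is the mode.
    return s.value_counts().index[0]


daily = (
    df.groupby(["fid", pd.Grouper(freq="D")])["condition"]
    .apply(most_frequent)
    .rename_axis(["fid", "date"])  # name both index levels explicitly
    .reset_index()
)
print(daily)
```

If two conditions are tied within a day, this keeps whichever one value_counts happens to list first.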
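
Alternatively, count each condition per fid and day with size(), sort the counts in descending order, and keep the first row of each group:

df.groupby(['fid', pd.Grouper(freq='D'), 'condition']).size() \
  .sort_values(ascending=False).groupby(level=[0, 1]).head(1).sort_index()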

Output:

fid              condition               
1    2017-08-02  Chance of a Thunderstorm    2
2    2017-08-02  Partly Cloudy               2
dtype: int64
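
One caveat with counting via size(): it returns the groups in key order, so taking the first row per group only gives the most frequent condition if the counts are sorted in descending order first. A minimal sketch with made-up data (the "Rain"/"Clear" values are hypothetical, chosen so that the most frequent condition is not the alphabetically first one):

```python
import pandas as pd

# Hypothetical data: "Rain" occurs twice but sorts after "Clear".
df = pd.DataFrame(
    {"condition": ["Rain", "Rain", "Clear"], "fid": [1, 1, 1]},
    index=pd.to_datetime(
        ["2017-08-02 10:00", "2017-08-02 11:00", "2017-08-02 12:00"]
    ),
)

counts = df.groupby(["fid", pd.Grouper(freq="D"), "condition"]).size()

# Without sorting, head(1) keeps the first key in sort order, not the mode.
unsorted_top = counts.groupby(level=[0, 1]).head(1)

# Sorting the counts first makes head(1) keep the most frequent condition.
sorted_top = (
    counts.sort_values(ascending=False)
    .groupby(level=[0, 1])
    .head(1)
    .sort_index()
)

print(unsorted_top)
print(sorted_top)
```

On the sample data in the question both variants happen to agree, because the most frequent condition for fid 1 also comes first alphabetically.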
