
Count unique dates in pandas dataframe

I have a dataframe of surface weather observations (fzraHrObs) organized by a station identifier code and date. fzraHrObs has several columns of weather data. The station code and date (datetime objects) look like:

usaf      dat
716270    2014-11-23 12:00:00
          2015-12-20 08:00:00
          2015-12-20 09:00:00
          2015-12-21 04:00:00
          2015-12-28 03:00:00
716280    2015-12-19 08:00:00
          2015-12-19 08:00:00

I would like to get a count of the number of unique dates (days) per year for each station, i.e. the number of days of obs per year at each station. In my example above this would give me:

    usaf      Year     Count
    716270    2014     1
              2015     3
    716280    2014     0
              2015     1

I've tried using groupby and grouping by station, year, and date:

grouped = fzraHrObs['dat'].groupby([fzraHrObs['usaf'], fzraHrObs.dat.dt.year, fzraHrObs.dat.dt.date])

Count, size, nunique, etc. on this just give me the number of obs on each date, not the number of dates themselves per year. Any suggestions on getting what I want here?

Could be something like this: group the dates by usaf and year, then count the number of unique values:

import pandas as pd
df.dat.apply(lambda dt: dt.date()).groupby([df.usaf, df.dat.apply(lambda dt: dt.year)]).nunique()

#   usaf   dat 
# 716270  2014    1
#         2015    3
# 716280  2015    1
# Name: dat, dtype: int64
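
To check this end to end, the sample from the question can be rebuilt as a small frame. This is only a sketch: the name `df`, the way the frame is constructed, and the `year` label on the second index level are assumptions for illustration. It uses the `.dt` accessor (`dt.normalize()` and `dt.year`) instead of `apply`, which should give the same counts without a Python-level loop.

```python
import pandas as pd

# Sample rows copied from the question; `df` stands in for fzraHrObs.
df = pd.DataFrame({
    "usaf": [716270] * 5 + [716280] * 2,
    "dat": pd.to_datetime([
        "2014-11-23 12:00:00",
        "2015-12-20 08:00:00",
        "2015-12-20 09:00:00",
        "2015-12-21 04:00:00",
        "2015-12-28 03:00:00",
        "2015-12-19 08:00:00",
        "2015-12-19 08:00:00",
    ]),
})

# Truncate each timestamp to midnight so that all obs on one day compare equal,
# then count the distinct days within each (station, year) group.
days_per_year = (
    df["dat"].dt.normalize()
    .groupby([df["usaf"], df["dat"].dt.year.rename("year")])
    .nunique()
)
print(days_per_year)
```

The key point is that the day is the value being counted, not a grouping key: grouping by station, year, and date makes every group hold a single date, so any count on it describes obs per date.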

The following should work:

df.groupby(['usaf', df.dat.dt.year])['dat'].apply(lambda s: s.dt.date.nunique())

What I did differently is group by two levels only (station and year), then use the nunique method of a pandas Series to count the number of unique dates in each group.
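
Note that groupby only returns station/year pairs that actually occur, so the 716280 / 2014 row with a count of 0 from the question's desired output is missing. One possible way to add it, sketched here under the assumption that the set of years should be every year seen at any station, is to reindex the result against the full product of stations and years. The frame `df` and the names `counts`, `full_index`, and `full` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question; `df` stands in for fzraHrObs.
df = pd.DataFrame({
    "usaf": [716270] * 5 + [716280] * 2,
    "dat": pd.to_datetime([
        "2014-11-23 12:00:00",
        "2015-12-20 08:00:00",
        "2015-12-20 09:00:00",
        "2015-12-21 04:00:00",
        "2015-12-28 03:00:00",
        "2015-12-19 08:00:00",
        "2015-12-19 08:00:00",
    ]),
})

# Unique days per (station, year), as in the answer above.
counts = df.groupby(["usaf", df["dat"].dt.year.rename("year")])["dat"].apply(
    lambda s: s.dt.date.nunique()
)

# Build every station/year combination and fill the absent ones with 0.
full_index = pd.MultiIndex.from_product(counts.index.levels, names=counts.index.names)
full = counts.reindex(full_index, fill_value=0)
print(full)
```

A year with no observations at any station would still be absent; covering it would need an explicit list of years in place of `counts.index.levels[1]`.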
