
Python: Filter DataFrame in Pandas by hour, day and month grouped by year

Being new to Pandas, I had to dig a lot to find a solution to this problem. I would like to know a better way to solve it, bearing in mind that I still need to handle the border cases.

I have a set of "Power" measurements taken every 10 minutes from 2009 to 2012, and I want to get a window of hours and days around a given day/month for all the years (i.e. filter by hour, day and month, grouped by year).

What I have come up with so far is as follows:

import datetime

import numpy as np
import pandas as pd

# "Power" measured every 10 minutes from 2009-08-01 to 2012-08-01
dates = pd.date_range(start="08/01/2009", end="08/01/2012", freq="10min")
df = pd.DataFrame(np.random.rand(len(dates), 1) * 1500, index=dates, columns=['Power'])

def filter(df, day, month, hour, daysWindow, hoursWindow):  # note: shadows the builtin filter()
    """
    Filter a DataFrame by a date window and an hour window, grouped by year

    @type df: DataFrame
    @param df: DataFrame with a DatetimeIndex and values

    @type day: int
    @param day: Day to focus on

    @type month: int
    @param month: Month to focus on

    @type hour: int
    @param hour: Hour to focus on

    @type daysWindow: int
    @param daysWindow: Number of days on each side of day to include

    @type hoursWindow: int
    @param hoursWindow: Number of hours on each side of hour to include

    @rtype: DataFrame
    @return: Rows inside both windows for every year, or None if nothing matches
    """
    pieces = []
    # Split the data by calendar year
    for year, groupYear in df.groupby(df.index.year):
        # Within each year, split by (month, day)
        for monthDay, groupMonthDay in groupYear.groupby([groupYear.index.month, groupYear.index.day]):
            # Plain tuple comparison: the window cannot reach into another month
            if (month, day - daysWindow) <= monthDay <= (month, day + daysWindow):
                # datetime.time() raises ValueError when hour -/+ hoursWindow leaves 0..23
                idx = groupMonthDay.index.indexer_between_time(
                    datetime.time(hour - hoursWindow), datetime.time(hour + hoursWindow))
                # .iloc replaces .ix, which was removed in pandas 1.0
                pieces.append(groupMonthDay.iloc[idx])
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    return pd.concat(pieces) if pieces else None

df_filtered = filter(df, day=8, month=10, hour=8, daysWindow=1, hoursWindow=1)
print(len(df))
print(len(df_filtered))

Which prints:

>>> 
157825
117

Of course, this code still needs to handle border cases, for example selecting hour=1 with hoursWindow=2:

>>> filter(df,day=8, month=10, hour=1, daysWindow=1, hoursWindow=2)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "D:\tmp\test_filtro.py", line 40, in filter
    new_df = groupMonthDay.ix[groupMonthDay.index.indexer_between_time(datetime.time(hour - hoursWindow), datetime.time(hour + hoursWindow))]
ValueError: hour must be in 0..23
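For the hour border, one alternative to raising (or to clamping) is to let the window wrap past midnight, so that hour=1 with hoursWindow=2 means 23:00 to 03:00. A minimal sketch, assuming that wrap-around is the desired behaviour: take the hours modulo 24 and pass them to `DatetimeIndex.indexer_between_time`, which treats start > end as a window crossing midnight (times at or after start, or at or before end). `hour_window` is a hypothetical helper, and it assumes hoursWindow < 12 so that start and end never coincide.

```python
import datetime

import pandas as pd


def hour_window(hour, hours_window, minute=0):
    """Hypothetical helper: (start, end) times for hour +/- hours_window.

    Hours are taken modulo 24 so the window wraps past midnight instead of
    raising ValueError. Assumes 0 <= hours_window < 12.
    """
    start = datetime.time((hour - hours_window) % 24, minute)
    end = datetime.time((hour + hours_window) % 24, minute)
    return start, end


# One day of 10-minute timestamps
idx = pd.date_range("2010-10-08", periods=144, freq="10min")

start, end = hour_window(1, 2)  # 23:00 and 03:00
# start > end, so indexer_between_time keeps times >= 23:00 OR <= 03:00
selected = idx[idx.indexer_between_time(start, end)]
print(start, end, len(selected))  # 23:00:00 03:00:00 25
```

Note that the wrapped hours (23:00 to 23:50) are taken from the same calendar day, not from the evening before, which may or may not match what you want.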

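A similar issue happens when selecting a day like 1 or 30: the (month, day) tuple comparison never reaches into the previous or next month, so those days are silently left out instead of raising an error.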
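To make the day border concrete, here is a small sketch contrasting the (month, day) tuple check used above with a check on real dates built with `datetime` subtraction. Both helpers are hypothetical; the date-based one tries the neighbouring years so the window can cross Dec 31 / Jan 1, and it assumes the target day exists in each of those years (Feb 29 would raise ValueError in a non-leap year).

```python
import datetime


def in_window_tuple(d, month, day, days_window):
    """The question's check: compare (month, day) tuples directly."""
    return (month, day - days_window) <= (d.month, d.day) <= (month, day + days_window)


def in_window_dates(d, month, day, days_window):
    """Compare real dates; trying the neighbouring years lets the window cross Dec 31 / Jan 1."""
    return any(
        abs((d - datetime.date(y, month, day)).days) <= days_window
        for y in (d.year - 1, d.year, d.year + 1)
    )


sep30 = datetime.date(2010, 9, 30)
jan1 = datetime.date(2011, 1, 1)
print(in_window_tuple(sep30, 10, 1, 1), in_window_dates(sep30, 10, 1, 1))  # False True
print(in_window_tuple(jan1, 12, 31, 1), in_window_dates(jan1, 12, 31, 1))  # False True
```

Sep 30 is one day before Oct 1 and Jan 1 is one day after Dec 31, yet the tuple comparison rejects both.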

How could this code be improved?

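Updated code for the filter function, which avoids these border errors: the day window is built with datetime.timedelta, so it can cross month and year boundaries, and the hour window is clamped to 0..23 (it does not wrap past midnight):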

import datetime

import numpy as np
import pandas as pd

dates = pd.date_range(start="08/01/2009", end="08/01/2012", freq="10min")
df = pd.DataFrame(np.random.rand(len(dates), 1) * 1500, index=dates, columns=['Power'])

def filter(df, day, month, hour, minute=0, daysWindow=1, hoursWindow=1):  # note: shadows the builtin filter()
    """
    Filter a DataFrame by a date window and an hour window, grouped by year

    @type df: DataFrame
    @param df: DataFrame with a DatetimeIndex and values

    @type day: int
    @param day: Day to focus on

    @type month: int
    @param month: Month to focus on

    @type hour: int
    @param hour: Hour to focus on

    @type minute: int
    @param minute: Minute of the window start and end times

    @type daysWindow: int
    @param daysWindow: Number of days on each side of day to include

    @type hoursWindow: int
    @param hoursWindow: Number of hours on each side of hour to include

    @rtype: DataFrame
    @return: Rows inside both windows for every year, or None if nothing matches
    """
    pieces = []
    for year in df.index.year.unique():
        # timedelta arithmetic crosses month and year boundaries
        # (datetime.date still raises ValueError for Feb 29 in a non-leap year)
        date = datetime.date(int(year), month, day)
        dateStart = date - datetime.timedelta(days=daysWindow)
        # One extra day so the slice reaches midnight after the last day of the window;
        # label slicing is inclusive, so that 00:00 sample is kept when timeStart is 00:00
        dateEnd = date + datetime.timedelta(days=daysWindow + 1)
        df_filtered_days = df.loc[pd.Timestamp(dateStart):pd.Timestamp(dateEnd)]
        # Clamp the hour window to 0..23 instead of letting datetime.time() raise
        timeStart = datetime.time(max(hour - hoursWindow, 0), minute)
        timeEnd = datetime.time(min(hour + hoursWindow, 23), minute)
        idx = df_filtered_days.index.indexer_between_time(timeStart, timeEnd)
        pieces.append(df_filtered_days.iloc[idx])
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    return pd.concat(pieces) if pieces else None

df_filtered = filter(df, day=8, month=10, hour=1, daysWindow=1, hoursWindow=2)
print(len(df))
print(len(df_filtered))

Output is:

>>> 
157825
174
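As a possible further refinement (a sketch, not the code above), the per-year loop and repeated concatenation can be replaced by a single boolean mask. `window_mask` is a hypothetical helper, not a pandas API: it keeps rows whose calendar date is within ±days_window days of month/day in any year, using Timestamp arithmetic on `index.normalize()`, and rows whose time of day is within ±hours_window hours of hour, wrapping past midnight through `indexer_between_time`. It assumes hours_window < 12 and skips any year in which the target day does not exist (e.g. Feb 29 in a non-leap year).

```python
import datetime

import numpy as np
import pandas as pd


def window_mask(index, day, month, hour, days_window=1, hours_window=1, minute=0):
    """Hypothetical helper: boolean mask of rows inside the day and hour windows."""
    dates = index.normalize()  # midnight of each row's calendar day
    day_mask = np.zeros(len(index), dtype=bool)
    # Include the neighbouring years so windows around Dec 31 / Jan 1 work
    for year in range(int(index.year.min()) - 1, int(index.year.max()) + 2):
        try:
            centre = pd.Timestamp(year=year, month=month, day=day)
        except ValueError:
            continue  # target day does not exist this year (e.g. Feb 29): skip it
        offset_days = np.asarray((dates - centre).days)
        day_mask |= np.abs(offset_days) <= days_window

    # Hours modulo 24: start > end means the window crosses midnight
    start = datetime.time((hour - hours_window) % 24, minute)
    end = datetime.time((hour + hours_window) % 24, minute)
    hour_mask = np.zeros(len(index), dtype=bool)
    hour_mask[index.indexer_between_time(start, end)] = True
    return day_mask & hour_mask


dates = pd.date_range(start="08/01/2009", end="08/01/2012", freq="10min")
df = pd.DataFrame(np.random.rand(len(dates), 1) * 1500, index=dates, columns=["Power"])

df_filtered = df[window_mask(df.index, day=8, month=10, hour=8)]
print(len(df_filtered))  # 117
```

Unlike the loop above, the day window stops at the last day instead of also keeping the 00:00 sample of the following day, and the hours wrap rather than being clamped, so hour=1 with hours_window=2 selects 23:00 through 03:00 on each day of the window.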
