How to group data on Pandas with multiple conditions?

This is my table:

          timestamp        date month  day   hour  price
0  2017-01-01 00:00  01/01/2017   Jan  Sun  00:00  60.23
1  2017-01-01 01:00  01/01/2017   Jan  Sun  01:00  60.73
2  2017-01-01 02:00  01/01/2017   Jan  Sun  02:00  75.99
3  2017-01-01 03:00  01/01/2017   Jan  Sun  03:00  60.76
4  2017-01-01 04:00  01/01/2017   Jan  Sun  04:00  49.01

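I have data for all 24 hours of each day, for every day of a full year.

I want to group the data for each season into weekdays and weekends, e.g. Weekend_Winter = all Saturday and Sunday data for Nov, Dec, Jan and Feb.

I'm fairly new to this, so any help would be useful.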

If you want to filter the data by condition, use boolean indexing with a boolean mask created by comparing dayofweek, and use isin to check membership in the list L:

# only the timestamp values were changed for a better sample; the month and day columns were left as-is
print (df)
            timestamp        date month  day   hour  price
0 2017-01-01 00:00:00  01/01/2017   Jan  Sun  00:00  60.23
1 2017-01-03 00:00:00  01/01/2017   Jan  Sun  00:00  60.23
2 2017-02-01 01:00:00  01/01/2017   Jan  Sun  01:00  60.73
3 2017-02-05 01:00:00  01/01/2017   Jan  Sun  01:00  60.73
4 2017-03-01 02:00:00  01/01/2017   Jan  Sun  02:00  75.99
5 2017-04-01 03:00:00  01/01/2017   Jan  Sun  03:00  60.76
6 2017-11-01 04:00:00  01/01/2017   Jan  Sun  04:00  49.01

L = ['Nov','Dec','Jan','Feb']
mask = (df['timestamp'].dt.dayofweek > 4) & (df['month'].isin(L))
df1 = df[mask]
print (df1)
            timestamp        date month  day   hour  price
0 2017-01-01 00:00:00  01/01/2017   Jan  Sun  00:00  60.23
3 2017-02-05 01:00:00  01/01/2017   Jan  Sun  01:00  60.73
5 2017-04-01 03:00:00  01/01/2017   Jan  Sun  03:00  60.76
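
In the sample above the month column was not updated along with the timestamps, so the April row (index 5) still passes the isin check. As a minimal sketch, assuming the timestamp column is the source of truth, both conditions can be derived from it directly. The variable names (winter_months, is_weekend, is_winter) are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample timestamps and prices copied from the answer's example rows.
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2017-01-01 00:00', '2017-01-03 00:00',
                                 '2017-02-01 01:00', '2017-02-05 01:00',
                                 '2017-03-01 02:00', '2017-04-01 03:00',
                                 '2017-11-01 04:00']),
    'price': [60.23, 60.23, 60.73, 60.73, 75.99, 60.76, 49.01],
})

winter_months = [11, 12, 1, 2]  # Nov-Feb, as winter is defined in the question

is_weekend = df['timestamp'].dt.dayofweek > 4           # Monday=0, so Sat=5 and Sun=6
is_winter = df['timestamp'].dt.month.isin(winter_months)

# Keep only rows that satisfy both conditions.
weekend_winter = df[is_weekend & is_winter]
print(weekend_winter)
```

With this data only the rows for 2017-01-01 and 2017-02-05 (index 0 and 3) remain.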

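If you need new columns for the season and the day type (the season number is 1 for Dec-Feb, 2 for Mar-May, 3 for Jun-Aug and 4 for Sep-Nov):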

import numpy as np
df['season'] = (df['timestamp'].dt.month%12 + 3) // 3
df['state'] = np.where(df['timestamp'].dt.dayofweek > 4, 'weekend','weekdays')
print (df)
            timestamp        date month  day   hour  price  season     state
0 2017-01-01 00:00:00  01/01/2017   Jan  Sun  00:00  60.23       1   weekend
1 2017-01-03 00:00:00  01/01/2017   Jan  Sun  00:00  60.23       1  weekdays
2 2017-02-01 01:00:00  01/01/2017   Jan  Sun  01:00  60.73       1  weekdays
3 2017-02-05 01:00:00  01/01/2017   Jan  Sun  01:00  60.73       1   weekend
4 2017-03-01 02:00:00  01/01/2017   Jan  Sun  02:00  75.99       2  weekdays
5 2017-04-01 03:00:00  01/01/2017   Jan  Sun  03:00  60.76       2   weekend
6 2017-11-01 04:00:00  01/01/2017   Jan  Sun  04:00  49.01       4  weekdays

These columns can then be used in a groupby with an aggregation, e.g. sum:

df2 = df.groupby(['season','state'], as_index=False)['price'].sum()
print (df2)
   season     state   price
0       1  weekdays  120.96
1       1   weekend  120.96
2       2  weekdays   75.99
3       2   weekend   60.76
4       4  weekdays   49.01
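
The question asks for labels such as Weekend_Winter. One possible way to get them, sketched here under the assumption that the season numbers 1 to 4 stand for Winter, Spring, Summer and Autumn (Dec-Feb winter, not the Nov-Feb winter from the question), is to combine both columns into a single label before grouping. The names season_names and group are hypothetical:

```python
import pandas as pd

# Sample timestamps and prices copied from the answer's example rows.
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2017-01-01 00:00', '2017-01-03 00:00',
                                 '2017-02-01 01:00', '2017-02-05 01:00',
                                 '2017-03-01 02:00', '2017-04-01 03:00',
                                 '2017-11-01 04:00']),
    'price': [60.23, 60.23, 60.73, 60.73, 75.99, 60.76, 49.01],
})

# Assumed names for the season numbers produced by the formula in the answer.
season_names = {1: 'Winter', 2: 'Spring', 3: 'Summer', 4: 'Autumn'}

season = ((df['timestamp'].dt.month % 12 + 3) // 3).map(season_names)
daytype = (df['timestamp'].dt.dayofweek > 4).map({True: 'Weekend', False: 'Weekday'})

# Combined label such as 'Weekend_Winter' (hypothetical column name 'group').
df['group'] = daytype + '_' + season

mean_price = df.groupby('group')['price'].mean()
print(mean_price)
```

Any other aggregation (sum, median, count) can be substituted for mean in the last step.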

The solution below differs slightly from @jezrael's in that the seasons and the weekday/weekend split are defined explicitly.

import pandas as pd

df = pd.DataFrame([['2017-01-01 00:00', '01/01/2017', 'Jan', 'Mon', '00:00', 60.23],
                   ['2017-01-01 01:00', '01/01/2017', 'Jan', 'Sat', '01:00', 60.73],
                   ['2017-01-01 02:00', '01/01/2017', 'May', 'Tue', '02:00', 75.99],
                   ['2017-01-01 03:00', '01/01/2017', 'Jan', 'Sun', '03:00', 60.76],
                   ['2017-01-01 04:00', '01/01/2017', 'Sep', 'Sat', '04:00', 49.01]],
                   columns=['timestamp', 'date', 'month', 'day', 'hour', 'price'])

def InvertKeyListDictionary(input_dict):
    return {w: k for k, v in input_dict.items() for w in v}

season_map = {'Spring': ['Mar', 'Apr', 'May'],
              'Summer': ['Jun', 'Jul', 'Aug'],
              'Autumn': ['Sep', 'Oct', 'Nov'],
              'Winter': ['Dec', 'Jan', 'Feb']}

weekend_map = {'Weekday': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
               'Weekend': ['Sat', 'Sun']}

month_map = InvertKeyListDictionary(season_map)
day_map = InvertKeyListDictionary(weekend_map)

df['season'] = df['month'].map(month_map)
df['daytype'] = df['day'].map(day_map)

df_groups = df.groupby(['season', 'daytype'])

df_groups.get_group(('Winter', 'Weekend'))

# output
# timestamp date month day hour price season daytype
# 2017-01-01 01:00 01/01/2017 Jan Sat 01:00 60.73 Winter Weekend 
# 2017-01-01 03:00 01/01/2017 Jan Sun 03:00 60.76 Winter Weekend 
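
The dictionary approach depends on the month and day columns being correct. If only the timestamp can be trusted, a sketch along these lines derives both abbreviations from it before mapping. The sample rows are made up for illustration, and the dictionaries are inverted inline instead of with the helper from the answer:

```python
import pandas as pd

# Hypothetical sample: Sun 1 Jan, Mon 2 Jan, Sat 6 May and Sun 10 Sep 2017.
df = pd.DataFrame({
    'timestamp': pd.to_datetime(['2017-01-01 00:00', '2017-01-02 00:00',
                                 '2017-05-06 00:00', '2017-09-10 00:00']),
    'price': [60.23, 60.73, 75.99, 49.01],
})

season_map = {'Spring': ['Mar', 'Apr', 'May'],
              'Summer': ['Jun', 'Jul', 'Aug'],
              'Autumn': ['Sep', 'Oct', 'Nov'],
              'Winter': ['Dec', 'Jan', 'Feb']}

weekend_map = {'Weekday': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
               'Weekend': ['Sat', 'Sun']}

# Invert {label: [keys]} into {key: label}, as the helper in the answer does.
month_map = {m: season for season, months in season_map.items() for m in months}
day_map = {d: kind for kind, days in weekend_map.items() for d in days}

# month_name() and day_name() return English names by default; keep 3 letters.
df['month'] = df['timestamp'].dt.month_name().str[:3]
df['day'] = df['timestamp'].dt.day_name().str[:3]

df['season'] = df['month'].map(month_map)
df['daytype'] = df['day'].map(day_map)

print(df.groupby(['season', 'daytype'])['price'].mean())
```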
