Pandas groupby date: specific dates

Question

I want to group my data by user and then by specific date periods, computing counts and means and placing them in a new column per date period.

My data looks something like this:

import pandas as pd

df = pd.DataFrame({
    "USER_ID": ["AA1", "AB1", "AA3", "CD3", "AB4", "AA1", "AA1", "AA3", "AB4", "AB4"],
    "ACTIVITY_CATEGORY": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "DATE": ['2018-09-19', '2018-09-13', '2018-09-06', '2018-09-18', '2018-09-15', '2018-09-19', '2018-09-16', '2018-09-06', '2018-09-04', '2018-09-04']})
# pd.Grouper needs a datetime column, so parse the date strings
df["DATE"] = pd.to_datetime(df["DATE"])

I usually do it as follows:

df.groupby(['USER_ID',pd.Grouper(key='DATE', freq='W')])['ACTIVITY_CATEGORY'].count()

But now I want to get this for a specific week, essentially something more like:

I have read the documentation on the different ways of grouping with Grouper and the offsets that can be used, but I still cannot find anything like this.

There is a fairly cumbersome way to do this with a for loop, using timedelta to subtract 7 days from the latest day, but it is highly inefficient on a large dataset. I am looking for a more Pythonic way.
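For illustration, the loop described above (stepping back 7 days at a time from the latest day) can probably be replaced by integer arithmetic on the date differences. The following is only a sketch under that reading of the question; the column name `WEEKS_BACK` and the variable `trailing` are hypothetical names introduced here, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "USER_ID": ["AA1", "AB1", "AA3", "CD3", "AB4", "AA1", "AA1", "AA3", "AB4", "AB4"],
    "ACTIVITY_CATEGORY": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "DATE": pd.to_datetime([
        "2018-09-19", "2018-09-13", "2018-09-06", "2018-09-18", "2018-09-15",
        "2018-09-19", "2018-09-16", "2018-09-06", "2018-09-04", "2018-09-04",
    ]),
})

# Whole days between each row and the most recent date in the data.
days_back = (df["DATE"].max() - df["DATE"]).dt.days

# Hypothetical helper column: 0 = the latest 7 days, 1 = the 7 days before that, ...
df["WEEKS_BACK"] = days_back // 7

# Count and mean per user and 7-day window, one column per window.
trailing = (
    df.groupby(["USER_ID", "WEEKS_BACK"])["ACTIVITY_CATEGORY"]
      .agg(["count", "mean"])
      .unstack("WEEKS_BACK")
)
print(trailing)
```

A single window can then be selected by its label, for example `trailing[("count", 0)]` for the most recent 7 days. User and window combinations with no rows show up as NaN.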

Answer 1

Is this similar to what you are trying to achieve?

import math
# Label each row by week of the month: days 1-7 -> WEEK 1, days 8-14 -> WEEK 2, ...
df['DATE'] = 'WEEK ' + pd.to_numeric(pd.to_datetime(df['DATE']).dt.day/7).apply(math.ceil).apply(str)
df.pivot_table(index=['USER_ID'],columns=['DATE'],aggfunc='count').fillna(0)

Out:

         ACTIVITY_CATEGORY
DATE    WEEK 1  WEEK 2  WEEK 3
USER_ID         
AA1     0.0     0.0     3.0
AA3     2.0     0.0     0.0
AB1     0.0     1.0     0.0
AB4     2.0     0.0     1.0
CD3     0.0     0.0     1.0
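The question also asks for means, and the snippet above overwrites the DATE column with week labels. One possible variation, offered as a sketch rather than as the answerer's method, keeps DATE intact by writing the labels to a separate column (the name `WEEK` is an assumption made here) and requests both aggregations in one `pivot_table` call.

```python
import pandas as pd

df = pd.DataFrame({
    "USER_ID": ["AA1", "AB1", "AA3", "CD3", "AB4", "AA1", "AA1", "AA3", "AB4", "AB4"],
    "ACTIVITY_CATEGORY": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "DATE": pd.to_datetime([
        "2018-09-19", "2018-09-13", "2018-09-06", "2018-09-18", "2018-09-15",
        "2018-09-19", "2018-09-16", "2018-09-06", "2018-09-04", "2018-09-04",
    ]),
})

# Week of the month, same labelling as above: days 1-7 -> 1, days 8-14 -> 2, ...
# For integer days, (day - 1) // 7 + 1 equals ceil(day / 7).
week = (df["DATE"].dt.day - 1) // 7 + 1

# Hypothetical label column, so that DATE itself is left untouched.
df["WEEK"] = "WEEK " + week.astype(str)

# Count and mean of ACTIVITY_CATEGORY per user, one column per week label.
summary = df.pivot_table(
    index="USER_ID",
    columns="WEEK",
    values="ACTIVITY_CATEGORY",
    aggfunc=["count", "mean"],
)
print(summary)
```

The columns form a two-level index of aggregation name and week label, so `summary["count"]` and `summary["mean"]` each give a table shaped like the output above.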

Answer 2

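IIUC, you can try this:

# Assumes DATE is still a datetime column (not overwritten with week labels)
df_new=df.groupby(['USER_ID',pd.Grouper(key='DATE', freq='W')])['ACTIVITY_CATEGORY']\
.count().reset_index()
# DATE now holds each week's closing Sunday; derive a week number from its day of month
df_new['week_num']=(df_new.DATE.dt.day//7)+1
# Restrict values to ACTIVITY_CATEGORY so the DATE column is not aggregated as well
print(df_new.pivot_table(index='USER_ID',columns=['week_num'],values=['ACTIVITY_CATEGORY']).fillna(0))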

            ACTIVITY_CATEGORY          
week_num                 2    3    4
USER_ID                             
AA1                    0.0  1.0  2.0
AA3                    2.0  0.0  0.0
AB1                    0.0  1.0  0.0
AB4                    2.0  1.0  0.0
CD3                    0.0  0.0  1.0

If any data falls in week 1, that column is populated automatically.
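If a fixed set of week columns is wanted even when some weeks have no rows, the pivoted table can be reindexed. The sketch below follows the same week numbering as the answer above (based on the closing Sunday of each week); the range 1 to 5 and the variable names `weekly` and `counts` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "USER_ID": ["AA1", "AB1", "AA3", "CD3", "AB4", "AA1", "AA1", "AA3", "AB4", "AB4"],
    "ACTIVITY_CATEGORY": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "DATE": pd.to_datetime([
        "2018-09-19", "2018-09-13", "2018-09-06", "2018-09-18", "2018-09-15",
        "2018-09-19", "2018-09-16", "2018-09-06", "2018-09-04", "2018-09-04",
    ]),
})

# Weekly counts per user; DATE becomes the Sunday that closes each week.
weekly = (
    df.groupby(["USER_ID", pd.Grouper(key="DATE", freq="W")])["ACTIVITY_CATEGORY"]
      .count()
      .reset_index()
)
weekly["week_num"] = weekly["DATE"].dt.day // 7 + 1

# One row per user, one column per week number; missing combinations become 0.
counts = weekly.pivot_table(
    index="USER_ID",
    columns="week_num",
    values="ACTIVITY_CATEGORY",
    aggfunc="sum",
    fill_value=0,
)

# Force columns 1..5 to exist, filling weeks without any data with 0.
counts = counts.reindex(columns=range(1, 6), fill_value=0)
print(counts)
```

A single week is then just a column lookup, for example `counts[3]`. Note that a closing Sunday can fall in the following month, so this numbering is only a rough week-of-month label.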

2 solutions

Solution 1
1 2019-02-02 14:53:38

Solution 2
1 2019-02-02 15:01:36
