Pandas: group by and aggregate column 1 with a condition from column 2

Question

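I'm trying to migrate from R and dplyr to Python and pandas for some projects, and I'd like to work out how to replicate the common coding strategies I use with dplyr.

A common situation is that I group by a particular column and then compute a derived column that involves a condition on a third column. Here is a simple example: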

library(dplyr)

dat = data.frame(user = rep(c("1",2,3,4), each=5),
           cancel_date = rep(c(12,5,10,11), each=5)
) %>%
  group_by(user) %>%
  mutate(login = sample(1:cancel_date[1], size = n(), replace = TRUE)) %>%
  ungroup()

--

Source: local data frame [6 x 3]

  user cancel_date login
1    1          12     3
2    1          12     9
3    1          12    12
4    1          12     4
5    1          12     2
6    2           5     4

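In this data frame, I want to count, for each user, the logins that occurred at least three months before cancellation. In dplyr this is simple: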

dat %>%
  group_by(user) %>%
  summarise(logins_three_mos_before_cancel = length(login[cancel_date-login>=3]))

  user logins_three_mos_before_cancel
1    1                              4
2    2                              1
3    3                              5
4    4                              3

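But I'm a bit stuck on how to do this in pandas. As far as I can tell, aggregation only applies a function to a single given column of each group, and I don't know how to make it apply a function that involves more than one column.

Here is the same data in pandas: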

import numpy as np
import pandas as pd

d = { 'user' : np.repeat([1,2,3,4],5),
     'cancel_date' : np.repeat([12,5,10,11],5),
     'login' : np.array([3,  9, 12,  4,  2,  4,  3,  5,  5,  1,  3,  5,  4,  6,  3,  3,  5, 10,  7, 10])
     }
df = pd.DataFrame(data=d)

Answer 1

I hope I've followed your R correctly, but is this what you mean?

>>> df[df.cancel_date - df.login >= 3].user.value_counts().sort_index()
1    4
2    1
3    5
4    3
dtype: int64
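
If you would rather keep the group-then-summarise shape of the dplyr version, here is a minimal sketch of two alternatives, not necessarily the only or best way. It rebuilds the data from the question; the names `early`, `by_sum` and `by_apply` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "user": np.repeat([1, 2, 3, 4], 5),
    "cancel_date": np.repeat([12, 5, 10, 11], 5),
    "login": [3, 9, 12, 4, 2, 4, 3, 5, 5, 1, 3, 5, 4, 6, 3, 3, 5, 10, 7, 10],
})

# Option 1: compute the multi-column condition row by row first,
# then sum the resulting booleans within each user.
early = (df["cancel_date"] - df["login"]) >= 3
by_sum = early.groupby(df["user"]).sum()

# Option 2: apply() hands each group to the function as a DataFrame,
# so the function can use several columns at once.
by_apply = df.groupby("user")[["cancel_date", "login"]].apply(
    lambda g: int((g["cancel_date"] - g["login"] >= 3).sum())
)

# Both give 4, 1, 5, 3 for users 1 to 4, matching the dplyr result.
print(by_sum)
print(by_apply)
```

One difference from filtering and then calling value_counts(): summing a boolean per group keeps a user who has no qualifying logins (with a count of 0), whereas filtering first drops that user from the result.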

Solution 1
2 · Accepted · 2015-06-08 17:10:21
