Pandas groupby then assign

Question

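I have a long-format DataFrame with the columns date, ticker, mcap, and rank_mcap. The mcap column is market capitalization, a measure of a stock's size, and rank_mcap is simply its ranked version (where 1 is the largest market cap).

I want to build a market-cap-weighted asset out of the top 10 stocks by market cap (an "S&P10", say). In R I do this: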

df %>%
    filter(day(date) == 1, rank_mcap < 11) %>%
    group_by(date) %>%
    mutate(weight = mcap / sum(mcap)) %>%
    ungroup()

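How do I do this in pandas? I get the following error

AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method

when I take an approach similar to the R one, i.e. when I run the following in Python: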

df.\
    query('included == True & date.dt.day == 1'). \
    groupby('date').\
    assign(w=df.mcap / df.mcap.sum())

I looked through http://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html and could not find an answer.

Answer 1

Here is how to get the equivalent of R's mutate in pandas:

df.query('included == True & date.dt.day == 1').\
    assign(weight = lambda x : x.groupby('date',group_keys=False).
           apply(lambda y: y.mcap / y.mcap.sum()))
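A variant that may be simpler is to use `transform` instead of `apply`: `transform("sum")` returns each group's total repeated on every row of that group, so the division lines up row by row. The sketch below is only an illustration under some assumptions: the toy data and the name `top10` are made up, and the filter uses `rank_mcap < 11` from the R code rather than the `included` column.

```python
import pandas as pd

# Hypothetical toy data; the column names follow the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-02", "2018-02-01", "2018-02-01"]
    ),
    "ticker": ["A", "B", "A", "A", "B"],
    "mcap": [300.0, 100.0, 310.0, 150.0, 50.0],
    "rank_mcap": [1, 2, 1, 1, 2],
})

top10 = (
    df
    # Keep first-of-month rows for the ten largest stocks.
    .loc[lambda d: (d["date"].dt.day == 1) & (d["rank_mcap"] < 11)]
    # Divide each mcap by the total mcap of its date group.
    .assign(
        weight=lambda d: d["mcap"] / d.groupby("date")["mcap"].transform("sum")
    )
)

print(top10)
```

Within each date the weights should sum to 1, which is a quick sanity check on the result.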

Answer 2

You can use datar to do this the same way you would in R:

from datar.all import f, filter, group_by, ungroup, mutate, sum

df >> \
    filter(f.date.day == 1, f.rank_mcap < 11) >> \
    group_by(f.date) >> \
    mutate(weight = f.mcap / sum(f.mcap)) >> \
    ungroup()

Disclaimer: I am the author of the datar package.
