How to vectorize pandas code that depends on the previous row?

Question

I am trying to vectorize a code snippet in pandas.

I have a pandas dataframe that looks like this:

	ids	ftest	vals
0	Q52EG	0	0
1	Q52EG	0	1
2	Q52EG	1	2
3	Q52EG	1	3
4	Q52EG	1	4
5	QQ8Q4	0	5
6	QQ8Q4	0	6
7	QQ8Q4	1	7
8	QQ8Q4	1	8
9	QVIPW	1	9
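
The code that generated this frame is not included in the question. A minimal sketch that rebuilds the same ten rows, assuming the name id_df that the loop in the question uses and assuming a default integer index:

```python
import pandas as pd

# Rebuild the example frame row for row; id_df is the name used by the
# question's loop, and the default RangeIndex (0..9) is an assumption.
id_df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)
print(id_df)
```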

If any id in the ids column has the value 1 in the ftest column, then all subsequent rows with the same id should be marked 1 in the has_hist column, regardless of their own ftest value, as shown in the dataframe below:

	ids	ftest	vals	has_hist
0	Q52EG	0	0	0
1	Q52EG	0	1	0
2	Q52EG	1	2	0
3	Q52EG	1	3	1
4	Q52EG	1	4	1
5	QQ8Q4	0	5	0
6	QQ8Q4	0	6	0
7	QQ8Q4	1	7	0
8	QQ8Q4	1	8	1
9	QVIPW	1	9	0

I am currently doing this with an iterative approach:

previous_present = {}
has_prv_history = []
for index, value in id_df.iterrows():
    my_id = value["ids"]
    ftest_mentioned = value["ftest"]
    previous_flag = 0
    if my_id in previous_present.keys():
        previous_flag = 1
    elif (ftest_mentioned == 1):
        previous_present[my_id] = 1
    has_prv_history.append(previous_flag)
id_df["has_hist"] = has_prv_history

Can this code be vectorized without using apply?

Answer 1

Two key functions for this kind of task are shift and ffill, applied per group. For this specific question:

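df2 = df.copy()  # df is the frame from the question (called id_df there)
# 1 where the previous row of the same id had ftest == 1, NaN everywhere else
df2["has_hist"] = df.groupby("ids").ftest.shift().where(lambda s: s.eq(1))
df2["has_hist"] = df2.groupby("ids").has_hist.ffill().fillna(0).astype("int32")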
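
To see why both shift and ffill are needed, it may help to look at the intermediate columns. The sketch below uses a small made-up frame (the ids "A" and "B" are hypothetical, not from the question) in which ftest drops back to 0 after a 1, so shift alone would not be enough:

```python
import pandas as pd

# Hypothetical toy data: group "A" has a 1 followed by 0s.
df = pd.DataFrame(
    {
        "ids": ["A", "A", "A", "A", "B", "B"],
        "ftest": [0, 1, 0, 0, 1, 0],
    }
)

# Step 1: previous row's ftest within each id (NaN on each group's first row).
prev = df.groupby("ids")["ftest"].shift()
# Step 2: keep only the 1s; every other value becomes NaN.
marked = prev.where(prev.eq(1))
# Step 3: carry each 1 forward within its own id.
filled = marked.groupby(df["ids"]).ffill()
# Step 4: the remaining NaNs mean "no earlier 1 in this group".
has_hist = filled.fillna(0).astype("int32")

print(df.assign(prev=prev, marked=marked, filled=filled, has_hist=has_hist))
```

Row 3 of group "A" has prev equal to 0, so it is only flagged because ffill propagates the 1 from row 2.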

Here is a variant with transform, which in my experience is often slower than "pure" pandas operations:

df2 = df.assign(
    has_hist=df.groupby("ids").ftest.transform(
        lambda s: (
            s
            .shift()
            .where(lambda t: t.eq(1))
            .ffill()
            .fillna(0)
            .astype("int32")
        )
    )
)
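
A different formulation, not taken from the answer above, swaps shift plus ffill for a per-group running maximum: "has any earlier row of this id had ftest == 1" is the cumulative maximum of the flag, shifted down by one row within the group. A minimal sketch, assuming the example frame from the question and that only ftest == 1 counts as a hit:

```python
import pandas as pd

# Example frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)

# Running "seen a 1 so far" flag per id, including the current row.
seen = df["ftest"].eq(1).astype("int32").groupby(df["ids"]).cummax()
# Shift by one row within each id so the current row is excluded;
# the first row of each group becomes NaN and is filled with 0.
df["has_hist"] = seen.groupby(df["ids"]).shift().fillna(0).astype("int32")

print(df)
```

Both groupby calls use built-in grouped operations (cummax and shift), so no Python-level lambda runs per group.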

Answer 1 (2, accepted, 2021-06-11 12:48:14)
