How to vectorize pandas code where it depends on previous row?

Question

I am trying to vectorize a code snippet in pandas.

I have a pandas dataframe that looks like this:

	ids	ftest	vals
0	Q52EG	0	0
1	Q52EG	0	1
2	Q52EG	1	2
3	Q52EG	1	3
4	Q52EG	1	4
5	QQ8Q4	0	5
6	QQ8Q4	0	6
7	QQ8Q4	1	7
8	QQ8Q4	1	8
9	QVIPW	1	9
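
For reproducibility, here is a minimal sketch that rebuilds this exact frame with hard-coded values. It is not the original generation code; the name id_df is the one used in the loop further down.

```python
import pandas as pd

# Hard-coded copy of the example frame shown above.
id_df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)
print(id_df)
```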

If an id in the ids column has the value 1 in the ftest column, then all subsequent rows with the same id should be marked 1 in the has_hist column, regardless of their own ftest value. The row where the first 1 appears stays 0, as shown in the dataframe below:

	ids	ftest	vals	has_hist
0	Q52EG	0	0	0
1	Q52EG	0	1	0
2	Q52EG	1	2	0
3	Q52EG	1	3	1
4	Q52EG	1	4	1
5	QQ8Q4	0	5	0
6	QQ8Q4	0	6	0
7	QQ8Q4	1	7	0
8	QQ8Q4	1	8	1
9	QVIPW	1	9	0

I am doing this with an iterative approach like this:

previous_present = {}  # ids that already had ftest == 1 on an earlier row
has_prv_history = []
for index, value in id_df.iterrows():
    my_id = value["ids"]
    ftest_mentioned = value["ftest"]
    previous_flag = 0
    if my_id in previous_present:
        # An earlier row of this id had ftest == 1.
        previous_flag = 1
    elif ftest_mentioned == 1:
        # First 1 for this id: remember it, but leave this row at 0.
        previous_present[my_id] = 1
    has_prv_history.append(previous_flag)
id_df["has_hist"] = has_prv_history

Can this code be vectorized without using apply?

Answer 1

Two key functions for this kind of task are shift and ffill, applied per group. For this specific question:

# df is the question's frame; work on a copy so df itself is left unchanged.
df2 = df.copy()
df2["has_hist"] = df.groupby("ids").ftest.shift().where(lambda s: s.eq(1))
df2["has_hist"] = df2.groupby("ids").has_hist.ffill().fillna(0).astype("int32")
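
To see why this works, it can help to look at the intermediate results. The following is a sketch that splits the same shift / where / ffill chain into named steps on the question's example data; the variable names prev, marks and has_hist are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)

# Step 1: ftest of the previous row within each id (NaN on a group's first row).
prev = df.groupby("ids")["ftest"].shift()
# Step 2: keep only the 1s; every other value becomes NaN.
marks = prev.where(prev.eq(1))
# Step 3: carry each 1 forward within its id, then turn the remaining NaN into 0.
has_hist = marks.groupby(df["ids"]).ffill().fillna(0).astype("int32")

print(pd.DataFrame({"ids": df["ids"], "prev": prev, "marks": marks, "has_hist": has_hist}))
```

The per-group shift is what excludes the current row, and the per-group ffill is what makes a single earlier 1 "stick" for the rest of that id.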

Here is a variant with transform, which in my experience is often slower than "pure" pandas operations:

df2 = df.assign(
    has_hist=(
        df
        .groupby("ids")
        .ftest.transform(
            lambda s: (
                s
                .shift()
                .where(lambda t: t.eq(1))
                .ffill()
                .fillna(0)
                .astype("int32")
            )
        )
    )
)
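
If ftest only ever contains 0 and 1 (an assumption, though it holds for the example data), the same result can probably also be had from a per-group cumulative maximum followed by a per-group shift. This is a sketch of a different technique from the shift / ffill approach above, checked only against the example from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)

# Running flag per id: has a 1 been seen so far, including the current row?
seen = df.groupby("ids")["ftest"].cummax()
# Shift by one row within each id so the current row itself is excluded;
# the first row of each id has no history and becomes 0.
df["has_hist"] = seen.groupby(df["ids"]).shift().fillna(0).astype("int32")

print(df)
```

This relies on 1 being the largest value in ftest; with other values in the column, the where(... eq(1)) form above is the safer choice.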

solution1
2 ACCEPTED 2021-06-11 12:48:14
