I am trying to vectorize a code snippet in pandas.
I have a pandas dataframe that looks like this:
index | ids | ftest | vals |
---|---|---|---|
0 | Q52EG | 0 | 0 |
1 | Q52EG | 0 | 1 |
2 | Q52EG | 1 | 2 |
3 | Q52EG | 1 | 3 |
4 | Q52EG | 1 | 4 |
5 | QQ8Q4 | 0 | 5 |
6 | QQ8Q4 | 0 | 6 |
7 | QQ8Q4 | 1 | 7 |
8 | QQ8Q4 | 1 | 8 |
9 | QVIPW | 1 | 9 |
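For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above. The values are copied from the table, and `id_df` is simply the name used in the loop further down:

```python
import pandas as pd

# Sample data copied from the table above; the default RangeIndex (0..9)
# plays the role of the unnamed first column.
id_df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)
print(id_df)
```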
If any id in the ids column has a value of 1 in the ftest column, then all subsequent rows with the same id should be marked as 1 in the has_hist column. The flag does not depend on the current row's ftest value, as shown in the dataframe below:
index | ids | ftest | vals | has_hist |
---|---|---|---|---|
0 | Q52EG | 0 | 0 | 0 |
1 | Q52EG | 0 | 1 | 0 |
2 | Q52EG | 1 | 2 | 0 |
3 | Q52EG | 1 | 3 | 1 |
4 | Q52EG | 1 | 4 | 1 |
5 | QQ8Q4 | 0 | 5 | 0 |
6 | QQ8Q4 | 0 | 6 | 0 |
7 | QQ8Q4 | 1 | 7 | 0 |
8 | QQ8Q4 | 1 | 8 | 1 |
9 | QVIPW | 1 | 9 | 0 |
I am doing this using an iterative approach like this:
previous_present = {}
has_prv_history = []
for index, value in id_df.iterrows():
    my_id = value["ids"]
    ftest_mentioned = value["ftest"]
    previous_flag = 0
    if my_id in previous_present.keys():
        previous_flag = 1
    elif (ftest_mentioned == 1):
        previous_present[my_id] = 1
    has_prv_history.append(previous_flag)
id_df["has_hist"] = has_prv_history
Can this code be vectorized without using apply?
Two key functions for this kind of task are shift and ffill, applied per group. For this specific question:
# df is the input frame (id_df in the question); work on a copy
df2 = df.copy()
df2["has_hist"] = df.groupby("ids").ftest.shift().where(lambda s: s.eq(1))
df2["has_hist"] = df2.groupby("ids").has_hist.ffill().fillna(0).astype("int32")
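To see why this works, it may help to look at the intermediate columns. The sketch below rebuilds the sample data from the question and keeps each step in its own variable. The names `prev`, `seen` and `steps` are only illustrative and are not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)

# Step 1: ftest of the previous row within the same id
# (the first row of each id has no predecessor, so it becomes NaN).
prev = df.groupby("ids")["ftest"].shift()

# Step 2: keep only the 1s; every other value becomes NaN.
seen = prev.where(prev.eq(1))

# Step 3: carry a 1 forward within each id, then turn the remaining NaN into 0.
has_hist = seen.groupby(df["ids"]).ffill().fillna(0).astype("int32")

steps = pd.DataFrame(
    {"ids": df["ids"], "ftest": df["ftest"], "prev": prev, "seen": seen, "has_hist": has_hist}
)
print(steps)
```

The shift is what makes the flag describe history only: a row's own ftest value never contributes to its own has_hist.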
Here is a variant with transform. In my experience, however, transform with a Python lambda is often slower than "pure" Pandas operations:
df2 = df.assign(
    has_hist=df
    .groupby("ids")
    .ftest.transform(
        lambda s: (
            s
            .shift()
            .where(lambda t: t.eq(1))
            .ffill()
            .fillna(0)
            .astype("int32")
        )
    )
)
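A quick way to gain confidence in the vectorized version is to compare it with the row-by-row logic from the question on random data. This is only a sketch: the helper names `has_hist_loop` and `has_hist_vectorized`, the random frame, and the seed are assumptions made for illustration, and the loop is a compact restatement of the question's code rather than a copy of it.

```python
import numpy as np
import pandas as pd


def has_hist_loop(frame):
    """Reference result: row-by-row logic equivalent to the loop in the question."""
    seen_ids = set()
    flags = []
    for my_id, ftest in zip(frame["ids"], frame["ftest"]):
        if my_id in seen_ids:
            flags.append(1)
        else:
            flags.append(0)
            if ftest == 1:
                seen_ids.add(my_id)
    return pd.Series(flags, index=frame.index, dtype="int32")


def has_hist_vectorized(frame):
    """Per-group shift, mask, and forward fill, as in the answer above."""
    prev = frame.groupby("ids")["ftest"].shift()
    seen = prev.where(prev.eq(1))
    return seen.groupby(frame["ids"]).ffill().fillna(0).astype("int32")


# Hypothetical test data: ids are interleaved on purpose, to check that
# the grouping does not rely on rows of one id being contiguous.
rng = np.random.default_rng(0)
random_df = pd.DataFrame(
    {
        "ids": rng.choice(["a", "b", "c", "d"], size=200),
        "ftest": rng.integers(0, 2, size=200),
    }
)

matches = has_hist_loop(random_df).tolist() == has_hist_vectorized(random_df).tolist()
print("vectorized result matches the loop:", matches)
```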