How to vectorize pandas code where it depends on previous row?

I am trying to vectorize a code snippet in pandas.

I have a pandas DataFrame like this:

     ids  ftest  vals
0  Q52EG      0     0
1  Q52EG      0     1
2  Q52EG      1     2
3  Q52EG      1     3
4  Q52EG      1     4
5  QQ8Q4      0     5
6  QQ8Q4      0     6
7  QQ8Q4      1     7
8  QQ8Q4      1     8
9  QVIPW      1     9
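
The code that generated this frame is not shown in the question. As a minimal sketch, the same ten rows can be rebuilt as follows, assuming plain integer columns and the name id_df that the loop in the question uses:

```python
import pandas as pd

# Reconstructed from the printed table; the original generating code is not shown,
# so the construction below is an assumption that only reproduces the same values.
id_df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)

print(id_df)
```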

If any id in the ids column has a value of 1 in the ftest column, then all subsequent rows with the same id should be marked as 1 in a has_hist column. The mark does not depend on the row's own ftest value, and the row where the 1 first appears is itself still marked 0, as shown in the DataFrame below:

     ids  ftest  vals  has_hist
0  Q52EG      0     0         0
1  Q52EG      0     1         0
2  Q52EG      1     2         0
3  Q52EG      1     3         1
4  Q52EG      1     4         1
5  QQ8Q4      0     5         0
6  QQ8Q4      0     6         0
7  QQ8Q4      1     7         0
8  QQ8Q4      1     8         1
9  QVIPW      1     9         0

I am doing this using an iterative approach like this:

previous_present = {}
has_prv_history = []
for index, value in id_df.iterrows():
    my_id = value["ids"]
    ftest_mentioned = value["ftest"]
    previous_flag = 0
    if my_id in previous_present.keys():
        previous_flag = 1
    elif (ftest_mentioned == 1):
        previous_present[my_id] = 1
    has_prv_history.append(previous_flag)
id_df["has_hist"] = has_prv_history

Can this code be vectorized without using apply?

Two key functions for this kind of task are shift and ffill, applied per group. For this specific question:

df2 = df.copy()  # df is the question's DataFrame (named id_df there)
df2["has_hist"] = df.groupby("ids").ftest.shift().where(lambda s: s.eq(1))
df2["has_hist"] = df2.groupby("ids").has_hist.ffill().fillna(0).astype("int32")
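
To see why shift and ffill give the desired column, it can help to look at the intermediate values. The sketch below rebuilds the sample data (the construction and the names prev, seen and steps are assumptions made for illustration) and splits the same per-group logic into three separate steps:

```python
import pandas as pd

# Sample data rebuilt from the table in the question (construction is assumed).
df = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
        "vals": list(range(10)),
    }
)

# Step 1: within each id, take the previous row's ftest (NaN on each id's first row).
prev = df.groupby("ids")["ftest"].shift()

# Step 2: keep only the 1s; every other value becomes NaN.
seen = prev.where(prev.eq(1))

# Step 3: carry each 1 forward within its id, then turn the remaining NaN into 0.
has_hist = seen.groupby(df["ids"]).ffill().fillna(0).astype("int32")

# Side-by-side view of the intermediate columns.
steps = pd.DataFrame(
    {
        "ids": df["ids"],
        "ftest": df["ftest"],
        "prev": prev,
        "seen": seen,
        "has_hist": has_hist,
    }
)
print(steps)
```

The shift is what makes the flag apply only to rows after the first 1, and the forward fill is what keeps the flag set even if ftest later drops back to 0.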

Here is a variant with transform, which in my experience is often slower than "pure" pandas operations:

# Build has_hist per group with transform and attach it as a new column.
df2 = df.assign(
    has_hist=df.groupby("ids").ftest.transform(
        lambda s: (
            s
            .shift()
            .where(lambda t: t.eq(1))
            .ffill()
            .fillna(0)
            .astype("int32")
        )
    )
)
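
One way to gain confidence in either vectorized form is to compare it against the row-by-row logic from the question. The following is a rough sketch under stated assumptions: the helper names has_hist_loop, has_hist_two_step and has_hist_transform are hypothetical, and the second frame with interleaved ids is made up to check that the ids do not need to be sorted or contiguous.

```python
import pandas as pd


def has_hist_loop(frame):
    """Row-by-row reference, equivalent to the loop in the question."""
    seen = set()
    flags = []
    for my_id, ftest in zip(frame["ids"], frame["ftest"]):
        flags.append(1 if my_id in seen else 0)
        if ftest == 1:
            seen.add(my_id)  # from now on, later rows of this id are flagged
    return flags


def has_hist_two_step(frame):
    """Grouped shift, then grouped forward fill."""
    marked = frame.groupby("ids")["ftest"].shift().where(lambda s: s.eq(1))
    return marked.groupby(frame["ids"]).ffill().fillna(0).astype("int32").tolist()


def has_hist_transform(frame):
    """Same logic applied per group through transform."""
    result = frame.groupby("ids")["ftest"].transform(
        lambda s: s.shift().where(lambda t: t.eq(1)).ffill().fillna(0).astype("int32")
    )
    return result.tolist()


# Sample data rebuilt from the question's table (construction is assumed).
sample = pd.DataFrame(
    {
        "ids": ["Q52EG"] * 5 + ["QQ8Q4"] * 4 + ["QVIPW"],
        "ftest": [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
    }
)

# Made-up frame with interleaved ids, not taken from the question.
interleaved = pd.DataFrame(
    {"ids": ["a", "b", "a", "b", "a"], "ftest": [1, 0, 0, 1, 0]}
)

for frame in (sample, interleaved):
    expected = has_hist_loop(frame)
    assert has_hist_two_step(frame) == expected
    assert has_hist_transform(frame) == expected

print(has_hist_loop(sample))
print(has_hist_loop(interleaved))
```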

