Faster alternative to a for loop with if conditions when working with a pandas DataFrame in Python

I have a data frame named 'plans_to_csv' looking like this:

[Image: screenshot of the plans_to_csv data frame; no description was provided]

I need to run the following analysis to determine the actual mode, but it takes very long to run. Is there an alternative way to write this code so that it runs faster? Thanks a lot in advance for your help.

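for i in range(len(plans_to_csv) - 2):
    # Flag row i when the next row is a walk leg, the row after that is a
    # car interaction, and rows i and i+2 belong to the same person.
    if (plans_to_csv['mode'][i + 1] == 'walk'
            and plans_to_csv['type'][i + 2] == 'car interaction'
            and plans_to_csv['person_id'][i] == plans_to_csv['person_id'][i + 2]):
        plans_to_csv.loc[i, 'actual_mode_car'] = 1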

You can shift the columns and compare them directly. That makes use of vectorization and should be faster than looping over the rows.

selection = ((plans_to_csv['mode'].shift(-1) == 'walk')
             & (plans_to_csv['type'].shift(-2) == 'car interaction')
             & (plans_to_csv['person_id'] == plans_to_csv['person_id'].shift(-2)))
plans_to_csv['actual_mode_car'] = selection.astype(int)
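
To check that the shifted comparison gives the same flags as the original loop, here is a minimal sketch on a made-up frame. The sample rows and the helper names loop_version and shifted_version are assumptions for illustration only, not the asker's data, and the sketch assumes a default 0..n-1 index.

```python
import pandas as pd

# Hypothetical sample data; the real plans_to_csv has more rows and columns.
plans_to_csv = pd.DataFrame({
    "person_id": [1, 1, 1, 1, 2, 2, 2],
    "mode": ["pt", "walk", "pt", "car", "pt", "walk", "pt"],
    "type": ["home", "leg", "car interaction", "leg", "home", "leg", "work"],
})


def loop_version(df):
    """Row-by-row reference that mirrors the loop from the question."""
    out = [0] * len(df)
    for i in range(len(df) - 2):
        if (df["mode"].iloc[i + 1] == "walk"
                and df["type"].iloc[i + 2] == "car interaction"
                and df["person_id"].iloc[i] == df["person_id"].iloc[i + 2]):
            out[i] = 1
    return out


def shifted_version(df):
    """Same conditions, expressed with shifted columns instead of a loop."""
    selection = (
        (df["mode"].shift(-1) == "walk")                  # value from row i+1
        & (df["type"].shift(-2) == "car interaction")     # value from row i+2
        & (df["person_id"] == df["person_id"].shift(-2))  # same person at i and i+2
    )
    return selection.astype(int).tolist()


loop_result = loop_version(plans_to_csv)
shifted_result = shifted_version(plans_to_csv)
print(loop_result == shifted_result)
```

The last two rows have nothing to shift in, so shift(-2) fills them with NaN. Comparisons against NaN evaluate to False, which matches the loop stopping two rows before the end.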

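Note that this sets every entry that doesn't match the comparison to 0. If that is not wanted, you can instead assign only the matching rows with plans_to_csv.loc[selection, 'actual_mode_car'] = 1, which leaves the other rows untouched. Using .loc here avoids chained assignment, which may silently fail to update the original frame.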
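
A small sketch of the difference, assuming the column already exists and holds a value you want to keep. The frame df and its contents are made up for illustration.

```python
import pandas as pd

# Hypothetical frame where row 3 was already flagged by an earlier step.
df = pd.DataFrame({
    "person_id": [1, 1, 1, 2, 2, 2],
    "mode": ["pt", "walk", "pt", "pt", "walk", "pt"],
    "type": ["home", "leg", "car interaction", "home", "leg", "work"],
    "actual_mode_car": [0, 0, 0, 1, 0, 0],
})

selection = (
    (df["mode"].shift(-1) == "walk")
    & (df["type"].shift(-2) == "car interaction")
    & (df["person_id"] == df["person_id"].shift(-2))
)

# Overwriting the whole column would reset the existing flag on row 3.
overwritten = selection.astype(int).tolist()

# Assigning through .loc only touches the rows where selection is True.
df.loc[selection, "actual_mode_car"] = 1
updated = df["actual_mode_car"].tolist()

print(overwritten)
print(updated)
```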
