
How can I iterate over each row of a pandas DataFrame for a fixed column and perform an operation based on conditions in Python?

I have a data frame in which I calculate time differences. Due to some issue, some of the time differences are less than 0 (zero). I want to iterate through the data and, wherever the time difference is less than 0, add a specific value to it.

This is the data that I am getting (shown as an image in the original post), and I want to manipulate the column 'TIME'.

Here is what I have tried:

for row in df_all.rows:
    if df_all.iloc[row]['Time'].values >=43200:
        df_all.iloc[row]['Time']=df_all.iloc[row]['Time'].values-43200
    elif df_all.iloc[row]['Time'].values <0:
        df_all.iloc[row]['Time']=df_all.iloc[row]['Time'].values+43200
    else:
        pass 

You don't need an explicit loop (df.iterrows) or an implicit loop (df.apply).

Instead, you can use vectorised pandas functionality:

df.loc[df['Time'] >= 43200, 'Time'] -= 43200
df.loc[df['Time'] < 0, 'Time'] += 43200

This is going to be significantly faster, and possibly easier to maintain, than any loop.
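To make the effect of the two masked assignments concrete, here is a minimal sketch on a small frame. The sample values are hypothetical, chosen only to cover each branch (below zero, in range, and at or above 43200).

```python
import pandas as pd

# Hypothetical sample data covering each case
df = pd.DataFrame({'Time': [-100, 0, 500, 43200, 50000]})

# Rows at or above 43200 are shifted down by 43200
df.loc[df['Time'] >= 43200, 'Time'] -= 43200
# Rows below zero are shifted up by 43200
df.loc[df['Time'] < 0, 'Time'] += 43200

result = df['Time'].tolist()
print(result)  # [43100, 0, 500, 0, 6800]
```

Each line builds a boolean mask once and updates only the matching rows, so no Python-level iteration over rows is needed.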

Below is some benchmarking versus a loop-based solution.

Performance benchmarking

import numpy as np, pandas as pd

df_all = pd.DataFrame({'Time':np.random.uniform(-500,50000, size=(10000,))})

def jp(df):
    df.loc[df['Time'] >= 43200, 'Time'] -= 43200
    df.loc[df['Time'] < 0, 'Time'] += 43200
    return df

def dl(df):
    def _time(x):
        _out = x
        if _out >= 43200:
            _out -= 43200
        if _out <0:
            _out += 43200
        return _out
    df['Time'] = list(map(_time,df['Time'].values))
    return df

%timeit jp(df_all)  # 3.5ms
%timeit dl(df_all)  # 8.5ms
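Note that %timeit is an IPython/Jupyter magic, and that both functions modify the frame they are given, so repeated timing runs after the first operate on already-adjusted data. A rough sketch of the same comparison outside IPython, using the standard-library timeit module and a fresh copy per run, might look like this. The function names, the fixed seed, and the run count are assumptions made for illustration, and the copy is included in each measurement, so the numbers are not directly comparable to the figures above.

```python
import timeit

import numpy as np
import pandas as pd

# Fixed seed is an assumption, used only so the check below is reproducible
rng = np.random.default_rng(0)
base = pd.DataFrame({'Time': rng.uniform(-500, 50000, size=10000)})

def vectorised(df):
    # Masked in-place updates, as in the answer
    df.loc[df['Time'] >= 43200, 'Time'] -= 43200
    df.loc[df['Time'] < 0, 'Time'] += 43200
    return df

def looped(df):
    # Element-by-element version, mirroring the loop-based function above
    def _wrap(x):
        if x >= 43200:
            x -= 43200
        if x < 0:
            x += 43200
        return x
    df['Time'] = list(map(_wrap, df['Time'].values))
    return df

# Check that both approaches agree before timing them
out_vec = vectorised(base.copy())
out_loop = looped(base.copy())
same = np.allclose(out_vec['Time'], out_loop['Time'])

# Each run works on a fresh copy so earlier runs do not affect later ones
t_vec = timeit.timeit(lambda: vectorised(base.copy()), number=20)
t_loop = timeit.timeit(lambda: looped(base.copy()), number=20)
print(f"results match: {same}")
print(f"vectorised: {t_vec / 20 * 1000:.2f} ms per run")
print(f"looped:     {t_loop / 20 * 1000:.2f} ms per run")
```

Actual timings depend on the machine and the pandas version, so treat any measured ratio as indicative only.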

