I have a data frame in which I calculate time differences. Due to some issue, some of the time differences are less than 0 (zero). I want to iterate through the data and, wherever the time difference is less than 0, add a specific value to it.
This is the data that I am getting (shown in the image), and I want to manipulate the 'Time' column.
I have tried this:
for row in df_all.rows:
    if df_all.iloc[row]['Time'].values >= 43200:
        df_all.iloc[row]['Time'] = df_all.iloc[row]['Time'].values - 43200
    elif df_all.iloc[row]['Time'].values < 0:
        df_all.iloc[row]['Time'] = df_all.iloc[row]['Time'].values + 43200
    else:
        pass
You don't need an explicit loop (df.iterrows) or an implicit loop (df.apply).
Instead, you can use vectorised pandas functionality:
df.loc[df['Time'] >= 43200, 'Time'] -= 43200
df.loc[df['Time'] < 0, 'Time'] += 43200
This is going to be significantly faster, and possibly easier to maintain, than any loop.
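To see what the two masked assignments do, here is a minimal sketch on a tiny hand-made frame. The sample values are made up for illustration; only the column name Time and the 43200 offset come from the question.

```python
import pandas as pd

# Illustrative values only: one too large, one negative, one already in range.
df = pd.DataFrame({'Time': [50000.0, -100.0, 1200.0]})

# Each boolean mask selects the rows to change; there is no Python-level loop.
df.loc[df['Time'] >= 43200, 'Time'] -= 43200
df.loc[df['Time'] < 0, 'Time'] += 43200

result = df['Time'].tolist()
print(result)  # [6800.0, 43100.0, 1200.0]
```

Rows that already lie between 0 and 43200 match neither mask and are left untouched.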
Below is some benchmarking versus a loop-based solution.
Performance benchmarking
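import numpy as np
import pandas as pd

# 10,000 random values, some below 0 and some above 43200
df_all = pd.DataFrame({'Time': np.random.uniform(-500, 50000, size=(10000,))})

# Vectorised version: boolean masks with in-place arithmetic
def jp(df):
    df.loc[df['Time'] >= 43200, 'Time'] -= 43200
    df.loc[df['Time'] < 0, 'Time'] += 43200
    return df

# Loop-based version: apply a Python function to each value
def dl(df):
    def _time(x):
        _out = x
        if _out >= 43200:
            _out -= 43200
        if _out < 0:
            _out += 43200
        return _out
    df['Time'] = list(map(_time, df['Time'].values))
    return df

# %timeit is an IPython/Jupyter magic command
%timeit jp(df_all)  # 3.5ms
%timeit dl(df_all)  # 8.5ms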
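Outside IPython, %timeit is not available, and both functions above modify the frame they are given, so repeated runs operate on already-corrected data. The following is one possible sketch, not the original benchmark: it uses the standard timeit module, works on copies, and checks that both approaches agree. The names vectorised and looped (standing in for jp and dl), the fixed seed, and the repetition count of 20 are assumptions made here. The copies add some overhead to both timings, so the numbers will not match the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) so runs are repeatable
df_all = pd.DataFrame({'Time': rng.uniform(-500, 50000, size=10_000)})

def vectorised(df):
    df = df.copy()  # work on a copy so the input is not modified
    df.loc[df['Time'] >= 43200, 'Time'] -= 43200
    df.loc[df['Time'] < 0, 'Time'] += 43200
    return df

def looped(df):
    def _time(x):
        if x >= 43200:
            x -= 43200
        if x < 0:
            x += 43200
        return x
    df = df.copy()  # work on a copy so the input is not modified
    df['Time'] = list(map(_time, df['Time'].values))
    return df

# Both approaches should produce the same values.
same = bool(np.allclose(vectorised(df_all)['Time'], looped(df_all)['Time']))
print(same)

# Timings depend on the machine and library versions, so none are quoted here.
t_vec = timeit.timeit(lambda: vectorised(df_all), number=20)
t_loop = timeit.timeit(lambda: looped(df_all), number=20)
print(f"vectorised: {t_vec:.4f}s, looped: {t_loop:.4f}s")
```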