Code used to convert time is taking too long
I have a dataframe as follows (reproducible data):
import numpy as np
import pandas as pd

np.random.seed(365)
rows = 17000
data = np.random.uniform(20.25, 23.625, size=(rows, 1))
df = pd.DataFrame(data, columns=['Ta'])

# Set index
Epoch_Start = 1636757999
Epoch_End = 1636844395
epochs = np.arange(Epoch_Start, Epoch_End, 5)
df['Epoch'] = pd.DataFrame(epochs)
df.reset_index(drop=True, inplace=True)
df = df.set_index('Epoch')
Epoch Ta
1636757999 23.427413
1636758004 22.415409
1636758009 22.560560
1636758014 22.236397
1636758019 22.085619
...
1636842974 21.342487
1636842979 20.863043
1636842984 22.582027
1636842989 20.756926
1636842994 21.255536
[17000 rows x 1 columns]
My expected output is: 1. A column with the date converted from epoch time to datetime (the 'dates' column in the function's return value), e.g. 2021-11-12 22:59:59.
Here's the code that I'm using:
import time

def obt_dat(df):
    df2 = df
    df2['date'] = df.index.values
    df2['date'] = pd.to_datetime(df2['date'], unit='s')
    df2['hour'] = ''
    df2['fecha'] = ''
    df2['dates'] = ''
    start = time.time()
    for i in range(0, len(df2)):
        df2['hour'].iloc[i] = df2['date'].iloc[i].hour
        df2['fecha'].iloc[i] = str(df2['date'].iloc[i].year) + str(df2['date'].iloc[i].month) + str(df2['date'].iloc[i].day)
    df2['dates'] = df2['fecha'].astype(str) + df2['hour'].astype(str)
    end = time.time()
    T = round((end - start) / 60, 2)
    print('Tiempo de Ejecución Total: ' + str(T) + ' minutos')
    return df2
obt_dat(df)
After that I'm using .groupby to get the mean values for specific hours. But the problem is that the code is taking too long to execute. Does anyone have an idea how to shorten the elapsed time of the function obt_dat()?
Use plain Python - lists or dicts instead of dataframes. If you really need a dataframe, construct it at the end of the CPU-intensive operations. But that's just my assumption - you might want to do some benchmarking to see how much time each part of the code really takes. "Very long" is relative, but I'm pretty sure that your bottleneck is the dataframe operations you do in the for loop.
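To illustrate this suggestion (a sketch of my own, not code from the question), the per-row conversion could be done with plain datetime objects in ordinary lists and dicts, building the dataframe only once at the end:

```python
from datetime import datetime, timezone

# Sketch: convert epoch seconds with plain Python first.
# The epoch values are the first few from the question's data.
epochs = [1636757999, 1636758004, 1636758009]
rows = []
for e in epochs:
    d = datetime.fromtimestamp(e, tz=timezone.utc)
    rows.append({
        'Epoch': e,
        'hour': d.hour,
        'fecha': d.strftime('%Y%m%d'),
        'dates': d.strftime('%Y%m%d%H'),
    })
# rows is now a list of dicts; pd.DataFrame(rows) would build the
# final dataframe in a single step after the loop.
```

Note that strftime('%Y%m%d') also zero-pads month and day, which the original string concatenation str(year) + str(month) + str(day) does not.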
You can use the dt (date) accessors to eliminate the loops:
df2 = df.copy()
df2['date'] = df.index.values
df2['date'] = pd.to_datetime(df2['date'], unit='s')
df2['hour'] = df2['date'].dt.hour
df2['fecha'] = df2['date'].dt.strftime('%Y%m%d')
df2['dates'] = df2['date'].dt.strftime('%Y%m%d%H')
Timing with your reproducible example gives:
156 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
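The .groupby step mentioned in the question then runs directly on the vectorized 'dates' column. This is a sketch under my assumption that the goal is the mean of 'Ta' per year-month-day-hour bucket:

```python
import numpy as np
import pandas as pd

# Rebuild the question's reproducible data
np.random.seed(365)
rows = 17000
df = pd.DataFrame(np.random.uniform(20.25, 23.625, size=(rows, 1)),
                  columns=['Ta'])
df['Epoch'] = np.arange(1636757999, 1636844395, 5)[:rows]
df = df.set_index('Epoch')

# Vectorized conversion, as in the answer above
df2 = df.copy()
df2['date'] = pd.to_datetime(df.index.values, unit='s')
df2['dates'] = df2['date'].dt.strftime('%Y%m%d%H')

# Mean Ta per hourly bucket, no Python-level loop
hourly_mean = df2.groupby('dates')['Ta'].mean()
```

Grouping on the precomputed string column keeps the whole pipeline vectorized, so the groupby cost is negligible next to the original row-by-row loop.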