

How do I improve execution time in my script in python?

I'm trying to remove values from files containing 3-dimensional arrays whose dimensions correspond to [time][long][lat].

I have separate files for each year's worth of data. I have a list of observed data points T_obs, covering a period from start_time_index to end_time_index, that I want to compare against the mean value over that time period in the file data.

The data sets contained in the files are large enough that my code runs very slowly, and I want to optimize its execution time. The code I currently have is below. Are there any ways I could significantly save time?

import os
import numpy as np
from netCDF4 import Dataset

T_obs = [1.5, 3.6, 4.5]
start_time_index = [20, 300, 10]
end_time_index = [40, 328, 200]
long_obs = [45, 54, 180]
lat_obs = [34, 65, 32]
LE = np.zeros(len(T_obs))
t = 1984  # first year of data; one .nc file per year

for filename in os.listdir("C:\\Directory"):
    if filename.endswith(".nc"):
        print(filename)
        # open the file and grab the [time][long][lat] variable
        fh = Dataset(os.path.join("C:\\Directory", filename), 'r').variables['matrix']
        for i in range(len(long_obs)):
            # `year` holds the observation year for each point (defined elsewhere)
            if year[i] == t and start_time_index[i] > 0:
                LE_t = []
                for x in range(int(start_time_index[i]), int(end_time_index[i])):
                    # convert lon/lat to array indices and rescale the stored value
                    LE_t = np.append(LE_t, float(fh[x][long_obs[i] + 180][lat_obs[i] * -1 + 90]) / 10)
                LE[i] = np.mean(LE_t)
        t += 1
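For reference, the inner append loop only averages a single grid cell over a time window. Below is a standalone sketch of that computation, using a random NumPy array as a stand-in for the netCDF variable fh (netCDF4 variables accept the same NumPy-style slice syntax), so the whole window is read in one slice rather than one element per iteration:

import numpy as np

# Stand-in for Dataset(...).variables['matrix'], laid out as [time][long][lat]
fh = np.random.rand(365, 360, 180)

start, end = 20, 40          # start_time_index[i], end_time_index[i]
lon_idx = 45 + 180           # long_obs[i] + 180
lat_idx = 34 * -1 + 90       # lat_obs[i] * -1 + 90

window = fh[start:end, lon_idx, lat_idx]   # one read of the whole time window
mean_value = np.mean(window) / 10.0        # same scaling as the loop above
print(mean_value)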

Are you going to do this kind of work many times (with different values), or just once?

If the former, you can try to put your file data in a database (such as MySQL, which is simple to set up) and create indexes on the start and end times. This would make your reads much faster, as you would not need to do a full table scan (which is pretty much what you are doing by reading the whole file).
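The answer names MySQL; purely to illustrate the idea of indexing on the time and location columns, here is a minimal self-contained sketch using Python's standard-library sqlite3 module instead, with made-up table and column names (samples, t, lon, lat, value):

import sqlite3

conn = sqlite3.connect("climate.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS samples (
        t     INTEGER,   -- time index within the year
        lon   INTEGER,   -- longitude index
        lat   INTEGER,   -- latitude index
        value REAL
    )
""")
# Index on the columns used for filtering, so lookups avoid a full table scan
conn.execute("CREATE INDEX IF NOT EXISTS idx_t_lon_lat ON samples (t, lon, lat)")

# After loading the .nc data once, each observation becomes a single indexed query
start, end, lon, lat = 20, 40, 45 + 180, 34 * -1 + 90
row = conn.execute(
    "SELECT AVG(value) FROM samples WHERE t >= ? AND t < ? AND lon = ? AND lat = ?",
    (start, end, lon, lat),
).fetchone()
print(row[0])  # mean over the requested window, or None if nothing has been loaded yet
conn.close()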

If you are doing it just once, then I suggest you just wait it out. There is no trivial way to make I/O (which is your bottleneck) less expensive, and it seems that to make your checks / compare the data you actually need to go through the whole dataset.
