
Python - Increase read/modify/write speed?

I have a geoJSON file that breaks a certain geographical area down into about 7,000 cells. I'd like to a) open this geoJSON, b) modify some data (see the code below) and c) write the modified geoJSON to disk. My problem is that, since there are a lot of cells, this takes almost a minute. Do you see any way to improve the speed of this function? Thank you!

import json

def writeGeoJSON(param1, param2, inputdf):
    # Load the GeoJSON feature collection.
    with open('ingeo.geojson') as f:
        data = json.load(f)
    for feature in data['features']:
        # Select the row for this cell and the requested parameters.
        currentfeature = inputdf[(inputdf['SId']==feature['properties']['cellId']) & (inputdf['param1']==param1) & (inputdf['param2']==param2)]
        if (len(currentfeature) > 0):
            feature['properties'].update({"style": {"opacity": currentfeature.Opacity.item()}})
        else:
            feature['properties'].update({"style": {"opacity": 0}})
    # Write the modified collection back to disk.
    with open('outgeo.geojson', 'w') as outfile:
        json.dump(data, outfile)

There is a serial optimization possible in your code. You have the line:

currentfeature = inputdf[(inputdf['SId']==feature['properties']['cellId']) & (inputdf['param1']==param1) & (inputdf['param2']==param2)]

Notice that the last two checks do not depend on the current feature, so they can be moved outside the for loop. Inside the loop they are redundant work: the same two comparisons over the whole of inputdf are repeated on every iteration. You can modify the code as follows:

paramMatch = (inputdf['param1']==param1) & (inputdf['param2']==param2)
for feature in data['features']: 
    currentfeature = inputdf[(inputdf['SId']==feature['properties']['cellId']) & paramMatch]

That should make your program run noticeably faster!
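Going one step further than hoisting the two checks, the per-cell DataFrame filter could be replaced by a dictionary built once before the loop. This is a different technique from the one above, not part of the original answer. It is a minimal sketch that assumes inputdf is a pandas DataFrame with SId, param1, param2 and Opacity columns, and that each SId occurs at most once for a given parameter pair. The helper names build_opacity_lookup and apply_opacity are made up for illustration.

```python
import pandas as pd


def build_opacity_lookup(inputdf, param1, param2):
    """Map cellId -> opacity for the rows matching both parameters."""
    # Filter on the two loop-invariant conditions exactly once.
    matching = inputdf[(inputdf['param1'] == param1) & (inputdf['param2'] == param2)]
    # Assumption: at most one row per SId remains; with duplicates the last row wins.
    return dict(zip(matching['SId'].tolist(), matching['Opacity'].tolist()))


def apply_opacity(features, lookup):
    """Set the style of every feature with one dict lookup per cell."""
    for feature in features:
        cell_id = feature['properties']['cellId']
        feature['properties']['style'] = {"opacity": lookup.get(cell_id, 0)}


# Tiny made-up data set, for illustration only.
inputdf = pd.DataFrame({
    'SId': [1, 2, 1],
    'param1': ['a', 'a', 'b'],
    'param2': ['x', 'x', 'x'],
    'Opacity': [0.5, 0.8, 0.1],
})
features = [{'properties': {'cellId': cid}} for cid in (1, 2, 3)]
apply_opacity(features, build_opacity_lookup(inputdf, 'a', 'x'))
```

One behavioural difference to keep in mind: the original code raises an error if several rows match a cell, while this sketch silently keeps the last one.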

That said, if you need even better execution times (most probably not necessary), try using the multiprocessing module to parallelize the processing part of the code. You can split the workload of the for loop across processes.

Try applying apply_async or map_async to blocks of iterations to speed things up!
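Splitting the loop into blocks and handing them to map_async could look roughly like the sketch below. It assumes the per-cell opacity is available as a plain dict called lookup (cellId -> opacity), a hypothetical stand-in for the DataFrame filter. The function names are also made up. Worker processes receive copies of the features, so the styled chunks have to be collected from the return value.

```python
import multiprocessing


def style_chunk(args):
    """Worker: add a style to each feature of one chunk and return the chunk."""
    chunk, lookup = args
    for feature in chunk:
        cell_id = feature['properties']['cellId']
        feature['properties']['style'] = {"opacity": lookup.get(cell_id, 0)}
    return chunk


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def style_features_parallel(features, lookup, workers=4):
    """Process the chunks in worker processes and reassemble them in order."""
    chunks = split_into_chunks(features, workers)
    with multiprocessing.Pool(workers) as pool:
        async_result = pool.map_async(style_chunk, [(c, lookup) for c in chunks])
        styled_chunks = async_result.get()  # blocks until every chunk is done
    return [feature for chunk in styled_chunks for feature in chunk]
```

Call style_features_parallel from under an `if __name__ == "__main__":` guard so that it also works on platforms that spawn rather than fork worker processes. With only a few thousand cells, the cost of pickling the chunks may well outweigh the gain, so it is worth measuring before adopting this.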

[In addition to @varun's optimization, and including a suggestion from @romain-aga.]

Add this at the beginning of the function:

zero_style = {"opacity": 0}

And change the conditional to:

if (len(currentfeature) > 0):
    feature['properties']['style'] = {"opacity": currentfeature.Opacity.item()}
else:
    feature['properties']['style'] = zero_style
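To check whether the direct assignment actually pays off on a given machine, the two forms can be timed against each other. The following is only a rough sketch using the standard timeit module. The helper names are made up, and the numbers will vary between runs and interpreters.

```python
import timeit

zero_style = {"opacity": 0}


def with_update(properties):
    # Original form: builds a temporary outer dict, then merges it.
    properties.update({"style": {"opacity": 0}})


def with_assignment(properties):
    # Suggested form: one item assignment, reusing the shared constant.
    properties['style'] = zero_style


props_a, props_b = {"cellId": 1}, {"cellId": 1}
with_update(props_a)
with_assignment(props_b)

t_update = timeit.timeit(lambda: with_update(props_a), number=100_000)
t_assign = timeit.timeit(lambda: with_assignment(props_b), number=100_000)
print(f"update: {t_update:.4f}s  assignment: {t_assign:.4f}s")
```

Note that every feature without a match then shares the same zero_style object. That is harmless when the data is only dumped to JSON, but mutating one feature's style afterwards would change all of them.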

I have the impression that knowing more about the type of inputdf would lead to better optimization (maybe a direct if currentfeature: is enough, or maybe another test is better?).


Assuming CPython, I expect this to be better (it is better to ask for forgiveness than for permission):

try:
    value = {"opacity": currentfeature.Opacity.item()}
except ValueError:  # assuming pandas: .item() raises ValueError unless exactly one element is selected
    value = zero_style
feature['properties']['style'] = value
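As for which exception to catch: if currentfeature is a pandas selection, as the boolean-mask indexing in the question suggests, then .item() raises ValueError whenever the selection does not hold exactly one element. A small sketch under that assumption, with a made-up DataFrame and a hypothetical helper named style_for:

```python
import pandas as pd

zero_style = {"opacity": 0}


def style_for(currentfeature):
    """Return the style dict for a filtered selection (EAFP variant)."""
    try:
        return {"opacity": currentfeature.Opacity.item()}
    except ValueError:
        # Raised when the selection does not hold exactly one row.
        return zero_style


# Made-up data, for illustration only.
df = pd.DataFrame({'SId': [1, 2], 'Opacity': [0.5, 0.8]})
one_row = df[df['SId'] == 1]   # exactly one match
no_row = df[df['SId'] == 99]   # empty selection
```

Be aware that a selection with more than one row also raises ValueError, so this variant would quietly fall back to zero_style for duplicated cells instead of failing.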
