Pool.imap_unordered skips values from the iterable

Question

I am trying to run the following code to parallelize a function that crops GeoTIFFs. The GeoTIFFs are named <location>__img_news1a_iw_rt30_<hex_code>_g_gpf_vv.tif. The code works fine, except that one particular GeoTIFF is never even read from the vv_tif iterable. Specifically, out of locationA_img_news1a_iw_rt30_20170314t115609_g_gpf_vv.tif, locationA_img_news1a_iw_rt30_20170606t115613_g_gpf_vv.tif and locationA_img_news1a_iw_rt30_20170712t115615_g_gpf_vv.tif, it skips locationA_img_news1a_iw_rt30_20170712t115615_g_gpf_vv.tif every single time when I read these files along with GeoTIFFs from other locations. However, it does read this file if I create an iterable from only these three GeoTIFF files. I have tried changing chunksize, but it doesn't help. Am I missing something here?

from multiprocessing import Pool, cpu_count
try:
    pool = Pool(cpu_count())
    pool.imap_unordered(tile_geotif, vv_tif, chunksize=11)
finally:
    pool.close()

EDIT: I have 55 files in total, and it drops only the locationA_img_news1a_iw_rt30_20170712t115615_g_gpf_vv.tif file, every single time.

Answer 1

This is too much to show in comments, so I am putting it here as an answer.

It seems to me that the map functions work in my toy examples below. I think you have an error in your input data that causes the corrupted output. Either that, or you found a bug. If so, do try to create a reproducible example.

from multiprocessing import Pool

vv_tif = list(range(10))
def square(x):
    # Despite the name, this computes x**x, which is what the output below shows.
    return x**x

with Pool(5) as p:
    print(p.map(square, vv_tif))

with Pool(5) as p:
    print(list(p.imap(square, vv_tif)))

with Pool(5) as p:
    print(list(p.imap_unordered(square, vv_tif)))

with Pool(5) as p:
    print(list(p.imap_unordered(square, vv_tif, chunksize=11)))

Output:

[1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]
[1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]
[1, 1, 256, 3125, 46656, 823543, 16777216, 4, 27, 387420489]
[1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489]

Usually all 4 lines were the same. I ran it a few times until I got a different ordering on one. It looks to me like it works.

Note that this demonstrates that the various map functions are not mutating the underlying data.
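One way to narrow down whether an input is really being dropped is to have the worker return the item it received and compare that against the full input list. The following is only a sketch: tile_stub is a hypothetical stand-in for the real tile_geotif, and find_missing and the file names are made up for illustration.

```python
from multiprocessing import Pool


def tile_stub(path):
    # Hypothetical stand-in for tile_geotif: it only echoes its input,
    # so the parent process can tell which items reached a worker.
    return path


def find_missing(paths, workers=2, chunksize=11):
    # Consume the iterator inside the with-block; leaving the block
    # terminates the pool, so results must be collected before that.
    with Pool(workers) as pool:
        seen = set(pool.imap_unordered(tile_stub, paths, chunksize=chunksize))
    # Anything in the input that never came back was not processed.
    return sorted(set(paths) - seen)


if __name__ == "__main__":
    # Illustrative file names only; 55 matches the count in the question.
    vv_tif = [f"file_{i:02d}_vv.tif" for i in range(55)]
    print(find_missing(vv_tif))
```

If the returned list is empty, every input was handed to a worker, and the problem more likely lies in what the real function does with that file.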

Answer 2

Please notice the difference in results depending on whether the time.sleep call is commented in or out.

import time
from multiprocessing import Pool

def process(x):
    print(x)

def main():
    pool = Pool(4)
    pool.imap_unordered(process, (1,2,3,4,5))
    pool.close()
    #time.sleep(3)

if __name__ == "__main__":
    main()
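What the commented-out sleep appears to hint at is that imap_unordered returns an iterator immediately and close() does not wait for the workers, so the main process can exit before every task has run. Assuming that is the cause here (the question does not confirm it), a minimal sketch of waiting explicitly instead of sleeping could look like this; run_all is an illustrative name, and process returns a value only so the result can be checked.

```python
from multiprocessing import Pool


def process(x):
    # Return a value so the parent can verify that every item was handled.
    return x * 2


def run_all(items, workers=4):
    pool = Pool(workers)
    results = pool.imap_unordered(process, items)
    pool.close()  # no more tasks will be submitted
    pool.join()   # block until the workers have finished every task
    # All results are already available by now; sort because the
    # completion order of imap_unordered is not guaranteed.
    return sorted(results)


if __name__ == "__main__":
    print(run_all((1, 2, 3, 4, 5)))
```

Consuming the iterator, for example with list(...), before the pool is shut down would have the same blocking effect as join().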
