Why would ThreadPool cause more than one core worth of CPU usage?

I have code which looks like this:

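import cv2
from multiprocessing.pool import ThreadPool
from tqdm import tqdm

# df (not shown) has a file_path column listing the image files
def get_image_stats(fp):
    # Load the image and return (height, width, height/width ratio)
    img = cv2.imread(fp)
    return img.shape[0], img.shape[1], img.shape[0]/img.shape[1]

with ThreadPool(16) as pool:
    res = list(tqdm(pool.imap_unordered(get_image_stats, df.file_path), total=len(df)))

heights, widths, ars = list(zip(*res))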

The only library-specific part there is cv2.imread, which simply loads an image file into a numpy array, so it's I/O bound.

Why would my CPU usage look like this?

[image: CPU usage graph]

Notes on that image:

  • Horizontal axis is time in seconds, and vertical axis is CPU % usage ranging from 0% to 100%. The update interval is 1 second.
  • 40s is where I started the script
  • It's not easy to see, but there are 16 cores.

Another note: I did not set n_workers to 16 because I have 16 cores. Just a coincidence.

So why is this using up 75% of 16 cores at once?

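Because your thread pool is going to use 1 core per thread if it can. That's what gives maximum parallelism and maximizes throughput. The GIL only serializes Python bytecode. cv2.imread spends most of its time in native code decoding the image, which is CPU work rather than pure I/O, and OpenCV's Python bindings release the GIL while that native code runs, so the 16 threads can decode on separate cores at the same time.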
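If you want to see the effect without OpenCV, here is a minimal sketch that compares CPU time to wall-clock time for work run on a ThreadPool. It is not the asker's code: zlib.compress is used as an assumed stand-in for image decoding (it runs in C and releases the GIL), and native_work, pure_python_work, cores_busy and PAYLOAD are hypothetical names made up for this illustration. A ratio above 1 means more than one core was busy on average.

```python
import os
import time
import zlib
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in data: 1 MB of random bytes, built once and shared.
PAYLOAD = os.urandom(1_000_000)


def native_work(_):
    # zlib.compress runs in C and releases the GIL while compressing,
    # which is the property assumed here to mimic image decoding.
    return len(zlib.compress(PAYLOAD, 6))


def pure_python_work(_):
    # A plain Python loop holds the GIL for its whole duration.
    total = 0
    for i in range(300_000):
        total += i * i
    return total


def cores_busy(func, n_threads=4, n_tasks=16):
    """Return (CPU seconds / wall seconds, results) for func run on a ThreadPool."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads
    with ThreadPool(n_threads) as pool:
        results = list(pool.imap_unordered(func, range(n_tasks)))
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall, results


if __name__ == "__main__":
    for func in (pure_python_work, native_work):
        ratio, _ = cores_busy(func)
        print(f"{func.__name__}: ~{ratio:.1f} cores busy")
```

On a multi-core machine with a standard (GIL) CPython build, I would expect the pure Python function to stay near one core and the zlib one to climb toward the number of threads, but the exact numbers depend on your hardware and on whatever else is running.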
