
Multithreading not achieving performance difference Python

Below is a program that makes multiple GET requests and writes the response images to my directory. These GET requests are meant to run in separate threads, and thus finish faster than without threads, but I'm not seeing any performance difference.

Printing active_count() shows that 9 threads are created. However, the run still takes around 40 seconds whether or not I use threading.

Below is my version using threading.

from threading import active_count
import requests
import time
import concurrent.futures

img_urls = [
    'https://images.unsplash.com/photo-1516117172878-fd2c41f4a759',
    'https://images.unsplash.com/photo-1532009324734-20a7a5813719',
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e',
    'https://images.unsplash.com/photo-1504198453319-5ce911bafcde',
    'https://images.unsplash.com/photo-1530122037265-a5f1f91d3b99',
    'https://images.unsplash.com/photo-1516972810927-80185027ca84',
    'https://images.unsplash.com/photo-1550439062-609e1531270e',
    'https://images.unsplash.com/photo-1549692520-acc6669e2f0c'
]

t1 = time.perf_counter()


def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, img_urls)
    print(active_count())


t2 = time.perf_counter()

print(f'Finished in {t2-t1} seconds')

Below is the version without threading.

def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


for img_url in img_urls:
    download_image(img_url)

Could someone explain why this is happening? Thanks.

I can see some performance improvement when using the multiprocessing package.

import multiprocessing
import time  # needed for perf_counter below
from multiprocessing import Pool

import requests  # third-party: pip install requests

# img_urls is the same list as in the question


def download_image(img_url: str) -> None:
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


if __name__ == '__main__':
    t1 = time.perf_counter()

    with Pool(processes=multiprocessing.cpu_count() - 1 or 1) as pool:
        pool.map(download_image, img_urls)

    t2 = time.perf_counter()

    print(f'Finished in {t2 - t1} seconds')

This is the result I got with your piece of code, with start and end times printed next to each download. The overall time is about the same (on my normal network, not the slow one I mentioned in my comment).

The reason is that multiple threads don't increase I/O or bandwidth; the limitation could also be the website itself. It looks like the issue is not in your code.

EDIT (misleading statement): as mentioned by MisterMiyagi in the comment below (read his comment, where he explains why), threading should increase I/O throughput; that's why I got a 10 s improvement on a slow network (limited connection in my work lab). It didn't increase I/O or bandwidth in this specific case (with full bandwidth on my "normal" connection), and the cause may come from many sources, but in my opinion, not the code itself.
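Threads do overlap waiting time when the bottleneck is latency rather than bandwidth. A minimal, self-contained sketch of that effect, using time.sleep as a stand-in for the network wait (fake_download is a made-up name, not from the question):

```python
import concurrent.futures
import time

def fake_download(_):
    # stand-in for a request that spends its time waiting on the network
    time.sleep(0.2)

t1 = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # five 0.2 s waits run concurrently, so the batch takes ~0.2 s, not ~1.0 s
    list(executor.map(fake_download, range(5)))
elapsed = time.perf_counter() - t1
print(f'5 overlapping waits finished in {elapsed:.2f}s')
```

If the server throttles each client to a fixed total bandwidth, however, the concurrent version downloads the same bytes through the same pipe, and the wall-clock time barely changes.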

I also tried with max_workers=5; the overall time was the same.
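The exact instrumentation that produced the start/end times below isn't shown in the post; a sketch of one way to get them, using a hypothetical timed wrapper and a shared reference point T0 (both names are assumptions):

```python
import time

T0 = time.perf_counter()  # shared zero point so all threads report comparable offsets

def timed(func):
    """Wrap a one-argument download function so each call prints its start/end offsets."""
    def wrapper(img_url):
        start = time.perf_counter() - T0
        result = func(img_url)
        end = time.perf_counter() - T0
        print(f'{img_url} was downloaded... {start:.7f} - {end:.7f}')
        return result
    return wrapper

# usage: executor.map(timed(download_image), img_urls)
```

With this, overlapping start times in the log show the requests really ran concurrently, even when the total time didn't improve.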

    photo-1516117172878-fd2c41f4a759.jpg was downloaded... 1.0464828 - 1.7136098
    photo-1532009324734-20a7a5813719.jpg was downloaded... 1.7140197 - 5.6327612
    photo-1524429656589-6633a470097c.jpg was downloaded... 5.6339666 - 8.3146478
    photo-1530224264768-7ff8c1789d79.jpg was downloaded... 8.3160157 - 10.474087
    photo-1564135624576-c5c88640f235.jpg was downloaded... 10.4749598 - 11.2431941
    photo-1541698444083-023c97d3f4b6.jpg was downloaded... 11.2436369 - 15.6939695
    photo-1522364723953-452d3431c267.jpg was downloaded... 15.6954112 - 18.3257819
    photo-1513938709626-033611b8cc03.jpg was downloaded... 18.3269668 - 21.0607191
    photo-1507143550189-fed454f93097.jpg was downloaded... 21.0621265 - 22.2371699
    photo-1493976040374-85c8e12f0c0e.jpg was downloaded... 22.2375931 - 26.4375676
    photo-1504198453319-5ce911bafcde.jpg was downloaded... 26.4393404 - 28.3477933
    photo-1530122037265-a5f1f91d3b99.jpg was downloaded... 28.348679 - 30.4626719
    photo-1516972810927-80185027ca84.jpg was downloaded... 30.4636931 - 32.2621345
    photo-1550439062-609e1531270e.jpg was downloaded... 32.2628976 - 34.7331719
    photo-1549692520-acc6669e2f0c.jpg was downloaded... 34.7341393 - 35.5910094
    Finished in 34.545366900000005 seconds
    21
    photo-1516117172878-fd2c41f4a759.jpg was downloaded... 35.5960486 - 46.1692758
    photo-1564135624576-c5c88640f235.jpg was downloaded... 35.6110777 - 47.3780254
    photo-1507143550189-fed454f93097.jpg was downloaded... 35.6265503 - 47.4433963
    photo-1549692520-acc6669e2f0c.jpg was downloaded... 35.6692061 - 49.7097683
    photo-1516972810927-80185027ca84.jpg was downloaded... 35.6420564 - 57.2326763
    photo-1504198453319-5ce911bafcde.jpg was downloaded... 35.6340008 - 61.4597509
    photo-1550439062-609e1531270e.jpg was downloaded... 35.6637577 - 62.0488296
    photo-1530224264768-7ff8c1789d79.jpg was downloaded... 35.6072146 - 63.4139648
    photo-1513938709626-033611b8cc03.jpg was downloaded... 35.6223106 - 63.8149815
    photo-1524429656589-6633a470097c.jpg was downloaded... 35.6032493 - 63.8284464
    photo-1530122037265-a5f1f91d3b99.jpg was downloaded... 35.6352735 - 65.0513042
    photo-1522364723953-452d3431c267.jpg was downloaded... 35.6182243 - 65.5005548
    photo-1532009324734-20a7a5813719.jpg was downloaded... 35.5994888 - 66.2930857
    photo-1541698444083-023c97d3f4b6.jpg was downloaded... 35.6144996 - 67.8115219
    photo-1493976040374-85c8e12f0c0e.jpg was downloaded... 35.6301133 - 68.5357319
    Finished in 32.946069800000004 seconds

EDIT 2 (more testing): I tried with one of my own web servers (same code, just a different image list) and got an overall decrease of 60-70% in download time. It worked best with a limited number of workers in that case. The problem comes from the website, not your code.
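Capping the pool size is a one-line change to the question's code; a sketch (download_all is a made-up helper name, and the best max_workers value depends on the server):

```python
import concurrent.futures

def download_all(img_urls, download_image, max_workers=5):
    # a small pool still overlaps network waits but avoids hammering the server
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() forces the lazy map so any exceptions surface here
        list(executor.map(download_image, img_urls))
```

Note that when max_workers is omitted, ThreadPoolExecutor picks a default based on the CPU count, which is why the question saw several threads created automatically.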
