

concurrent.futures multithreading with 2 lists as variables

I would like to multithread the following working piece of code with concurrent.futures, but nothing I've tried so far seems to work.

import sys

import requests

# login_url, payload, headers and download_headers are defined elsewhere in the script.
def download(song_filename_list, song_link_list):

    with requests.Session() as s:

        # Log in once; the Session keeps any cookies it receives for later requests.
        login_request = s.post(login_url, data=payload, headers=headers)

        for x in range(len(song_filename_list)):

            download_request = s.get(song_link_list[x], headers=download_headers, stream=True)

            if download_request.status_code == 200:
                print(f"Downloading {x+1} out of {len(song_filename_list)}!\n")
            else:
                print(f"\nStatus Code: {download_request.status_code}!\n")
                sys.exit()

            # The file name and link for this song share the same index x.
            with open(song_filename_list[x], "wb") as file:
                file.write(download_request.content)

The two main variables are song_filename_list and song_link_list.

The first list holds the name of each file and the second holds the corresponding download links, so each file's name and link are at the same index.
For example: name_of_file1 = song_filename_list[0] and link_of_file1 = song_link_list[0]
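Because the two lists are aligned by index, zip() can pair each file name with its link without indexing by position. A minimal sketch using the sample values listed in EDIT 2; the pairs variable and the print loop are illustrative only:

```python
song_filename_list = [
    "100049 Himeringo - Yotsuya-san ni Yoroshiku.osz",
    "1001507 ZUTOMAYO - Kan Saete Kuyashiiwa.osz",
]
song_link_list = [
    "https://osu.ppy.sh/beatmapsets/100049/download",
    "https://osu.ppy.sh/beatmapsets/1001507/download",
]

# zip() yields one (name, link) tuple per index, stopping at the shorter list.
pairs = list(zip(song_filename_list, song_link_list))

# enumerate(..., start=1) provides the "x out of n" counter without range(len(...)).
for number, (name, link) in enumerate(pairs, start=1):
    print(f"{number} out of {len(pairs)}: {name} <- {link}")
```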


This is my most recent attempt at multithreading:

def download(song_filename_list, song_link_list):

    with requests.Session() as s:
    
        login_request = s.post(login_url, data= payload, headers= headers)

        x = []
        for i in range(len(song_filename_list)):
            x.append(i)


        with concurrent.futures.ThreadPoolExecutor() as executor:
            executor.submit(get_file, x)


def get_file(x):
    
    download_request = s.get(song_link_list[x], headers= download_headers, stream=True)

    if download_request.status_code == 200:
        print(f"Downloading {x+1} out of {len(song_filename_list)}!\n")
        pass
    else:
        print(f"\nStatus Code: {download_request.status_code}!\n")
        sys.exit()

        
    with open (song_filename_list[x], "wb") as file:
        file.write(download_request.content)

Could someone explain what I am doing wrong?
Nothing happens after the get_file function call.
It skips all the code and exits without any errors, so where is my logic wrong?
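A likely reason nothing appears to happen: when a function submitted to an executor raises, the exception is stored in the returned Future and is only re-raised when result() is called. In the attempt above, s and the two lists are local to download(), so (unless globals with those names exist) get_file would fail with a NameError that nobody ever sees. A minimal sketch; broken_task is a hypothetical stand-in for get_file:

```python
import concurrent.futures


def broken_task(x):
    # Hypothetical stand-in for get_file: `s` is not defined in this scope,
    # just as the Session `s` is local to download() in the question.
    return s.get(x)  # noqa: F821


with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(broken_task, [0, 1])

# Leaving the `with` block raised nothing; the error is stored in the Future.
print("exception stored:", repr(future.exception()))

try:
    future.result()  # result() re-raises the stored exception here
except NameError as exc:
    print("result() raised:", exc)
```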


EDIT 1:

After adding prints to:

print(song_filename_list, song_link_list)
        with concurrent.futures.ThreadPoolExecutor() as executor:
            print("Before executor.map")
            executor.map(get_file, zip(song_filename_list, song_link_list))
            print("After executor.map")
            print(song_filename_list, song_link_list)

I also added prints at the start and end of get_file and around its file.write call.

The output is as follows:


Succesfully logged in!

["songs names"] ["songs links"]    <- These are correct.
Before executor.map
After executor.map
["songs names"] ["songs links"]    <- These are correct.

Exiting.

In other words, the values are correct, but get_file appears to be skipped inside executor.map.
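Two details of Executor.map are relevant here. Passing a single zip() iterable means the function receives each (name, link) tuple as one argument, whereas passing the two lists as separate iterables calls it with two arguments. And map() returns a lazy iterator: an exception raised in a worker is only re-raised when that result is consumed, so a bare executor.map(...) call whose results are never read can fail silently. A small sketch with hypothetical show and fails functions and placeholder data:

```python
import concurrent.futures

names = ["a.osz", "b.osz"]
links = ["link-a", "link-b"]  # placeholder values, not real URLs


def show(*args):
    # Hypothetical worker: report how many positional arguments it received.
    return len(args)


with concurrent.futures.ThreadPoolExecutor() as executor:
    # One iterable of tuples: each call gets a single tuple argument.
    one_arg = list(executor.map(show, zip(names, links)))
    # Two iterables: map() pairs them like zip() and passes two arguments.
    two_args = list(executor.map(show, names, links))


def fails(name, link):
    # Hypothetical worker that always raises.
    raise ValueError(f"could not fetch {link}")


with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(fails, names, links)  # no error raised yet
    try:
        list(results)  # consuming the iterator re-raises the first exception
    except ValueError as exc:
        caught = str(exc)

print(one_arg, two_args, caught)
```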


EDIT 2:

Here are the values used:

  • song_filename_list = ['100049 Himeringo - Yotsuya-san ni Yoroshiku.osz', '1001507 ZUTOMAYO - Kan Saete Kuyashiiwa.osz']

  • song_link_list = ['https://osu.ppy.sh/beatmapsets/100049/download', 'https://osu.ppy.sh/beatmapsets/1001507/download']


EDIT 3:

After some tinkering, it seems that this works:

for i in range(len(song_filename_list)):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.submit(get_file, song_filename_list, song_link_list, i, s)
def get_file(song_filename_list, song_link_list, i, s):
    
    download_request = s.get(song_link_list[i], headers= download_headers, stream=True)

    if download_request.status_code == 200:
        print("Downloading...")
        pass
    else:
        print(f"\nStatus Code: {download_request.status_code}!\n")
        sys.exit()
    
    with open (song_filename_list[i], "wb") as file:
        file.write(download_request.content)
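One thing to watch in get_file: sys.exit() works by raising SystemExit, and inside a ThreadPoolExecutor worker that exception is captured in the task's Future like any other, so it does not stop the program. A minimal sketch, where stop_early is a hypothetical stand-in for the non-200 branch; raising an ordinary exception (or returning a status) and checking it in the main thread is arguably a more predictable way to report a failed download:

```python
import concurrent.futures
import sys


def stop_early(status_code):
    # Hypothetical stand-in for the status check in get_file.
    if status_code != 200:
        print(f"Status Code: {status_code}!")
        sys.exit()  # raises SystemExit in this worker thread only
    return status_code


with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(stop_early, code) for code in (200, 404)]

# The main thread keeps running; the SystemExit is stored in the second Future.
for future in futures:
    print(repr(future.exception()))
print("still running")
```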

In your download() function you submit the whole list as a single task, when you should submit each item separately:

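def download(song_filename_list, song_link_list):

    with requests.Session() as s:

        login_request = s.post(login_url, data=payload, headers=headers)

        # Create the executor once, outside the loop: a `with` block per
        # iteration would wait for each download before starting the next.
        with concurrent.futures.ThreadPoolExecutor() as executor:
            # Submit one task per song, passing everything get_file needs
            # (this matches the get_file signature from EDIT 3).
            futures = [
                executor.submit(get_file, song_filename_list, song_link_list, i, s)
                for i in range(len(song_filename_list))
            ]

        # result() re-raises any exception raised inside get_file, which
        # would otherwise be swallowed silently.
        for future in futures:
            future.result()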
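Where the executor is created also matters. Leaving a `with ThreadPoolExecutor()` block calls shutdown(wait=True), so creating a new executor inside the for loop (as in EDIT 3) waits for each download to finish before the next one is submitted, which makes the code effectively sequential. A sketch that measures the peak number of tasks running at once; fake_download and measure are hypothetical helpers, and the sleep stands in for a real download:

```python
import concurrent.futures
import threading
import time

active = 0  # tasks currently running
peak = 0    # highest value of `active` observed
lock = threading.Lock()


def fake_download(i):
    # Hypothetical stand-in for get_file: sleeps instead of downloading.
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.1)
    with lock:
        active -= 1


def measure(one_executor):
    global peak
    peak = 0
    if one_executor:
        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
            for i in range(4):
                executor.submit(fake_download, i)
    else:
        for i in range(4):
            # Leaving this `with` block waits for the single task to finish.
            with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
                executor.submit(fake_download, i)
    return peak


print("executor inside loop:", measure(one_executor=False))
print("one shared executor:", measure(one_executor=True))
```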

You can simplify this with the executor's map() method:

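import concurrent.futures
import itertools

import requests


# login_url, payload, headers and download_headers are defined as in the question.
def download(song_filename_list, song_link_list):
  # Log in once and keep the resulting cookies: a fresh Session per download
  # would otherwise lose the login.
  with requests.Session() as session:
    session.post(login_url, data=payload, headers=headers)
    cookies = session.cookies

  with concurrent.futures.ThreadPoolExecutor() as executor:
    # map() pairs the iterables element by element, like zip(); list()
    # consumes the results so any exception from get_file is re-raised.
    list(executor.map(get_file, song_filename_list, song_link_list,
                      itertools.repeat(cookies)))

Where the get_file function is:

def get_file(song_name, song_link, cookies):
  # Each thread uses its own Session; the login cookies are copied into it.
  with requests.Session() as session:
    session.cookies.update(cookies)
    # Without stream=True the body is read before the Session is closed.
    download_request = session.get(song_link, headers=download_headers)

  if download_request.status_code != 200:
    print(f"\nStatus Code: {download_request.status_code} for {song_name}!\n")
    return

  with open(song_name, "wb") as file:
    file.write(download_request.content)
  print(f"Downloaded {song_name}")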

This avoids sharing state between threads, and with it potential data races.

If you need to monitor how many songs have been downloaded, you can use tqdm, whose thread_map wrapper (in tqdm.contrib.concurrent) does exactly this.
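If you would rather stay within the standard library, concurrent.futures.as_completed() yields futures as they finish, so the main thread can print progress without any shared counter. This is a swapped-in alternative, not how tqdm implements thread_map; fake_get_file and the placeholder lists are hypothetical:

```python
import concurrent.futures
import time


def fake_get_file(song_name, song_link):
    # Hypothetical stand-in for get_file: pretend to download, return the name.
    time.sleep(0.05)
    return song_name


song_filename_list = ["a.osz", "b.osz", "c.osz"]
song_link_list = ["link-a", "link-b", "link-c"]

done_names = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Map each future back to its file name for progress and error reporting.
    futures = {
        executor.submit(fake_get_file, name, link): name
        for name, link in zip(song_filename_list, song_link_list)
    }
    # as_completed() yields futures in completion order, not submission order.
    for count, future in enumerate(concurrent.futures.as_completed(futures), start=1):
        future.result()  # re-raises if this download failed
        done_names.append(futures[future])
        print(f"{count}/{len(futures)} done: {futures[future]}")
```

With tqdm installed, the equivalent would be thread_map(get_file, song_filename_list, song_link_list) imported from tqdm.contrib.concurrent.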

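Statement: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.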

 