How do I share an array index between threads in Python?

Question

I have the following code:

import threading
import numpy as np

def task1():
    for url in splitarr[0]:
        print(url)  # print is just for debugging; the real code would call scrape_induvidual_page()
def task2():
    for url in splitarr[1]:
        print(url)
def task3():
    for url in splitarr[2]:
        print(url)
def task4():
    for url in splitarr[3]:
        print(url)
def task5():
    for url in splitarr[4]:
        print(url)
def task6():
    for url in splitarr[5]:
        print(url)
def task7():
    for url in splitarr[6]:
        print(url)     
def task8():
    for url in splitarr[7]:
        print(url)   

splitarr=np.array_split(urllist, 8)
t1 = threading.Thread(target=task1, name='t1') 
t2 = threading.Thread(target=task2, name='t2')   
t3 = threading.Thread(target=task3, name='t3')
t4 = threading.Thread(target=task4, name='t4') 
t5 = threading.Thread(target=task5, name='t5')
t6 = threading.Thread(target=task6, name='t6')
t7 = threading.Thread(target=task7, name='t7')
t8 = threading.Thread(target=task8, name='t8')

t1.start() 
t2.start()
t3.start() 
t4.start() 
t5.start()
t6.start() 
t7.start()
t8.start() 

t1.join()
t2.join()
t3.join()
t4.join()
t5.join()
t6.join()
t7.join() 
t8.join()

It does produce the desired output, with no duplicates or anything:

https://kickasstorrents.to/big-buck-bunny-1080p-h264-aac-5-1-tntvillage-t115783.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60fps-eng-flac-webdl-2160p-x264-zmachine-t1041079.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60-fps-flac-webrip-2160p-x265-zmachine-t1041689.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-x264-don-no-rars-t11623.html
https://kickasstorrents.to/tkillaahh-big-buck-bunny-dvd-720p-2lions-team-t87503.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-nhd-x264-nhanc3-t127050.html
https://kickasstorrents.to/big-buck-bunny-2008-brrip-720p-x264-mitzep-t172753.html

However, all the repeated def taskx() code feels redundant, so I tried to condense it into a single task:

x=0
def task1():
    global x
    for url in splitarr[x]:
        print(url)
        x=x+1
t1 = threading.Thread(target=task1, name='t1') 
t2 = threading.Thread(target=task1, name='t2')   
t3 = threading.Thread(target=task1, name='t3')
t4 = threading.Thread(target=task1, name='t4') 
t5 = threading.Thread(target=task1, name='t5')
t6 = threading.Thread(target=task1, name='t6')
t7 = threading.Thread(target=task1, name='t7')
t8 = threading.Thread(target=task1, name='t8')

t1.start() 
t2.start()
t3.start() 
t4.start() 
t5.start()
t6.start() 
t7.start()
t8.start() 

t1.join()
t2.join()
t3.join()
t4.join()
t5.join()
t6.join()
t7.join() 
t8.join()

However, this produces unwanted duplicated output:

https://kickasstorrents.to/big-buck-bunny-1080p-h264-aac-5-1-tntvillage-t115783.html
https://kickasstorrents.to/big-buck-bunny-1080p-h264-aac-5-1-tntvillage-t115783.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60-fps-flac-webrip-2160p-x265-zmachine-t1041689.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-x264-don-no-rars-t11623.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-x264-don-no-rars-t11623.html
https://kickasstorrents.to/tkillaahh-big-buck-bunny-dvd-720p-2lions-team-t87503.html
https://kickasstorrents.to/big-buck-bunny-2008-brrip-720p-x264-mitzep-t172753.html
https://kickasstorrents.to/big-buck-bunny-2008-brrip-720p-x264-mitzep-t172753.html

How do I correctly increment x in a program with multiple threads?

Answer 1

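for url in splitarr[x]: creates an iterator over the sequence splitarr[x]. Incrementing x afterwards makes no difference, because the iterator has already been built. Since you have a print in the loop, there is a good chance that every thread reads x while it is still zero and iterates over the same sequence.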
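The point about the iterator can be checked without any threads. The following is a minimal sketch with made-up placeholder data (the values in splitarr here are hypothetical, not the asker's URLs): the loop keeps walking the sequence that was selected when it started, even though x changes inside the body.

```python
# Hypothetical stand-in data: two small chunks instead of real URLs.
splitarr = [["a0", "a1"], ["b0", "b1"]]

x = 0
seen = []
for url in splitarr[x]:   # splitarr[x] is evaluated once, when the loop starts
    seen.append(url)
    x = x + 1             # changes x, but not the sequence being iterated

print(seen)  # ['a0', 'a1']
print(x)     # 2
```

Only the first chunk is ever visited, which is why several threads that all start with x == 0 can end up printing the same URLs.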

One solution is to pass the incrementing value to task1 through the args parameter of threading.Thread. But a thread pool is easier.

from multiprocessing.pool import ThreadPool

# generate test array
splitarr = []
for i in range(8):
    splitarr.append([f"url_{i}_{j}" for j in range(4)])

def task(splitarr_column):
    for url in splitarr_column:
        print(url)

with ThreadPool(len(splitarr)) as pool:
    result = pool.map(task, splitarr)

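In this example, len(splitarr) is used to create one thread for each sequence in splitarr. Each of those sequences is then mapped to the task function. Because we created enough threads to handle all the sequences, they all run concurrently. When the map completes, the with block exits and closes the pool, terminating the threads.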
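In the example above, task only prints, so result is just a list of None values. As a rough sketch of how the same pattern could hand data back, the variant below has task return something per chunk. The upper-casing is a hypothetical stand-in for real scraping work, and the test data is the same generated array as above.

```python
from multiprocessing.pool import ThreadPool

# Same generated test array as above: 8 chunks of 4 placeholder URLs.
splitarr = [[f"url_{i}_{j}" for j in range(4)] for i in range(8)]

def task(splitarr_column):
    # Hypothetical stand-in for scraping: return one value per URL in the chunk.
    return [url.upper() for url in splitarr_column]

with ThreadPool(len(splitarr)) as pool:
    # map() blocks until every chunk is done and returns results in input order.
    result = pool.map(task, splitarr)

print(len(result))    # one entry per chunk
print(result[0][0])   # first processed URL of the first chunk
```

Because map() preserves input order, result[i] corresponds to splitarr[i] regardless of which thread finished first.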

Answer 2

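Edit: this does not run in parallel. target=task1(0) calls task1 immediately in the main thread and passes its return value (None) as the thread's target, so the threads themselves do nothing.

This seemed to work: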

def task1(x):
    for url in splitarr[x]:
        print(url)
        x=x+1

t1 = threading.Thread(target=task1(0), name='t1') 
t2 = threading.Thread(target=task1(1), name='t2')   
t3 = threading.Thread(target=task1(2), name='t3')
t4 = threading.Thread(target=task1(3), name='t4') 
t5 = threading.Thread(target=task1(4), name='t5')
t6 = threading.Thread(target=task1(5), name='t6')
t7 = threading.Thread(target=task1(6), name='t7')
t8 = threading.Thread(target=task1(7), name='t8')

t1.start() 
t2.start()
t3.start() 
t4.start() 
t5.start()
t6.start() 
t7.start()
t8.start() 

t1.join()
t2.join()
t3.join()
t4.join()
t5.join()
t6.join()
t7.join() 
t8.join()

Output:

https://kickasstorrents.to/big-buck-bunny-1080p-h264-aac-5-1-tntvillage-t115783.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60fps-eng-flac-webdl-2160p-x264-zmachine-t1041079.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60-fps-flac-webrip-2160p-x265-zmachine-t1041689.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-x264-don-no-rars-t11623.html
https://kickasstorrents.to/tkillaahh-big-buck-bunny-dvd-720p-2lions-team-t87503.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-nhd-x264-nhanc3-t127050.html
https://kickasstorrents.to/big-buck-bunny-2008-brrip-720p-x264-mitzep-t172753.html
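The snippet above passes target=task1(0), which runs task1 right away in the calling thread. A minimal sketch of the difference, assuming placeholder data and a hypothetical ran_in dictionary that records which thread handled each chunk:

```python
import threading

# Hypothetical stand-in data: 3 chunks of 2 placeholder URLs.
splitarr = [[f"url_{i}_{j}" for j in range(2)] for i in range(3)]
ran_in = {}  # chunk index -> name of the thread that processed it

def task1(x):
    ran_in[x] = threading.current_thread().name
    for url in splitarr[x]:
        pass  # stand-in for the real per-URL work

caller = threading.current_thread().name

# Calling form: task1(0) executes here, in the calling thread; target becomes None.
t_bad = threading.Thread(target=task1(0), name="t1")
t_bad.start()
t_bad.join()

# Passing the function and its argument separately runs it in the new thread.
t_good = threading.Thread(target=task1, args=(1,), name="t2")
t_good.start()
t_good.join()

print(ran_in[0] == caller)  # True: chunk 0 was handled by the caller
print(ran_in[1])            # t2
```

Note the trailing comma in args=(1,): args must be a tuple, even for a single argument.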

Following tdelaney's answer, this is what I ended up doing. It is more compact and runs in parallel:

from multiprocessing.pool import ThreadPool

def task(splitarr_column):
    for url in splitarr_column:
        print(url)

with ThreadPool(len(splitarr)) as pool:
    result = pool.map(task, splitarr)

It gives the desired output:

https://kickasstorrents.to/big-buck-bunny-1080p-h264-aac-5-1-tntvillage-t115783.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60fps-eng-flac-webdl-2160p-x264-zmachine-t1041079.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60-fps-flac-webrip-2160p-x265-zmachine-t1041689.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-x264-don-no-rars-t11623.html
https://kickasstorrents.to/tkillaahh-big-buck-bunny-dvd-720p-2lions-team-t87503.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-nhd-x264-nhanc3-t127050.html
https://kickasstorrents.to/big-buck-bunny-2008-brrip-720p-x264-mitzep-t172753.html


Answer 1 (5, accepted), 2020-09-15 04:39:08

Answer 2 (1), 2020-09-15 04:44:04
