Why is my multiprocessing.Pool apply_async only executed once inside a for loop?
I am trying to write a crawler for a web security project, and I'm seeing strange behaviour in a method that uses multiprocessing.
What should this method do? It iterates over the target web pages it has found, each with a list of query parameters found on that page. For each web page, it should apply the method phase1 (my attack logic) to every query parameter associated with that page.
Meaning, if I have http://example.com/sub.php, with page and secret as query parameters, and http://example.com/s2.php, with topsecret as a parameter, it should do the following:
- Attack http://example.com/sub.php with page
- Attack http://example.com/sub.php with secret
- Attack http://example.com/s2.php with topsecret
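The intended visiting order is just a nested loop over pages and their parameters. A minimal sketch, assuming siteparams is a dict mapping each page URL to its list of parameters (the example URLs come from the question; attack_plan is a hypothetical helper, not part of the crawler):

```python
# Hypothetical input: each target page maps to the query parameters found on it.
siteparams = {
    "http://example.com/sub.php": ["page", "secret"],
    "http://example.com/s2.php": ["topsecret"],
}

def attack_plan(siteparams):
    """Return the (page, parameter) pairs in the order analyzeParam visits them."""
    plan = []
    for victim, paramlist in siteparams.items():  # dicts keep insertion order
        for param in paramlist:
            plan.append((victim, param))
    return plan

for victim, param in attack_plan(siteparams):
    print(f"Attack {victim} with {param}")
```

Each pair should then receive one task per payload chunk, so three pairs with N chunks should mean 3 × N calls to phase1.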
I can tell whether an attack is happening from the timing and the output of phase1.
What actually happens
Only the first attack is executed. The subsequent calls to apply_async appear to be ignored. However, the loops themselves keep running, since the print statements in the for loops above still produce output.
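This symptom can be reproduced without any network code. A minimal sketch, assuming paysplit is a generator (the cause identified in the resolution at the end of this thread). phase1_stub and tasks_per_param are hypothetical, and multiprocessing.pool.ThreadPool stands in for Pool because it exposes the same apply_async interface without needing picklable, importable functions; the generator behaviour is the same with either pool:

```python
from multiprocessing.pool import ThreadPool

def phase1_stub(param, chunk):
    # Hypothetical stand-in for phase1: echoes its arguments.
    return (param, chunk)

def tasks_per_param(params, paysplit, processes=2):
    """Mirror the question's loop and count the tasks submitted per parameter."""
    counts = {}
    for param in params:
        print(f"Using {param}")  # this line runs for every parameter...
        with ThreadPool(processes=processes) as pool:
            # ...but this comprehension is empty once paysplit is exhausted.
            res = [pool.apply_async(phase1_stub, args=(param, chunk)) for chunk in paysplit]
            counts[param] = len([r.get() for r in res])
    return counts

paysplit = (chunk for chunk in ["a", "b", "c"])  # a one-shot generator
print(tasks_per_param(["page", "secret", "topsecret"], paysplit))
```

Every "Using ..." line is printed, yet only the first parameter gets any tasks, which matches what the question describes.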
What is going wrong here? Why is the attack routine not triggered? I have read the multiprocessing docs, but they don't explain this behaviour.
Some answers to related questions suggested calling terminate and join, but isn't this done implicitly here, since I'm using the with statement?
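On the with question: Pool's context manager calls terminate() (not close() followed by join()) when the block exits, so results must be fetched before leaving the block. The question's code already calls get() inside the block, so adding terminate or join would not change the outcome. A minimal sketch of both styles, using multiprocessing.pool.ThreadPool (same Pool interface, no pickling needed) and a hypothetical square task:

```python
from multiprocessing.pool import ThreadPool

def square(x):
    # Hypothetical task used only for illustration.
    return x * x

def results_inside_with(values):
    # Pool.__exit__ calls terminate(), so get() must run inside the block.
    with ThreadPool(processes=2) as pool:
        res = [pool.apply_async(square, args=(v,)) for v in values]
        return [r.get() for r in res]

def results_with_close_join(values):
    # Explicit alternative: stop accepting work, wait for workers, then read results.
    pool = ThreadPool(processes=2)
    try:
        res = [pool.apply_async(square, args=(v,)) for v in values]
        pool.close()  # no further tasks may be submitted
        pool.join()   # block until every submitted task has finished
        return [r.get() for r in res]
    finally:
        pool.terminate()

print(results_inside_with([1, 2, 3]))      # [1, 4, 9]
print(results_with_close_join([1, 2, 3]))  # [1, 4, 9]
```

Both versions return the same results, so the choice between them does not explain tasks going missing.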
Also, this question (Multiprocessing pool 'apply_async' only seems to call function once) sounds very similar but is different from my problem. Unlike in that question, my problem is not that only one worker executes the code, but that my X workers are only spawned once (instead of Y times).
What I've tried: moving the with Pool(...) block outside the loops, but nothing changed.
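Hoisting the pool is a reasonable refactor (one pool for all work instead of one per parameter), but on its own it cannot help, because the same one-shot paysplit is still iterated once per parameter. A sketch of the hoisted version, again using ThreadPool in place of Pool and a hypothetical phase1_stub that returns a (payloads, nullbytes) pair like phase1 does in the question:

```python
from multiprocessing.pool import ThreadPool

def phase1_stub(victim, param, chunk):
    # Hypothetical stand-in for phase1: returns (payloads, nullbytes).
    return ([f"{param}:{chunk}"], [])

def analyze_hoisted(siteparams, paysplit, processes=2):
    """One pool shared by every (page, parameter) pair."""
    result = {}
    with ThreadPool(processes=processes) as pool:
        for victim, paramlist in siteparams.items():
            sub = {}
            for param in paramlist:
                res = [pool.apply_async(phase1_stub, args=(victim, param, chunk))
                       for chunk in paysplit]
                payloads, nullbytes = [], []
                for r in res:  # collect results in submission order
                    p, n = r.get()
                    payloads += p
                    nullbytes += n
                sub[param] = (payloads, nullbytes)
            result[victim] = sub
    return result

siteparams = {"http://example.com/sub.php": ["page", "secret"],
              "http://example.com/s2.php": ["topsecret"]}
print(analyze_hoisted(siteparams, (c for c in ["a", "b"])))  # generator
print(analyze_hoisted(siteparams, ["a", "b"]))               # list
```

With the generator, only page receives payloads; with a list, every parameter does, regardless of where the pool lives.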
The method in question is the following:
```python
import json
import os
import time
from multiprocessing import Pool

# parseUrl, phase1, color, viclist, processes and cachedir are defined
# elsewhere in the crawler (not shown in the question).

def analyzeParam(siteparams, paysplit, victim2, verbose, depth, file, authcookie):
    result = {}
    subdir = parseUrl(viclist[0])
    for victim, paramlist in siteparams.items():
        sub = {}
        print("\n{0}[INFO]{1} param{4}|{2} Attacking {3}".format(color.RD, color.END + color.O, color.END, victim, color.END+color.RD))
        time.sleep(1.5)
        for param in paramlist:
            payloads = []
            nullbytes = []
            print("\n{0}[INFO]{1} param{4}|{2} Using {3}\n".format(color.RD, color.END + color.O, color.END, param, color.END+color.RD))
            time.sleep(1.5)
            with Pool(processes=processes) as pool:
                # One task per payload chunk in paysplit. If paysplit is a
                # generator, it is exhausted after the first param, so later
                # params submit no tasks at all.
                res = [pool.apply_async(phase1, args=(1,victim,victim2,param,None,"",verbose,depth,l,file,authcookie,"",)) for l in paysplit]
                for i in res:
                    # fetch results
                    tuples = i.get()
                    payloads += tuples[0]
                    nullbytes += tuples[1]
            sub[param] = (payloads, nullbytes)
            time.sleep(3)
        result[victim] = sub
    if not os.path.exists(cachedir+subdir):
        os.makedirs(cachedir+subdir)
    with open(cachedir+subdir+"spider-phase2.json", "w+") as f:
        json.dump(result, f, sort_keys=True, indent=4)
    return result
```
Some technical information:
How do I fix this? Thanks!
Big kudos to jasonharper for finding the issue. The problem was not the code structure above but the variable paysplit, which was a generator and was exhausted after its first use: the list comprehension for the first parameter consumed all of its chunks, so the comprehensions for every later parameter iterated over an empty generator and submitted no tasks.
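The fix is to consume the generator exactly once, before the loops, and reuse the resulting list (or to pass a list into analyzeParam in the first place). A minimal sketch, assuming the same ThreadPool stand-in and hypothetical phase1_stub as above; reusable is a hypothetical helper that relies on the fact that a one-shot iterator (such as a generator) returns itself from iter(), while a list does not:

```python
from multiprocessing.pool import ThreadPool

def phase1_stub(param, chunk):
    # Hypothetical stand-in for phase1.
    return (param, chunk)

def reusable(iterable):
    """Return something that can be iterated many times.

    A generator or other iterator returns itself from iter(), so it is
    one-shot and gets materialised into a list. Lists, tuples, etc. are
    returned unchanged.
    """
    if iter(iterable) is iterable:
        return list(iterable)
    return iterable

def tasks_per_param(params, paysplit, processes=2):
    paysplit = reusable(paysplit)  # the fix: consume the generator once, up front
    counts = {}
    for param in params:
        with ThreadPool(processes=processes) as pool:
            res = [pool.apply_async(phase1_stub, args=(param, chunk)) for chunk in paysplit]
            counts[param] = len([r.get() for r in res])
    return counts

counts = tasks_per_param(["page", "secret", "topsecret"], (c for c in ["a", "b", "c"]))
print(counts)  # {'page': 3, 'secret': 3, 'topsecret': 3}
```

In the question's code, the equivalent change would be a single paysplit = list(paysplit) at the top of analyzeParam.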
Again, thank you for pointing it out!
Best,