
Why is my multiprocessing.Pool apply_async only executed once inside a for loop

I am trying to write a crawler for a web security project, and I'm seeing strange behaviour in a method that uses multiprocessing.

What should this method do? It iterates over the discovered target web pages, each with a list of discovered query parameters. For each web page, it should apply the method phase1 (my attack logic) to every query parameter associated with that page.

Meaning, if I have http://example.com/sub.php with page and secret as query parameters, and http://example.com/s2.php with topsecret as its parameter, it should do the following.
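
Assuming siteparams maps each page to its list of discovered parameters (which matches the loop in the code further down), the input and intended behavior would look roughly like this:

# Assumed shape of siteparams, based on the example above:
siteparams = {
    "http://example.com/sub.php": ["page", "secret"],
    "http://example.com/s2.php": ["topsecret"],
}
# Intended: for every page, for every parameter of that page, run phase1
# once per payload chunk in paysplit, via the worker pool.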

I can tell whether an attack is actually running based on the timing and the output of phase1 .

What actually happens

Only the first attack is executed. The following calls to apply_async are ignored. However, it still cycles through the loops, since it still prints the output from the for loop above.

What is going wrong here? Why is the attack routine not triggered? I have looked at the multiprocessing docs, but they don't explain this phenomenon.

Some answers to related problems suggested using terminate and join, but isn't this done implicitly here, since I'm using the with statement?
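
From the docs, the with statement does clean up the pool, but via terminate(), not close() followed by join(); all results therefore have to be fetched inside the with block, which my code does via i.get(). A rough sketch of the equivalence:

from multiprocessing import Pool

with Pool(processes=4) as pool:
    ...  # submit tasks and fetch every result here

# is roughly equivalent to:
pool = Pool(processes=4)
try:
    ...  # submit tasks and fetch every result here
finally:
    pool.terminate()  # note: terminate(), no implicit close()/join()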

Also, this question ( Multiprocessing pool 'apply_async' only seems to call function once ) sounds very similar, but it differs from my problem. Unlike that question, my issue is not that only 1 worker executes the code, but that my X workers are only spawned once (instead of Y times).

What I've tried: moving the with Pool(...) block outside of the loops, but nothing changed.

The method in question is the following:

import json
import os
import time
from multiprocessing import Pool

def analyzeParam(siteparams, paysplit, victim2, verbose, depth, file, authcookie):
    # parseUrl, phase1, viclist, processes, cachedir and color are
    # module-level names defined elsewhere in the project.
    result = {}
    subdir = parseUrl(viclist[0])
    for victim, paramlist in siteparams.items():
        sub = {}
        print("\n{0}[INFO]{1} param{4}|{2} Attacking {3}".format(color.RD, color.END + color.O, color.END, victim, color.END + color.RD))
        time.sleep(1.5)
        for param in paramlist:
            payloads = []
            nullbytes = []
            print("\n{0}[INFO]{1} param{4}|{2} Using {3}\n".format(color.RD, color.END + color.O, color.END, param, color.END + color.RD))
            time.sleep(1.5)
            with Pool(processes=processes) as pool:
                # submit one phase1 task per payload chunk in paysplit
                res = [pool.apply_async(phase1, args=(1, victim, victim2, param, None, "", verbose, depth, l, file, authcookie, "")) for l in paysplit]
                # fetch all results while the pool is still alive
                for i in res:
                    tuples = i.get()
                    payloads += tuples[0]
                    nullbytes += tuples[1]
            sub[param] = (payloads, nullbytes)
            time.sleep(3)
        result[victim] = sub
    if not os.path.exists(cachedir + subdir):
        os.makedirs(cachedir + subdir)
    with open(cachedir + subdir + "spider-phase2.json", "w+") as f:
        json.dump(result, f, sort_keys=True, indent=4)
    return result

Some technical information:

  • Python version: 3.8.5
  • I doubt that the bug lies in phase1 , since it acts as intended when Pool is used outside of a loop, even when called multiple times. If you want to look it up, the source code is here: https://github.com/VainlyStrain/Vailyn

How do I fix this? Thanks!

Big kudos to jasonharper for finding the issue. The problem was not the code structure above, but the variable paysplit, which was a generator and was exhausted after the first call.
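
For anyone else hitting this, a minimal sketch of the failure mode (the generator below stands in for paysplit):

# A generator yields its items only once; after the first full pass it is empty.
paysplit = (chunk for chunk in [["p1"], ["p2"], ["p3"]])

print([l for l in paysplit])  # first parameter: 3 task chunks submitted
print([l for l in paysplit])  # every later parameter: [] -- nothing submitted

The fix is to materialize the payload chunks once, e.g. paysplit = list(paysplit), before entering the loops.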

Again, thank you for pointing it out!

Best
