
How can I use concurrent threads to make a function faster?

I want to build a tool that scans a website for subdomains. I know how to do this, but my function is slow. I looked at the gobuster usage and saw that gobuster can use many concurrent threads. How can I implement this too?

I have searched Google many times, but I can't find anything about this. Can someone give me an example?

gobuster usage: -t Number of concurrent threads (default 10)

My current program:

import requests

def subdomains(url, wordlist):
    checks(url, wordlist) # just checking for valid args
    num_lines = get_line_count(wordlist) # number of lines in a file
    count = 0
    for line in open(wordlist).readlines():
        resp = requests.get(url + line.strip()) # strip the trailing newline
        if resp.status_code in (301, 200):
            print(f'Valid - {line.strip()}')
        count += 1
        print(f'{count} / {num_lines}')

Note: gobuster is a very fast tool for finding subdomains of websites.

If you're trying to use threading in Python, you should start with the basics and learn what's available. Here's a simple example taken from https://pymotw.com/2/threading/:

import threading

def worker():
    """thread worker function"""
    print('Worker')

threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

To apply this to your task, a simple approach would be to spawn a thread for each request, something like the code below. Note: if your wordlist is long, this can get very expensive. Look into Python's thread-pool facilities for better thread management that you won't need to control explicitly yourself (see the sketch after the code below).

import threading
import requests

def subdomains(url, wordlist):
    checks(url, wordlist) # just checking for valid args
    num_lines = get_line_count(wordlist) # number of lines in a file
    threads = []
    for line in open(wordlist).readlines():
        t = threading.Thread(target=checkUrl, args=(url, line))
        threads.append(t)
        t.start()
    for thread in threads:  # wait for all threads to complete
        thread.join()


def checkUrl(url, line):
    resp = requests.get(url + line.strip())
    if resp.status_code in (301, 200):
        print(f'Valid - {line.strip()}')
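
Since thread pools came up above: here's a minimal sketch using concurrent.futures.ThreadPoolExecutor from the standard library, where max_workers plays the role of gobuster's -t. It's untested like the rest; subdomains_pooled is just an illustrative name, and checks/checkUrl are the helpers already shown.

import concurrent.futures

def subdomains_pooled(url, wordlist, num_threads=10):
    checks(url, wordlist)  # just checking for valid args
    with open(wordlist) as f:
        lines = [line.strip() for line in f]
    # the pool keeps at most num_threads requests in flight at once
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as pool:
        for line in lines:
            pool.submit(checkUrl, url, line)
        # leaving the with-block waits for all submitted work to finish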

To implement the counter, you'll need to control shared access between threads to prevent race conditions (two threads updating the variable at the same time, resulting in... problems). A counter object with protected access is provided in the link above:

class Counter(object):
    def __init__(self, start=0):
        self.lock = threading.Lock()
        self.value = start
    def increment(self):
        #Waiting for lock
        self.lock.acquire()
        try:
            #Acquired lock
            self.value = self.value + 1
        finally:
            #Release lock, so other threads can count
            self.lock.release()

#usage:

#in subdomains()...
  counter = Counter()
  for ...
     t = threading.Thread(target=checkUrl, args=(url, line, counter))


#in checkUrl(url, line, counter)...
  counter.increment()
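
For completeness, a sketch of checkUrl extended to take the counter and reproduce the x / y progress line from your original function; num_lines would be passed through as an extra thread argument (again, untested):

def checkUrl(url, line, counter, num_lines):
    sub = line.strip()
    resp = requests.get(url + sub)
    if resp.status_code in (301, 200):
        print(f'Valid - {sub}')
    counter.increment()
    # reading counter.value outside the lock is good enough for a rough progress display
    print(f'{counter.value} / {num_lines}')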

Final note: I have not compiled or tested any of this code.

Python has a threading module.

The simplest way to use a Thread is to instantiate it with a target function and call start() to let it begin working. To mirror gobuster's -t option, you can split the wordlist across a fixed number of threads, each scanning its own share of the lines (the progress counter is left out here; see the Counter approach above for a thread-safe version):

import threading
import requests

def scan_chunk(url, lines):
    for line in lines:
        sub = line.strip()
        resp = requests.get(url + sub)
        if resp.status_code in (301, 200):
            print(f'Valid - {sub}')

def subdomains(url, wordlist, num_threads=10):
    checks(url, wordlist) # just checking for valid args
    with open(wordlist) as f:
        lines = f.readlines()
    threads = []
    for i in range(num_threads):
        # each thread gets every num_threads-th line of the wordlist
        t = threading.Thread(target=scan_chunk, args=(url, lines[i::num_threads]))
        threads.append(t)
        t.start()
    for t in threads:  # wait for all threads to finish
        t.join()
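
A hypothetical call, with a placeholder target and wordlist path:

subdomains('http://example.com/', 'wordlist.txt', num_threads=10)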
