I think I have a memory leak in my Python script

This is my code:

from xgoogle.search import GoogleSearch, SearchError
import urllib, urllib2, sys, argparse

global stringArr

stringArr = ["string 1",
             "string 2",
             "string 3",
             "string etc"]

def searchIt(url):
    try:
        if(args.verbose>='1'): print "[INFO] Opening URL: "+url
        response = urllib.urlopen(url)
    except urllib2.URLError, e:
        print "[ERROR] "+e.reason
        return False
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit(0)
    if checkForStr(response.read()): # scan the page body for the strings
        if(args.verbose=='0'): print "[INFO] String found in URL: "+url
    else:
        if(args.verbose>='1'): print "[INFO] No string found in URL: "+url

def checkForStr(html):
    global stringArr
    try:
        if any(checkStr in html for checkStr in stringArr):
            return True
        else:
            return False
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit(0)

def main():
    try:
        gs = GoogleSearch(args.keyword)
        gs.results_per_page = 100
        results = []
        i = 0
        while True:
            tmp = gs.get_results()
            i = i+1 # page number
            if not tmp: # no more results (pages) were found
                break
            results.extend(tmp)
            for r in results: # process results for page
                searchIt(r.url) # check for string
            del results[:] # clean results
        # finished
    except SearchError, e:
        print "[ERROR] Search failed: %s" % e
    except KeyboardInterrupt:
        print "Suspended by user..."
        sys.exit(0)

if __name__ == '__main__':
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument('-v', dest='verbose', default='0', help='Verbosity level', choices='012')
        group = parser.add_argument_group('required arguments')
        group.add_argument('-k', dest='keyword', help='Keyword to use on google query', required=True)
        args = parser.parse_args()
        main()
    except KeyboardInterrupt:
        print "Suspended by user..."

I've shortened it a little to make it easier to read, but it should still be functional. This code will be part of a bigger script.

I am using the XGOOGLE library to scrape the results from Google, and then I visit each result to check whether the page contains any of the strings from stringArr.

My first tests ran without any problem (I pressed Ctrl+C after fewer than 10 results), but the first time I let it run, I got this error after about 100 URLs had been tested:

  File "./StringScan.py", line 99, in <module>
  File "./StringScan.py", line 83, in main
  File "./StringScan.py", line 39, in checkForStr
    response = urllib.urlopen(url)
  File "/usr/lib/python2.6/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.6/urllib.py", line 205, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.6/urllib.py", line 344, in open_http
  File "/usr/lib/python2.6/httplib.py", line 904, in endheaders
  File "/usr/lib/python2.6/httplib.py", line 776, in _send_output
  File "/usr/lib/python2.6/httplib.py", line 735, in send
  File "/usr/lib/python2.6/httplib.py", line 716, in connect
  File "/usr/lib/python2.6/socket.py", line 500, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno -2] Name or service not known

(The line numbers do not match because I modified the code to post it here.)

After that I got my Linux terminal back, as if the script had finished. But I noticed my PC wasn't responding well, so I checked System Monitor and saw the Python process using 1.3 GB of memory. I had to kill the process to get my PC back to normal.

Is something in my code causing this, or what else could explain it?

I know my code could have some errors, but right now I am mainly interested in any error that could be causing the memory problem. Any help will be appreciated.
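One way to narrow this down is to measure memory from inside the script instead of relying on System Monitor. The following is a minimal sketch, assuming the script is ported to Python 3 (tracemalloc is not available on the Python 2.6 interpreter shown in the traceback). The function name run_with_memory_report and its parameters are hypothetical, and tracemalloc only sees allocations made through Python's own allocator, so memory held by C extensions would not show up.

```python
import tracemalloc


def run_with_memory_report(work, items, report_every=10):
    """Call work(item) for each item and print traced memory periodically.

    Returns the final (current, peak) byte counts reported by tracemalloc.
    All names here are illustrative, not part of the original script.
    """
    tracemalloc.start()
    try:
        for n, item in enumerate(items, start=1):
            work(item)
            if n % report_every == 0:
                current, peak = tracemalloc.get_traced_memory()
                print("[MEM] after %d items: current=%d KiB, peak=%d KiB"
                      % (n, current // 1024, peak // 1024))
        return tracemalloc.get_traced_memory()
    finally:
        # Stop tracing even if work() raises, so tracing overhead does not linger.
        tracemalloc.stop()
```

Passing the per-URL function as work and the list of result URLs as items would show whether the "current" figure keeps climbing from page to page, which is what a leak in the Python code would look like.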

I refactored your code a little to make it easier for me to read. I can't see anything here that would leak memory, though.

from itertools import count
import urllib, urllib2, sys, argparse
from xgoogle.search import GoogleSearch, SearchError

stringArr = ["string 1",
             "string 2",
             "string 3",
             "string etc"]

def searchIt(url):
    try:
        if args.verbose >= '1':
            print "[INFO] Opening URL: "+url
        response = urllib.urlopen(url)
    except urllib2.URLError, e:
        print "[ERROR] "+e.reason
        return False
    if checkForStr(response.read()):
        if args.verbose == '0':
            print "[INFO] String found in URL: "+url
    else:
        if args.verbose >= '1':
            print "[INFO] No string found in URL: "+url

def checkForStr(html):
    return any(checkStr in html for checkStr in stringArr)

def main():
    try:
        gs = GoogleSearch(args.keyword)
        gs.results_per_page = 100
        for i in count(): # i is the page number
            results = gs.get_results()
            if not results: # no more results (pages) were found
                break
            for r in results: # process results for page
                searchIt(r.url) # check for string
        # finished
    except SearchError, e:
        print "[ERROR] Search failed: %s" % e

if __name__ == '__main__':
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument('-v', dest='verbose', default='0', help='Verbosity level', choices='012')
        group = parser.add_argument_group('required arguments')
        group.add_argument('-k', dest='keyword', help='Keyword to use on google query', required=True)
        args = parser.parse_args()
        main()
    except KeyboardInterrupt:
        print "Suspended by user..."
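Separately from the memory question, the traceback ends in an IOError, while searchIt only catches urllib2.URLError. In Python 2, urllib.urlopen reports connection failures as a plain IOError, and URLError is a subclass of IOError rather than the other way round, so that except clause does not match and the error escapes. Here is a minimal sketch of a broader handler, written for Python 3, where IOError is an alias of OSError and urllib.error.URLError derives from it. The function fetch and its opener parameter are hypothetical names used for illustration.

```python
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the URL cannot be opened.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; it defaults to urllib.request.urlopen.
    """
    try:
        response = opener(url)
    except OSError as e:
        # URLError, socket errors and failed name lookups all derive from
        # OSError in Python 3, so one clause covers the case in the traceback.
        print("[ERROR] Could not open %s: %s" % (url, e))
        return None
    try:
        return response.read()
    finally:
        response.close()


# URLError is a subclass of OSError, not the reverse, so catching only
# URLError would miss a plain OSError such as a failed name lookup.
assert issubclass(urllib.error.URLError, OSError)
assert not issubclass(OSError, urllib.error.URLError)
```

On Python 2 the equivalent change would be to catch IOError in searchIt, since that also covers urllib2.URLError.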

It could be urllib.urlopen(). See http://bugs.python.org/issue1208304
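Whether or not that issue is the cause here, closing each response explicitly and capping how much of it is read are low-cost precautions: response.read() with no argument loads the entire body into memory, and a search result may point at a very large file. This is a minimal sketch under the assumption of Python 3. The names page_contains, MAX_BYTES and opener are hypothetical, and the 2 MiB cap is an arbitrary choice.

```python
from contextlib import closing
import urllib.request

MAX_BYTES = 2 * 1024 * 1024  # assumed cap; tune to the pages being scanned


def page_contains(url, needles, opener=urllib.request.urlopen, limit=MAX_BYTES):
    """Return True if any of `needles` (bytes) occurs in the first `limit` bytes.

    `opener` is a hypothetical hook for testing without network access.
    """
    # closing() guarantees response.close() runs even if read() raises.
    with closing(opener(url)) as response:
        body = response.read(limit)
    return any(needle in body for needle in needles)
```

The needles must be bytes objects (for example b"string 1") because read() returns bytes in Python 3. A string that straddles the cap would be missed, which is the trade-off for bounding memory per page.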
