
Using threads causes "python.exe has stopped working"

Recently I tried to add threading to my scraper so that it can scrape more efficiently.

But somehow it randomly causes python.exe to crash with the "python.exe has stopped working" dialog and no further information, so I have no idea how to debug it.
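One way to get at least a Python-level traceback out of a hard crash like this is the faulthandler module (part of the standard library from Python 3.3 on, and available for Python 2 as the faulthandler backport on PyPI). A minimal sketch, assuming that module is importable:

    import faulthandler

    # Dump the Python traceback of every thread if the interpreter
    # crashes hard (segfault, abort, ...) instead of dying silently
    # behind the Windows "has stopped working" dialog.
    faulthandler.enable(all_threads=True)

    # Alternative: write the dump to a file that survives the crash.
    # The file object must stay open for as long as the program runs.
    # crash_log = open("crash_traceback.log", "w")
    # faulthandler.enable(file=crash_log, all_threads=True)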

Here is some relevant code:

  1. Where the threads are initiated:

     def run(self):
         """
         create the threads and run the scraper
         :return:
         """
         self.__load_resource()
         self.__prepare_threads_args()
         # each thread is allocated a different set of links to scrape from, so there should be no collisions
         for item in self.threads_args:
             try:
                 t = threading.Thread(target=self.urllib_method, args=(item,))
                 # use the following expression to use the selenium scraper
                 # t = threading.Thread(target=self.__scrape_site, args=(item,))
                 self.threads.append(t)
                 t.start()
             except Exception as ex:
                 print ex
  2. What the Scraper is like:

     def urllib_method(self, thread_args):
         """
         :param thread_args: arguments containing the files to scrape and the proxy to use
         :return:
         """
         site_scraper = SiteScraper()
         for file in thread_args["files"]:
             current_folder_path = self.__prepare_output_folder(file["name"])

             articles_without_comments_file = os.path.join(current_folder_path, "articles_without_comments")
             articles_without_comments_links = get_links_from_file(articles_without_comments_file) if isfile(articles_without_comments_file) else []

             articles_scraped_file = os.path.join(current_folder_path, "articles_scraped")
             articles_scraped_links = get_links_from_file(articles_without_comments_file) if isfile(articles_without_comments_file) else []

             links = get_links_from_file(file["path"])
             for link in links:
                 article_id = extract_article_id(link)

                 if isfile(join(current_folder_path, article_id)):
                     print "skip: ", link
                     if link not in articles_scraped_links:
                         append_text_to_file(articles_scraped_file, link)
                     continue

                 if link in articles_without_comments_links:
                     continue

                 comments = site_scraper.call_comments_endpoint(article_id, thread_args["proxy"])

                 if comments != "Pro article" and comments != "Crash" and comments != "No Comments" and comments is not None:
                     print article_id, comments[0:14]
                     write_text_to_file(os.path.join(current_folder_path, article_id), comments)
                     sleep(1)
                     append_text_to_file(articles_scraped_file, link)
                 elif comments == "No Comments":
                     print "article without comments: ", article_id
                     if link not in articles_without_comments_links:
                         append_text_to_file(articles_without_comments_file, link)
                     sleep(1)

I have tried running the script on both Windows 10 and Windows 8.1; the issue exists on both.

Also, the more data it scrapes, the more frequently it happens, and the more threads I use, the more frequently it happens.

Threads in Python pre 3.2 are very unsafe to use, due to the diabolical Global Interpreter Lock.

The preferred way to utilize multiple cores and processes in Python is via the multiprocessing package:

https://docs.python.org/2/library/multiprocessing.html
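A minimal sketch of what that could look like for a scraper structured like the one in the question, with one worker process per set of arguments instead of one thread. Here scrape_worker and args_list are illustrative stand-ins, not the question's actual urllib_method and threads_args:

    import multiprocessing

    def scrape_worker(worker_args):
        """Stand-in for the question's urllib_method.

        On Windows, multiprocessing starts fresh interpreter processes,
        so the target function and its arguments must be picklable and
        defined at module level.
        """
        for link in worker_args["links"]:
            print(link)  # do the actual scraping here

    if __name__ == "__main__":  # required on Windows
        args_list = [
            {"links": ["http://example.com/a", "http://example.com/b"]},
            {"links": ["http://example.com/c", "http://example.com/d"]},
        ]

        workers = []
        for item in args_list:
            # same target/args interface as threading.Thread, but each
            # worker runs in its own process with its own interpreter
            p = multiprocessing.Process(target=scrape_worker, args=(item,))
            workers.append(p)
            p.start()

        for p in workers:
            p.join()

Since every worker is a separate process, a crash inside one worker (for example in a misbehaving C extension) kills only that process instead of taking down the whole interpreter.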
