
Using thread causes "python.exe has stopped working"

Recently, I tried to add threading to my scraper so that it would scrape with higher efficiency.

But somehow it randomly causes python.exe to "stop working". No further information is given, so I have no idea how to debug it.

Here is some related code:

  1. Where the threads are started:

     def run(self):
         """
         create the threads and run the scraper
         :return:
         """
         self.__load_resource()
         self.__prepare_threads_args()
         # each thread is allocated a different set of links to scrape from,
         # so there should be no collisions.
         for item in self.threads_args:
             try:
                 t = threading.Thread(target=self.urllib_method, args=(item,))
                 # use the following expression to use the selenium scraper
                 # t = threading.Thread(target=self.__scrape_site, args=(item,))
                 self.threads.append(t)
                 t.start()
             except Exception as ex:
                 print ex
  2. What the scraper looks like:

     def urllib_method(self, thread_args):
         """
         :param thread_args: arguments containing the files to scrape and the proxy to use
         :return:
         """
         site_scraper = SiteScraper()
         for file in thread_args["files"]:
             current_folder_path = self.__prepare_output_folder(file["name"])

             articles_without_comments_file = os.path.join(current_folder_path, "articles_without_comments")
             articles_without_comments_links = get_links_from_file(articles_without_comments_file) if isfile(articles_without_comments_file) else []

             articles_scraped_file = os.path.join(current_folder_path, "articles_scraped")
             articles_scraped_links = get_links_from_file(articles_without_comments_file) if isfile(articles_without_comments_file) else []

             links = get_links_from_file(file["path"])
             for link in links:
                 article_id = extract_article_id(link)

                 if isfile(join(current_folder_path, article_id)):
                     print "skip: ", link
                     if link not in articles_scraped_links:
                         append_text_to_file(articles_scraped_file, link)
                     continue

                 if link in articles_without_comments_links:
                     continue

                 comments = site_scraper.call_comments_endpoint(article_id, thread_args["proxy"])
                 if comments != "Pro article" and comments != "Crash" and comments != "No Comments" and comments is not None:
                     print article_id, comments[0:14]
                     write_text_to_file(os.path.join(current_folder_path, article_id), comments)
                     sleep(1)
                     append_text_to_file(articles_scraped_file, link)
                 elif comments == "No Comments":
                     print "article without comments: ", article_id
                     if link not in articles_without_comments_links:
                         append_text_to_file(articles_without_comments_file, link)
                     sleep(1)

I have tried running the script on both Windows 10 and Windows 8.1, and the problem occurs on both.

Moreover, the more data it scrapes, the more frequently this happens. And the more threads are used, the more frequently it happens as well.

Threading in Python prior to 3.2 was quite unsafe, due to the notorious Global Interpreter Lock.

The preferred way to take advantage of multiple cores and processes in Python is via the multiprocessing package.

https://docs.python.org/2/library/multiprocessing.html
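As a minimal sketch of that suggestion (not the asker's actual code): the per-item threads could be replaced with a `multiprocessing.Pool`, where each worker runs in its own process. The `scrape` function below is a hypothetical stand-in for the real per-thread work (`urllib_method` in the question), which would take one entry of `threads_args`.

```python
from multiprocessing import Pool

def scrape(item):
    # placeholder for the real per-worker work (urllib_method in the question);
    # it must be a top-level function so it can be pickled and sent to a worker
    return "scraped %s" % item

if __name__ == "__main__":
    # the guard is required on Windows, where child processes re-import this module
    items = ["set1", "set2", "set3"]
    pool = Pool(processes=3)          # one process per argument set
    results = pool.map(scrape, items) # blocks until all workers finish
    pool.close()
    pool.join()
    print(results)
```

Because each worker is a separate process, a crash in one of them cannot take the main interpreter down with it, and `pool.map` also collects the return values, which the thread-based version never did.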


Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

 