
Scraping with Qt in Django works once, then crashes on the next run with "QApplication was not created in the main() thread"

I am building a Django-based scraper in which a user can enter a search term. I use the search term(s) to build a URL and query the site, which returns unrendered HTML and JS. I then take the POST request, render the page by creating a QWebPage, passing it the URL, and grabbing the frame's rendered HTML. This works once in my Django app, but the next POST request crashes the site.

My first concern is that, in this setup, I am forced to run the server under the xvfb-run wrapper. Is this going to pose an issue when I deploy? Or, better put: can I use an xvfb wrapper in production somehow?
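
On the deployment question: `xvfb-run` only starts an Xvfb virtual display and runs the given command with `DISPLAY` pointing at it, so in principle the same wrapper can front a production process as well. Below is a minimal sketch of a hypothetical helper that adds the wrapper only when no display is configured. It assumes the `xvfb-run` script (packaged with Xvfb on Debian/Ubuntu) is installed, and `mysite.wsgi` is a placeholder module name.

```python
import os
import shlex


def xvfb_command(argv, env=None):
    """Return argv, prefixed with xvfb-run when no X display is configured.

    Hypothetical helper. "-a" (--auto-servernum) makes xvfb-run pick a free
    display number, so several wrapped processes can coexist on one host.
    """
    env = os.environ if env is None else env
    if env.get("DISPLAY"):
        return list(argv)  # an X server (real or virtual) is already available
    return ["xvfb-run", "-a", *argv]


# e.g. how a deploy script might launch a WSGI server on a headless host
print(shlex.join(xvfb_command(["gunicorn", "mysite.wsgi"], env={})))
# → xvfb-run -a gunicorn mysite.wsgi
```

Checking `DISPLAY` keeps the same launch command usable on a desktop with a real X server and on a headless server.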

With that said, I am able to make one POST request, and it returns the page I am looking for. If I go back and send another request, I see the following errors in the console, and then the ./manage.py server shuts down:

WARNING: QApplication was not created in the main() thread.
QObject::connect: Cannot connect (null)::configurationAdded(QNetworkConfiguration) to QNetworkConfigurationManager::configurationAdded(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationRemoved(QNetworkConfiguration) to QNetworkConfigurationManager::configurationRemoved(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationChanged(QNetworkConfiguration) to QNetworkConfigurationManager::configurationChanged(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::onlineStateChanged(bool) to QNetworkConfigurationManager::onlineStateChanged(bool)
QObject::connect: Cannot connect (null)::configurationUpdateComplete() to QNetworkConfigurationManager::updateCompleted()
Segmentation fault (core dumped)
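
The first warning is the main clue. Django's development server is multithreaded by default (`runserver --nothreading` turns that off), so each request is handled in a worker thread rather than the main thread, and `Render()` therefore builds its `QApplication` in whichever thread handles that request. The stdlib sketch below uses `http.server.ThreadingHTTPServer` as a stand-in for Django's server (an assumption; both are built on `socketserver.ThreadingMixIn`, but Django's class differs) and records which thread each request handler runs in.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen = []  # one (thread name, is-main-thread) pair per handled request


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        t = threading.current_thread()
        seen.append((t.name, t is threading.main_thread()))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS choose a free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]


def client():
    try:
        for _ in range(2):
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/")
            conn.getresponse().read()
            conn.close()
    finally:
        server.shutdown()  # ends serve_forever() below, even on error


threading.Thread(target=client).start()
server.serve_forever()  # the accept loop itself runs in the main thread here...
server.server_close()
print(seen)  # ...but every handler reports is-main-thread == False
```

Each request lands in a fresh thread, none of which is the main thread, which is the situation the Qt warning complains about.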

I admit I do not understand exactly what the error means, since I'm rather new to threading concepts. I am not sure whether it means Qt can't reconnect to the xvfb display that's already running, or whether it really is a threading issue. The code that works once is below. I have changed it slightly because I don't want to show the site I'm actually scraping, and I am not extracting any data in this sample; it simply returns the rendered HTML to your browser as a test:

import sys

from django.http import HttpResponse
from django.shortcuts import render
from django.views.generic import View
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage

from .forms import QueryForm


def query(request):
    # `google` is not defined in this snippet; this view is not involved in the crash.
    results = google.search("Real Estate")
    context = {'results': results}
    return render(request, 'searchlistings/search.html', context)


class Render(QWebPage):
    """Load a URL in QtWebKit and keep the main frame once loading finishes."""

    def __init__(self, url):
        # A new QApplication is constructed on every call. Qt expects exactly one
        # QApplication per process, created in the main thread, while Django's
        # threaded development server calls this from a request thread.
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()  # block in the Qt event loop until _loadFinished quits it

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()


class SearchView(View):
    form_class = QueryForm
    template_name = 'searchlistings/index.html'

    def get(self, request, *args, **kwargs):
        form = self.form_class()
        return render(request, self.template_name, {'form': form})

    def post(self, request, *args, **kwargs):
        form = self.form_class(request.POST)
        if not form.is_valid():
            # Re-display the form with its errors instead of returning None.
            return render(request, self.template_name, {'form': form})
        query = form.cleaned_data['query']
        html = self.isOnSite(query)
        return HttpResponse(html)

    def isOnSite(self, query):
        url = "http://google.com"  # placeholder for the real site built from `query`
        # This does the magic: loads the page and runs its JavaScript.
        r = Render(url)
        # The result is a QString; the event loop has already exited here.
        return r.frame.toHtml()

So my main questions are:

  1. Is the xvfb wrapper appropriate here, and can I use this setup in production on a different host? Or will it only work on my local Vagrant box?

  2. The main() thread issue: is it a threading issue, or a failure to reconnect to the xvfb server? Can it be resolved with Celery or something similar?

  3. Is this an appropriate way to do what I want? I've seen lots of other solutions, including scrapyjs, spynner, Selenium and so on, but they seem either overly complicated or based on Qt. A better question: do any of these alternative packages solve the main() thread issue?
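
Whatever tool ends up being used for question 2, the structural fix is similar: create the toolkit once and let a single owner do all the rendering, instead of building a new `QApplication` inside every request thread. Below is a minimal stdlib sketch of that pattern, assuming a hypothetical renderer factory (the stub lambda stands in for QtWebKit and does no real rendering). One caveat: Qt also expects its `QApplication` in the process's main thread, so on some platforms a non-main owner thread may still warn or misbehave; moving the same pattern into a separate worker process (roughly what a Celery worker gives you) avoids that.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()
created_in = []   # thread that built the renderer (should be exactly one)
rendered_in = []  # thread that handled each job


def _init_renderer():
    # Hypothetical hook: with Qt, the single QApplication (and a reusable
    # QWebPage) would be created here, once, inside the owner thread.
    created_in.append(threading.current_thread().name)
    _local.render = lambda url: "<html>rendered %s</html>" % url  # stub renderer


def _render(url):
    rendered_in.append(threading.current_thread().name)
    return _local.render(url)


# max_workers=1: one long-lived thread owns the renderer; jobs queue up for it.
render_pool = ThreadPoolExecutor(max_workers=1, initializer=_init_renderer)


def render_page(url, timeout=30):
    """Callable from any request thread; blocks until the owner thread replies."""
    return render_pool.submit(_render, url).result(timeout=timeout)


# Simulate three concurrent requests, each arriving on its own thread.
pages = []
clients = [
    threading.Thread(
        target=lambda n=n: pages.append(render_page("http://example.com/?q=%d" % n))
    )
    for n in range(3)
]
for t in clients:
    t.start()
for t in clients:
    t.join()
render_pool.shutdown()
print(created_in, sorted(pages))
```

In a Django view, `HttpResponse(render_page(url))` would then replace the per-request `Render(url)` call, under the assumptions above.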

Thanks for your help!

OK, the solution here was to use twill, as documented at http://twill.idyll.org/python-api.html. I am able to run it without the xvfb wrapper, and it is much faster than my previous approach, with much less overhead. I can recommend it.
