
Dynamic web page scraping using Python

I am running the code below, but I get an empty list. Can you help me find the problem?

Run with: xvfb-run python dynamic_scrapy.py

import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage
import bs4 as bs


class Client(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
    def on_page_load(self):
        self.app.quit()

url = "https://pythonprogramming.net/parsememcparseface/"
client_response = Client(url)
source = client_response.mainFrame().toHtml()
soup = bs.BeautifulSoup(source, 'lxml')
print(soup)
js_test = soup.find_all('p', class_='jstest')
print(js_test)

toHtml() returns a QString here, and you need to convert it to a regular string before passing it to BeautifulSoup. You can do the following:

import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage
import bs4 as bs

class Client(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.on_page_load)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()
    def on_page_load(self):
        self.app.quit()

url = "https://pythonprogramming.net/parsememcparseface/"
client_response = Client(url)

source = client_response.mainFrame().toHtml()
# Added: encode the QString to UTF-8 bytes, then decode to a Python 2 unicode string
source_utf = unicode(source.toUtf8(), encoding="UTF-8")
soup = bs.BeautifulSoup(source_utf, 'lxml')
js_test = soup.find_all('p', class_='jstest')
print(js_test)

This prints:

[<p class="jstest" id="yesnojs">Look at you shinin!</p>]
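The snippet above relies on Python 2's unicode(). As a rough sketch of the same conversion step written for Python 3, the helper below turns whatever toHtml() hands back into a native string. The name to_text is hypothetical (it is not part of PyQt4 or bs4), and it assumes that a QString-like object exposes toUtf8() and that the result can be passed to bytes(). As far as I know, PyQt4 on Python 3 already returns a plain str by default, in which case the helper just passes the value through.

```python
def to_text(source, encoding="utf-8"):
    """Return `source` as a native text string.

    Hypothetical helper for illustration; not part of PyQt4 or bs4.
    """
    # QString-like objects expose toUtf8(); assume its result can be
    # converted to raw bytes with bytes().
    if hasattr(source, "toUtf8"):
        source = bytes(source.toUtf8())
    # Raw bytes still have to be decoded before they count as text.
    if isinstance(source, bytes):
        return source.decode(encoding)
    # Already a native string: nothing to convert.
    return source
```

With this in place, the parsing line would become soup = bs.BeautifulSoup(to_text(source), 'lxml').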

