Scraping dynamic JavaScript with Qt5

Question

I've run into a small problem. I'm trying to scrape an online auction site for a video game, and the site relies on JavaScript. Specifically, the data I'd like to scrape sits in a script block of type x-template. I can't get the actual data, only the script as it appears in the source.

Here's my code:

def render(source_url):

    import sys
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEngineView

    class Render(QWebEngineView):
        def __init__(self, url):
            self.html = None
            self.app = QApplication(sys.argv)
            QWebEngineView.__init__(self)
            self.loadFinished.connect(self._loadFinished)
            #self.setHtml(html)
            self.load(QUrl(url))
            self.app.exec_()

        def _loadFinished(self, result):
            # This is an async call, you need to wait for this
            # to be called before closing the app
            self.page().toHtml(self._callable)

        def _callable(self, data):
            self.html = data
            # Data has been stored, it's safe to quit the app
            self.app.quit()

    return Render(source_url).html

url = "https://www.pathofexile.com/trade/search/Bestiary/blkdmmofg"

f = open("html_out.txt", "w", encoding = "utf8")
f.write(str(render(url)))
f.close()

When I manually look up the first item's currency-text on the page and then search for it in my file, it isn't there, because it is generated dynamically.

Here's how the start of the script looks in the html_out.txt file:

<script type="x-template" id="trade-exchange-item-template">

After that comes the data I'm looking for, in this form:

<span v-else class="currency-text">{{currencyText(priceInfo.currency)}}</span>
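The `v-else` attribute and the `{{...}}` placeholder suggest this is Vue template markup rather than data: the browser does not execute a script block of type x-template, and the framework renders real elements from it elsewhere in the DOM at runtime. One way to check whether a saved HTML file contains rendered values, and not just the template, is sketched below. It uses only the standard library; the names `ClassTextCollector` and `rendered_texts` are made up for illustration.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of real elements that carry a given CSS class.

    HTMLParser treats the body of a <script> element as raw text, so
    markup inside an x-template block is never reported as elements.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.texts = []      # text of every matching element found so far
        self._tag = None     # tag name of the matching element being read
        self._depth = 0      # nesting depth of that tag name
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            # Valueless attributes such as v-else arrive with value None.
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self._tag, self._depth, self._buf = tag, 1, []
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append("".join(self._buf).strip())
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)


def rendered_texts(html, css_class):
    """Return the text of each element with css_class outside script bodies."""
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.texts
```

Calling `rendered_texts(open("html_out.txt", encoding="utf8").read(), "currency-text")` on the saved file would then show what was actually rendered; an empty list would suggest the HTML was captured before the page filled in its results.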

How can I get the site and its scripts to load fully, and only then grab the HTML with the correct data?

Thanks in advance!

Answer 1

It seems I can't scrape it without an actual client. It worked fine with Selenium, though.
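The Selenium code itself isn't shown here, so the following is only a minimal sketch of how such a script might look, not the code that was actually used. It assumes Selenium and a Chrome driver are installed, and that a `span.currency-text` element appears in the live DOM once the results have rendered; the function name `fetch_rendered_html`, the selector, and the timeout values are all assumptions.

```python
import time


def fetch_rendered_html(driver, url, css_selector, timeout=15.0, poll=0.5):
    """Load url, wait until css_selector matches a live element, return the HTML.

    driver is any Selenium WebDriver (it needs get, execute_script and
    page_source). Raises TimeoutError if nothing matches in time.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # querySelectorAll only sees real DOM elements, so markup that
        # exists solely inside the x-template script body cannot match.
        count = driver.execute_script(
            "return document.querySelectorAll(arguments[0]).length;",
            css_selector,
        )
        if count:
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css_selector!r}")
        time.sleep(poll)


def main():
    # Third-party dependency, imported here so the helper above can be
    # used or tested without Selenium installed.
    from selenium import webdriver

    url = "https://www.pathofexile.com/trade/search/Bestiary/blkdmmofg"
    driver = webdriver.Chrome()
    try:
        # Assumed selector: rendered items expose span.currency-text.
        html = fetch_rendered_html(driver, url, "span.currency-text")
        with open("html_out.txt", "w", encoding="utf8") as f:
            f.write(html)
    finally:
        driver.quit()
```

Calling `main()` runs the scrape. The wait condition is a DOM query and not a fixed sleep, so the HTML is captured as soon as the first rendered item shows up.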

Accepted answer (0 votes), 2018-04-05 10:46:16