How to scrape information from a textbook buyback website?

I am writing a program, and one part of it needs to find the best buyback price for a textbook. I am trying to scrape that value from https://bookscouter.com; for example, https://bookscouter.com/prices.php?isbn=1285428226&searchbutton=Sell shows a value of $34. The problem is that the website is not static, so simple Python scraping doesn't work. How should I go about this? Do I need to make some sort of request? I am not very experienced with web work, so any advice would be appreciated.

This page uses Ajax to fetch additional information. The source code of https://bookscouter.com/prices.php?isbn=1285428226&searchbutton=Sell shows:

<script language="javascript" type="text/javascript">
    function fetchresults_cb(search_id, text) {
        replaceContent('price_results', text);
        if(text.match(/INCOMPLETE/i)) {
            currentTime = new Date();
            time = currentTime.getTime();
            delayfunc = "AjaxRetrieve('/ajax_prices.php?type=PREFERRED&isbn=1285428226&search_id="+search_id+"&ts="+time+"', 'fetchresults_cb(\\'"+search_id+"\\', THISREQ.responseText)', 'true');";
            setTimeout(delayfunc, 3000);
        }
    }
</script>

There are different ways to scrape this kind of page.

The first way is to re-implement the source code above in Python and fetch the additional resources the way a browser does while executing the JavaScript. You can analyze the page's full source code or use your browser's network monitor to identify the URL where the required information is available.
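A minimal sketch of this first approach, assuming the /ajax_prices.php endpoint and the INCOMPLETE marker behave as the JavaScript above suggests. The excerpt does not show how search_id is first obtained, so it is taken here as a parameter you would read from the network monitor. The function names and the fetch and max_tries parameters are hypothetical, chosen only for illustration.

```python
import re
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://bookscouter.com"


def build_ajax_url(isbn, search_id, ts=None):
    """Build the same URL the page's JavaScript builds for /ajax_prices.php."""
    if ts is None:
        ts = int(time.time() * 1000)  # JavaScript getTime() returns milliseconds
    query = urlencode(
        {"type": "PREFERRED", "isbn": isbn, "search_id": search_id, "ts": ts}
    )
    return f"{BASE_URL}/ajax_prices.php?{query}"


def fetch_text(url):
    """Download a URL and return its body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def poll_prices(isbn, search_id, fetch=fetch_text, delay=3.0, max_tries=20):
    """Re-request the results until the response no longer says INCOMPLETE.

    Mirrors fetchresults_cb: the page retries every 3 seconds while the
    response text matches /INCOMPLETE/i. max_tries is an added safety limit.
    """
    for _ in range(max_tries):
        text = fetch(build_ajax_url(isbn, search_id))
        if not re.search(r"INCOMPLETE", text, re.IGNORECASE):
            return text  # final HTML fragment that fills 'price_results'
        time.sleep(delay)
    raise TimeoutError("results were still incomplete after polling")
```

Calling poll_prices("1285428226", search_id) with a search_id copied from the network monitor should return the HTML fragment that the page would insert into price_results; you would then parse the prices out of that fragment.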

The second way is to use Selenium, which drives a real browser engine to execute the JavaScript and gives you the full page source with all the required information.
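A rough sketch of the Selenium approach, assuming Chrome and a matching driver are available. It also assumes that price_results (the name passed to replaceContent above) is the id of an element on the page, and that prices appear in the rendered HTML as dollar amounts such as $34. Both helper names are hypothetical.

```python
import re


def extract_dollar_amounts(html):
    """Return every amount written like $34 or $34.50 as a float.

    Assumes prices are shown with a leading dollar sign and no thousands
    separator; adjust the pattern to the markup you actually see.
    """
    return [float(m) for m in re.findall(r"\$\s*(\d+(?:\.\d{1,2})?)", html)]


def fetch_rendered_html(url, wait_seconds=30):
    """Load the page in a real browser and return the source after Ajax finishes."""
    # Imported here so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def results_ready(drv):
        # Ready once the results element has content and no INCOMPLETE marker.
        element = drv.find_element(By.ID, "price_results")
        inner = element.get_attribute("innerHTML") or ""
        return bool(inner.strip()) and "INCOMPLETE" not in inner.upper()

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        WebDriverWait(driver, wait_seconds).until(results_ready)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout
```

With the rendered source in hand, max(extract_dollar_amounts(html)) would give the highest amount on the page, which is only the best buyback price if no other dollar figures appear there; restricting the search to the results element is safer.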

I assume you have permission from the owner of the bookscouter.com database to perform this kind of activity.
