
Crawl JavaScript-generated content in a web page

The page I am trying to crawl includes JavaScript code (possibly using AJAX). When I crawl the page based on its HTML, I can't get the part generated by JavaScript. How can I do that?

I think I need a Python library that can crawl the JavaScript-generated content along with the HTML.

Please give me some advice.

Below is the page link: view-source:http://www.bobaedream.co.kr/mycar/popup/mycarChart_4.php?zone=C&cno=652691&tbl=cyber

I recommend two ways.

First, request the AJAX URL directly and parse the HTML it returns.

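import requests

# AJAX endpoint that the page calls to load its data
url = "http://www.bobaedream.co.kr/mycar/proc/mycar_regist_option.php"
data = {'param': 'ALL'}  # form fields sent in the POST body
response = requests.post(url, data=data, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
html = response.text
# parse
...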
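The parsing step depends on what the endpoint actually returns, which is not shown here. As a rough sketch, assuming the response is an HTML fragment containing table rows, the standard-library html.parser module can pull out the cell text. The names CellTextParser and table_rows and the sample markup are made up for illustration; they are not taken from the real page.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return the table cells found in an HTML string as a list of rows."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup standing in for response.text
sample = (
    "<table><tr><th>Option</th><th>Value</th></tr>"
    "<tr><td>Sunroof</td><td> Yes </td></tr></table>"
)
print(table_rows(sample))  # [['Option', 'Value'], ['Sunroof', 'Yes']]
```

With a real response you would call table_rows(response.text). If the endpoint returns JSON instead of HTML, response.json() would be the simpler route.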

Second, use a web driver such as geckodriver or PhantomJS through the selenium library.

Selenium drives a real or headless browser, which runs the page's JavaScript; you can then read the DOM that the JavaScript built.
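A minimal sketch of the web driver approach, assuming selenium and geckodriver are installed. The helper names rendered_html and scrape_with_firefox are illustrative, not part of selenium. The helper only waits for document.readyState to become "complete"; content loaded by later AJAX calls may need an additional wait for a specific element.

```python
import time


def rendered_html(driver, url, timeout=10.0, poll=0.25):
    """Load url in an existing WebDriver and return the rendered DOM as HTML.

    Polls document.readyState until the browser reports "complete" or the
    timeout elapses, then returns driver.page_source.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.execute_script("return document.readyState") == "complete":
            break
        time.sleep(poll)
    return driver.page_source


def scrape_with_firefox(url):
    """Start headless Firefox, fetch the rendered page, and close the browser."""
    # Imported here so rendered_html can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run Firefox without opening a window
    driver = webdriver.Firefox(options=options)  # requires geckodriver
    try:
        return rendered_html(driver, url)
    finally:
        driver.quit()  # always release the browser process
```

For the page in the question, the call would look like scrape_with_firefox("http://www.bobaedream.co.kr/mycar/popup/mycarChart_4.php?zone=C&cno=652691&tbl=cyber"); the returned string can then be parsed like any other HTML.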

See the public selenium documentation for details.
