Crawling JavaScript-generated content in a web page

Question

The page I am trying to crawl includes JavaScript code (possibly using AJAX?). When I crawl the page based on its HTML, I can't get the part produced by JavaScript. How can I do that?

I think I need a Python library that can crawl the JavaScript-generated content along with the HTML.

Please give me some advice.

Below is the page link: view-source:http://www.bobaedream.co.kr/mycar/popup/mycarChart_4.php?zone=C&cno=652691&tbl=cyber

Answer 1

I recommend two approaches.

First, request the AJAX URL directly and parse the HTML it returns.

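import requests

# AJAX URL that is requested directly instead of the page itself
url = "http://www.bobaedream.co.kr/mycar/proc/mycar_regist_option.php"
data = {'param': 'ALL'}  # form fields sent in the POST body
response = requests.post(url, data=data)
response.raise_for_status()  # stop early on an HTTP error status
# parse
...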
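As a rough illustration of what a parse step could look like, here is a minimal sketch using only the standard library. It assumes the response body is an HTML fragment and that you only want its visible text. The names `TextCollector`, `extract_texts`, and the `sample` markup are hypothetical, since the real markup returned by that URL is not shown here.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML document or fragment."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        # Called for each run of text between tags; skip whitespace-only runs.
        text = data.strip()
        if text:
            self.texts.append(text)


def extract_texts(html):
    """Return every non-empty text node found in `html`, in document order."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.texts


# Hypothetical fragment standing in for response.text (an assumption).
sample = "<ul><li>Sunroof</li><li> Navigation </li></ul>"
print(extract_texts(sample))  # ['Sunroof', 'Navigation']
```

In practice you would pass `response.text` to `extract_texts` and then narrow the result down to the tags you care about. A dedicated HTML parsing library may be more convenient for that.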

Second, use a web driver such as geckodriver or PhantomJS through the Selenium library.

Selenium drives a browser (which can run headless), lets it execute the JavaScript, and then exposes the DOM that the JavaScript built.
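The web driver approach might look roughly like the sketch below. It assumes Selenium 4 with Firefox and geckodriver installed. The function name `fetch_rendered_html` and its optional `driver` parameter are my own illustration, not something from the original answer.

```python
def fetch_rendered_html(url, driver=None):
    """Load `url` in a browser and return the DOM serialized after scripts ran.

    If no driver is passed in, a headless Firefox is started and shut down
    again by this function (an assumption about the local setup).
    """
    own_driver = driver is None
    if own_driver:
        # Imported lazily so the function can also be called with a
        # caller-supplied driver object.
        from selenium import webdriver
        from selenium.webdriver.firefox.options import Options

        options = Options()
        options.add_argument("-headless")  # run Firefox without a window
        driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Only close a browser that this function started itself.
        if own_driver:
            driver.quit()
```

Calling `fetch_rendered_html(page_url)` should return HTML that includes what the page's scripts inserted. If the content arrives through a later AJAX call, `page_source` may be read too early, and an explicit wait (for example Selenium's `WebDriverWait`) would probably be needed before reading it.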

See the public Selenium documentation for details.

Answer 1 posted 2017-01-03 08:53:20
