How do I extract JavaScript from within HTML?
I am creating a web scraping program written in JavaScript, using request and cheerio. The webpage I'm trying to extract contains JavaScript within the HTML. It is the JavaScript that I'm interested in, but I can't find a way to access it. Is there a way to extract the JavaScript using cheerio?

Many thanks for any suggestions; I've just started with web scraping.
My code is:
var request = require('request');
var cheerio = require('cheerio');

var credentials = {
  username: 'username',
  password: 'password'
};

// Log in first, then request the page to scrape.
// `callback` is assumed to be provided by the enclosing function.
request.post({
  uri: 'http://webpage',
  headers: { 'content-type': 'application/x-www-form-urlencoded' },
  body: require('querystring').stringify(credentials)
}, function (err, res, body) {
  if (err) {
    callback.call(null, new Error('Login failed'));
    return;
  }
  request('http://webpage', function (err, res, body) {
    if (err) {
      callback.call(null, new Error('Request failed'));
      return;
    }
    var $ = cheerio.load(body);
    var text = $('#element').text();
    console.log($.html());
  });
});
If you're looking for the JavaScript inside the webpage, you can use cheerio to collect all <script> tags from the HTML and then get the content from them.
var scripts = [];
request('http://webpage', function (err, res, body) {
  if (err) {
    callback.call(null, new Error('Request failed'));
    return;
  }
  var $ = cheerio.load(body);
  // Store the contents of every <script> tag on the page.
  $('script').each(function (i, element) {
    scripts[i] = $(element).text();
  });
});
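To try the same idea without fetching a live page, here is a minimal sketch. It assumes the node layout cheerio builds its documents from (domhandler nodes with children, type, data and attribs properties), so the hypothetical helper scriptTextsFromNodes can take the array returned by $('script').toArray() and read each script's text children directly. The sample nodes below are hand-written stand-ins, not output from a real page.

```javascript
// Hypothetical helper (not part of cheerio): given the element nodes returned by
// $('script').toArray(), return the inline source of each <script> tag.
// Each node is assumed to have the domhandler shape cheerio uses:
// { name: 'script', attribs: {...}, children: [{ type: 'text', data: '...' }] }
function scriptTextsFromNodes(nodes) {
  return nodes.map(function (node) {
    // A script's body is stored as text children; join them into one string.
    return (node.children || [])
      .filter(function (child) { return child.type === 'text'; })
      .map(function (child) { return child.data; })
      .join('');
  });
}

// Stand-in nodes so the sketch runs on its own; with cheerio you would call
// scriptTextsFromNodes($('script').toArray()) instead.
var sampleNodes = [
  { name: 'script', attribs: {}, children: [{ type: 'text', data: 'var answer = 42;' }] },
  { name: 'script', attribs: { src: '/app.js' }, children: [] }
];

var scripts = scriptTextsFromNodes(sampleNodes);
console.log(scripts); // [ 'var answer = 42;', '' ]
```

Note that the external script (the one with a src attribute) comes back as an empty string, which is the situation described next.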
You'll now have an array with all the JavaScript available in the HTML. However, if a script is imported from an external file, you won't get any content for it. You can check whether the element has a src URL:
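...
$('script').each(function (i, element) {
  if ($(element).attr('src') === undefined) {
    // Inline script: keep its source. push() avoids leaving gaps in the
    // array for the external scripts that are skipped.
    scripts.push($(element).text());
  } else {
    // External script (has a src URL): collect or ignore this.
  }
});
...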
I haven't tested this, but it should work based on cheerio's documentation.
https://github.com/cheeriojs/cheerio
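If you choose to collect the external scripts rather than ignore them, one option is to record their src URLs so they can be fetched with separate requests. The sketch below is a hypothetical helper, splitScripts, that assumes the same domhandler node shape cheerio produces (attribs, children, type, data) and takes the array from $('script').toArray(). It resolves relative src values against the page URL with Node's built-in URL class; the page URL and script paths used here are placeholders.

```javascript
// Hypothetical helper (not part of cheerio): split <script> element nodes, as
// returned by $('script').toArray(), into inline source and external URLs.
// Assumes the domhandler node shape cheerio uses (attribs, children, type, data).
function splitScripts(nodes, pageUrl) {
  var inline = [];
  var external = [];
  nodes.forEach(function (node) {
    var src = node.attribs && node.attribs.src;
    if (src === undefined) {
      // Inline script: its source is the concatenated text children.
      inline.push((node.children || [])
        .filter(function (child) { return child.type === 'text'; })
        .map(function (child) { return child.data; })
        .join(''));
    } else {
      // External script: resolve a relative src against the page URL so it
      // can be downloaded separately (for example with another request call).
      external.push(new URL(src, pageUrl).href);
    }
  });
  return { inline: inline, external: external };
}

// Placeholder nodes and URL, standing in for a scraped page.
var result = splitScripts([
  { name: 'script', attribs: {}, children: [{ type: 'text', data: 'init();' }] },
  { name: 'script', attribs: { src: '/js/app.js' }, children: [] },
  { name: 'script', attribs: { src: 'https://cdn.example.com/lib.js' }, children: [] }
], 'http://webpage/page.html');

console.log(result.inline);
console.log(result.external);
```

Resolving against the page URL matters because many pages use root-relative paths such as /js/app.js, which cannot be requested on their own.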