
How do I extract JavaScript from within HTML?

I am writing a web scraper in JavaScript, using request and cheerio. The page I'm scraping contains JavaScript embedded in the HTML, and that JavaScript is what I'm interested in, but I can't find a way to access it. Is there a way to extract it using cheerio?

Many thanks for any suggestions; I've only just started with web scraping.

My code is:

var request = require('request');
var cheerio = require('cheerio');

var credentials = {
    username: 'username',
    password: 'password'
};

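// Post the login form, then request the page from inside the login callback.
// Note: `callback` is not defined in this snippet; it is assumed to come from the enclosing code.
request.post({
    uri: 'http://webpage',
    headers: { 'content-type': 'application/x-www-form-urlencoded' },
    body: require('querystring').stringify(credentials)
}, function(err, res, body) {
    if (err) {
        callback.call(null, new Error('Login failed'));
        return;
    }

    request('http://webpage', function(err, res, body) {
        if (err) {
            callback.call(null, new Error('Request failed'));
            return;
        }

        // Parse the returned HTML and print the whole document.
        var $ = cheerio.load(body);
        var text = $('#element').text();
        console.log($.html());
    });
});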

If you're after the JavaScript inside the page, you can use cheerio to collect all the <script> tags from the HTML and then read their contents.

var scripts = [];

request('http://webpage', function(err, res, body) {
  if (err) {
    callback.call(null, new Error('Request failed'));
    return;
  }

  var $ = cheerio.load(body);
  // Store the text of each <script> element, in document order.
  $('script').each(function(i, element) {
    scripts[i] = $(element).text();
  });
});

Once the callback has run, scripts holds the contents of every script in the HTML. Scripts that are imported from an external file have no content of their own, so their entries will be empty. You can check whether an element has a src attribute to tell the two apart.

...

$('script').each(function(i, element) {
  if ($(element).attr('src') === undefined) {
    // Inline script: keep its contents.
    scripts[i] = $(element).text();
  }
  else {
    // External script: collect or ignore this.
  }
});

...

I haven't tested this, but it should work according to cheerio's documentation.

https://github.com/cheeriojs/cheerio
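If the external scripts matter too, the `src` values gathered in the `else` branch are often relative, so they would need resolving against the page URL before they can be requested separately. A minimal sketch, assuming Node's built-in `URL` class; `resolveScriptUrls` and the example URLs are hypothetical and not part of cheerio:

```javascript
// Hypothetical helper: turn the src attributes of external <script>
// tags into absolute URLs, using the page URL as the base.
function resolveScriptUrls(srcs, pageUrl) {
  return srcs.map(function (src) {
    // new URL(input, base) resolves relative, root-relative and
    // protocol-relative values, and leaves absolute URLs unchanged.
    return new URL(src, pageUrl).href;
  });
}

// Made-up src values of the kinds commonly seen in script tags.
var urls = resolveScriptUrls(
  ['js/app.js', '/static/lib.js', '//cdn.example.com/x.js', 'https://example.org/y.js'],
  'http://example.com/pages/index.html'
);

console.log(urls);
```

Each resolved URL could then be fetched with another `request` call to obtain the script source.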

