
JavaScript HTML Scraping

I am working with a web page that contains only plain text. How can I go about 'scraping' the data and then storing it in an array variable? There are no tags (i.e. no 'div', 'id', etc.).

The HTML looks something like this (i.e. if you view the source code, it is completely plain text without any markup):

HTML (view-source:www.blablabla.com/path.txt):

Hello World My Name is John

I would like to store each word in an array, along the lines of:

var array = ["Hello", "World", "My", "Name", "is", "John"];
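Once the page text is in a string, producing the array above comes down to splitting on runs of whitespace. The sketch below assumes the body has already been fetched; the helper name toWords is hypothetical. Trimming first avoids an empty string at either end of the array when the text starts or ends with whitespace (a trailing newline is common in .txt files).

```javascript
// Hypothetical helper: split plain text into an array of words.
// Any run of whitespace (spaces, tabs, newlines) counts as one separator.
function toWords(text) {
  var trimmed = text.trim();
  // ''.split(/\s+/) would return [''], so handle empty input explicitly
  if (trimmed === '') {
    return [];
  }
  return trimmed.split(/\s+/);
}

var array = toWords('Hello World My Name is John\n');
// array is ['Hello', 'World', 'My', 'Name', 'is', 'John']
console.log(array);
```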

If you're using Node's built-in http module, you can read the data directly from the response stream and split it once the whole body has arrived.

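var http = require('http');

http.get('http://www.example.com', function(res) {
  var body = '';
  res.setEncoding('utf8');
  // the body arrives in chunks, so accumulate them on the response
  res.on('data', function(chunk) {
    body += chunk;
  });
  // once the whole body has arrived, split it into words
  res.on('end', function() {
    var array = body.trim().split(/\s+/);
    console.log(array);
  });
});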
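The callback style above can also be wrapped in a Promise, so the word array is delivered once the full response has been read and failures are reported instead of silently ignored. This is a sketch using only Node's built-in http module; fetchWords is a hypothetical name, and treating every non-200 status as an error is an assumption (redirects, for example, are not followed). For https:// URLs, the https module's get would be needed instead.

```javascript
var http = require('http');

// Hypothetical helper: fetch a plain-text page and resolve with its words.
function fetchWords(url) {
  return new Promise(function (resolve, reject) {
    var req = http.get(url, function (res) {
      if (res.statusCode !== 200) {
        // Drain the body so the socket is released, then fail.
        res.resume();
        reject(new Error('Request failed with status ' + res.statusCode));
        return;
      }
      res.setEncoding('utf8');
      var body = '';
      res.on('data', function (chunk) {
        body += chunk;
      });
      res.on('end', function () {
        var trimmed = body.trim();
        resolve(trimmed === '' ? [] : trimmed.split(/\s+/));
      });
      // Errors while the response body is streaming
      res.on('error', reject);
    });
    // Network-level failures (DNS lookup, refused connection) surface here.
    req.on('error', reject);
  });
}
```

Calling fetchWords('http://www.example.com').then(function (words) { console.log(words); }) would then log the array once the body has been read, and a .catch handler receives any network or status error.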

An easier way to do this is with the request package (note that request has been deprecated since 2020):

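var request = require('request');

request('http://www.example.com', function(err, resp, body) {
  // the error argument is named err, so check err (not error)
  if (!err && resp.statusCode === 200) {
    // split on runs of whitespace; no capturing group, so the
    // separators themselves are not added to the array
    var array = body.trim().split(/\s+/);
    console.log(array);
  }
});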
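A common pitfall is splitting with /(\s+)/ instead of /\s+/. When the regular expression passed to String.prototype.split contains a capturing group, the captured separators are included in the result, so the whitespace ends up in the array alongside the words. A quick comparison using the sample text from the question:

```javascript
var body = 'Hello World My Name is John';

// With a capturing group, the matched whitespace is kept as array elements:
// ['Hello', ' ', 'World', ' ', 'My', ' ', 'Name', ' ', 'is', ' ', 'John']
var withSeparators = body.split(/(\s+)/);

// Without the group, only the words remain:
// ['Hello', 'World', 'My', 'Name', 'is', 'John']
var wordsOnly = body.split(/\s+/);

console.log(withSeparators.length, wordsOnly.length); // 11 6
```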

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM