JavaScript HTML Scraping

Question

I am working with a web page that contains only plain text. How can I go about 'scraping' the data and then storing it in an array variable? There are no tags (i.e. no 'div', 'id', etc.).

The HTML looks something like this (i.e. if you were to view the source code, it would be completely plain text without markup):

HTML (view-source:www.blablabla.com/path.txt):

Hello World My Name is John

I would like to store each word into an array along the lines of:

var array = ["Hello", "World", "My", "Name", "is", "John"];

Answer 1

If you're using Node's http module, you can just read the data directly.

var http = require('http');

http.get('http://www.example.com', function(res) {
  // 'data' events are emitted by the response object, not by the request
  res.on('data', function(chunk) {
    // do something with the chunk here, for example print it out
    console.log('body: ' + chunk);
  });
});
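Since a response can arrive in several chunks, and a word may be cut in half at a chunk boundary, it is usually safer to buffer everything and split once the stream ends. The following is only a sketch: `collectWords` is a hypothetical helper name, and `Readable.from` is used here as a stand-in for a real HTTP response so the example runs without a network connection. With an actual response you would likely call `res.setEncoding('utf8')` before passing `res` in.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: buffers every chunk of a readable stream, then
// passes the array of whitespace-separated words to the callback.
function collectWords(stream, done) {
  let body = '';
  stream.on('data', (chunk) => { body += chunk; });
  stream.on('end', () => {
    // split on runs of whitespace and drop any empty strings
    done(null, body.split(/\s+/).filter(Boolean));
  });
  stream.on('error', done);
}

// Stand-in for an HTTP response; note that "World" is split across two chunks.
const fakeResponse = Readable.from(['Hello Wor', 'ld My Name ', 'is John\n']);

collectWords(fakeResponse, (err, words) => {
  if (err) throw err;
  console.log(words);
});
```

The same helper should work with the `res` object passed to the `http.get` callback, since that object is also a readable stream.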

An easier way to do this would be via the request package:

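var request = require('request');

request('http://www.example.com', function(err, resp, body) {
  if (!err && resp.statusCode === 200) {
    // split on runs of whitespace; there is no capturing group, so the
    // separators themselves are not included in the array
    var array = body.trim().split(/\s+/);
  }
});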
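One detail worth checking when splitting the body: in JavaScript, a capturing group in the pattern passed to `split` causes the matched separators to be included in the result. A small sketch of the difference follows, where `toWords` and `sample` are hypothetical names used only for illustration:

```javascript
// Hypothetical helper: splits plain text into words on any run of whitespace.
function toWords(text) {
  // filter(Boolean) drops the empty strings that leading or trailing
  // whitespace would otherwise leave in the array
  return text.split(/\s+/).filter(Boolean);
}

const sample = 'Hello World My Name is John\n';
console.log(toWords(sample));

// With a capturing group, the whitespace separators are kept as elements.
console.log('Hello World'.split(/(\s+)/));
```

The first call yields only the six words, while the second yields `'Hello'`, `' '` and `'World'`.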

solution1
0 ACCEPTED 2015-03-11 18:21:49