
How to skip a page's waiting time when using node-fetch

I use node-fetch and cheerio to crawl data from a comic website. For now I just use some simple code to print the body HTML, like below:

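// node-fetch v2 supports require(); v3 is ESM-only and must be imported instead.
const fetch = require('node-fetch');
const cheerio = require('cheerio'); // intended for parsing the HTML later

const url = 'http://readcomiconline.to';

function getComic() {
    fetch(url)
        .then(res => res.text())          // read the response body as a string
        .then(body => console.log(body))  // print the raw HTML
        .catch(err => console.error(err));
}

getComic();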

The problem is that the page uses JavaScript that makes the client wait 5 seconds before it redirects to the main page, so I cannot crawl anything until the main page has loaded.

How can I skip this wait and start crawling data from the pages?

Thank you.

Looks like you're going to need more than those two modules.

The website you're trying to scrape uses JavaScript to send a verification request to /cdn-cgi/l/chk_jschl and obtain cookies. You can either drive a real browser with Selenium, or reverse-engineer the JavaScript and reproduce it yourself.
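Before trying either approach, it can help to confirm that the response you received is actually the challenge page and not the real content. This is a minimal sketch, assuming the challenge is served with HTTP status 503 (as in the title of the linked question) and that its HTML contains either the `/cdn-cgi/l/chk_jschl` path or the `#challenge-form` form mentioned in these answers. `isJsChallenge` is a hypothetical helper name, and these markers are heuristics that may stop matching if the site changes its protection.

```javascript
// Heuristic check for the JavaScript challenge page described above.
// Assumption: the challenge comes back with HTTP 503 and its HTML contains
// the /cdn-cgi/l/chk_jschl endpoint or a form with id "challenge-form".
function isJsChallenge(status, body) {
  if (status !== 503) return false;
  return body.includes('/cdn-cgi/l/chk_jschl') ||
    /id=["']challenge-form["']/.test(body);
}
```

In the question's `getComic`, you could keep `res.status` alongside the result of `res.text()` and pass both to `isJsChallenge` before handing the body to cheerio.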

More info here: "Python web scraping: 503 Response with specific site (how come?)"

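You don't need to wait the 5 seconds; that delay only exists because the page's script runs in a browser.

The page contains a form, #challenge-form. Use cheerio to read the form's URL, method, and data (the values of its inputs), then send that request yourself and save the cookies it returns.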
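The steps in this answer can be sketched as follows. This is a minimal illustration, not a tested bypass: `extractForm`, `buildFormRequest`, `updateCookieJar` and `cookieHeader` are hypothetical helper names, `extractForm` expects a document you have already loaded with `cheerio.load(html)`, and the cookie handling keeps only `name=value` pairs. If the page's script fills in any of these inputs at runtime, the values read statically will be empty or wrong, and you would have to reproduce that computation yourself (the "reverse the JavaScript" option from the other answer).

```javascript
// `$` is a document already loaded with cheerio.load(html).
function extractForm($, selector = '#challenge-form') {
  const form = $(selector);
  const fields = [];
  form.find('input').each((i, el) => {
    const name = $(el).attr('name');
    // Browsers do not submit inputs without a name, so skip them too.
    if (name) fields.push([name, $(el).attr('value') || '']);
  });
  return {
    action: form.attr('action') || '',
    method: (form.attr('method') || 'GET').toUpperCase(),
    fields,
  };
}

// Resolve the action against the page URL and encode the fields the way an
// HTML form does: as the query string for GET, as a urlencoded body otherwise.
function buildFormRequest(pageUrl, form) {
  const url = new URL(form.action, pageUrl);
  const params = new URLSearchParams(form.fields);
  if (form.method === 'GET') {
    url.search = params.toString(); // GET forms replace the action's query
    return { url: url.toString(), method: 'GET', headers: {}, body: undefined };
  }
  return {
    url: url.toString(),
    method: form.method,
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: params.toString(),
  };
}

// Store name=value from each Set-Cookie header; attributes such as
// Path, Domain and Expires are ignored in this sketch.
function updateCookieJar(jar, setCookieHeaders) {
  for (const header of setCookieHeaders) {
    const pair = header.split(';')[0];
    const eq = pair.indexOf('=');
    if (eq > 0) jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  return jar;
}

// Serialize the jar into a Cookie request header value.
function cookieHeader(jar) {
  return [...jar].map(([name, value]) => `${name}=${value}`).join('; ');
}
```

With node-fetch you might then send `fetch(req.url, { method: req.method, headers: { ...req.headers, Cookie: cookieHeader(jar) }, body: req.body, redirect: 'manual' })` and pass `res.headers.raw()['set-cookie'] || []` to `updateCookieJar`. Setting `redirect: 'manual'` lets you see cookies set on a redirect response, which node-fetch would otherwise follow automatically.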

You can use your browser's developer tools (Chrome DevTools or similar) to check what request the form actually sends.

Here is a project where I tried logging in to Facebook: index.js. It may help you.
