I use node-fetch and cheerio to crawl data from a comic website. For now I just use some simple code to print the body HTML, like below:
var fetch = require('node-fetch');
var cheerio = require('cheerio');
var url = 'http://readcomiconline.to';
function getComic() {
  fetch(url)
    .then(res => res.text())
    .then(body => console.log(body));
}
getComic();
The problem is that this page uses JavaScript that makes the client wait 5 seconds before it redirects to the main page, so I cannot crawl anything until the main page has loaded.
How can I skip this wait and start crawling data from the pages?
Thank you.
Looks like you're going to need more than those 2 modules.
The website you're trying to scrape uses JS to send a verification request to /cdn-cgi/l/chk_jschl and get cookies. You can either use Selenium or reverse-engineer the JS.
More info here: Python web scraping : 503 Response with specific site (how come?)
You don't need to wait 5 seconds; that delay only happens when the page's script runs in a browser.
The page has a form, #challenge-form. Use cheerio to get the form's URL, method, and data (the values of its inputs), then send that request yourself and save the cookies.
You can use the browser's devtools (Chrome or similar) to check what the form request looks like in a browser.
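A rough sketch of the form step, in case it helps. The helper names extractForm and buildFormRequest are made up for illustration, and `$` is assumed to be the function returned by cheerio.load(body). It only collects input values that are already present in the HTML; anything the page's script fills in at runtime would not be included.

```javascript
// Sketch only: helper names are hypothetical, not part of cheerio or node-fetch.

// Collect the form's action, method and named input values.
// `$` is assumed to be the result of cheerio.load(html).
function extractForm($, selector) {
  const form = $(selector);
  const fields = {};
  form.find('input[name]').each((i, el) => {
    fields[$(el).attr('name')] = $(el).attr('value') || '';
  });
  return {
    action: form.attr('action') || '',
    method: (form.attr('method') || 'GET').toUpperCase(),
    fields,
  };
}

// Resolve the form action against the page URL and encode the fields,
// returning a URL and an options object in the shape fetch() accepts.
function buildFormRequest(pageUrl, form) {
  const target = new URL(form.action, pageUrl);
  const params = new URLSearchParams(form.fields);
  if (form.method === 'GET') {
    // GET forms carry their data in the query string.
    params.forEach((value, key) => target.searchParams.append(key, value));
    return { url: target.toString(), options: { method: 'GET' } };
  }
  // Other methods send the data as a URL-encoded body.
  return {
    url: target.toString(),
    options: {
      method: form.method,
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: params.toString(),
    },
  };
}
```

With the question's code, this would be used roughly as `var form = extractForm(cheerio.load(body), '#challenge-form');` followed by `var req = buildFormRequest(url, form); fetch(req.url, req.options)`. node-fetch does not keep cookies between calls, so the Set-Cookie value from the response would have to be sent back in a Cookie header on later requests.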
This is a project where I tried to log in to Facebook: index.js. It may help you.