
How can I wait for a page to load with Jsoup? I get a 503 error from bot detection

I am trying to parse the HTML of this page: https://opskins.com/?loc=shop_search&app=730_2&search_item=SSG+08+%7C+DARK+WATER+%28Field-Tested%29&sort=lh

But opskins.com is protected by "Bot detection": on the first visit you have to wait about 5 seconds, after which you are redirected (or the page reloads) to the actual page that I need.

How can I wait out these 5 seconds, or wait for specific HTML to appear on the page?

Document doc = Jsoup.connect("https://opskins.com" + url)
            .header("authority", "opskins.com")
            .header("method", "GET")
            .header("path", url)
            .header("scheme", "https")
            // the headers up to here are the colon-prefixed ones from the browser request (:authority, :method, :path, :scheme)
            .header("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
            .header("accept-encoding", "gzip, deflate, sdch, br")
            .header("accept-language", "ru,en-US;q=0.8,en;q=0.6")
            .header("cache-control", "max-age=0")
            //.header("cookie", "__cfduid=d76231c8cccdbd5303a7d4feeb3f3a11f1466541718; _gat=1; _ga=GA1.2.1292204706.1466541721; request_method=POST; _session_id=5dc49c7814d5087ac51f9d9da20b2680")
            .cookie("steamLogin", "76561198065140894%7C%7C0C35CE73983BCA63E456B6A4831DD772D095AE77")
            .cookie("steamLoginSecure", "76561198065140894%7C%7CCC21BEC8A5E8AD53E9C7086E51BDB8CE407C100A")
            .cookie("steamMachineAuth76561198065140894", "8857F82DB9960F7B66F7842B5F880229A9AF63AB")
            .header("dnt", "1")
            .header("upgrade-insecure-requests", "1")
            .userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")
            //.header("user-agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")

            .followRedirects(true)
            .ignoreHttpErrors(true)
            //.timeout(5000)
            .get();

With the code above I only get the HTML of the bot-detection page.

I did some homework on your problem. I cannot give you one simple solution, but careful observation of the bot-detection page led me to a somewhat smarter approach. The code below gets you past the bot check.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BotDetection {
    private static final String USER_AGENT =
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0";

    public static void main(String[] args) throws IOException {
        // First request: this returns the bot-detection page, so HTTP errors are ignored in fetch().
        Document document = fetch("https://opskins.com/?loc=shop_search&app=730_2&search_item=SSG%2008%20%7C%20DARK%20WATER%20%28Field-Tested%29&sort=lh");

        /*
         * I'm interested in these three elements
         *     <form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get">
         *       <input type="hidden" name="jschl_vc" value="53ebdc738d543e1f1fd40f8d4abec414">
         *       <input type="hidden" name="pass" value="1467568987.973-p8bu/jSSDf">
         *       <input type="hidden" id="jschl-answer" name="jschl_answer">
         *      </form>
         */
        Element form = document.getElementById("challenge-form");
        Element jschlChild = form.child(0); // <input name="jschl_vc">
        Element passChild = form.child(1);  // <input name="pass">

        // Rebuild the form's GET request by hand. Note that the form names its last
        // field jschl_answer, while this URL sends jschl-answer=65 as a placeholder.
        String url = "https://opskins.com".concat(form.attr("action")).concat("?")
                .concat(jschlChild.attr("name")).concat("=").concat(jschlChild.attr("value")).concat("&")
                .concat(passChild.attr("name")).concat("=").concat(passChild.attr("value")).concat("&jschl-answer=65");

        // Second request: submit the challenge form and follow the redirect.
        document = fetch(url);

        // Bingo, you are done.
        System.out.println(document.body());
    }

    // Same connection settings for both requests.
    private static Document fetch(String url) throws IOException {
        return Jsoup.connect(url).userAgent(USER_AGENT).ignoreHttpErrors(true)
                .followRedirects(true).timeout(100000).ignoreContentType(true).get();
    }
}

It worked for me even when I did not pass jschl-answer=65 at all.
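
The code above pastes the raw field values into the URL. The sample pass value contains a "/", so a slightly safer variant percent-encodes each name and value before joining them. The following is only a sketch using the standard library: the class ChallengeUrl and its build method are hypothetical names, not part of Jsoup, and the field values are the sample ones from the comment above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Jsoup): builds the challenge URL from form fields.
final class ChallengeUrl {

    // Encodes one query component with form encoding (space -> '+', '/' -> %2F).
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Joins origin + action + "?" + name=value pairs, in the map's iteration order.
    static String build(String origin, String action, Map<String, String> fields) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.add(encode(field.getKey()) + "=" + encode(field.getValue()));
        }
        return origin + action + "?" + query;
    }

    public static void main(String[] args) {
        // Sample values copied from the form shown in the answer; real values change per request.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("jschl_vc", "53ebdc738d543e1f1fd40f8d4abec414");
        fields.put("pass", "1467568987.973-p8bu/jSSDf");
        System.out.println(build("https://opskins.com", "/cdn-cgi/l/chk_jschl", fields));
    }
}
```

The resulting string can be passed to Jsoup.connect(...) in place of the hand-built url. If the challenge endpoint turns out to reject requests that arrive too quickly (an assumption, not something tested above), a plain Thread.sleep(5000) before the second request is the simplest way to wait out the 5 seconds the question mentions.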
