
Getting 503 error when using JSoup to read an element from a webpage

So I've been using the following code to find a specific element on a page, given the item id passed to the method. However, the website returns a 403 when I don't define a user-agent, and a 503 when I do. The website seems to use Cloudflare, which from what I've heard is used to prevent DDoS attacks, so I'm a little confused as to why I can't read the page.

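import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public static String getMargin(final int id) {
    String url = "https://rsbuddy.com/exchange?id=" + id + "&";
    try {
        Document document = Jsoup.connect(url)
                .timeout(30000) // milliseconds
                .userAgent(
                        "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36")
                .get();
        // Text of the element with id "buy-price" (empty string if it is missing)
        return document.select("#buy-price").text();
    } catch (IOException e) {
        // get() throws HttpStatusException (an IOException) on 403/503 responses
        e.printStackTrace();
        return null; // avoids the NullPointerException a null document would cause
    }
}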

How would I be able to have it so that I can finally read the element from the webpage instead of receiving a 403 forbidden or 503 unavailable error? Thanks.

You need to connect with ignoreHttpErrors set, so that Jsoup hands you the response body instead of throwing on the 503:

Jsoup.connect(url).timeout(30000)
    .ignoreHttpErrors(true)
    ...

The page content will then be what you see when you connect using your browser. This page contains a small script (it looks like it's generated on each request). The script calculates a value, sets it in the jschl-answer field of the following form, and then submits the form:

<form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get"> 
    <input type="hidden" name="jschl_vc" value="some-generated-value"> 
    <input type="hidden" name="pass" value="some-generated-value"> 
    <input type="hidden" id="jschl-answer" name="jschl_answer"> 
</form> 
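To make the form handling a bit more concrete, here is a minimal sketch of reading the hidden fields out of the challenge page with Jsoup. The class name ChallengeFields and the method extract are made up for illustration, and the sketch assumes the form markup looks exactly like the snippet above. It does not compute jschl_answer, which simply comes back empty.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper, not part of Jsoup.
class ChallengeFields {

    /**
     * Returns name -> value for every named input inside the form with
     * id "challenge-form", or an empty map if the page has no such form.
     */
    static Map<String, String> extract(Document page) {
        Map<String, String> fields = new LinkedHashMap<>();
        Element form = page.selectFirst("form#challenge-form");
        if (form == null) {
            return fields; // not a challenge page
        }
        for (Element input : form.select("input[name]")) {
            // attr("value") is "" when the attribute is absent (e.g. jschl_answer)
            fields.put(input.attr("name"), input.attr("value"));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the body returned with ignoreHttpErrors(true)
        String html = "<form id=\"challenge-form\" action=\"/cdn-cgi/l/chk_jschl\" method=\"get\">"
                + "<input type=\"hidden\" name=\"jschl_vc\" value=\"some-generated-value\">"
                + "<input type=\"hidden\" name=\"pass\" value=\"some-generated-value\">"
                + "<input type=\"hidden\" id=\"jschl-answer\" name=\"jschl_answer\">"
                + "</form>";
        System.out.println(extract(Jsoup.parse(html)));
    }
}
```

An empty map is a cheap way to tell that you got the real page (or some other error page) rather than the challenge.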

The form has to be submitted using the correct values (also, don't forget to get/set cookies).
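Since the form uses method="get", submitting it amounts to requesting the action URL with the fields as a query string. Below is a rough sketch of building that URL with the standard library only. ChallengeUrl and build are hypothetical names, the origin https://rsbuddy.com is assumed from the question's URL, and "12345" is just a placeholder for the answer you'd have to compute.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only.
class ChallengeUrl {

    /** Joins origin + form action + URL-encoded fields into a GET URL. */
    static String build(String origin, String action, Map<String, String> fields) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return origin + action + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in the order they appear in the form
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("jschl_vc", "some-generated-value");
        fields.put("pass", "some-generated-value");
        fields.put("jschl_answer", "12345"); // placeholder, not a real answer
        System.out.println(build("https://rsbuddy.com", "/cdn-cgi/l/chk_jschl", fields));
    }
}
```

With Jsoup you would then request that URL while passing along the cookies from the first response, for example by reading them with Connection.Response.cookies() and setting them with Connection.cookies(Map). Whether the site accepts the request still depends on the answer value being right.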

So the key point will be to calculate the jschl-answer, either by working out their algorithm (that's gonna be tough) or by reading the script tag, modifying it so it can run outside a browser, and executing it locally.

All in all it's not gonna be an easy task, but I think it's doable.

Jsoup isn't the best tool here. There's a challenge to solve before accessing the actual page. I'd suggest using one of the tools below:

You'll have fewer headaches...
