
Server returned HTTP response code: 523 for URL: http

I want to crawl a web page using a POST request, but I get this error: java.io.IOException: Server returned HTTP response code: 523 for URL: http://

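import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public static String readContentFromPost(String urlStr, String content) {
    HttpURLConnection con = null;
    StringBuilder sb = new StringBuilder();

    try {
        URL url = new URL(urlStr);
        con = (HttpURLConnection) url.openConnection();
        con.setDoOutput(true);   // we send a request body
        con.setDoInput(true);
        con.setRequestMethod("POST");
        con.setUseCaches(false);
        con.setInstanceFollowRedirects(true);
        // The body must match this header; a form-encoded body is assumed here
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");

        // getOutputStream() connects implicitly. Encode explicitly as UTF-8:
        // DataOutputStream.writeBytes() keeps only the low byte of each char.
        try (OutputStream out = con.getOutputStream()) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        }

        // getInputStream() throws IOException for codes >= 400 (such as 523),
        // so check the status first and report it.
        int status = con.getResponseCode();
        if (status >= 400) {
            System.err.println("HTTP " + status + " from " + urlStr);
            return "";
        }

        // The response is assumed to be UTF-8 encoded
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');   // keep line breaks
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (con != null) {
            con.disconnect();
        }
    }
    return sb.toString();
}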
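A POST body has to match the Content-Type header sent with it. If the site expects an ordinary HTML form submission (application/x-www-form-urlencoded; this is an assumption, the question doesn't say), the body can be built with java.net.URLEncoder. FormBody.encode is a hypothetical helper for illustration, not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper: builds an application/x-www-form-urlencoded POST body. */
class FormBody {
    static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');   // key=value pairs are separated by '&'
            }
            try {
                // URLEncoder applies form encoding: space -> '+', reserved chars -> %XX (UTF-8 bytes)
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            } catch (UnsupportedEncodingException ex) {
                throw new IllegalStateException("UTF-8 is always supported", ex);
            }
        }
        return sb.toString();
    }
}
```

Pass a LinkedHashMap if the parameter order matters to the server; the result can then be used as the content argument.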

Status code 523 has no standard meaning; it is not in the IANA HTTP status code registry: http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml

So it's a proprietary error code of the server you're trying to crawl. Try contacting the site's administrator to find out what it means.
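To see what the server actually sends along with the 523, you can read the status code and the error body instead of letting getInputStream() throw. This is a minimal sketch: StatusProbe.describe is a hypothetical helper, and decoding the body as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: returns "status\nbody" without throwing on 4xx/5xx codes. */
class StatusProbe {
    static String describe(HttpURLConnection con) throws IOException {
        int status = con.getResponseCode();   // returns 523 instead of throwing
        // For codes >= 400 the body is only reachable via getErrorStream(), which may be null
        InputStream in = status >= 400 ? con.getErrorStream() : con.getInputStream();
        String body = "";
        if (in != null) {
            try (InputStream s = in) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = s.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                body = new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        return status + "\n" + body;
    }
}
```

Whatever text the server puts in that body is often the quickest hint to what its private 523 means.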

523 does not generally mean "Origin Is Unreachable"; it has that meaning only when it comes from Cloudflare: https://support.cloudflare.com/hc/en-us/articles/200171946-Error-523-Origin-is-unreachable
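If the site sits behind Cloudflare, its responses normally carry a "Server: cloudflare" header and a CF-RAY header. A rough check, assuming those two headers are a reliable signal (CloudflareCheck.viaCloudflare is a hypothetical name):

```java
import java.io.IOException;
import java.net.HttpURLConnection;

/** Hypothetical helper: guesses whether a response passed through Cloudflare. */
class CloudflareCheck {
    static boolean viaCloudflare(HttpURLConnection con) throws IOException {
        con.getResponseCode();   // sends the request so headers are available; does not throw on 523
        String server = con.getHeaderField("Server");
        // Assumed signals: Cloudflare normally sets "Server: cloudflare" and a CF-RAY id header
        return "cloudflare".equalsIgnoreCase(server) || con.getHeaderField("CF-RAY") != null;
    }
}
```

If this returns true, the Cloudflare article applies: Cloudflare itself could not reach the site's origin server, so the problem is not in your client code.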

Try your code against a well-known server such as Google or Wikipedia to check whether the code itself works.
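A minimal way to make that comparison, assuming a plain request without a body is enough to reach both sites (KnownGoodCheck.statusOf is a hypothetical helper):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: sends a bare request and reports only the status code. */
class KnownGoodCheck {
    static int statusOf(String urlStr, String method) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(urlStr).openConnection();
        try {
            con.setRequestMethod(method);
            con.setConnectTimeout(10_000);   // fail fast instead of hanging
            con.setReadTimeout(10_000);
            return con.getResponseCode();    // 4xx/5xx codes are returned, not thrown
        } finally {
            con.disconnect();
        }
    }
}
```

For example, statusOf("https://www.wikipedia.org/", "GET") would typically return 200. If the known-good site works while the target still answers 523, the cause is on the target's side.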

To crawl a web page whose content depends on JavaScript, you can use Selenium to drive a real browser and read the data from it. Selenium: http://www.seleniumhq.org

First, create a Maven project and add this dependency:

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>2.45.0</version>
</dependency>

Then download ChromeDriver: http://chromedriver.storage.googleapis.com/index.html?path=2.14/

and put it in /usr/local/bin (or another directory on your PATH) so that Selenium can find it.
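If ChromeDriver fails to start, a quick sanity check is whether the binary really is on the PATH and executable. This hypothetical helper only mimics a PATH lookup; it is not how Selenium locates the driver internally. On Windows the file name would be chromedriver.exe.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: finds an executable file named `name` in a PATH-style string. */
class PathLookup {
    static Optional<Path> find(String pathEnv, String name) {
        if (pathEnv == null) {
            return Optional.empty();
        }
        // PATH entries are separated by ':' on Unix and ';' on Windows
        for (String dir : pathEnv.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Usage: PathLookup.find(System.getenv("PATH"), "chromedriver"). If it comes back empty, either move the binary or set the webdriver.chrome.driver system property to its full path.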

Finally, crawl the page:

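import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public static void testSelenium(String url) {
    // If chromedriver is not on the PATH, point Selenium at it explicitly:
    // System.setProperty("webdriver.chrome.driver", "/Users/freezhan/IDE/tools/chromedriver");
    WebDriver webDriver = new ChromeDriver();
    try {
        webDriver.get(url);
        // To select a specific element instead of the whole page:
        // WebElement webElement = webDriver.findElement(By.xpath("/html"));

        // HTML of the page as currently loaded in the browser
        System.out.println(webDriver.getPageSource());
    } finally {
        // quit() ends the session and the driver process; close() only closes the window
        webDriver.quit();
    }
}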

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
