Error: java.io.IOException: Server returned HTTP response code: 403 for URL

Question

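I am new to development and I am using IntelliJ IDEA 2022.1 (Community Edition). For example, I want to connect to and retrieve the web page www.carrefour.fr, but I get the following error: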

java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.carrefour.fr/
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1919)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1515)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
at Main.main(Main.java:111)

Process finished with exit code 0

I have searched the internet extensively and tried several suggestions, but none of them solved the problem. How can I fix it?

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
public class Main {
    public static void main(String[] args) {
        StringBuilder content=new StringBuilder();
        // Use try and catch to avoid the exceptions
        try
        {
            URL url=new URL("https://www.carrefour.fr"); // creating a url object

            // First set the default cookie manager.
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));  

            URLConnection urlConnection=url.openConnection(); // creating a urlconnection object

            urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0 Unique/100.7.9656.57");        

            // wrapping the urlconnection in a bufferedreader
            BufferedReader bufferedReader=new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), Charset.forName("UTF-8")));
            String line;
            
            // reading from the urlconnection using the bufferedreader
            while((line=bufferedReader.readLine())!=null)
            {
                content.append(line+"\n");
            }
            bufferedReader.close();
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
        System.out.println(content.toString());

    }
}

Answer 1

There is nothing wrong with your code; the site simply does not want people running crawlers against it.

See https://www.carrefour.fr/robots.txt
See also the Robots Exclusion Standard.
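
To make the robots.txt point concrete, here is a minimal sketch of how a client could check a path against the rules a site publishes for all crawlers. The class name `RobotsCheck`, its two helper methods, and the sample rules are hypothetical and made up for illustration; they are not the contents of Carrefour's actual file. The parser is deliberately simplified: it only reads `Disallow` lines in groups addressed to `User-agent: *` and ignores `Allow` lines and wildcard patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsCheck {

    /** Collects the Disallow path prefixes from groups addressed to every crawler ("User-agent: *"). */
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> rules = new ArrayList<>();
        boolean groupApplies = false;   // true while inside a group that names "*"
        boolean readingAgents = false;  // true while consecutive User-agent lines are being read
        for (String raw : robotsTxt.split("\\R")) {
            // Strip comments, then split "field: value" at the first colon.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!readingAgents) groupApplies = false; // a new group starts here
                readingAgents = true;
                if (value.equals("*")) groupApplies = true;
            } else {
                readingAgents = false;
                if (field.equals("disallow") && groupApplies && !value.isEmpty()) {
                    rules.add(value);
                }
            }
        }
        return rules;
    }

    /** Returns false when the path starts with any prefix disallowed for all crawlers. */
    static boolean isAllowed(String robotsTxt, String path) {
        for (String prefix : disallowedForAll(robotsTxt)) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Invented sample rules, for illustration only.
        String sample = String.join("\n",
                "User-agent: *",
                "Disallow: /private/",
                "",
                "User-agent: ExampleBot",
                "Disallow: /");
        System.out.println(isAllowed(sample, "/index.html"));
        System.out.println(isAllowed(sample, "/private/data"));
    }
}
```

With the invented sample above, `/index.html` is reported as allowed and `/private/data` as disallowed. In a real crawler, the text of the site's own robots.txt would be passed in instead of the sample string.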

Solution 1
0 2022-05-10 09:20:51
