Error: java.io.IOException: Server returned HTTP response code: 403 for URL
I'm new to development and I'm on IntelliJ IDEA 2022.1 (Community Edition). For example, I want to connect to and retrieve the web page www.carrefour.fr, but I get the following error:
java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.carrefour.fr/
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1919)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1515)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
at Main.main(Main.java:111)
Process finished with exit code 0
I have searched a lot on the internet and tested various fixes, but nothing solves the problem. How can I correct it?
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        StringBuilder content = new StringBuilder();
        try {
            URL url = new URL("https://www.carrefour.fr"); // create a URL object
            // First set the default cookie manager.
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
            URLConnection urlConnection = url.openConnection();
            urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0 Unique/100.7.9656.57");
            // Wrap the connection's input stream in a BufferedReader.
            BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(urlConnection.getInputStream(), StandardCharsets.UTF_8));
            String line;
            // Read the response line by line.
            while ((line = bufferedReader.readLine()) != null) {
                content.append(line).append("\n");
            }
            bufferedReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(content.toString());
    }
}
There's nothing wrong with your code; the website simply does not want people to run crawlers on it, so it answers automated requests with 403 Forbidden.
See https://www.carrefour.fr/robots.txt
See also the Robots exclusion standard.
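Before crawling any site, it's worth checking which paths its robots.txt disallows for your crawler. The sketch below is a deliberately simplified illustration (the `isDisallowed` helper is hypothetical): it only handles plain `Disallow:` prefix rules in the `User-agent: *` group, ignoring `Allow:` rules, wildcards, and per-agent groups that real robots.txt files can contain.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {
    // Returns true if the given path matches a Disallow rule in the
    // "User-agent: *" group. Simplified: prefix matching only; Allow
    // rules and wildcards are not handled.
    public static boolean isDisallowed(String robotsTxt, String path) {
        List<String> disallowed = new ArrayList<>();
        boolean inStarGroup = false;
        for (String rawLine : robotsTxt.split("\n")) {
            String line = rawLine.trim();
            if (line.toLowerCase().startsWith("user-agent:")) {
                // "user-agent:" is 11 characters long.
                inStarGroup = line.substring(11).trim().equals("*");
            } else if (inStarGroup && line.toLowerCase().startsWith("disallow:")) {
                // "disallow:" is 9 characters long.
                String rule = line.substring(9).trim();
                if (!rule.isEmpty()) {
                    disallowed.add(rule);
                }
            }
        }
        for (String rule : disallowed) {
            if (path.startsWith(rule)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /account/\nDisallow: /cart\n";
        System.out.println(isDisallowed(robots, "/cart/items")); // prints true
        System.out.println(isDisallowed(robots, "/"));           // prints false
    }
}
```

Note that even when robots.txt permits a path, a site may still block requests it identifies as automated (as the 403 here shows), so always respect the site's terms of service as well.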