Error: java.io.IOException: Server returned HTTP response code: 403 for URL
I'm new to development and I'm on IntelliJ IDEA 2022.1 (Community Edition). For example, I want to connect to and retrieve the web page www.carrefour.fr, but I get the following error:
java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.carrefour.fr/
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1919)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1515)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
at Main.main(Main.java:111)
Process finished with exit code 0
I have searched a lot on the internet and tested various fixes, but nothing solves the problem. How can I correct it?
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        StringBuilder content = new StringBuilder();
        try {
            URL url = new URL("https://www.carrefour.fr"); // create a URL object
            // First set the default cookie manager.
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
            URLConnection urlConnection = url.openConnection();
            urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0 Unique/100.7.9656.57");
            // Wrap the connection's input stream in a BufferedReader.
            BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(urlConnection.getInputStream(), StandardCharsets.UTF_8));
            String line;
            // Read the response line by line.
            while ((line = bufferedReader.readLine()) != null) {
                content.append(line).append("\n");
            }
            bufferedReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(content.toString());
    }
}
There's nothing wrong with your code; the website simply does not want people to run crawlers on it, so it answers automated requests with 403 Forbidden.
See https://www.carrefour.fr/robots.txt
See also the Robots exclusion standard.
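Before crawling any site, it's worth checking which paths its robots.txt disallows for your crawler. The sketch below is a deliberately simplified illustration (the `isDisallowed` helper is hypothetical): it only handles plain `Disallow:` prefix rules in the `User-agent: *` group, ignoring `Allow:` rules, wildcards, and per-agent groups that real robots.txt files can contain.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {
    // Returns true if the given path matches a Disallow rule in the
    // "User-agent: *" group. Simplified: prefix matching only; Allow
    // rules and wildcards are not handled.
    public static boolean isDisallowed(String robotsTxt, String path) {
        List<String> disallowed = new ArrayList<>();
        boolean inStarGroup = false;
        for (String rawLine : robotsTxt.split("\n")) {
            String line = rawLine.trim();
            if (line.toLowerCase().startsWith("user-agent:")) {
                // "user-agent:" is 11 characters long.
                inStarGroup = line.substring(11).trim().equals("*");
            } else if (inStarGroup && line.toLowerCase().startsWith("disallow:")) {
                // "disallow:" is 9 characters long.
                String rule = line.substring(9).trim();
                if (!rule.isEmpty()) {
                    disallowed.add(rule);
                }
            }
        }
        for (String rule : disallowed) {
            if (path.startsWith(rule)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /account/\nDisallow: /cart\n";
        System.out.println(isDisallowed(robots, "/cart/items")); // prints true
        System.out.println(isDisallowed(robots, "/"));           // prints false
    }
}
```

Note that even when robots.txt permits a path, a site may still block requests it identifies as automated (as the 403 here shows), so always respect the site's terms of service as well.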