403 Forbidden with Java but not web browser?

I am writing a small Java program to get the number of results for a given Google search term. For some reason, in Java I get a 403 Forbidden, but I get the right results in web browsers. Code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;


public class DataGetter {

    public static void main(String[] args) throws IOException {
        getResultAmount("test");
    }

    private static int getResultAmount(String query) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(new URL("https://www.google.com/search?q=" + query).openConnection()
                .getInputStream()));
        String line;
        String src = "";
        while ((line = r.readLine()) != null) {
            src += line;
        }
        System.out.println(src);
        return 1;
    }

}

And the error:

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.com/search?q=test
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at DataGetter.getResultAmount(DataGetter.java:15)
    at DataGetter.main(DataGetter.java:10)

Why is it doing this?

You just need to set the User-Agent header for it to work:

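// Requires: java.io.BufferedReader, java.io.InputStreamReader, java.net.URL,
// java.net.URLConnection, java.nio.charset.StandardCharsets
URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
// Java's default User-Agent is "Java/<version>"; send a browser-like one instead.
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();

BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8));

StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
    sb.append(line);
}
// Keep the body in a variable so it can be parsed afterwards.
String response = sb.toString();
System.out.println(response);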

SSL was handled transparently for you, as you can see from the exception stack trace.
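For reference, here is a minimal self-contained sketch of the same fix. The class name `SearchFetcher` and its helper methods are hypothetical, not part of the original code. It assumes that a browser-like User-Agent is enough to get past the 403 (as reported above), and it additionally URL-encodes the query and checks the status code before reading the body:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
class SearchFetcher {

    // Browser-like User-Agent string, copied from the answer above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
            + "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11";

    // Builds the search URL, encoding the query so spaces and '&' are safe.
    static String buildUrl(String query) {
        try {
            return "https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Fetches the page body; throws an IOException naming the status on a non-200 reply.
    static String fetch(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(10_000);

        int status = connection.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("Unexpected HTTP status " + status + " for " + url);
        }

        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream).
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

You would call it as `SearchFetcher.fetch(SearchFetcher.buildUrl("test"))`. Whether the server still accepts this particular User-Agent string may change over time.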

Getting the result count is not quite this simple, though. After this you have to pretend to be a browser by fetching the cookie and parsing the redirect token link.

// Requires: java.util.regex.Matcher, java.util.regex.Pattern
// "response" is the body read from the first request (sb.toString()).
// Keep only the name=value part of the first Set-Cookie header.
String cookie = connection.getHeaderField("Set-Cookie").split(";")[0];
// The first response contains a meta-refresh tag pointing at the results page.
Pattern pattern = Pattern.compile("content=\"0;url=(.*?)\"");
Matcher m = pattern.matcher(response);
if (m.find()) {
    String url = m.group(1);
    connection = new URL(url).openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.setRequestProperty("Cookie", cookie);
    connection.connect();
    r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
    sb = new StringBuilder();
    while ((line = r.readLine()) != null) {
        sb.append(line);
    }
    response = sb.toString();
    // Extract the number from the "About N results" line.
    pattern = Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");
    m = pattern.matcher(response);
    if (m.find()) {
        // The enclosing method must be declared to return long.
        long amount = Long.parseLong(m.group(1).replace(",", ""));
        return amount;
    }
}

Running the full code, I get 2930000000L as the result.
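The parsing steps above can be pulled out into small helpers that are testable without touching the network. This is only a sketch: the class `ResultPageParser` and its method names are hypothetical, and it assumes the page still uses the `resultStats` markup and the meta-refresh redirect shown in the answer, which may no longer hold.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the patterns are the ones from the answer above.
class ResultPageParser {

    private static final Pattern META_REFRESH =
            Pattern.compile("content=\"0;url=(.*?)\"");
    private static final Pattern RESULT_STATS =
            Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");

    // Returns the name=value part of a Set-Cookie header, or null if the header is absent.
    static String firstCookiePair(String setCookieHeader) {
        if (setCookieHeader == null) {
            return null;
        }
        int semicolon = setCookieHeader.indexOf(';');
        String pair = semicolon < 0 ? setCookieHeader : setCookieHeader.substring(0, semicolon);
        return pair.trim();
    }

    // Returns the meta-refresh target URL, or null if the page has none.
    static String metaRefreshUrl(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Returns the parsed result count, or an empty OptionalLong if the markup is not found.
    static OptionalLong resultCount(String html) {
        Matcher m = RESULT_STATS.matcher(html);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(m.group(1).replace(",", "")));
    }
}
```

Returning `OptionalLong` rather than a bare `long` makes the "markup not found" case explicit, which the inline version silently falls through.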

For me, it worked after adding the header "Accept": "*/*".

You probably aren't setting the correct headers. Use LiveHttpHeaders (or an equivalent) in the browser to see which headers the browser sends, then emulate them in your code.
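As a rough illustration of emulating browser headers, the sketch below applies a set of headers to a connection before it is opened. The class `BrowserHeaders` and the specific header values are assumptions for illustration; copy the real values from what your own browser sends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; header values are examples, not a verified minimum set.
class BrowserHeaders {

    // A small set of headers that a browser would typically send.
    static Map<String, String> defaults() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
                + "(KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
        headers.put("Accept", "*/*");
        headers.put("Accept-Language", "en-US,en;q=0.9");
        return headers;
    }

    // Creates the connection object and sets the headers; no network I/O happens
    // until connect() or getInputStream() is called on the result.
    static URLConnection open(String url, Map<String, String> headers) {
        try {
            URLConnection connection = new URL(url).openConnection();
            for (Map.Entry<String, String> header : headers.entrySet()) {
                connection.setRequestProperty(header.getKey(), header.getValue());
            }
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Headers must be set before the connection is established; calling `setRequestProperty` after `connect()` throws an `IllegalStateException`.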

It's because the site uses SSL. Try using the Jersey HTTP Client. You will probably also have to learn a little about HTTPS and certificates, but I think Jersey can be set to ignore most of the details relating to the actual security.
