Getting an error using JSoup. Why?

I'm trying to log in to a fantasy football website and extract data from it.

I get the following error

    Jul 24, 2015 8:01:12 PM StatsCollector main
    SEVERE: null
    org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://fantasy.premierleague.com/
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
        at StatsCollector.main(StatsCollector.java:26)

whenever I run this code. Where am I going wrong?

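    import java.io.IOException;
    import java.util.Map;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    import org.jsoup.Connection.Method;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class StatsCollector {

        public static void main(String[] args) {
            try {
                String url = "http://fantasy.premierleague.com/";
                // Initial GET of the start page (the response is not used afterwards)
                Response response = Jsoup.connect(url).method(Method.GET).execute();

                // POST the login form fields
                Response res = Jsoup
                        .connect(url)
                        .data("ismEmail", "example@googlemail.com", "id_password", "examplepassword")
                        .method(Method.POST)
                        .execute();

                // Reuse the cookies from the login response for the next request
                Map<String, String> loginCookies = res.cookies();

                Document doc = Jsoup.connect("http://fantasy.premierleague.com/transfers")
                        .cookies(loginCookies)
                        .get();

                String title = doc.title();
                System.out.println(title);
            } catch (IOException ex) {
                Logger.getLogger(StatsCollector.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }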

    Response res = Jsoup
            .connect(url)
            .data("ismEmail", "example@googlemail.com", "id_password", "examplepassword")
            .method(Method.POST)
            .execute();

Are you trying to run this exact code? It looks like example code with placeholders instead of real login credentials. That would explain the HTTP 403 error you received.

Edit 1

My bad. I took a look at the login form on that site, and it seems to me that you confused the id of the input elements ("ismEmail" and "id_password") with the name that gets sent with the form ("email", "password"). Does this work for you?

    Response res = Jsoup
            .connect(url)
            .data("email", "example@googlemail.com", "password", "examplepassword")
            .method(Method.POST)
            .execute();
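
To illustrate why the name attributes matter: a browser (or any HTTP client) submits a form as name=value pairs, and an input's id never appears in the request. The following is a minimal, jsoup-independent sketch of what such a POST body looks like. The class FormBodySketch and its encode method are hypothetical helpers made up for this illustration, and the field names are the ones suggested above, not verified against the live site.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of jsoup): builds the body of a form POST.
final class FormBodySketch {

    // Encodes the pairs as application/x-www-form-urlencoded.
    // The keys must be the inputs' "name" attributes; "id" values are never sent.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

Calling encode with an insertion-ordered map holding "email" and "password" yields a single line of the form email=...&password=..., with reserved characters such as @ percent-encoded. If the keys were "ismEmail" and "id_password" instead, the server would simply not find the fields it expects.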

Edit 2

Okay, this was stuck in my head, because signing in to a website with JSoup should not be that hard. I created an account there and tried it myself. Code first:

    String url = "https://users.premierleague.com/PremierUser/j_spring_security_check";

    Response res = Jsoup
            .connect(url)
            .followRedirects(false)
            .timeout(2_000)
            .data("j_username", "<USER>")
            .data("j_password", "<PASSWORD>")
            .method(Method.POST)
            .execute();

    Map<String, String> loginCookies = res.cookies();

    Document doc = Jsoup.connect("http://fantasy.premierleague.com/squad-selection/")
            .cookies(loginCookies)
            .get();

So what is happening here? First, I realized that the target of the login form was wrong. The page seems to be built on Spring, so the form target and field names use the Spring defaults j_spring_security_check, j_username and j_password. Then I kept getting a read timeout until I set followRedirects(false). I can only guess why this helped, but maybe it is a protection against crawlers?

In the end I connect to the squad selection page, and the parsed response contains my personal view and data. This code seems to work for me. Would you give it a try?
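
For readers wondering what the cookie hand-off between the two requests amounts to, here is a rough, jsoup-independent sketch. It assumes the login response carries its session in Set-Cookie headers, which is typical for this kind of login but not confirmed for this site. CookieSketch, its methods and the cookie names in the usage note are hypothetical, and this is not how jsoup implements res.cookies() internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of jsoup): mimics the cookie hand-off by hand.
final class CookieSketch {

    // Keeps only the leading name=value pair of each Set-Cookie header,
    // dropping attributes such as Path, HttpOnly or Secure.
    static Map<String, String> parseSetCookie(List<String> setCookieHeaders) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    // Builds the value of the Cookie request header sent on later requests.
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        cookies.forEach((name, value) -> header.add(name + "=" + value));
        return header.toString();
    }
}
```

For example, parsing the two made-up headers "JSESSIONID=abc123; Path=/; HttpOnly" and "lang=en; Secure" gives a map with two entries, and toCookieHeader turns that map back into the single request header value that the follow-up request would carry. In the jsoup code above, res.cookies() and .cookies(loginCookies) play these two roles.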
