
Jsoup runs incorrectly on Google App Engine

I am trying to log in to Facebook with Jsoup. When I test with a console app or a webapp on a Tomcat server, it works well:

String userAgent = BrowserVersion.FIREFOX_38.getUserAgent();
Connection conn1 = Jsoup.connect("https://m.facebook.com");
Document document = conn1.userAgent(userAgent).execute().parse();
Element form = document.select("form").get(0);
Elements inputs = form.select("input");
Connection conn2 = Jsoup.connect(form.absUrl("action")).userAgent(userAgent);
for (Element input: inputs) {
    conn2.data(input.attr("name"), input.val());
}
conn2.data("email", "mail@gmail.com").data("pass", "mypass");
Connection.Response response1 = conn2.method(Connection.Method.POST).execute();
System.out.println(response1.url());
Document doc1 = response1.parse();
Elements h3 = doc1.select("h3");
for (Element element: h3) {
    System.out.println(element.text());
}

response1.url() is https://m.facebook.com/home.php?_rdr

But when I run the same code in a Google App Engine app, it does not log in. Instead it shows a page with the message "you have to login first" (I guess the request is trying to access some other URL).

response1.url() is https://m.facebook.com/login.php?next=https%3A%2F%2Fm.facebook.com%2Fhome.php%3Frefsrc%3Dhttps%253A%252F%252Fm.facebook.com%252F%26refid%3D8&refsrc=https%3A%2F%2Fm.facebook.com%2F&refid=8&_rdr

If I change the password to an incorrect one, it shows the incorrect-password page. So I think that with the correct password the request goes to the proper page. But why, with the correct password, does it redirect to another page? I can't see anything different between the console app and GAE that would make the request different!

App Engine can do a few strange things from time to time when you create outbound HTTP requests from the server. Depending on how Jsoup implements the underlying request, it's possible something is getting mixed up. Under the hood, it would likely need to use urlfetch.

One thing you could try here is App Engine Flexible. Outbound requests aren't piped through UrlFetch or the Sockets API there, since ultimately your app is just running on Google Compute Engine.

If your app isn't using a ton of the App Engine APIs, this could be an easy way to get around the issue.

Hope this helps!

I would suggest that after a successful login, Facebook returns a redirect (several sequential redirects?) as the response.

When you run your app outside GAE, Java just follows those redirects automatically and eventually returns https://m.facebook.com/home.php?_rdr as the result.

On the other hand, when running inside GAE, GAE's URLFetch library follows at most 5 redirects (or none, if so configured).

If that's the case, then you could follow those redirects in your own code.

Can you also print response1.statusCode() to check this hypothesis?
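Following the redirects yourself could look roughly like the sketch below. It is not tied to Jsoup: ManualRedirects, HopResult and Hop are hypothetical names invented for this illustration, and Hop stands for "one request with automatic redirects switched off". The loop logs each hop's status code, which also helps to check the hypothesis.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: follow redirects by hand instead of relying on the HTTP layer. */
class ManualRedirects {

    /** Outcome of one request: status code plus Location header (null if absent). */
    static final class HopResult {
        final int status;
        final String location;

        HopResult(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Hypothetical hook: performs ONE request without following redirects. */
    interface Hop {
        HopResult fetch(String url) throws Exception;
    }

    /** Returns the final URL after following at most maxHops redirects. */
    static String follow(String startUrl, Hop hop, int maxHops) throws Exception {
        String url = startUrl;
        for (int i = 0; i <= maxHops; i++) {
            HopResult r = hop.fetch(url);
            System.out.println(r.status + " " + url); // log every hop's status code
            boolean redirect = r.status >= 300 && r.status < 400 && r.location != null;
            if (!redirect) {
                return url;
            }
            // Location may be relative, so resolve it against the current URL.
            url = URI.create(url).resolve(r.location).toString();
        }
        throw new IllegalStateException("More than " + maxHops + " redirects from " + startUrl);
    }

    public static void main(String[] args) throws Exception {
        // Canned responses stand in for real network calls (made-up URLs).
        Map<String, HopResult> canned = new LinkedHashMap<>();
        canned.put("https://example.com/login", new HopResult(302, "/checkpoint"));
        canned.put("https://example.com/checkpoint", new HopResult(302, "https://example.com/home"));
        canned.put("https://example.com/home", new HopResult(200, null));
        System.out.println(follow("https://example.com/login", canned::get, 10));
    }
}
```

With Jsoup, a Hop could be built from Jsoup.connect(url).followRedirects(false).execute(), reading response.statusCode() and response.header("Location"). Note that the request after a redirect is normally a plain GET, even if the first request was a POST.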

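Since this deals with some form of authentication, I wonder whether you need to keep the cookies from the first response and send them with the subsequent requests to provide that authentication.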
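As a rough illustration of that idea, the sketch below collects cookies from Set-Cookie response headers and builds the Cookie header for the next request, using only java.net.HttpCookie. CookieCarry and its methods are hypothetical names, the cookie values are made up, and the sketch deliberately ignores domain, path and expiry rules.

```java
import java.net.HttpCookie;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Sketch: carry cookies from one response into the next request. */
class CookieCarry {

    /** Adds cookies from Set-Cookie header values to the jar; later values overwrite earlier ones. */
    static void store(Map<String, String> jar, List<String> setCookieHeaders) {
        for (String header : setCookieHeaders) {
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                jar.put(cookie.getName(), cookie.getValue());
            }
        }
    }

    /** Builds the value of the Cookie request header from the jar. */
    static String cookieHeader(Map<String, String> jar) {
        StringJoiner joined = new StringJoiner("; ");
        jar.forEach((name, value) -> joined.add(name + "=" + value));
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> jar = new LinkedHashMap<>();
        // Made-up Set-Cookie values, as they might arrive with the first response.
        store(jar, Arrays.asList("datr=abc123; Path=/; Secure", "fr=xyz789; Path=/; HttpOnly"));
        // Send this value as the Cookie header of the follow-up request.
        System.out.println(cookieHeader(jar));
    }
}
```

With Jsoup the bookkeeping is shorter: response.cookies() returns the cookies as a Map<String, String>, and connection.cookies(map) attaches them to the next request, so the cookies from the initial GET can be handed to the login POST.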
