
Unknown Host Exception with GAE app when trying to connect to a website

I am learning how to scrape with HtmlUnit in Java 8, and I am trying to deploy an app to Google App Engine (GAE) that will scrape certain websites periodically. I am developing the app in Eclipse, and it works as expected when run locally. After deploying to GAE, however, the app can no longer connect to any website.

    try (final WebClient webClient = new WebClient()) {

        // Fall back to port 80 when no port can be determined (-1).
        webClient.setCookieManager(new CookieManager() {
            @Override
            protected int getPort(final java.net.URL url) {
                final int r = super.getPort(url);
                return r != -1 ? r : 80;
            }
        });

        final HtmlPage page = webClient.getPage("https://www.google.com");
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }

The error occurs at the call to webClient.getPage(...):

java.net.UnknownHostException: www.google.com

Partial stack trace:

[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>: java.lang.RuntimeException: java.net.UnknownHostException: www.recreation.gov
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at com.gargoylesoftware.htmlunit.UrlFetchWebConnection.getResponse(UrlFetchWebConnection.java:162)
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1394)
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1312)
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:396)
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:317)
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:465)
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:450)
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at pack.HelloAppEngine.doGet(HelloAppEngine.java:49)
[s~permitseacherbpd/20180314t161057.408306947286449649].<stderr>:   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)

This error occurs for any website I try to access and is NOT exclusive to HtmlUnit; I have encountered it before in other projects. Why can I not connect after deploying to App Engine?

My little test servlet started throwing similar UnknownHostExceptions too. I came across #63916008, which links to the <url-stream-handler> documentation. It notes (emphasis mine):

For the Java 8 runtime, the default value is native, which means that standard Java network classes use the standard Java HTTP(S) transport, as described in Java 8 runtime vs Java 7 behavior. This setting requires the app to have billing enabled, otherwise the following runtime errors will result from requests:

java.net.UnknownHostException
java.net.SocketTimeoutException
java.io.IOException
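All three of these are IOException subtypes, and as the stack trace in the question shows, HtmlUnit may hand them to you wrapped in a RuntimeException. If you want your servlet to log which one it actually hit, something like the following could work. This is only a sketch: the class name NetworkFailures and both method names are made up for illustration and are not part of HtmlUnit or the App Engine SDK.

```java
import java.io.IOException;
import java.net.UnknownHostException;

// Hypothetical helper (not part of HtmlUnit or the App Engine SDK).
final class NetworkFailures {

    /** Returns the first IOException found in the cause chain, or null if there is none. */
    static IOException findNetworkCause(Throwable t) {
        // Bound the walk so an unusual cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            // UnknownHostException and SocketTimeoutException both extend IOException.
            if (t instanceof IOException) {
                return (IOException) t;
            }
        }
        return null;
    }

    /** Builds a one-line description suitable for logging. */
    static String describe(Throwable t) {
        IOException cause = findNetworkCause(t);
        if (cause == null) {
            return "no IOException in cause chain: " + t;
        }
        return cause.getClass().getName() + ": " + cause.getMessage();
    }

    public static void main(String[] args) {
        // Mimics the wrapping seen in the question's stack trace.
        Throwable wrapped = new RuntimeException(new UnknownHostException("www.recreation.gov"));
        System.out.println(describe(wrapped)); // prints "java.net.UnknownHostException: www.recreation.gov"
    }
}
```

In the question's catch block you could then print NetworkFailures.describe(e) instead of e.getMessage().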

Set <url-stream-handler> to urlfetch in your appengine-web.xml and your problem should be solved!
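For reference, the relevant part of appengine-web.xml might look roughly like this. The <runtime> and <threadsafe> lines are just assumptions about a typical Java 8 project and may differ in yours; the <url-stream-handler> element is the only line that matters here.

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <runtime>java8</runtime>
  <threadsafe>true</threadsafe>
  <!-- Send java.net HTTP(S) requests through the URL Fetch service instead of the native transport -->
  <url-stream-handler>urlfetch</url-stream-handler>
</appengine-web-app>
```

Redeploy after changing the file so the new setting takes effect.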
