
jsoup error fetching URL. Status=503 only on Heroku

When I use Jsoup to connect to https://rateyourmusic.com from localhost, it works fine. On Heroku, however, I always receive a 503 error, even when I set a user agent:

String url = "https://rateyourmusic.com/charts/top/album/2016";
Document doc = Jsoup.connect(url)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/70.0")
        .followRedirects(true)
        .timeout(100000)
        .ignoreContentType(true)
        .get();

Heroku log:

2019-10-26T23:20:06.674831+00:00 heroku[router]: at=info method=GET path="/searchTrack?searchRadio=2&playlistName=&searchNameArtist=&searchNameAlbum=https%3A%2F%2Frateyourmusic.com%2Fcharts%2Ftop%2Falbum%2F2016&amountChart=3&amountRadio=3" host=gettoptracks.herokuapp.com request_id=026060b4-71ab-4510-9809-fe5cffc3f325 fwd="176.32.19.237" dyno=web.1 connect=1ms service=313ms status=200 bytes=11534 protocol=https

    2019-10-26T23:20:06.670478+00:00 app[web.1]: org.jsoup.HttpStatusException: HTTP error fetching URL. Status=503, URL=https://rateyourmusic.com/charts/top/album/2016

    2019-10-26T23:20:06.670652+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:760)

    2019-10-26T23:20:06.670655+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:705)

    2019-10-26T23:20:06.670661+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:295)

    2019-10-26T23:20:06.670663+00:00 app[web.1]:    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:284)

    2019-10-26T23:20:06.670668+00:00 app[web.1]:    at com.spotifyapi.demo.service.ServiceApiImpl.getRYM(ServiceApiImpl.java:561)

   ...

    2019-10-26T23:20:06.671189+00:00 app[web.1]:    at java.lang.Thread.run(Thread.java:748)

If I try to connect to another website on Heroku using Jsoup it works.

Thanks in advance.

This is not a problem in your code. The 503 status is returned by the server, which means the server disliked something about your request or your client and refused to return a normal response. Most likely the site blocks requests coming from Heroku in order to prevent scraping.
To be sure, download the page without Jsoup, using a plain HttpClient or pure Java (see: How to download and save a file from Internet using Java?).
If the result is the same, that confirms they block Heroku. You can try connecting through a proxy to work around this.
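
The check described above could be sketched roughly as follows, using only the JDK. The class name `StatusCheck`, the helper `fetchStatus`, and the proxy host and port read from the command line are hypothetical names chosen for illustration; they are not part of the question's code. The sketch only reports the status code, so running it both on localhost and on Heroku should show whether the 503 depends on where the request comes from rather than on Jsoup.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

public class StatusCheck {

    // Sends a GET request without Jsoup and returns the HTTP status code.
    // Pass Proxy.NO_PROXY for a direct connection.
    static int fetchStatus(String url, String userAgent, Proxy proxy) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "https://rateyourmusic.com/charts/top/album/2016";
        String agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/70.0";

        // Direct request: if this also returns 503 on Heroku, Jsoup is not the cause.
        System.out.println("direct: " + fetchStatus(url, agent, Proxy.NO_PROXY));

        // Optional: route the same request through an HTTP proxy.
        // The host and port come from the command line (assumed values, supply your own).
        if (args.length == 2) {
            Proxy proxy = new Proxy(Proxy.Type.HTTP,
                    new InetSocketAddress(args[0], Integer.parseInt(args[1])));
            System.out.println("via proxy: " + fetchStatus(url, agent, proxy));
        }
    }
}
```

If the direct request fails with 503 on Heroku but the proxied one succeeds, the same proxy can be used from Jsoup through its `Connection.proxy(host, port)` method.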


 