
How to Determine the Download Link from a Secure Link?

This is my first ever post on Stack Overflow. Stack Overflow has always helped me with any difficulties I face during coding.

Without wasting much time, let me get to the problem I am stuck on.

I am building up a database for a project.

I have a database of hyperlinks in this format:

http://link.xyz.com/?id=108
http://link.xyz.com/?id=109
httpp://link.xyz.com/?id=110

and so on.

When opened in a browser, these links redirect me to a download link, which starts downloading the content.

Example:

When httpp://link.xyz.com/?id=108 is opened in a browser, it redirects me to the URL below:

httpp://xyz.com/abc/pqr/some_content.avi [download link].

So I am looking for a solution that converts my huge list of hyperlinks into download links.

A solution in any programming language is acceptable, as long as the secure links are converted into download links.

I tried using HttpURLConnection and several libraries in Java, but with no success.

It throws the exception below:

Request URL ... httpp://link.xyz.com/?id=3108
Response Code ... 403
java.io.IOException: Server returned HTTP response code: 403 for URL: httpp://link.xyz.com/?id=3108
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at Fetch.main(Fetch.java:56)
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: httpp://link.xyz.com/?id=3108
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.HttpURLConnection.getResponseCode(Unknown Source)
    at Fetch.main(Fetch.java:26)



PS: The above exception is caused only by these links; the program runs fine with other links.

Please help, this problem is killing me and I am not able to make progress on the project because of it.

Note: "httpp" is used on purpose, as I was not able to post with more than 2 hyperlinks.

Thank you

The 403 HTTP error code is the code for "Forbidden". The server does not want you to access that resource.

One reason for getting this response code is that you are not logged in. The server expects you to log in with a username and password before you are allowed to download, likely with an HTTP POST request to the login form somewhere on the website. It will then reply with a Set-Cookie header that includes a session ID, which serves as proof that you are authenticated. It will expect you to include the same value in the Cookie header of any future request.
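If a login is what the server is missing, the cookie round trip described above could look roughly like the sketch below. Everything site-specific is an assumption: the login URL, the form field names "username" and "password", and the class and method names are hypothetical, and the real login form may need extra hidden fields. java.net.CookieManager is used so that Set-Cookie values are stored and sent back automatically.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SessionLogin {

    // Encode form fields as application/x-www-form-urlencoded.
    static String formEncode(Map<String, String> fields) {
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // POST the form fields to the login URL and return the HTTP status code.
    static int login(String loginUrl, Map<String, String> fields) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formEncode(fields).getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.err.println("usage: SessionLogin <loginUrl> <user> <password> <protectedUrl>");
            return;
        }
        // Install a JVM-wide cookie store: Set-Cookie values from responses are
        // remembered and sent back in the Cookie header of later requests.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", args[1]); // hypothetical field name
        fields.put("password", args[2]); // hypothetical field name
        System.out.println("Login status: " + login(args[0], fields));

        // The session cookie (if the server set one) is attached automatically here.
        HttpURLConnection conn = (HttpURLConnection) new URL(args[3]).openConnection();
        System.out.println("Protected link status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

If the second status is still 403 after a successful login, the cause is probably not a missing session.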

Another reason could be that the website detects that you are not using a web browser and wants to prevent you from scraping its content. You should respect that! If you really want to ignore the website administrators' wishes, you need to find out what exactly causes them to detect your program as a non-browser. It might just check your User-Agent header, but there are millions of other ways in which your program likely behaves differently that could trigger the detection. Without knowing what the server checks, one cannot give you a correct answer.
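To find out what the server actually objects to, it can help to look at the full 403 response instead of letting getInputStream() throw. The sketch below is only a diagnostic aid, not a way around the block: it sends an explicit User-Agent that identifies the tool (the value used here is made up), then prints the status, the response headers and the error body. The class and method names are hypothetical, and the body is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class InspectForbidden {

    // Prepare (but do not yet send) a GET request with an explicit User-Agent.
    static HttpURLConnection prepare(String url, String userAgent) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the body of an error response; getErrorStream() returns null
    // when the server sent no error body.
    static String readErrorBody(HttpURLConnection conn) throws IOException {
        InputStream err = conn.getErrorStream();
        if (err == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(err, StandardCharsets.UTF_8))) { // assumes UTF-8
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: InspectForbidden <url>");
            return;
        }
        // Made-up User-Agent value that identifies the tool honestly.
        HttpURLConnection conn = prepare(args[0], "my-link-checker/0.1");
        int status = conn.getResponseCode(); // returns 403 rather than throwing
        System.out.println("Status:  " + status);
        System.out.println("Headers: " + conn.getHeaderFields());
        if (status >= 400) {
            System.out.println(readErrorBody(conn));
        }
        conn.disconnect();
    }
}
```

The error page often says why the request was refused (login required, bot protection, rate limit), which tells you which of the cases above applies.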

The next problem will be following the redirects. You could get a response with one of the HTTP redirect status codes: 301, 302, 303, 307 or 308. You will then find the real URL in the Location header of the response. Another way to implement redirects is via JavaScript on the client side (popular for download portals, because it gives the opportunity to show more advertisements). In that case you will have to parse the response body and extract the real URL from its source code.
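Once the server no longer answers 403, both redirect styles could be handled along these lines. This is a minimal sketch with hypothetical class and method names: locationOf() turns off automatic redirect following and reads the Location header, while jsRedirectTarget() uses a deliberately simple regular expression that only catches assignments such as window.location.href = '...'. Real download portals may build the URL in ways this pattern will not match.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResolveRedirect {

    // Status codes that carry the target URL in the Location header.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Returns the redirect target of the link, or null if the server
    // did not answer with a redirect.
    static String locationOf(String link) throws IOException {
        URL url = new URL(link);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(false); // we want the Location header itself
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (!isRedirect(status) || location == null) {
                return null;
            }
            // Location may be relative, so resolve it against the request URL.
            return new URL(url, location).toString();
        } finally {
            conn.disconnect();
        }
    }

    // Matches simple assignments like: window.location.href = "http://...";
    private static final Pattern JS_REDIRECT = Pattern.compile(
            "(?:window\\.)?location(?:\\.href)?\\s*=\\s*[\"']([^\"']+)[\"']");

    // Returns the first URL assigned to location in the page source, or null.
    static String jsRedirectTarget(String html) {
        Matcher m = JS_REDIRECT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: ResolveRedirect <url> [<url> ...]");
            return;
        }
        for (String link : args) {
            System.out.println(link + " -> " + locationOf(link));
        }
    }
}
```

For the list in the question, you would pass each stored link to locationOf() and save the result; a null result means the page has to be downloaded and checked with jsRedirectTarget() or a similar parser.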
