
Webpage returns different status codes on different requests

This is part of the code from my project.

URL = "http://www.amazon.com",
HTTPOpts = [{autoredirect, false}],
case httpc:request(get, {URL, [{"User-Agent", "Mozilla"}]}, HTTPOpts, []) of
    {ok, {{_, Code, _}, _Headers, Body}} when Code == 200 ->
        %% code to process Code = 200 (placeholder result)
        {ok, Body};
    {ok, {{_, Code, _}, Headers, _}} when Code >= 300, Code < 310 ->
        %% redirection (placeholder result)
        {redirect, Headers};
    {ok, {{_, Code, _}, _Headers, _}} when Code == 503 ->
        %% service unavailable (placeholder result)
        {error, service_unavailable}
end.

The problem is that when I perform the HTTP request, it returns different status codes.

For the URL above I get two different responses, Code = 200 and Code = 503. How do I handle this so that I always get Code = 200?

I also tried wget "www.amazon.com", and it gives the same result.

My idea is to re-request when Code = 503, but the problem is that this may loop forever without ever returning Code = 200, or succeed only after several iterations, which delays the client request.

How can I resolve this?

As developers, we have no control over the responses of the third-party systems we talk to. In your example, Amazon appears to be deliberately denying you access because it suspects you are a bot or scraper. You can check this by looking at the response body whenever you get a 503.

What you can do as a developer is adapt to every situation that can occur when connecting to a given system.

For HTTP, 5xx error codes indicate a problem on the server side, so the usual approach is to retry the request. To avoid getting stuck in a loop, implement exponential backoff with a limit on how many times your code retries.
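The retry idea above can be sketched in Erlang roughly as follows. This is a minimal sketch, not a definitive implementation: the module name retry_fetch, the limit of 5 attempts, and the 500 ms base delay are assumptions chosen for illustration and do not come from the original post. Any 5xx response is retried with a delay that doubles each time; every other non-200 response (including redirects, since autoredirect is off) is returned as an error for the caller to handle.

```erlang
-module(retry_fetch).
-export([fetch/1, fetch/3, backoff_delay/2]).

%% Hypothetical defaults, chosen only for this example.
-define(MAX_ATTEMPTS, 5).
-define(BASE_DELAY_MS, 500).

%% Fetch URL, retrying 5xx responses with exponential backoff.
fetch(URL) ->
    fetch(URL, ?MAX_ATTEMPTS, ?BASE_DELAY_MS).

fetch(URL, MaxAttempts, BaseDelayMs) ->
    %% httpc lives in the inets application, which must be running.
    {ok, _} = application:ensure_all_started(inets),
    attempt(URL, 1, MaxAttempts, BaseDelayMs).

attempt(URL, Attempt, MaxAttempts, BaseDelayMs) ->
    Request = {URL, [{"User-Agent", "Mozilla"}]},
    case httpc:request(get, Request, [{autoredirect, false}], []) of
        {ok, {{_, 200, _}, _Headers, Body}} ->
            {ok, Body};
        {ok, {{_, Code, _}, _Headers, _Body}}
          when Code >= 500, Attempt < MaxAttempts ->
            %% Server-side error: wait, then try again.
            timer:sleep(backoff_delay(Attempt, BaseDelayMs)),
            attempt(URL, Attempt + 1, MaxAttempts, BaseDelayMs);
        {ok, {{_, Code, _}, _Headers, _Body}} ->
            %% 3xx, 4xx, or a 5xx after the last allowed attempt.
            {error, {http_status, Code}};
        {error, Reason} ->
            {error, Reason}
    end.

%% Delay before the next try: Base, 2*Base, 4*Base, ...
backoff_delay(Attempt, BaseDelayMs) ->
    BaseDelayMs * (1 bsl (Attempt - 1)).
```

Calling retry_fetch:fetch("http://www.amazon.com") returns {ok, Body} on a 200 response and an {error, ...} tuple otherwise. Because the number of attempts is bounded, the caller always gets an answer within a predictable time instead of looping indefinitely.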

HTTP 4xx error codes usually mean there is something wrong with your request. Don't retry here; look at what could be wrong with the request instead.

For your particular case, since Amazon thinks you're an automated visitor, try to mimic a normal web browser. Start with the User-Agent header, cookies, etc.
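As a rough illustration of sending more browser-like requests with httpc, the sketch below uses a fuller set of request headers and turns on httpc's built-in cookie handling. The module name browser_like and the specific header values are assumptions for this example (the User-Agent string is just one plausible browser string), and there is no guarantee that any particular site will accept them.

```erlang
-module(browser_like).
-export([headers/0, fetch/1]).

%% Example header values only; copy real ones from your own browser.
headers() ->
    [{"User-Agent",
      "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"},
     {"Accept",
      "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},
     {"Accept-Language", "en-US,en;q=0.5"}].

fetch(URL) ->
    {ok, _} = application:ensure_all_started(inets),
    %% Let httpc store cookies from responses and send them back later.
    ok = httpc:set_options([{cookies, enabled}]),
    httpc:request(get, {URL, headers()}, [{autoredirect, false}], []).
```

The result of browser_like:fetch/1 has the same shape as the httpc:request/4 call in the question, so the existing case expression can be applied to it unchanged.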


 