
Getting a 403 error while using requests.get() in Python

While requesting multiple URLs, after a couple of successful responses the server starts returning a 403 error for the remaining URLs.

I tried using user agents and a proxy, but the problem still exists. I also tried a delay of 0.5 sec.

I'm using requests version 2.22.0.

Here is what (r.status_code, r.headers, r.text) looks like:

403 {'Allow': 'GET, POST, HEAD, PUT, PATCH, DELETE, OPTIONS', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html; charset=UTF-8', 'Accept-Ranges': 'bytes, bytes, bytes, bytes', 'Content-Length': '1519', 'Date': 'Thu, 06 Feb 2020 10:34:40 GMT', 'Connection': 'keep-alive', 'set-cookie': 'machine_cookie=9581501972230; expires=Wed, 05 Feb 2025 10:34:40 GMT; path=/;', 'X-Served-By': 'cache-sea4466-SEA, cache-maa18327-MAA', 'X-Cache': 'MISS, MISS', 'X-Cache-Hits': '0, 0', 'X-Timer': 'S1580985280.913451,VS0,VE312', 'Vary': 'User-Agent, Accept-Encoding'} <!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Access to this page has been denied.</title>
  <link href="https://fonts.googleapis.com/css?family=Open+Sans:300" rel="stylesheet">
  <style>
    html, body {
      margin: 0;
      padding: 0;
      font-family: 'Open Sans', sans-serif;
      color: #000;
    }

    .container {
      align-items: center;
      display: flex;
      flex: 1;
      justify-content: space-between;
      flex-direction: column;
      height: 100%;
    }

    .container > div {
      width: 100%;
      display: flex;
      justify-content: center;
    }

    .container > div > div {
      display: flex;
      width: 80%;
    }

    .customer-logo-wrapper {
      padding-top: 2rem;
      flex-grow: 0;
      background-color: #fff;
    }

    .customer-logo {
      border-bottom: 1px solid #000;
    }

    .customer-logo > img {
      padding-bottom: 1rem;
      max-height: 50px;
      max-width: 100%;
    }

    .page-title-wrapper {
      flex-grow: 0;  /* was 2, but that pushed it too far down the page */
    }

    .page-title {
      flex-direction: column-reverse;
    }

    .content-wrapper {
      flex-grow: 5;
    }

    .content {
      flex-direction: column;
    }

    @media (min-width: 768px) {
      html, body {
        height: 100%;
      }
    }
  </style>
  <script>
    window._pxAppId = 'PXxgCxM9By';
    window._pxJsClientSrc = '/xgCxM9By/init.js';
    window._pxHostUrl = '/xgCxM9By/xhr';

    startTime = Date.now();
    window._pxOnCaptchaSuccess = function(isValid){
      var solutionTime = Math.floor((Date.now() - startTime) / 1000);
      var reload = function(){ top.location.reload(); };
      sendEvent("captcha/solved?px_uuid=" + window._pxUuid + "&time_to_solution=" + solutionTime + '&isValid=' + isValid, reload);
      setTimeout(reload, 700);
    };

    function sendEvent(event, onload){
      var xhr = new XMLHttpRequest();
      xhr.open("GET", "/_sa_track/" + event);
      if (onload) xhr.addEventListener("load", onload);
      xhr.send();
    }
  </script>
<script type="text/javascript">window._pxVid = "";window._pxUuid = "47a70d80-48cc-11ea-860b-c96869955a6b";</script></head>
<body>
<section class="container">
  <div class="page-title-wrapper">
    <div class="page-title">
      <h1>Please click “I am not a robot” to continue</h1>
    </div>
  </div>
  <div class="content-wrapper">
    <div class="content">
      <div id="px-captcha"></div>
      <p></p>
      <p>
        To ensure this doesn’t happen in the future, please enable Javascript and cookies in your browser.<br/>
        Is this happening to you frequently? Please <a href="https://seekingalpha.userecho.com?source=captcha">report it on our feedback forum</a>.
      </p>
      <p>
        If you have an ad-blocker enabled you may be blocked from proceeding. Please disable your ad-blocker and refresh.
      </p>
      <p>Reference ID: <span id="refid"></span></p>
    </div>
  </div>
  <script>
    document.getElementById("refid").innerHTML = window._pxUuid;
    sendEvent("captcha/shown?px_uuid=" + window._pxUuid);
  </script>
</section>

<script src="/xgCxM9By/captcha/PXxgCxM9By/captcha.js?a=c&m=0"></script>

</body>
</html>

The server prevents you from obtaining the desired information by returning a 403 Forbidden HTTP status code together with a captcha, to make sure the request was initiated by a human and not by a Python script. It is likely that the remote server has temporarily banned your session or your IP address.
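A script can recognise this kind of response and stop or slow down as soon as it is challenged, instead of continuing to hit the server. This is a minimal sketch that assumes the markers seen in this particular page (the "px-captcha" element and the "Access to this page has been denied" title) stay the same. Other sites, or later versions of this one, may use different markup, and looks_like_bot_challenge is a hypothetical helper name.

```python
# Markers copied from the challenge page in the question (assumption:
# they stay stable). Other sites will need their own markers.
BLOCK_MARKERS = (
    "px-captcha",
    "Access to this page has been denied",
)


def looks_like_bot_challenge(status_code: int, body: str) -> bool:
    """Return True if a response looks like an anti-bot captcha page
    rather than the content that was actually requested."""
    if status_code != 403:
        return False
    return any(marker in body for marker in BLOCK_MARKERS)
```

For example, call looks_like_bot_challenge(r.status_code, r.text) right after requests.get() and back off (or stop) when it returns True.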

There are some workarounds to avoid such a ban, but there is no guarantee that you can overcome that restriction.

So I can only give you some advice:

  1. It's better to use a Session instead of one-shot requests, because it preserves state (such as cookies) between requests.
  2. Use the same User-Agent as your browser.
  3. Moderately increase the cooldown period between requests.
  4. Proxies can also be banned by the remote server (usually based on their IP), so it is sometimes a good idea to use multiple proxies in round-robin mode.
  5. Your main objective is to make your requests look like requests from an ordinary browser. You can examine the requests your browser sends to the remote server in the browser's developer tools and try to copy the browser's behaviour.
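The advice above can be combined into one loop. This is a sketch under several assumptions, not a guaranteed way past the block: the header values, proxy URLs, pause and retry counts are placeholders, and make_session, backoff_delay and fetch_all are hypothetical helper names. It reuses one requests.Session with browser-like headers, pauses between URLs, backs off exponentially (with jitter) after a 403, and rotates through proxies round-robin with itertools.cycle.

```python
import itertools
import random
import time

import requests

# Placeholder values (assumptions): copy the real headers from your
# browser's developer tools and use proxies you are allowed to use.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/79.0.3945.130 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
PROXIES = [
    "http://proxy1.example.com:8080",  # hypothetical
    "http://proxy2.example.com:8080",  # hypothetical
]


def make_session() -> requests.Session:
    """One Session keeps cookies and pooled connections between requests."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with jitter: up to base * 2**attempt seconds, capped."""
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)


def fetch_all(urls, session, proxies=(), max_retries=3, pause=2.0,
              sleep=time.sleep):
    """Fetch each URL, retrying through the next proxy after a 403."""
    proxy_cycle = itertools.cycle(proxies) if proxies else None  # round-robin
    results = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            kwargs = {}
            if proxy_cycle is not None:
                proxy = next(proxy_cycle)
                kwargs["proxies"] = {"http": proxy, "https": proxy}
            r = session.get(url, timeout=10, **kwargs)
            if r.status_code != 403:
                break
            if attempt < max_retries:
                # Blocked: wait longer before the next attempt.
                sleep(backoff_delay(attempt))
        results[url] = r  # last response, possibly still a 403
        sleep(pause * random.uniform(1.0, 1.5))  # cooldown between URLs
    return results
```

A typical call would be fetch_all(urls, make_session(), PROXIES). Passing sleep as a parameter lets you test the loop without actually waiting; in real use keep the default time.sleep. If every retry still ends in a 403, the last response is kept so you can inspect it.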
