
Login and scrape a site that uses CORS(AWS) and JS to populate itself with Google Apps Script

I am trying to access the following Japanese site and scrape data from a table, but I am struggling to log in using Google Apps Script. I need a solution that does not rely on a desktop and can be done completely online. I am not that experienced with web development or web scraping, so I'm basically learning as I go.

I have the username and password, but:

  1. I am unable to figure out which headers I need to send to log in. I get 403 errors when using the actual URL, https://calllog-dev.123robo.com/#/login , and 502 when using the request URL from the browser, https://dbp3xa4g5g.execute-api.us-west-2.amazonaws.com/dev/users/authenticate

  2. The login page uses CORS and an AWS API to authenticate, so there are no cookies until I have successfully logged in and sent a GET request via the browser.

  3. There are multiple tokens: x-logview-token, which is in the response to the login POST request, and a page token that is generated for each page.
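One way to find out why the API answers with 403 or 502 is to stop UrlFetchApp from throwing on error statuses and log the response body instead. The following is a minimal sketch, assuming the V8 runtime of Apps Script; `describeResponse` and `debugLogin` are hypothetical helper names, not part of the site or of Apps Script.

```javascript
// Hypothetical helper: summarize a status code and body for logging.
function describeResponse(status, body) {
  var kind = status >= 500 ? 'server error'
           : status >= 400 ? 'client error'
           : 'ok';
  // Only the first 200 characters of the body are kept to keep logs short.
  return status + ' (' + kind + '): ' + String(body).slice(0, 200);
}

// Runs only inside Google Apps Script, where UrlFetchApp and Logger exist.
function debugLogin(url, options) {
  // Copy the options and ask UrlFetchApp to return 4xx/5xx responses
  // instead of throwing an exception.
  var opts = Object.assign({}, options, { muteHttpExceptions: true });
  var response = UrlFetchApp.fetch(url, opts);
  Logger.log(describeResponse(response.getResponseCode(), response.getContentText()));
  return response;
}
```

The logged body often names the problem (for example a missing field or a malformed request), which is more informative than the status code alone.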

Response to Login Post Request:

    {"username":"user@gmail.com","token":"this-is-the-token-value","enableDigits":true}
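Since the login response carries a token rather than setting a cookie, follow-up requests may only need that token. The sketch below parses the response shown above and builds options for a later request. It assumes the API expects the token in an `x-logview-token` request header, which the question does not confirm; `extractToken` and `buildAuthedOptions` are hypothetical helper names.

```javascript
// Hypothetical helper: pull the token out of the login response body.
function extractToken(loginResponseText) {
  var data = JSON.parse(loginResponseText);
  if (!data.token) {
    throw new Error('Login response did not contain a token');
  }
  return data.token;
}

// Hypothetical helper: build UrlFetchApp options for a follow-up GET request.
// Assumption: the API reads the token from an "x-logview-token" header.
function buildAuthedOptions(token) {
  return {
    method: 'get',
    headers: { 'x-logview-token': token },
    // Return error responses instead of throwing, so they can be inspected.
    muteHttpExceptions: true
  };
}
```

In Apps Script the result would be passed as the second argument to `UrlFetchApp.fetch`, together with whichever data URL the browser's network tab shows for the table.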

I am thinking of copying the cookies from the browser's GET request and sending them through Google Apps Script. Is there some way to bypass the login, or to use the cookies to log in?

<!DOCTYPE html>
<html lang=en>

<head>
  <meta charset=utf-8>
  <meta http-equiv=X-UA-Compatible content="IE=edge">
  <meta name=viewport content="width=device-width,initial-scale=1">
  <link rel=icon href=/favicon.ico>
  <link rel=stylesheet href=//cdn.materialdesignicons.com/3.4.93/css/materialdesignicons.min.css>
  <title>123ROBO 通話履歴</title>
  <link href=/css/app.5339eed8.css rel=preload as=style>
  <link href=/css/chunk-vendors.8b9ade74.css rel=preload as=style>
  <link href=/js/app.32f2c21e.js rel=preload as=script>
  <link href=/js/chunk-vendors.cd62bd72.js rel=preload as=script>
  <link href=/css/chunk-vendors.8b9ade74.css rel=stylesheet>
  <link href=/css/app.5339eed8.css rel=stylesheet>
</head>

<body><noscript><strong>We're sorry but logview doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript>
  <div id=app></div>
  <script src=/js/chunk-vendors.cd62bd72.js></script>
  <script src=/js/app.32f2c21e.js></script>
</body>

</html>

Here is the website: https://calllog-dev.123robo.com/#/login

Here is the Code I have been trying to use:

function loginTest() {
  // Placeholder credentials
  var userID = 'user@gmail.com';
  var userPW = 'password';

  var url = 'https://dbp3xa4g5g.execute-api.us-west-2.amazonaws.com/dev/users/authenticate';

  // Added a body as pointed out by Mark. Added request headers as suggested by pguardiario.
  // The HTTP/2 pseudo-headers (authority, path, scheme) are not regular request headers
  // and are omitted; accept-encoding is left for UrlFetchApp to negotiate.
  const requestOptions = {
    method: 'post',
    contentType: 'application/json',
    headers: {
      'accept': '*/*',
      'accept-language': 'en-US,en;q=0.9,ja;q=0.8',
      'origin': 'https://calllog-dev.123robo.com',
      'referer': 'https://calllog-dev.123robo.com/',
      'sec-fetch-dest': 'empty',
      'sec-fetch-mode': 'cors',
      'sec-fetch-site': 'cross-site',
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36'
    },
    // UrlFetchApp takes the request body as "payload"
    payload: JSON.stringify({ username: userID, password: userPW }),
    // Return 4xx/5xx responses instead of throwing, so the body can be logged
    muteHttpExceptions: true
  };

  var response = UrlFetchApp.fetch(url, requestOptions);

  Logger.log(response.getResponseCode());
  Logger.log(response.getContentText('UTF-8'));
}

You can't just send an Authorization header when the endpoint expects a body:

const requestOptions = {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ username, password })
  }
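The snippet above uses the browser `fetch` naming. In Google Apps Script, `UrlFetchApp.fetch` takes the request body as `payload` and the content type as `contentType`. A minimal sketch of the same request in that form follows; the `username` and `password` field names are taken from the snippet above, and `buildLoginOptions` and `login` are hypothetical helper names.

```javascript
// Hypothetical helper: build UrlFetchApp options for a JSON login request.
function buildLoginOptions(username, password) {
  return {
    method: 'post',
    contentType: 'application/json',
    // UrlFetchApp reads the request body from "payload", not "body".
    payload: JSON.stringify({ username: username, password: password }),
    // Return error responses instead of throwing, so they can be logged.
    muteHttpExceptions: true
  };
}

// Runs only inside Google Apps Script, where UrlFetchApp exists.
function login(url, username, password) {
  var response = UrlFetchApp.fetch(url, buildLoginOptions(username, password));
  return response.getContentText();
}
```

Keeping the option builder separate from the network call makes it easy to log and compare the exact payload against what the browser sends.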

And this page has one general DOM issue, along with some typos:

[DOM] Password field is not contained in a form: (More info: https://www.chromium.org/developers/design-documents/create-amazing-password-forms )
<input type="password" autocomplete="on" class="input">

That's the wrong URL, take a look: [screenshot not preserved]

