Log in to and scrape a site that uses CORS (AWS) and JavaScript to populate itself, using Google Apps Script
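I'm trying to access the following Japanese site and scrape data from a table, but I'm struggling to log in with Google Apps Script. I need a solution that doesn't depend on a desktop and can be done entirely online. I'm not very experienced with web development or web scraping, so I'm basically learning as I go.
I have the username and password, but: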
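2. The login page uses CORS and an AWS API for authentication, so there are no cookies until I have logged in successfully and the browser sends a GET request.
3. There are several tokens: an x-logview-token in the response to the login POST request, and a Page Token that is generated for each page.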
The response to the login POST request:
{"username":"user@gmail.com","token":"this-is-the-token-value","enableDigits":true}
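The login response is plain JSON, so the token can be read with `JSON.parse`. A minimal sketch; the helper name `extractToken` is hypothetical, and it assumes the response always has the shape shown above:

```javascript
// Hypothetical helper: pull the session token out of the login response body.
// ASSUMPTION: the body looks like {"username": ..., "token": ..., "enableDigits": ...}.
function extractToken(responseText) {
  const data = JSON.parse(responseText); // throws on non-JSON input
  if (typeof data.token !== 'string' || data.token.length === 0) {
    throw new Error('Login response did not contain a token');
  }
  return data.token;
}

// Usage with the sample response from the question.
const sample = '{"username":"user@gmail.com","token":"this-is-the-token-value","enableDigits":true}';
console.log(extractToken(sample)); // prints: this-is-the-token-value
```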
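I'm considering recreating the request with the cookies from the browser's GET request and sending it through Google Apps Script. Is there a way to bypass the login, or to log in using cookies?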
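Since the login endpoint hands back a token in JSON rather than setting a cookie, one option worth trying is to log in from Apps Script and send that token on later requests instead of copying browser cookies. The sketch below assumes, without confirmation from the site, that the API expects the token in the `x-logview-token` request header mentioned above; the function name `buildAuthedGetOptions` and any data URL you pass it to are hypothetical:

```javascript
// Hypothetical helper: build UrlFetchApp-style options for an authenticated GET.
// ASSUMPTION: the API reads the session token from the x-logview-token header;
// check the browser's Network tab to confirm the real header name.
function buildAuthedGetOptions(token) {
  return {
    method: 'get',
    headers: {
      'x-logview-token': token,
      accept: 'application/json',
    },
    // Return error responses instead of throwing, so the status can be inspected.
    muteHttpExceptions: true,
  };
}

// In Apps Script, a call might look like (HYPOTHETICAL_DATA_URL is a placeholder):
// const res = UrlFetchApp.fetch(HYPOTHETICAL_DATA_URL, buildAuthedGetOptions(token));
const options = buildAuthedGetOptions('this-is-the-token-value');
console.log(options.headers['x-logview-token']); // prints: this-is-the-token-value
```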
<!DOCTYPE html>
<html lang=en>
<head>
<meta charset=utf-8>
<meta http-equiv=X-UA-Compatible content="IE=edge">
<meta name=viewport content="width=device-width,initial-scale=1">
<link rel=icon href=/favicon.ico>
<link rel=stylesheet href=//cdn.materialdesignicons.com/3.4.93/css/materialdesignicons.min.css>
<title>123ROBO 通話履歴</title>
<link href=/css/app.5339eed8.css rel=preload as=style>
<link href=/css/chunk-vendors.8b9ade74.css rel=preload as=style>
<link href=/js/app.32f2c21e.js rel=preload as=script>
<link href=/js/chunk-vendors.cd62bd72.js rel=preload as=script>
<link href=/css/chunk-vendors.8b9ade74.css rel=stylesheet>
<link href=/css/app.5339eed8.css rel=stylesheet>
</head>
<body><noscript><strong>We're sorry but logview doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript>
<div id=app></div>
<script src=/js/chunk-vendors.cd62bd72.js></script>
<script src=/js/app.32f2c21e.js></script>
</body>
</html>
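This HTML is only a shell: `<div id=app></div>` is empty and the table is filled in by the JavaScript bundles, so fetching the page itself with `UrlFetchApp` will not return the table data. One way to find the API the bundle calls is to download `/js/app.32f2c21e.js` and search it for API Gateway URLs. A rough sketch; `findApiEndpoints` is a hypothetical name, the regular expression assumes endpoints follow the `*.execute-api.<region>.amazonaws.com` pattern seen in the login URL, and it will miss URLs the bundle assembles from pieces:

```javascript
// Hypothetical helper: list AWS API Gateway URLs that appear literally in a JS bundle.
// ASSUMPTION: endpoints look like https://<id>.execute-api.<region>.amazonaws.com/...
function findApiEndpoints(bundleText) {
  const pattern = /https:\/\/[a-z0-9]+\.execute-api\.[a-z0-9-]+\.amazonaws\.com[^"'`\s]*/g;
  // Deduplicate while keeping first-seen order.
  return [...new Set(bundleText.match(pattern) || [])];
}

// Usage with a made-up bundle fragment. In Apps Script the text would come from
// UrlFetchApp.fetch('https://calllog-dev.123robo.com/js/app.32f2c21e.js').getContentText().
const fragment =
  'baseURL:"https://dbp3xa4g5g.execute-api.us-west-2.amazonaws.com/dev",' +
  'x:"https://dbp3xa4g5g.execute-api.us-west-2.amazonaws.com/dev"';
console.log(findApiEndpoints(fragment).join('\n'));
// prints: https://dbp3xa4g5g.execute-api.us-west-2.amazonaws.com/dev
```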
Here is the site: https://calllog-dev.123robo.com/#/login
This is the code I have been trying:
function loginTest() {
  var userID = 'user@gmail.com';
  var userPW = 'password';
  var url = 'https://dbp3xa4g5g.execute-api.us-west-2.amazonaws.com/dev/users/authenticate';
  // Added a body as pointed out by Mark. Added request headers as suggested by pguardiario.
  // UrlFetchApp takes the body as `payload` and the content type as `contentType`.
  // The HTTP/2 pseudo-headers (authority, path, scheme) are dropped because they come
  // from the URL, not from regular headers; accept-encoding is left to UrlFetchApp.
  const requestOptions = {
    method: 'post',
    contentType: 'application/json',
    // Field names assumed to match the answer below: { username, password }.
    payload: JSON.stringify({ username: userID, password: userPW }),
    headers: {
      'accept': '*/*',
      'accept-language': 'en-US,en;q=0.9,ja;q=0.8',
      'origin': 'https://calllog-dev.123robo.com',
      'referer': 'https://calllog-dev.123robo.com/',
      'sec-fetch-dest': 'empty',
      'sec-fetch-mode': 'cors',
      'sec-fetch-site': 'cross-site',
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
    },
    // Return 4xx/5xx responses instead of throwing, so the error body can be logged.
    muteHttpExceptions: true,
  };
  var response = UrlFetchApp.fetch(url, requestOptions);
  Logger.log(response.getResponseCode());
  Logger.log(response.getContentText('UTF-8'));
}
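To make the login step easier to test and debug, the request can be wrapped in a function that takes the fetch function as a parameter and checks the HTTP status before parsing. This is a sketch, not the site's documented API: the `username`/`password` field names are taken from the answer below, and `loginAndGetToken` is a hypothetical name. In Apps Script, pass `(url, opts) => UrlFetchApp.fetch(url, opts)`; the stub in the usage example only mimics the two `HTTPResponse` methods the function uses.

```javascript
// Hypothetical login helper. fetchFn must behave like UrlFetchApp.fetch: it takes
// (url, options) and returns an object with getResponseCode() and getContentText().
function loginAndGetToken(fetchFn, url, username, password) {
  const response = fetchFn(url, {
    method: 'post',
    contentType: 'application/json',
    // UrlFetchApp sends the request body from `payload` (fetch() calls this `body`).
    payload: JSON.stringify({ username: username, password: password }),
    // Return 4xx/5xx responses instead of throwing, so the error body can be reported.
    muteHttpExceptions: true,
  });
  const status = response.getResponseCode();
  const text = response.getContentText();
  if (status < 200 || status >= 300) {
    throw new Error('Login failed with HTTP ' + status + ': ' + text);
  }
  return JSON.parse(text).token;
}

// Usage with a stub standing in for UrlFetchApp.fetch (outside Apps Script).
const fakeFetch = (url, options) => ({
  getResponseCode: () => 200,
  getContentText: () =>
    JSON.stringify({ username: JSON.parse(options.payload).username, token: 'stub-token' }),
});
const token = loginAndGetToken(fakeFetch, 'https://example.invalid/authenticate', 'user@gmail.com', 'password');
console.log(token); // prints: stub-token
```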
You can't just send an Authorization header when the endpoint expects a body:
// fetch()-style options; `username` and `password` are variables holding the credentials
const requestOptions = {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ username, password })
}
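The options above are written for the browser/Node `fetch()` API. `UrlFetchApp.fetch` names things differently: the body goes in `payload` and the content type in `contentType`. A small, hypothetical adapter (`toUrlFetchOptions`) that converts a simple fetch-style options object might look like this; it only handles `method`, `headers` given as a plain object, and a string `body`, and it adds `muteHttpExceptions` as a debugging-friendly default of its own:

```javascript
// Hypothetical adapter: convert simple fetch()-style options to UrlFetchApp.fetch options.
// Handles only method, plain-object headers and a string body.
function toUrlFetchOptions(fetchOptions) {
  const headers = Object.assign({}, fetchOptions.headers); // copy; undefined is ignored
  const result = {
    method: (fetchOptions.method || 'GET').toLowerCase(),
    muteHttpExceptions: true, // added default: return error responses instead of throwing
  };
  // UrlFetchApp takes the content type as its own option, so move it out of the headers.
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === 'content-type') {
      result.contentType = headers[name];
      delete headers[name];
    }
  }
  result.headers = headers;
  if (fetchOptions.body !== undefined) {
    result.payload = fetchOptions.body; // fetch's `body` becomes UrlFetchApp's `payload`
  }
  return result;
}

const converted = toUrlFetchOptions({
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ username: 'user@gmail.com', password: 'password' }),
});
console.log(converted.method, converted.contentType); // prints: post application/json
```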
This page has a common DOM issue, as well as some typos:
[DOM] Password field is not contained in a form: (More info: https://www.chromium.org/developers/design-documents/create-amazing-password-forms )
<input type="password" autocomplete="on" class="input">