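Log in to and scrape a site that uses CORS (AWS) and JS to populate itself, with Google Apps Script
I'm trying to access the Japanese site below and scrape data from a table, but I'm struggling to log in with Google Apps Script. I need a solution that doesn't depend on a desktop and can run entirely online. I don't have much experience with web development or web scraping, so I'm basically learning as I go.
I have the username and password, but:
2. The login page uses CORS and an AWS API for authentication, so there are no cookies until I have logged in successfully and sent a GET request through the browser.
3. There are multiple tokens: an x-logview-token, returned in the response to the login POST request, and a Page Token that is generated on every page.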
Response to the login POST request:
{"username":"user@gmail.com","token":"this-is-the-token-value","enableDigits":true}
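A response of this shape could be parsed and its token reused on later requests. The following is only a minimal sketch: the helper name buildAuthHeaders is hypothetical, and sending the token back in an x-logview-token request header is an assumption based on the token name mentioned above, not something the API is confirmed to require.

```javascript
// Parse the login response body and build headers for a follow-up request.
// Assumption: the API expects the token back in an "x-logview-token" header.
function buildAuthHeaders(loginResponseText) {
  var data = JSON.parse(loginResponseText);
  if (!data.token) {
    throw new Error('Login response did not contain a token');
  }
  return { 'x-logview-token': data.token };
}

// Example using the response shape from the login POST request.
var sample = '{"username":"user@gmail.com","token":"this-is-the-token-value","enableDigits":true}';
var authHeaders = buildAuthHeaders(sample);
```

In Apps Script, authHeaders could then be passed as the headers option of a later UrlFetchApp.fetch call, provided the API really does read the token from that header.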
I'm thinking of recreating the cookies from the browser's GET request and sending them through Google Apps Script. Is there a way to bypass the login, or to log in using cookies?
<!DOCTYPE html>
<html lang=en>
<head>
<meta charset=utf-8>
<meta http-equiv=X-UA-Compatible content="IE=edge">
<meta name=viewport content="width=device-width,initial-scale=1">
<link rel=icon href=/favicon.ico>
<link rel=stylesheet href=//cdn.materialdesignicons.com/3.4.93/css/materialdesignicons.min.css>
<title>123ROBO 通話履歴</title>
<link href=/css/app.5339eed8.css rel=preload as=style>
<link href=/css/chunk-vendors.8b9ade74.css rel=preload as=style>
<link href=/js/app.32f2c21e.js rel=preload as=script>
<link href=/js/chunk-vendors.cd62bd72.js rel=preload as=script>
<link href=/css/chunk-vendors.8b9ade74.css rel=stylesheet>
<link href=/css/app.5339eed8.css rel=stylesheet>
</head>
<body><noscript><strong>We're sorry but logview doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript>
<div id=app></div>
<script src=/js/chunk-vendors.cd62bd72.js></script>
<script src=/js/app.32f2c21e.js></script>
</body>
</html>
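The page source above is only a JavaScript application shell: the div with id app is empty until the scripts run, so fetching this HTML alone will not return the table. A rough check for that situation might look like the sketch below. The function name isEmptyAppShell is hypothetical, and the regular expression is a heuristic for this particular mount element, not a general HTML parser.

```javascript
// Heuristic: returns true when the HTML contains an empty <div id=app>,
// i.e. the page content is rendered later by JavaScript.
function isEmptyAppShell(html) {
  return /<div\s+id=["']?app["']?\s*>\s*<\/div>/i.test(html);
}

var shell = '<body><div id=app></div><script src=/js/app.32f2c21e.js></script></body>';
var rendered = '<body><div id="app"><table><tr><td>row</td></tr></table></div></body>';
var shellOnly = isEmptyAppShell(shell);       // empty mount element
var hasContent = !isEmptyAppShell(rendered);  // mount element already filled
```

When the check is true, the data has to come from the API calls the scripts make rather than from the HTML itself.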
Here is the site: https://calllog-dev.123robo.com/#/login
Here is the code I've been trying to use:
function loginTest(){
  // Login credentials (placeholders)
  var userID = 'user@gmail.com';
  var userPW = 'password';
  var url = 'https://dbp3xa4g5g.execute-api.us-west-2.amazonaws.com/dev/users/authenticate';
  // Added a body as pointed out by Mark. Added request headers as suggested by pguardiario.
  const requestOptions = {
    method: 'post',
    contentType: 'application/json',
    headers: {
      'accept': '*/*',
      'accept-encoding': 'gzip, deflate, br',
      'accept-language': 'en-US,en;q=0.9,ja;q=0.8',
      'origin': 'https://calllog-dev.123robo.com',
      'referer': 'https://calllog-dev.123robo.com/',
      'sec-fetch-dest': 'empty',
      'sec-fetch-mode': 'cors',
      'sec-fetch-site': 'cross-site',
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36'
    },
    // UrlFetchApp reads the request body from `payload`
    payload: JSON.stringify({ username: userID, password: userPW })
  };
  var response = UrlFetchApp.fetch(url, requestOptions);
  Logger.log(response);
  Logger.log(response.getContentText("UTF-8"));
}
You can't just send an Authorization header when it requires a body:
const requestOptions = {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ username, password })
}
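The options above are written in the style of the browser fetch API. In Google Apps Script, UrlFetchApp.fetch takes the request body in payload rather than body, so an equivalent might look like the sketch below. The function names buildLoginOptions and loginWithBody are hypothetical, and the username and password field names are taken from the snippet above; whether the API accepts exactly these fields is an assumption.

```javascript
// Build UrlFetchApp options for a JSON login POST.
// UrlFetchApp uses `payload` for the body and `contentType` for the MIME type.
function buildLoginOptions(username, password) {
  return {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify({ username: username, password: password }),
    muteHttpExceptions: true // return 4xx/5xx responses instead of throwing
  };
}

// Runs only inside Apps Script, where UrlFetchApp and Logger are defined.
function loginWithBody() {
  var url = 'https://dbp3xa4g5g.execute-api.us-west-2.amazonaws.com/dev/users/authenticate';
  var response = UrlFetchApp.fetch(url, buildLoginOptions('user@gmail.com', 'password'));
  Logger.log(response.getResponseCode());
  Logger.log(response.getContentText());
}
```

With muteHttpExceptions set, a failed login comes back as a normal response, so the status code and error body can be logged and inspected instead of the script aborting.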
This page also has a general DOM issue, as well as some typos:
[DOM] Password field is not contained in a form: (More info: https://www.chromium.org/developers/design-documents/create-amazing-password-forms)
<input type="password" autocomplete="on" class="input">