When securing a comment form and its related API endpoint, should input be sanitized, validated and encoded in the browser, on the server, or both?

Question

I am trying to secure, as best as possible, a comment form in a non-CMS environment with no user authentication.

The form should be secure against both browser requests and curl/Postman-type requests.

Environment

Backend - Node.js, MongoDB Atlas and Azure web app.
Frontend - jQuery.

Below is a detailed, but hopefully not too overwhelming, overview of my current working implementation.

Following that are my questions about the implementation.

Related Libraries Used

Helmet - helps secure Express apps by setting various HTTP headers, including Content Security Policy
reCaptcha v3 - protects against spam and other types of automated abuse
DOMPurify - an XSS sanitizer
validator.js - a library of string validators and sanitizers
he - an HTML entity encoder/decoder

The general flow of data is:

/*
on click event:  
- get sanitized data
- perform some validations
- html encode the values
- get recaptcha v3 token from google
- send all data, including token, to server
- send token to google to verify
- if the response 'score' is above 0.5, add the submission to the database  
- return the entry to the client and populate the DOM with the submission   
*/

POST request - browser

// test input:  
// <script>alert("hi!")</script><h1>hello there!</h1> <a href="">link</a>

// sanitize the input  
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });

// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });

// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true) {

/* 
encode the sanitized input 
not sure if i should encode BEFORE adding to MongoDB  
or just add to database "as is" and encode BEFORE displaying in the DOM with $("#output").html(html_content);
*/  
var sanitized_encoded_input_1_text = he.encode(sanitized_input_1_text);
var sanitized_encoded_input_2_text = he.encode(sanitized_input_2_text);

// define parameters to send to database  
var parameters = {};
parameters.input_1_text = sanitized_encoded_input_1_text; 
parameters.input_2_text = sanitized_encoded_input_2_text; 

// get token from google and send token and input to database
// see:  https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
grecaptcha.ready(function() {
    grecaptcha.execute('site-key-here', { action: 'submit' }).then(function(token) {
        parameters.token = token;
        jquery_ajax_call_to_my_api(parameters);
    });
});
}

POST request - server

var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;

// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};

/*
if google's response 'score' is at least 0.5, 
add submission to the database and populate client DOM with $("#output").prepend(html); 
see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score
*/
if (score >= 0.5) {

    // add submission to database 
    // return submission to client to update the DOM
    // DOM will just display this text:  <h1>hello there!</h1> <a href="">link</a>
}

GET request on page load

Logic/Assumptions:

Get all submissions, return to client and add to DOM with $("#output").html(html_content);
Don't need to encode values before populating DOM because values are already encoded in database?

POST request from curl, postman etc

Logic/Assumptions:

They don't have a Google token, so the server can't verify one, and they therefore can't add entries to the database?
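
A request sent from curl or Postman can still include a made-up or replayed token field, so the server has to treat a missing or failed verification as a rejection rather than count on the token being absent. Below is a minimal sketch of that idea, not a definitive implementation. `buildVerifyRequest` and `isAcceptable` are hypothetical helper names; the form-encoded POST and the `success`, `score` and `action` response fields follow what the reCAPTCHA v3 verification docs describe.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any library.

// Build the siteverify request as a form-encoded POST so the secret and
// token are URL-encoded instead of being interpolated into a query string.
function buildVerifyRequest(secret, token) {
    return {
        url: "https://www.google.com/recaptcha/api/siteverify",
        options: {
            method: "POST",
            headers: { "Content-Type": "application/x-www-form-urlencoded" },
            body: new URLSearchParams({ secret: secret, response: token }).toString()
        }
    };
}

// Decide whether a parsed siteverify response is good enough to continue.
function isAcceptable(verification, expectedAction, minScore) {
    if (!verification || verification.success !== true) return false;
    if (verification.action !== expectedAction) return false;
    return typeof verification.score === "number" && verification.score >= minScore;
}

console.log(isAcceptable({ success: true, score: 0.9, action: "submit" }, "submit", 0.5)); // true
console.log(isAcceptable({ success: false }, "submit", 0.5));                              // false
```

In the route handler this would sit behind an early check that `req.body.token` is a non-empty string, followed by `fetch(request.url, request.options)` and a rejection whenever `isAcceptable(...)` returns false.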

Helmet configuration on server

app.use(
    helmet({
        contentSecurityPolicy: {
            directives: {
                defaultSrc: ["'self'"],
                scriptSrc: ["'self'", "https://somedomain.io", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
                styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
                fontSrc: ["'self'", "fonts.gstatic.com"],
                imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:"],
                frameSrc: ["'self'", "https://www.google.com"]
            }
        },
    })
);

Questions

Should I add values to the MongoDB database as HTML encoded entities OR store them "as is" and just encode them before populating the DOM with them?
If the values were to be saved as HTML entities in MongoDB, would this make searching the database for content difficult, because searching for, for example, "<h1>hello there!</h1> <a href="">link</a>" wouldn't return any results when the value in the database was "&lt;h1&gt;hello there!&lt;/h1&gt; &lt;a href=&quot;&quot;&gt;link&lt;/a&gt;"?
In my reading about securing web forms, much has been said about client side practises being fairly redundant as anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint using curl or postman and therefore bypass any client side approaches.
With that said, should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed: 1) client side only, 2) client side and server side, or 3) server side only?

For thoroughness, here is another related question:

Do any of the following components do any automatic escaping or HTML encoding when sending data from client to server? I ask because if they do, it may make some manual escaping or encoding unnecessary.

jQuery ajax() requests
Node.js
Express
Helmet
bodyParser (node package)
MongoDB native driver
MongoDB
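
As far as I know, none of the components listed above HTML-encodes a string on its way from the browser to the database. The transport layers only apply URL or JSON encoding, and the server-side parser reverses it. The sketch below illustrates this with plain JavaScript built-ins standing in for what jQuery and the body parser do on each side. That stand-in is an assumption made for illustration, and the code does not call either library.

```javascript
var comment = '<h1>hello there!</h1> <a href="">link</a>';

// Form-style body, roughly what $.ajax produces from a plain data object.
var formBody = new URLSearchParams({ input_1_text: comment }).toString();
// What a urlencoded body parser hands back to the route handler.
var parsedForm = new URLSearchParams(formBody).get("input_1_text");

// JSON body, as sent with contentType "application/json".
var jsonBody = JSON.stringify({ input_1_text: comment });
var parsedJson = JSON.parse(jsonBody).input_1_text;

// Both round trips give back the original markup unchanged.
console.log(parsedForm === comment); // true
console.log(parsedJson === comment); // true
```

So the string that reaches the route handler still contains the raw `<h1>` and `<a>` tags, and any HTML encoding has to be done explicitly.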

Answer 1

You should always ensure that all data you use is sanitized on the backend before use!

See https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html

Answer 2

After reading more around the topic, this is the approach I came up with:

On click event:

Sanitize data (DOMPurify)
Validate data (validator.js)
Get recaptcha v3 token from google (reCaptcha v3)
Send all data, including token, to server
Server is using Helmet
Server is using Express Rate Limit and Rate Limit Mongo to limit POST requests on a certain route to X per Y milliseconds (by IP address)
Server is behind Cloudflare proxy which provides some security and caching features (requires setting app.set('trust proxy', true) in node server file in order for rate limiter to pick up the user's actual IP address - see Express behind proxies)
Send token to google from server to verify (reCaptcha v3)
If the response 'score' is 0.5 or above, perform the same sanitization and validations again
If the validations pass, add entry to database with a moderated flag value of false
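
To make the rate-limiting step in the list above concrete, here is a dependency-free sketch of the general fixed-window counting idea. It is not the Express Rate Limit or Rate Limit Mongo API. `createLimiter`, its parameters and the in-memory `Map` are assumptions for illustration only, whereas the real setup keeps its counters in MongoDB.

```javascript
// Hypothetical helper: allow at most `max` hits per `windowMs` for each key.
function createLimiter(max, windowMs) {
    var hits = new Map(); // key (e.g. client IP address) -> { count, windowStart }

    return function isAllowed(key, now) {
        var entry = hits.get(key);
        // Start a fresh window for unseen keys or when the old window expired.
        if (!entry || now - entry.windowStart >= windowMs) {
            hits.set(key, { count: 1, windowStart: now });
            return true;
        }
        entry.count += 1;
        return entry.count <= max;
    };
}

var isAllowed = createLimiter(2, 60000);
console.log(isAllowed("203.0.113.7", 0));     // true
console.log(isAllowed("203.0.113.7", 1000));  // true
console.log(isAllowed("203.0.113.7", 2000));  // false, third hit in the window
console.log(isAllowed("203.0.113.7", 60000)); // true, a new window has started
```

The key only identifies the real visitor if Express resolves the client IP correctly, which is why the trust proxy setting matters behind Cloudflare.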

Rather than immediately return entries to the browser, I decided instead to require a process of manual moderation, which involves changing the moderated value of an entry to true. Whilst it takes away the immediacy of the response for the user, it makes it less tempting for spammers etc if responses aren't immediately published.

The GET request on page load then returns all entries that are moderated: true
HTML encode the values before displaying them (he)
Populate the DOM with the HTML encoded entries
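
The three steps above can be sketched roughly as follows. `escapeHtml` is a small hand-written stand-in for `he.encode`, not its implementation, and the sample entries and the `renderEntries` name are made up for illustration. The `filter` call stands in for the database query that returns only moderated entries.

```javascript
// Stand-in for he.encode: escape the characters that are significant in HTML.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#39;");
}

// Keep only approved entries and encode each value right before output.
function renderEntries(entries) {
    return entries
        .filter(function (entry) { return entry.moderated === true; })
        .map(function (entry) { return "<li>" + escapeHtml(entry.input_1_text) + "</li>"; })
        .join("");
}

var html_content = renderEntries([
    { input_1_text: "Tom & Jerry", moderated: true },
    { input_1_text: "<script>alert('hi!')</script>", moderated: false }
]);
console.log(html_content); // <li>Tom &amp; Jerry</li>
// In the browser: $("#output").html(html_content);
```

For plain-text comments, jQuery's `.text()` is an alternative to manual encoding, because it inserts its argument as text rather than parsing it as HTML.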

The code looked something like this:

POST request - browser

// sanitize the input  
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });

// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });

// validation - regex to only allow certain characters
// for pattern, see:  https://stackoverflow.com/q/63895992
var pattern = /^(?!.*([ ,'-])\1)[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;
var input_1_text_valid_characters = validator.matches(sanitized_input_1_text, pattern, "gm");
var input_2_text_valid_characters = validator.matches(sanitized_input_2_text, pattern, "gm");

// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true && input_1_text_valid_characters === true && input_2_text_valid_characters === true) {

// define parameters to send to database  
var parameters = {};
parameters.input_1_text = sanitized_input_1_text; 
parameters.input_2_text = sanitized_input_2_text; 

// get token from google and send token and input to database
// see:  https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
grecaptcha.ready(function() {
    grecaptcha.execute('site-key-here', { action: 'submit_entry' }).then(function(token) {
        parameters.token = token;
        jquery_ajax_call_to_my_api(parameters);
    });
});
}

POST request - server

var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;

// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};

// if google's response 'score' is at least 0.5, 
// see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score  

if (score >= 0.5) {

// perform all the same sanitizations and validations to protect against
// POST requests direct to the API via curl or postman etc  
// if validations pass, add entry to the database with `moderated: false` property   

}
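
The repeated checks described in the comments above could look roughly like this on the server. This is a sketch using only built-in string and RegExp operations in place of validator.js. `validateField` and `buildDocument` are hypothetical names, and the pattern is the one from the browser code.

```javascript
// Same pattern as in the browser code: letters separated by spaces, commas,
// apostrophes or hyphens, with no separator character doubled.
var pattern = /^(?!.*([ ,'-])\1)[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;

// Hypothetical helper: true only for a 1-140 character string matching the pattern.
function validateField(value) {
    if (typeof value !== "string") return false; // rejects objects such as { $gt: "" }
    var trimmed = value.trim();
    return trimmed.length >= 1 && trimmed.length <= 140 && pattern.test(trimmed);
}

// Hypothetical helper: returns the document to insert, or null to reject.
function buildDocument(body) {
    if (!body || !validateField(body.input_1_text) || !validateField(body.input_2_text)) {
        return null;
    }
    return {
        input_1_text: body.input_1_text.trim(),
        input_2_text: body.input_2_text.trim(),
        moderated: false
    };
}

console.log(buildDocument({ input_1_text: "hello there", input_2_text: "O'Brien" }));
console.log(buildDocument({ input_1_text: "<script>", input_2_text: "ok" }));       // null
console.log(buildDocument({ input_1_text: { $gt: "" }, input_2_text: "ok" }));      // null
```

The `typeof` check matters with MongoDB in particular, since a JSON body can carry an object where a string is expected. Because the pattern only admits letters and a few separators, nothing resembling markup gets past it.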

GET request - browser

Logic:

Get all entries with moderated: true property
HTML encode values before populating DOM

Helmet configuration on server

app.use(
    helmet({
        contentSecurityPolicy: {
            directives: {
                defaultSrc: ["'self'"],
                scriptSrc: ["'self'", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
                connectSrc: ["'self'", "https://some-domain.com", "https://some.other.domain.com"],
                styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
                fontSrc: ["'self'", "fonts.gstatic.com"],
                imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:", "https://another-domain.com"],
                frameSrc: ["'self'", "https://www.google.com"]
            }
        },
    })
);

In answer to my questions in the OP:

Should I add values to the MongoDB database as HTML encoded entities OR store them "as is" and just encode them before populating the DOM with them?

As long as the input is sanitised and validated on both client and server, you should only need to HTML encode just before populating the DOM.

If the values were to be saved as HTML entities in MongoDB, would this make searching the database for content difficult, because searching for, for example, "<h1>hello there!</h1> <a href="">link</a>" wouldn't return any results when the value in the database was "&lt;h1&gt;hello there!&lt;/h1&gt; &lt;a href=&quot;&quot;&gt;link&lt;/a&gt;"?

I figured it would make database entries look messy if they were filled with HTML encoded values, so I store the sanitized, validated entries "as is".
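
The search concern from the question is also easy to reproduce: a raw search string does not match a value that was stored in encoded form. In this sketch `encodeForHtml` is a hand-written stand-in for an HTML entity encoder and the arrays stand in for a MongoDB collection. Both are assumptions for illustration.

```javascript
// Stand-in encoder covering only the characters needed for this demo.
function encodeForHtml(text) {
    return text.replace(/[&<>"]/g, function (ch) {
        return { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" }[ch];
    });
}

var submitted = '<h1>hello there!</h1> <a href="">link</a>';
var storedRaw = [submitted];
var storedEncoded = [encodeForHtml(submitted)];

// Simple substring search over the stored values.
function search(collection, term) {
    return collection.filter(function (value) { return value.includes(term); });
}

console.log(search(storedRaw, "<h1>hello").length);                    // 1
console.log(search(storedEncoded, "<h1>hello").length);                // 0
console.log(search(storedEncoded, encodeForHtml("<h1>hello")).length); // 1
```

Storing values "as is" avoids having to encode every search term the same way the stored data was encoded.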

In my reading about securing web forms, much has been said about client side practises being fairly redundant as anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint using curl or postman and therefore bypass any client side approaches.

With that said, should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed: 1) client side only, 2) client side and server side, or 3) server side only?

Option 2, sanitize and validate input on client and server.

Answer 1
0 2020-09-10 05:55:53

Answer 2
0 Accepted 2020-09-24 14:14:33
