How to search the text between the HTML tags

I'm using mongoJS to run my database queries. I've run into a problem: the stored strings contain HTML tags, and I search the collection with regular expressions. How can I search the text while ignoring the HTML tags?

var userInput = $scope.userInput; // value from user input
db.collections.find({'obj': {$regex: new RegExp(userInput) } }).toArray(function(err, result){ 
  return res.json(result); 
});

Collection

[{_id:"34aw34d343s4", obj:"How are you?"},
{_id:"34asdfwer343s4", obj:"Are you okay?"},
{_id:"3sDaweqr43s4", obj:"Goodbye, my friend!"},
{_id:"34aw3sdfgds3s4", obj:"Do you know these are <strong>important</strong> items"}]

User input

these are
these
these are important

Output

[{_id:"34aw3sdfgds3s4", obj:"Do you know these are <strong>important</strong> items"}]
[{_id:"34aw3sdfgds3s4", obj:"Do you know these are <strong>important</strong> items"}]
[]

Expected

[{_id:"34aw3sdfgds3s4", obj:"Do you know these are <strong>important</strong> items"}]
[{_id:"34aw3sdfgds3s4", obj:"Do you know these are <strong>important</strong> items"}]
[{_id:"34aw3sdfgds3s4", obj:"Do you know these are <strong>important</strong> items"}]

You should sanitize user input before it goes into the database. From my understanding of your system, there is a good chance that user input is not sanitized before it is inserted into the database, which would leave your site vulnerable to an XSS attack.

I recommend using a library like sanitize-html, both to secure your site against cross-site scripting and as an answer to this question.

You could use the RegExp test method: /these|are/.test(stringToCheckAgainst); Note that this alternation matches any string that contains either "these" or "are".

var testCases = ["these are", "these", "these are <strong>item</strong>"];
testCases.forEach(function(value) {
  document.write(/these|are/.test(value) + "\n");
});
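The `/these|are/` pattern above matches as soon as either word is present, so it does not require the whole phrase. One possible way to match the full phrase while tolerating tags between words is to build the pattern from the user input. The sketch below is only an illustration: `escapeRegExp` and `buildTagTolerantRegExp` are hypothetical helper names that do not appear in the original thread, and it assumes tags never occur in the middle of a word.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex,
// so raw user input cannot change the structure of the pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a pattern in which any run of whitespace and/or
// HTML tags may appear between the words of the user's phrase.
function buildTagTolerantRegExp(userInput) {
  var words = userInput.trim().split(/\s+/).map(escapeRegExp);
  return new RegExp(words.join('(?:\\s|<[^>]+>)+'));
}

var doc = 'Do you know these are <strong>important</strong> items';
var results = ['these are', 'these', 'these are important'].map(function (input) {
  return buildTagTolerantRegExp(input).test(doc);
});
console.log(results); // [ true, true, true ]
```

Because the result is an ordinary RegExp, it could presumably be passed as the `$regex` value in the query from the question in place of `new RegExp(userInput)`, though that has not been verified against mongoJS here.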

If you want to remove the HTML tags, you can use one of the following methods:

  1. jQuery(html).text();
  2. yourStr.replace(/<(?:.|\n)*?>/gm, '');
  3. yourStr.replace(/<[^>]+>/g, '');
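As a rough sketch of how method 3 could be combined with the search from the question, the snippet below strips the tags from each document and then looks for the phrase in the remaining text. `stripTags` and `searchPlainText` are hypothetical names, and the filtering runs in application code on an in-memory array rather than inside MongoDB, which is an assumption made to keep the example self-contained.

```javascript
// Method 3 from the list above: drop anything that looks like an HTML tag.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, '');
}

// Hypothetical helper: keep documents whose tag-free text contains the phrase.
function searchPlainText(docs, userInput) {
  return docs.filter(function (doc) {
    return stripTags(doc.obj).indexOf(userInput) !== -1;
  });
}

// Two of the documents from the question's collection.
var collection = [
  { _id: '34aw34d343s4', obj: 'How are you?' },
  { _id: '34aw3sdfgds3s4', obj: 'Do you know these are <strong>important</strong> items' }
];

var matches = searchPlainText(collection, 'these are important');
console.log(matches.map(function (doc) { return doc._id; })); // [ '34aw3sdfgds3s4' ]
```

A regex like this is not a full HTML parser and should not be relied on as XSS protection. For larger collections, storing a tag-free copy of the text at insert time and querying that field may be a better fit than filtering after the fetch.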

More on this: Strip HTML from Text JavaScript
