Check if string contains spam words
I'm trying to implement a bit of code I found on Stack Overflow that acts as a spam-word filter. When I type only a spam word the function works, but when I type a bunch of text before the spam word it passes. I've checked the source and I must be missing something. Can anyone help?
The code is:
function strpos_arr($haystack, $needle) {
if(!is_array($needle)) $needle = array($needle);
foreach($needle as $what) {
if(($pos = strpos($haystack, $what))!==false) return $pos;
}
return false;
}
I'm calling the function like this:
if(strpos_arr($text, $bad_words)) {
return false;
} else {
return true;
}
The array is just a simple array with a lot of bad words, like so:
$bad_words = array(
'bad word 1',
'bad word 2');
Link to the original article: Using an array as needles in strpos
Thanks
Firstly, it looks like you have your logic the wrong way round. I think:
if(strpos_arr($text, $bad_words)) {
return false;
} else {
return true;
}
should be:
if (strpos_arr($text, $bad_words)) {
return TRUE;
} else {
return FALSE;
}
Then, you're returning $pos if a bad word is found. If $pos happens to be zero, it's going to fail the next check, because 0 is treated as false in a condition. Unless you need to know the position of the bad word in the text, I would change it to:
if (($pos = strpos($haystack, $what)) !== FALSE) return TRUE;
The function strpos_arr returns the position of the first "needle" found in the string:
if(($pos = strpos($haystack, $what))!==false) return $pos;
or false if there aren't any "needles" in the text.
This means that strpos_arr($text, $bad_words) returns false if there is no bad word in the text. Otherwise it returns an integer with the position of the first bad word found in the string.
Notice that when the text starts with a bad word, it will return 0, which is treated as false in an if condition. That's why when you "just type a spam word the function works however when I type in a bunch of text before the spam word it passes".
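To make the pitfall concrete, here is a minimal sketch that reuses strpos_arr exactly as posted in the question. The sample word list and input strings are made up for illustration.

```php
<?php
// strpos_arr as posted in the question.
function strpos_arr($haystack, $needle) {
    if (!is_array($needle)) $needle = array($needle);
    foreach ($needle as $what) {
        if (($pos = strpos($haystack, $what)) !== false) return $pos;
    }
    return false;
}

// Hypothetical sample data, for illustration only.
$bad_words = array('viagra', 'casino');

$atStart = strpos_arr('viagra for sale', $bad_words);       // int(0)
$later   = strpos_arr('cheap viagra for sale', $bad_words); // int(6)
$none    = strpos_arr('hello world', $bad_words);           // bool(false)

// A loose check treats position 0 the same as "not found".
var_dump((bool) $atStart);    // bool(false)
var_dump((bool) $later);      // bool(true)

// A strict comparison distinguishes position 0 from false.
var_dump($atStart !== false); // bool(true)
var_dump($none !== false);    // bool(false)
```

Comparing the result against false with !== is the usual way to test strpos-style return values, since a match at the very start of the string is a valid result.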
You could implement a function to find bad words like this:
function has_bad_word($text, array $bad_words) {
return strpos_arr($text, $bad_words) !== false;
}
Notice, though, that strpos_arr is case sensitive and will report a match whenever any of the needles is a substring of the haystack, even when it's part of a larger word. This function solves both issues:
function has_bad_word($text, array $bad_words) {
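// Quote every word with "/" as the delimiter (passing array('/') to array_map would only apply it to the first word).
$pregQuotedBadWords = array_map(function ($word) { return preg_quote($word, '/'); }, $bad_words);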
$badWordsRegex = '/((\s+|^)'
. join('(\s+|$))|((\s+|^)', $pregQuotedBadWords)
. '(\s+|$))/is';
return preg_match($badWordsRegex, $text) > 0;
}
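As a rough usage sketch, the following shows how a regex check of this kind behaves with case differences and with words embedded in larger words. The name contains_bad_word and the sample word list are hypothetical, and the pattern is a condensed variant of the one above (a single alternation with non-capturing groups), not the answer's exact code.

```php
<?php
// Hypothetical helper: a condensed variant of the regex approach above.
function contains_bad_word(string $text, array $bad_words): bool {
    if (count($bad_words) === 0) {
        return false; // nothing to look for
    }
    // Escape regex metacharacters and the "/" delimiter in every word.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);
    // A word must be preceded by whitespace or the start of the text,
    // and followed by whitespace or the end of the text.
    $regex = '/(?:\s|^)(?:' . implode('|', $quoted) . ')(?:\s|$)/i';
    return preg_match($regex, $text) === 1;
}

// Made-up sample words, for illustration only.
$bad_words = array('casino', 'act now');

var_dump(contains_bad_word('Visit our CASINO today', $bad_words)); // bool(true)
var_dump(contains_bad_word('Please act now', $bad_words));         // bool(true)
var_dump(contains_bad_word('The casinos are closed', $bad_words)); // bool(false)
var_dump(contains_bad_word('casino!', $bad_words));                // bool(false)
```

The last call shows a limitation of whitespace-delimited matching: a bad word followed directly by punctuation is not detected. Whether that is acceptable depends on how strict the filter needs to be.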
I've implemented something similar using a highlight library for jQuery. Basically, I provide a list of 700+ spam words and the library highlights each word that matches the regex. Have a look at the source code (here) to see how it's implemented.
Here's a snippet:
$(function () {
$("#spam-checker--textarea").highlightWithinTextarea({
highlight: [
{ highlight: /\baccess\b/gi, keyword: "Access", category: "urgency" },
{ highlight: /\baccess now\b/gi, keyword: "Access now", category: "urgency" },
{ highlight: /\bact\b/gi, keyword: "Act", category: "urgency" },
{ highlight: /\bact immediately\b/gi, keyword: "Act immediately", category: "urgency" },
{ highlight: /\bact now\b/gi, keyword: "Act now", category: "urgency" },
{ highlight: /\bact now!/gi, keyword: "Act now!", category: "urgency" }, // no trailing \b: "!" is not a word character
{ highlight: /\baction\b/gi, keyword: "Action", category: "urgency" },
{ highlight: /\baction required\b/gi, keyword: "Action required", category: "urgency" },
{ highlight: /\bapply here\b/gi, keyword: "Apply here", category: "urgency" },
{ highlight: /\bapply now\b/gi, keyword: "Apply now", category: "urgency" },
{ highlight: /\bapply now!/gi, keyword: "Apply now!", category: "urgency" }, // no trailing \b: "!" is not a word character
{ highlight: /\bapply online\b/gi, keyword: "Apply online", category: "urgency" },
// ...
]
})
})