Check if string contains spam words

I am trying to implement a bit of code I found on Stack Overflow that acts as a spam word filter. When I type just a spam word, the function works; however, when I type a bunch of text before the spam word, it passes. I've checked the source and I must be missing something. Can anyone help?

The code is:

function strpos_arr($haystack, $needle) {
    if(!is_array($needle)) $needle = array($needle);
    foreach($needle as $what) {
        if(($pos = strpos($haystack, $what))!==false) return $pos;
    }
    return false;
}

I'm calling the function like this:

if(strpos_arr($text, $bad_words)) {
    return false;
} else {
    return true;
}

The array is just a simple array with a lot of bad words, like so:

$bad_words = array(
        'bad word 1',
        'bad word 2');

Link to the original article: Using an array as needles in strpos

Thanks

Firstly, it looks like you have your logic the wrong way round. I think:

if(strpos_arr($text, $bad_words)) {
    return false;
} else {
    return true;
}

should be:

if (strpos_arr($text, $bad_words)) {
    return TRUE;
} else {
    return FALSE;
}

Then, you're returning $pos if a bad word is found. If $pos happens to be zero (the bad word sits at the very start of the text), the if check treats it as false and the match is missed. Unless you need to know the position of the bad word in the text, I would change it to:

if (($pos = strpos($haystack, $what)) !== FALSE) return TRUE;
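Putting both changes together, a minimal self-contained sketch might look like the following. The function name contains_any and the sample strings are made up for illustration; they do not come from the question.

```php
<?php
// Hypothetical helper (the name is an assumption): returns true as soon as
// any needle occurs anywhere in the haystack, including at offset 0.
function contains_any(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        // Compare against false explicitly; a match at offset 0 is still a match.
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$bad_words = array('bad word 1', 'bad word 2');

var_dump(contains_any('bad word 1 at the start', $bad_words));    // bool(true)
var_dump(contains_any('some text, then bad word 2', $bad_words)); // bool(true)
var_dump(contains_any('nothing to see here', $bad_words));        // bool(false)
```

Because the helper returns a strict boolean, the caller can use it directly in an if statement without worrying about the position of the match.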

The function strpos_arr returns the position of the first "needle" found in the string:

if(($pos = strpos($haystack, $what))!==false) return $pos;

or false if there aren't any "needles" in the text.

This means that strpos_arr($text, $bad_words) returns false if there is no bad word in the text. Otherwise it returns an integer: the position of the first bad word from the array that occurs in the string.

Notice that when the text starts with a bad word, it returns 0, which is treated as false in a boolean context. That's why the outcome differs between the case where you "just type a spam word" and the case where you "type in a bunch of text before the spam word": the first returns 0, the second a positive integer, and your if statement sees them as opposite results.
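To make the difference concrete, here is a small sketch of what strpos itself returns in the three cases. The sample strings and variable names are only illustrative.

```php
<?php
// Sample strings and variable names are assumptions for illustration.
$atStart = strpos('spam and eggs', 'spam'); // match at the very start
$later   = strpos('eggs and spam', 'spam'); // match further into the text
$missing = strpos('eggs and ham', 'spam');  // no match at all

var_dump($atStart);           // int(0)
var_dump($later);             // int(9)
var_dump($missing);           // bool(false)

// In a boolean context 0 and false are indistinguishable...
var_dump((bool) $atStart);    // bool(false)
// ...so the strict comparison is needed to tell "found at 0" from "not found".
var_dump($atStart !== false); // bool(true)
var_dump($missing !== false); // bool(false)
```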

You could implement a function to find bad words like this:

function has_bad_word($text, array $bad_words) {
    // Any result other than false, including position 0, means a bad word was found.
    return strpos_arr($text, $bad_words) !== false;
}

Notice though that strpos_arr is case sensitive and will report a match whenever any string from the needle array is a substring of the haystack, even when it is only part of a larger word. This function solves both issues:

function has_bad_word($text, array $bad_words) {
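    // Escape every word for a regex delimited by "/". Passing array('/') as a
    // second array to array_map would supply the delimiter for the first word only.
    $pregQuotedBadWords = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);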
    $badWordsRegex = '/((\s+|^)'
                     . join('(\s+|$))|((\s+|^)', $pregQuotedBadWords)
                     . '(\s+|$))/is';
    return preg_match($badWordsRegex, $text) > 0;
}
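One thing to be aware of: the pattern above only accepts whitespace or the start/end of the text around a bad word, so a word followed directly by punctuation (for example "spam!") is not reported. If that matters, a possible variant is to use \b word boundaries instead. The sketch below is an alternative under that assumption, not the function from this answer; the name has_bad_word_boundary is made up, and \b only behaves as expected when each bad word starts and ends with a letter, digit, or underscore.

```php
<?php
// Hypothetical variant (the name is an assumption): case-insensitive,
// whole-word matching using \b instead of whitespace checks.
function has_bad_word_boundary(string $text, array $bad_words): bool
{
    if ($bad_words === array()) {
        return false; // avoid building an empty alternation
    }
    // Escape each word for use inside a regex delimited by "/".
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);
    $regex = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_match($regex, $text) === 1;
}

var_dump(has_bad_word_boundary('Buy SPAM!', array('spam')));        // bool(true)
var_dump(has_bad_word_boundary('No spammers here', array('spam'))); // bool(false)
var_dump(has_bad_word_boundary('bad word 1, obviously', array('bad word 1', 'bad word 2'))); // bool(true)
```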

I've implemented something similar using a highlight library for jQuery. Basically, I provide a list of 700+ spam words and the library highlights each word that matches its regex. Have a look at the source code (here) to see how it's implemented:

Here's a snippet:

$(function () {
   $("#spam-checker--textarea").highlightWithinTextarea({
      highlight: [
        { highlight: /\baccess\b/gi, keyword: "Access", category: "urgency" },
        { highlight: /\baccess now\b/gi, keyword: "Access now", category: "urgency" },
        { highlight: /\bact\b/gi, keyword: "Act", category: "urgency" },
        { highlight: /\bact immediately\b/gi, keyword: "Act immediately", category: "urgency" },
        { highlight: /\bact now\b/gi, keyword: "Act now", category: "urgency" },
        { highlight: /\bact now!/gi, keyword: "Act now!", category: "urgency" }, // no \b after "!": it would only match when a word character follows
        { highlight: /\baction\b/gi, keyword: "Action", category: "urgency" },
        { highlight: /\baction required\b/gi, keyword: "Action required", category: "urgency" },
        { highlight: /\bapply here\b/gi, keyword: "Apply here", category: "urgency" },
        { highlight: /\bapply now\b/gi, keyword: "Apply now", category: "urgency" },
        { highlight: /\bapply now!/gi, keyword: "Apply now!", category: "urgency" }, // no \b after "!": it would only match when a word character follows
        { highlight: /\bapply online\b/gi, keyword: "Apply online", category: "urgency" },
        // ...
      ]
   })
})
