Check whether a string contains spam words

Question

I'm trying to implement a bit of code I found on Stack Overflow that implements a spam-word filter. When I type just a spam word, the function works; however, when I type a bunch of text before the spam word, it passes. I've checked the source and I must be missing something. Can anyone help?

The code is:

function strpos_arr($haystack, $needle) {
    if (!is_array($needle)) $needle = array($needle);
    foreach ($needle as $what) {
        if (($pos = strpos($haystack, $what)) !== false) return $pos;
    }
    return false;
}

I'm calling it like this:

if (strpos_arr($text, $bad_words)) {
    return false;
} else {
    return true;
}

The array is just a simple array containing a lot of bad words, like so:

$bad_words = array(
    'bad word 1',
    'bad word 2'
);

Link to the original article: Using an array as needles in strpos

Thanks

Answer 1

Firstly, it looks like you have your logic the wrong way round. I think:

if(strpos_arr($text, $bad_words)) {
    return false;
} else {
    return true;
}

should be:

if (strpos_arr($text, $bad_words)) {
    return TRUE;
} else {
    return FALSE;
}

Then, you're returning $pos if a bad word is found. If $pos happens to be zero (the bad word sits at the very start of the text), the if check treats it as false and the match is missed. Unless you need to know the position of the bad word in the text, I would change it to:

if (($pos = strpos($haystack, $what)) !== FALSE) return TRUE;
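Putting both suggestions together, a minimal sketch could look like the following. The name `contains_any` is hypothetical (it is not from the original post), and the sketch assumes you only need a yes/no answer rather than the position of the match.

```php
<?php
// Hypothetical helper: true as soon as any needle occurs in $haystack.
function contains_any(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        // Strict comparison: a match at offset 0 must still count as found.
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$bad_words = ['bad word 1', 'bad word 2'];
var_dump(contains_any('bad word 1 at the start', $bad_words)); // bool(true)
var_dump(contains_any('text before bad word 2', $bad_words));  // bool(true)
var_dump(contains_any('perfectly clean text', $bad_words));    // bool(false)
```

Because the function returns a boolean, the caller can use it directly in an if statement without worrying about a match at position 0.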

Answer 2

The function strpos_arr returns the position of the first "needle" found in the string:

if(($pos = strpos($haystack, $what))!==false) return $pos;

or false if there aren't any "needles" in the text.

This means that strpos_arr($text, $bad_words) returns false if there is no bad word in the text. Otherwise it returns an integer with the position of the first bad word found in the string.

Notice that when the text starts with a bad word, it will return 0, which is treated as false in a boolean context. That's why the check behaves differently when you "just type a spam word" and when you "type in a bunch of text before the spam word".

You could implement a function to find bad words like this:
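The pitfall can be reproduced in isolation. This is only an illustrative sketch; the sample text and variable names are made up for the example.

```php
<?php
$text = 'spam at the very start';
$pos = strpos($text, 'spam');     // match found at offset 0

$looseCheck = (bool) $pos;        // what if ($pos) evaluates: 0 is falsy
$strictCheck = ($pos !== false);  // strict check distinguishes 0 from false

var_dump($pos);         // int(0)
var_dump($looseCheck);  // bool(false)
var_dump($strictCheck); // bool(true)
```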

function has_bad_word($text, array $bad_words) {
    // strpos_arr returns false only when no bad word was found.
    return strpos_arr($text, $bad_words) !== false;
}

Notice though that strpos_arr is case sensitive and will report a match when any string from the needle array is a substring of the haystack, even when it's part of a larger word. This function solves both issues:

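function has_bad_word($text, array $bad_words) {
    // Escape every word for use inside a '/'-delimited pattern.
    $pregQuotedBadWords = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);
    // Match a bad word only when it is surrounded by whitespace or the string boundaries.
    $badWordsRegex = '/((\s+|^)'
                     . join('(\s+|$))|((\s+|^)', $pregQuotedBadWords)
                     . '(\s+|$))/is';
    return preg_match($badWordsRegex, $text) > 0;
}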
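As a possible variation, the whitespace-delimited pattern above will not match a bad word that is directly followed by punctuation (for example "spam!"). The sketch below uses `\b` word boundaries instead. The name `has_bad_word_boundary` is hypothetical, and the approach assumes every bad word starts and ends with a letter, digit, or underscore, since `\b` only matches next to such characters.

```php
<?php
// Hypothetical variant: case-insensitive, whole-word matching via \b.
function has_bad_word_boundary(string $text, array $bad_words): bool
{
    if ($bad_words === []) {
        return false; // an empty alternation would match any text
    }
    // Escape each word for use inside a '/'-delimited pattern.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $bad_words);
    $regex = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_match($regex, $text) === 1;
}

var_dump(has_bad_word_boundary('Buy SPAM, now', ['spam']));  // bool(true)
var_dump(has_bad_word_boundary('antispammer', ['spam']));    // bool(false)
```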

Answer 3

I've implemented something similar using a highlight library for jQuery. Basically, I provide a list of 700+ spam words and the library highlights each word that matches the regex. Have a look at the source code (here) to see how it's implemented.

Here's a snippet:

$(function () {
   $("#spam-checker--textarea").highlightWithinTextarea({
      highlight: [
        { highlight: /\baccess\b/gi, keyword: "Access", category: "urgency" },
        { highlight: /\baccess now\b/gi, keyword: "Access now", category: "urgency" },
        { highlight: /\bact\b/gi, keyword: "Act", category: "urgency" },
        { highlight: /\bact immediately\b/gi, keyword: "Act immediately", category: "urgency" },
        { highlight: /\bact now\b/gi, keyword: "Act now", category: "urgency" },
        { highlight: /\bact now!/gi, keyword: "Act now!", category: "urgency" },
        { highlight: /\baction\b/gi, keyword: "Action", category: "urgency" },
        { highlight: /\baction required\b/gi, keyword: "Action required", category: "urgency" },
        { highlight: /\bapply here\b/gi, keyword: "Apply here", category: "urgency" },
        { highlight: /\bapply now\b/gi, keyword: "Apply now", category: "urgency" },
        { highlight: /\bapply now!/gi, keyword: "Apply now!", category: "urgency" },
        { highlight: /\bapply online\b/gi, keyword: "Apply online", category: "urgency" },
        // ...
      ]
   })
})
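For a server-side check in PHP, the same keyword-plus-category idea could be sketched roughly as follows. This is not the jQuery library's API: the function name `find_spam_keywords` and the rule array layout are assumptions made for illustration, and the sketch only reports which keywords occur instead of highlighting them.

```php
<?php
// Hypothetical analogue of the keyword list: report matches grouped by category.
function find_spam_keywords(string $text, array $rules): array
{
    $found = [];
    foreach ($rules as $rule) {
        // \b anchors the start; (?!\w) ends the match even after punctuation like "!".
        $pattern = '/\b' . preg_quote($rule['keyword'], '/') . '(?!\w)/i';
        if (preg_match($pattern, $text) === 1) {
            $found[$rule['category']][] = $rule['keyword'];
        }
    }
    return $found;
}

$rules = [
    ['keyword' => 'Act now', 'category' => 'urgency'],
    ['keyword' => 'Apply online', 'category' => 'urgency'],
];
print_r(find_spam_keywords('Please act now to claim your prize', $rules));
```

Running one preg_match per rule keeps the sketch simple; with a list of several hundred keywords, combining them into a single alternation may be worth measuring.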

3 answers

Answer 1: score 2, accepted, 2014-10-14 12:01:28
Answer 2: score 2, 2014-10-14 13:45:23
Answer 3: score 0, 2021-10-26 09:34:53
