Replace multiple words in a string from a list of words

I have a list of words:

string[] BAD_WORDS = { "xxx", "o2o" }; // My list is actually a lot bigger, about 100 words

I also have some text (usually short, at most 250 words) from which I need to remove all the BAD_WORDS.

I have tried this:

    foreach (var word in BAD_WORDS)
    {
        string w = string.Format(" {0} ", word);
        if (input.Contains(w))
        {
            while (input.Contains(w))
            {
                input = input.Replace(w, " ");
            }
        }
    }

However, if the text starts or ends with a bad word, that word is not removed. I added the surrounding spaces so that partial words are not matched; for example, "oxxx" should not be removed, since it is not an exact match for any of the BAD_WORDS.

Can anyone give me advice on this?

string cleaned = Regex.Replace(input, "\\b" + string.Join("\\b|\\b", BAD_WORDS) + "\\b", "");
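
The one-liner above works when none of the bad words contain regex metacharacters, and it leaves doubled spaces where a word was removed. As a minimal sketch under those observations, the hypothetical helper `BadWordRegexFilter` below (the class name and the two-entry sample list are assumptions, not part of the original answer) escapes each word with `Regex.Escape` and collapses the leftover whitespace:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordRegexFilter
{
    // Sample list for illustration; the real list has about 100 entries.
    private static readonly string[] BadWords = { "xxx", "o2o" };

    // Builds \b(?:xxx|o2o)\b so only whole words match.
    // Regex.Escape protects against metacharacters inside a word.
    private static readonly Regex Pattern = new Regex(
        @"\b(?:" + string.Join("|", BadWords.Select(w => Regex.Escape(w))) + @")\b");

    public static string Clean(string input)
    {
        string withoutBadWords = Pattern.Replace(input, "");
        // Collapse the whitespace runs left behind and trim both ends.
        return Regex.Replace(withoutBadWords, @"\s+", " ").Trim();
    }
}
```

With this sketch, `BadWordRegexFilter.Clean("xxx hello oxxx o2o")` should give `"hello oxxx"`. Note that `\b` treats punctuation as a boundary, so in `"xxx,"` the word is removed and the comma stays.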

This is a great task for LINQ together with the Split method. Try this:

return string.Join(" ", input.Split(' ').Where(w => !BAD_WORDS.Contains(w)));
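
With a list of about 100 words, `BAD_WORDS.Contains(w)` scans the array for every word in the text. A possible variation, sketched here with a `HashSet<string>` for constant-time lookups, is shown below. The class name `BadWordSplitFilter` is hypothetical, and the case-insensitive comparison and the dropping of empty entries are assumptions that go beyond what the question asks for:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BadWordSplitFilter
{
    // Sample list for illustration; OrdinalIgnoreCase is an assumption,
    // the question does not say whether matching should ignore case.
    private static readonly HashSet<string> BadWords =
        new HashSet<string>(new[] { "xxx", "o2o" }, StringComparer.OrdinalIgnoreCase);

    public static string Clean(string input)
    {
        var kept = input
            // RemoveEmptyEntries drops the empty strings produced by repeated spaces.
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !BadWords.Contains(word));
        return string.Join(" ", kept);
    }
}
```

Because the text is split only on spaces, a bad word with punctuation attached (such as `"xxx,"`) is not matched by this sketch.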

You could use the StartsWith and EndsWith methods, like this:

while (input.Contains(w) || input.StartsWith(w) || input.EndsWith(w) || input.IndexOf(w) > 0)
{
   input = input.Replace(w, " ");
}

Hope this will fix your problem.

Put a fake space before and after the string variable input. That way the first and last words will be detected as well.

input = " " + input + " ";

foreach (var word in BAD_WORDS)
{
    string w = string.Format(" {0} ", word);
    if (input.Contains(w))
    {
        while (input.Contains(w))
        {
            input = input.Replace(w, " ");
        }
    }
}

Then trim the string:

input = input.Trim();
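
The padding steps above can be gathered into one method. The sketch below is only an illustration (the class name `BadWordPaddingFilter` and the sample list are assumptions); it also shows why the `while` loop matters: `Replace` does not find overlapping matches, and two adjacent bad words share the space between them.

```csharp
public static class BadWordPaddingFilter
{
    // Sample list for illustration; the real list has about 100 entries.
    private static readonly string[] BadWords = { "xxx", "o2o" };

    public static string Clean(string input)
    {
        // Pad with spaces so the first and last words are delimited like the others.
        string padded = " " + input + " ";
        foreach (string word in BadWords)
        {
            string w = " " + word + " ";
            // In " xxx xxx " the two occurrences share the middle space, so one
            // Replace call removes only the first; repeat until none remain.
            while (padded.Contains(w))
            {
                padded = padded.Replace(w, " ");
            }
        }
        return padded.Trim();
    }
}
```

For example, `BadWordPaddingFilter.Clean("xxx xxx hello o2o")` should give `"hello"`, while `"oxxx"` is left unchanged.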

You can store the words from the text in a list, then check each word against the bad-word list, something like this:

List<string> myWords = input.Split(' ').ToList();
List<string> badWords = GetBadWords();

myWords.RemoveAll(word => badWords.Contains(word));
string Result = string.Join(" ", myWords);

Just wanted to point out that you could have done this with just a while loop inside your foreach, like so:

foreach (var word in BAD_WORDS)
{
    // Keep replacing until no padded occurrence of the word remains
    while (input.Contains(String.Format(" {0} ", word)))
    {
        input = input.Replace(String.Format(" {0} ", word), " ");
    }
}

There is no need for the if or the 'w' variable. In any case, I would have used Antonio Bakula's answer above; this was just the first thing that came to mind.

According to the post "Replacing multiple characters in a string, the fastest way?", the fastest way is to use Regex with a MatchEvaluator:

        Regex reg = new Regex(@"(o2o|xxx)");
        MatchEvaluator eval = match =>
        {
            switch (match.Value)
            {
                case "o2o": return " ";
                case "xxx": return " ";
                default: throw new Exception("Unexpected match!");
            }
        };
        input = reg.Replace(input, eval);
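
Note that the pattern `(o2o|xxx)` has no word boundaries, so it would also change "oxxx" into "o ". With about 100 words, the switch statement also becomes long. A possible variation, sketched below with hypothetical names (`BadWordEvaluatorFilter`, `Replacements`), looks each match up in a dictionary and adds `\b` around the alternation:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordEvaluatorFilter
{
    // Illustrative replacement table; every entry maps to a single space,
    // mirroring the switch statement in the answer above.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string> { { "o2o", " " }, { "xxx", " " } };

    // \b(?:o2o|xxx)\b restricts matches to whole words.
    private static readonly Regex Pattern = new Regex(
        @"\b(?:" + string.Join("|", Replacements.Keys.Select(k => Regex.Escape(k))) + @")\b");

    public static string Clean(string input)
    {
        // Every match is one of the dictionary keys, so the lookup cannot miss.
        MatchEvaluator eval = match => Replacements[match.Value];
        return Pattern.Replace(input, eval);
    }
}
```

Since every bad word maps to the same replacement here, a plain `Regex.Replace` with a fixed string would do the same job; the evaluator only pays off if different words need different replacements.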
