Replace multiple words in a string from a list of words

I have a list of words:

string[] BAD_WORDS = { "xxx", "o2o" }; // My list is actually a lot bigger, about 100 words

and I have some text (usually short, max 250 words) from which I need to REMOVE all the BAD_WORDS.

I have tried this:

    foreach (var word in BAD_WORDS)
    {
        string w = string.Format(" {0} ", word);
        if (input.Contains(w))
        {
            while (input.Contains(w))
            {
                input = input.Replace(w, " ");
            }
        }
    }

But if the text starts or ends with a bad word, it will not be removed. I added the spaces so that it will not match partial words; for example, "oxxx" should not be removed, since it is not an exact match for any of the BAD_WORDS.

Can anyone give me advice on this?

string cleaned = Regex.Replace(input, "\\b" + string.Join("\\b|\\b", BAD_WORDS) + "\\b", "");
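
The one-liner above leaves behind the spaces that surrounded each removed word, and it assumes no bad word contains regex metacharacters. Below is a minimal sketch of the same idea that escapes each word and collapses the leftover whitespace. The names `BadWordFilter` and `RemoveBadWords` are hypothetical and not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordFilter
{
    // Hypothetical helper: removes whole-word occurrences of any entry in badWords.
    public static string RemoveBadWords(string input, string[] badWords)
    {
        // Escape each word so characters such as '.' or '+' are matched literally.
        string alternation = string.Join("|", badWords.Select(Regex.Escape));

        // \b(?:a|b)\b matches whole words only, so "oxxx" is left alone.
        string pattern = @"\b(?:" + alternation + @")\b";
        string stripped = Regex.Replace(input, pattern, "");

        // Collapse the whitespace runs left by removed words and trim both ends.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

For example, `BadWordFilter.RemoveBadWords("xxx hello o2o world oxxx", new[] { "xxx", "o2o" })` returns `"hello world oxxx"`. Note that `\b` only marks a boundary next to a letter, digit, or underscore, so a bad word that begins or ends with punctuation would need a different boundary test.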

This is a great task for LINQ and the Split method. Try this:

return string.Join(" ", input.Split(' ').Where(w => !BAD_WORDS.Contains(w)));

You could use the StartsWith and EndsWith methods, like this:

while (input.Contains(w) || input.StartsWith(w) || input.EndsWith(w) || input.IndexOf(w) > 0)
{
   input = input.Replace(w, " ");
}

Hope this will fix your problem.

Put a fake space before and after the string variable input. That way it will also detect the first and last words.

input = " " + input + " ";

foreach (var word in BAD_WORDS)
{
    string w = string.Format(" {0} ", word);
    if (input.Contains(w))
    {
        while (input.Contains(w))
        {
            input = input.Replace(w, " ");
        }
    }
}

Then trim the string:

input = input.Trim();

You can store the words from the text in a list, then check each word against the bad-word list, something like this:

List<string> myWords = input.Split(' ').ToList();
List<string> badWords = GetBadWords();

myWords.RemoveAll(word => badWords.Contains(word));
string Result = string.Join(" ", myWords);
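
Since the question mentions a list of about 100 bad words, the lookup can be done against a `HashSet<string>` instead of scanning a list for every word. The following is a minimal sketch of that variation. The names `WordListFilter` and `RemoveBadWords` are hypothetical, and the case-insensitive comparer is an assumption that the question does not ask for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordListFilter
{
    // Hypothetical helper: drops every space-separated token found in badWords.
    public static string RemoveBadWords(string input, IEnumerable<string> badWords)
    {
        // Assumption: matching ignores case; use StringComparer.Ordinal for exact matching.
        var badSet = new HashSet<string>(badWords, StringComparer.OrdinalIgnoreCase);

        // RemoveEmptyEntries discards the empty tokens produced by repeated spaces.
        var kept = input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !badSet.Contains(word));

        return string.Join(" ", kept);
    }
}
```

Because the text is split on spaces only, a bad word with punctuation attached (such as "xxx,") is treated as a different token and is kept.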

Just wanted to point out that you could have done it with just the while inside your foreach, like so:

foreach (var word in BAD_WORDS)
{
    while (input.Contains(String.Format(" {0} ", word)))
    {
        input = input.Replace(String.Format(" {0} ", word), " ");
    }
}

There is no need for that if or the 'w' variable. In any case I would have used the answer above mine by Antonio Bakula; this was just the first thing that came to mind.

According to the following post, the fastest way is to use Regex and MatchEvaluator: "Replacing multiple characters in a string, the fastest way?"

        Regex reg = new Regex(@"\b(o2o|xxx)\b"); // \b keeps partial matches such as "oxxx" intact
        MatchEvaluator eval = match =>
        {
            switch (match.Value)
            {
                case "o2o": return " ";
                case "xxx": return " ";
                default: throw new Exception("Unexpected match!");
            }
        };
        input = reg.Replace(input, eval);
