Efficient alternative to nested For Loop

I am writing a profanity filter. I have two nested for loops, shown below. Is there a better way that avoids the nested loop and improves the time complexity?

boolean isProfane = false;
final String phraseInLowerCase = phrase.toLowerCase();
search:
for (int start = 0; start < phraseInLowerCase.length(); start++) {
    for (int offset = 1; offset <= phraseInLowerCase.length() - start; offset++) {
        String subGeneratedCode = phraseInLowerCase.substring(start, start + offset);
        // blacklistPhraseSet is a HashSet which contains all profane words
        if (blacklistPhraseSet.contains(subGeneratedCode)) {
            isProfane = true;
            break search; // leave both loops on the first match
        }
    }
}

Consider a Java 8 version of @Mad Physicist's implementation:

        boolean isProfane = Stream.of(phrase.split("\\s+"))
            .map(String::toLowerCase)
            .anyMatch(w -> blacklistPhraseSet.contains(w));

or

        boolean isProfane = Stream.of(phrase
            .toLowerCase()
            .split("\\s+"))
            .anyMatch(w -> blacklistPhraseSet.contains(w));

If you want to check every possible combination of consecutive characters, then your algorithm performs O(n^2) lookups, assuming that you use a Set with O(1) lookup characteristics, like a HashSet. (Each lookup also has to build and hash a substring of up to n characters, so the total work is closer to O(n^3).) You would probably be able to reduce this by breaking the data and the blacklist into Trie structures and walking along each possibility that way.
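A minimal sketch of the trie idea, assuming the trie is built from the blacklist only and the phrase is scanned character by character (the class name `TrieProfanityFilter` is hypothetical). From each start position it walks forward only while the characters still form a prefix of some blacklisted word, so each walk takes at most L steps, where L is the length of the longest blacklisted word. That gives O(n·L) instead of examining all O(n^2) substrings, and, like the original loop, it still finds blacklisted words inside longer words.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: a character trie built from the blacklist.
public class TrieProfanityFilter {

    // One node per character; 'terminal' marks the end of a blacklisted word.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    public TrieProfanityFilter(Set<String> blacklist) {
        for (String word : blacklist) {
            Node node = root;
            for (char c : word.toLowerCase().toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.terminal = true;
        }
    }

    // Walks the trie from every start position; each walk stops as soon as
    // no blacklisted word continues with the next character.
    public boolean isProfane(String phrase) {
        String text = phrase.toLowerCase();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break; // no blacklisted word has this prefix
                }
                if (node.terminal) {
                    return true; // text[start..i] is a blacklisted word
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> blacklist = new HashSet<>(Arrays.asList("darn", "heck"));
        TrieProfanityFilter filter = new TrieProfanityFilter(blacklist);
        System.out.println(filter.isProfane("What the HECK")); // true
        System.out.println(filter.isProfane("darnation"));     // true
        System.out.println(filter.isProfane("hello world"));   // false
    }
}
```

If the blacklist is large and you need time linear in the phrase length regardless of L, the Aho-Corasick algorithm extends this same trie with failure links.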

A simpler approach might be to use a heuristic like "profanity always starts and ends at a word boundary". Then you can do

isProfane = false;
for(String word: phrase.toLowerCase().split("\\s+")) {
    if(blacklistPhraseSet.contains(word)) {
        isProfane = true;
        break;
    }
}
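Splitting on `\\s+` leaves punctuation attached to words, so "heck!" or "heck," would not be found in the set. A minimal sketch of a variation, assuming the blacklist holds single lowercase words and that any run of characters other than letters and digits counts as a word boundary (the class name `WordBoundaryFilter` is hypothetical):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WordBoundaryFilter {

    // Splits on any run of characters that are not letters or digits,
    // so "heck!" and "heck," both yield the token "heck".
    // Assumes blacklisted entries are single lowercase words.
    static boolean isProfane(String phrase, Set<String> blacklistPhraseSet) {
        for (String word : phrase.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (blacklistPhraseSet.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> blacklist = new HashSet<>(Arrays.asList("darn", "heck"));
        System.out.println(isProfane("Oh, heck!", blacklist)); // true
        System.out.println(isProfane("darn it", blacklist));   // true
        System.out.println(isProfane("checker", blacklist));   // false
    }
}
```

One side effect of this choice is that apostrophes also split words, so "don't" becomes the two tokens "don" and "t".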

You won't improve time complexity a lot, because these approaches still iterate under the hood, but you could split the phrase on spaces and iterate over the resulting array of words. Something like:

boolean isProfane = false;
String[] arrayWords = phrase.toLowerCase().split(" ");
for(String word:arrayWords){
    if(blacklistPhraseSet.contains(word)){
        isProfane = true;
        break;
    }
}

The problem with this code is that, unless your blacklist contains the compound words themselves, it won't match them, whereas your code, as I understand it, will. The word "f**k" in the blacklist won't match "f**kwit" in my code, but it will in yours.
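If you want most of the benefit of splitting but still need to catch compound words, one possible middle ground (not something either snippet above does) is to run the exhaustive substring check inside each word instead of across the whole phrase. This is a sketch; the class name `PerWordSubstringFilter` is hypothetical, and blacklisted entries that contain spaces will never match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PerWordSubstringFilter {

    // Checks every substring, but only inside each whitespace-separated word,
    // so a compound like "darnation" still matches "darn" without examining
    // substrings that span spaces.
    static boolean isProfane(String phrase, Set<String> blacklistPhraseSet) {
        for (String word : phrase.toLowerCase().split("\\s+")) {
            for (int start = 0; start < word.length(); start++) {
                for (int end = start + 1; end <= word.length(); end++) {
                    if (blacklistPhraseSet.contains(word.substring(start, end))) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> blacklist = new HashSet<>(Arrays.asList("darn", "heck"));
        System.out.println(isProfane("what darnation", blacklist)); // true
        System.out.println(isProfane("hello world", blacklist));    // false
    }
}
```

The number of lookups is on the order of the sum of w^2 over the words (w being each word's length) rather than n^2 for the whole phrase, which helps most when the phrase is long but its individual words are short.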
