
Is there anything faster than regex for matching a whole word?

EDIT:

My original question asked whether anything could ever be faster than regex for matching a whole word. I have since added my code and run several tests. The details are below.

My sample matching string (from The Old Man and the Sea):

He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fish the first week

Here's my regex

@"(\b(cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop)(s?)\b)"

Here's my first matching attempt without regex

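// Pipe-separated list of words to look for (same list as in the regex above).
public static string words = "cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop";

public static bool MatchBySplitting(string sentence)
{
    // Split on a fixed set of delimiters. Because ' ' is one of them, the
    // two-word entry "ray wings" can never match, and the optional plural
    // "s" that the regex allows is not handled here.
    string[] sentence_words = sentence.Split(',', '.', ' ', ';', '-');
    string[] match_words = words.Split('|');

    // Compare every word in the sentence against every word in the list.
    foreach (string w in sentence_words)
    {
        foreach (string m in match_words)
        {
            if (m == w)
                return true;
        }
    }
    return false;
}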

Running 5000 iterations of each:

  • Regex Matching: 250-300 ms
  • MatchBySplitting: 250-350 ms, a comparable time to the regex.
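For reference, a timing loop along these lines could produce figures like the ones above. This is only a sketch: the `Time` helper and the `TimingSketch` class are made-up names, the word list is shortened, and the post does not show how the measurements were actually taken.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class TimingSketch
{
    // Hypothetical helper: runs the action `iterations` times and
    // returns the elapsed wall-clock time in milliseconds.
    public static long Time(Action action, int iterations)
    {
        action(); // warm-up call so one-time JIT costs are not counted
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string sentence = "He was an old man who fished alone in a skiff in the Gulf Stream.";
        // Shortened word list, for brevity only.
        Regex regex = new Regex(@"\b(cod|tuna|mackerel)(s?)\b");

        long regexMs = Time(() => regex.IsMatch(sentence), 5000);
        Console.WriteLine("Regex: " + regexMs + " ms");
    }
}
```

Timing `MatchBySplitting` the same way only needs a second call to `Time` with a different lambda. Results at this scale vary a lot between runs, so ranges like the ones quoted are to be expected.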

However, if I shorten my matching string to just the first sentence, my results change:

He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish.

The regex stays about the same, but MatchBySplitting speeds up a lot:

  • Regex Matching: 220-260 ms
  • MatchBySplitting: 50-150 ms - Faster than regex.

If I then start messing with the classics and insert a word that will match:

He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. Then on the eighty fifth day, he caught a tuna. the end

  • Regex Matching: 170-300 ms
  • MatchBySplitting: 100-200 ms - Faster than regex.

I think I've answered my own question here. In these tests, my custom matching method was as fast as or faster than the regex.

However, my code doesn't cover all word boundaries yet (for example `!` and `?` are not in the delimiter list), so it may slow down a little once I add those in.
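One way the missing boundaries could be covered is to scan the sentence character by character and treat every non-letter as a delimiter, rather than listing delimiters by hand. The sketch below is an illustration, not the code from the question: `BoundarySketch` and `MatchByScanning` are made-up names, the word list is shortened, and a `HashSet<string>` replaces the inner loop. Note that "any non-letter" is not quite the same rule as the regex `\b`, which also counts digits and underscores as word characters. Plurals and multi-word entries are still not handled.

```csharp
using System;
using System.Collections.Generic;

public static class BoundarySketch
{
    // Shortened word list; a HashSet gives constant-time lookups per word.
    private static readonly HashSet<string> MatchWords =
        new HashSet<string>(new[] { "cod", "tuna", "mackerel", "salmon" }, StringComparer.Ordinal);

    public static bool MatchByScanning(string sentence)
    {
        int start = -1; // index where the current word began, or -1 between words
        // Run one step past the end so the final word is also checked.
        for (int i = 0; i <= sentence.Length; i++)
        {
            bool inWord = i < sentence.Length && char.IsLetter(sentence[i]);
            if (inWord && start < 0)
            {
                start = i; // a new word begins here
            }
            else if (!inWord && start >= 0)
            {
                // A word just ended; look it up in the set.
                if (MatchWords.Contains(sentence.Substring(start, i - start)))
                    return true;
                start = -1;
            }
        }
        return false;
    }
}
```

With this approach `"he caught a tuna!"` matches, because `!` ends the word just as a space would. Whether it is faster than the regex would have to be measured.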

Try making a compiled regex, like this:

static readonly Regex CornRegex = new Regex(@"\b(corn)\b", RegexOptions.Compiled);

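This generates IL code for a method specialised to that pattern, which the JIT then compiles to machine code. Construction costs more than for an interpreted regex, so create it once and reuse it (hence the static readonly field). Matching should then be very fast, comparable to writing your own custom function that loops over the individual characters. Note the verbatim string (the @ prefix): in a regular C# string literal, "\b" is a backspace character rather than a word boundary.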
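Applied to the fish list from the question, the same idea might look like the following. This is a sketch with a shortened word list; `CompiledRegexSketch`, `FishRegex` and `ContainsFish` are names made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompiledRegexSketch
{
    // Built once and reused; constructing a compiled regex is relatively expensive.
    private static readonly Regex FishRegex =
        new Regex(@"\b(cod|tuna|mackerel|salmon)s?\b", RegexOptions.Compiled);

    public static bool ContainsFish(string sentence)
    {
        return FishRegex.IsMatch(sentence);
    }

    public static void Main()
    {
        // Prints True: "tuna" appears as a whole word.
        Console.WriteLine(ContainsFish("Then on the eighty fifth day, he caught a tuna."));
        // Prints False: "fish" is not in the shortened list.
        Console.WriteLine(ContainsFish("He had gone eighty-four days without taking a fish."));
    }
}
```

Whether the compiled version beats the splitting approach for a given input still has to be measured. The compile step pays off only when the same regex is used many times.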
