EDIT:
My original question asked whether anything could ever be faster than regex for matching a whole word. I have added my code and run several tests. The details are below.
My sample matching string (from The Old Man and the Sea):
He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fish the first week
Here's my regex:
"(\b(cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop)(s?)\b)"
Here's my first matching attempt without regex:
public static string words = "cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop";
public static bool MatchBySplitting(string sentence)
{
    // Break the sentence into tokens on a few punctuation characters and spaces.
    string[] sentence_words = sentence.Split(',', '.', ' ', ';', '-');
    string[] match_words = words.Split('|');
    // Compare every token against every word; stop at the first exact match.
    // Note: unlike the regex, this does not handle a trailing "s" or multi-word entries.
    foreach (string w in sentence_words)
    {
        foreach (string m in match_words)
        {
            if (m == w)
                return true;
        }
    }
    return false;
}
Running 5000 iterations of each:
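A timing loop along these lines could be used to compare the two approaches. This is only a sketch: the MatchTimer class and the TimeMs helper are hypothetical names, not part of my original test code, and it assumes System.Diagnostics.Stopwatch is an acceptable clock for this kind of rough measurement.

```csharp
using System;
using System.Diagnostics;

public static class MatchTimer
{
    // Hypothetical helper: calls `match` on `input` `iterations` times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(Func<string, bool> match, string input, int iterations)
    {
        // One untimed warm-up call so JIT compilation is not measured.
        match(input);

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            match(input);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

It would be called as MatchTimer.TimeMs(MatchBySplitting, sentence, 5000), and with a lambda such as s => regex.IsMatch(s) for the regex version.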
However, if I shorten my matching string to just the first sentence, my results change:
He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish.
The regex stays about the same, but MatchBySplitting speeds up a lot:
If I then start messing with the classics, and insert a word that will match:
He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. Then on the eighty fifth day, he caught a tuna. the end
I think I've answered my own question here. My custom matching method seems to be equal to or faster than regex in most cases.
However, I haven't covered all word-boundary characters in my code (such as ! and ?), so it may slow down a little if I add those in.
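A version that covers more boundary characters might look like the sketch below. The names FishMatcher and MatchByHashSet, the delimiter set, and the use of a HashSet are my assumptions for illustration; it only approximates the regex's \b behaviour and has not been benchmarked.

```csharp
using System;
using System.Collections.Generic;

public static class FishMatcher
{
    // Same word list as above. A multi-word entry such as "ray wings" can
    // never equal a single token, but "ray" alone already covers it.
    private static readonly HashSet<string> Words = new HashSet<string>(
        "cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop".Split('|'),
        StringComparer.Ordinal);

    // Assumed delimiter set: whitespace plus common punctuation.
    private static readonly char[] Delimiters =
        { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '-', '!', '?', '"', '\'', '(', ')' };

    public static bool MatchByHashSet(string sentence)
    {
        string[] tokens = sentence.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            // Exact match: one hash lookup instead of a loop over every word.
            if (Words.Contains(token))
                return true;

            // Mirror the regex's optional trailing "s" (e.g. "crabs").
            if (token.Length > 1 && token[token.Length - 1] == 's'
                && Words.Contains(token.Substring(0, token.Length - 1)))
                return true;
        }
        return false;
    }
}
```

The hash lookup replaces the inner loop over match_words, so the extra delimiters should mostly add cost to the Split call rather than to the comparison.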
Try making a compiled regex, like this:
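static readonly Regex CornRegex = new Regex(@"\b(corn)\b", RegexOptions.Compiled);
Note the verbatim string (@"..."): in a regular C# string literal, \b is a backspace character, not a regex word boundary. With RegexOptions.Compiled, the regex is compiled to IL when it is constructed, and the JIT then turns that IL into native code, instead of the pattern being interpreted on every match. Construction takes longer, which is why the instance is kept in a static readonly field, but matching should be very fast, comparable to writing your own custom function that loops over the individual characters.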
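Applied to the word list from the question, that could look something like the following. The class and method names (CompiledFishRegex, MatchByCompiledRegex) are made up for this sketch, and I have not timed it against the other versions.

```csharp
using System.Text.RegularExpressions;

public static class CompiledFishRegex
{
    // Built once and reused; the verbatim string keeps \b as a regex
    // word boundary rather than a backspace character.
    private static readonly Regex FishRegex = new Regex(
        @"\b(cod|tuna|mackerel|plaice|haddock|salmon|prawns|shrimp|fishcake|halibut|sole|eel|anchovy|anchovies|sardine|herring|bonito|whiting|seabass|carp|crab|flounder|pollock|mullet|ray|ray wings|clam|mussel|scallop)s?\b",
        RegexOptions.Compiled);

    // Returns true if the sentence contains any listed word as a whole word.
    public static bool MatchByCompiledRegex(string sentence)
    {
        return FishRegex.IsMatch(sentence);
    }
}
```

IsMatch stops at the first match, so it is the closest equivalent to the early return in MatchBySplitting.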