简体   繁体   中英

'Regular Expression' VS 'String Comparison operators / functions'

This question is designed around the performance within PHP but you may broaden it to any language if you wish to.

After many years of using PHP and having to compare strings I've learned that using string comparison operators over regular expressions is beneficial when it comes to performance.

I fully understand that some operations have to be done with Regular Expressions down to there complexity but for operations that can be resolved via regex AND string functions.

take this example:

PHP

preg_match('/^[a-z]*$/','thisisallalpha');

C#

new Regex("^[a-z]*$").IsMatch('thisisallalpha');

can easily be done with

PHP

ctype_alpha('thisisallalpha');

C#

VFPToolkit.Strings.IsAlpha('thisisallalpha');

There are many other examples but you should get the point I'm trying to make.

What version of string comparison should you try and lean towards and why?

Looks like this question arose from our small argument here , so i feel myself somehow obliged to respond.

php developers are being actively brainwashed about "performance", whereat many rumors and myths arise, including sheer stupid things like "double quotes are slower". Regexps being "slow" is one of these myths, unfortunately supported by the manual (see infamous comment on the preg_match page). The truth is that in most cases you don't care. Unless your code is repeated 10,000 times, you don't even notice a difference between string function and a regular expression. And if your code does repeat 10,000 times, you must be doing something wrong in any case, and you will gain performance by optimizing your logic, not by stripping down regular expressions.

As for readability, regexps are admittedly hard to read, however, the code that uses them is in most cases shorter, cleaner and simpler (compare yours and mine answers on the above link).

Another important concern is flexibility, especially in php, whose string library doesn't support unicode out of the box. In your concrete example, what happens when you decide to migrate your site to utf8? With ctype_alpha you're kinda out of luck, preg_match would require another pattern, but will keep working.

So, regexes are not slower, more readable and more flexible. Why on earth should we avoid them?

Regular expressions actually lead to a performance gain (not that such microoptimizations are in any way sensible) when they can replace multiple atomic string comparisons. So typically around five strpos() checks it gets advisable to use a regular expression instead. Moreso for readability.

And here's another thought to round things up: PCRE can handle conditionals faster than the Zend kernel can handle IF bytecode.

Not all regular expressions are designed equal, though. If the complexetiy gets too high, regex recursion can kill its performance advantage. Therefore it's often reconsiderworthy to mix regex matching and regular PHP string functions. Right tool for the job and all.

PHP itself recommends using string functions over regex functions when the match is straightforward. For example, from the preg_match manual page:

Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.

Or from the str_replace manual page:

If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of ereg_replace() or preg_replace().

However, I find that people try to use the string functions to solve problems that would be better solved by regex. For instance, when trying to create a full-word string matcher, I have encountered people trying to use strpos($string, " $word ") (note the spaces), for the sake of "performance", without stopping to think about how spaces aren't the only way to delineate a word (think about how many string functions calls would be needed to fully replace preg_match('/\\bword\\b/', $string) ).

My personal stance is to use string functions for matching static strings (ie. a match of a distinct sequence of characters where the match is always the same) and regular expressions for everything else.

They're both part of the language for a reason. IsAlpha is more expressive. For example, when an expression you're looking at is inherently alpha or not, and that has domain meaning, then use it.

But if it is, say, an input validation, and could possibly be changed to include underscores, dashes, etc., or if it is with other logic that requires regex, then I would use regex. This tends to be the majority of the time for me.

Agreed that PHP people tend to over-emphasise performance of one function over another. That doesn't mean the performance differences don't exists -- they definitely do -- but most PHP code (and indeed most code in general) has much worse bottlenecks than the choice of regex over string-comparison. To find out where your bottlenecks are, use xdebug's profiler. Fix the issues it comes up with before worrying about fine-tuning individual lines of code.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM