
Should I use regex, or just a series of if statements?

I need to validate a string against these rules:

  1. Value is not s
  2. Value is at least 1 character long
  3. Value contains only a-z0-9-_/
  4. Value does not begin with /
  5. Value does not end with /
  6. Value does not contain /s/
  7. Value does not contain //
  8. Value does not begin with s/
  9. Value does not end with /s

(More simply, I am looking for something that resembles a UNIX-style path with a slash separator, where file/folder names allow only a-z0-9-_, no file/folder is named s, and there is no leading or trailing slash.)
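The "more simply" description above suggests a different way to express the same rules: split the value on "/" and require every name to be non-empty, made of a-z0-9-_ only, and not equal to "s". Here is a minimal JavaScript sketch of that segment-based approach. It is an alternative to the nine separate checks, not the asker's code, and the names `SEGMENT` and `isValidPath` are hypothetical.

```javascript
// Sketch (assumption): segment-based validation as an alternative to the
// nine individual checks. One segment = one file/folder name.
const SEGMENT = /^[a-z0-9_-]+$/; // a-z, 0-9, "_" or "-", at least one char

function isValidPath(val) {
  // An empty value, a leading or trailing "/", or "//" each produce an
  // empty segment, which SEGMENT rejects. A segment equal to "s" covers
  // the "s", "s/...", ".../s/..." and ".../s" rules.
  return val.split("/").every(function (seg) {
    return seg !== "s" && SEGMENT.test(seg);
  });
}

console.log(isValidPath("foo/bar-baz/qux_1")); // true
console.log(isValidPath("foo/s/bar"));          // false
```

This maps rules 2, 4, 5 and 7 onto "no empty segment" and rules 1, 6, 8 and 9 onto "no segment is s", which may be easier to keep in sync between the JavaScript and PHP versions.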

I need to do this on the client-side via JavaScript and on the server-side using PHP.

I know that the most elegant solution would be via a complex regular expression. But is it worth the challenge of trying to write one? Or should I just use conditions?

Right now, my solution is this: http://jsfiddle.net/cKfnW/

JavaScript:

(function ($) {
    var test = function (val) {
        // The expression must start on the same line as "return";
        // otherwise automatic semicolon insertion makes it return undefined.
        return val !== 's' &&
            /^[a-z0-9_\/-]+$/.test(val) &&
            val.charAt(0) !== '/' &&
            val.slice(-1) !== '/' &&
            // indexOf does a plain substring search (search() takes a regex)
            val.indexOf('/s/') === -1 &&
            val.indexOf('//') === -1 &&
            val.slice(0, 2) !== 's/' &&
            val.slice(-2) !== '/s';
    };
    $('#test')
        .on('keyup', function () {
            if (test($(this).val())) {
                $(this).removeClass('fail').addClass('pass');
            }
            else {
                $(this).removeClass('pass').addClass('fail');
            }
        })
        .trigger('keyup');
})(jQuery);

PHP:

<?php
function test ($val) {
    return
        $val !== 's' &&
        // "D" stops "$" from also matching just before a trailing newline
        preg_match('/^[a-z0-9_\/-]+$/D', $val) &&
        substr($val, 0, 1) !== '/' &&
        substr($val, -1) !== '/' &&
        strpos($val, '/s/') === false &&
        strpos($val, '//') === false &&
        substr($val, 0, 2) !== 's/' &&
        substr($val, -2) !== '/s';
}

// Treat a missing parameter as an empty (and therefore invalid) value.
die (test($_GET['test'] ?? '') ? 'pass' : 'fail');
?>

Is this acceptable practice? I'm not very good at regex, and I have no idea how to write one for this -- but I can't help feeling like this is more of a hack than a solution.

What do you think?

Thanks.

Even with your checks, you should get rid of the nested IFs by merging them all into one if. Here's a simpler variant with two regexps (the first rejects your edge cases, the second checks for allowed characters):

if (
    $val != 's' 
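    // reject: leading "/" or "s/", any "/s/" or "//", trailing "/s" or "/"
    && !preg_match('!(^/|^s/|/s/|//|/s$|/$)!', $val)
    // accept only a-z, 0-9, "_", "/" and "-" ("D": no trailing newline)
    && preg_match('!^[a-z0-9_/-]+$!D', $val)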
) {
  // ...
}

UPD: Oh, you removed the nested IFs while I was typing my answer :) Good, good!
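The answer's code is PHP, but the question also needs a client-side check. Below is a sketch of the same two-regex approach in JavaScript. Its edge-case pattern rejects only a leading "/" or "s/", any "/s/" or "//", and a trailing "/s" or "/", so names such as "bus" or "sam" still pass. `EDGE`, `CHARS` and `isValidPath` are hypothetical names.

```javascript
// Sketch (assumption): the two-regex idea ported to the client side.
// EDGE matches the forbidden edge cases; each alternative has its own anchor.
const EDGE = /^\/|^s\/|\/s\/|\/\/|\/s$|\/$/;
// CHARS whitelists the allowed alphabet (hyphen last, so it is literal).
const CHARS = /^[a-z0-9_\/-]+$/;

function isValidPath(val) {
  return val !== "s" && !EDGE.test(val) && CHARS.test(val);
}

console.log(isValidPath("sam/bus")); // true
console.log(isValidPath("sam/s"));   // false
```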

Clearly use a regex for this:

if (preg_match('~^(?!s?/|s$)(?>[a-z0-9_-]++|/(?!s?/|s?$))++$~', $val)) {
    // do that
}

pattern details:

~                 # pattern delimiter
^                 # start of the string
(?!s?/|s$)        # negative lookahead (not followed by "s$", "/", "s/")
(?>               # open an atomic group (can be replaced by "(?:")
    [a-z0-9_-]++  # allowed characters except "/", one or more times
  |               # OR
    /(?!s?/|s?$)  # "/" not followed by "s/" or "/" or "$" or "s$" 
)++               # close the group and repeat one or more times
$                 # end of the string
~                 # pattern delimiter

What is the advantage of a single regex here over multiple small regexes?

You walk the test string only once, and the pattern fails at the first bad character.

For future debugging, you can use verbose mode (the x modifier) and a nowdoc to make it clearer, for example:

$pattern = <<<'LOD'
~
^
(?!s?/|s$)        # not followed by "s$", "/", "s/"

(?>  [a-z0-9_-]++ | / (?!s?/|s?$)  )++

$
~x
LOD;

For the client side, you can use this pattern in JavaScript (which supports neither atomic groups nor possessive quantifiers, hence the plain (?: ) group):

/^(?!s?\/|s$)(?:[a-z0-9_-]|\/(?!s?\/|s?$))+$/
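A quick way to sanity-check the client-side pattern above is to run it over a handful of inputs. The sample list here is illustrative, not exhaustive.

```javascript
// The client-side pattern from the answer, unchanged.
const PATH_RE = /^(?!s?\/|s$)(?:[a-z0-9_-]|\/(?!s?\/|s?$))+$/;

// Illustrative inputs: the first two are valid, the rest break one rule each.
const samples = ["foo/bar", "sam/bus", "s", "s/foo", "foo/s", "foo/s/bar", "foo//bar", "/foo", "foo/"];
for (const s of samples) {
  console.log(JSON.stringify(s), PATH_RE.test(s));
}
```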

Notice: when you want to put a literal - inside a character class, either escape it (\-) or write it at the beginning or the end of the class, since between two characters it defines a character range.
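A short JavaScript illustration of why the hyphen's position matters (the variable names are made up for the demo): at the end of the class, or escaped as \-, it is literal; between two characters it defines a range.

```javascript
// "-" as the last character of the class: literal hyphen.
const endHyphen = /^[a-z0-9_-]+$/;
// "\-" is a literal hyphen wherever it appears in the class.
const escapedHyphen = /^[a-z0-9\-_]+$/;
// "_-a" is a range from "_" (U+005F) to "a" (U+0061), not three characters.
const accidentalRange = /^[_-a]+$/;

console.log(endHyphen.test("my-dir_2"));     // true
console.log(escapedHyphen.test("my-dir_2")); // true
console.log(accidentalRange.test("`"));      // true: "`" (U+0060) is inside the range
console.log(accidentalRange.test("-"));      // false: the class has no literal hyphen

// A reversed range ("a" comes after "_") is a syntax error.
try {
  new RegExp("[a-_]");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```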

A single-regex solution for multiple AND-ed requirements

Here is a commented PHP regex that meets your requirements (always write non-trivial regexes this way):

$re = '% # Validate *nix-like path w/multiple specs.
    ^          # Anchor to start of string.
    (?!s$)     # Value is not s
    (?=.)      # Value is at least 1 character long
    (?!/)      # Value does not begin with /
    (?!.*/$)   # Value does not end with /
    (?!.*/s/)  # Value does not contain /s/
    (?!.*//)   # Value does not contain //
    (?!s/)     # Value does not begin with s/
    (?!.*/s$)  # Value does not end with /s
    [\w\-/]+   # Value contains only a-z0-9-_/
    $          # Anchor to end of string.
    %ix';

Here is the equivalent JavaScript version:

var re = /^(?!s$)(?=.)(?!\/)(?!.*\/$)(?!.*\/s\/)(?!.*\/\/)(?!s\/)(?!.*\/s$)[\w\-\/]+$/i;
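JavaScript regex literals have no x (extended) flag, so the one-line version above loses the one-requirement-per-line comments. One possible workaround, sketched here under the assumption that the rules are case-sensitive (so no i flag, and an explicit a-z0-9_/- class instead of \w), is to build the pattern from commented string fragments with the RegExp constructor. The redundant (?=.) assertion is left out, and `RULES` and `PATH_RE` are hypothetical names.

```javascript
// Assumption: case-sensitive rules. Inside a RegExp string, "/" needs no escaping.
const RULES = [
  "^",             // anchor to start of string
  "(?!s$)",        // value is not "s"
  "(?!/)",         // does not begin with "/"
  "(?!.*/$)",      // does not end with "/"
  "(?!.*/s/)",     // does not contain "/s/"
  "(?!.*//)",      // does not contain "//"
  "(?!s/)",        // does not begin with "s/"
  "(?!.*/s$)",     // does not end with "/s"
  "[a-z0-9_/-]+",  // only a-z0-9-_/ (also guarantees length >= 1)
  "$",             // anchor to end of string
];
const PATH_RE = new RegExp(RULES.join(""));

console.log(PATH_RE.test("foo/bar")); // true
console.log(PATH_RE.test("foo/s"));   // false
```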

This solution assumes that your requirements are not case-sensitive. If they are, remove the i (ignore-case) modifiers and also change the [\w\-/]+ expression to [a-z0-9_\-/]+ (in the JavaScript version, [a-z0-9_\-\/]+), because \w matches A-Z as well.

For descriptive clarity, I have written the commented version with one assertion per line for each of your requirements. Together with the ^ anchor at the start, the lookahead assertions work in a logical AND manner. Note that the (?=.) assertion (which ensures that at least one character exists) is redundant, since the final expression [\w\-/]+ also ensures that the length is at least one. Note also that both the ^ and $ anchors are required for this to work.

This solution demonstrates how multiple requirements can be met with a single, easy-to-read and maintainable regex. However, for other reasons you may wish to split it into separate checks, e.g. so that your code can generate a separate, meaningful error message for each requirement.
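If you do want a separate error message per requirement, one way is to keep each rule as its own small check and collect the ones that fail. This is a sketch; the message texts and the names `CHECKS` and `pathErrors` are hypothetical.

```javascript
// Sketch: one [predicate, message] pair per requirement, in the question's order.
const CHECKS = [
  [v => v !== "s", 'Value must not be "s".'],
  [v => v.length >= 1, "Value must not be empty."],
  [v => /^[a-z0-9_\/-]*$/.test(v), "Only a-z, 0-9, -, _ and / are allowed."],
  [v => !v.startsWith("/"), 'Value must not begin with "/".'],
  [v => !v.endsWith("/"), 'Value must not end with "/".'],
  [v => !v.includes("/s/"), 'Value must not contain "/s/".'],
  [v => !v.includes("//"), 'Value must not contain "//".'],
  [v => !v.startsWith("s/"), 'Value must not begin with "s/".'],
  [v => !v.endsWith("/s"), 'Value must not end with "/s".'],
];

// Returns the messages of all failed checks; an empty array means valid.
function pathErrors(val) {
  return CHECKS.filter(([ok]) => !ok(val)).map(([, msg]) => msg);
}

console.log(pathErrors("foo/bar")); // []
console.log(pathErrors("/s/"));
```

The character check uses * rather than + so that an empty value reports only the "must not be empty" message.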

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM