regex - negative lookahead to exclude strings

I am trying to find (and replace with something else) all parts of a text which

  1. start with '/'
  2. end with '/'
  3. contain anything between the two /'s, except the strings '.' and '..'.

(For your info, I am searching for and replacing directory and file names, hence '.' and '..' should be excluded.)

This is the regular expression I came up with:

/(?!\.|\.\.)([^/]+)/

The second part

([^/]+)

matches every sequence of characters, '/' excluded. No further character restrictions are required; I am simply interpreting the input.

The first part

(?!\.|\.\.)

uses a negative lookahead assertion to exclude the strings '.' and '..'.

However, this doesn't seem to work in PHP with mb_ereg_replace().

Can somebody help me out? I fail to see what's wrong with my regex.

Thank you.

POSIX regex probably doesn't support negative lookaheads. (I may be wrong, though.)

Anyway, since PCRE regexes are usually faster than POSIX ones, I think you can use the PCRE version of the same function; PCRE supports UTF-8 as well, via the u flag.

Consider this code as a substitute:

preg_replace('~/(?!\.|\.\.)([^/]+)/~u', "", $str);

EDIT: Even better is to use:

preg_replace('~/(?!\.)([^/]+)/~u', "", $str);
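A quick check of what the two patterns above accept may be useful before relying on them. The sketch below runs both through preg_match against a few sample paths; the sample strings and the helper name matches_segment are made up for illustration. Note that the lookahead in both patterns only inspects the first character after the slash, so any name that begins with a dot (for example .hi) is skipped along with . and .. themselves.

```php
<?php
// Hypothetical helper for illustration: true if $pattern finds a /name/ segment in $subject.
function matches_segment(string $pattern, string $subject): bool {
    return preg_match($pattern, $subject) === 1;
}

// The two patterns from the answer above, unchanged.
$first  = '~/(?!\.|\.\.)([^/]+)/~u';
$second = '~/(?!\.)([^/]+)/~u';

// Made-up sample paths: a plain name, the two special names, and two dot-leading names.
$samples = ['/hi/', '/./', '/../', '/.hi/', '/..hi/'];

foreach ($samples as $s) {
    printf(
        "%-8s first=%s second=%s\n",
        $s,
        var_export(matches_segment($first, $s), true),
        var_export(matches_segment($second, $s), true)
    );
}
```

If dot-leading names such as .hi should be matched, the lookahead has to check the whole name rather than only its first character.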

This is a little verbose, but it definitely does work:

#/((\.[^./][^/]*)|(\.\.[^/]+)|([^.][^/]*))/#
^  |------------| |---------| |---------|
|        |             |               |
|        |        text starting with   |
|        |        two dots, that isn't |
|        |             "." or ".."     |
|  text starting with                  |
|  a dot, that isn't                text not starting
|  "." or ".."                         with a dot
|
delimiter

Does not match:

  • hi
  • //
  • /./
  • /../

Does match:

  • /hi/
  • /.hi/
  • /..hi/
  • /.../

Have a play around with it on http://regexpal.com/.

I wasn't sure whether or not you wanted to allow //. If you do, stick a * before the last /.
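The lists above can also be checked from PHP directly. This is a minimal sketch, assuming preg_match is used as an unanchored search; the sample strings are the ones listed above (with the three-dot name written as /.../), and the variable names are made up for illustration.

```php
<?php
// The verbose pattern from above, with # as the delimiter.
$pattern = '#/((\.[^./][^/]*)|(\.\.[^/]+)|([^.][^/]*))/#';

// Expected outcome for each sample, copied from the two lists above.
$expected = [
    'hi'     => false,
    '//'     => false,
    '/./'    => false,
    '/../'   => false,
    '/hi/'   => true,
    '/.hi/'  => true,
    '/..hi/' => true,
    '/.../'  => true,
];

// Run the pattern against every sample and record whether it matched.
$results = [];
foreach ($expected as $subject => $shouldMatch) {
    $results[$subject] = preg_match($pattern, $subject) === 1;
    echo str_pad($subject, 8), $results[$subject] ? 'match' : 'no match', "\n";
}
```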

I'm not against regex, but I would have done this instead:

function simplify_path($path, $directory_separator = "/", $equivalent = true){
  $path = trim($path);
  // if it's absolute, it stays absolute:
  $prepend = (substr($path,0,1) == $directory_separator)?$directory_separator:"";
  $path_array = explode($directory_separator, $path);
  if($prepend) array_shift($path_array);
  $output = array();
  foreach($path_array as $val){
    // read the last kept segment from $output itself, so it stays correct after array_pop()
    $last = empty($output) ? null : end($output);
    if($val !== '..' || ((empty($output) || $last === '..') && $equivalent)) {
      // skip empty segments (produced by "//") and "."
      if($val !== '' && $val !== '.'){
        array_push($output, $val);
      }
    } elseif(!empty($output)) {
        // ".." cancels the previous segment
        array_pop($output);
    }
  }
  return $prepend.implode($directory_separator,$output);
}

Tests:

echo(simplify_path("../../../one/no/no/../../two/no/../three"));
// =>  ../../../one/two/three
echo(simplify_path("/../../one/no/no/../../two/no/../three"));
// =>  /../../one/two/three
echo(simplify_path("/one/no/no/../../two/no/../three"));
// =>  /one/two/three
echo(simplify_path(".././../../one/././no/./no/../../two/no/../three"));
// =>  ../../../one/two/three
echo(simplify_path(".././..///../one/.///./no/./no/../../two/no/../three/"));
// =>  ../../../one/two/three

I thought that it would be better to return an equivalent string, so I kept the occurrences of .. at the beginning of the string.

If you don't want them, you can call it with the third parameter $equivalent = false:

echo(simplify_path("../../../one/no/no/../../two/no/../three", "/", false));
// =>  one/two/three
echo(simplify_path("/../../one/no/no/../../two/no/../three", "/", false));
// =>  /one/two/three
echo(simplify_path("/one/no/no/../../two/no/../three", "/", false));
// =>  /one/two/three
echo(simplify_path(".././../../one/././no/./no/../../two/no/../three", "/", false));
// =>  one/two/three
echo(simplify_path(".././..///../one/.///./no/./no/../../two/no/../three/", "/", false));
// =>  one/two/three

/(?!(\.|\.\.)/)([^/]+)/

This will allow ... as a valid name.
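To see what the closing / inside the lookahead changes, here is a small sketch using preg_replace_callback. The ~ delimiters, the u flag, the sample strings, the bracketed replacement and the helper name rename_segment are all assumptions made for illustration. With the lookahead anchored by the slash, only names that are exactly . or .. are skipped, while .hi, ..hi and ... are matched.

```php
<?php
// The lookahead ends with "/", so it rejects only the exact names "." and "..".
$pattern = '~/(?!(\.|\.\.)/)([^/]+)/~u';

// Hypothetical helper: wraps the matched name in brackets so the replacement is visible.
function rename_segment(string $pattern, string $subject): string {
    return preg_replace_callback($pattern, function (array $m): string {
        // Group 1 lives inside the lookahead; group 2 holds the name between the slashes.
        return '/[' . $m[2] . ']/';
    }, $subject);
}

foreach (['/hi/', '/./', '/../', '/.hi/', '/..hi/', '/.../'] as $s) {
    echo $s, ' => ', rename_segment($pattern, $s), "\n";
}
```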
