
Regex - strip out text between first and second forward slashes

I've almost got this regex working but am having trouble with the leading forward slash. Can anyone see where I'm going wrong with this? I just want to extract the first string "projects" from this example:

  /projects/personal/29/56

See also here -> http://regexr.com?300av

The easiest way is to split the string on the forward slash:

var firstString = url.split('/')[1];

This gives you the first string. If you want to extract it using a regex instead, the following will do; just remember not to add the global flag to your regex.

\/([a-zA-Z0-9]{0,})
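A minimal sketch comparing the two approaches side by side. The helper names firstSegmentBySplit and firstSegmentByRegex are hypothetical, chosen only for illustration. Without the global flag, match() exposes the capture group at index 1:

```javascript
// Hypothetical helpers for illustration; they are not part of the original answer.
function firstSegmentBySplit(url) {
  // '/projects/personal'.split('/') yields ['', 'projects', 'personal'],
  // so index 1 holds the text after the leading slash.
  return url.split('/')[1];
}

function firstSegmentByRegex(url) {
  // No global flag here, so match() returns the capture groups.
  const m = url.match(/\/([a-zA-Z0-9]{0,})/);
  return m ? m[1] : undefined;
}

const url = '/projects/personal/29/56';
console.log(firstSegmentBySplit(url)); // projects
console.log(firstSegmentByRegex(url)); // projects
```

Note that the character class only covers letters and digits, so a segment such as my-project would be cut at the hyphen by the regex version but not by split.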

I hope this helps.

I played around with the answer from anubhava and got the following:

string                          expression                             returns
/projects/personal/29/56        ([a-zA-Z])([^/]*)\/                     projects/
/projects/personal/29/56        ([a-zA-Z])([^/]*)                       projects
/projects123/personal/29/56     ([a-zA-Z])*?([a-zA-Z][0-9])([^/]*)      projects123

The second line achieves what bsod99 was asking:

  • remove the first slash /, and
  • extract the first string projects from /projects/personal/29/56
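The table can be reproduced with a short script. This is only a sketch: the fullMatch helper is a hypothetical name, and the "returns" column is assumed to mean the full match (index 0 of the match array), not an individual capture group.

```javascript
// Each entry pairs an input string with one expression from the table.
const cases = [
  ['/projects/personal/29/56', /([a-zA-Z])([^/]*)\//],
  ['/projects/personal/29/56', /([a-zA-Z])([^/]*)/],
  ['/projects123/personal/29/56', /([a-zA-Z])*?([a-zA-Z][0-9])([^/]*)/],
];

// Hypothetical helper: returns the full match (index 0), or null if nothing matched.
function fullMatch(input, re) {
  const m = input.match(re);
  return m ? m[0] : null;
}

for (const [input, re] of cases) {
  console.log(input, '->', fullMatch(input, re));
}
```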

It seems you can get what you want using split, but for a pure regex solution use:

var s = '/projects/personal/29/56';
var arr = s.match(/^\/([^/]*)\//); // arr[1] becomes 'projects'
document.writeln('<pre>Matched: [' + arr[1] + ']</pre>');
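One thing to watch with this pattern: match() returns null when the string has no second slash, so reading arr[1] directly would throw. A minimal sketch of a guarded version, where the function name betweenFirstTwoSlashes is hypothetical:

```javascript
// Hypothetical helper: returns the text between the first and second slash,
// or null when the input does not start with '/' or has no second slash.
function betweenFirstTwoSlashes(s) {
  const m = s.match(/^\/([^/]*)\//);
  return m === null ? null : m[1];
}

console.log(betweenFirstTwoSlashes('/projects/personal/29/56')); // projects
console.log(betweenFirstTwoSlashes('/projects'));                // null
```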

Adding this for anyone who comes looking for this kind of answer. You can also add the global flag to get the other values in addition to the first part of the URL, '/projects'.

/projects/personal/29/56

You just need to index into the array that match returns, using [i]:

'/projects/personal/29/56'.match(/\/([a-zA-Z0-9]{0,})/g)[i]

index                           returns
i=0                             /projects
i=1                             /personal
i=2                             /29
i=3                             /56
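With the global flag, match() returns only the full matches, which is why each value above keeps its leading slash. If the capture group is wanted instead, String.prototype.matchAll (ES2020) is one option. A sketch, with variable names chosen for illustration:

```javascript
const path = '/projects/personal/29/56';
const segmentRe = /\/([a-zA-Z0-9]{0,})/g;

// match() with the g flag: full matches only, slashes included.
const withSlashes = path.match(segmentRe);
console.log(withSlashes); // [ '/projects', '/personal', '/29', '/56' ]

// matchAll() yields one match object per hit, so group 1 is available.
const segments = Array.from(path.matchAll(segmentRe), (m) => m[1]);
console.log(segments); // [ 'projects', 'personal', '29', '56' ]
```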

I'm adding the answer here only because I wanted to add it to [SO]: Python: return a string between // regex, and that question was marked as a duplicate of this one (while I was editing!).

script.py:

#!/usr/bin/env python3

import re


def main():
    group_name = "between_slashes"
    words = [
                "en/lemon_peel/n/",
                "ca/llimona/n/",
                "/asd /",
                "/asd",
                "asdf/",
                "aa//vv",
            ]
    pat = re.compile("^[^/]*/(?P<{}>[^/]*)/.*$".format(group_name))
    for idx, word in enumerate(words):
        match = pat.match(word)
        if match is not None:
            print("{}: \"{}\" - \"{}\"".format(idx, word, match.group(group_name)))
        else:
            print("{}: \"{}\"".format(idx, word))


if __name__ == "__main__":
    main()

Notes:

  • The pattern seems complicated, but I'll do my best to explain it:
    1. The 1st char (^) marks the beginning of the string
    2. The following [] is a character class: its contents (^/) tell it to match any character except /
    3. Next, the * says that the previous item (#2.) can occur 0 or more times
    4. Then comes the / character, which is our 1st (begin) guard
    5. The parentheses () denote a group, which can later be referenced by its name (between_slashes). For more details check [Python 3.Docs]: Regular Expression Syntax (search for (?P<name>...))
    6. The contents between the parentheses (after >) are what we are looking for (we already know what): 0 or more non-/ chars
    7. The next / char is our 2nd (end) guard
    8. Then, .* means: any char, 0 or more times
    9. Finally, $ marks the end of the string
  • I took the liberty of adding more strings to be searched, besides the ones provided in the question, to illustrate some corner cases
  • Runs with Python 3 and Python 2

Output:

c:\Work\Dev\StackOverflow\q45985002>"c:\Install\x64\Python\Python\3.5\python.exe" script.py
0: "en/lemon_peel/n/" - "lemon_peel"
1: "ca/llimona/n/" - "llimona"
2: "/asd /" - "asd "
3: "/asd"
4: "asdf/"
5: "aa//vv" - ""
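Since the original question is about JavaScript, here is a rough port of the same pattern. It is a sketch only: it assumes an engine with named capture groups (ES2018), where the syntax is (?<name>...) instead of Python's (?P<name>...), and the betweenSlashes helper is a hypothetical name.

```javascript
const words = ['en/lemon_peel/n/', 'ca/llimona/n/', '/asd /', '/asd', 'asdf/', 'aa//vv'];

// Same structure as the Python pattern: anything up to the first slash,
// a named group of non-slash chars, the second slash, then the rest.
const pat = /^[^/]*\/(?<between_slashes>[^/]*)\/.*$/;

// Hypothetical helper: returns the named group, or null when there is no match.
function betweenSlashes(word) {
  const m = pat.exec(word);
  return m ? m.groups.between_slashes : null;
}

words.forEach((word, idx) => {
  const found = betweenSlashes(word);
  console.log(found === null ? `${idx}: "${word}"` : `${idx}: "${word}" - "${found}"`);
});
```

The strings with fewer than two slashes ("/asd" and "asdf/") produce no match, and "aa//vv" yields an empty string, as in the Python run.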

In JS RegEx, you can use:

\B\/([a-zA-Z0-9-]{0,})\S
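A quick sketch of how this expression behaves, with firstSegment as a hypothetical helper name. The trailing \S has to consume one character, so when nothing follows the first segment the capture group loses its last character:

```javascript
const boundaryRe = /\B\/([a-zA-Z0-9-]{0,})\S/;

// Hypothetical helper: returns capture group 1, or null when there is no match.
function firstSegment(path) {
  const m = path.match(boundaryRe);
  return m ? m[1] : null;
}

console.log(firstSegment('/projects/personal/29/56')); // projects
console.log(firstSegment('/projects'));                // project (the \S took the final 's')
```

If paths without a second slash are possible, dropping the \S or using split may be the safer choice.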
