
How to regex the strings in a URL

http://something.com/bOhxBeD,SyhyTGi,TMDDSIB,U72gx2J,kQTIRy9,7VXgGDw,eSxIcK6,S5oNlnn,WBHHsLk,BdMGd2d,U9kNlsF,cHVyc7Y,D83kaJ5,cLWgdSO,iWtCIF3,ount8L6

I have tried to get the values bOhxBeD, SyhyTGi and so on. This is what I came up with (yes, fairly simple): /([a-zA-Z0-9]{7})/. It seems to work with PCRE:

([a-zA-Z0-9]{7})

Debuggex Demo

But when it comes to Ruby, I use it like this:

str.match(/([a-zA-Z0-9]{7})/)
#<MatchData "bOhxBeD" 1:"bOhxBeD">

It doesn't seem to work. Can anyone point out what's wrong with this regex? Thanks.

You need to add the word boundary \b in order to match exactly 7 alphanumeric characters.

\b[a-zA-Z0-9]{7}\b

DEMO

irb(main):006:0> "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB,U72gx2J,kQTIRy9,7VXgGDw,eSxIcK6,S5oNlnn,WBHHsLk,BdMGd2d,U9kNlsF,cHVyc7Y,D83kaJ5,cLWgdSO,iWtCIF3,ount8L6".scan(/\b([a-zA-Z0-9]{7})\b/)
=> [["bOhxBeD"], ["SyhyTGi"], ["TMDDSIB"], ["U72gx2J"], ["kQTIRy9"], ["7VXgGDw"], ["eSxIcK6"], ["S5oNlnn"], ["WBHHsLk"], ["BdMGd2d"], ["U9kNlsF"], ["cHVyc7Y"], ["D83kaJ5"], ["cLWgdSO"], ["iWtCIF3"], ["ount8L6"]]
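(?!.*?\/)[a-zA-Z0-9]{7}

It should be this. Otherwise it will also pick up 7-letter runs from the link itself, so "somethi" will be in the answer. But I guess that is not required.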
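To make the difference between these patterns concrete, here is a minimal sketch that runs three variants over the URL from the question. The variable names (`url`, `plain`, `bounded`, `after_slash`) are illustrative only and do not come from the original posts.

```ruby
# Sample URL taken from the question.
url = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB,U72gx2J,kQTIRy9,7VXgGDw,eSxIcK6,S5oNlnn,WBHHsLk,BdMGd2d,U9kNlsF,cHVyc7Y,D83kaJ5,cLWgdSO,iWtCIF3,ount8L6"

# No anchoring: the first 7 characters of "something" also match.
plain = url.scan(/[a-zA-Z0-9]{7}/)

# Word boundaries: "something" has 9 word characters, so it is skipped.
bounded = url.scan(/\b[a-zA-Z0-9]{7}\b/)

# Negative lookahead: reject any position that still has a "/" ahead of it.
after_slash = url.scan(/(?!.*?\/)[a-zA-Z0-9]{7}/)

puts plain.first             # somethi
puts bounded.first           # bOhxBeD
puts after_slash.first       # bOhxBeD
puts bounded == after_slash  # true
```

For this particular URL the word-boundary and lookahead variants return the same 16 IDs. They would differ if the host or path happened to contain a standalone 7-character word, which only the lookahead version would skip.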

match only picks up the first match.
You can try the global version of match, which is scan.
You can use scan to search for substrings that do not contain specific characters using [^...]:

str.scan(/[^\/\.\,]+/)[3..-1]   
#=> ["bOhxBeD", "SyhyTGi", "TMDDSIB", "U72gx2J", "kQTIRy9", "7VXgGDw", "eSxIcK6", "S5oNlnn", "WBHHsLk", "BdMGd2d", "U9kNlsF", "cHVyc7Y", "D83kaJ5", "cLWgdSO", "iWtCIF3", "ount8L6"]  

Update:
If you know that the strings between the commas are always 7 characters long, you can use this instead:

   str.scan(/[^\/\.\,]{7}/)[1..-1]
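The difference between match and scan can be seen side by side in the sketch below. It uses a shortened version of the question's URL (an assumption made for brevity), and the names `first`, `all` and `grouped` are illustrative.

```ruby
# Shortened sample URL (assumption: only the first three IDs are kept).
str = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# String#match stops at the first match and returns a MatchData object.
first = str.match(/\b[a-zA-Z0-9]{7}\b/)
puts first[0]      # bOhxBeD

# String#scan keeps going and returns every match as an array of strings.
all = str.scan(/\b[a-zA-Z0-9]{7}\b/)
p all              # ["bOhxBeD", "SyhyTGi", "TMDDSIB"]

# With a capture group, scan returns one sub-array per match instead.
grouped = str.scan(/\b([a-zA-Z0-9]{7})\b/)
p grouped          # [["bOhxBeD"], ["SyhyTGi"], ["TMDDSIB"]]
p grouped.flatten  # ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```

Dropping the parentheses, or calling flatten, avoids the nested arrays when only the matched text is needed.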

This happens because your regex matches only a single 7-character element and nothing more. A simple solution could be:

str.match(/\/([^\/]*)\z/)[1].split(',')
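One detail worth checking with a pattern of this shape: Ruby returns the leftmost match, so a capture like (.*) placed after a literal slash would start at the first slash of http://. The sketch below compares that with a capture that excludes slashes. It uses a shortened sample URL, and the names `loose` and `tight` are illustrative.

```ruby
# Shortened sample URL (assumption: only the first three IDs are kept).
str = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Leftmost match: the literal "/" binds to the first slash of "http://",
# so the capture still contains the host.
loose = str.match(/\/(.*)\z/)[1]
puts loose  # /something.com/bOhxBeD,SyhyTGi,TMDDSIB

# Excluding "/" from the capture forces the match to begin at the last slash.
tight = str.match(/\/([^\/]*)\z/)[1]
puts tight  # bOhxBeD,SyhyTGi,TMDDSIB

p tight.split(',')  # ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```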

You could use String#[] and String#split:

str[/.*\/(.*)/,1].split(',')
  #=> ["bOhxBeD", "SyhyTGi", "TMDDSIB", "U72gx2J", "kQTIRy9", "7VXgGDw",
  #    "eSxIcK6", "S5oNlnn", "WBHHsLk", "BdMGd2d", "U9kNlsF", "cHVyc7Y",
  #    "D83kaJ5", "cLWgdSO", "iWtCIF3", "ount8L6"]

.*\/ in the regex, greedy as it is, will consume characters up to and including the last forward slash in the string. Capture group #1, (.*), sucks up the remainder of the string and, due to the presence of ,1, returns it. split(',') then breaks up the string to give you the desired array.
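The effect of greediness described here can be checked directly. The following sketch contrasts the greedy .* with its lazy form .*? on a shortened sample URL (an assumption for brevity). The names `greedy`, `lazy` and `tail` are illustrative.

```ruby
# Shortened sample URL (assumption: only the first three IDs are kept).
str = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Greedy: .* runs to the end, then backtracks to the LAST slash.
greedy = str[/.*\//]
puts greedy  # http://something.com/

# Lazy: .*? stops at the FIRST slash it can reach.
lazy = str[/.*?\//]
puts lazy    # http:/

# The second argument to String#[] selects capture group 1.
tail = str[/.*\/(.*)/, 1]
puts tail    # bOhxBeD,SyhyTGi,TMDDSIB
```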

Another way:

str[str[/.*\//].size..-1].split(',')
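Broken into steps, this one-liner measures the prefix that ends at the last slash and slices the string from that offset. A minimal sketch on a shortened sample URL follows. The names `prefix` and `rest` are illustrative, and the String#delete_prefix line is an alternative not mentioned in the original answer (available in Ruby 2.5 and later).

```ruby
# Shortened sample URL (assumption: only the first three IDs are kept).
str = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Everything up to and including the last slash.
prefix = str[/.*\//]
puts prefix.size  # 21

# Slice from the end of the prefix to the end of the string.
rest = str[prefix.size..-1]
p rest.split(',')  # ["bOhxBeD", "SyhyTGi", "TMDDSIB"]

# Alternative (Ruby 2.5+): remove the prefix by value instead of by offset.
p str.delete_prefix(prefix).split(',')  # ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```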
