
JavaScript regex: comma-separated text

I have this string:

remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820

I want to match and extract the items of this string, which are separated by commas.

The result should be:

MATCH 1 
'remote:City|Vestavia Hills,AL' 
MATCH 2 
'remote:Citystate|Vestavia Hills' 
MATCH 3 
'395b5231539390675a7abe0751fc4820' 
MATCH 4 
'remote:City|Vestavia Hills,AL' 
MATCH 5 
'remote:Citystate|Vestavia Hills' 
MATCH 6 
'395b5231539390675a7abe0751fc4820'

I have this regex:

(remote:[a-zA-Z]+\|[^\,]+|[a-f0-9]{32})

but cities that include a state such as 'AL' (separated by a comma) are split incorrectly.
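To make the problem concrete, here is a minimal sketch that runs the regex above against a shortened copy of the input. The names `input`, `attempt`, and `found` are illustrative only, and the global flag is added here so that all matches are collected. The first alternative stops at the first comma, so the `,AL` part is silently dropped:

```javascript
// Shortened sample of the string from the question (first half only).
const input = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";

// The regex from the question, with the global flag added to collect every match.
const attempt = /(remote:[a-zA-Z]+\|[^\,]+|[a-f0-9]{32})/g;

// [^\,]+ cannot cross a comma, so the city match ends before ",AL".
const found = input.match(attempt);
console.log(found);
// [ 'remote:City|Vestavia Hills',
//   'remote:Citystate|Vestavia Hills',
//   '395b5231539390675a7abe0751fc4820' ]
```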

Possible solution:

I was thinking of doing something like this: remote:[a-zA-Z]+\|.* and ending the match at a comma that is followed either by the same pattern ( remote:[a-zA-Z]+\|.* ) or by an MD5 hash ( [a-f0-9]{32},? ).

Here is my regex tester link:

https://regex101.com/r/rP8iJ2/1

You can fine-tune your regex into this lookahead-based regex:

/(?:^|,)(.+?(?=,(?:[a-f0-9]{32}|remote:)|$))/igm

This will give the 6 matches you're expecting, with the wanted text in captured group #1 of each match.

Updated RegEx Demo
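As a rough usage sketch, the lookahead regex can be applied with `String.prototype.matchAll`, reading capture group #1 from each match. The names `half`, `str`, `re`, and `parts` are illustrative; the input is rebuilt from two identical halves only to keep the line short:

```javascript
// The question's string consists of the same three items repeated twice.
const half = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";
const str = half + "," + half;

// Lookahead-based regex from this answer (matchAll requires the g flag).
const re = /(?:^|,)(.+?(?=,(?:[a-f0-9]{32}|remote:)|$))/igm;

// Each match object holds the wanted text in index 1 (capture group #1).
const parts = Array.from(str.matchAll(re), (m) => m[1]);
console.log(parts.length); // 6
console.log(parts[0]);     // remote:City|Vestavia Hills,AL
```

Reading `m[1]` rather than `m[0]` matters here, because the full match also includes the leading comma consumed by `(?:^|,)`.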

(?:^|,)                 # Match line start or comma
(                       # captured group #1 start
   .+?                  # match 1 or more of any character (lazy)
   (?=                  # lookahead start
      ,                 # match comma followed by
      (?:               # non-capturing group start
         [a-f0-9]{32}   # match hex digit 32 times
         |              # OR
         remote:        # match literal "remote:"
      )                 # non-capturing group end
      |                 # OR
      $                 # line end
   )                    # lookahead end
)                       # capturing group #1 end

([a-f0-9]{32}|remote:[^|]+\|[^,]+(?:,[A-Z]{2})?),?

This one is simpler to understand: I added a special optional suffix to the group, which can only be 2 uppercase letters after a comma.

https://regex101.com/r/rP8iJ2/3
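A minimal sketch of how this regex might be used, collecting capture group #1 with a global `exec` loop. The names `half`, `text`, `pattern`, and `items` are illustrative. It assumes the optional suffix is always exactly two uppercase letters, as in the sample data:

```javascript
// The question's string consists of the same three items repeated twice.
const half = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";
const text = half + "," + half;

// Regex from this answer, with the g flag so exec() advances through the string.
const pattern = /([a-f0-9]{32}|remote:[^|]+\|[^,]+(?:,[A-Z]{2})?),?/g;

const items = [];
let m;
while ((m = pattern.exec(text)) !== null) {
  items.push(m[1]); // group #1 excludes the trailing separator comma
}
console.log(items.length); // 6
console.log(items[0]);     // remote:City|Vestavia Hills,AL
```

The trade-off is that the pattern encodes the shape of the data: a comma followed by anything other than two uppercase letters is treated as a separator.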

With a single regex you might do as follows:

var str = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";
// Inside a regex literal the digit class is written \d; "\\d" would match a literal backslash or "d" instead.
var arr = str.match(/(r.+?|[\da-f]{32})(?=,?(remote|[\da-f]{32}|$))/g);
console.log(arr);

One option is to use JavaScript's split:

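var str = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820,remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";
var aux = str.split("remote");
var res = [];
// aux[0] is the empty string before the first "remote", so start at index 1.
for (var i = 1; i < aux.length; i++) {
  res.push("remote" + aux[i]);
}
console.log(res);
// Note: entries keep their trailing comma, and each hash stays attached to the "remote:Citystate|..." entry before it.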
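Splitting on the literal "remote" leaves trailing commas in place and keeps each hash attached to the preceding item. If the six separate items from the question are wanted, one possible variation (a sketch, not part of the original answer) is to split on a regex with a lookahead, so that only commas followed by "remote:" or by 32 hex characters act as separators. The names `half`, `str`, and `res` are illustrative:

```javascript
// The question's string consists of the same three items repeated twice.
const half = "remote:City|Vestavia Hills,AL,remote:Citystate|Vestavia Hills,395b5231539390675a7abe0751fc4820";
const str = half + "," + half;

// The lookahead checks what follows the comma without consuming it,
// so the comma in "Vestavia Hills,AL" is not treated as a separator.
const res = str.split(/,(?=remote:|[a-f0-9]{32})/);
console.log(res.length); // 6
console.log(res[2]);     // 395b5231539390675a7abe0751fc4820
```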
