
Regex to get url or filename in string separated by comma or semicolon

I am trying to write a regex pattern that gets either a full URL or just a filename with its extension.

The input string looks like

DERZHATEL_DLYA_POLOTENETS_3618_45_SM_3.jpg,DERZHATEL_DLYA_POLOTENETS_3618_45,4_SM_4.jpg

or

https://yandex.ru/upload/iblock/f33/DERZHATEL_3880_3.jpg;http://www.yandex.ru/upload/iblock/f33/DERZHATEL_DLYA_POLOTENETS_3880_3.jpg

The items in the string can be separated by a comma or a semicolon.

Important: a filename can also contain a comma!

The output I would like to see is, respectively:

DERZHATEL_DLYA_POLOTENETS_3618_45_SM_3.jpg

DERZHATEL_DLYA_POLOTENETS_3618_45,4_SM_4.jpg

https://yandex.ru/upload/iblock/f33/DERZHATEL_3880_3.jpg

http://www.yandex.ru/upload/iblock/f33/DERZHATEL_DLYA_POLOTENETS_3880_3.jpg

My pattern does not cover URLs, only filenames without a path (strings 1 and 2):

(?:(?:(?:\w*$).\/)|\w+.{1})\w+.\w+\.\w{3,4}

If the separator is either a comma or a semicolon and the first character of the filename cannot be a comma or semicolon, you could use

[^\s,;]\S*?\.\w{3,4}(?![^\s,;])

Explanation

  • [^\s,;] Match any char except a whitespace char, , or ;
  • \S*? Match 0+ non-whitespace chars, non-greedy (as few as possible)
  • \.\w{3,4} Match a . and 3-4 word characters
  • (?![^\s,;]) Negative lookahead, assert that what is directly to the right is not a char other than a whitespace char, , or ; (so the match must be followed by whitespace, a comma, a semicolon, or the end of the string)

Regex demo

 const regex = /[^\s,;]\S*?\.\w{3,4}(?![^\s,;])/g;
 [
   "DERZHATEL_DLYA_POLOTENETS_3618_45_SM_3.jpg,DERZHATEL_DLYA_POLOTENETS_3618_45,4_SM_4.jpg",
   "https://yandex.ru/upload/iblock/f33/DERZHATEL_3880_3.jpg;http://www.yandex.ru/upload/iblock/f33/DERZHATEL_DLYA_POLOTENETS_3880_3.jpg"
 ].forEach(s => console.log(s.match(regex)));

If the filename can also start with , or ; you might use a negative lookbehind to assert that what is directly to the left is not a char other than a whitespace char, , or ;

See the support for lookbehind in JS regular expressions.

(?<![^\s,;])\S+?\.\w{3,4}(?![^\s,;])

Regex demo
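To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the same input. The sample string and the helper name findAll are made up for illustration, and the sketch assumes a JavaScript engine that supports lookbehind.

```javascript
// Pattern with the negative lookbehind: a match may start only at the
// beginning of the string or right after whitespace, a comma or a semicolon.
const withLookbehind = /(?<![^\s,;])\S+?\.\w{3,4}(?![^\s,;])/g;
// First pattern, for comparison: the first character may not be , or ;
const withoutLookbehind = /[^\s,;]\S*?\.\w{3,4}(?![^\s,;])/g;

// Hypothetical input: the second filename starts with a comma.
const sample = "a.jpg;,b.jpg";

// Hypothetical helper: return all matches as an array (empty if none).
function findAll(pattern, text) {
  // String.prototype.match with a /g regex returns every match, or null.
  return text.match(pattern) || [];
}

console.log(findAll(withLookbehind, sample));    // [ 'a.jpg', ',b.jpg' ]
console.log(findAll(withoutLookbehind, sample)); // [ 'a.jpg', 'b.jpg' ]
```

With the lookbehind, the leading comma stays part of the second filename; without it, the comma is treated as one more separator.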

This would do it:

(?:^|\/|,|;)([^\/]+?\.\w{3,4})(?=,|;|$)

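and your matches will be in capture group #1. Note that [^\/] cannot cross a slash, so for URL input the group holds only the filename after the last /, not the full URL.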

https://regex101.com/r/ok520U/1
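Since this pattern returns its result in a capture group rather than as the whole match, here is a rough sketch of how the group could be read in JavaScript. The helper name captureNames is an assumption for illustration; the inputs are the two strings from the question.

```javascript
// Pattern from the answer above; the result is captured in group 1.
const groupPattern = /(?:^|\/|,|;)([^\/]+?\.\w{3,4})(?=,|;|$)/g;

// Hypothetical helper: collect capture group 1 of every match.
function captureNames(text) {
  // matchAll yields one result per match; index 1 holds capture group 1.
  return Array.from(text.matchAll(groupPattern), m => m[1]);
}

console.log(captureNames(
  "DERZHATEL_DLYA_POLOTENETS_3618_45_SM_3.jpg,DERZHATEL_DLYA_POLOTENETS_3618_45,4_SM_4.jpg"
));
// [ 'DERZHATEL_DLYA_POLOTENETS_3618_45_SM_3.jpg',
//   'DERZHATEL_DLYA_POLOTENETS_3618_45,4_SM_4.jpg' ]

console.log(captureNames(
  "https://yandex.ru/upload/iblock/f33/DERZHATEL_3880_3.jpg;http://www.yandex.ru/upload/iblock/f33/DERZHATEL_DLYA_POLOTENETS_3880_3.jpg"
));
// [ 'DERZHATEL_3880_3.jpg', 'DERZHATEL_DLYA_POLOTENETS_3880_3.jpg' ]
```

Reading m[1] instead of m[0] drops the leading separator (/, , or ;) that the non-capturing group consumes.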
