Strange behavior of regexp in JavaScript

I wrote a simple JavaScript function to split a file name into parts: given a file name of the type 'image01.png' it splits it into 'image', '01', 'png'.

For this I use the following regular expression:

var reg = /(\D+)(\d+).(\S+)$/;

This works.

However, I would also like to be able to split something like day12Image01.png into 'day12Image', '01', 'png'. More generally, the body may contain any number of additional digits, as long as they do not fall right before the extension.

I tried with:

var reg = /(.+)(\d+).(\S+)$/;

or the alternative:

var reg = /(\S+)(\d+).(\S+)$/;

Confusingly (to me), if I apply those regular expressions to 'image01.png' I get the following decomposition: 'image0', '1', 'png'.

Why is the '0' being assigned to the body instead of the numerical index in these cases?

Thanks for any feedback.

Try a non-greedy regular expression: /(\S+?)(\d+).(\S+)$/. As far as I know, this should work in JavaScript.

Here is one possible regular expression that should work fine:

/^(.+?)(\d+)\.(\S+)$/

Note that you should escape the dot character (write \. instead of .), since otherwise the regex treats it as 'any character' (the so-called "special dot") rather than a literal dot.
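To illustrate the point about the dot, here is a small sketch comparing the two forms. The input 'image01xpng' is a made-up name with no dot at all, chosen only to show the difference; the variable names are illustrative too.

```javascript
// Same pattern twice: once with a bare dot, once with an escaped dot.
const unescaped = /^(.+?)(\d+).(\S+)$/;
const escaped = /^(.+?)(\d+)\.(\S+)$/;

// Hypothetical input with no extension separator.
const noDot = 'image01xpng';

// The bare dot accepts the 'x' as the separator, so this name still matches.
console.log(unescaped.test(noDot));

// The escaped dot requires a literal '.', so the same name is rejected.
console.log(escaped.test(noDot));

// A real file name is accepted by the escaped version.
console.log(escaped.test('image01.png'));
```

The first call should report a match and the second should not, which is why the escaped form is the safer choice for file names.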

By default, quantifiers are greedy: they match as much as they can. Since + means one OR more, the \d+ is satisfied by the last digit alone, which leaves the earlier digits to the .+ or the \S+ in the first group. Make that first quantifier un-greedy with ? :

var reg = /(.+?)(\d+).(\S+)$/;

Or

var reg = /(\S+?)(\d+).(\S+)$/;
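The difference between the greedy and un-greedy versions can be checked with a short sketch like the one below. The helper name splitName is made up for illustration, and both patterns use the anchored, escaped-dot form suggested in the other answer.

```javascript
// Greedy first group: takes as many characters as it can.
const greedy = /^(.+)(\d+)\.(\S+)$/;
// Lazy first group: takes as few characters as it can.
const lazy = /^(.+?)(\d+)\.(\S+)$/;

// Hypothetical helper: returns the three captured parts, or null if no match.
function splitName(re, name) {
  const m = re.exec(name);
  return m ? m.slice(1) : null;
}

console.log(splitName(greedy, 'image01.png'));     // [ 'image0', '1', 'png' ]
console.log(splitName(lazy, 'image01.png'));       // [ 'image', '01', 'png' ]
console.log(splitName(lazy, 'day12Image01.png'));  // [ 'day12Image', '01', 'png' ]
```

In the last case the lazy group first tries to stop before '12', but the regex engine cannot find a literal dot right after those digits, so it backtracks and keeps extending the body until the digits are followed by '.png'.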
