
Regex split on upper case and first digit

I need to split the string "thisIs12MyString" into an array that looks like [ "this", "Is", "12", "My", "String" ]

So far I have "thisIs12MyString".split(/(?=[A-Z0-9])/), but it splits on every digit and gives the array [ "this", "Is", "1", "2", "My", "String" ]

In words: I need to split the string before each upper-case letter and before each digit that does not have another digit in front of it.

Are you looking for this?

"thisIs12MyString".match(/[A-Z]?[a-z]+|[0-9]+/g)

returns

["this", "Is", "12", "My", "String"]
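The match-based answer above can be wrapped in a small function. This is only a sketch: the name matchWords is made up for illustration, and the null check is there because match() with the g flag returns null, not an empty array, when nothing matches.

```javascript
// Hypothetical helper (the name is an assumption, not from the answer).
// Collects lower-case words with an optional leading capital, and runs of digits.
function matchWords(input) {
    var parts = input.match(/[A-Z]?[a-z]+|[0-9]+/g);
    // match() returns null when there is no match at all
    return parts === null ? [] : parts;
}

console.log(matchWords("thisIs12MyString"));
```

Note that match() only keeps what the pattern describes, so anything else is silently dropped. For example, a run of capitals such as the "HTTP" in "parseHTTPResponse" does not appear in the result.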

As I said in my comment, my approach would be to first insert a special character before each sequence of digits, as a marker:

"thisIs12MyString".replace(/\d+/g, '~$&').split(/(?=[A-Z])|~/)

where ~ could be any other character, preferably a non-printable one (e.g. a control character), as it is unlikely to appear "naturally" in a string.

In that case, you could even insert the marker before each capital letter as well, and omit the lookahead, making the split very easy:

"thisIs12MyString".replace(/\d+|[A-Z]/g, '~$&').split('~')

It might or might not perform better.
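Here is a rough sketch of the marker idea with a control character instead of ~. The function name and the choice of U+0000 as the marker are assumptions made for illustration. The filter step is also an addition: it drops the empty first element that appears when the input itself starts with a capital letter or a digit.

```javascript
// Assumption: U+0000 never occurs in the input, so it is safe as a marker.
var MARKER = "\u0000";

// Hypothetical helper name, not from the answer above.
function splitOnCapsAndDigits(input) {
    return input
        // put the marker in front of every digit run and every capital letter
        .replace(/\d+|[A-Z]/g, MARKER + "$&")
        // a plain string split, no lookahead needed
        .split(MARKER)
        // drop the empty leading piece when the input starts with a marker
        .filter(function (part) { return part.length > 0; });
}

console.log(splitOnCapsAndDigits("thisIs12MyString"));
```

With this marker, an input that already contains ~ (for example "a~b7") is left alone apart from the intended split points.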

In my Rhino console,

js> "thisIs12MyString".replace(/([A-Z]|\d+)/g, function(x){return " "+x;}).split(/ /);
this,Is,12,My,String

another one,

js> "thisIs12MyString".split(/(?:([A-Z]+[a-z]+))/g).filter(function(a){return  a;});
this,Is,12,My,String

I can't think of any way to achieve this with a regex alone.

I think you will need to do it in code.

Please check the URL: same question, different language (ruby) ->

The code is at the bottom: http://code.activestate.com/recipes/440698-split-string-on-capitalizeduppercase-char/

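You can work around the lack of lookbehind in JS by post-processing the array produced by the split with your current regex.
A quick sketch:

// Returns true when the chunk is exactly one digit.
function isSingleDigit(word) {
    return /^[0-9]$/.test(word);
}

var result = [];
var digitsFlag = false; // true while the previous chunk was a single digit
"thisIs12MyString".split(/(?=[A-Z0-9])/).forEach(function(word) {
    if (isSingleDigit(word)) {
        if (!digitsFlag) {
            result.push(word);                 // first digit of a run starts a new entry
        } else {
            result[result.length - 1] += word; // later digits extend that entry
        }
        digitsFlag = true;
    } else {
        result.push(word);
        digitsFlag = false;
    }
});
// result is now ["this", "Is", "12", "My", "String"]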
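For comparison, engines that implement ES2018 lookbehind assertions (recent Node.js and current browsers, but not the Rhino console mentioned in this thread, as far as I know) can express the rule from the question directly. This is a sketch under that assumption, and the function name is made up for illustration.

```javascript
// Assumption: the engine supports lookbehind, (?<!...), from ES2018.
// Split before a capital letter, or before a digit that has no digit in front of it.
function splitWithLookbehind(input) {
    return input.split(/(?=[A-Z])|(?<!\d)(?=\d)/);
}

console.log(splitWithLookbehind("thisIs12MyString"));
```

On an engine without lookbehind support the regex literal is a syntax error, so the array post-processing remains the portable option.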
