

Regex split on upper case and first digit

I need to split the string "thisIs12MyString" into an array that looks like [ "this", "Is", "12", "My", "String" ]

So far I have "thisIs12MyString".split(/(?=[A-Z0-9])/), but it splits before every digit and gives the array [ "this", "Is", "1", "2", "My", "String" ]

In other words, I need to split the string before each upper-case letter and before each digit that is not preceded by another digit.

Are you looking for this?


returns

["this", "Is", "12", "My", "String"]
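The expression for this answer does not appear above, so the following is only a sketch of one way to get that result, not necessarily what the answerer wrote. It assumes matching the pieces with `String.prototype.match` instead of splitting; the helper name `splitByMatch` and the pattern are assumptions.

```javascript
// Hypothetical helper; the pattern is an assumption, not the answer's original code.
function splitByMatch(input) {
  // [A-Z]?[a-z]+  an optional capital followed by lowercase letters ("this", "Is")
  // \d+           a whole run of digits, kept together ("12")
  // [A-Z]         a lone capital that no lowercase letter follows
  return input.match(/[A-Z]?[a-z]+|\d+|[A-Z]/g) || [];
}

console.log(splitByMatch("thisIs12MyString"));
// [ 'this', 'Is', '12', 'My', 'String' ]
```

Note that characters outside these classes (underscores, spaces, and so on) are simply dropped by this approach, which may or may not be what you want.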

As I said in my comment, my approach would be to first insert a special character before each sequence of digits, as a marker:

"thisIs12MyString".replace(/\d+/g, '~$&').split(/(?=[A-Z])|~/)

where ~ could be any other character, preferably a non-printable one (e.g. a control character), as it is unlikely to appear "naturally" in a string.

In that case, you could even insert the marker before each capital letter as well, and omit the lookahead, making the split very easy:

"thisIs12MyString".replace(/\d+|[A-Z]/g, '~$&').split('~')

It might or might not perform better.
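For illustration, here is a minimal sketch of the same idea with a control character as the marker, as suggested above. The helper name `splitWithMarker` and the choice of U+0000 are assumptions; any character that cannot occur in your input would do.

```javascript
// Hypothetical helper; the marker character is an assumption.
function splitWithMarker(input) {
  const MARKER = "\u0000"; // NUL, unlikely to appear in ordinary text
  return input
    // Put the marker in front of every digit run and every capital letter.
    .replace(/\d+|[A-Z]/g, (match) => MARKER + match)
    .split(MARKER)
    // Drop the empty leading piece produced when the input starts with a marker.
    .filter(Boolean);
}

console.log(splitWithMarker("thisIs12MyString"));
// [ 'this', 'Is', '12', 'My', 'String' ]
```

The `filter(Boolean)` step is an addition to the one-liner above: without it, an input that begins with a capital letter or a digit yields an empty first element.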

In my Rhino console,

js> "thisIs12MyString".replace(/([A-Z]|\d+)/g, function(x){return " "+x;}).split(/ /);

another one,

js> "thisIs12MyString".split(/(?:([A-Z]+[a-z]+))/g).filter(function(a){return  a;});

I can't think of any way to achieve this with a regex alone.

I think you will need to do it in code.

Please check the URL below: same question, different language (ruby).

The code is at the bottom: http://code.activestate.com/recipes/440698-split-string-on-capitalizeduppercase-char/

You can work around JavaScript's lack of lookbehind by post-processing the array produced by your current regex.
Quick pseudo code:

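var result = [];
var digitsFlag = false; // true while the previous piece was a single digit
"thisIs12MyString".split(/(?=[A-Z0-9])/).forEach(function(word) {
    if (/^\d$/.test(word)) {
        if (!digitsFlag) {
            result.push(word);                  // first digit starts a new entry
        } else {
            result[result.length - 1] += word;  // later digits extend that entry
        }
        digitsFlag = true;
    } else {
        result.push(word);
        digitsFlag = false;
    }
});
// result is now ["this", "Is", "12", "My", "String"]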
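As an alternative to merging the digits after the split, engines that implement ES2018 lookbehind assertions can express the condition "a digit not preceded by another digit" directly. This is a sketch under that assumption about the runtime; the helper name `splitWithLookbehind` is hypothetical.

```javascript
// Hypothetical helper; requires a runtime with ES2018 lookbehind support.
function splitWithLookbehind(input) {
  // (?=[A-Z])        split before every capital letter
  // (?<=\D)(?=\d)    split before a digit only when a non-digit precedes it
  return input.split(/(?=[A-Z])|(?<=\D)(?=\d)/);
}

console.log(splitWithLookbehind("thisIs12MyString"));
// [ 'this', 'Is', '12', 'My', 'String' ]
```

On a runtime without lookbehind support the regex literal is a syntax error, so the flag-based loop remains the portable option.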


