
Split String into Array with RegEx Pattern

I have a string that I want to split into an array. The string looks like this:

'O:BED,N:KET,OT,N:JAB,FA,O:RPT,'

The string can contain any number of objects, e.g.

'O:BED,N:KET,OT,N:JAB,FA,O:RPT,X:BLA,GTO'

I want to split this string at each occurrence of \w: (e.g. O:).

So I'll end up with an array like this:

['O:BED','N:KET,OT','N:JAB,FA','O:RPT']

I am using the following code:

var array = st.split(/^(\w:.+)(?=\w:)/g);

However, I end up with an array like this:

['','O:BED,N:KET,OT,N:JAB,FA,','O:RPT,']

It seems the regex is being greedy. What should I do to fix it?

Note: I am using AngularJS, and eventually I want to end up with this:

   var objs = [
     {type: 'O', code: 'BED', suf: ''},
     {type: 'N', code: 'KET', suf: 'OT'},
     {type: 'N', code: 'JAB', suf: 'FA'},
     {type: 'O', code: 'RPT', suf: ''}
     ];

It would be much easier if your string were formatted consistently, but the task can still be achieved with a little extra effort. Hope the code below works for you.

var str = 'O:BED,N:KET,OT,N:JAB,FA,O:RPT,X:BLA,GTO';

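// Split on commas and drop empty pieces (such as the one left by a trailing comma)
var a = str.split(',').filter(Boolean);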
var objs = [], obj, item, suf;

for(var i=0; i<a.length;){
  item = a[i].split(':');

  if(a[i+1] && a[i+1].indexOf(':') == -1){
    suf = a[i+1];
    i++;
  }else{
    suf = "";
  }

  obj = {
    type: item[0],
    code: item[1],
    suf: suf
  };

  objs.push(obj);
  i++;
}

console.log(objs);

You can use the RegExp.prototype.exec method to obtain successive matches instead of splitting the string with a delimiter:

var myStr = 'O:BED,N:KET,OT,N:JAB,FA,O:RPT,';
var myRe = /([^,:]+):([^,:]+)(?:,([^,:]+))??(?=,[^,:]+:|,?$)/g;
var m;
var result = [];

while ((m = myRe.exec(myStr)) !== null) {
  result.push({type:m[1], code:m[2], suf:((m[3])?m[3]:'')});
}

console.log(result);

You want to do a string match and then iterate over the matches.

Full example inside AngularJS: http://jsfiddle.net/184cyspg/1/

var myString = 'O:BED,N:KET,OT,N:JAB,FA,O:RPT,';
$scope.myArray = [];
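// One match per object; the lookahead keeps the next object's type (e.g. "N:")
// from being taken as the current object's suffix
var objs = myString.match(/[A-Z]:[A-Z]+(?:,(?![A-Z]:)[A-Z]+)?/g);
objs.forEach(function (entry) {
    // Turn "N:KET,OT" into "N:KET:OT" so a single split yields every part
    var obj = entry.replace(',', ':');
    obj = obj.split(':');
    $scope.myArray.push({type: obj[0], code: obj[1], suf: obj[2] || ''});
});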

I love regular expressions :)

This will match each object in your string if you use the global flag and call exec() repeatedly to step through all the matches:

(\w):(\w+)(?:,((?!\w:)\w+))?

The only real trick is to treat the bit after the comma as this object's suffix only if it doesn't look like the type of the next object.

Each match captures the groups:

  1. type
  2. code
  3. suf
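As a rough sketch of how that pattern might be driven with exec(), the loop below collects the three groups into objects. The function name parseObjects is hypothetical (it does not appear in the answer), and the sample input is the string from the question.

```javascript
// Hypothetical helper: builds {type, code, suf} objects from the pattern above.
function parseObjects(str) {
  var re = /(\w):(\w+)(?:,((?!\w:)\w+))?/g;
  var result = [];
  var m;
  // With the global flag, each exec() call resumes after the previous match.
  while ((m = re.exec(str)) !== null) {
    // Group 3 is undefined when the object has no suffix.
    result.push({ type: m[1], code: m[2], suf: m[3] || '' });
  }
  return result;
}

var objs = parseObjects('O:BED,N:KET,OT,N:JAB,FA,O:RPT,');
console.log(objs);
```

Defaulting the third group to an empty string keeps the result in the shape the question asks for.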

If you just want to split as you said, the fix for the greedy match is to split on the commas that are followed by one of those objects. Because split() also returns whatever the capturing groups matched, the groups inside the lookahead are made non-capturing here, e.g.:

,(?=\w:\w+(?:,(?!\w:)\w+)?)
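A minimal sketch of the split-based route, using the string from the question. It uses a shorter lookahead with no capturing groups, since split() includes the contents of capturing groups in the array it returns. Removing the trailing comma beforehand is an assumption about how the input should be cleaned up.

```javascript
var st = 'O:BED,N:KET,OT,N:JAB,FA,O:RPT,';

// Drop the trailing comma (assumed to be unwanted), then split on every comma
// that is immediately followed by "<word character>:".
var parts = st.replace(/,$/, '').split(/,(?=\w:)/);

console.log(parts); // [ 'O:BED', 'N:KET,OT', 'N:JAB,FA', 'O:RPT' ]
```

The lookahead consumes nothing, so only the comma itself is removed at each split point.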

The following does not solve your regex issue; it is an alternative approach that introduces underscorejs, which can handle everything from simple to more complex operations. It is overkill in this case, though:

// i.e. input string = 'O:BED,N:KET,OT,N:JAB,FA,O:RPT,';
// $scope must be injected, otherwise it is undefined inside the controller
.controller('AppCtrl', ['$scope', function($scope) {
    /**
     * Split by comma, then (chain) evaluate each (map) element:
     * an element that contains 'O:' is pushed into the array
     * as a new element; otherwise it is concatenated onto tmp.
     *
     * TODO: replace hardcoded values with params
     *
     * @param String string - a string to split
     * @param String prefix - prefix to determine start of new array element, i.e. 'O:' (currently hardcoded)
     * @param String delimiter - delimiter to split string, i.e. ',' (currently hardcoded)
     * @return Array array of elements by prefix
     */
    $scope.splitter = function(string) {
      var a = [];
      var tmp = "";

      _.chain(string.split(',')) 
        .map(function(element) {
          if(element.indexOf('O:') >= 0) {
            element += tmp;
            a.push(element);
            tmp = "";
          } else {
            tmp += element;
          }
        });

      return a;
    };
}]);

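Output (only elements containing 'O:' start a new entry, so the N: items end up concatenated onto the last one):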

array: Array[2]
  0: "O:BED"
  1: "O:RPTN:KETOTN:JABFA"
length: 2

Updated: I just read your requirement about objects. underscorejs allows chaining operations. For example, the code above could be tweaked to handle objects and chained to .compact().object().value() to produce the output as object k:v pairs.
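The chained version is not shown in the answer. As a loosely related sketch that avoids the library altogether, a similar grouping can be done with the built-in Array.prototype.reduce. The name groupByPrefix and the rule "a piece containing a colon starts a new object, any other piece is the suffix of the previous one" are assumptions made for illustration, not part of the original code.

```javascript
// Hypothetical helper in plain JavaScript (no underscorejs).
function groupByPrefix(str) {
  return str.split(',').reduce(function (acc, piece) {
    if (piece === '') {
      return acc; // skip the empty piece left by a trailing comma
    }
    var idx = piece.indexOf(':');
    if (idx >= 0) {
      // "N:KET" starts a new object
      acc.push({ type: piece.slice(0, idx), code: piece.slice(idx + 1), suf: '' });
    } else if (acc.length > 0) {
      // "OT" has no colon, so it becomes the suffix of the previous object
      acc[acc.length - 1].suf = piece;
    }
    return acc;
  }, []);
}

var grouped = groupByPrefix('O:BED,N:KET,OT,N:JAB,FA,O:RPT,');
console.log(grouped);
```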

Hope this helps.
