
How to split a string into an array based on array elements, retaining the split word, in JavaScript

I have a string:

sReasons =  "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";

and I need to split the above string based on the separator array:

const separator = ["O9", "EO", "HJ", "J8"];

Here the first 2 characters (O9) represent a code, the next 4 another code (C270), and the next 4 (0021) the character length of the text that follows, which is "Not eligible for SDWC".

The separator codes are unique, consist of 2 uppercase characters, and will not be repeated in the text message; they appear only as inEligType.

I need to create JSON in the following format:

[
    {inEligType: "O9", msgCode: "C270", msgLen: "0021", textMsg: "Not eligible for SDWC"},
    {inEligType: "EO", msgCode: "C390", msgLen: "0015", textMsg: "Service upgrade"},
    {inEligType: "HJ", msgCode: "C390", msgLen: "0015", textMsg: "Service upgrade"},
    {inEligType: "J8", msgCode: "C500", msgLen: "0016", textMsg: "Delivery Attempt"}
]

I'm basically failing at splitting the string itself based on the given array. I tried the following:

sReasons =  "O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt";    
const separator = ["O9", "EO", "HJ", "J8"];

function formatReasons(Reasons: string) {
var words: any[] = Reasons.split(this.spearator); 
for(let word in words)
    {
       console.log(word) ;
    }
}
var result = formatReasons(sHdnReasonsCreate);
console.log("Returned Result: "+result);

But it gives me this result:

["O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt"]length: 1__proto__: Array(0)

Returned Address is: undefined
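A minimal sketch of why the attempt above returns a single-element array, assuming that this.spearator (a misspelling of a property that is never set) evaluates to undefined. The shortened sReasons value and the names parts, keys, and viaArray are only for illustration.

```javascript
const sReasons = "O9C2700021Not eligible for SDWCEOC3900015Service upgrade";

// With an undefined separator, split returns the whole string as one element.
const parts = sReasons.split(undefined);
console.log(parts.length);

// for...in walks the array's keys ("0", "1", ...), not its values;
// for...of would visit the values instead.
const keys = [];
for (const key in parts) {
  keys.push(key);
}
console.log(keys);

// Passing the array itself does not help either: split converts it to the
// string "O9,EO,HJ,J8", which never occurs in the input.
const viaArray = sReasons.split(["O9", "EO", "HJ", "J8"]);
console.log(viaArray.length);
```

The final undefined in the log has a separate cause: formatReasons has no return statement, so result is undefined.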

My regex-based approach:

sReasons =  "O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt";    
const separator = ["O9", "EO", "HJ", "J8"];

// build the regex based on separators
let regexPattern = '^';
separator.forEach(text => {
    regexPattern += `${text}(.*)`;
});
regexPattern += '$';

// match the reasons
let r = new RegExp(regexPattern);
let matches = sReasons.match(r);

// prepare to match each message
let msgMatcher = new RegExp('^(?<msgCode>.{4})(?<msgLen>.{4})(?<textMsg>.*)$');
let output = [];

for (let i=1; i<matches.length; i++) {
    // match the message
    const msg = matches[i].match(msgMatcher);

    // store
    let item = msg.groups;
    item.inEligType = separator[i-1];
    output.push(item);
}

console.log(JSON.stringify(output, null, 2));

Produces:

[
  {
    "msgCode": "C270",
    "msgLen": "0021",
    "textMsg": "Not eligible for SDWC",
    "inEligType": "O9"
  },
  {
    "msgCode": "C090",
    "msgLen": "0015",
    "textMsg": "Service upgrade",
    "inEligType": "EO"
  },
  {
    "msgCode": "C390",
    "msgLen": "0015",
    "textMsg": "Service upgrade",
    "inEligType": "HJ"
  },
  {
    "msgCode": "C5HJ",
    "msgLen": "0016",
    "textMsg": "Delivery Attempt",
    "inEligType": "J8"
  }
]

It may well be that neither the textMsg field nor any other field will ever contain the two-letter strings you are using for the inEligType field. But are you absolutely sure of that? The data format looks to me like it really wants someone to parse it by substrings of certain lengths; why even have a msgLen field if you could just split based on delimiters? What if the list of inEligType codes changes in the future?
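To make the risk concrete, here is a small sketch of delimiter matching going wrong. The input is hypothetical: I made up a message text, "See EO form", that happens to contain the code EO. The names sample, pattern, byDelimiter, and truncated are illustrative as well.

```javascript
// Hypothetical input: the first text ("See EO form", 11 characters) contains "EO".
const sample = "O9C2700011See EO formJ8C5000016Delivery Attempt";
const codes = ["O9", "EO", "HJ", "J8"].join("|");

// Delimiter-based parsing: each text runs until the next thing that looks like a code.
const pattern = new RegExp(`(${codes})(\\w{4})(\\d{4})(.*?)(?=${codes}|$)`, "g");
const byDelimiter = [...sample.matchAll(pattern)].map(
  ([, inEligType, msgCode, msgLen, textMsg]) => ({ inEligType, msgCode, msgLen, textMsg })
);

// The first text is cut off at the embedded "EO".
console.log(JSON.stringify(byDelimiter[0].textMsg)); // "See "

// msgLen exposes the damage: it no longer matches the length of the text.
const truncated = byDelimiter.filter((r) => Number(r.msgLen) !== r.textMsg.length);
console.log(truncated.length); // 1
```

Nothing throws here; the record is silently shortened, and only a comparison against msgLen reveals it.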

For these reasons I strongly recommend that you parse by substring lengths and not by delimiter matching. Here's one possible way to do that:

function formatReasons(reasons: string) {
  const ret = []
  while (reasons) {
    const inEligType = reasons.substring(0, 2);
    reasons = reasons.substring(2);
    const msgCode = reasons.substring(0, 4);
    reasons = reasons.substring(4);
    const msgLen = reasons.substring(0, 4);
    reasons = reasons.substring(4);
    const textMsg = reasons.substring(0, +msgLen);
    reasons = reasons.substring(+msgLen);
    ret.push({ inEligType, msgCode, msgLen, textMsg });
  }
  return ret;
}

You can verify that it produces the expected output for your example sReasons string:

const formattedReasons = formatReasons(sReasons);
console.log(JSON.stringify(formattedReasons, undefined, 2));
/* [
  {
    "inEligType": "O9",
    "msgCode": "C270",
    "msgLen": "0021",
    "textMsg": "Not eligible for SDWC"
  },
  {
    "inEligType": "EO",
    "msgCode": "C090",
    "msgLen": "0015",
    "textMsg": "Service upgrade"
  },
  {
    "inEligType": "HJ",
    "msgCode": "C390",
    "msgLen": "0015",
    "textMsg": "Service upgrade"
  },
  {
    "inEligType": "J8",
    "msgCode": "C5HJ",
    "msgLen": "0016",
    "textMsg": "Delivery Attempt"
  }
] */

Note that the implementation above does not check that the string is properly formatted; right now if you pass garbage in, you get garbage out. If you want more safety you could do runtime checks and throw errors if you, say, run off the end of the reasons string unexpectedly, or find a msgLen field that doesn't represent a number. And one could refactor so that there's no repetition of code like const s = reasons.substring(0, n); reasons = reasons.substring(n). But the basic algorithm is there.
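One way the checks and the refactoring might look, as a sketch rather than a definitive version. The take helper, the choice of RangeError and SyntaxError, and the error messages are my own assumptions; written in plain JavaScript, so in TypeScript you would add the type annotations.

```javascript
function formatReasons(reasons) {
  const ret = [];
  let pos = 0;

  // Consume the next n characters, failing loudly if the string ends early.
  const take = (n, field) => {
    if (pos + n > reasons.length) {
      throw new RangeError(`Unexpected end of input while reading ${field} at offset ${pos}`);
    }
    const s = reasons.substring(pos, pos + n);
    pos += n;
    return s;
  };

  while (pos < reasons.length) {
    const inEligType = take(2, "inEligType");
    const msgCode = take(4, "msgCode");
    const msgLen = take(4, "msgLen");
    // Reject a length field that is not exactly four digits.
    if (!/^\d{4}$/.test(msgLen)) {
      throw new SyntaxError(`msgLen "${msgLen}" is not a 4-digit number`);
    }
    const textMsg = take(Number(msgLen), "textMsg");
    ret.push({ inEligType, msgCode, msgLen, textMsg });
  }
  return ret;
}
```

Tracking an offset instead of repeatedly reassigning reasons also means the error messages can report where in the input the problem was found.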


Another option using RegExp, with less code:

// Your data
const data = "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";

// Set your data splitters from array
const spl = ["O9", "EO", "HJ", "J8"].join('|');

// Use regexp to parse data
const results = [];
data.replace(new RegExp(`(${spl})(\\w{4})(\\w{4})(.*?)(?=${spl}|$)`, 'g'), (m, a, b, c, d) => {
  // Form objects and push to res
  results.push({ inEligType: a, msgCode: b, msgLen: c, textMsg: d });
});

// Result
console.log(results);
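If the separator list could ever contain characters that are special in a regular expression, joining it with '|' unescaped would change the pattern's meaning. A hedged sketch of escaping each code first follows; the escapeRegExp helper is the commonly used escaping idiom, and the separators "A+" and "C." are made up purely to show the effect.

```javascript
// Escape characters that have special meaning in a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical separator list containing regex metacharacters.
const separators = ["A+", "C."];
const alternation = separators.map(escapeRegExp).join("|");

const re = new RegExp(`(${alternation})(\\w{4})(\\d{4})(.*?)(?=${alternation}|$)`, "g");

// Hypothetical input in the same layout: code, msgCode, msgLen, text.
const input = "A+C2700002hiC.C3900003abc";
const items = [...input.matchAll(re)].map(
  ([, inEligType, msgCode, msgLen, textMsg]) => ({ inEligType, msgCode, msgLen, textMsg })
);
console.log(items);
```

For codes made only of letters and digits, such as the ones in the question, the escaping is a no-op.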

A first approach, based on a group-capturing regex consumed by split, processed by a helper function, and finally reduced to the expected result ...

function chunkRight(arr, chunkLength) {
  const list = [];

  arr = [...arr];

  while (arr.length >= chunkLength) {
    list.unshift(
      arr.splice(-chunkLength)
    );
  }
  return list;
}

// see also ... [https://regex101.com/r/tatBAB/1]
// with e.g.
//   (?<inEligType>O9|EO|HJ|J8)(?<msgCode>\w{4})(?<msgLen>\d{4})
// ... or ...
//   (O9|EO|HJ|J8)(\w{4})(\d{4})
//
function extractStatusItems(str, separators) {
  const regXSplit = RegExp(`(${ separators.join('|') })(\\w{4})(\\d{4})`);

  const statusValues = String(str).split(regXSplit).slice(1);
  const groupedValues = chunkRight(statusValues, 4);

  return groupedValues.reduce((list, [inEligType, msgCode, msgLen, textMsg]) =>
    list.concat({ inEligType, msgCode, msgLen, textMsg }), []
  );
}

const statusCode = 'O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt';

console.log(
  `statusCode ... ${ statusCode } ...`,
  extractStatusItems(statusCode, ['O9', 'EO', 'HJ', 'J8'])
);
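The approach above relies on split including the contents of capturing groups in its result. A tiny illustration with a shortened, made-up input (the names parts and values exist only for this sketch):

```javascript
// Each match contributes its three captured groups, followed by the text up to the next match.
const parts = "O9C2700003abcEOC3900002xy".split(/(O9|EO)(\w{4})(\d{4})/);
console.log(parts);
// [ '', 'O9', 'C270', '0003', 'abc', 'EO', 'C390', '0002', 'xy' ]

// The leading '' is whatever preceded the first match; dropping it leaves
// a flat list whose length is a multiple of four.
const values = parts.slice(1);
console.log(values.length); // 8
```

That flat list is what the helper then groups into chunks of four values per item.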

... followed by a second approach, based almost entirely on a regex which captures named groups, consumed by matchAll and finally mapped into the expected result ...

// see also ... [https://regex101.com/r/tatBAB/2]
// with e.g.
//   (?<inEligType>O9|EO|HJ|J8)(?<msgCode>\w{4})(?<msgLen>\d{4})(.*?)(?<textMsg>.*?)(?=O9|EO|HJ|J8|$)
//
function extractStatusItems(str, separators) {
  separators = separators.join('|');

  const regXCaptureValues = RegExp(
    `(?<inEligType>${ separators })(?<msgCode>\\w{4})(?<msgLen>\\d{4})(.*?)(?<textMsg>.*?)(?=${ separators }|$)`,
    'g'
  );
  return [...String(str).matchAll(regXCaptureValues)].map(
    ({ groups }) => ({ ...groups })
  );
}

const statusCode = 'O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt';

console.log(
  `statusCode ... ${ statusCode } ...`,
  extractStatusItems(statusCode, ['O9', 'EO', 'HJ', 'J8'])
);
