
Splitting a string with multiple commas

How can the text below be split? It contains comma-separated values, but some of the inner values also contain commas. However, we know that each group starts with the GO:XX pattern.

GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent

I tried this regex pattern, but it does not work for values that themselves contain commas (as in GO:0048199):

    // [^,]+ stops at the first comma, so names that contain commas are cut short.
    let myRegexp = /(GO:[0-9]+), (BP|MF|CC), ([^,]+)/g;
    let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent";
    let match = myRegexp.exec(raw);
    while (match != null) {
      console.log(match[0].trim());
      match = myRegexp.exec(raw);
    }

Maybe I could split the data on the pattern GO:[0-9]+, but then I would lose the GO IDs. Capturing all the data would take two steps, which makes for ugly code. Is there a better solution?

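You could use a lookahead. Match GO: followed by digits, then lazily consume everything up to, but not including, the next ", GO:" or the end of the string: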

GO:\d+.*?(?=,\s+GO:|$)

See a demo on regex101.com.


In JS this could be:

    // Lazily match each group up to the next ", GO:" or the end of the string.
    let myRegexp = /GO:\d+.*?(?=,\s+GO:|$)/g;
    let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent";
    let match = myRegexp.exec(raw);
    while (match != null) {
      console.log(match[0].trim());
      match = myRegexp.exec(raw);
    }
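Since the question also asks for the GO IDs to be captured, the lookahead can be combined with capture groups so that each match yields its fields in one pass. The following is only a minimal sketch: `parseGoTerms`, `sample`, and `terms` are names made up for illustration, and it assumes the second field is always one of BP, CC, or MF, as in the sample data.

```javascript
// Hypothetical helper (not from the original answer): parse every
// "GO:id, aspect, name" group into an object in a single pass.
function parseGoTerms(raw) {
  // Group 1: the GO ID. Group 2: the aspect (assumed to be BP, CC or MF).
  // Group 3: the name, matched lazily up to the next ", GO:" or end of string.
  const pattern = /(GO:\d+),\s*(BP|CC|MF)\s*,\s*(.*?)(?=,\s+GO:|$)/g;
  return Array.from(raw.matchAll(pattern), ([, id, aspect, name]) => ({
    id,
    aspect,
    name: name.trim(),
  }));
}

// Shortened sample taken from the question's data.
const sample =
  "GO:0006903, BP, vesicle targeting, " +
  "GO:0048199, BP, vesicle targeting, to, from or within Golgi, " +
  "GO:0031012, CC, extracellular matrix";

const terms = parseGoTerms(sample);
console.log(terms);
```

The name of GO:0048199 keeps its inner commas because the lazy group only stops where the lookahead sees a following ", GO:".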

You could split the string using a positive lookahead.

    let raw = "GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent",
        // Split only at commas that are directly followed by the next GO ID.
        result = raw.split(/,\s+(?=GO:\d+,)/);

    console.log(result);
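Once the string has been split into groups this way, each group can be broken into its three fields by cutting only at its first two commas, so commas inside the name stay intact. This is a sketch under that assumption (ID and aspect never contain commas); `toRecord`, `sampleRaw`, and `records` are hypothetical names used only here.

```javascript
// Hypothetical helper: turn one "GO:id, aspect, name" entry into an object.
function toRecord(entry) {
  // Only the first two commas separate fields; any later comma is part of the name.
  const firstComma = entry.indexOf(",");
  const secondComma = entry.indexOf(",", firstComma + 1);
  return {
    id: entry.slice(0, firstComma).trim(),
    aspect: entry.slice(firstComma + 1, secondComma).trim(),
    name: entry.slice(secondComma + 1).trim(),
  };
}

// Shortened sample taken from the question's data.
const sampleRaw =
  "GO:0048193, BP, Golgi vesicle transport, " +
  "GO:0048199, BP, vesicle targeting, to, from or within Golgi, " +
  "GO:0005201, MF, extracellular matrix structural constituent";

// Same lookahead split as above, followed by per-entry parsing.
const records = sampleRaw.split(/,\s+(?=GO:\d+,)/).map(toRecord);
console.log(records);
```

This does take two steps, but each step is simple, and the split pattern never needs to know how many commas a name contains.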

    const input = 'GO:0048193, BP, Golgi vesicle transport, GO:0030198, BP, extracellular matrix organization, GO:0006903, BP, vesicle targeting, GO:0043062, BP, extracellular structure organization, GO:0048199, BP, vesicle targeting, to, from or within Golgi, GO:0031012, CC, extracellular matrix, GO:0062023, CC, collagen-containing extracellular matrix, GO:0005581, CC, collagen trimer, GO:0044420, CC, extracellular matrix component, GO:0030020, MF, extracellular matrix structural constituent conferring tensile strength, GO:0005201, MF, extracellular matrix structural constituent';

    // Split on the literal ID prefix, drop the empty first piece, then restore the prefix.
    // Note: this only works when every ID starts with "GO:00", and every element
    // except the last keeps its trailing ", " separator.
    const result = input.split('GO:00').slice(1).map(x => 'GO:00' + x);

    console.log(result);
