
How to split by a regex pattern and keep the delimiter on a long string?

I have a long string of addresses, with each address having a structure similar to:

123 Main Street St. Louisville OH 43071,432

I want to split the address string on the state, ZIP code and house number (in the above instance this would be: OH 43071,432). I have a regex that identifies these elements in each string (/\d+,\d+/), but splitting on it removes the delimiter.

I've seen other Stack Overflow threads that address similar questions, but none of those solutions work. For instance, if I place the regex in a capture group, like /(\d+,\d+)/, the ZIP code and house number come back as a separate array element:

[ '123 Main Street St. Louisville OH ',
  '43071,432',

Similarly, adding ?! or ?= to the regex is not effective.
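A small sketch of what each of these attempts actually returns, run on a made-up, shortened sample string in the same format (the `sample` value is an assumption, not the question's full data). It shows why a plain split drops the delimiter, why a capturing group yields it as its own element, and why a bare lookahead splits inside the ZIP code:

```javascript
// Hypothetical shortened sample in the same format as the question's data.
const sample = "123 Main Street St. Louisville OH 43071,432 Main Long Road";

// 1. Plain split: the matched text "43071,432" is discarded.
const plain = sample.split(/\d+,\d+/);
console.log(plain);
// → [ '123 Main Street St. Louisville OH ', ' Main Long Road' ]

// 2. Capturing group: the delimiter is kept, but as a separate array element.
const captured = sample.split(/(\d+,\d+)/);
console.log(captured);
// → [ '123 Main Street St. Louisville OH ', '43071,432', ' Main Long Road' ]

// 3. Lookahead: zero-width, so split fires at every position from which
//    "\d+,\d+" can still match, i.e. before each digit of "43071".
const lookahead = sample.split(/(?=\d+,\d+)/);
console.log(lookahead);
// → [ '123 Main Street St. Louisville OH ', '4', '3', '0', '7', '1,432 Main Long Road' ]
```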

How can I successfully split the address strings so that the output looks like this:

[ '123 Main Street St. Louisville OH 43071,432',
  'Main Long Road St. Louisville OH 43071,786',

The list of addresses I have is:

let addr =
  "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54 Holy Grail Street Niagara Town ZP 32908,3200 Main Rd. Bern AE 56210,1 Gordon St. Atlanta RE 13000,10 Pussy Cat Rd. Chicago EX 34342,10 Gordon St. Atlanta RE 13000,58 Gordon Road Atlanta RE 13000,22 Tokyo Av. Tedmondville SW 43098,674 Paris bd. Abbeville AA 45521,10 Surta Alley Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32908,320 Main Al. Bern AE 56210,14 Gordon Park Atlanta RE 13000,100 Pussy Cat Rd. Chicago EX 34342,2 Gordon St. Atlanta RE 13000,5 Gordon Road Atlanta RE 13000,2200 Tokyo Av. Tedmondville SW 43098,67 Paris St. Abbeville AA 45521,11 Surta Avenue Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32918,320 Main Al. Bern AE 56215,14 Gordon Park Atlanta RE 13200,100 Pussy Cat Rd. Chicago EX 34345,2 Gordon St. Atlanta RE 13222,5 Gordon Road Atlanta RE 13001,2200";

Because you have overlapping matches, you won't be able to use split. Instead, call .exec repeatedly with a capturing group and extract the captured text. Match a comma or the beginning of the string, then, inside a lookahead, capture the address text followed by a comma and digits:

const addr = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54 Holy Grail Street Niagara Town ZP 32908,3200 Main Rd. Bern AE 56210,1 Gordon St. Atlanta RE 13000,10 Pussy Cat Rd. Chicago EX 34342,10 Gordon St. Atlanta RE 13000,58 Gordon Road Atlanta RE 13000,22 Tokyo Av. Tedmondville SW 43098,674 Paris bd. Abbeville AA 45521,10 Surta Alley Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32908,320 Main Al. Bern AE 56210,14 Gordon Park Atlanta RE 13000,100 Pussy Cat Rd. Chicago EX 34342,2 Gordon St. Atlanta RE 13000,5 Gordon Road Atlanta RE 13000,2200 Tokyo Av. Tedmondville SW 43098,67 Paris St. Abbeville AA 45521,11 Surta Avenue Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32918,320 Main Al. Bern AE 56215,14 Gordon Park Atlanta RE 13200,100 Pussy Cat Rd. Chicago EX 34345,2 Gordon St. Atlanta RE 13222,5 Gordon Road Atlanta RE 13001,2200";
let match;
const matches = [];
// Match a comma or the start of the string, then capture (inside a lookahead)
// the text up to and including the next ",digits". The trailing . consumes
// one character so that each exec() call moves forward.
const pattern = /(?:^|,)(?=([^,]+,\d+))./g;
while ((match = pattern.exec(addr)) !== null) {
  matches.push(match[1]);
}
console.log(matches);
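On engines that support `String.prototype.matchAll` (ES2020), the same lookahead pattern can be consumed without a manual `exec` loop. A minimal sketch, assuming a shortened stand-in `sample` string instead of the full `addr` value; note that each result starts with the house number that follows the previous comma, because the matches overlap:

```javascript
// Shortened stand-in for the question's `addr` string (assumption).
const sample = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54";

// Same pattern as the exec() loop; matchAll requires the g flag and yields
// one match array per hit, where group 1 holds the captured address text.
const pattern = /(?:^|,)(?=([^,]+,\d+))./g;
const matches = Array.from(sample.matchAll(pattern), (m) => m[1]);
console.log(matches);
// → [ '123 Main Street St. Louisville OH 43071,432',
//     '432 Main Long Road St. Louisville OH 43071,786',
//     '786 High Street Pollocksville NY 56432,54' ]
```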

If you only need this on the backend with a recent Node.js version, you can use split() with a lookbehind assertion (lookbehind was added to JavaScript regular expressions in ES2018). The same code can also be tested in recent versions of Google Chrome.

const addr = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54 Holy Grail Street Niagara Town ZP 32908,3200 Main Rd. Bern AE 56210,1 Gordon St. Atlanta RE 13000,10 Pussy Cat Rd. Chicago EX 34342,10 Gordon St. Atlanta RE 13000,58 Gordon Road Atlanta RE 13000,22 Tokyo Av. Tedmondville SW 43098,674 Paris bd. Abbeville AA 45521,10 Surta Alley Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32908,320 Main Al. Bern AE 56210,14 Gordon Park Atlanta RE 13000,100 Pussy Cat Rd. Chicago EX 34342,2 Gordon St. Atlanta RE 13000,5 Gordon Road Atlanta RE 13000,2200 Tokyo Av. Tedmondville SW 43098,67 Paris St. Abbeville AA 45521,11 Surta Avenue Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32918,320 Main Al. Bern AE 56215,14 Gordon Park Atlanta RE 13200,100 Pussy Cat Rd. Chicago EX 34345,2 Gordon St. Atlanta RE 13222,5 Gordon Road Atlanta RE 13001,2200";
// Split on a space that is immediately preceded by "digits,digits".
// The lookbehind is zero-width, so "43071,432" stays at the end of each piece.
console.log(addr.split(/(?<=\d+,\d+) /));
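For engines without lookbehind support, a similar result can be had by splitting with a capturing group (as tried in the question) and gluing each captured `digits,digits` back onto the piece before it. This is a swapped-in technique, not part of the answer above; a sketch assuming a shortened `sample` string:

```javascript
// Shortened stand-in for the full `addr` string (assumption).
const sample = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54";

// Split on "digits,digits " while capturing the "digits,digits" part.
// split() interleaves captures with the pieces: [text, capture, text, ..., text]
const parts = sample.split(/(\d+,\d+) /);

// Append each capture to the text that precedes it; the final piece has no
// following capture, so fall back to an empty string.
const addresses = [];
for (let i = 0; i < parts.length; i += 2) {
  addresses.push(parts[i] + (parts[i + 1] ?? ""));
}
console.log(addresses);
// → [ '123 Main Street St. Louisville OH 43071,432',
//     'Main Long Road St. Louisville OH 43071,786',
//     'High Street Pollocksville NY 56432,54' ]
```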


To match the strings in the updated question, you can use the RegExp /[^\s][^,]+,\d+/g with String.prototype.match(). It matches one non-whitespace character, followed by one or more characters that are not commas, followed by a comma and one or more digits.

const addr = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786 High Street Pollocksville NY 56432,54 Holy Grail Street Niagara Town ZP 32908,3200 Main Rd. Bern AE 56210,1 Gordon St. Atlanta RE 13000,10 Pussy Cat Rd. Chicago EX 34342,10 Gordon St. Atlanta RE 13000,58 Gordon Road Atlanta RE 13000,22 Tokyo Av. Tedmondville SW 43098,674 Paris bd. Abbeville AA 45521,10 Surta Alley Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32908,320 Main Al. Bern AE 56210,14 Gordon Park Atlanta RE 13000,100 Pussy Cat Rd. Chicago EX 34342,2 Gordon St. Atlanta RE 13000,5 Gordon Road Atlanta RE 13000,2200 Tokyo Av. Tedmondville SW 43098,67 Paris St. Abbeville AA 45521,11 Surta Avenue Goodtown GG 30654,45 Holy Grail Al. Niagara Town ZP 32918,320 Main Al. Bern AE 56215,14 Gordon Park Atlanta RE 13200,100 Pussy Cat Rd. Chicago EX 34345,2 Gordon St. Atlanta RE 13222,5 Gordon Road Atlanta RE 13001,2200";
// With the g flag, match() returns every full match as an array of strings.
const res = addr.match(/[^\s][^,]+,\d+/g);
console.log(JSON.stringify(res, null, 2));
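One detail worth guarding against: with the g flag, `match()` returns `null` (not an empty array) when nothing matches. A minimal sketch wrapping the same regex in a helper; the name `splitAddresses` and the sample strings are hypothetical:

```javascript
// Hypothetical helper name; wraps the match() call from the answer above.
function splitAddresses(str) {
  // match() with /g returns all full matches, or null if there are none.
  return str.match(/[^\s][^,]+,\d+/g) ?? [];
}

const sample = "123 Main Street St. Louisville OH 43071,432 Main Long Road St. Louisville OH 43071,786";
console.log(splitAddresses(sample));
// → [ '123 Main Street St. Louisville OH 43071,432',
//     'Main Long Road St. Louisville OH 43071,786' ]
console.log(splitAddresses("no numbers here"));
// → []
```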

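Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.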

© 2020-2024 STACKOOM.COM