
Regex to not match final group when one group has a match

I have data I need to split up in the following format:

22Dec17 DEB ACME 16.27
22Dec17 DEB BIG CO STORE 50.33
123353443
22Dec17 FEE CHARGE NAME 39.91 DR
123434454
22Dec17 DEB NAMENAME 12.91 123.23
22Dec17 DEB NAME 6 91

In the above example, the data should be split up as follows:

22Dec17, DEB, ACME, 16.27,
22Dec17, DEB, BIG CO STORE, 50.33, 123353443
22Dec17, FEE, CHARGE NAME, 39.91, 123434454
22Dec17, DEB, NAMENAME, 12.91,
22Dec17, DEB, NAME, 6 91,

I am using the following regex which mostly works:

([0-9]{1,2}[A-Za-z]{1,3}[0-9]{2}) ([A-Z]{2,3}) ([A-Za-z.,\/& ]*) ?([0-9.]{1,8}[\. ][0-9.]{2})? ?(?:[0-9.]{1,8}[\. ][0-9.]{2})?\n?(?![0-9]{1,2}[A-Za-z]{1,3}[0-9]{2})([0-9A-Z-\/ .]*)

The problem comes when there is a number in the name, like so:

27Dec15 DEB TESCO UPT 123 34.90

This creates the regex result:

27Dec15, DEB, TESCO UPT, 123 34, .90

How can I make this number match only when it is one of the last two values, that is, only when it is in the format 12 34 or 12.34? In 123 34.90, the regex should not treat 123 34 and .90 as parts of that match.

One way would be to make the \n character required. I have it optional for now, as otherwise it prevents all matches. Could it be part of the lookahead?

Is the part of the regex that checks that the next line does not contain a date correct?

\n?(?![0-9]{1,2}[A-Za-z]{1,3}[0-9]{2})([0-9A-Z-\/ .]*)

/(\d{0,2}[a-z]{3}\d{0,2})\s([^.]+)\s([\d.]+)[\n]?(\d+\s)?/gi

This regex should get what you're after, demo in the code example. You'll just have to sanitize out the newlines later. Breakdown:

  • (\d{0,2}[a-z]{3}\d{0,2})\s Match the date block, followed by a space
  • ([^.]+)\s Get the company name, so any character that isn't a . , followed by a mandatory space
  • ([\d.]+) Get the cost
  • [\n]?(\d+\s)? Optionally grab that extra line of digits, if it exists
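To look at the individual groups rather than whole matches, the pattern can be run through `matchAll`. The following is a minimal sketch under a few assumptions: the sample is a shortened version of the question's data plus the TESCO line, and the `sample` and `rows` names are only illustrative.

```javascript
// Shortened sample built from the question's data, one record per line.
const sample = [
  "22Dec17 DEB ACME 16.27",
  "22Dec17 DEB BIG CO STORE 50.33",
  "123353443",
  "27Dec15 DEB TESCO UPT 123 34.90",
].join("\n");

const rgx = /(\d{0,2}[a-z]{3}\d{0,2})\s([^.]+)\s([\d.]+)[\n]?(\d+\s)?/gi;

// matchAll exposes the capture groups of every match;
// match() with the g flag returns only the full matched strings.
const rows = [...sample.matchAll(rgx)].map(([, date, name, cost, ref]) => ({
  date,
  name,
  cost,
  // The optional reference group includes its trailing newline, so trim it.
  ref: ref ? ref.trim() : "",
}));

console.log(rows);
```

With this pattern the transaction type (DEB, FEE) is not captured separately; it ends up at the start of the name group, so it would still need to be split off afterwards.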

var teststrs = `22Dec17 DEB ACME 16.27
22Dec17 DEB BIG CO STORE 50.33
123353443
22Dec17 FEE CHARGE NAME 39.91 DR
123434454
22Dec17 DEB NAMENAME 12.91 123.23
22Dec17 DEB NAME 6 91`
var rgx = /(\d{0,2}[a-z]{3}\d{0,2})\s([^.]+)\s([\d.]+)[\n]?(\d+\s)?/gi
console.log(teststrs.match(rgx))
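If the amount should only count when it sits at the end of a line, as the question asks, one alternative is to parse line by line and anchor the amount with `$`. This is a sketch of a different technique, not the pattern from the question or the answer. It assumes that a three-letter month is used, that anything after the amount is either `DR` or one more amount, and that a line not starting with a date is the reference for the previous row. `parseStatement`, `ROW` and `DATE` are hypothetical names.

```javascript
// Hypothetical helper, not from the original post.
// Assumption: a line that starts with a date opens a new record.
const DATE = /^\d{1,2}[A-Za-z]{3}\d{2}\b/;

// date, type, lazy name, amount ("12.34" or "12 34"), then optionally
// "DR" or one more amount (assumed trailing tokens), anchored at end of line.
const ROW = /^(\d{1,2}[A-Za-z]{3}\d{2}) ([A-Z]{2,3}) (.+?) (\d{1,8}[. ]\d{2})(?: (?:DR|\d{1,8}[. ]\d{2}))?$/;

function parseStatement(text) {
  const rows = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "") continue; // skip blank lines
    const m = ROW.exec(line);
    if (m) {
      rows.push({ date: m[1], type: m[2], name: m[3], amount: m[4], ref: "" });
    } else if (rows.length > 0 && !DATE.test(line)) {
      // No leading date: treat the line as the reference of the previous row.
      rows[rows.length - 1].ref = line;
    }
  }
  return rows;
}

const statement = `22Dec17 DEB BIG CO STORE 50.33
123353443
22Dec17 FEE CHARGE NAME 39.91 DR
22Dec17 DEB NAME 6 91
27Dec15 DEB TESCO UPT 123 34.90`;

const parsed = parseStatement(statement);
console.log(parsed);
```

Because the name group is lazy and the amount must be followed by the end of the line (or one allowed trailing token), `123 34` in the TESCO line cannot be taken as the amount: the `.90` left over would break the `$` anchor, so the engine extends the name to `TESCO UPT 123` and takes `34.90` instead. A name that itself ends in something shaped like an amount remains ambiguous under these assumptions.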

