
Non-repeating Regex Pattern - negative lookahead

I am trying to parse dimension strings with a regex in Java and return only the parts I need.

The ideal String would be: number x number.

Anything not in this format can be ignored, and the method should return null.

However, the Strings that are actually entered include the following:

  • 123x 132 sqft
  • 200 sq.ft. x 310 sq.ft.
  • 404X931X1007X1140
  • .772 Acres
  • 680 and 3209.05
  • 0.772 AC
  • approx 255 by 640
  • 111'X301'
  • approx.2 acre

My current regex solution is this

"(\\d+(?:\\.\\d*)?)[^\\dxX]*(?:[xX]| and |by|\\*)[^\\dxX]*(\\d+(?:\\.\\d*)?)"

and I return match.group(1) + "x" + match.group(2).
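A minimal sketch of how this might be wired up, assuming the first match is taken with find(). The class name DimensionParser and the method parse are made up for illustration; only the pattern comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DimensionParser {
    // The expression from the question, written as a Java string literal.
    private static final Pattern DIMENSIONS = Pattern.compile(
        "(\\d+(?:\\.\\d*)?)[^\\dxX]*(?:[xX]| and |by|\\*)[^\\dxX]*(\\d+(?:\\.\\d*)?)");

    // Hypothetical helper: returns "AxB" for the first match, or null if there is none.
    static String parse(String input) {
        Matcher match = DIMENSIONS.matcher(input);
        if (!match.find()) {
            return null;
        }
        return match.group(1) + "x" + match.group(2);
    }

    public static void main(String[] args) {
        System.out.println(parse("123x 132 sqft"));      // 123x132
        System.out.println(parse(".772 Acres"));         // null
        System.out.println(parse("404X931X1007X1140"));  // 404x931 (the unwanted case)
    }
}
```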

The problem I am left with is repeating inputs like "404X931X1007X1140". This should also return null, since it describes an irregular shape, but instead it returns 404x931.

My question is: how do I make sure these are not matched? My thought was to append a negative lookahead, but it does not behave as I expected and returns 404x93 for some reason.

first expression + "\\D*(?!([xX]| and |by|\\*)\\d+)"
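One plausible reason for a truncated result such as 404x93 is backtracking: when a negative lookahead sits right after a greedy \d+, the engine can satisfy it by letting that \d+ give back its last digit. The sketch below uses a deliberately simplified pattern (an assumption, not the exact expression above) to show the effect; the class and method names are hypothetical. Making the second number possessive (\d++) stops the truncation, but find() then simply moves on and matches a later pair, which suggests a check on what precedes the match is needed as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadBacktrackDemo {
    // Simplified stand-in: number, x/X, number, not followed by another "x<digit>".
    static final Pattern GREEDY = Pattern.compile("(\\d+)[xX](\\d+)(?![xX]\\d)");
    // Same pattern, but the second number is possessive and cannot give digits back.
    static final Pattern POSSESSIVE = Pattern.compile("(\\d+)[xX](\\d++)(?![xX]\\d)");

    // Returns "AxB" for the first match found, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) + "x" + m.group(2) : null;
    }

    public static void main(String[] args) {
        String s = "404X931X1007X1140";
        System.out.println(firstMatch(GREEDY, s));      // 404x93
        System.out.println(firstMatch(POSSESSIVE, s));  // 1007x1140
    }
}
```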

In case anyone else is looking for this: I ended up figuring out a solution that worked. I would have just used \b at the end, but it didn't work for * characters. The {0,30} in the lookbehind is there because Java won't let me use unbounded quantifiers in a lookbehind. It is kind of a mess to look at, though.

(?<!\\d(?:[xX]| and |by|\\*).{0,30})\\b(\\d+(?:,\\d+)*(?:\\.\\d+)?)[^\\dxX]*(?:[xX]| and |by|\\*)[^\\dxX]*(\\d+(?:,\\d+)*(?:\\.\\d+)?)(?!.*(?:[xX]| and |by|\\*)\\D*\\d+)
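For reference, here is a sketch of that expression wrapped in a small helper and run against a few of the sample inputs from the question. The class name StrictDimensionParser and the method parse are hypothetical; only the pattern is taken from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictDimensionParser {
    // Final expression from the post: a bounded lookbehind rejects a pair that
    // follows an earlier "number + separator", and the trailing lookahead
    // rejects a pair that is followed by another "separator + number".
    private static final Pattern DIMENSIONS = Pattern.compile(
        "(?<!\\d(?:[xX]| and |by|\\*).{0,30})\\b"
        + "(\\d+(?:,\\d+)*(?:\\.\\d+)?)[^\\dxX]*(?:[xX]| and |by|\\*)[^\\dxX]*"
        + "(\\d+(?:,\\d+)*(?:\\.\\d+)?)"
        + "(?!.*(?:[xX]| and |by|\\*)\\D*\\d+)");

    // Hypothetical helper: returns "AxB" for the first match, or null if there is none.
    static String parse(String input) {
        Matcher match = DIMENSIONS.matcher(input);
        return match.find() ? match.group(1) + "x" + match.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(parse("123x 132 sqft"));      // 123x132
        System.out.println(parse("approx 255 by 640"));  // 255x640
        System.out.println(parse("404X931X1007X1140"));  // null
        System.out.println(parse(".772 Acres"));         // null
    }
}
```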
