Parsing units with javascript regex

Say I have a string containing some units (which may or may not have prefixes) that I want to break into the individual units. For example, the string may contain "Btu(th)" or "Btu(th).ft" or even "mBtu(th).ft", where mBtu(th) is the bastardised unit milli thermochemical BTUs (this is purely an example).

I currently have the following (simplified) regex; however, it fails for the case "mBtu(th).ft":

/(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g

Currently this does not correctly detect the boundary between the end of 'Btu(th)' and the start of 'ft'. I understand JavaScript regex does not support lookbehind, so how do I accurately parse the string?

Additional notes

  • The regex presented above is greatly simplified around the prefix and unit groups. The prefixes can span multiple characters, like 'Ki', so character classes are not suitable.
  • Each match should capture the prefix as group 1 and the unit as group 2, i.e. for 'mBtu(th).ft' match one would be ['m','Btu(th)'] and match two would be ['','ft'].
  • The prefix match needs to be lazy so that the string 'm' is matched as the unit metres rather than the prefix milli. Likewise, 'mm' would need to match as the prefix milli followed by the unit metres.
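To make the failure concrete, here is a small sketch that runs the question's pattern through String.prototype.matchAll and collects each match as a [prefix, unit] pair, in the shape described in the notes. The parseUnits helper is hypothetical, introduced only for illustration; an unused lazy prefix group comes back as undefined and is normalised to ''.

```javascript
// The question's original (simplified) pattern, unchanged.
const original = /(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g;

// Hypothetical helper: collect every match as a [prefix, unit] pair.
// matchAll requires the g flag; an unused prefix group is undefined,
// so it is normalised to '' to match the shape described in the notes.
function parseUnits(pattern, input) {
  return [...input.matchAll(pattern)].map((m) => [m[1] ?? '', m[2]]);
}

console.log(JSON.stringify(parseUnits(original, 'Btu(th)')));     // [["","Btu(th)"]]
console.log(JSON.stringify(parseUnits(original, 'Btu(th).ft')));  // [["","ft"]]
console.log(JSON.stringify(parseUnits(original, 'mBtu(th).ft'))); // [["","ft"]]
```

In both compound strings the Btu(th) unit is dropped entirely: between ')' and '.' there is no word boundary, no whitespace and no end of string, so the trailing group cannot match there.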

I would try with:

/((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g

At least with the example above, it matches all the units merged into one string.
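A quick sketch checking that claim: because of the + quantifier, the whole run of units and separators is consumed by a single match, so the individual units are not returned as separate matches.

```javascript
// First pattern from this answer, unchanged.
const merged = /((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g;

// The + repeats the alternation, so one match swallows the whole string.
const matches = [...'mBtu(th).ft'.matchAll(merged)].map((m) => m[0]);
console.log(JSON.stringify(matches)); // ["mBtu(th).ft"]
```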

EDIT

Another try:

/(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g

This one again matches only one part at a time, but by reading $1, $2, $3, $4, etc. you can extract the other fragments. It ignores the '.', '(' and ')' characters. The difficulty is telling which group actually matched, but it works to some degree.
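A minimal sketch of reading the groups for this pattern: each match sets at most one of groups 1 to 5, so the position of the defined group tells you which fragment was found, and the bare '.', '(' and ')' matches set no group and can be skipped. The fragmentsWithGroup helper is hypothetical. Note that this pattern splits Btu(th) into 'Btu' and 'th' and reports the milli prefix 'm' as an ordinary fragment.

```javascript
// Second pattern from this answer, unchanged.
const pieces = /(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g;

// Hypothetical helper: for each match, find which capture group (1-5) matched.
// Matches on '.', '(' or ')' set no group (findIndex gives -1) and are dropped.
function fragmentsWithGroup(input) {
  const out = [];
  for (const m of input.matchAll(pieces)) {
    const group = m.slice(1).findIndex((g) => g !== undefined) + 1;
    if (group > 0) out.push([group, m[0]]);
  }
  return out;
}

console.log(JSON.stringify(fragmentsWithGroup('mBtu(th).ft')));
// [[1,"m"],[3,"Btu"],[4,"th"],[5,"ft"]]
```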

Or, if you can accept multiple separate matches, I think a simpler alternative is:

/(m|k|Btu|th|ft)/g 
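With this simpler pattern and the g flag, String.prototype.match returns the fragments directly as an array of strings. As a quick sketch on the question's example: the separators are skipped, but Btu(th) again comes back split into 'Btu' and 'th', and there is no prefix/unit distinction.

```javascript
// Simplest pattern from this answer; with the g flag, match() returns every matched string.
const simple = /(m|k|Btu|th|ft)/g;
const fragments = 'mBtu(th).ft'.match(simple);
console.log(JSON.stringify(fragments)); // ["m","Btu","th","ft"]
```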

A word boundary will not separate two non-word characters. The parentheses and the period are not word characters, so you don't actually want a word boundary here. What you want is for the unit not to be followed by a word character, which you can express with a negative lookahead:

[mk]??(Btu\(th\)|ft|m)(?!\w)

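The [mk] class cannot express multi-character prefixes such as 'Ki', which the question rules out, and it drops the prefix capture group. A sketch that keeps the (?!\w) idea but restores a lazy (prefix)?? group, assuming hypothetical PREFIXES and UNITS lists standing in for the real ones: each list is escaped and joined into an alternation, longest entries first, so a longer entry is tried before a shorter one that shares its start.

```javascript
// Hypothetical lists standing in for the real prefix and unit tables.
const PREFIXES = ['m', 'k', 'Ki'];
const UNITS = ['Btu(th)', 'ft', 'm'];

// Escape regex metacharacters such as '(' and ')' in 'Btu(th)'.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Join a list into an alternation, longest entries first.
function alternation(list) {
  return [...list]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join('|');
}

// Group 1: lazy optional prefix; group 2: unit; (?!\w): the unit must not be
// followed by a word character ('.', ')', whitespace or end of string is fine).
const unitPattern = new RegExp(
  `(${alternation(PREFIXES)})??(${alternation(UNITS)})(?!\\w)`,
  'g'
);

// Collect [prefix, unit] pairs, normalising an unused prefix group to ''.
function parseUnits(input) {
  return [...input.matchAll(unitPattern)].map((m) => [m[1] ?? '', m[2]]);
}

console.log(JSON.stringify(parseUnits('mBtu(th).ft'))); // [["m","Btu(th)"],["","ft"]]
console.log(JSON.stringify(parseUnits('m')));           // [["","m"]]
console.log(JSON.stringify(parseUnits('mm')));          // [["m","m"]]
```

The laziness matters together with the lookahead: for 'mm' the engine first tries an empty prefix with unit 'm', the lookahead rejects it because another 'm' follows, and only then does it take 'm' as the milli prefix.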

I believe you're after something like this. If I understood you correctly, you want to match any kind of element, possibly preceded by an m or k character and separated by parentheses or dots.

/[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g

https://regex101.com/r/eQ5nR4/2

If you don't need to match the parentheses and just want the elements returned, you can simply use:

/(m|k?)(\w+)/g

https://regex101.com/r/oC1eP5/1
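Running both of these patterns on the question's example, as a quick sketch, shows they produce the same [prefix, element] pairs here, with Btu(th) split into 'Btu' and 'th'. Because k? can match the empty string, group 1 comes back as '' rather than undefined when there is no prefix. The pairs helper is hypothetical.

```javascript
// Both patterns from this answer, unchanged.
const withDelimiters = /[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g;
const bare = /(m|k?)(\w+)/g;

// Hypothetical helper: collect [prefix, element] pairs from every match.
const pairs = (pattern, input) =>
  [...input.matchAll(pattern)].map((m) => [m[1], m[2]]);

console.log(JSON.stringify(pairs(withDelimiters, 'mBtu(th).ft')));
// [["m","Btu"],["","th"],["","ft"]]
console.log(JSON.stringify(pairs(bare, 'mBtu(th).ft')));
// [["m","Btu"],["","th"],["","ft"]]
```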


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM