Parsing units with JavaScript regex

Say I have a string containing some units (which may or may not have prefixes) that I want to break into the individual units. For example, the string may contain "Btu(th)" or "Btu(th).ft" or even "mBtu(th).ft", where mBtu(th) is the bastardised unit milli thermochemical BTUs (this is purely an example).

I currently have the following (simplified) regex; however, it fails for the case "mBtu(th).ft":

/(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g

Currently this does not correctly detect the boundary between the end of 'Btu(th)' and the start of 'ft'. I understand JavaScript regex does not support lookbehind, so how do I accurately parse the string?
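To make the failure concrete, here is a minimal sketch that runs the pattern above against two of the example strings. The helper name listMatches is made up for illustration, and an unmatched prefix is reported as an empty string by assumption.

```javascript
// The question's simplified pattern, unchanged.
const original = /(m|k)??(Btu\(th\)|ft|m)(?:\b|\s|$)/g;

// Hypothetical helper: list every match as [prefix, unit].
// An absent prefix group is undefined, so it is mapped to ''.
function listMatches(re, text) {
  return [...text.matchAll(re)].map((m) => [m[1] || '', m[2]]);
}

// On its own, 'Btu(th)' is found because the '$' alternative applies.
console.log(listMatches(original, 'Btu(th)'));     // [ [ '', 'Btu(th)' ] ]

// Followed by '.ft', none of \b, \s or $ holds after the ')',
// so only the trailing 'ft' is reported.
console.log(listMatches(original, 'mBtu(th).ft')); // [ [ '', 'ft' ] ]
```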

Additional notes

  • The regex presented above is greatly simplified around the prefix and unit groups. The prefixes could span multiple characters, like 'Ki', so character sets are not suitable.

  • The desire is for each match to capture the prefix as group 1 and the unit as group 2, i.e. for 'mBtu(th).ft' match one would be ['m','Btu(th)'] and match two would be ['','ft'].

  • The prefix match needs to be lazy so that the string 'm' is matched as the unit metres rather than the prefix milli. Likewise, the match for 'mm' would need to be the prefix milli and the unit metres.

I would try:

/((m)|(k)|(Btu(\(th\))?)|(ft)|(m)|(?:\.))+/g

At least with the example above, it matches all the units merged into one string. DEMO

EDIT

Another try (DEMO):

/(?:(m)|(k)|(Btu)|(th)|(ft)|[\.\(\)])/g

This one again matches only one part at a time, but if you use $1, $2, $3, $4, etc. (DEMO), you can extract the other fragments. It ignores the '.', '(' and ')' characters. The problem is counting the matched groups properly, but it works to some degree.

Or, if you accept multiple separate matches, I think a simple alternative is:

/(m|k|Btu|th|ft)/g 
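As a rough illustration of what "multiple separate matches" means here, the sketch below runs that last pattern over the question's example. The function name tokens is made up for this example.

```javascript
// The flat alternation from the answer above.
const flat = /(m|k|Btu|th|ft)/g;

// Hypothetical helper: return every token the pattern finds, in order.
function tokens(text) {
  return text.match(flat) || [];
}

console.log(tokens('mBtu(th).ft')); // [ 'm', 'Btu', 'th', 'ft' ]
console.log(tokens('mm'));          // [ 'm', 'm' ]
```

Note that 'Btu(th)' comes back as two tokens ('Btu' and 'th'), and nothing in the result says whether an 'm' was a prefix or a unit, so the caller would have to work that out afterwards.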

A word boundary will not separate two non-word characters. So you don't actually want a word boundary, since the parentheses and the period are not word characters. Instead, you want the unit to not be followed by a word character, so you can use this:

[mk]??(Btu\(th\)|ft|m)(?!\w)

Demo
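The sketch below checks that idea against the cases listed in the question. It is an adaptation rather than the exact pattern above: the prefix is written as the capturing group (m|k) instead of the set [mk] so that it shows up as group 1, and the helper name parseUnits is made up for illustration.

```javascript
// Adapted pattern: lazy captured prefix, unit, then "not followed by a word character".
const unitRe = /(m|k)??(Btu\(th\)|ft|m)(?!\w)/g;

// Hypothetical helper: return each match as [prefix, unit].
// An absent prefix group is undefined, so it is mapped to ''.
function parseUnits(text) {
  return [...text.matchAll(unitRe)].map((m) => [m[1] || '', m[2]]);
}

console.log(parseUnits('mBtu(th).ft')); // [ [ 'm', 'Btu(th)' ], [ '', 'ft' ] ]
console.log(parseUnits('m'));           // [ [ '', 'm' ] ]
console.log(parseUnits('mm'));          // [ [ 'm', 'm' ] ]
```

The lookahead only guards the right-hand side of a unit. A string such as 'xft' would still yield 'ft', so input containing unknown units may need an extra check.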

I believe you're after something like this, if I understood correctly that you want to match any kind of element, possibly preceded by the m or k character and separated by parentheses or dots.

/[\s\.\(]*(m|k?)(\w+)[\s\.\)]*/g

https://regex101.com/r/eQ5nR4/2

If you don't care about matching the parentheses and just want the elements returned, you can simply use

/(m|k?)(\w+)/g

https://regex101.com/r/oC1eP5/1
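For comparison, here is a small sketch of what that shorter pattern returns for the question's example. The helper name looseParse is made up for illustration.

```javascript
// The shorter pattern from the answer above.
const loose = /(m|k?)(\w+)/g;

// Hypothetical helper: return each match as [prefix, element].
// Group 1 always participates here, so an absent prefix is already ''.
function looseParse(text) {
  return [...text.matchAll(loose)].map((m) => [m[1], m[2]]);
}

console.log(looseParse('mBtu(th).ft')); // [ [ 'm', 'Btu' ], [ '', 'th' ], [ '', 'ft' ] ]
console.log(looseParse('m'));           // [ [ '', 'm' ] ]
```

Because the element is matched with \w+, any run of word characters is accepted as a unit, and 'th' is returned as an element of its own rather than as part of 'Btu(th)'.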
