Regex is matching one or more groups too many
I have a series of filenames of varying complexity. They always follow the pattern [_]{ASSET}_[OPTIONAL_DESCRIPTION]_v{#####}.{EXT}, where the parts in [] are optional and the number of leading underscores is arbitrary. Within that format, though, each piece can be arbitrarily complex.
character_thing_v001.md
character_Description_v001.md
character_Some_Long_Description_v001.md
character_thing_with_additional_info_v001.md
character_thing_with_additional_info_Description_v001.md
character_thing_with_additional_info_More_Description_Info_v001.md
character_with_additional_info_Complete234ly_arbitrary_Description_v001.md
_character_thing_v001.md
___character_Description_v001.md
____character_Some_Long_Description_v001.md
__character_thing_with_additional_info_v001.md
__character_thing_with_additional_info_Description_v001.md
___character_thing_with_additional_info_More_Description_Info_v001.md
I made a lookahead assertion to separate ASSET and DESCRIPTION, and everything worked fine until recently, when my boss threw a wrench in the system. Now I have to support assets whose convention could be "some_undercase" OR "CAPS_###". I modified the pattern to allow A-Z and made descriptionText match anything. That's where the mess started.
(?:[_]+)?
(?P<assetText>[a-zA-Z0-9]+
(?=_[a-zA-Z0-9]+)? # lookahead and optionally assert _Capital
(?:(?:_[a-zA-Z0-9]+)+)? # match next group if it exists
) # get full match
(?:[_]+)?
\_(?P<descriptionText>.+)?
\_v(?P<versionIncrement>\d+)
\.(?:\.)?
(?P<extension>(?:md|some|other|extension|options))
This gets me part of the way there, but it has problems, which you can view here. Now that the ASSET can have capitals, the lookahead matches too much for ASSET and starts running into the DESCRIPTION. This pattern is one of several that get generated automatically, so I'm looking for a way to solve the root of the problem rather than write around it. Any guidance would be really appreciated, thank you.
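To make the over-matching concrete, here is a minimal Python sketch. It uses a condensed transcription of the pattern above rather than the generated pattern itself: the optional lookahead `(?=...)?` is left out on the assumption that an optional assertion can never fail and so constrains nothing, and the name `OVERMATCHING` is made up for illustration.

```python
import re

# Condensed transcription of the pattern above (an assumption, not the
# generated original). The optional lookahead "(?=...)?" is dropped because
# an assertion that is allowed to fail does not restrict the match.
OVERMATCHING = re.compile(
    r"""
    _*
    (?P<assetText>[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*)
    _*
    _(?P<descriptionText>.+)?
    _v(?P<versionIncrement>\d+)
    \.\.?
    (?P<extension>md|some|other|extension|options)
    """,
    re.VERBOSE,
)

name = "character_thing_with_additional_info_More_Description_Info_v001.md"
match = OVERMATCHING.fullmatch(name)

# The greedy asset group swallows every token it can and only backtracks
# far enough to leave a single token for the description.
print(match.group("assetText"))
# character_thing_with_additional_info_More_Description
print(match.group("descriptionText"))
# Info
```

Once `assetText` accepts capitals, nothing in the pattern marks where the asset ends, so the greedy group gives back only the last token.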
I can't really follow the logic of some parts of your regex; they seem unnecessary. Doesn't this simplified regex do the same job?
_*
(?P<assetText>[a-zA-Z0-9]+(_[a-z_0-9]+)?)
(_ (?P<descriptionText>[a-zA-Z0-9_]+) )?
_v(?P<versionIncrement>[0-9]+)
(?P<extension>\.[A-Za-z0-9]+)
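As a quick check, the simplified pattern can be run against a few of the sample filenames. This is only a sketch: it assumes the pattern is compiled with `re.VERBOSE` (so the spaces inside it are ignored), `split_name` is a hypothetical helper, and `CAPS_001_Description_v001.md` is an invented filename standing in for the "CAPS_###" convention.

```python
import re

# The simplified pattern, compiled in verbose mode so that the
# whitespace inside it is not treated as literal characters.
SIMPLIFIED = re.compile(
    r"""
    _*
    (?P<assetText>[a-zA-Z0-9]+(_[a-z_0-9]+)?)
    (_ (?P<descriptionText>[a-zA-Z0-9_]+) )?
    _v(?P<versionIncrement>[0-9]+)
    (?P<extension>\.[A-Za-z0-9]+)
    """,
    re.VERBOSE,
)


def split_name(filename):
    """Return (asset, description, version, extension), or None if no match."""
    m = SIMPLIFIED.fullmatch(filename)
    if m is None:
        return None
    return (
        m.group("assetText"),
        m.group("descriptionText"),
        m.group("versionIncrement"),
        m.group("extension"),
    )


samples = [
    "character_thing_v001.md",
    "___character_Description_v001.md",
    "character_thing_with_additional_info_More_Description_Info_v001.md",
    "CAPS_001_Description_v001.md",  # hypothetical "CAPS_###" asset
]
for filename in samples:
    print(filename, "->", split_name(filename))
```

Here the asset takes the first token plus any lowercase or numeric tokens that follow, and the description starts at the first later token that begins with a capital. The first sample therefore comes back as asset `character_thing` with no description, because nothing distinguishes a lowercase description from part of the asset.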
Perhaps the (natural-language) rules for what constitutes an asset and what constitutes an optional description need to be clarified: