
Python Regex Matching by Underscores and Strings

I have strings of the following format:

Between the first and second underscore, the text is either "red" or "blue", and between the second underscore and the first pair of double underscores, the text is either "one" or "two". Between the first and second pairs of double underscores is a Name. This can be a single first name, or a first name and a last name separated by a single underscore. The Name section is delimited by the surrounding double underscores, so any single underscore inside it is part of the Name. (Note: the first letter of the Name must be uppercase.) Between the second and third pairs of double underscores is a nickname. Similarly, a nickname can consist of multiple words separated by single underscores. Anything found between the second and third pairs of double underscores is taken as the nickname. The remainder after the third pair of double underscores can be anything. If multiple words are needed, they can be separated by single underscores. The remainder is optional.

Here is what I have so far for my regex:

always_(?:red|blue)_(?:one|two)__[A-Z]{1,1}....

I don't want to use \w+ to match the name, because \w also matches underscores, so it would match the double underscores following the Name as well. I'm stuck on where to go from here.

To clarify further, I want to catch any strings that do not follow that format.

I came up with

always_(red|blue)_(one|two)__((?:[A-Z][a-z]+_?)+)__((?:_?[a-z]+)+)(?:__(\w+))?

which works for the examples here, though you might want to do some more testing.
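One way to do that extra testing is to run the pattern through Python's re module. This is only a rough sketch: using re.fullmatch is an assumption on my part (so that the whole string has to conform), the first sample is the string used in the split-based answer further down, and the other two inputs are made up for illustration.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(
    r"always_(red|blue)_(one|two)__((?:[A-Z][a-z]+_?)+)__((?:_?[a-z]+)+)(?:__(\w+))?"
)

samples = [
    "always_red_one__Darrel_Jack__jackie__enter_anything_here",
    "always_green_one__Darrel__jackie",           # made-up input: bad colour
    "always_red_one__Darrel_Jack_Smith__jackie",  # made-up input: three name words
]

for s in samples:
    # fullmatch (an assumption here) requires the entire string to fit the pattern.
    m = PATTERN.fullmatch(s)
    print(s, "->", m.groups() if m else "no match")
```

Note that the third sample is accepted even though the question only allows a first name plus an optional last name, so the name part may need tightening depending on how strict you want to be.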

You may use this regex that follows all the rules defined in your question:

^always_(red|blue)_(one|two)__([A-Z][a-zA-Z]*(?:_[A-Z][a-zA-Z]*)?)__([a-zA-Z]+(?:_[a-zA-Z ]+)*)(?:__|$)
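Since the goal is to catch strings that do not follow the format, the regex above could be wrapped roughly as follows. This is a minimal sketch, not a definitive implementation: find_invalid is a hypothetical helper name, and all inputs except the first are made up for illustration. re.match is used because the pattern ends in (?:__|$) and deliberately does not consume the optional remainder.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(
    r"^always_(red|blue)_(one|two)__"
    r"([A-Z][a-zA-Z]*(?:_[A-Z][a-zA-Z]*)?)__"
    r"([a-zA-Z]+(?:_[a-zA-Z ]+)*)(?:__|$)"
)

def find_invalid(strings):
    """Return the strings that do NOT match the expected format (hypothetical helper)."""
    return [s for s in strings if PATTERN.match(s) is None]

candidates = [
    "always_red_one__Darrel_Jack__jackie__enter_anything_here",
    "always_blue_two__Darrel__jackie",   # no remainder, still valid
    "always_red_three__Darrel__jackie",  # "three" is not allowed
    "always_blue_two__darrel__jackie",   # Name does not start with uppercase
]

print(find_invalid(candidates))
```

For the valid strings, the match object's groups give the colour, the number, the Name and the nickname, in that order.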


Are you limited solely to re? If not, I think this task becomes easier after you split your string at __. I would do:

s = "always_red_one__Darrel_Jack__jackie__enter_anything_here"
parts = s.split("__")
print(parts)

Output:

['always_red_one', 'Darrel_Jack', 'jackie', 'enter_anything_here']

Then you might use always_(?:red|blue)_(?:one|two) to check whether parts[0] is valid, parts[1][0].isupper() to check whether the second part starts with an uppercase letter, and len(parts) == 4 to check that there is the correct number of parts.
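Put together, those checks might look roughly like the sketch below. is_valid is a hypothetical helper name. Two details are assumptions that differ slightly from the text above: it accepts 3 or 4 parts, because the question says the trailing portion is optional, and it uses parts[1][:1] instead of parts[1][0] so that an empty Name does not raise IndexError.

```python
import re

# Pattern for the fixed prefix, taken from the text above.
HEAD = re.compile(r"always_(?:red|blue)_(?:one|two)")

def is_valid(s):
    """Check a string against the format using split (hypothetical helper)."""
    parts = s.split("__")
    # Assumption: 3 parts (no remainder) or 4 parts (with remainder) are both fine.
    if len(parts) not in (3, 4):
        return False
    head, name, nickname = parts[0], parts[1], parts[2]
    return (
        HEAD.fullmatch(head) is not None  # prefix must match exactly
        and name[:1].isupper()            # Name starts with an uppercase letter
        and bool(nickname)                # nickname must not be empty
    )

print(is_valid("always_red_one__Darrel_Jack__jackie__enter_anything_here"))
print(is_valid("always_red_one__darrel__jackie"))
```

This only validates the overall shape; if the Name and nickname need stricter rules (letters only, at most two name words), those parts can be checked with small regexes of their own.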
