Regular expression to match a specific pattern

I have the following string:

s = "<X> First <Y> Second"

and I can match any text right after <X> and <Y> (in this case "First" and "Second"). This is how I did it:

import re
s = "<X> First <Y> Second"
pattern = r'\<([XxYy])\>([^\<]+)'  # lower and upper case X/Y will be matched
items = re.findall(pattern, s)
print(items)
>>> [('X', ' First '), ('Y', ' Second')]

What I am now trying to match is the case without <>:

s = "X First Y Second"

I tried this:

pattern = r'([XxYy]) ([^\<]+)'
>>> [('X', 'First Y Second')]

Unfortunately, it does not produce the right result. What am I doing wrong? I want to match X, x, Y or y followed by one whitespace character (for instance "X "). How can I do that?

EDIT: this is a possible string too:

s = "<X> First one <Y> Second <X> More <Y> Text"

Output should be:

 >>> [('X', ' First one '), ('Y', ' Second '), ('X', ' More '), ('Y', ' Text')]

EDIT2:

pattern = r'([XxYy]) ([^ ]+)'
s = "X First text Y Second"

produces:

[('X', 'First'), ('Y', 'Second')]

but it should be:

[('X', 'First text'), ('Y', 'Second')]

How about something like: <?[XY]>? ([^<>XY$ ]+)

Example in JavaScript:

const re = /<?[XY]>? ([^<>XY$ ]+)/ig;
console.info('<X> First <Y> Second'.match(re));
console.info('X First Y Second'.match(re));
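For comparison with the Python code in the question, the same pattern can be tried with the re module. This is only a sketch: the names tag_word and full_matches are made up for illustration, and re.IGNORECASE is assumed to stand in for the JavaScript i flag. Note that the pattern captures a single word after the tag, so a multi-word value such as "First text" would not be returned in full.

```python
import re

# Same pattern as the JavaScript example above; re.IGNORECASE replaces the /i flag.
# tag_word and full_matches are hypothetical names used only for this sketch.
tag_word = re.compile(r'<?[XY]>? ([^<>XY$ ]+)', re.IGNORECASE)

def full_matches(s):
    # Return the whole matched text, like String.prototype.match with /g does.
    return [m.group(0) for m in tag_word.finditer(s)]

with_brackets = full_matches('<X> First <Y> Second')
without_brackets = full_matches('X First Y Second')
print(with_brackets)
print(without_brackets)
```

Both the bracketed and the bare input are matched, but each match covers only the tag and the first word that follows it.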

If you know which whitespace character to match, you can just add it to your expression. If you want any whitespace character to match, you can use \s

pattern = r'\<([XxYy])\>([^\<]+)'

would then be那么将是

pattern = r'\<([XxYy])\>\s([^\<]+)'

Always keep in mind that the expression within the () is what will be returned as your result.
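A small sketch of that point, using the two patterns from this answer and the string from the question (the variable names are made up for illustration): once \s sits outside the parentheses, the leading whitespace is consumed but no longer returned in the captured group.

```python
import re

s = "<X> First <Y> Second"

# Original pattern: the space after <X> falls inside the second group.
without_ws = re.findall(r'\<([XxYy])\>([^\<]+)', s)

# With \s outside the parentheses, that space is matched but not captured.
with_ws = re.findall(r'\<([XxYy])\>\s([^\<]+)', s)

print(without_ws)
print(with_ws)
```

The trailing space in 'First ' is still captured, because [^\<]+ keeps matching up to the next <.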

Assuming the whitespace token to match is a single space character, the pattern is:

pattern = r'([XxYy]) ([^ ]+)'

So I came up with this solution:

pattern = r"([XxYy]) (.*?)(?= [XxYy] |$)"
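A minimal sketch of how this pattern behaves on the bracket-free strings from the question. The lazy (.*?) grows one character at a time until the lookahead sees either another tag surrounded by spaces or the end of the string, so multi-word values stay together. The names results_short and results_long are made up for illustration, and the sketch assumes that the text before the first tag does not itself end in x or y followed by a space.

```python
import re

pattern = r"([XxYy]) (.*?)(?= [XxYy] |$)"

# The lookahead (?= [XxYy] |$) stops the lazy group without consuming the next tag,
# so findall can pick that tag up as the start of the following match.
results_short = re.findall(pattern, "X First text Y Second")
results_long = re.findall(pattern, "X First one Y Second X More Y Text")

print(results_short)
print(results_long)
```

The "x" inside "text" and "Text" does not end a value, because the lookahead requires a space on both sides of the tag letter.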
