
Pattern matching: check whether a greater-than symbol is not preceded by a less-than symbol

I would like to check whether the greater-than sign is preceded by a less-than sign. What I really need is to check whether there is more than one word, separated by spaces, between the < and the >.

For example:

<a v >

should be found, because there is more than one "word" inside, and this:

< a >

should not.

Here is my Python code:

import re

text = '<a > b'
if re.search(r'(?<!<)[a-zA-Z0-9_ ]+>', text):   # search for '>'
    print("found a match")

For this text I don't want a match, because there is a less-than sign before it. But it does find a match; the negative lookbehind does not seem to be working.

Solution (kind of): this also finds a less-than symbol that is not followed by a greater-than symbol.

import re

# `text` is the string being checked, e.g. '<a > b'
match = re.search(r'<?[a-zA-Z0-9_ ]+>', text)
if match and match.group(0)[0] != '<':      # the match does not start with '<'
    print("found >")
match = re.search(r'<[a-zA-Z0-9_ ]+>?', text)
if match and match.group(0)[-1] != '>':     # the match does not end with '>'
    print("found <")

Thanks to homson_matt for the solution.

BETTER SOLUTION:

Replace the strings that cause the problem before looking for the greater-than and less-than symbols.

import re

def has_comparison(srcString):  # wrapper name is illustrative only
    # replace all templates from source hunk ( <TEMPLATE> )
    srcString = re.sub(r"<[ ]*[a-zA-Z0-9_\*:/\.]+[ ]*>", "TEMPLATE", srcString)
    if re.search(r'[a-zA-Z0-9_ )]>', srcString):  # search for '>'
        return True
    if re.search(r'<[a-zA-Z0-9_ (]', srcString):  # search for '<'
        return True
    return False

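The match is " >" (the space followed by >). The lookbehind does reject a match starting at a, because a is preceded by <, but re.search then moves on to the next position. The space is preceded by a, not by <, so the lookbehind passes there; the space matches the bit in square brackets, and then there's a >.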

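Are you trying to match from the start of the string? If you are, try re.match instead of re.search. Note that re.match only anchors the match at the beginning; it does not require the pattern to consume the whole string.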
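To see what the lookbehind pattern actually picks, it helps to print the matched substring and its span, and to compare re.search with re.match. This is only a small sketch using the text and pattern from the question; the variable names found and anchored are made up for illustration.

```python
import re

text = '<a > b'
# Same pattern as in the question, written as a raw string.
pattern = r'(?<!<)[a-zA-Z0-9_ ]+>'

# re.search tries every start position, so it can skip past the 'a'
# that directly follows '<' and start matching one character later.
found = re.search(pattern, text)
print(repr(found.group(0)), found.span())

# re.match only tries position 0, where the first character is '<',
# which is not in the character class.
anchored = re.match(pattern, text)
print(anchored)
```

The lookbehind only constrains the single character before the start of the match, so it cannot express "there is no < anywhere earlier in this group".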

Or you might want to try this code. It searches for a substring that might start with <, and then decides if it actually does.

import re

text = '<a > b'
match = re.search(r'<?[a-zA-Z0-9_ ]+>', text)

if match and match.group(0)[0] != '<':
    print("Match found")   # a '>' that is not opened by '<'

I think this is what you're looking for:

r'<\s*\w+(?:\s+\w+)+\s*>'

\w+ matches the first word, then (?:\s+\w+)+ matches one or more additional words, separated by whitespace. If you don't want the match to span multiple lines, you can change \s to a literal space:

r'< *\w+(?: +\w+)+ *>'

...or to a character class for horizontal whitespace only (i.e., TAB or space characters):

r'<[ \t]*\w+(?:[ \t]+\w+)+[ \t]*>'
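As a quick check, the first pattern above can be run against the two examples from the question plus the string from the original code. This is a minimal sketch; the names MULTI_WORD, samples and results are illustrative and not part of the answer.

```python
import re

# First pattern from the answer: '<', a word, one or more further
# whitespace-separated words, then '>'.
MULTI_WORD = re.compile(r'<\s*\w+(?:\s+\w+)+\s*>')

samples = ['<a v >', '< a >', '<a > b']

# Map each sample to True/False depending on whether the pattern is found.
results = {s: bool(MULTI_WORD.search(s)) for s in samples}

for s, hit in results.items():
    print(repr(s), '->', 'found' if hit else 'not found')
```

Only '<a v >' should be reported as found, since it is the only sample with two words between the brackets.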
