I've been doing a lot of reading on .NET regular expressions, and I have developed a regular expression whose behavior I can't make any sense of:
(src|href)="\w+|(\w+/)+
The way I read this regular expression:
This is meant to match something like 'src="Folder', 'src="folder/', 'href="Folder/SubFolder/', etc.
Input:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
Using this regular expression, with this input, there is one match.
org/1999/
Can anyone explain this? Neither src nor href appears anywhere in the input, so how can there be any match at all?
What's happening here is that the | separates the regex into two completely separate alternatives. That is, it matches either: (src|href)="\w+
OR (\w+/)+
of which the second alternative is the one being matched:
org/1999/
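To see the split, it can help to run each alternative on its own. The sketch below uses Python's re module rather than .NET (an assumption made for brevity; alternation, \w and + behave the same way in both engines for this pattern). The variable names are illustrative only.

```python
import re

# The input from the question.
html = '''<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>'''

# The original pattern: the top-level | splits it into two alternatives.
pattern = r'(src|href)="\w+|(\w+/)+'
left = r'(src|href)="\w+'   # everything to the left of the |
right = r'(\w+/)+'          # everything to the right of the |

full_matches = [m.group(0) for m in re.finditer(pattern, html)]
left_matches = [m.group(0) for m in re.finditer(left, html)]
right_matches = [m.group(0) for m in re.finditer(right, html)]

print(full_matches)   # ['org/1999/']
print(left_matches)   # []
print(right_matches)  # ['org/1999/']
```

The left alternative never matches because the input contains neither src nor href; the right alternative matches by itself, which is why the whole pattern reports org/1999/.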
In your case you'd probably need to put the last part in parentheses to make it clear exactly what the alternation | applies to:
(src|href)="(\w+|(\w+/)+)
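As a rough check of the grouped version, the sketch below again uses Python's re module as a stand-in for .NET (an assumption; the constructs involved behave the same in both). The sample tag is a made-up input, not taken from the question.

```python
import re

# The input from the question.
html = '''<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>'''

# Grouped version: the | now only chooses between \w+ and (\w+/)+,
# and the (src|href)=" prefix is required in both cases.
fixed = r'(src|href)="(\w+|(\w+/)+)'

no_match = re.search(fixed, html)
print(no_match)  # None: the input has no src= or href= attribute

# Hypothetical input that does contain an href attribute.
sample = '<a href="Folder/SubFolder/">'
m = re.search(fixed, sample)
print(m.group(0))  # href="Folder
```

Because alternatives are tried left to right, \w+ succeeds first on the sample and the match stops at href="Folder. If the full folder path is wanted, swapping the two alternatives inside the group is one option to experiment with.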
By the way, I used Expresso to help work this out.
Try Expresso, for example. It has a nice "explain" feature.
Try RegexBuddy (http://www.regexbuddy.com/). You can set the regex flavor to .NET, and it has a great tab that breaks down each element of your regex.