Trying to understand .NET regular expressions

I've been doing a lot of reading on .NET regular expressions, and I've written a regular expression whose behavior I can't make any sense of.

(src|href)="\w+|(\w+/)+

The way I read this regular expression:

  1. Match exactly "src" or "href"
  2. Followed by ="
  3. Followed by one or more word characters ([a-zA-Z0-9_]), or one or more repetitions of (one or more word characters followed by /)

This is meant to match something like 'src="Folder', 'src="folder/', 'href="Folder/SubFolder/', etc.

Input:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>

Using this regular expression, with this input, there is one match.

org/1999/

Can anyone explain this? Neither src nor href appears anywhere in the string, so how can there be any match at all?
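To reproduce this outside .NET, here is a minimal sketch that uses Python's `re` module as a stand-in for System.Text.RegularExpressions. For this particular pattern the behavior that matters (leftmost match, how `|` splits the pattern) is the same in both engines. It lists every match of the question's pattern against the question's input.

```python
import re

# The pattern exactly as written in the question.
pattern = r'(src|href)="\w+|(\w+/)+'

# The input from the question.
html = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head>\n'
)

# Collect the full text of every match, scanning left to right.
matches = [m.group(0) for m in re.finditer(pattern, html)]
print(matches)  # ['org/1999/']
```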

What's happening here is that | has the lowest precedence of any regex operator, so it splits the entire regex into two completely separate alternatives. The pattern matches either (src|href)="\w+ OR (\w+/)+, and the second alternative is what matches here:

org/1999/
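One way to see the split is to write out the grouping the engine implies. The sketch below (again Python's `re` standing in for .NET) wraps each side of `|` in a non-capturing group `(?:...)` and gets the same match. It also shows that group 1 is unset because the first alternative never took part, and that on a `src` attribute the first alternative stops at the first `/`. The `<img ...>` string is a made-up sample, not from the question.

```python
import re

html = '<html xmlns="http://www.w3.org/1999/xhtml">'

original = r'(src|href)="\w+|(\w+/)+'
# Same pattern with each side of | wrapped in a non-capturing group,
# which is how the engine actually parses the original.
explicit = r'(?:(src|href)="\w+)|(?:(\w+/)+)'

m1 = re.search(original, html)
m2 = re.search(explicit, html)
print(m1.group(0), m2.group(0))  # org/1999/ org/1999/

# Group 1 belongs to the first alternative, which did not participate.
# Group 2 is a repeated group, so it keeps only its last iteration.
print(m1.group(1), m1.group(2))  # None 1999/

# Hypothetical sample: here the first alternative wins, and \w+ stops at '/'.
m3 = re.search(original, '<img src="images/logo.png">')
print(m3.group(0))  # src="images
```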

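In your case you'd need to put the last part in parentheses to make it clear exactly what the alternation | applies to:

(src|href)="(\w+|(\w+/)+)

One caveat: alternatives are tried left to right, and the first one that lets the whole pattern succeed wins. With \w+ listed first, 'href="Folder/SubFolder/' only matches as far as 'href="Folder'. Listing the longer alternative first, as in (src|href)="((\w+/)+|\w+), matches the whole path.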
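A quick check of the grouped pattern against the sample strings from the question, again using Python's `re` as a stand-in for .NET (both use ordered, backtracking alternation). The `swapped` pattern is an illustrative variant with the two alternatives in the opposite order; it is not from the original answer.

```python
import re

grouped = r'(src|href)="(\w+|(\w+/)+)'   # the pattern from the answer
swapped = r'(src|href)="((\w+/)+|\w+)'   # illustrative variant: longer alternative first

samples = ['src="Folder', 'src="folder/', 'href="Folder/SubFolder/']

# For each sample, record what each pattern actually matches.
results = [
    (re.search(grouped, s).group(0), re.search(swapped, s).group(0))
    for s in samples
]
for g, w in results:
    print(g, '|', w)

# With the grouping fixed, the question's HTML no longer matches at all.
html = '<html xmlns="http://www.w3.org/1999/xhtml">'
no_match = re.search(grouped, html)
print(no_match)  # None
```

Because `\w+` succeeds on its own and nothing follows the group, the `grouped` pattern never needs to try `(\w+/)+`; putting `(\w+/)+` first makes the engine prefer the longer match and fall back to `\w+` only when there is no trailing `/`.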

By the way, I used Expresso to help work this out.

Try Expresso, for example. It has a nice "explain" feature.

Try RegexBuddy (http://www.regexbuddy.com/). You can set the regex flavor to .NET, and it has a great tab that breaks down each element of your regex.