
C# Regex.Matches problems with multiple match results

I am trying to use Regex.Matches, and it seems to work differently from what I am used to in other languages such as PHP. Here is what I am trying to do:

I want to get all forms from a particular webpage, but the following does not work:

        String pattern = "(?i)<form[^<>]*>(.*)<\\/form>"; 
        MatchCollection matches = Regex.Matches(content, pattern );

        foreach (Match myMatch in matches)
        {
            MessageBox.Show(myMatch.Result("$1"));
        }

This code does not show anything even though there are three forms on that page. It seems that when I use (.*) it just skips everything till the end of the content.

By default, the . metacharacter in the .NET Regex class does not match \n (it does match \r), so (.*) cannot span the line breaks inside a form. Try replacing this:

MatchCollection matches = Regex.Matches(content, pattern );

with:

MatchCollection matches = Regex.Matches(content, pattern, RegexOptions.Singleline);
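To see what the option changes, here is a minimal sketch that runs the question's pattern against a small, made-up page. The `Sample` string, the class name, and the `CountMatches` helper are assumptions for illustration only. Note that with `RegexOptions.Singleline` the greedy `(.*)` is expected to run from the first `<form>` to the last `</form>`, which gives a single match rather than one per form.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotNewlineDemo
{
    // Hypothetical sample page: two forms, each spanning several lines.
    public const string Sample =
        "<form id=\"a\">\n<input name=\"x\">\n</form>\n" +
        "<p>text</p>\n" +
        "<form id=\"b\">\n<input name=\"y\">\n</form>";

    // The pattern from the question, with its greedy capture group.
    public const string Greedy = "(?i)<form[^<>]*>(.*)<\\/form>";

    // Illustrative helper: number of matches of pattern in Sample.
    public static int CountMatches(string pattern, RegexOptions options)
    {
        return Regex.Matches(Sample, pattern, options).Count;
    }

    public static void Main()
    {
        // '.' stops at '\n', so no <form>...</form> pair fits on one line.
        Console.WriteLine(CountMatches(Greedy, RegexOptions.None));       // prints 0

        // With Singleline, the greedy (.*) swallows everything up to the last </form>.
        Console.WriteLine(CountMatches(Greedy, RegexOptions.Singleline)); // prints 1
    }
}
```

Getting one match per form additionally needs a lazy quantifier, `(.*?)`.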

Try something like this for the main portion of your Regex:

    String pattern = "<form[\\d\\D]*?</form>";

It is a pattern I currently use to strip all tags of a specific type out of a document, but it should work just as well for finding the form tags. [\d\D] matches any character, including newlines, so no RegexOptions are needed; you can alter that section if desired.
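A rough sketch of both uses mentioned above (finding and stripping), assuming a made-up two-form input. The class and method names are hypothetical; only `Regex.Matches` and `Regex.Replace` are real library calls. The pattern is case-sensitive as written.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnyCharClassDemo
{
    // Hypothetical input with two multi-line forms separated by an <hr>.
    public const string Sample =
        "<form>\n<input>\n</form>\n<hr>\n<form>\n<select></select>\n</form>";

    // [\d\D] means "a digit or a non-digit", i.e. any character, '\n' included.
    // The lazy *? stops at the nearest </form> instead of the last one.
    private const string Pattern = @"<form[\d\D]*?</form>";

    public static int CountForms(string html)
    {
        return Regex.Matches(html, Pattern).Count;
    }

    public static string StripForms(string html)
    {
        return Regex.Replace(html, Pattern, "");
    }

    public static void Main()
    {
        Console.WriteLine(CountForms(Sample));                 // prints 2
        Console.WriteLine(StripForms(Sample) == "\n<hr>\n");   // prints True
    }
}
```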

string pattern = @"(?is)<form[^<>]*>(.*?)</form>"; 

That regex should work the same in PHP and C# (or, more accurately, in PCRE and .NET). If you're getting minimal matches in PHP without the ?, you probably have the /U ("ungreedy") option set, e.g.:

preg_match_all('~<form[^<>]*>(.*)</form>~isU', $subject, $matches);

or

preg_match_all('~(?isU)<form[^<>]*>(.*)</form>~', $subject, $matches);

.NET has no equivalent for PCRE's ungreedy mode, so the lazy quantifier (.*?) has to be written explicitly.
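Putting the pieces together, a minimal sketch of the C# side could look like the following. It uses the `(?is)` pattern with the explicit lazy quantifier and reads the captured text through `Groups[1].Value`. The `FormExtractor` class, the `GetFormBodies` helper, and the sample HTML are assumptions for illustration, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormExtractor
{
    // Returns the inner content of every <form>...</form> block in html.
    public static List<string> GetFormBodies(string html)
    {
        // (?i) ignores case, (?s) lets '.' match '\n', and .*? is lazy.
        const string pattern = @"(?is)<form[^<>]*>(.*?)</form>";

        var bodies = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern))
        {
            bodies.Add(m.Groups[1].Value); // text captured by the first group
        }
        return bodies;
    }

    public static void Main()
    {
        // Hypothetical page content with two forms.
        string html =
            "<FORM action=\"/a\">\n<input name=\"q\">\n</FORM>\n<form>second</form>";

        foreach (string body in GetFormBodies(html))
        {
            Console.WriteLine("[" + body.Trim() + "]");
        }
    }
}
```

A regex like this is only a reasonable fit for simple, well-formed markup; it does not understand HTML comments or scripts that happen to contain the text `</form>`.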
