Regex match and substring in one?

Question

I have an HTML source as input and would like to know which CMS the website was built with. Many CMSs leave their name in a meta tag like this:

<meta name="Generator" content="MY CMS" />

I can get the result like this:

        Match match = Regex.Match(html, ".*(?i)meta.*generator.*");
        match = Regex.Match(match.ToString(), "content.*\".*\"");
        match = Regex.Match(match.ToString(), "\".*\"");

This gives me "MY CMS".

But is there any way to shorten it down to a single Regex.Match?

Please note that the meta tag could also look like this:

<meta content="MY CMS" name="Generator" />

Thanks and best regards

Answer 1

var regex = new Regex(@"<meta\s+name=""Generator""\s+content=""([^""]+)""", RegexOptions.IgnoreCase);
var match = regex.Match(html);
var generator = match.Groups[1].Value;
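
The pattern above expects `name` to come before `content`, while the question notes that the attributes may appear in either order. As a rough sketch only (the `GeneratorFinder` class and its `Find` method are hypothetical names, not part of the original answer), a lookahead can check for `name="generator"` anywhere in the tag while the rest of the pattern captures the `content` value. It assumes quoted attribute values and is not a general HTML parser:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not from the original answers.
static class GeneratorFinder
{
    // (?=...) is a lookahead: it requires name="generator" somewhere before the
    // closing '>' without consuming input, so attribute order does not matter.
    // (?<q>...) remembers which quote character opened the value and \k<q>
    // requires the same character to close it.
    private static readonly Regex GeneratorMeta = new Regex(
        @"<meta\b(?=[^>]*\bname\s*=\s*[""']generator[""'])[^>]*\bcontent\s*=\s*(?<q>[""'])(?<value>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    // Returns the content value of the generator meta tag, or null if none is found.
    public static string Find(string html)
    {
        Match match = GeneratorMeta.Match(html);
        return match.Success ? match.Groups["value"].Value : null;
    }
}
```

With this sketch, `GeneratorFinder.Find(html)` should return `MY CMS` for both tag layouts shown in the question, and `null` when no generator tag is present.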

Answer 2

Try the following:

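// In a verbatim string (@"..."), a double quote must be written as "".
// Note: this matches the first <meta> tag that has a content attribute, not only the generator tag.
Regex regex = new Regex(@"<meta[^>]+content\s*=\s*['""]([^'""]+)['""][^>]*>");
Match match = regex.Match(input);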

The value is in group 1.

Hope it helps.

Answer 3

Regex is not a good choice for parsing HTML files.

HTML is neither strict nor regular in its format.

Use HtmlAgilityPack.

Regex is for regular expressions, not irregular ones.

You can use this code to retrieve the value with HtmlAgilityPack:

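// Requires the HtmlAgilityPack NuGet package and: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// Note: the XPath comparison is case-sensitive, so this matches name="Generator" only.
// SelectSingleNode returns null when no node matches, hence the null-conditional operators.
var content = doc.DocumentNode
                 .SelectSingleNode("//meta[@name='Generator']")
                 ?.Attributes["content"]?.Value;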

3 answers

Answer 1: 1, 2012-11-24 17:36:45
Answer 2: 1, accepted, 2012-11-24 17:37:00
Answer 3: 1, 2012-11-24 17:54:44
