Need help with regular expressions

Question

Suppose I have HTML like this:

```html
<html>
<Head>
<link type="text/css" href="c1.css" rel="stylesheet" />
<link type="text/css" href="c2.css" rel="stylesheet" />
<link type="text/css" href="c3.css" rel="stylesheet" />
<link type="text/css" href="c4.css" rel="stylesheet" />
<link type="text/css" href="c5.css" rel="stylesheet" />

<script type="text/javascript" src="j1.js"></script>
<script type="text/javascript" src="j2.js"></script>
</Head>

<body>

<script type="text/javascript" src="j3.js"></script>
<script type="text/javascript" src="j4.js"></script>

</body>
</html>
```

I want one regex that returns the details of every `link` tag and a second regex that returns the details of every `script` tag. I searched Google but didn't find anything suitable. If anyone knows these two patterns, please let me know. Thanks.

Answer 1

This answer is the one you're looking for. Do not try to parse HTML with regexes.

Answer 2

As others have commented, parsing HTML with regexes is probably not good practice, but this is what you asked for. So here we go:

Regular expression for the `link` tag

```csharp
@"(?ix)" +
@"<link\s*type=\x22(?'type'.*?)\x22\s*" +
@"href=\x22(?'href'.*?)\x22\s*" +
@"rel=\x22(?'rel'.*?)\x22\s*" +
@"\/>";
```

Regular expression for the `script` tag

```csharp
@"(?ix)" +
@"<script\s*type=\x22(?'type'.*?)\x22\s*" +
@"src=\x22(?'src'.*?)\x22\s*" +
@"><\/script>";
```

Example

Assuming your HTML is in a `string` variable:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Matches <link type="..." href="..." rel="..." /> with the attributes in exactly this order.
    public const string LINK_PATTERN =
                            @"(?ix)" +
                            @"<link\s*type=\x22(?<type>.*?)\x22\s*" +
                            @"href=\x22(?<href>.*?)\x22\s*" +
                            @"rel=\x22(?<rel>.*?)\x22\s*" +
                            @"\/>";

    // Matches <script type="..." src="..."></script> with the attributes in exactly this order.
    public const string SCRIPT_PATTERN =
                            @"(?ix)" +
                            @"<script\s*type=\x22(?<type>.*?)\x22\s*" +
                            @"src=\x22(?<src>.*?)\x22\s*" +
                            @"><\/script>";

    static void Main(string[] args)
    {
        string html = getBody();

        Regex links = new Regex(LINK_PATTERN);
        Regex scripts = new Regex(SCRIPT_PATTERN);

        // Print every <link> tag followed by its captured attributes.
        foreach (Match link in links.Matches(html))
        {
            Console.WriteLine("<link>: " + link.Value);

            Console.WriteLine("\ttype: " + link.Groups["type"].Value);
            Console.WriteLine("\thref: " + link.Groups["href"].Value);
            Console.WriteLine("\trel: " + link.Groups["rel"].Value);

            Console.WriteLine();
        }

        // Print every <script> tag followed by its captured attributes.
        foreach (Match script in scripts.Matches(html))
        {
            Console.WriteLine("<script>: " + script.Value);

            Console.WriteLine("\ttype: " + script.Groups["type"].Value);
            Console.WriteLine("\tsrc: " + script.Groups["src"].Value);

            Console.WriteLine();
        }

        Console.ReadKey();
    }

    // Builds the sample HTML from the question as a single string.
    public static string getBody()
    {
        string html = "";

        html += "<html>";
        html += "<head>";
        html += "<link type=\"text/css\" href=\"c1.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c2.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c3.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c4.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c5.css\" rel=\"stylesheet\" />";
        html += "<script type=\"text/javascript\" src=\"j1.js\"></script>";
        html += "<script type=\"text/javascript\" src=\"j2.js\"></script>";
        html += "</head>";
        html += "<body>";
        html += "<script type=\"text/javascript\" src=\"j3.js\"></script>";
        html += "<script type=\"text/javascript\" src=\"j4.js\"></script>";
        html += "</body>";
        html += "</html>";

        return html;
    }
}
```

Answer 3

It is not a good idea to parse HTML with regexes; doing it properly requires a real parser.

While it is possible to make a regex work with the first example text you are given, you will then find yourself spending every waking moment adding changes to cover each 'special case' in the next text you have to parse.
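To make the 'special case' problem concrete, here is a small sketch that reuses the `link` pattern from Answer 2 and runs it against a few variants of the same tag. The class name `FragilityDemo` and the sample tags are made up for illustration; only the first tag copies the attribute order and quoting from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class FragilityDemo
{
    // Same link pattern as in Answer 2: attributes must appear in the
    // order type, href, rel, values must be double-quoted, and the tag must end in "/>".
    const string LinkPattern =
        @"(?ix)" +
        @"<link\s*type=\x22(?<type>.*?)\x22\s*" +
        @"href=\x22(?<href>.*?)\x22\s*" +
        @"rel=\x22(?<rel>.*?)\x22\s*" +
        @"\/>";

    // Returns true if the pattern finds a match anywhere in the given text.
    public static bool Matches(string tag) => Regex.IsMatch(tag, LinkPattern);

    static void Main()
    {
        string[] tags =
        {
            "<link type=\"text/css\" href=\"c1.css\" rel=\"stylesheet\" />", // order from the question
            "<link rel=\"stylesheet\" href=\"c1.css\" type=\"text/css\" />", // attributes reordered
            "<link type='text/css' href='c1.css' rel='stylesheet' />",       // single quotes
            "<link type=\"text/css\" href=\"c1.css\" rel=\"stylesheet\">"     // no self-closing slash
        };

        // Print whether each variant is matched.
        foreach (string tag in tags)
            Console.WriteLine($"{Matches(tag),-5} {tag}");
    }
}
```

All four tags are valid HTML for the same stylesheet, yet only the first one is reported as a match; each of the others would need its own change to the pattern.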

Answer 4

This parser seems to be popular: HTML Agility Pack
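A minimal sketch of the parser approach, assuming the HtmlAgilityPack NuGet package is installed. `HtmlDocument.LoadHtml`, `HtmlNode.Descendants` and `HtmlNode.GetAttributeValue` come from that library; the `AgilityPackDemo` class, its `GetAttributeValues` helper and the sample markup (which deliberately reorders attributes and mixes quote styles) are hypothetical.

```csharp
// Assumption: the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

class AgilityPackDemo
{
    // Hypothetical helper: collects the value of `attribute` from every <tagName> element,
    // independent of attribute order, quote style, or whether the tag is self-closed.
    // Elements without the attribute contribute an empty string.
    public static List<string> GetAttributeValues(string html, string tagName, string attribute)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants(tagName)
                  .Select(node => node.GetAttributeValue(attribute, ""))
                  .ToList();
    }

    static void Main()
    {
        string html =
            "<html><head>" +
            "<link type=\"text/css\" href=\"c1.css\" rel=\"stylesheet\" />" +
            "<link rel='stylesheet' href='c2.css' type='text/css'>" +
            "<script type=\"text/javascript\" src=\"j1.js\"></script>" +
            "</head><body>" +
            "<script src=\"j3.js\" type=\"text/javascript\"></script>" +
            "</body></html>";

        foreach (string href in GetAttributeValues(html, "link", "href"))
            Console.WriteLine("link href: " + href);

        foreach (string src in GetAttributeValues(html, "script", "src"))
            Console.WriteLine("script src: " + src);
    }
}
```

Unlike the regexes in Answer 2, this does not care whether `href` comes before `type` or whether a tag ends in `/>` or `>`.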

4 answers

Answer 1: 2 votes, posted 2011-12-05 17:42:26

Answer 2: 2 votes, accepted, posted 2011-12-05 18:15:13

Answer 3: 1 vote, posted 2011-12-05 17:30:20

Answer 4: 1 vote, posted 2011-12-05 17:39:14
