Need help regarding Regular Expression

Suppose I have HTML like this:

<html>
<Head>
<link type="text/css" href="c1.css" rel="stylesheet" />
<link type="text/css" href="c2.css" rel="stylesheet" />
<link type="text/css" href="c3.css" rel="stylesheet" />
<link type="text/css" href="c4.css" rel="stylesheet" />
<link type="text/css" href="c5.css" rel="stylesheet" />

<script type="text/javascript" src="j1.js"></script>
<script type="text/javascript" src="j2.js"></script>
</Head>

<body>

<script type="text/javascript" src="j3.js"></script>
<script type="text/javascript" src="j4.js"></script>

</body>
</html>

First I want to use one regex that returns the details of all the link tags, and then a second regex that returns the details of all the script tags. I searched Google but did not find anything suitable. If anyone knows these two regex patterns, please let me know. Thanks.

This answer is the one you're looking for. Do not try to parse HTML with regexes.

As others have commented, parsing HTML with regexes might not be good practice, but this is what you asked for. So here we go:

Regular Expression for `link` tag

@"(?ix)" +
@"<link\s*type=\x22(?'type'.*?)\x22\s*" +
@"href=\x22(?'href'.*?)\x22\s*" +
@"rel=\x22(?'rel'.*?)\x22\s*" +
@"\/>";

Regular Expression for `script` tag

@"(?ix)" + 
@"<script\s*type=\x22(?'type'.*?)\x22\s*" +
@"src=\x22(?'src'.*?)\x22\s*" +
@"><\/script>";

Example

Supposing that you have your HTML in a variable of type string:

using System;
using System.Text.RegularExpressions;

class Program
{
    // Matches <link type="..." href="..." rel="..." /> with the attributes
    // in exactly this order. (?ix) = ignore case, ignore pattern whitespace.
    public const string LINK_PATTERN =
        @"(?ix)" +
        @"<link\s*type=\x22(?<type>.*?)\x22\s*" +
        @"href=\x22(?<href>.*?)\x22\s*" +
        @"rel=\x22(?<rel>.*?)\x22\s*" +
        @"\/>";

    // Matches <script type="..." src="..."></script> with an empty body.
    public const string SCRIPT_PATTERN =
        @"(?ix)" +
        @"<script\s*type=\x22(?<type>.*?)\x22\s*" +
        @"src=\x22(?<src>.*?)\x22\s*" +
        @"><\/script>";

    static void Main(string[] args)
    {
        string html = getBody();

        Regex links = new Regex(LINK_PATTERN);
        Regex scripts = new Regex(SCRIPT_PATTERN);

        // Print every <link> tag together with its captured attributes.
        foreach (Match link in links.Matches(html))
        {
            Console.WriteLine("<link>: " + link);

            Console.WriteLine("\ttype: " + link.Groups["type"]);
            Console.WriteLine("\thref: " + link.Groups["href"]);
            Console.WriteLine("\trel: " + link.Groups["rel"]);

            Console.WriteLine("");
        }

        // Print every <script> tag together with its captured attributes.
        foreach (Match script in scripts.Matches(html))
        {
            Console.WriteLine("<script>: " + script);

            Console.WriteLine("\ttype: " + script.Groups["type"]);
            Console.WriteLine("\tsrc: " + script.Groups["src"]);

            Console.WriteLine("");
        }

        Console.ReadKey();
    }

    // Builds the sample HTML from the question.
    public static string getBody()
    {
        string html = "";

        html += "<html>";
        html += "<head>";
        html += "<link type=\"text/css\" href=\"c1.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c2.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c3.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c4.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c5.css\" rel=\"stylesheet\" />";
        html += "<script type=\"text/javascript\" src=\"j1.js\"></script>";
        html += "<script type=\"text/javascript\" src=\"j2.js\"></script>";
        html += "</head>";
        html += "<body>";
        html += "<script type=\"text/javascript\" src=\"j3.js\"></script>";
        html += "<script type=\"text/javascript\" src=\"j4.js\"></script>";
        html += "</body>";
        html += "</html>";

        return html;
    }
}

It is not a good idea to parse HTML with regexes; doing it properly requires a real parser.

While it is possible to make it work with the first example text you're given, you will then find yourself spending every waking moment making changes to cover every 'special case' in the next text that you have to parse.
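To make the 'special case' problem concrete, here is a minimal sketch that reuses the link pattern from the regex answer and counts its matches. The class name `FragilityDemo` and the method `CountLinks` are made up for this illustration and are not part of any answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FragilityDemo
{
    // Same link pattern as in the regex answer: it expects double-quoted
    // type, href and rel attributes, in that order, followed by "/>".
    public const string LinkPattern =
        @"(?ix)" +
        @"<link\s*type=\x22(?<type>.*?)\x22\s*" +
        @"href=\x22(?<href>.*?)\x22\s*" +
        @"rel=\x22(?<rel>.*?)\x22\s*" +
        @"\/>";

    // Hypothetical helper: counts how many <link> tags the pattern finds.
    public static int CountLinks(string html)
    {
        return Regex.Matches(html, LinkPattern).Count;
    }
}
```

Called with `<link type="text/css" href="c1.css" rel="stylesheet" />`, `CountLinks` returns 1. The equally valid `<link rel="stylesheet" type="text/css" href="c1.css" />` (attributes reordered) and `<link type='text/css' href='c1.css' rel='stylesheet' />` (single quotes) both return 0, so each variation would need its own change to the pattern.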

This parser seems to be popular: HTML Agility Pack
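As a rough sketch of what the parser-based approach could look like, the following assumes the third-party HtmlAgilityPack NuGet package is installed. The class `TagExtractor` and the method `GetAttributeValues` are hypothetical names chosen for this example, not something the library provides.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class TagExtractor
{
    // Hypothetical helper: returns the value of attributeName for every
    // <tagName> element that carries that attribute.
    public static List<string> GetAttributeValues(string html, string tagName, string attributeName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        // XPath such as //link[@href]; SelectNodes returns null, not an
        // empty collection, when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//" + tagName + "[@" + attributeName + "]");
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode node in nodes)
        {
            // The second argument is the fallback if the attribute is absent.
            values.Add(node.GetAttributeValue(attributeName, ""));
        }

        return values;
    }
}
```

With the HTML from the question, `GetAttributeValues(html, "link", "href")` would be expected to collect the five stylesheet names and `GetAttributeValues(html, "script", "src")` the four script names, without depending on attribute order or quoting style.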
