C# extract string using regular expression

I have an HTML string that I'm parsing, which looks like the one below. I need to get the value of @Footer.

strHTML = "<html><html>\r\n\r\n<head>\r\n<meta http-equiv=Content-Type " +
          "content=\"text/html; charset=windows-1252\">\r\n" +
          "<meta name=Generator content=\"Microsoft Word 14></head></head><body> " +
          "<p>@Footer=CONFIDENTIAL<p></body></html>";

I have tried the code below. How do I get the value?

Regex m = new Regex("@Footer", RegexOptions.Compiled);
foreach (Match VariableMatch in m.Matches(strHTML.ToString()))
{
     Console.WriteLine(VariableMatch);
}

You need to capture the value after the =. This will work as long as the value cannot contain any < characters:

Regex m = new Regex("@Footer=([^<]+)", RegexOptions.Compiled);
foreach (Match VariableMatch in m.Matches(strHTML.ToString()))
{
    Console.WriteLine(VariableMatch.Groups[1].Value);
}
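
For anyone who wants something to paste and run, here is a minimal sketch of the same capture-group idea wrapped in a helper. The names `FooterRegexDemo` and `TryExtractFooter` and the shortened sample HTML are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FooterRegexDemo
{
    // Hypothetical helper: captures everything after "@Footer=" up to the next '<'.
    // Returns false (and an empty value) when the marker is not found.
    public static bool TryExtractFooter(string html, out string value)
    {
        Match m = Regex.Match(html, "@Footer=([^<]+)");
        value = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        // Shortened sample input, assumed for this sketch.
        string html = "<html><body><p>@Footer=CONFIDENTIAL<p></body></html>";

        if (TryExtractFooter(html, out string footer))
        {
            Console.WriteLine(footer); // prints CONFIDENTIAL
        }
    }
}
```

Groups[0] is always the whole match ("@Footer=CONFIDENTIAL" here), which is why the value has to be read from Groups[1].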

You can do this with regex, but it's not necessary. One simple way to do this would be:

var match = strHTML.Split(new string[] { "@Footer=" }, StringSplitOptions.None).Last();
match = match.Substring(0, match.IndexOf("<"));

This assumes that your HTML string contains only one @Footer.
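
If the marker or the following < might be missing, the Split/Substring version throws or returns the wrong piece. A slightly more defensive sketch using IndexOf could look like the one below. `FooterSplitDemo` and `TryGetFooter` are hypothetical names, and note that this variant takes the first @Footer rather than the last.

```csharp
using System;

public static class FooterSplitDemo
{
    // Hypothetical helper: finds the first "@Footer=" and returns the text up to
    // the next '<', or up to the end of the string if no '<' follows.
    public static bool TryGetFooter(string html, out string value)
    {
        const string marker = "@Footer=";
        value = string.Empty;

        int start = html.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0)
        {
            return false; // marker not present
        }
        start += marker.Length;

        int end = html.IndexOf('<', start);
        if (end < 0)
        {
            end = html.Length; // no tag after the value
        }

        value = html.Substring(start, end - start);
        return true;
    }

    public static void Main()
    {
        // Shortened sample input, assumed for this sketch.
        string html = "<body><p>@Footer=CONFIDENTIAL<p></body>";

        if (TryGetFooter(html, out string footer))
        {
            Console.WriteLine(footer); // prints CONFIDENTIAL
        }
    }
}
```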

Your regex will match the string "@Footer". The value of the match will be "@Footer".

Your regex should look like this instead:

Regex regex = new Regex(@"@Footer=[\w]+");  // verbatim string so that \w is not treated as a C# escape
Match match = regex.Match(strHTML);
string value = match.Value.Split('=')[1];

Use a matched group.

Regex.Match(strHTML, @"@Footer=(?<VAL>[^<\n\r]+)").Groups["VAL"].Value;
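
If the string can contain several @Footer entries, Regex.Matches returns a MatchCollection, and the named group has to be read from each Match individually. Here is a minimal sketch under that assumption. `FooterNamedGroupDemo`, `ExtractAll`, and the two-footer sample input are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FooterNamedGroupDemo
{
    // Hypothetical helper: collects the VAL group of every "@Footer=..." occurrence.
    public static List<string> ExtractAll(string html)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(html, @"@Footer=(?<VAL>[^<\r\n]+)"))
        {
            values.Add(m.Groups["VAL"].Value);
        }
        return values;
    }

    public static void Main()
    {
        // Sample input with two footers, assumed for this sketch.
        string html = "<p>@Footer=CONFIDENTIAL<p>\r\n<p>@Footer=DRAFT<p>";

        foreach (string value in ExtractAll(html))
        {
            Console.WriteLine(value); // prints CONFIDENTIAL, then DRAFT
        }
    }
}
```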

If that is your entire string, we can solve it with string methods, without touching regex at all:

var result = strHTML.Split(new string[]{"@Footer=", "<p>"}, StringSplitOptions.RemoveEmptyEntries)[1];
