
C# Regular Expression Find Variable Pattern

I need to parse HTML that is formatted like the code sample below. The issue I have is that the field name can be wrapped in span tags that have variable background or color styles. The pattern I am looking for is a <br /> tag, then (ignoring any span that wraps it) text followed by a colon; <br />id: is this pattern without a span tag wrapping. Matching this pattern should give me the key name, and whatever follows the key name is the key value, until the next key name is hit. Below is a sample of the HTML I need to parse.

string source = @"
<br />id: Value here
        <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">community</SPAN>: Value here
        <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">content</SPAN><SPAN style=""background-color: #A0FFFF; color: #000000"">title</SPAN>: Value here
";
// Split the source into key/value pairs based on the pattern match.

Thanks for any help.

Here's some code that will parse it, assuming that your example HTML should have another <br /> element after "content".

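using System.Linq;
using System.Text.RegularExpressions;

string source = @"
  <br />id: Value here
  <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">community</SPAN>: Value here
  <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">content</SPAN>
  <br /><SPAN style=""background-color: #A0FFFF; color: #000000"">title</SPAN>: Value here";

// Group 1 is the key (with or without a wrapping SPAN); group 2 is the rest of that line.
// [ \t]* and [^\r\n]* are used instead of \s? and .* so that a key with no value
// (such as "content") cannot swallow the line break and the following line.
var items = Regex.Matches(source, @"<br />(?:<SPAN[^>]*>)?([^<:]+)(?:</SPAN>)?:?[ \t]*([^\r\n]*)")
         .OfType<Match>()
         .ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value)
         .ToList();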
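
If the key really can be split across several adjacent spans, as in the question's original sample, a different approach is to strip the span tags first and then split on <br />. The following is only a rough sketch of that idea, not part of the answer above: the class name BrKeyValueParser and the method Parse are made up for illustration, and it assumes that every key starts right after a <br /> tag and that the first colon in each piece ends the key. Pieces without a colon are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class BrKeyValueParser
{
    // Matches opening and closing SPAN tags, whatever their style attributes are.
    private static readonly Regex SpanTag =
        new Regex(@"</?span\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches <br>, <br/> or <br />, which is assumed to start each entry.
    private static readonly Regex LineBreak =
        new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var result = new List<KeyValuePair<string, string>>();

        // Remove the span wrappers so that only "key: value" text remains.
        string withoutSpans = SpanTag.Replace(html, "");

        foreach (string chunk in LineBreak.Split(withoutSpans))
        {
            // Assumption: the first colon in a chunk separates key from value.
            int colon = chunk.IndexOf(':');
            if (colon <= 0) continue; // no key in this chunk, skip it

            string key = chunk.Substring(0, colon).Trim();
            if (key.Length == 0) continue;

            string value = chunk.Substring(colon + 1).Trim();
            result.Add(new KeyValuePair<string, string>(key, value));
        }

        return result;
    }
}
```

Calling BrKeyValueParser.Parse(source) on the question's sample should give the keys id, community and contenttitle, each with the value "Value here". A list of pairs is returned rather than a dictionary so that a repeated key does not throw and the original order is kept.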
