Regex: extract a value from the string between delimiters

I have a large string and I need to extract a string value from it. The value is located between the delimiters

category = '

and

';

This is my regex, but I need to avoid outputting the delimiters.

String productCategory = Regex.Match(html, @"category = '(.*?)';").Value;

This is the example: category = 'Video Cards';

and I need to extract Video Cards

You can use the lookahead and lookbehind operators, which assert that the delimiters are present without including them in the match, so you end up with something like:

// Lazy .*? stops at the first '; rather than the last one on the line.
string pattern = @"(?<=category = ').*?(?=';)";
string productCategory = Regex.Match(html, pattern).Value;
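
To illustrate how the lookaround approach behaves, here is a minimal sketch comparing a lazy and a greedy quantifier between the two assertions. The class name `LookaroundDemo`, its method names, and the sample input (including the `brand` field) are assumptions made up for this example; they do not come from the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Lookbehind (?<=...) and lookahead (?=...) require the delimiters to be
    // present but leave them out of the matched text.
    public static string ExtractLazy(string input) =>
        Regex.Match(input, @"(?<=category = ').*?(?=';)").Value;

    // Same pattern with a greedy quantifier, for comparison.
    public static string ExtractGreedy(string input) =>
        Regex.Match(input, @"(?<=category = ').*(?=';)").Value;

    public static void Main()
    {
        // Hypothetical input with two delimited values on the same line.
        string html = "category = 'Video Cards'; brand = 'Acme';";

        Console.WriteLine(ExtractLazy(html));   // Video Cards
        Console.WriteLine(ExtractGreedy(html)); // Video Cards'; brand = 'Acme
    }
}
```

The greedy form only misbehaves when another `';` follows on the same line, since `.` does not match a newline by default. In scraped HTML or inline script that is common enough that the lazy form is usually the safer choice.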

It's also worth mentioning that parsing HTML with regexes is a bad idea. You should use an HTML parser to parse HTML.

Have you considered using the MatchObj.Groups property? If you test your current regex at a testing site like Derek Slager's, you'll notice that exactly what you want is the first capture group. You should simply be able to read that group and get what you need. Note that Groups[0] is the entire match, delimiters included, so the first capture group is Groups[1]. Here productCategory stands for the Match object returned by Regex.Match, not the string:

productCategory.Groups[1].Value

You want to extract the group:

String productCategory = Regex.Match(html, @"category = '(.*?)';").Groups[1].Value; 
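
As a rough sketch of the capture-group approach, the example below shows the difference between `Groups[0]` (the whole match) and `Groups[1]` (the first capture), and checks `Match.Success` before reading the group. The class name `GroupDemo`, the helper `ExtractCategory`, and the sample strings are hypothetical names introduced for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Returns the text captured by group 1, or null when the pattern
    // does not match at all.
    public static string ExtractCategory(string input)
    {
        Match m = Regex.Match(input, @"category = '(.*?)';");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical sample input.
        string html = "var category = 'Video Cards';";
        Match m = Regex.Match(html, @"category = '(.*?)';");

        Console.WriteLine(m.Groups[0].Value); // category = 'Video Cards';
        Console.WriteLine(m.Groups[1].Value); // Video Cards
        Console.WriteLine(ExtractCategory("no match here") == null); // True
    }
}
```

Without the `Success` check, a failed match yields an empty string from `Groups[1].Value`, which is indistinguishable from a category that is genuinely empty (`category = '';`).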
