
Regex with a "not" match at the end

I'm trying to write a regex to match patterns like this:

<td style="alskdjf" />

i.e. a self-closing <td>,

but not this:

<td style="alsdkjf"><br /></td>

I initially came up with:

<td\s+.*?/>

But that obviously fails on the second example, so I thought something like this might work:

<td\s+.*?[^>]/>

But it doesn't. I'm using C# (.NET).
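To see why both attempts fail, here is a small check written with Python's `re` module as a stand-in for .NET's `Regex` (lazy `.*?` and the `[^>]` character class behave the same in both). The lazy `.*?` is still free to run past a `>`, and `[^>]` only constrains the single character immediately before `/>`, so both patterns match from `<td` right through the `<br />` inside the cell.

```python
import re

# The two inputs from the question.
self_closing = '<td style="alskdjf" />'
with_child = '<td style="alsdkjf"><br /></td>'

first_try = re.compile(r'<td\s+.*?/>')
second_try = re.compile(r'<td\s+.*?[^>]/>')

for label, pattern in (("first", first_try), ("second", second_try)):
    for text in (self_closing, with_child):
        m = pattern.search(text)
        # For with_child, both patterns match '<td style="alsdkjf"><br />':
        # .*? crosses the '>' that closes the <td> start tag.
        print(label, repr(m.group(0)) if m else None)
```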

I'm only looking for <td>s that have an attribute, e.g. <td style="alsdfkj" /> but not <td>.

You're going to have problems using regular expressions on HTML, since HTML is not a regular language. I'd recommend using an HTML parser for all but the very simplest cases.
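As a rough illustration of the parser approach, here is a minimal sketch using Python's standard `html.parser` module; it is only a stand-in, since in .NET a library such as the HTML Agility Pack would fill this role. The class name `SelfClosingTdFinder` is hypothetical. `HTMLParser` calls `handle_startendtag` only for tags written in `<tag ... />` form, so the "self-closing" test comes from the parser rather than from a pattern, and requiring a non-empty `attrs` list covers the "must have an attribute" condition.

```python
from html.parser import HTMLParser


class SelfClosingTdFinder(HTMLParser):
    """Collects the attributes of every <td ... /> that has at least one attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty tags such as <td ... />.
        # Tag and attribute names arrive lower-cased; values arrive unquoted.
        if tag == "td" and attrs:
            self.found.append(dict(attrs))


finder = SelfClosingTdFinder()
finder.feed('<td style="alskdjf" /><td style="alsdkjf"><br /></td><td />')
finder.close()
print(finder.found)
```

The `<td style="alsdkjf">` start tag goes to `handle_starttag` (left as the default no-op here), and the bare `<td />` is skipped because its attribute list is empty.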

This will match what you're looking for, without matching the problematic case from your first attempts:

<td[^>]*?/>
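A quick check of this pattern, again using Python's `re` as a stand-in for .NET's `Regex`, which treats `[^>]*?` the same way. Because `[^>]` cannot step past a `>`, the match can never run from the `<td>` start tag into a later tag. The check also shows that the pattern does not actually require an attribute: a bare `<td />` matches too.

```python
import re

# [^>] cannot step past a '>', so a match can never run from the <td>
# start tag into a later tag such as <br />.
td_self_closing = re.compile(r'<td[^>]*?/>')

samples = [
    '<td style="alskdjf" />',            # self-closing with an attribute
    '<td style="alsdkjf"><br /></td>',   # <td> with content: should not match
    '<td />',                            # no attributes, yet it still matches
]
for text in samples:
    m = td_self_closing.search(text)
    print(f"{text!r} -> {m.group(0) if m else 'no match'}")
```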

Note, however, that if you need to allow > characters in attribute values, you'd need something like this:

<td(?:[^>]|"[^"]*?")*?/>

This allows > only within matching double quotes (you could extend it the same way to allow single quotes).
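The sketch below compares the simple pattern, the double-quote-aware pattern, and one possible version of the suggested single-quote extension (the third pattern is my own sketch, not taken from the answer). It uses Python's `re`; the alternation and lazy quantifiers work the same way in .NET. When `[^>]` gets stuck on a `>`, the engine backtracks to the opening quote and tries the quoted-value alternative instead, which is what lets `"a > b"` through.

```python
import re

simple = re.compile(r'<td[^>]*?/>')
# The answer's pattern: a '>' is allowed only inside a "..." value.
double_quoted = re.compile(r'<td(?:[^>]|"[^"]*?")*?/>')
# Sketch of the suggested extension that also accepts '...' values.
either_quote = re.compile(r'<td(?:[^>]|"[^"]*?"|\'[^\']*?\')*?/>')

dq_tag = '<td title="a > b" />'
sq_tag = "<td title='a > b' />"

for pattern in (simple, double_quoted, either_quote):
    # One bool per input: does the pattern find a match in it?
    print([bool(pattern.search(t)) for t in (dq_tag, sq_tag)])
```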

You can add whatever specific attribute you're looking for into the regex; for instance, for your example:

<td[^>]*? style="alskdjf"[^>]*?/>
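If the attribute name and value come from elsewhere in your program, it is safer to escape them before splicing them into the pattern. Below is a minimal sketch in Python; `td_with_attribute` is a hypothetical helper, `re.escape` plays the role that `Regex.Escape` plays in .NET, and double-quoted values are assumed. For `style` and `alskdjf` it rebuilds exactly the pattern above.

```python
import re


def td_with_attribute(name, value):
    """Hypothetical helper: self-closing <td> carrying name="value" (double-quoted).

    re.escape guards against regex metacharacters in the name or value.
    """
    return re.compile(
        r'<td[^>]*? ' + re.escape(name) + '="' + re.escape(value) + r'"[^>]*?/>'
    )


pattern = td_with_attribute("style", "alskdjf")
print(pattern.pattern)
for text in ('<td style="alskdjf" />',
             '<td class="x" style="alskdjf" />',
             '<td style="other" />'):
    print(text, bool(pattern.search(text)))
```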

Regexes will have serious trouble interpreting messy HTML of the sort browsers routinely have to deal with. There are all sorts of horrible obfuscations that can be applied to markup that you just don't want to have to think about!
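Here are two small examples of what "messy" can mean, sketched in Python: a commented-out tag and an upper-case tag. The regex finds the tag inside the comment and misses the live upper-case one, while the standard `html.parser` module (a stand-in here for a .NET parser) skips the comment and normalizes the tag name. The class name `TdCollector` is hypothetical. The case issue alone could be patched with `re.IGNORECASE` (`RegexOptions.IgnoreCase` in .NET), but the comment needs something that actually understands the markup.

```python
import re
from html.parser import HTMLParser

# A commented-out tag plus an upper-case tag: both are ordinary in real-world HTML.
markup = '<!-- <td style="old" /> --><TD STYLE="new" />'

regex_hits = re.findall(r'<td[^>]*?/>', markup)


class TdCollector(HTMLParser):
    """Records attributes of the self-closing <td> tags the parser actually sees."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_startendtag(self, tag, attrs):
        if tag == "td":  # names are already lower-cased by the parser
            self.hits.append(dict(attrs))


collector = TdCollector()
collector.feed(markup)
collector.close()

print("regex :", regex_hits)        # finds the commented-out tag, misses <TD ...>
print("parser:", collector.hits)    # skips the comment, finds the upper-case tag
```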

The HTML Agility Pack is what you really want to be using, and it has had very good reviews everywhere I've seen. It is a robust library for reading any kind of mangled HTML into a DOM model. I have personally found it to be a superb library, as surely have others, many of them using it in business applications.
