
VB.NET basic RegEx problems

Hello, I am trying to save a value from an input tag in some HTML source code. The tag looks like this:

<input name="user_status" value="3" />

I have the page source in a variable (pageSourceCode) and need to work out a regex to get the value (3 in this example). I have this so far:

Dim sCapture As String = System.Text.RegularExpressions.Regex.Match(pageSourceCode, "\<input\sname\=\""user_status\""\svalue\=\""(.*)?\""\>").Groups(1).Value

This works fine most of the time. However, the code is used to process source code from multiple sites (which use the same platform), and sometimes the input tag includes other attributes, or the attributes are in a different order, e.g.:

<input class="someclass" type="hidden" value="3" name="user_status" />

I just don't understand regex well enough to cope with these situations.

Any help is very much appreciated.

PS: Although I am looking for a specific answer to this question if at all possible, a pointer to a good regex tutorial would be great as well.

Thanks

You can search for <input[^>]*\bvalue="([^"]+)" if your input tags never contain angle brackets.

[^>]* matches any number of characters except >, which keeps the regex from accidentally matching across tags.

\b ensures that we only match value and not something like x_value.
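To see how the two pieces behave, here is a minimal console sketch. The module name and the sample strings are made up for illustration (the third sample is a hypothetical tag with an x_value attribute); only the pattern comes from the answer above.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ValueCaptureDemo
    Sub Main()
        ' Hypothetical sample tags; the attribute order differs between the first two.
        Dim samples As String() = {
            "<input name=""user_status"" value=""3"" />",
            "<input class=""someclass"" type=""hidden"" value=""3"" name=""user_status"" />",
            "<input name=""other"" x_value=""9"" />"
        }

        ' [^>]* cannot cross the closing > of the tag; \b rejects x_value.
        Dim pattern As String = "<input[^>]*\bvalue=""([^""]+)"""

        For Each html As String In samples
            Dim m As Match = Regex.Match(html, pattern)
            If m.Success Then
                Console.WriteLine(m.Groups(1).Value)
            Else
                Console.WriteLine("(no match)")
            End If
        Next
    End Sub
End Module
```

The first two samples should yield 3 regardless of attribute order, while the third should not match because there is no word boundary between _ and value. Note that this pattern does not look at the name attribute at all, so on a real page it returns the value of the first input tag that has one.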

EDIT:

If you only want to look at input tags where name="user_status", then you can do this with an additional lookahead assertion:

<input(?=[^>]*name="user_status")[^>]*\bvalue="([^"]+)"

In VB.NET:

ResultString = Regex.Match(SubjectString, "<input(?=[^>]*name=""user_status"")[^>]*\bvalue=""([^""]+)""").Groups(1).Value
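As a rough check of the lookahead version, the sketch below runs it against a page fragment with two input tags. The fragment, the token input, and the module name are assumptions made up for this example; pageSourceCode is the variable name from the question.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LookaheadDemo
    Sub Main()
        ' Hypothetical page source: an unrelated input followed by the one we want.
        Dim pageSourceCode As String =
            "<input type=""hidden"" value=""abc"" name=""token"" />" &
            "<input class=""someclass"" type=""hidden"" value=""3"" name=""user_status"" />"

        ' The lookahead checks for name="user_status" somewhere inside the same tag
        ' before the rest of the pattern goes looking for the value attribute.
        Dim pattern As String = "<input(?=[^>]*name=""user_status"")[^>]*\bvalue=""([^""]+)"""

        Dim m As Match = Regex.Match(pageSourceCode, pattern)
        Console.WriteLine(If(m.Success, m.Groups(1).Value, "(no match)"))
    End Sub
End Module
```

The lookahead fails on the first tag because [^>]* cannot reach past its closing >, so the match starts at the second tag and the captured value is 3. Because the lookahead does not consume any characters, it does not matter whether name comes before or after value.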

A good tutorial can be found at http://www.regular-expressions.info

Assuming this is an ASP.NET page and not some external HTML you can't control, the better solution would be simply to access the control directly.

Add an ID attribute and runat="server" to your input control, like this:

<input id="user_status" runat="server" class="someclass" type="hidden" value="3" name="user_status" />

You can probably get rid of the name attribute. It's typically the same as the ID, and the ID is a better choice. You can have both an ID and a name if you want, and they can both have the same value.

In your code-behind you can then access the value by the ID, with no need for a regex:

Me.user_status.value
