VB.NET Regex Match on XML data

I'm trying to match XML data, held as a string, to get the id that belongs to a specific player name selected in a listbox.

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles  Button2.Click
    'website
    Dim link As String = "https://s25-pt.ogame.gameforge.com/api/players.xml"

    Dim html As String
    'name selected on listbox
    Dim jogador As String = ListBox1.Text
    Dim pattern As String = "player id=""(.*?)"" name=""" & jogador & """"


    webc1 = New WebClient
    webc1.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.0; es-ES; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3")

    html = webc1.DownloadString(link)


    Dim match As Match = Regex.Match(html, pattern)

    If match.Success Then
        MsgBox(match.Groups(1).Value)
    End If
End Sub

I'm not getting just the id; the captured value also includes a big piece of the 'html' string.

I looked for answers on Google and tried other patterns, but I don't see how to solve this problem. Is there a way I can improve my regex?

I know this is XML and I could probably get the value with a more appropriate method, but I find this way easier.

If you try your regex on regex101 then it works fine, e.g. running in PCRE/PHP mode. However, .NET regexes work a little differently from other implementations.

So I tried this regex instead and got a proper match:

player id="(\d+)" name="sniper lord"

This gives me a result of 1000042 from your data.

\d+ just means one or more digits. Your XML data indicates the player IDs are numeric only, so this 'tightens up' the regex: unlike .*?, \d+ cannot run past the closing quote of the id attribute, so the capture cannot swallow the player elements that come before the one you want. This also uses sniper lord as a test value for jogador.
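
The difference between .*? and \d+ can be seen in a small sketch. The two-player sample string below is invented for illustration and is not taken from the real feed:

```vb
Imports System
Imports System.Text.RegularExpressions

Module LazyVersusDigits
    Sub Main()
        ' Hypothetical sample data: two player elements on a single line.
        Dim xml As String = "<player id=""1"" name=""alpha""/><player id=""2"" name=""beta""/>"

        ' Lazy .*? still starts at the FIRST "player id=" and grows until the name follows.
        Dim loose As Match = Regex.Match(xml, "player id=""(.*?)"" name=""beta""")
        Console.WriteLine(loose.Groups(1).Value)

        ' \d+ cannot cross a quote, so only the id right next to the name can match.
        Dim tight As Match = Regex.Match(xml, "player id=""(\d+)"" name=""beta""")
        Console.WriteLine(tight.Groups(1).Value)
    End Sub
End Module
```

The first line printed is 1" name="alpha"/><player id="2 (everything from the first id up to the wanted one), while the second is just 2.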

Perhaps you can also use String.Format to help with the slightly confusing run of double quotes:

Dim pattern As String = String.Format("player id=""{0}"" name=""{1}""", "(\d+)", jogador)
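
As a suggestion that goes slightly beyond the pattern above: the listbox text goes into the pattern verbatim, so a name containing regex metacharacters such as . or ( would change what the pattern means. A minimal sketch using Regex.Escape follows; the helper name FindPlayerId and the sample data are made up for illustration:

```vb
Imports System
Imports System.Text.RegularExpressions

Module PlayerIdLookup
    ' Hypothetical helper: returns the id for the given name, or Nothing if it is absent.
    Function FindPlayerId(xml As String, playerName As String) As String
        ' Regex.Escape makes characters such as . ( ) + match literally.
        Dim pattern As String = String.Format("player id=""(\d+)"" name=""{0}""", Regex.Escape(playerName))
        Dim m As Match = Regex.Match(xml, pattern)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        ' Invented sample data: without escaping, "a.b" would also match "axb".
        Dim xml As String = "<player id=""7"" name=""axb""/><player id=""8"" name=""a.b""/>"
        Console.WriteLine(FindPlayerId(xml, "a.b"))
    End Sub
End Module
```

With the name escaped this prints 8; an unescaped a.b would have matched the axb entry first and returned 7.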

I just couldn't resist this, since RegEx against XML is just not a good idea.

Your link to the sample XML was kind enough to offer up a schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="players">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="player" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:attribute name="id" use="required" type="xs:integer"/>
                        <xs:attribute name="name" use="required" type="xs:string"/>
                        <xs:attribute name="status" use="optional">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:pattern value="(a|[vIibo]+)"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:attribute>
                        <xs:attribute name="alliance" type="xs:string"/>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute name="timestamp" type="xs:integer"/>
            <xs:attribute name="serverId" type="xs:string"/>
        </xs:complexType>
    </xs:element>
</xs:schema>

This produces the following two classes (we don't care about the restriction in this case):

Imports System.Net
Imports System.IO
Imports System.Text
Imports System.Collections.Specialized
Imports System.ComponentModel      ' DefaultValue attribute
Imports System.Xml.Schema          ' XmlSchemaForm
Imports System.Xml.Serialization
Imports System.Diagnostics
Imports System.Collections.Generic
Imports System.Linq

<XmlType(AnonymousType:=True, TypeName:="players"), XmlRoot(ElementName:="players")>
Public Class PlayerList
    <XmlElement("player", Form:=XmlSchemaForm.Unqualified, ElementName:="player")>
    Public Property Players() As New List(Of Player)

    <XmlAttribute(AttributeName:="timestamp"), DefaultValue(0)>
    Public Property Timestamp() As Integer

    <XmlAttribute(AttributeName:="serverId"), DefaultValue("")>
    Public Property ServerId() As String

    Public Function Find(PlayerName As String) As Player
        Return Players.FirstOrDefault(Function(p) p.Name = PlayerName)
    End Function
End Class

<XmlType(AnonymousType:=True, TypeName:="player"), XmlRoot("player")>
Public Class Player
    <XmlAttribute(AttributeName:="id"), DefaultValue(0)>
    Public Property Id() As Integer

    <XmlAttribute(AttributeName:="name"), DefaultValue("")>
    Public Property Name() As String

    <XmlAttribute(AttributeName:="status"), DefaultValue("")>
    Public Property Status() As String

    <XmlAttribute(AttributeName:="alliance"), DefaultValue("")>
    Public Property Alliance() As String
End Class

I've added a Find function to the PlayerList class for your button handler to call:

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles  Button2.Click
    Dim Link As String = "https://s25-pt.ogame.gameforge.com/api/players.xml"
    Dim MyPlayers As PlayerList = Nothing

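    ' Using disposes the WebClient even if the download or deserialization throws
    Using Client As New WebClient
        Client.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.0; es-ES; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3")
        ' Deserialize returns Object, so cast explicitly (required under Option Strict On)
        MyPlayers = DirectCast(Deserialize(Client.DownloadString(Link), GetType(PlayerList)), PlayerList)
    End Using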

    Dim MyPlayer As Player = MyPlayers.Find(ListBox1.Text)
    If MyPlayer IsNot Nothing Then
        Debug.Print("Player ID: {0}", MyPlayer.Id)
        Debug.Print("Player Name: {0}", MyPlayer.Name)
        Debug.Print("Player Status: {0}", MyPlayer.Status)
        Debug.Print("Player Alliance: {0}", MyPlayer.Alliance)
    Else
        Debug.Print("Not Found")
    End If
End Sub

Private Function Deserialize(XMLString As String, ObjectType As Type) As Object
    Return New XmlSerializer(ObjectType).Deserialize(New MemoryStream(Encoding.UTF8.GetBytes(XMLString)))
End Function

Testing with Fantasma2, I get the following output:

Player ID: 100110
Player Name: Fantasma2
Player Status: vI
Player Alliance: 4762
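
If two serializable classes feel like more than this lookup needs, LINQ to XML can read the attributes directly. This is a different technique from the XmlSerializer approach, shown only as a rough sketch; the sample XML is invented and merely mirrors the attribute names from the schema:

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module LinqToXmlLookup
    Sub Main()
        ' Invented sample; in the button handler this string would come from DownloadString.
        Dim xml As String = "<players><player id=""1"" name=""alpha""/><player id=""2"" name=""beta"" status=""v""/></players>"
        Dim doc As XDocument = XDocument.Parse(xml)

        ' First <player> whose name attribute equals the wanted name, or Nothing.
        Dim wanted As XElement = doc.Root.Elements("player").FirstOrDefault(Function(p) p.@name = "beta")

        If wanted IsNot Nothing Then
            ' .@id is VB's XML attribute axis; it yields Nothing when the attribute is missing.
            Console.WriteLine("Player ID: {0}", wanted.@id)
        Else
            Console.WriteLine("Not Found")
        End If
    End Sub
End Module
```

For the sample above this prints Player ID: 2. The trade-off is that the values stay strings, whereas the serializer classes give typed properties.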
