VB.net searching through HTML code

I'm creating a program that searches a page's HTML source code and reports whether a specified string is present, but it always comes back false. Could someone have a look in case I'm missing something?

Private Const QUOTE As Char = """"c

Private Sub ServerStatus_Load(sender As Object, e As EventArgs) Handles MyBase.Load

    'download the page source and store it here
    Dim sourceString As String = New System.Net.WebClient().DownloadString("https://support.rockstargames.com/hc/en-us/articles/200426246")

    'call the source and validate a string exists, if not
    If (sourceString).Contains($"<div class={QUOTE}panel-base xbl{QUOTE} style={QUOTE}background-color: RGB(236, 255, 236);{QUOTE}><div class={QUOTE}marshmallowLogo{QUOTE} id={QUOTE}xboxLogo{QUOTE}>Xbox 360</div><center><span class={QUOTE}statusSpan{QUOTE} style={QUOTE}color green;{QUOTE}>Up</span></center>") = True Then
        Label1.Text = "It's there"
        ' if it does
    ElseIf (sourceString).Contains($"<div class={QUOTE}panel-base xbl{QUOTE} style={QUOTE}background-color: RGB(236, 255, 236);{QUOTE}><div class={QUOTE}marshmallowLogo{QUOTE} id={QUOTE}xboxLogo{QUOTE}>Xbox 360</div><center><span class={QUOTE}statusSpan{QUOTE} style={QUOTE}color green;{QUOTE}>Up</span></center>") = False Then
        Label1.Text = "It's not"
    End If

End Sub

End Class

So I spent a few minutes analyzing the page (you're welcome), and as indicated in a comment, the data is loaded via JavaScript and is not present in the base HTML returned by your original URL. I'm not 100% sure yet, but I think you actually want to look at this address:

https://supportfiles.rockstargames.com/support/serverStatus.json

which returns a response like this:

jsonCallbackStatus(
    {
        "statuses":

            {
                "psnUpOrDownOverride": "",
                "ps4UpOrDownOverride": "",
                "xboxUpOrDownOverride": "",
                "xboxOneUpOrDownOverride": "",
                "rgscUpOrDownOverride": "",
                "psnWarningOverrideMessage": "",
                "ps4WarningOverrideMessage": "",
                "xboxWarningOverrideMessage": "",
                "xboxOneWarningOverrideMessage": "",
                "rgscWarningOverrideMessage": "",
                "pcWarningOverrideMessage": "",
                "pcUpOrDownOverride": "",
                "giantWarningOverrideMessage": ""
            },

    }
);

If I'm reading this correctly, the empty string next to each item means there's nothing wrong: no news is good news. This should be much easier to parse than all that HTML :) Don't forget to look at both the warning and the up/down status for your platform, as well as the giantWarningOverrideMessage.

How I found this address

Data like this almost always comes in one of three ways: JSON, RSS (or similar XML), or a web service (SOAP). A web service would usually be loaded and parsed at the server and then sent with the HTML, and RSS is harder to parse in JavaScript and less popular recently, so I went for JSON first.

I started by opening the page in Chrome. Then I opened the developer tools (F12) and chose the Network tab. Now when I refresh the page I get a list of every item downloaded from the web server for this page. [1] I then narrowed down the list by looking only at the JavaScript downloads (the JS button in the toolbar, since I'm looking for a JSON response). This gives me a reasonable number of items, and I can narrow the search further by looking only at responses with a 200 status, of which I saw just two: both from this address.

Note that the full address actually looked like this:

https://supportfiles.rockstargames.com/support/serverStatus.json?callback=jsonCallbackStatus&callback=jsonCallbackStatus&_=1465445182216

There's a bug in the page, as it makes no sense to have a callback URL parameter twice, especially with the same value. I only bring this up because of the _ URL parameter. Cut the last 3 digits off that value and you end up with a Unix timestamp that happens to match today's date. You may want to generate a URL which includes a timestamp like this, as it's possible that Rockstar uses the timestamp on the server to avoid serving a cached response. You'd hate to get a response cached an hour ago, when everything was fine, if a server is down now.
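To make the timestamp observation concrete, here is a small sketch that treats the _ value from the URL above as milliseconds since the Unix epoch and converts it to a date. The module name is made up for illustration, and reading the value as a millisecond timestamp is an inference from the digits, not something Rockstar documents.

```vbnet
Imports System

Module TimestampCheck
    Sub Main()
        ' The _ value copied from the request URL above
        Dim raw As Long = 1465445182216L
        Dim epoch As New DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)

        ' Interpret the value as milliseconds since 1970-01-01 UTC
        Dim asDate As DateTime = epoch.AddMilliseconds(raw)
        Console.WriteLine(asDate.ToString("u"))

        ' Integer division by 1000 drops the last three digits,
        ' leaving the timestamp in whole seconds
        Console.WriteLine(raw \ 1000)
    End Sub
End Module
```

The converted value falls on 9 June 2016 (UTC), which is consistent with the guess that the parameter is a cache-busting timestamp.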

One last reminder: I'm not 100% sure this is the data you need. It's possible it comes from another request. But this is all you get for free :) Hopefully the write-up of how I got this far is enough for you to do your own detective work verifying the result.

Of course, you also have the option of using a WebBrowser control, which would run the JavaScript. But it's way slower, you're back to parsing the ugly HTML, and any little HTML change will break your code (whereas the JSON result is likely to survive several web site redesigns).

Source code to read the data

' Needs Imports System.IO, System.Net and System.Text.RegularExpressions
Dim unixTime As ULong = CULng((DateTime.UtcNow - New DateTime(1970, 1, 1, 0, 0, 0)).TotalMilliseconds)
Using wc As New WebClient(),
      rdr As New StreamReader(wc.OpenRead($"https://supportfiles.rockstargames.com/support/serverStatus.json?_={unixTime}"))

    Dim line = rdr.ReadLine()
    While line IsNot Nothing
        line = line.Trim()
        If line.StartsWith("""xboxUpOrDownOverride") Then
            Dim parts = line.Split(":".ToCharArray())
            parts(1) = Regex.Replace(parts(1), "[ "",]", "")
            If parts(1).Length > 0 Then
                Console.WriteLine("Up/Down Failed")
            Else
                Console.WriteLine("Up/Down Okay")
            End If
        End If
        If line.StartsWith("""xboxWarningOverrideMessage") Then
            Dim parts = line.Split(":".ToCharArray())
            parts(1) = Regex.Replace(parts(1), "[ "",]", "")
            If parts(1).Length > 0 Then
                Console.WriteLine("Warning Failed")
            Else
                Console.WriteLine("Warning Okay")
            End If
        End If
        If line.StartsWith("""giantWarningOverrideMessage") Then
            Dim parts = line.Split(":".ToCharArray())
            parts(1) = Regex.Replace(parts(1), "[ "",]", "")
            If parts(1).Length > 0 Then
                Console.WriteLine("Giant Warning Failed")
            Else
                Console.WriteLine("Giant Warning Okay")
            End If
        End If
        line = rdr.ReadLine()
    End While
End Using

You should also consider using a real JSON parser (very easy to add via NuGet), as even something as simple as adding a minifier would break this code by pushing everything onto one line.
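As a rough sketch of the parser-based approach, the following assumes the Newtonsoft.Json (Json.NET) package has been installed from NuGet. The module and function names (StatusParser, UnwrapJsonp, ReportPlatform) are made up for illustration, and the sample string is a cut-down version of the response shown earlier, not live data. It strips the jsonCallbackStatus( ... ) wrapper first, since the wrapped response is not plain JSON.

```vbnet
Imports System
Imports Newtonsoft.Json.Linq ' assumption: Json.NET from NuGet

Module StatusParser

    ' Returns the text between the first "(" and the last ")",
    ' i.e. the JSON inside a JSONP wrapper such as jsonCallbackStatus( ... );
    Function UnwrapJsonp(raw As String) As String
        Dim first = raw.IndexOf("("c)
        Dim last = raw.LastIndexOf(")"c)
        If first < 0 OrElse last <= first Then Return raw.Trim()
        Return raw.Substring(first + 1, last - first - 1).Trim()
    End Function

    ' Checks the up/down override, the warning message and the giant warning
    ' for one platform prefix (for example "xbox"). Returns "Okay" when all
    ' three are missing or empty, otherwise the first non-empty entry.
    Function ReportPlatform(raw As String, prefix As String) As String
        Dim root = JObject.Parse(UnwrapJsonp(raw))
        Dim statuses = TryCast(root("statuses"), JObject)
        If statuses Is Nothing Then Return "No statuses object found"

        For Each key In {prefix & "UpOrDownOverride",
                         prefix & "WarningOverrideMessage",
                         "giantWarningOverrideMessage"}
            Dim value = statuses.Value(Of String)(key)
            If Not String.IsNullOrWhiteSpace(value) Then
                Return key & ": " & value
            End If
        Next
        Return "Okay"
    End Function

    Sub Main()
        ' Hypothetical minified sample; a real call would download the text instead
        Dim sample = "jsonCallbackStatus({""statuses"":{""xboxUpOrDownOverride"":"""",""xboxWarningOverrideMessage"":"""",""giantWarningOverrideMessage"":""""}});"
        Console.WriteLine(ReportPlatform(sample, "xbox"))
    End Sub
End Module
```

Because the lookup is by key rather than by line, this keeps working when the response is minified onto a single line. In real use, the string passed to ReportPlatform would be the text downloaded from the serverStatus.json address.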


[1] And there were a lot of things downloaded. Rockstar should invest in a bundler to minimize HTTP requests, for faster page loads and lower bandwidth use, especially on mobile devices.

Reference code for anyone who cannot use VS2015 (VB14):

Private Const QUOTE As Char = """"c

Private Sub ServerStatus_Load(sender As Object, e As EventArgs) Handles MyBase.Load

    'download the page source and store it here
    Dim sourceString As String = New System.Net.WebClient().DownloadString("https://support.rockstargames.com/hc/en-us/articles/200426246")

    'check whether the expected markup is present in the source
    Label1.Text = If(sourceString.Contains(String.Format(
        "<div class={0}panel-base xbl{0} style={0}background-color: RGB(236, 255, 236);{0}><div class={0}marshmallowLogo{0} id={0}xboxLogo{0}>Xbox 360</div><center><span class={0}statusSpan{0} style={0}color green;{0}>Up</span></center>",
        QUOTE)), "It's there", "It's not")

End Sub
End Class
