VB.Net save html tables on website into a text file

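I am new to VB.Net programming and am writing a program. The program is supposed to read a table of data from http://www.xe.com/currencytables/?from=AUD&date=2014-09-18 and save that table to a text file. I have been researching on the web but have not been able to find an answer. I would appreciate it if anyone could help. Below is what I have so far: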

Private Sub Button6_Click(sender As Object, e As EventArgs) Handles Button6.Click

    Dim document As New HtmlAgilityPack.HtmlDocument
    Dim myHttpWebRequest = CType(WebRequest.Create("http://www.xe.com/currencytables/?from=AUD&date=2014-09-18"), HttpWebRequest)

    myHttpWebRequest.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
    Dim streamRead = New StreamReader(CType(myHttpWebRequest.GetResponse(), HttpWebResponse).GetResponseStream)
    Dim res As HttpWebResponse = myHttpWebRequest.GetResponse()
    document.Load(res.GetResponseStream, True)

    Dim tabletag2 As HtmlNode = document.DocumentNode.SelectSingleNode("//div[@class='ICTtableDiv']//tbody")
    If tabletag2 IsNot Nothing Then
        My.Computer.FileSystem.WriteAllText("C:\temp\test.txt", tabletag2.InnerHtml, False)
    Else
        MsgBox(Nothing)
    End If
    Debug.WriteLine("finished")
End Sub

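This saves a text file, but the data in the file is the table's HTML code. I only need the table text. Can anyone help?

The HTML table at the link above looks like this: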

<div class="ICTtableDiv">
                        <table id='historicalRateTbl' class='tablesorter ICTTable'>
                            <thead> 
                                <tr>
                                    <th class="ICTCurrencyCode">
                                        Currency code
                                        <span class="nonSortAppend">&#9650;&#9660;</span>
                                    </th>
                                    <th class="ICTCurrencyName">
                                        Currency name
                                        <span class="nonSortAppend">&#9650;&#9660;</span>
                                    </th>
                                    <th class="ICTRateHeader">Units per AUD</th>
                                    <th class="ICTRateHeader">AUD per Unit</th>
                                </tr>
                            </thead>
                            <tbody>
                        <tr><td><a href='/currency/usd-us-dollar'>USD</a></td><td>US Dollar</td><td class="ICTRate">0.8982463498</td><td class="ICTRate">1.1132803381</td></tr><!-- <tr><td><a href='/currency/usd-us-dollar'>USD</a></td><td>US Dollar</td><td class="ICTRate">1.5525826958</td><td class="ICTRate">0.6440880751</td></tr> --><tr><td><a href='/currency/eur-euro'>EUR</a></td><td>Euro</td><td class="ICTRate">0.6955704202</td><td class="ICTRate">1.4376689563</td></tr><!-- <tr><td><a href='/currency/eur-euro'>EUR</a></td><td>Euro</td><td class="ICTRate">1.2973942472</td><td class="ICTRate">0.7707757316</td></tr> --><tr><td><a href='/currency/gbp-british-pound'>GBP</a></td><td>British Pound</td><td class="ICTRate">0.5485743518</td><td class="ICTRate">1.8229069527</td></tr><!-- <tr><td><a href='/currency/gbp-british-pound'>GBP</a></td><td>British Pound</td><td class="ICTRate">0.6505821652</td><td class="ICTRate">1.5370848656</td></tr> --><tr><td><a href='/currency/inr-indian-rupee'>INR</a></td><td>Indian Rupee</td><td class="ICTRate">54.5819382185</td><td class="ICTRate">0.0183210790</td></tr>

What I want is

USD US Dollar 0.8982463498 1.1132803381

for every entry in the table.

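The following approach works with the website and the desired table. It writes all extracted rows to the file, with the fields of each row separated by commas (change the separator in String.Join("," ...) as needed).

It is a mix of loops and LINQ, which I find more readable (in VB.NET):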

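' Requires: Imports System.IO, Imports System.Linq and Imports HtmlAgilityPack
Dim table = document.DocumentNode.SelectSingleNode("//table[@class='tablesorter ICTTable']")
Dim allCSVLines As New List(Of String)
If table IsNot Nothing Then
    ' Rows may sit directly under <table> or inside <tbody>
    Dim rows = table.SelectNodes("tr")
    If rows Is Nothing AndAlso table.SelectSingleNode("tbody") IsNot Nothing Then
        rows = table.SelectSingleNode("tbody").SelectNodes("tr")
    End If
    ' SelectNodes returns Nothing when nothing matches, so guard before looping
    If rows IsNot Nothing Then
        For Each row As HtmlNode In rows
            Dim cells = row.SelectNodes("th|td")
            If cells Is Nothing Then Continue For ' skip rows without cells
            Dim fields = From td In cells.Cast(Of HtmlNode)()
                         Select td.InnerText
            Dim csvLine = String.Join(",", fields)
            allCSVLines.Add(csvLine)
        Next
    End If
    File.WriteAllLines("C:\temp\test.txt", allCSVLines)
End If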

Result (shortened, since there are 166 rows in total):

USD,US Dollar,0.8982463498,1.1132803381
EUR,Euro,0.6955704202,1.4376689563
GBP,British Pound,0.5485743518,1.8229069527
INR,Indian Rupee,54.5819382185,0.0183210790
AUD,Australian Dollar,1.0000000000,1.0000000000
CAD,Canadian Dollar,0.9832756941,1.0170087657
SGD,Singapore Dollar,1.1388903049,0.8780476888
CHF,Swiss Franc,0.8394278948,1.1912875498
MYR,Malaysian Ringgit,2.9181565764,0.3426820919
JPY,Japanese Yen,97.6309788591,0.0102426506
CNY,Chinese Yuan Renminbi,5.5165706143,0.1812720384
NZD,New Zealand Dollar,1.1033232455,0.9063526977
....
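
The question asks for space-separated text rather than commas. As a minimal sketch of changing the separator, the module below parses an inline sample of the table instead of the live page. The names `TableTextSketch` and `ExtractRows` are hypothetical, the sample markup is a shortened copy of the HTML in the question, and the XPath `//tr[td]` (rows that contain `td` cells, which skips the header row) is my assumption about what is wanted, not part of the answer above. It assumes the HtmlAgilityPack package is referenced.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module TableTextSketch
    ' Hypothetical helper: returns one line per data row, with cell texts joined by the separator.
    Function ExtractRows(html As String, separator As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        Dim lines As New List(Of String)
        ' //tr[td] matches only rows that contain <td> cells, so the <th> header row is skipped.
        Dim rows = doc.DocumentNode.SelectNodes("//table[@id='historicalRateTbl']//tr[td]")
        ' SelectNodes returns Nothing when there is no match.
        If rows Is Nothing Then Return lines
        For Each row As HtmlNode In rows
            ' DeEntitize turns entities such as &amp; back into plain characters.
            Dim cells = row.SelectNodes("td").
                Select(Function(td) HtmlEntity.DeEntitize(td.InnerText).Trim())
            lines.Add(String.Join(separator, cells))
        Next
        Return lines
    End Function

    Sub Main()
        ' Shortened sample of the markup shown in the question (assumption: one data row only).
        Dim sample = "<table id='historicalRateTbl'><thead><tr><th>Currency code</th></tr></thead>" &
                     "<tbody><tr><td><a href='/currency/usd-us-dollar'>USD</a></td><td>US Dollar</td>" &
                     "<td class='ICTRate'>0.8982463498</td><td class='ICTRate'>1.1132803381</td></tr>" &
                     "</tbody></table>"
        For Each line In ExtractRows(sample, " ")
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Passing `vbTab` instead of `" "` would give tab-separated output, which is easier to split again later because currency names such as "US Dollar" contain spaces themselves.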

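Since you are having trouble getting the desired result, here is the code I used to load the document. It is essentially the same as the code you posted above, so it is not clear why it does not work for you: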

' Requires: Imports System.Net and a reference to HtmlAgilityPack
Dim document As New HtmlAgilityPack.HtmlDocument
Dim myHttpWebRequest = CType(WebRequest.Create("http://www.xe.com/currencytables/?from=AUD&date=2014-09-18"), HttpWebRequest)
myHttpWebRequest.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
' The StreamReader from the question was never read from, so it is omitted here.
' Using disposes the response once the document has been loaded.
Using res = CType(myHttpWebRequest.GetResponse(), HttpWebResponse)
    document.Load(res.GetResponseStream(), True)
End Using
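
If the table still comes back empty, one way to narrow the problem down is to check which selectors match anything in the loaded document. The following is only a sketch: `SelectorCheckSketch` and `ReportMatches` are hypothetical names, and the inline markup is a stand-in for the real page, so in practice you would pass the `document` loaded above instead.

```vbnet
Imports System
Imports HtmlAgilityPack

Module SelectorCheckSketch
    ' Hypothetical helper: prints how many nodes each XPath expression matches.
    Sub ReportMatches(doc As HtmlDocument, ParamArray xpaths As String())
        For Each xpath In xpaths
            Dim nodes = doc.DocumentNode.SelectNodes(xpath)
            ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
            Dim count = If(nodes Is Nothing, 0, nodes.Count)
            Console.WriteLine("{0} -> {1} node(s)", xpath, count)
        Next
    End Sub

    Sub Main()
        ' Stand-in markup (assumption); replace with the document loaded from the site.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div class='ICTtableDiv'><table id='historicalRateTbl' class='tablesorter ICTTable'>" &
                     "<tbody><tr><td>USD</td></tr></tbody></table></div>")
        ReportMatches(doc,
                      "//div[@class='ICTtableDiv']//tbody",
                      "//table[@class='tablesorter ICTTable']",
                      "//table[@id='historicalRateTbl']//tr[td]")
    End Sub
End Module
```

A count of zero for every expression would suggest that the server returned different markup than the browser shows, rather than a problem in the row-extraction loop.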
