
Pulling out specific text in an HTML file using VB.NET

I am trying to get three values from a large HTML file. I thought I could use the Substring method, but was told that the position of the data may change. Basically, from the following markup I need to pick out "Total number of records : 106", "Number of records imported : 106", and "Number of records rejected : 0":

<B>Total number of records : </B>106</Font><br><Font face="arial" size="2"><B>Number of records imported : </B>106</Font><br><Font face="arial" size="2"><B>Number of records rejected : </B>0</Font>

I hope this is clear enough. Thanks in advance!

Simple string operations like IndexOf() and Substring() should be plenty to do the job. Regular expressions would be another approach that'd take less code (and may allow more flexibility if the HTML tags can vary), but as Mark Twain would say, I didn't have time for a short solution, so I wrote a long one instead.

In general you'll get better results around here by showing you've at least made a reasonable attempt first and showing where you got stuck. But for this time... here you go. :-)

Private Shared Function GetMatchingCount(allInputText As String, textBefore As String, textAfter As String) As Integer?

    'Find the first occurrence of the text before the desired number
    Dim startPosition As Integer = allInputText.IndexOf(textBefore)

    'If text before was not found, return Nothing
    If startPosition < 0 Then Return Nothing

    'Move the start position to the end of the text before, rather than the beginning.
    startPosition += textBefore.Length

    'Find the first occurrence of text after the desired number
    Dim endPosition As Integer = allInputText.IndexOf(textAfter, startPosition)

    'If text after was not found, return Nothing
    If endPosition < 0 Then Return Nothing

    'Get the string found at the start and end positions
    Dim textFound As String = allInputText.Substring(startPosition, endPosition - startPosition)

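    'Try converting the string found to an integer; TryParse reports
    'failure through its return value instead of throwing an exception
    Dim result As Integer
    If Integer.TryParse(textFound, result) Then Return result
    Return Nothing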
End Function

Of course, it'll only work if the text before and after is always the same. If you use that with a driver console app like this (but without the Shared, since it'd be in a Module then)...

Sub Main()
    Dim allText As String = "<B>Total number of records : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records imported : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records rejected : </B>0</Font>"

    Dim totalRecords As Integer? = GetMatchingCount(allText, "<B>Total number of records : </B>", "<")
    Dim recordsImported As Integer? = GetMatchingCount(allText, "<B>Number of records imported : </B>", "<")
    Dim recordsRejected As Integer? = GetMatchingCount(allText, "<B>Number of records rejected : </B>", "<")

    Console.WriteLine("Total: {0}", totalRecords)
    Console.WriteLine("Imported: {0}", recordsImported)
    Console.WriteLine("Rejected: {0}", recordsRejected)
    Console.ReadKey()
End Sub

...you'll get output like so:

Total: 106
Imported: 106
Rejected: 0
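
For comparison, here is a rough sketch of the regular-expression approach mentioned above. It is not part of the original answer: the helper name GetCountByLabel and the pattern are assumptions made for illustration, and the pattern expects each number to directly follow a "<B>label : </B>" tag, as in the sample markup. It only uses Regex.Match, Regex.Escape and Integer.TryParse from the .NET base class library.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Hypothetical helper (not from the original answer): returns the integer
    ' that follows "<B>label : </B>", or Nothing if no such number is found.
    Function GetCountByLabel(html As String, label As String) As Integer?
        ' Regex.Escape protects any special characters in the label;
        ' \s* tolerates spacing differences around the colon and the tags.
        Dim pattern As String = "<B>\s*" & Regex.Escape(label) & "\s*:\s*</B>\s*(\d+)"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing

        ' Group 1 holds the digits captured after the closing </B> tag
        Dim value As Integer
        If Integer.TryParse(m.Groups(1).Value, value) Then Return value
        Return Nothing
    End Function

    Sub Main()
        Dim html As String = "<B>Total number of records : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records imported : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records rejected : </B>0</Font>"

        Console.WriteLine("Total: {0}", GetCountByLabel(html, "Total number of records"))
        Console.WriteLine("Imported: {0}", GetCountByLabel(html, "Number of records imported"))
        Console.WriteLine("Rejected: {0}", GetCountByLabel(html, "Number of records rejected"))
    End Sub
End Module
```

Because the pattern ignores case and allows optional whitespace, it should keep working if the spacing around the colon or the casing of the <B> tags changes, but it will still return Nothing if the label text itself is reworded.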
