Using regex in VB.NET to extract phone numbers

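I wrote this code to extract mobile numbers from web links. I have three links in a ListBox, and I use the code below to download each link's source and extract the phone numbers with a regex, but I get the same number again and again. Here is the complete code I wrote. The website I extract the links from is: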

http://bolee.com/nf/all-results

Dim doc As New HtmlAgilityPack.HtmlDocument()

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    If ListBox1.Items.Count = 0 Then
        MsgBox("Please Extract Links First")
    Else
        ListBox1.SelectedIndex = 0
    End If
End Sub

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    ScrapLinks()
End Sub



Private Function ScrapLinks()
    Dim hw As New HtmlWeb()
    Try
        doc = hw.Load(TextBox1.Text)
        doc.LoadHtml(doc.DocumentNode.SelectSingleNode("//*[@id='ad_list']").InnerHtml())

        For Each link As HtmlNode In doc.DocumentNode.SelectNodes("//a[@href]")

            Dim hrefValue As String = link.GetAttributeValue("href", String.Empty)

            If hrefValue.Contains("/detail/") Then
                ListBox1.Items.Add(hrefValue)
            End If
        Next

        Dim items(ListBox1.Items.Count - 1) As Object
        ListBox1.Items.CopyTo(items, 0)
        ListBox1.Items.Clear()
        ListBox1.Items.AddRange(items.AsEnumerable().Distinct().ToArray())
        lbllinks.Text = ListBox1.Items.Count

    Catch ex As Exception
        MsgBox("Error " + ex.Message)

    End Try
    Return Nothing

End Function
Private Sub ListBox1_SelectedIndexChanged(sender As Object, e As EventArgs) Handles ListBox1.SelectedIndexChanged
    Try
        Dim re As New Regex("(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")

        ' For Each link As String In ListBox1.Items

        Dim hw As New HtmlWeb()
        doc = hw.Load(ListBox1.SelectedItem)
        Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText

        '    For Each match As Match In re.Matches(data)

        TextBox2.Text = Data


        '    Next
        'Next

    Catch ex As Exception
        MsgBox("Error " + ex.Message)

    End Try
End Sub

This is a sample of the output I get:

03152405552 03152405552 03152405552 03152405552 03152405552 03152405552

Try using the following code instead:

Try

    ' Load each link directly rather than stepping ListBox1.SelectedIndex,
    ' which would re-fire SelectedIndexChanged and can run past the last item.
    For Each link As String In ListBox1.Items
        Dim hw As New HtmlWeb()
        doc = hw.Load(link)
        Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText

        ' Append every phone number found on this page.
        For Each match As Match In Regex.Matches(data, "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
            TextBox2.Text += vbNewLine & match.Value
        Next
    Next

Catch ex As Exception
    MsgBox("Error " + ex.Message)

End Try

The idea is to create a new regular expression for each new piece of input data, to avoid any caching.
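To check the pattern separately from the page download, a small console sketch like the one below may help. It runs the same pattern over a string and collects each distinct match. The module name `PhoneRegexDemo`, the helper `ExtractPhoneNumbers`, and the sample text are hypothetical and used only for illustration; only the number 03152405552 comes from the output quoted in the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PhoneRegexDemo
    ' Same pattern as in the question and the answer.
    Private Const PhonePattern As String = "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}"

    ' Hypothetical helper: returns each distinct match, in order of first appearance.
    Function ExtractPhoneNumbers(source As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(source, PhonePattern)
            If Not found.Contains(m.Value) Then found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        ' Made-up sample text, not taken from the site.
        Dim sample As String = "Call 03152405552, 0300-1234567 or +92-300-7654321. Again: 03152405552"
        For Each number As String In ExtractPhoneNumbers(sample)
            Console.WriteLine(number)
        Next
    End Sub
End Module
```

If a helper like this returns different numbers for different input strings but the form still shows one number repeated, the repetition more likely comes from the text being passed in (for example, the same page being loaded each time) than from the pattern itself.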
