Using regex in VB.NET to extract phone numbers
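I wrote this code to extract mobile phone numbers from web links. Basically, I have three links in a ListBox, and I fetch the source of each one with the code below. But when I try to extract the phone numbers using Regex, I get the same number again and again. Here is the complete code I wrote. The website I am extracting links from is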
http://bolee.com/nf/all-results
Dim doc As New HtmlAgilityPack.HtmlDocument()

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    If ListBox1.Items.Count = 0 Then
        MsgBox("Please Extract Links First")
    Else
        ListBox1.SelectedIndex = 0
    End If
End Sub

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    ScrapLinks()
End Sub

Private Function ScrapLinks()
    Dim hw As New HtmlWeb()
    Try
        doc = hw.Load(TextBox1.Text)
        doc.LoadHtml(doc.DocumentNode.SelectSingleNode("//*[@id='ad_list']").InnerHtml())
        For Each link As HtmlNode In doc.DocumentNode.SelectNodes("//a[@href]")
            Dim hrefValue As String = link.GetAttributeValue("href", String.Empty)
            If hrefValue.Contains("/detail/") Then
                ListBox1.Items.Add(hrefValue)
            End If
        Next
        Dim items(ListBox1.Items.Count - 1) As Object
        ListBox1.Items.CopyTo(items, 0)
        ListBox1.Items.Clear()
        ListBox1.Items.AddRange(items.AsEnumerable().Distinct().ToArray())
        lbllinks.Text = ListBox1.Items.Count
    Catch ex As Exception
        MsgBox("Error " + ex.Message)
    End Try
    Return Nothing
End Function

Private Sub ListBox1_SelectedIndexChanged(sender As Object, e As EventArgs) Handles ListBox1.SelectedIndexChanged
    Try
        Dim re As New Regex("(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
        ' For Each link As String In ListBox1.Items
        Dim hw As New HtmlWeb()
        doc = hw.Load(ListBox1.SelectedItem)
        Dim data = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']").InnerText
        ' For Each match As Match In re.Matches(data)
        TextBox2.Text = data
        ' Next
        'Next
    Catch ex As Exception
        MsgBox("Error " + ex.Message)
    End Try
End Sub
Here is a sample of the output I get:
03152405552 03152405552 03152405552 03152405552 03152405552 03152405552
Try using the following code instead:
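Try
    Dim hw As New HtmlWeb()
    For Each link As String In ListBox1.Items
        ' Load each link directly rather than via ListBox1.SelectedItem,
        ' so the loop does not depend on (or change) the current selection.
        doc = hw.Load(link)
        Dim node = doc.DocumentNode.SelectSingleNode("//*[@class='det_ad f_left']")
        ' Skip pages that have no matching details block.
        If node Is Nothing Then Continue For
        ' Run the regex against this page's text only.
        For Each match As Match In Regex.Matches(node.InnerText, "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}")
            TextBox2.Text &= vbNewLine & match.Value
        Next
    Next
Catch ex As Exception
    MsgBox("Error " & ex.Message)
End Try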
The idea is to create a new regex match on each new piece of input data, so that nothing is cached between pages.
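If it helps, the extraction step can be tried on its own, without the form or HtmlAgilityPack. This is only a minimal sketch: the module name, the ExtractPhoneNumbers helper and the sample text are made up for illustration, and the pattern is the one from the question. It also drops repeated numbers with Distinct(), which is an extra step that the code above does not have.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module PhoneExtractionSketch
    ' Pattern copied from the question.
    Private Const PhonePattern As String = "(\+92|0092)-?\d{3}-?\d{7}|\d{11}|\d{4}-\d{7}"

    ' Hypothetical helper: returns each distinct phone number found in the text,
    ' in order of first appearance.
    Function ExtractPhoneNumbers(text As String) As List(Of String)
        Return Regex.Matches(text, PhonePattern).
            Cast(Of Match)().
            Select(Function(m) m.Value).
            Distinct().
            ToList()
    End Function

    Sub Main()
        ' Made-up sample text; the first number is repeated on purpose.
        Dim sample As String =
            "Call 03152405552 or 0315-2405552, office +92-315-2405552, again 03152405552"

        For Each number As String In ExtractPhoneNumbers(sample)
            Console.WriteLine(number)
        Next
    End Sub
End Module
```

In the form, the same helper could be called with the InnerText of each page and the results appended to TextBox2, so a number that appears several times on one page is listed only once.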