
Filter extracted links from a webpage using HtmlAgilityPack - vb.net

Problem one:

I have a program that extracts the URLs from a webpage (WebSource) whose href contains a specific string (/articles/):

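' Requires: Imports HtmlAgilityPack
Dim links As New List(Of String)()
Dim htmlDoc As New HtmlAgilityPack.HtmlDocument()
htmlDoc.LoadHtml(WebSource)
' SelectNodes returns Nothing when no node matches, so guard before looping.
Dim anchors As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//a[@href]")
If anchors IsNot Nothing Then
    For Each link As HtmlNode In anchors
        Dim att As HtmlAttribute = link.Attributes("href")
        If att.Value.Contains("/articles/") Then
            links.Add(att.Value)
        End If
    Next
End If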

Is it possible to filter the URLs by two values? For example, on a tech site I want to find all URLs that contain both /articles/ and LG.

Problem two:

The extracted URLs are not complete HTTP addresses. For example, one of my results is

/articles/car

instead of the complete address, for example

http://website.com/articles/car

How can I fix this?

You are currently checking only ONE value. To require several values in your HtmlAgilityPack loop, you can nest multiple If statements as follows:

If att.Value.Contains("content1") Then
    If att.Value.Contains("content2") Then
        If att.Value.Contains("content3") Then
            links.Add(att.Value)
        End If
    End If
End If
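
The nested If statements can also be collapsed into a single test, and the same pass can turn relative hrefs into full addresses, which is what Problem two asks about. Below is a minimal sketch, not a definitive implementation. The module name LinkFilter, the helper FilterAndResolve and its parameters are hypothetical names made up for illustration, and the base address http://website.com/ is taken from the question's example. It relies only on the standard System.Uri class, so you would pass it the href values collected in the HtmlAgilityPack loop.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LinkFilter
    ' Hypothetical helper: keeps the hrefs that contain every keyword, then
    ' resolves each one against baseUrl so relative paths become absolute.
    Function FilterAndResolve(hrefs As IEnumerable(Of String),
                              baseUrl As String,
                              ParamArray keywords As String()) As List(Of String)
        Dim baseUri As New Uri(baseUrl)
        Dim results As New List(Of String)()
        For Each href As String In hrefs
            ' All(...) is True only when href contains every keyword
            ' (String.Contains is case-sensitive).
            If keywords.All(Function(k) href.Contains(k)) Then
                Dim absolute As Uri = Nothing
                ' TryCreate combines the base with a relative href and
                ' leaves an already-absolute href unchanged.
                If Uri.TryCreate(baseUri, href, absolute) Then
                    results.Add(absolute.AbsoluteUri)
                End If
            End If
        Next
        Return results
    End Function

    Sub Main()
        ' Sample hrefs, assumed for illustration only.
        Dim hrefs = {"/articles/car", "/articles/LG-phone", "/news/LG-tv"}
        For Each url In FilterAndResolve(hrefs, "http://website.com/", "/articles/", "LG")
            Console.WriteLine(url)
        Next
    End Sub
End Module
```

For just two values, a single line such as If att.Value.Contains("/articles/") AndAlso att.Value.Contains("LG") Then does the same job as the nested version. The ParamArray form is only more convenient when the list of keywords grows.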


 