How to scrape data from a web page using ASP.NET

I want to extract values from my HTML page. I tried to do it with HttpWebRequest, but so far I haven't been able to. Please help.

    <div class="container">

        <div class="one-third column">
<ol start="181">
<li><a href="/lyrics/hindi-lyrics-of-Aaye%20Din%20Bahar%20Ke.html">Aaye Din Bahar Ke</a>
</li><li><a href="/lyrics/hindi-lyrics-of-Aayega%20Aane%20Wala.html">Aayega Aane Wala</a>
</li><li><a href="/lyrics/hindi-lyrics-of-Aayi%20Milan%20Ki%20Raat.html">Aayi Milan Ki Raat</a>

</li><li><a href="/lyrics/hindi-lyrics-of-Aiyyaa.html">Aiyyaa</a>
</li><li><a href="/lyrics/hindi-lyrics-of-Ajab%20Gazabb%20Love.html">Ajab Gazabb Love</a>

</li></ol>
    </div>

<div class="sixteen columns">

<hr>
More Pages: 
<a href="hindi-songs-starting-A.html">1</a> : <a href="hindi-songs-starting-A-page-2.html">2</a> : 3 : <a href="hindi-songs-starting-A-page-4.html">4</a> : <a href="hindi-songs-starting-A-page-5.html">5</a> : <a href="hindi-songs-starting-A-page-6.html">6</a> : 
        <hr>
<center>

<h4>Hindi Lyrics By Movie Title</h4>
<p>         
<a href="/lyrics/hindi-songs-starting-0.html">0-9</a>
<a href="/lyrics/hindi-songs-starting-A.html">A</a>
<a href="/lyrics/hindi-songs-starting-B.html">B</a>

<a href="/lyrics/hindi-songs-starting-W.html">W</a>
X
<a href="/lyrics/hindi-songs-starting-Y.html">Y</a>
<a href="/lyrics/hindi-songs-starting-Z.html">Z</a>
 | <a href="http://www.hindilyrics.net/songs/">Top Songs</a>
</p>
</center>

    </div>

This is my HTML. I want to get all the links.

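We can use HtmlAgilityPack for the scraping. You can download it from here: http://htmlagilitypack.codeplex.com/ (CodePlex has since been shut down; the library is now published on NuGet as the HtmlAgilityPack package.)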

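    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using System.Text;
    using HtmlAgilityPack;

    string urls = "your web page";
    string result = string.Empty;

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urls);
    request.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";

    // Dispose the response too, so the underlying connection is released.
    using (var response = request.GetResponse())
    using (var stream = response.GetResponseStream())
    using (var reader = new StreamReader(stream, Encoding.UTF8))
    {
        result = reader.ReadToEnd();
    }

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(result);

    var links = new List<string>();

    // SelectNodes returns null, not an empty collection, when nothing matches.
    var elements = doc.DocumentNode.SelectNodes("//div[@class='one-third column']");
    if (elements != null)
    {
        foreach (HtmlNode item in elements)
        {
            var listItems = item.SelectNodes(".//li");
            if (listItems == null) continue;

            foreach (HtmlNode li in listItems)
            {
                // ".//a" searches inside this <li>; "//a" would search the whole
                // document and return the same first link on every iteration.
                var a = li.SelectSingleNode(".//a");
                if (a != null)
                    links.Add(a.GetAttributeValue("href", string.Empty)); // your link
            }
        }
    }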
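The href values read by this code are exactly as written in the page, so most of them are relative (for example /lyrics/hindi-lyrics-of-Aiyyaa.html or hindi-songs-starting-A-page-2.html) and have to be resolved against the page's own address before they can be requested. Below is a minimal sketch that does this with System.Uri only. The class LinkResolver and its method ToAbsolute are hypothetical names, and the base URL in the example is an assumption, since the question does not give the real page address.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of HtmlAgilityPack): turns scraped href values
// into absolute URLs, using the address of the page they came from as the base.
public static class LinkResolver
{
    public static List<string> ToAbsolute(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();

        foreach (var href in hrefs)
        {
            if (string.IsNullOrWhiteSpace(href))
                continue; // skip anchors without an href

            // Already-absolute web links are kept as they are. Only http(s) is accepted
            // here because on Linux/macOS a string starting with "/" can be parsed as an
            // absolute file path (file:///...).
            if (Uri.TryCreate(href, UriKind.Absolute, out var absolute) &&
                (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            {
                result.Add(absolute.AbsoluteUri);
            }
            // Everything else is resolved against the page address:
            // "/x" relative to the host root, "x" relative to the page's folder.
            else if (Uri.TryCreate(href, UriKind.Relative, out var relative))
            {
                result.Add(new Uri(baseUri, relative).AbsoluteUri);
            }
        }
        return result;
    }
}
```

For example, with the hypothetical base https://example.com/lyrics/hindi-songs-starting-A-page-3.html, the href hindi-songs-starting-A-page-2.html resolves to https://example.com/lyrics/hindi-songs-starting-A-page-2.html, and /lyrics/hindi-lyrics-of-Aiyyaa.html resolves to https://example.com/lyrics/hindi-lyrics-of-Aiyyaa.html.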

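You can use the System.Net.Http.HttpClient class and its GetAsync() method; HttpClient is designed for downloading web content asynchronously. Alternatively, you can use the WebRequest class, which is a much more basic approach (WebRequest and HttpWebRequest are marked obsolete starting with .NET 6).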
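A minimal sketch of the HttpClient approach, assuming C# 8 or later. GetAsync downloads the page, and the href values are then pulled out with a regular expression so the example needs only the base class library. The class and method names (PageScraper, DownloadAsync, ExtractHrefs) are hypothetical. The regex is a deliberate simplification that handles only double-quoted href attributes on a tags, so it is more fragile than an HTML parser such as the HtmlAgilityPack approach.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper class, not part of any library.
public static class PageScraper
{
    // One HttpClient is reused for the whole application instead of creating
    // one per request, so connections can be pooled.
    private static readonly HttpClient Client = new HttpClient();

    // Matches href="..." inside an <a ...> tag. Simplification: single-quoted
    // or unquoted attribute values are not handled.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static async Task<string> DownloadAsync(string url)
    {
        using HttpResponseMessage response = await Client.GetAsync(url);
        response.EnsureSuccessStatusCode(); // throw on 4xx/5xx instead of parsing an error page
        return await response.Content.ReadAsStringAsync();
    }

    public static List<string> ExtractHrefs(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
            links.Add(m.Groups[1].Value); // the captured href value, as written in the page
        return links;
    }
}
```

From an async method, var html = await PageScraper.DownloadAsync("your web page"); followed by PageScraper.ExtractHrefs(html) returns every href on the page, relative ones included.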

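Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.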

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM