Reading information from a website in C#

Question

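In my project, I want to be able to look at a website, retrieve text from it, and later do something with that information.

My question is: what is the best way to retrieve data (text) from a website? I'm not sure how to handle static pages versus dynamic pages.

After some searching, I found this: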

        // Requires: using System; using System.IO; using System.Net; using System.Text;
        // The URL must be absolute (scheme included); a bare "anysite.com" throws UriFormatException.
        WebRequest request = WebRequest.Create("http://anysite.com");
        // If required by the server, set the credentials.
        request.Credentials = CredentialCache.DefaultCredentials;
        // Get the response; the using block closes it even if reading fails.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Display the status.
            Console.WriteLine(response.StatusDescription);
            Console.WriteLine();

            // Get the stream containing content returned by the server.
            using (Stream dataStream = response.GetResponseStream())
            // Open the stream using a StreamReader for easy access.
            using (StreamReader reader = new StreamReader(dataStream, Encoding.UTF8))
            {
                // Read the content.
                string responseString = reader.ReadToEnd();
                // Display the content.
                Console.WriteLine(responseString);
            }
        }

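Running it myself, I can see that it returns the site's HTML code, which is not what I want. Ultimately I'd like to be able to type in a site (a news article, for example) and get back the content of that article. Is this possible in C# or Java?

Thanks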

Answer 1

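I hate to break it to you, but that is what a web page looks like: a long bunch of HTML markup and content. The browser renders it into what you see on screen. The only way I can think of is to parse the HTML yourself.

After a quick search on Google, I found this Stack Overflow post: What is the best way to parse html in C#?

I bet you expected this to be a bit easier than it is, but that's the fun of programming: the problems are always challenging.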
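
To give a feel for what "parsing the HTML yourself" involves, here is a minimal sketch that reduces a page to its visible text using only the base class library. The `HtmlText` class and `ToPlainText` method are made-up names for illustration, and regular expressions are only a crude stand-in for a real HTML parser: this will misbehave on malformed markup or on text containing a literal `<`.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: crude HTML-to-text conversion, not a full parser.
    public static string ToPlainText(string html)
    {
        // Drop <script> and <style> elements together with their contents.
        string text = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Drop HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // Replace every remaining tag with a space so adjacent words do not fuse.
        text = Regex.Replace(text, @"<[^>]+>", " ");

        // Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Calling `HtmlText.ToPlainText(responseString)` on the string from the question would give you the page text without tags, though still including menus, footers, and anything else on the page.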

Answer 2

You can just use a WebClient:

using(var webClient = new WebClient())
{
   string htmlFromPage = webClient.DownloadString("http://myurl.com");
}

In the example above, htmlFromPage will contain the HTML, which you can then parse to find the data you need.
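
As one possible way to find the data you need in that string, the sketch below pulls out the text of each `<p>` element, which is often where an article's body lives. `ParagraphExtractor` and `GetParagraphs` are hypothetical names, and the assumption that the article sits in plain `<p>` tags will not hold for every site; a dedicated HTML parsing library would be more robust than these regular expressions.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class ParagraphExtractor
{
    // Hypothetical helper: returns the text of each <p>...</p> element in document order.
    public static List<string> GetParagraphs(string html)
    {
        var paragraphs = new List<string>();

        // \b keeps the pattern from matching other tags that start with "p", such as <pre>.
        MatchCollection matches = Regex.Matches(
            html,
            @"<p\b[^>]*>(.*?)</p\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        foreach (Match match in matches)
        {
            // Strip inline tags such as <a> or <em> inside the paragraph.
            string inner = Regex.Replace(match.Groups[1].Value, @"<[^>]+>", "");

            // Decode entities and normalize whitespace.
            string text = Regex.Replace(WebUtility.HtmlDecode(inner), @"\s+", " ").Trim();

            // Skip paragraphs that are empty after cleanup.
            if (text.Length > 0)
            {
                paragraphs.Add(text);
            }
        }

        return paragraphs;
    }
}
```

With the snippet above you would call `ParagraphExtractor.GetParagraphs(htmlFromPage)` and then join or filter the returned strings as needed.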

Answer 3

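What you're describing is called web scraping, and there are plenty of libraries that do it for both Java and C#. It doesn't really matter whether the target site is static or dynamic, because in the end both output HTML. JavaScript- or Flash-heavy sites, on the other hand, tend to be problematic.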
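
Here is a rough sketch of the fetching half of a scraper, written with `HttpClient`, which recent .NET versions favor over `WebRequest` and `WebClient`. `PageFetcher`, `NormalizeUrl`, and `DownloadHtmlAsync` are made-up names, and defaulting to `http://` when the user types a bare host is just an assumption for the example.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // One shared HttpClient instance is reused for all requests.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: adds "http://" when the input has no scheme, e.g. "anysite.com".
    public static Uri NormalizeUrl(string input)
    {
        string trimmed = input.Trim();
        if (!trimmed.Contains("://"))
        {
            trimmed = "http://" + trimmed;
        }
        return new Uri(trimmed, UriKind.Absolute);
    }

    // Downloads the HTML exactly as the server sends it, for static and dynamic pages alike.
    public static async Task<string> DownloadHtmlAsync(string input)
    {
        using (HttpResponseMessage response = await Client.GetAsync(NormalizeUrl(input)))
        {
            // Throws HttpRequestException for non-success status codes such as 404.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

From an async method you would write `string html = await PageFetcher.DownloadHtmlAsync("anysite.com");`. Anything the page builds in the browser with JavaScript will not be in that string, which is why script-heavy sites cause trouble.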

Answer 4

Try this:

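// The address must include the scheme; a bare "anysite.com" is not treated as a web URL.
using (System.Net.WebClient wc = new System.Net.WebClient())
{
    string webData = wc.DownloadString("http://anysite.com");
}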

4 solutions

Solution 1
1 2013-10-07 17:31:11

Solution 2
0 2013-10-07 17:28:32

Solution 3
0 2013-10-07 17:29:09

Solution 4
0 2013-10-07 17:30:55
