
Reading data from a website using C#

I have a webpage that contains nothing except some strings. No images, no background color or anything, just some plain text that is not very long.

I am just wondering, what is the best (by that, I mean the fastest and most efficient) way to read the string from the webpage so that I can use it for something else (e.g., display it in a text box)? I know of WebClient, but I'm not sure it will do what I want, and I'm reluctant to even try it because the last time I used it, a simple operation took approximately 30 seconds.

Any ideas would be appreciated.

The WebClient class should be more than capable of handling the functionality you describe, for example:

System.Net.WebClient wc = new System.Net.WebClient();
byte[] raw = wc.DownloadData("http://www.yoursite.com/resource/file.htm");

string webData = System.Text.Encoding.UTF8.GetString(raw);

or (following Fredrick's suggestion in the comments):

System.Net.WebClient wc = new System.Net.WebClient();
string webData = wc.DownloadString("http://www.yoursite.com/resource/file.htm");

When you say it took 30 seconds, can you expand on that a little more? There are many reasons why that could have happened: slow servers, slow internet connections, a dodgy implementation, and so on.
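To narrow down where the 30 seconds go, it may help to time the call itself. The following is only a sketch: the `DownloadTimer` class and its `Measure` method are hypothetical names introduced here, and the delegate passed in is whatever download call you want to measure.

```csharp
using System;
using System.Diagnostics;

public static class DownloadTimer
{
    // Runs the supplied download delegate once, returns the elapsed time,
    // and hands the downloaded text back through the out parameter.
    public static TimeSpan Measure(Func<string> download, out string result)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        result = download();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, `DownloadTimer.Measure(() => wc.DownloadString("http://www.yoursite.com/resource/file.htm"), out webData)` times one request. Measuring the same URL twice in a row can be informative: if only the first call is slow, the delay is more likely one-time setup on the client than the server or the network.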

You could go a level lower and implement something like this:

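HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.yoursite.com/resource/file.htm");

// Writing a request body applies only to methods such as POST; calling
// GetRequestStream() on the default GET request throws ProtocolViolationException.
// To send requestData (a string you supply), set the method first:
// webRequest.Method = "POST";
// using (StreamWriter streamWriter = new StreamWriter(webRequest.GetRequestStream(), Encoding.UTF8))
// {
//     streamWriter.Write(requestData);
// }

string responseData = string.Empty;
// Dispose the response as well as the reader so the connection is released.
using (HttpWebResponse httpResponse = (HttpWebResponse)webRequest.GetResponse())
using (StreamReader responseReader = new StreamReader(httpResponse.GetResponseStream()))
{
    responseData = responseReader.ReadToEnd();
}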

However, at the end of the day the WebClient class wraps up this functionality for you. So I would suggest that you use WebClient and investigate the causes of the 30-second delay.

If you're downloading text, then I'd recommend using WebClient and getting a StreamReader over the response:

        WebClient web = new WebClient();
        System.IO.Stream stream = web.OpenRead("http://www.yoursite.com/resource.txt");
        using (System.IO.StreamReader reader = new System.IO.StreamReader(stream))
        {
            String text = reader.ReadToEnd();
        }

If this is taking a long time, then it is probably a network issue or a problem on the web server. Try opening the resource in a browser and see how long that takes. If the webpage is very large, you may want to stream it in chunks rather than reading all the way to the end as in that example. Look at http://msdn.microsoft.com/en-us/library/system.io.stream.read.aspx to see how to read from a stream.
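Reading in chunks could look roughly like the sketch below. The `ChunkedReader` class, its `ReadInChunks` method, and the callback parameter are hypothetical names introduced for illustration. It uses `StreamReader.Read` on a character buffer rather than `Stream.Read` on raw bytes, so a multi-byte character is never split across two chunks.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedReader
{
    // Reads text from the stream in chunks of at most chunkSize characters and
    // passes each chunk to onChunk, so the whole page never has to be held in
    // memory at once. Returns the total number of characters read.
    public static long ReadInChunks(Stream stream, Encoding encoding, int chunkSize, Action<string> onChunk)
    {
        long totalChars = 0;
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            char[] buffer = new char[chunkSize];
            int read;
            // Read returns 0 only at the end of the stream; before that it may
            // return fewer characters than requested.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                onChunk(new string(buffer, 0, read));
                totalChars += read;
            }
        }
        return totalChars;
    }
}
```

With the WebClient example above, the stream returned by `web.OpenRead(...)` would be passed as the first argument. For a page that holds only a short string, `ReadToEnd` is simpler and chunking is unlikely to make a measurable difference.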

Regarding the suggestion "So I would suggest that you use WebClient and investigate the causes of the 30 second delay":

From the answers to the question "System.Net.WebClient unreasonably slow":

Try setting Proxy = null;

WebClient wc = new WebClient();
wc.Proxy = null;

Credit to Alex Burtsev
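Put together, the proxy suggestion might be wrapped up as in the sketch below. `NoProxyDownloader`, `CreateClient`, and `Download` are hypothetical names introduced here. It only helps if the delay really comes from proxy lookup, and it will bypass a proxy that your network actually requires.

```csharp
using System;
using System.Net;

public static class NoProxyDownloader
{
    // Creates a WebClient that does not look up or use any proxy.
    public static WebClient CreateClient()
    {
        WebClient wc = new WebClient();
        wc.Proxy = null; // skip proxy detection for requests made by this client
        return wc;
    }

    // Downloads the resource at url as a string and disposes the client afterwards.
    public static string Download(string url)
    {
        using (WebClient wc = CreateClient())
        {
            return wc.DownloadString(url);
        }
    }
}
```

Calling `NoProxyDownloader.Download("http://www.yoursite.com/resource/file.htm")` then returns the page text, which can be assigned to a text box.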

WebClient client = new WebClient();
// Text is assumed to hold the URL of the page to read.
using (Stream data = client.OpenRead(Text))
{
    using (StreamReader reader = new StreamReader(data))
    {
        string content = reader.ReadToEnd();
        // Collect every URL-like substring found in the downloaded content.
        string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)";
        MatchCollection matches = Regex.Matches(content, pattern);
        List<string> urls = new List<string>();
        foreach (Match match in matches)
        {
            urls.Add(match.Value);
        }
    }
}


 