How can I use C# to download files such as .doc and .pdf from the internet to my hard drive?
using (var client = new System.Net.WebClient())
{
client.DownloadFile( "url", "localFilename");
}
Use the WebClient class:
using(WebClient wc = new WebClient())
wc.DownloadFile("http://a.com/foo.pdf", @"D:\foo.pdf");
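When downloading many files, hard-coding the local path for each one gets tedious. A minimal sketch of deriving the local path from the URL follows; the class name DownloadPaths, the method BuildLocalPath, and the "download" fallback name are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.IO;

public static class DownloadPaths
{
    // Hypothetical helper: maps a download URL to a file path inside targetFolder.
    public static string BuildLocalPath(string targetFolder, string url)
    {
        var uri = new Uri(url);
        // LocalPath excludes the query string, so "?x=1" does not end up in the file name.
        string fileName = Path.GetFileName(uri.LocalPath);
        if (string.IsNullOrEmpty(fileName))
        {
            fileName = "download"; // assumed fallback when the URL has no file name
        }
        return Path.Combine(targetFolder, fileName);
    }
}
```

It could then be used as wc.DownloadFile(url, DownloadPaths.BuildLocalPath(@"D:\", url)); note that two URLs ending in the same file name would map to the same local path.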
Edit based on comments:
Based on your comments, I think what you are trying to do is download files (e.g. PDFs) that are linked to from an HTML page. In that case you can:
Download the page (with WebClient, see above)
Use the HtmlAgilityPack to find all the links within the page that point to pdf files
Download the pdf files
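For the link-filtering step, a plain string check on the href misses links that carry a query string or an upper-case extension. A rough sketch of a stricter check, assuming the hrefs have already been extracted from the page (for example with HtmlAgilityPack); LinkFilter and IsFileLink are hypothetical names, and relative links are simply skipped here rather than resolved against the page URL:

```csharp
using System;

public static class LinkFilter
{
    // Hypothetical helper: true when href is an absolute http(s) URL whose
    // path (ignoring any query string) ends with the given extension.
    public static bool IsFileLink(string href, string extension)
    {
        Uri uri;
        if (!Uri.TryCreate(href, UriKind.Absolute, out uri))
        {
            return false; // null, malformed, or relative href
        }
        bool isWeb = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
        return isWeb && uri.AbsolutePath.EndsWith(extension, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, LinkFilter.IsFileLink("http://a.com/foo.pdf?x=1", ".pdf") passes the check, while a mailto: link does not.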
I am developing a crawler where, if I specify a keyword (e.g. "SHA algorithm") and select the .pdf or .doc option, it should download files of the selected format into a target folder.
Based on your clarification, this is a solution that uses Google to get the search results:
DownloadSearchHits("SHA", "pdf");
...
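// Requires: using System; using System.Linq; using System.Net; using HtmlAgilityPack;
public static void DownloadSearchHits(string searchTerm, string fileType)
{
    using (WebClient wc = new WebClient())
    {
        // Escape the search term so spaces and symbols survive in the query string.
        string html = wc.DownloadString(string.Format("http://www.google.com/search?q={0}+filetype%3A{1}", Uri.EscapeDataString(searchTerm), fileType));
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null when nothing matches, so guard before filtering.
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors == null)
            return;
        // Match the requested file type instead of a hard-coded ".pdf".
        string extension = "." + fileType;
        var fileLinks = anchors
            .Where(link => link.Attributes["href"] != null
                && link.Attributes["href"].Value.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
            .Select(link => link.Attributes["href"].Value)
            .ToList();
        int index = 0;
        foreach (string fileUrl in fileLinks)
        {
            // Files are saved as 0.pdf, 1.pdf, ... in C:\download, which must already exist.
            wc.DownloadFile(fileUrl,
                string.Format(@"C:\download\{0}.{1}",
                    index++,
                    fileType));
        }
    }
}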
In general, though, you should ask a question about a particular problem you have with an implementation that you already have. Based on your question, you are still a long way from being able to implement a standalone crawler.
The simplest way is to use WebClient.DownloadFile.
Use WebClient.DownloadFile() from System.Net.
Using WebClient.DownloadFile
http://msdn.microsoft.com/en-us/library/system.net.webclient.downloadfile.aspx
using (var client = new WebClient())
{
client.DownloadFile(url, filename); // DownloadFile returns void, so there is no result to assign
}