I have a link to a website. How can I download all the files from it?
For example: http://www.test.com. My program is crawling the site, so I want it to download all the files each time.
For example:
using (WebClient Client = new WebClient())
{
    Client.DownloadFile("http://www.abc.com/file/song/a.mpeg", "a.mpeg");
}
This will download only the specific file a.mpeg. I want to do something like:
using (WebClient Client = new WebClient())
{
    Client.DownloadFile(address, "*.*");
}
Since the address changes all the time, I want to download all the files, not just a specific type like mpeg, jpg, or avi... any extension.
Is using "*.*" the right way to do that?
EDIT:
This is how I'm downloading images today:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using HtmlAgilityPack;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Net;
using System.Web;
using System.Threading;
using DannyGeneral;
using GatherLinks;
namespace GatherLinks
{
    class RetrieveWebContent
    {
        HtmlAgilityPack.HtmlDocument doc;
        string imgg;
        int images;

        public RetrieveWebContent()
        {
            images = 0;
        }

        public List<string> retrieveFiles(string address)
        {
        }

        public List<string> retrieveImages(string address)
        {
            System.Net.WebClient wc = new System.Net.WebClient();
            List<string> imgList = new List<string>();
            try
            {
                doc = new HtmlAgilityPack.HtmlDocument();
                doc.Load(wc.OpenRead(address));
                string t = doc.DocumentNode.InnerText;
                HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
                if (imgs == null) return new List<string>();
                foreach (HtmlNode img in imgs)
                {
                    if (img.Attributes["src"] == null)
                        continue;
                    HtmlAttribute src = img.Attributes["src"];
                    imgList.Add(src.Value);
                    if (src.Value.StartsWith("http") || src.Value.StartsWith("https") || src.Value.StartsWith("www"))
                    {
                        images++;
                        string[] arr = src.Value.Split('/');
                        imgg = arr[arr.Length - 1];
                        //imgg = Path.GetFileName(new Uri(src.Value).LocalPath);
                        //wc.DownloadFile(src.Value, @"d:\MyImages\" + imgg);
                        wc.DownloadFile(src.Value, "d:\\MyImages\\" + Guid.NewGuid() + ".jpg");
                    }
                }
                return imgList;
            }
            catch
            {
                Logger.Write("There Was Problem Downloading The Image: " + imgg);
                return null;
            }
        }
    }
}
Now, in this part of the code:
public List<string> retrieveFiles(string address)
{
}
I don't want to download only jpg files, but files of any type. And if the link is, for example, http://tes.com\i.jpg, why do I need to parse the website instead of just saving it somehow?
No, WebClient.DownloadFile will never act like a crawler. You would need to download the page, run a C# HTML parser over the returned HTML, enumerate the resources you are interested in, and download each of them individually.
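A minimal sketch of that approach, reusing the retrieveFiles stub from the question: it loads the page with WebClient, uses HtmlAgilityPack (which the question already references) to collect every img src and a href target, resolves each against the page address, and downloads whatever resolves to an http(s) URL. The save folder d:\MyFiles is a placeholder assumption, as is the choice to fall back to a GUID file name.

```csharp
// Sketch only: downloads every linked resource, whatever its extension.
// Assumes HtmlAgilityPack is referenced; d:\MyFiles is a placeholder path.
public List<string> retrieveFiles(string address)
{
    List<string> fileList = new List<string>();
    using (WebClient wc = new WebClient())
    {
        HtmlAgilityPack.HtmlDocument page = new HtmlAgilityPack.HtmlDocument();
        page.Load(wc.OpenRead(address));

        // Grab both <img src="..."> and <a href="..."> targets.
        HtmlNodeCollection nodes =
            page.DocumentNode.SelectNodes("//img[@src] | //a[@href]");
        if (nodes == null) return fileList;

        foreach (HtmlNode node in nodes)
        {
            string raw = node.GetAttributeValue("src", null)
                         ?? node.GetAttributeValue("href", null);
            if (string.IsNullOrEmpty(raw)) continue;

            // Resolve relative links against the page address.
            Uri target;
            if (!Uri.TryCreate(new Uri(address), raw, out target)) continue;
            if (target.Scheme != "http" && target.Scheme != "https") continue;

            // Keep the original file name; fall back to a GUID when the
            // URL has no usable name (e.g. it ends in "/").
            string name = Path.GetFileName(target.LocalPath);
            if (string.IsNullOrEmpty(name)) name = Guid.NewGuid().ToString();

            wc.DownloadFile(target, Path.Combine(@"d:\MyFiles", name));
            fileList.Add(target.ToString());
        }
    }
    return fileList;
}
```

Note this only downloads resources linked from the one page you pass in; turning it into a real crawler means feeding the collected href values back in and tracking visited URLs to avoid loops.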