
How to get all contents of a website, not only a webpage, in C#

How can I extract all contents of a website, not only a single webpage? If we consider a website named www.abc.com , how can we get the contents of every page of this site? I have tested the code below, but it only gets the contents of a single page, using C#.

        // Fetch a single page and print its HTML to the console.
        string urlAddress = "https://www.motionflix.xyz/";

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode == HttpStatusCode.OK)
            {
                Stream receiveStream = response.GetResponseStream();
                StreamReader readStream;

                // Honor the character set declared by the server, if any.
                if (String.IsNullOrWhiteSpace(response.CharacterSet))
                    readStream = new StreamReader(receiveStream);
                else
                    readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));

                string data = readStream.ReadToEnd();
                Console.WriteLine(data);
                readStream.Close();
            }
        }

When you load that page in a browser, it only gets (server-side browser switching aside) what you get with your request. What the browser then does, and what you need to do in your code, is parse this content: it contains references (e.g. via <script> , <img> , <link> , <iframe> and other tags) that give the URLs of the other resources to load.
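As a minimal sketch of that parsing step, the snippet below pulls `href` and `src` attribute values out of raw HTML with a regular expression. A regex is fragile on real-world markup, and a dedicated HTML parser (such as the HtmlAgilityPack library) is more robust, but it illustrates the idea; the sample HTML string is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class LinkExtractor
{
    // Extract the values of href/src attributes from an HTML string.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        var pattern = new Regex("(?:href|src)\\s*=\\s*[\"']([^\"']+)[\"']",
                                RegexOptions.IgnoreCase);
        foreach (Match m in pattern.Matches(html))
            urls.Add(m.Groups[1].Value);
        return urls;
    }

    static void Main()
    {
        string html = "<a href=\"/about\">About</a><img src=\"logo.png\">";
        foreach (string u in ExtractUrls(html))
            Console.WriteLine(u);   // prints "/about" then "logo.png"
    }
}
```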

It might be easier to use a prebuilt tool such as wget, if it does what you need, or to use browser automation.

If you want to download a complete website including all of its contents, you can use a tool called HTTrack. HTTrack allows users to download World Wide Web sites from the Internet to a local computer. Here is the link you can follow: https://www.httrack.com/page/2/en/index.html

  1. Create a list containing all the URLs that have already been scraped.
  2. Create a loop that starts with a given URL: add it to the list, scrape the content of that page, and search it for href attributes (= new URLs). If a new URL is not in the list already, repeat step 2 with it. Continue as long as there are new URLs that have not been scraped yet.

Note that you may want to check whether a URL is still on the same domain, otherwise you might accidentally scan the whole internet.
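The two steps above, including the same-domain check, can be sketched as the loop below. The page-fetching function is injected so the traversal logic is shown on its own; in a real crawler it would wrap an HTTP call (e.g. `HttpClient.GetStringAsync`), and the example URLs and in-memory "site" are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Crawler
{
    // Breadth-first crawl sketch: visits each URL once, collects links
    // from its HTML, and skips URLs that leave the starting domain.
    public static List<string> Crawl(string start, string domain,
                                     Func<string, string> fetch)
    {
        var visited = new List<string>();   // step 1: URLs already scraped
        var queue = new Queue<string>();
        queue.Enqueue(start);

        var linkPattern = new Regex("href\\s*=\\s*[\"']([^\"']+)[\"']",
                                    RegexOptions.IgnoreCase);

        while (queue.Count > 0)             // step 2: process new URLs
        {
            string url = queue.Dequeue();
            if (visited.Contains(url)) continue;
            if (!url.StartsWith(domain)) continue;   // stay on the same domain
            visited.Add(url);

            string html = fetch(url);
            foreach (Match m in linkPattern.Matches(html))
                queue.Enqueue(m.Groups[1].Value);
        }
        return visited;
    }

    static void Main()
    {
        // Tiny in-memory "site" standing in for real HTTP responses.
        var pages = new Dictionary<string, string>
        {
            ["https://example.com/"]  = "<a href=\"https://example.com/a\">a</a>",
            ["https://example.com/a"] = "<a href=\"https://example.com/\">home</a>" +
                                        "<a href=\"https://other.com/\">off-site</a>"
        };
        var order = Crawl("https://example.com/", "https://example.com",
                          u => pages.ContainsKey(u) ? pages[u] : "");
        foreach (string u in order)
            Console.WriteLine(u);   // the off-site link is never visited
    }
}
```

For a large site, the `List` of visited URLs should be a `HashSet<string>` so the membership check stays fast, and relative links would need to be resolved against the page's base URL (e.g. with the `Uri` class) before queueing.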

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

粤ICP备18138465号  © 2020-2024 STACKOOM.COM