System.NullReferenceException in LINQ

Building on a previously written code snippet, I'm now trying to download multiple images at once from a certain subreddit into a local directory. My problem is that I can't get my LINQ statement to work properly. I also don't want to download the thumbnail pictures, which is why I looked at the HTML page and found that the links I want to retrieve sit at level 5, in the href attribute:

(...)
Level 1: <div class="content">...</div>
    Level 2: <div class="spacer">...</div>
        Level 3: <div class="siteTable">...</div>
            Level 4: <div class=" thing id-t3_6dj7qp odd  link ">...</div>                      
                Level 5: <a class="thumbnail may-blank outbound" href="http://i.imgur.com/jZ2ZAyk.jpg">...</a>

This was my best attempt for the line marked '???':

.Where(link => Directory.GetParent(link).Equals(@"http://i.imgur.com"))

Unfortunately, it throws an error stating:

 Object reference not set to an instance of an object

Well, now I know why it's not working, but I still have no clue how to rewrite this line, since I'm still fairly new to lambda expressions. To be honest, I don't really know why I got a System.NullReferenceException in that line but not in the next one. What's the difference? Maybe my approach to this problem isn't good practice at all, so please let me know how I could proceed.

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Net;
using HtmlAgilityPack;

namespace GetAllImages
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> imageLinks = new List<string>();

            // Specify Directory manually
            string dirName = "Jessica Clements";
            string rootPath = @"C:\Users\Stefan\Desktop";
            string dirPath = Path.Combine(rootPath, dirName);

            // Specify the subReddit manually
            string subReddit = "r/Jessica_Clements";
            string url = @"https://www.reddit.com/" + subReddit;

            try
            {
                DirectoryInfo imageFolder = Directory.CreateDirectory(dirPath);                

                HtmlDocument document = new HtmlWeb().Load(url);
                imageLinks = document.DocumentNode.Descendants("a")
                            .Select(element => element.GetAttributeValue("href", null))
                            .Where(???) 
                            .Where(stringLink => !String.IsNullOrEmpty(stringLink))
                            .ToList();

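                foreach (string link in imageLinks)
                {
                    using (WebClient _wc = new WebClient())
                    {
                        // DownloadFile blocks until the file is written; DownloadFileAsync
                        // would return immediately and the client would be disposed mid-download.
                        _wc.DownloadFile(new Uri(link), Path.Combine(dirPath, Path.GetFileName(link)));
                    }
                }

                Console.WriteLine($"Files successfully saved in '{Path.GetFileName(dirPath)}'.");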

            }

            catch(Exception e)
            {
                while(e != null)
                {
                    Console.WriteLine(e.Message);
                    e = e.InnerException;
                }
            }

            if(System.Diagnostics.Debugger.IsAttached)
            {
                Console.WriteLine("Press any key to continue . . .");
                Console.ReadKey(true);
            }
        }
    }
}

Edit: In case anyone is interested, this is how I got it working in the end, using the answers below:

HtmlDocument document = new HtmlWeb().Load(url);
imageLinks = document.DocumentNode.Descendants("a")
            .Select(element => element.GetAttributeValue("href", null))
            .Where(link => (link?.Contains(@"http://i.imgur.com") == true))
            .Distinct()
            .ToList();

Given that this line throws the exception:

.Where(link => Directory.GetParent(link).Equals(@"http://i.imgur.com"))

I'd make sure that link is not null and that the result of GetParent(link) is not null either. So you could do:

.Where(link => link != null && (Directory.GetParent(link)?.Equals(@"http://i.imgur.com") ?? false))

Notice the null check and the ?. after GetParent(). The ?. stops evaluation of the expression if GetParent() returns null. It is called the null-conditional operator, or "Elvis operator", because it can be seen as two eyes with twirly hair. The ?? false supplies the default value in case evaluation was stopped by a null value.
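
One caveat about the filter above: Directory.GetParent returns a DirectoryInfo, and comparing that object to a string with Equals is never true, so the line would no longer throw but would also match nothing. A minimal sketch of a host check built on System.Uri instead, assuming the goal is to keep only absolute links hosted on i.imgur.com. The IsImgurLink helper and the sample hrefs are hypothetical and not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinkFilter
{
    // Hypothetical helper: true only for absolute links whose host is i.imgur.com.
    public static bool IsImgurLink(string link)
    {
        Uri uri;
        // Uri.TryCreate returns false for null or malformed input instead of throwing.
        return Uri.TryCreate(link, UriKind.Absolute, out uri)
            && uri.Host.Equals("i.imgur.com", StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // Sample values standing in for href attributes; null models an <a> without href.
        var hrefs = new List<string>
        {
            "http://i.imgur.com/jZ2ZAyk.jpg",
            null,
            "https://www.reddit.com/"
        };

        List<string> imageLinks = hrefs.Where(IsImgurLink).ToList();
        Console.WriteLine(string.Join(Environment.NewLine, imageLinks));
    }
}
```

In the question's query this predicate could take the place of the '???' line, for example .Where(LinkFilter.IsImgurLink), and the separate IsNullOrEmpty filter would then be redundant.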

However, if you plan to parse HTML code you should definitely have a look at the Html Agility Pack (HAP).

If you are trying to get all links pointing to http://i.imgur.com, you need something like this:

    imageLinks = document.DocumentNode.Descendants("a")
                .Select(element => element.GetAttributeValue("href", null))
                .Where(link => link?.Contains(@"http://i.imgur.com") == true)
                .ToList();
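
The comparison with == true is what makes this safe: link?.Contains(...) yields a bool? that is null whenever link is null, and null == true evaluates to false, so anchors without an href are dropped rather than causing an exception. A small sketch of that behaviour, using a hypothetical PointsToImgur helper and sample values:

```csharp
using System;

static class NullConditionalDemo
{
    // Hypothetical helper mirroring the predicate used in the Where clause above.
    public static bool PointsToImgur(string link)
    {
        // The result is null when link is null, because ?. skips the call to Contains.
        bool? contains = link?.Contains("http://i.imgur.com");
        // Lifted comparison: null == true is false, so null links are rejected.
        return contains == true;
    }

    static void Main()
    {
        Console.WriteLine(PointsToImgur("http://i.imgur.com/jZ2ZAyk.jpg")); // True
        Console.WriteLine(PointsToImgur("https://www.reddit.com/"));        // False
        Console.WriteLine(PointsToImgur(null));                             // False
    }
}
```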
