
C# HtmlAgilityPack - GetAttributeValue returning false bool

I'm having trouble with the HtmlAgilityPack and the GetAttributeValue method.

In the code below, I expect the GetAttributeValue test for "href" to fail only on the HTML element that lacks the attribute. Instead, it returns false for every element.

using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var html = @"<!DOCTYPE html>
        <html>
        <body>
            <a href=""http://www.google.com"" title=""Google"" />
            <a id=""someotherlink"" title=""Some Other Title"" />
        </body>
        </html> ";
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);
        var node = htmlDoc.DocumentNode.SelectNodes("//a");
        foreach (var link in node)
        {
            if (link.HasAttributes)
            {
                Console.WriteLine(link.OuterHtml);
                if (link.GetAttributeValue("href", false))
                {
                    Console.WriteLine("\t" + link.Attributes["href"].Value);
                }
                else
                {
                    Console.WriteLine("\tThis link don't have a href dude");
                }
            }
        }
    }
}

The documentation states that GetAttributeValue should return the default value (false here) only when the attribute is not found. Strangely enough, if I use the (string, string) signature, it works fine.

Documentation: https://docs.workflowgen.com/wfgmy/v400/html/211ece6d-1ae3-7c29-b86f-e908e4766d4c.htm

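It's because the value of the attribute is a string, not a boolean. HtmlAgilityPack cannot convert that string to a bool, so it falls back to the default value you passed in, which is false.

By the way, you can use a LINQ expression instead of GetAttributeValue (this needs using System.Linq; at the top of the file):

if (link.Attributes.Any(x => x.Name.Equals("href")))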

HtmlAgilityPack has four overloads of GetAttributeValue:

  1. public string GetAttributeValue(string name, string def)
  2. public int GetAttributeValue(string name, int def)
  3. public bool GetAttributeValue(string name, bool def)

Added in 2020:

  4. public T GetAttributeValue<T>(string name, T def)

In the third case (your question was asked in 2019, at that time there was no fourth option), the method converts the attribute value to boolean:

    try
    {
        return Convert.ToBoolean(att.Value);
    }
    catch
    {
        return def;
    }

Convert.ToBoolean converts a string to a Boolean only if the string is True or False (it ignores case as well as leading and trailing white space).

If the string contains any other text, an exception is thrown and GetAttributeValue returns the default value def.

For <a href="http://www.google.com" title="Google" />, the method tries to convert the string http://www.google.com to a Boolean, an exception is thrown, and the method returns def, which is false.

For <a id="someotherlink" title="Some Other Title" /> (and any other HTML element without href), the method returns def, which is again false.
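The two cases above can be reproduced without HtmlAgilityPack at all. The following is a minimal sketch, not the library's code: BoolOrDefault is a hypothetical helper that mirrors the try/catch quoted earlier, and a null argument is assumed to stand in for a missing attribute.

```csharp
using System;

public static class BoolFallbackDemo
{
    // Hypothetical helper (not part of HtmlAgilityPack): mimics the bool
    // overload's behavior. A null attributeValue stands in for "attribute missing".
    public static bool BoolOrDefault(string attributeValue, bool def)
    {
        if (attributeValue == null)
        {
            return def; // attribute not present: default is returned
        }

        try
        {
            // Succeeds only for "True"/"False" (any case, surrounding white space allowed).
            return Convert.ToBoolean(attributeValue);
        }
        catch (FormatException)
        {
            return def; // any other text: default is returned as well
        }
    }

    public static void Main()
    {
        Console.WriteLine(BoolOrDefault("http://www.google.com", false)); // False
        Console.WriteLine(BoolOrDefault(null, false));                    // False
        Console.WriteLine(BoolOrDefault(" TRUE ", false));                // True
    }
}
```

With false as the default, a URL-valued href and a missing href are indistinguishable, which is exactly what the question observes.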

In the first case (the string overload), the method simply returns the attribute value:

return att.Value;

That is why the overload with the string signature works fine for you.
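As a sketch of how the check from the question could be rewritten with the string overload: CollectHrefs is a hypothetical helper name, and using null as the "not found" sentinel is an assumption made for this example rather than something the library requires.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HrefCheckDemo
{
    // Hypothetical helper: returns one entry per <a> element, holding the
    // href value, or null when the element has no href attribute.
    public static List<string> CollectHrefs(string html)
    {
        var results = new List<string>();
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = htmlDoc.DocumentNode.SelectNodes("//a");
        if (links == null)
        {
            return results;
        }

        foreach (var link in links)
        {
            // The cast selects the (string, string) overload explicitly;
            // the null default comes back only when href is absent.
            string href = link.GetAttributeValue("href", (string)null);
            results.Add(href);
        }

        return results;
    }
}
```

A caller can then test each entry against null to decide whether to print the URL or the "no href" message.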
