Parse attribute value from HTML

I want to get data from a webpage:

<embed type="application/x-vlc-plugin" 
    name="vlc" ratio="16:9" 
    autoplay="yes" loop="no" 
    rtsp-caching="3000" volume="100" 
    width="670" height="380" 
    target="http://185.2.42.106/stream/?d=1402956552&token=5119675517383">

I want to get the value of the target attribute. I searched the whole internet and didn't find anything that I understood. Could someone show me the trick and explain it?

Ok, so I got it working with this code:

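        var webGet = new HtmlWeb();
        var doc = webGet.Load("http://www.spusti.net/play-vlc-185");
        // SelectSingleNode returns null when the page contains no <embed> element
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//embed");
        if (node != null && node.Attributes["target"] != null)
        {
            MessageBox.Show(node.Attributes["target"].Value);
        }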

Don't ask me how, but I did it. Thanks for your time CopyPaste!

The point discussed in the link you commented on is that this library works on nodes, not on attributes directly. You use XPath to point at what you want to select, and although XPath itself supports selecting attributes, Html Agility Pack does not; it only supports selecting nodes.

That said, attributes are part of nodes, so this should be no problem for you at all! You can easily get attribute values from the nodes that you select in Html Agility Pack.

Since you asked for some more clarification in your comment, I hope I can clear up at least some of your troubles:

First, Html Agility Pack works on nodes; a node is what an HTML tag (like <embed>) gets transformed into.

The value you want to get is an attribute on that node, so you need to select that node first.

To select the node you need to identify it. The easiest way to do that (if you have control over the HTML source) is to assign an id to it, like <embed id='someId' someAttribute='someValue' />.

Now you're almost done: you just need the XPath expression that tells the parser where to look for your node. If you cannot assign an id to that node, you will have to find it by document structure (for example: the embed node inside the div node inside body). In that case you may get a collection of nodes rather than a single one, so you will need to use SelectNodes(), iterate over the result, and decide in each iteration whether you have the right one.
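
As a rough sketch of the "find it by document structure" case: the helper below walks every embed node under body and returns the target of the first one whose type attribute matches. The names EmbedFinder and FindTarget are made up for this illustration, and the XPath //body//embed is only an assumption about how the page is laid out.

```csharp
using System;
using HtmlAgilityPack;

static class EmbedFinder
{
    // Hypothetical helper: returns the "target" value of the first <embed>
    // whose "type" attribute equals wantedType, or null if none is found.
    public static string FindTarget(string html, string wantedType)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//body//embed");
        if (nodes == null)
        {
            return null;
        }

        foreach (HtmlNode node in nodes)
        {
            // Decide in each iteration whether this is the node we want.
            if (node.GetAttributeValue("type", "") != wantedType)
            {
                continue;
            }

            // The attribute indexer returns null when the attribute is missing.
            HtmlAttribute target = node.Attributes["target"];
            if (target != null)
            {
                return target.Value;
            }
        }

        return null;
    }
}
```

You would call it as EmbedFinder.FindTarget(html, "application/x-vlc-plugin") and treat a null result as "no matching node on this page".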

TL;DR

The following example sets up a simple HTML document with two nodes of the same tag type but different ids. The code selects just one of them and prints the value of its someAttribute attribute (note that in your production code you should check that the attribute exists before using it ;)):

// To run this, start a new console project in Visual Studio
// and paste this code into the Main method of Program.cs.
// Open the NuGet Package Manager Console, type "Install-Package HtmlAgilityPack"
// and hit Enter.
// Add "using System;", "using System.IO;", "using System.Text;"
// and "using HtmlAgilityPack;" at the top of the file.
HtmlDocument doc = new HtmlDocument();
string someStupidHtml = @"
<html>
    <head>
        <title>jusATest</title>
    </head>
    <body>
        <embed id='someId' someAttribute='someValue' />
        <embed id='anotherId' someAttribute='anotherValue' />
    </body>
</html>";

byte[] byteArray = Encoding.UTF8.GetBytes(someStupidHtml);
MemoryStream stream = new MemoryStream(byteArray);
doc.Load(stream);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//embed[@id='anotherId']");

Console.WriteLine("it's a node already with someAttribute={0}", node.Attributes["someAttribute"].Value);

Console.Read();
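
Applied to the markup from the question, where the embed has no id, a minimal sketch could filter on another attribute in the XPath predicate and use GetAttributeValue, which returns a default instead of throwing when the attribute is missing. VlcEmbed and GetTarget are names invented for this illustration, and filtering on type='application/x-vlc-plugin' is an assumption about what identifies the right node on that page.

```csharp
using System;
using HtmlAgilityPack;

static class VlcEmbed
{
    // Hypothetical helper: returns the "target" attribute of the VLC <embed>,
    // or an empty string if the node or the attribute is not present.
    public static string GetTarget(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The predicate matches on the type attribute instead of an id.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//embed[@type='application/x-vlc-plugin']");

        // SelectSingleNode returns null when there is no match.
        if (node == null)
        {
            return "";
        }

        // The second argument is the fallback used when "target" is absent.
        return node.GetAttributeValue("target", "");
    }
}
```

With HtmlWeb you could pass the downloaded markup in as VlcEmbed.GetTarget(doc.DocumentNode.OuterHtml), or move the two lookups into your own code right after webGet.Load(...). An empty result then means the page did not contain the expected embed.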
