
Parse attribute value from HTML

I want to get data from a webpage:

<embed type="application/x-vlc-plugin" 
    name="vlc" ratio="16:9" 
    autoplay="yes" loop="no" 
    rtsp-caching="3000" volume="100" 
    width="670" height="380" 
    target="http://185.2.42.106/stream/?d=1402956552&token=5119675517383">

I want to get the value of the target attribute. I searched the whole internet and didn't find anything that I understood. Could someone show me the trick and explain it?

OK, so I got it working with this code:

        var webGet = new HtmlWeb();
        var doc = webGet.Load("http://www.spusti.net/play-vlc-185");
        HtmlNode node = doc.DocumentNode.SelectNodes("//embed")[0];
        var val = node.Attributes["target"].Value; //10743
        MessageBox.Show(val);

Don't ask me how, but I did it. Thanks for your time, CopyPaste!

The point discussed in the link you commented on is that in this library you work on nodes, not on attributes directly. (You use XPath to point at what you want to select in that library. Although XPath itself supports selecting attributes, Html Agility Pack does not; it only supports selecting nodes.)

That said, attributes are part of nodes, so that should be no problem for you at all! You can easily read attribute values from the nodes that you select in Html Agility Pack.
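To make the "select the node, then read the attribute" idea concrete, here is a minimal sketch applied to the embed element from the question. The class name TargetReader and the helper ReadTarget are made up for illustration, and the markup is inlined as a string (an assumption, so the sketch runs without network access); it relies only on LoadHtml, SelectSingleNode and GetAttributeValue from Html Agility Pack.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, not part of Html Agility Pack.
public static class TargetReader
{
    // Returns the target attribute of the first <embed>, "" if the attribute
    // is absent, or null if the document contains no <embed> at all.
    public static string ReadTarget(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Step 1: select the node with XPath.
        HtmlNode embed = doc.DocumentNode.SelectSingleNode("//embed");
        if (embed == null) return null;

        // Step 2: read the attribute off the selected node.
        return embed.GetAttributeValue("target", "");
    }

    public static void Main()
    {
        // Shortened copy of the markup from the question.
        string html = @"<embed type=""application/x-vlc-plugin""
            name=""vlc"" ratio=""16:9""
            target=""http://185.2.42.106/stream/?d=1402956552&token=5119675517383"">";

        Console.WriteLine(ReadTarget(html));
    }
}
```

For a live page you would load the document with HtmlWeb.Load as in the question instead of LoadHtml; the two steps after that stay the same.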

Since you asked for some more clarification in your comment, I hope I can clear up at least some of your troubles:

First, Html Agility Pack works on nodes; a node is what an HTML tag (like <embed>) gets transformed into.

The value you want to get is an attribute on that node, so you need to select that node first.

To be able to select the node, you need to identify it. The easiest way to do that (if you have control over the HTML source) is to assign an id to it (like <embed id='someId' someAttribute='someValue' />).

Now you're almost done: you just need the XPath expression that tells the parser where to look for your node. If you cannot assign an id to that node, you will have to find it by document structure (for example: the embed node inside the div node inside body). In that case you may get a collection of nodes rather than a single one, so you will need to use SelectNodes(), iterate over the results, and decide in each iteration whether you have found the right one.
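The "find it by document structure and iterate" case could look roughly like the sketch below. The XPath //body/div/embed, the sample markup, the type values and the names EmbedFinder and FindEmbedByType are all assumptions made for illustration; adapt them to the real page. Note the null check: with default settings SelectNodes may return null rather than an empty collection when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, not part of Html Agility Pack.
public static class EmbedFinder
{
    // Returns the first <embed> under body/div whose type attribute equals
    // wantedType, or null if there is no such node.
    public static HtmlNode FindEmbedByType(HtmlDocument doc, string wantedType)
    {
        HtmlNodeCollection candidates = doc.DocumentNode.SelectNodes("//body/div/embed");
        if (candidates == null) return null; // nothing matched the XPath

        foreach (HtmlNode candidate in candidates)
        {
            // Decide in each iteration whether this is the node we want.
            if (candidate.GetAttributeValue("type", "") == wantedType)
                return candidate;
        }
        return null;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<html><body><div>
            <embed type='application/x-shockwave-flash' target='first' />
            <embed type='application/x-vlc-plugin' target='second' />
        </div></body></html>");

        HtmlNode match = FindEmbedByType(doc, "application/x-vlc-plugin");
        Console.WriteLine(match == null ? "no match" : match.GetAttributeValue("target", ""));
    }
}
```

Any distinguishing property works as the test inside the loop (an attribute value, the inner text, the position); the type attribute is just the one that happens to be unique in this sample.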

TL;DR

The following example sets up a simple HTML document with two nodes of the same tag type but different ids. The code selects just one of them and outputs the value of its someAttribute attribute (note that in your production code you should check that the attribute exists before using it ;)):

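// To run this, start a new console project in Visual Studio
// and paste this code into the Main method of Program.cs.
// Open the NuGet Package Manager Console, type "Install-Package HtmlAgilityPack"
// and hit Enter.
// Add "using HtmlAgilityPack;" at the top of the file, plus "using System.IO;"
// and "using System.Text;" for MemoryStream and Encoding.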
HtmlDocument doc = new HtmlDocument();
string someStupidHtml = @"
<html>
    <head>
        <title>jusATest</title>
    </head>
    <body>
        <embed id='someId' someAttribute='someValue' />
        <embed id='anotherId' someAttribute='anotherValue' />
    </body>
</html>";

byte[] byteArray = Encoding.UTF8.GetBytes(someStupidHtml);
MemoryStream stream = new MemoryStream(byteArray);
doc.Load(stream);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//embed[@id='anotherId']");

Console.WriteLine("its a node already with someAttribute={0}", node.Attributes["someAttribute"].Value);

Console.Read();
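The check mentioned above (make sure the node and the attribute exist before using them) might be sketched like this. SafeAttributeReader and TryGetAttribute are hypothetical names; the sketch relies on SelectSingleNode returning null when no node matches and on the Attributes indexer returning null for a missing attribute.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, not part of Html Agility Pack.
public static class SafeAttributeReader
{
    // Returns the attribute value, or null when either the node
    // or the attribute is missing.
    public static string TryGetAttribute(HtmlDocument doc, string xpath, string attributeName)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null) return null; // no node matched the XPath

        HtmlAttribute attribute = node.Attributes[attributeName];
        return attribute == null ? null : attribute.Value; // null if the attribute is missing
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><embed id='anotherId' someAttribute='anotherValue' /></body></html>");

        // Existing node and attribute, then a missing attribute, then a missing node.
        Console.WriteLine(TryGetAttribute(doc, "//embed[@id='anotherId']", "someAttribute") ?? "(missing)");
        Console.WriteLine(TryGetAttribute(doc, "//embed[@id='anotherId']", "target") ?? "(missing)");
        Console.WriteLine(TryGetAttribute(doc, "//embed[@id='noSuchId']", "someAttribute") ?? "(missing)");
    }
}
```

If a default value is good enough, node.GetAttributeValue(name, defaultValue) covers the missing-attribute case in one call, but you still need the null check on the node itself.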

