Extract data inside a tag

I have read some posts on this topic and tried to implement the answers, but I don't get the output I want.

This is the HTML code:

<div class="span-8">
  <dl>
    <dt>
      <a title="A Coruña" href="http://www.paginasamarillas.es/all_a-coru%C3%B1a_.html"> A Coruña</a>
    </dt>
    <dt>
      <a title="Álava" href="http://www.paginasamarillas.es/all_alava_.html"> Álava</a>
    </dt>
    <dt>
      <a title="Albacete" href="http://www.paginasamarillas.es/all_albacete_.html"> Albacete</a>
    </dt>
    <dt>
      <a title="Alicante" href="http://www.paginasamarillas.es/all_alicante_.html"> Alicante</a>
    </dt>
...
...

I want to get "Barcelona", "Alicante", "Albacete", etc. So I tried the following code:

var nodos = doc.DocumentNode.SelectNodes("//div[@class='container']");

and

var nodos = doc.DocumentNode.SelectNodes("//a[@title]");

or

var nodos = doc.DocumentNode.SelectNodes("//div[@class='span-8']");

But it doesn't work: it's as if the class "container", the attribute "title", or the class "span-8" didn't exist in the page. I also tried other variants. The page contains other `div` elements with the class `container` and other `a` elements with a `title` attribute, and those are extracted fine, but they are not the ones I want.

EDIT

Sorry, I explained it wrong. It's not a single word but a group of data. I have updated the HTML code above.

I have tested your sample HTML and it works:

using System.Collections.Generic;
using System.Linq;

string html = @"<div class=""container"">
  <div class=""span-24"">
    <div class=""span-8"">
      <dl>
        <dt>
          <a title=""A Coruña"" href=""http://www.example.com/all_example.html""> Barcelona</a>
        </dt>
      </dl>
    </div>
  </div>
</div>";

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var div = doc.DocumentNode.SelectSingleNode("//div[@class='span-8']");
// SelectSingleNode returns null when no element matches the XPath.
if (div != null)
{
    List<string> linkTexts = div.Descendants("a")
            .Select(a => a.InnerText)
            .ToList();  // one item " Barcelona"
}
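A slightly more defensive variant of the code above, offered as a sketch rather than a definitive fix. Assumptions: the HtmlAgilityPack NuGet package is referenced, and `CityExtractor`, `ExtractNames`, `ExtractTitles` and `SelectLinks` are hypothetical names. Two differences from the answer's code: `@class='span-8'` only matches when the attribute is exactly `span-8`, so the sketch uses the standard XPath 1.0 token test `contains(concat(' ', normalize-space(@class), ' '), ' span-8 ')`, which also matches e.g. `class="span-8 last"`; and it trims the leading space that the link texts carry (`" Albacete"`). It also shows reading the `title` attribute instead of the link text.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class CityExtractor // hypothetical helper name
{
    // Every <a> below a <div> whose class list contains the token "span-8",
    // even when the div carries other classes as well.
    private const string LinksXPath =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' span-8 ')]//a";

    private static IEnumerable<HtmlNode> SelectLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(LinksXPath);
        return links ?? Enumerable.Empty<HtmlNode>();
    }

    // Visible link texts, decoded and trimmed: " Albacete" -> "Albacete".
    public static List<string> ExtractNames(string html)
    {
        return SelectLinks(html)
            .Select(a => HtmlEntity.DeEntitize(a.InnerText).Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }

    // Alternative: the value of each link's title attribute.
    public static List<string> ExtractTitles(string html)
    {
        return SelectLinks(html)
            .Select(a => HtmlEntity.DeEntitize(a.GetAttributeValue("title", "")))
            .Where(title => title.Length > 0)
            .ToList();
    }
}
```

For the HTML in the question, `ExtractNames` should yield "A Coruña", "Álava", "Albacete", "Alicante", and so on, while links outside the `span-8` div (such as the other `a[@title]` elements the question mentions) are not selected.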
