How to get values from a website using HtmlNode and HtmlAgilityPack in C#

I'm trying to get data from this website.

I want to get the Level, Vocation and Name of each character from the table. They are located directly in tr class -> td. How can I get this information out? This is what the data looks like:

<table width="100%" class="tabi">
  <tr>
    <td colspan=7>
      Characters
    </td>
  </tr>

  <tr>
    <td height='30' style='background-color:#9f8f6d;'>
      <a href=?page=whoisonline&ord=name&sort=DESC&id=1>&#8593;Name</a>
    </td>
    <td width='240' style='background-color:#9f8f6d;'>
      <a href=?page=whoisonline&ord=voc&sort=DESC&id=1>Vocation</a>
    </td>
    <td width='120' style='background-color:#9f8f6d;'>
      <a href=?page=whoisonline&ord=lvl&sort=DESC&id=1>Level</a>
    </td>
  </tr>

  <tr class='hover'> 
    <td>
      <a href='?page=character&name=Abe' class='menulink_hs'>Abe</a>
    </td>
    <td>
      Elder Druid
    </td>
    <td>
      19
    </td>
  </tr>

Right now I'm stuck on getting this data out of the tds using nodes, with bad results. My htmlNodes is either null or it contains more than one node (which I can't actually get out of it for some reason). What might be a good solution to this?

My code:

var html = @"https://tibiantis.online/?page=whoisonline";
HtmlWeb web = new HtmlWeb();
var htmlDoc = web.Load(html);

HtmlNode htmlNodes = htmlDoc.DocumentNode.SelectSingleNode("/html/body/div[2]/table/tbody/tr[1]/td[3]/div[2]/div[2]/table/tbody/tr[3]");
foreach (var node in htmlNodes)
{
    foreach (var cell in htmlNodes.SelectNodes(".//td"))
    {
        listBox1.Items.Add(cell.InnerText);
    }
}

I'm stuck on this SelectNodes call, which no matter what I do gives me either null or too many nodes. I tried many combinations with both SelectSingleNode and SelectNodes.

The second thing is that I have no clue how to get the number of items that I will receive.

I looked for an answer on Stack Overflow and Google and found some results, but none of them was close to my situation.

Try this:

using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public string Vocation { get; set; }
    public int Level { get; set; }

    public static List<Person> LoadPersons(HtmlAgilityPack.HtmlDocument doc)
    {
        var persons = new List<Person>();

        var rowsNodes = doc.DocumentNode.SelectNodes("//table//tr[contains(@class, 'hover')]");
        if (rowsNodes == null)
        {
            return persons;
        }

        foreach (var rowNode in rowsNodes)
        {
            var cells = rowNode.SelectNodes(".//td");
            if (cells != null && cells.Count >= 3)
            {
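                // InnerText keeps the whitespace and HTML entities found inside the cell, so decode and trim it.
                var name = HtmlAgilityPack.HtmlEntity.DeEntitize(cells[0].InnerText).Trim();
                var vocation = HtmlAgilityPack.HtmlEntity.DeEntitize(cells[1].InnerText).Trim();
                var levelText = cells[2].InnerText.Trim();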

                if (int.TryParse(levelText, out int level))
                {
                    persons.Add(new Person
                    {
                        Name = name,
                        Vocation = vocation,
                        Level = level
                    });
                }
            }
        }

        return persons;
    }
}

This class represents a person (one row of the table) and includes a method to scrape the table. When scraping, try to keep your queries fairly general: an XPath that spells out every tag along the way fails as soon as the HTML changes slightly.

I simply search the whole document (//) for a table and then, anywhere inside that table (// again, because browsers may add a tbody automatically while the raw HTML may not contain one), select all rows (tr) with the "hover" class (your persons).
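To see why the general query is more robust, here is a minimal sketch that runs both styles of XPath against a small inline table. The sample HTML, the class name XPathDemo and the helper CountMatches are made up for illustration (the "Bob" row is invented data), and the absolute path is shortened to fit the sample. It assumes the HtmlAgilityPack NuGet package is referenced and that, as in the raw page source, the table has no tbody.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class XPathDemo
{
    // Hand-written stand-in for the page; note that it contains no <tbody>.
    public const string SampleHtml = @"<html><body>
<table class='tabi'>
  <tr><td colspan='7'>Characters</td></tr>
  <tr class='hover'><td>Abe</td><td>Elder Druid</td><td>19</td></tr>
  <tr class='hover'><td>Bob</td><td>Knight</td><td>42</td></tr>
</table>
</body></html>";

    // Counts the nodes matched by an XPath expression.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Absolute path copied in the style of browser dev tools: it expects a tbody.
        Console.WriteLine(CountMatches(SampleHtml, "/html/body/table/tbody/tr"));              // expected: 0
        // General path: any table, any depth, rows whose class contains "hover".
        Console.WriteLine(CountMatches(SampleHtml, "//table//tr[contains(@class, 'hover')]")); // expected: 2
    }
}
```

The absolute path fails only because of the tbody step that the browser inserted into its own DOM, which is a likely reason for the null result in the question.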

Iterate over the rows, reading the text of the 3 cells in each one. Convert the last one (the level) to an integer, and then create the person.
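The cell text comes back with the line breaks and indentation of the HTML around it, so it is worth cleaning before use. The sketch below shows one way to do that with only the .NET base library; CellParser, Clean and ParseLevel are hypothetical names, not part of HtmlAgilityPack.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class CellParser
{
    // Decodes HTML entities and strips the whitespace surrounding the text of a cell.
    public static string Clean(string innerText)
    {
        return WebUtility.HtmlDecode(innerText ?? string.Empty).Trim();
    }

    // Returns null when the cell does not contain a whole number.
    public static int? ParseLevel(string innerText)
    {
        return int.TryParse(Clean(innerText), NumberStyles.Integer, CultureInfo.InvariantCulture, out int level)
            ? level
            : (int?)null;
    }

    public static void Main()
    {
        Console.WriteLine(ParseLevel("\n      19\n    "));        // 19
        Console.WriteLine(Clean("\n      Elder Druid\n    "));   // Elder Druid
        Console.WriteLine(ParseLevel("Level").HasValue);           // False
    }
}
```

Returning null for a non-numeric cell mirrors what the answer's int.TryParse check does: such rows (the header row, for example) are simply skipped instead of throwing.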

Now you can create a class to define each item in your list. I almost always create such a class so that I can work with it when I get an item back from the ListBox (get the selected item as a PersonItem and do any work with it...):

public class PersonItem
{
    public PersonItem(Person person)
    {
        this.Person = person;
    }

    public Person Person { get; }

    public override string ToString()
    {
        return $"{this.Person.Name} ({this.Person.Level})";
    }
}

It's simply a wrapper around Person. Override ToString to return the text to show in the ListBox.

Test it:

var web = new HtmlWeb();
var doc = web.Load("https://tibiantis.online/?page=whoisonline");

var persons = Person.LoadPersons(doc);
foreach (var person in persons)
{
    var item = new PersonItem(person);
    listBox1.Items.Add(item);
}

// At any moment, you can do things like this:
var personItem = listBox1.SelectedItem as PersonItem;
if (personItem != null)
{
    var person = personItem.Person;
    // ...
}
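
As for the number of items you will receive: once the rows are in a list, its Count property gives you that number, and LINQ can sort or filter them. The sketch below uses a plain tuple list as a stand-in for the result of Person.LoadPersons so that it runs without the library; PersonStats and its methods are hypothetical names, and the "Bob" row is invented sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PersonStats
{
    // Stand-in for the rows returned by Person.LoadPersons: (name, vocation, level).
    public static List<(string Name, string Vocation, int Level)> Sample()
    {
        return new List<(string Name, string Vocation, int Level)>
        {
            ("Abe", "Elder Druid", 19),
            ("Bob", "Knight", 42),
        };
    }

    // Name of the character with the highest level, or null for an empty list.
    public static string HighestLevelName(List<(string Name, string Vocation, int Level)> persons)
    {
        return persons.OrderByDescending(p => p.Level).Select(p => p.Name).FirstOrDefault();
    }

    public static void Main()
    {
        var persons = Sample();
        Console.WriteLine(persons.Count);             // 2
        Console.WriteLine(HighestLevelName(persons)); // Bob
    }
}
```

With the real data, the same idea would be persons.Count on the list returned by Person.LoadPersons(doc), or listBox1.Items.Count after the items have been added.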
