
How do I select certain 'Nodes' in a text file - based off of what a certain line contains using HTMLAgilityPack?

I know the title says a lot, so let me break it down. I have one .txt file, TeamName.txt, which contains a team name (Horsemen). I have another six .txt files containing HTML that my code fetches and downloads; one of them is Ladder-1-100.txt.

Here's the idea: the code searches the HTML ladder file for the team name, which already works. However, I also want it to pull out the other information in the same table row (the tr element whose @class is vrml_table_row) as that team. Here is a sample of the HTML:

<tr class="vrml_table_row">
    <td class="pos_cell">59</td>
    <td class="div_cell"><img src="/images/div_gold_40.png" title="Gold" /></td>
    <td class="team_cell"><a href="/EchoArena/Teams/RHNkUmJMV1p5UEU90" class="team_link"><img src="/images/logos/teams/9b3b1917-a56b-40a3-80ee-52b1c9f31910.png" class="team_logo" /><span class="team_name">Echoholics</span></a></td>
    <td class="group_cell"><img src="/images/group_ame.png" class="group_logo" title="America East" /></td>
    <td class="gp_cell">14</td>
    <td class="win_cell">10</td>
    <td class="loss_cell">4</td>
    <td class="pts_cell">340</td>
    <td class="mmr_cell"><span>1200</span></td>
</tr>
<tr class="vrml_table_row">
    <td class="pos_cell">60</td>
    <td class="div_cell"><img src="/images/div_diamond_40.png" title="Diamond" /></td>
    <td class="team_cell"><a href="/EchoArena/Teams/cUJmVGlKajFGRlE90" class="team_link"><img src="/images/logos/teams/dff8310a-a429-4c60-af82-0333d530d22d.png" class="team_logo" /><span class="team_name">Horsemen</span></a></td>
    <td class="group_cell"><img src="/images/group_aa.png" class="group_logo" title="Oceania/Asia" /></td>
    <td class="gp_cell">10</td>
    <td class="win_cell">6</td>
    <td class="loss_cell">4</td>
    <td class="pts_cell">235</td>
    <td class="mmr_cell"><span>1200</span></td>
</tr>
<tr class="vrml_table_row">
    <td class="pos_cell">61</td>
    <td class="div_cell"><img src="/images/div_gold_40.png" title="Gold" /></td>
    <td class="team_cell"><a href="/EchoArena/Teams/UDd1dTJQRzBiRzQ90" class="team_link"><img src="/images/logos/teams/8eb6109e-f765-4d64-a766-cc5605a01ad0.png" class="team_logo" /><span class="team_name">Femboys</span></a></td>
    <td class="group_cell"><img src="/images/group_ame.png" class="group_logo" title="America East" /></td>
    <td class="gp_cell">12</td>
    <td class="win_cell">8</td>
    <td class="loss_cell">4</td>
    <td class="pts_cell">348</td>
    <td class="mmr_cell"><span>1200</span></td>
</tr>

Here is my current code that will spit out: Team Name: Horsemen.

                HtmlNode[] team_name = document1.DocumentNode
                    .SelectSingleNode("//*[@class='vrml_table_row']")
                    .SelectNodes("//td[@class='team_cell']")
                    .Where(x => x.InnerHtml.Contains($"{TeamName}"))
                    .ToArray();

                foreach (HtmlNode item in team_name)
                {
                    await ReplyAsync("**Team Name:** " + item.InnerHtml);
                }

However, I want it to spit out: Team Name: Horsemen, Wins: 6, Losses: 4, Games Played: 10, MMR: 1200, Points Scored: 235, Division: Diamond, Ladder Position: 60.

You get my point. As you can see, every row uses the same class names; only the information inside differs. Also, the team name (Horsemen) is dynamic, meaning it can be replaced with another team name. So how do I achieve this?

One possible solution:

First, create a Model class:

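class Model
{
    public int Position { get; set; }
    public string TeamName { get; set; }
    public string ImageSource { get; set; }
    public string Division { get; set; }
    // The parsing loop also fills in these properties
    public string TeamLink { get; set; }
    public string TeamLogo { get; set; }
    public string GroupLogo { get; set; }
    public string GroupTitle { get; set; }
    // whatever else you want to store
}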

After that, keep the desired nodes in an HtmlNodeCollection and the models in a List:

var table = htmlDoc.DocumentNode.SelectNodes("//tr[contains(@class, 'vrml_table_row')]");
var models = new List<Model>();
foreach (var t in table)
{
    var model = new Model
    {
        // These eight fields come from the first four columns of the table
        Position = int.Parse(t.SelectSingleNode("td[contains(@class, 'pos_cell')]").InnerText),
        ImageSource = t.SelectSingleNode("td[contains(@class, 'div_cell')]/img").Attributes["src"].Value,
        Division = t.SelectSingleNode("td[contains(@class, 'div_cell')]/img").Attributes["title"].Value,
        TeamLink = t.SelectSingleNode("td[contains(@class, 'team_cell')]/a").Attributes["href"].Value,
        TeamLogo = t.SelectSingleNode("td[contains(@class, 'team_cell')]/a/img").Attributes["src"].Value,
        TeamName = t.SelectSingleNode("td/a/span[contains(@class, 'team_name')]").InnerText,
        GroupLogo = t.SelectSingleNode("td[contains(@class, 'group_cell')]/img").Attributes["src"].Value,
        GroupTitle = t.SelectSingleNode("td[contains(@class, 'group_cell')]/img").Attributes["title"].Value
        // etc
    };
    models.Add(model);
}
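
The loop above stops before the numeric cells the question asks about, and it does not filter by team name. A minimal sketch of that remaining step follows, assuming the row markup shown in the question. The names LadderLookup, Cell and Describe are hypothetical helpers introduced here for illustration; they are not part of HtmlAgilityPack or of the original answer.

```csharp
using System.Linq;
using HtmlAgilityPack;

static class LadderLookup
{
    // Hypothetical helper: trimmed text of the row's cell with the given class, or "" if absent.
    static string Cell(HtmlNode row, string cssClass)
    {
        // The leading "." keeps the search inside this row.
        // A bare "//" would search the whole document again.
        HtmlNode node = row.SelectSingleNode($".//td[@class='{cssClass}']");
        return node == null ? "" : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Hypothetical helper: one-line summary for teamName, or null if no row matches.
    public static string Describe(HtmlDocument doc, string teamName)
    {
        var rows = doc.DocumentNode.SelectNodes("//tr[@class='vrml_table_row']");
        if (rows == null) return null; // SelectNodes returns null when nothing matches

        // Pick the first row whose team_name span equals the requested name.
        HtmlNode row = rows.FirstOrDefault(r =>
        {
            HtmlNode name = r.SelectSingleNode(".//span[@class='team_name']");
            return name != null
                && HtmlEntity.DeEntitize(name.InnerText).Trim() == teamName;
        });
        if (row == null) return null;

        // The division is stored in the img title attribute, not in the cell text.
        string division = row.SelectSingleNode(".//td[@class='div_cell']/img")
            ?.GetAttributeValue("title", "") ?? "";

        return $"Team Name: {teamName}, Wins: {Cell(row, "win_cell")}, "
             + $"Losses: {Cell(row, "loss_cell")}, Games Played: {Cell(row, "gp_cell")}, "
             + $"MMR: {Cell(row, "mmr_cell")}, Points Scored: {Cell(row, "pts_cell")}, "
             + $"Division: {division}, Ladder Position: {Cell(row, "pos_cell")}";
    }
}
```

Assuming the ladder file has been loaded with new HtmlDocument() followed by doc.Load("Ladder-1-100.txt"), calling LadderLookup.Describe(doc, TeamName) should give a string that can be passed to ReplyAsync, or null when the team is not in that file. Matching on the team_name span rather than on InnerHtml.Contains avoids accidental hits on teams whose names merely contain the search text.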

