
How to extract data from an HTML file using HTML Agility Pack

I'm trying to extract all the team names from the URL in the code below using the HTML Agility Pack. Right now only the first team name is written to the console, even though there is more than one team.

If someone could point me in the right direction on how to solve this, that would be great. Thanks.

using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
using Xamarin.Forms; // assumption: ContentPage is the Xamarin.Forms type

namespace Html_Parser
{
    public partial class MainPage : ContentPage
    {
        public MainPage()
        {
            InitializeComponent();
            StartWebCrawl();
        }

        private static async Task StartWebCrawl()
        {
            var url = "http://challonge.com/lhswaterwars17/module";

            // Download the page and parse it with HTML Agility Pack.
            var httpClient = new HttpClient();
            var html = await httpClient.GetStringAsync(url);

            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);

            // Every div whose class attribute is exactly "tournament-bracket--search-layer".
            var divs = htmlDocument.DocumentNode.Descendants("div")
                .Where(node => node.GetAttributeValue("class", "")
                    .Equals("tournament-bracket--search-layer"))
                .ToList();

            var participants = new List<Particpants>();

            foreach (var div in divs)
            {
                var participant = new Particpants
                {
                    // Takes only the first nested div of each matching div.
                    TeamName = div.Descendants("div").FirstOrDefault().InnerText
                };

                participants.Add(participant);
            }

            foreach (var name in participants)
            {
                Debug.WriteLine(name.TeamName);
            }
        }
    }

    public class Particpants
    {
        public string TeamName { get; set; }
    }
}

You can change your selector to the equivalent of the CSS selector 'svg.match--player title'. For example, in a Chrome console, type: jQuery('svg.match--player title')
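A rough HTML Agility Pack version of that selector could look like the sketch below. HTML Agility Pack queries with XPath rather than CSS, so the selector is translated by hand. The TitleExtractor class, the ExtractTitles helper, and the inline sample markup are made up for illustration; the real page's markup may differ, and this assumes the title elements are present in the HTML the server returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TitleExtractor
{
    // Hand-written XPath translation of the CSS selector 'svg.match--player title'.
    // The concat/normalize-space idiom matches one class token among several.
    private const string PlayerTitleXPath =
        "//svg[contains(concat(' ', normalize-space(@class), ' '), ' match--player ')]//title";

    // Hypothetical helper: returns the text of every matching <title> node.
    public static List<string> ExtractTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(PlayerTitleXPath);
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText)).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample markup, not copied from the real page.
        var html = @"<div class='tournament-bracket--search-layer'>
  <svg class='match--player'><title>Team A</title></svg>
  <svg class='match--player other'><title>Team B</title></svg>
  <svg class='unrelated'><title>Ignored</title></svg>
</div>";

        foreach (var title in ExtractTitles(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

Unlike the FirstOrDefault() call in the question, this collects every match instead of only the first one.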

This will enumerate all participants from the Finals back to Round 1. In other words, the first 6 results (indexes 0-5) will be blank, and there will be duplicates to handle.
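One possible way to handle the blanks and duplicates is a small LINQ pass over the extracted titles, sketched here. The TeamNameCleaner class and its Clean method are hypothetical names, and the sample input only imitates the shape described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TeamNameCleaner
{
    // Trims each title, drops blank entries, and removes duplicates.
    public static List<string> Clean(IEnumerable<string> rawTitles)
    {
        return rawTitles
            .Select(t => (t ?? string.Empty).Trim())
            .Where(t => t.Length > 0)
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        // Made-up input: blank entries first, then repeated names.
        var raw = new[] { "", " ", "Team A", "Team B", "Team A", "Team C" };

        foreach (var name in Clean(raw))
        {
            Console.WriteLine(name);
        }
    }
}
```

Distinct() compares strings exactly, so names that differ only in casing would need a comparer such as StringComparer.OrdinalIgnoreCase.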

Off-topic: personally, I prefer the AngleSharp library, available through NuGet.
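For comparison, AngleSharp accepts the CSS selector directly, so no XPath translation is needed. The sketch below assumes AngleSharp 0.10 or later, where HtmlParser.ParseDocument is available; the AngleSharpTitles class, the ExtractTitles helper, and the sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AngleSharp.Html.Parser;

public static class AngleSharpTitles
{
    // Hypothetical helper: returns the text of every element matched by
    // the CSS selector 'svg.match--player title'.
    public static List<string> ExtractTitles(string html)
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);

        return document.QuerySelectorAll("svg.match--player title")
            .Select(element => element.TextContent)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample markup, not copied from the real page.
        var html = @"<div>
  <svg class='match--player'><title>Team A</title></svg>
  <svg class='match--player'><title>Team B</title></svg>
</div>";

        foreach (var title in ExtractTitles(html))
        {
            Console.WriteLine(title);
        }
    }
}
```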


 