
Web Page scraping - WP8 - HTMLAgilityPack

Please tell me what the problem is with fetching the lyrics from http://www.azlyrics.com/lyrics/paparoach/coffeethoughts.html. I want to fetch only the lyrics, nothing else. Thank you in advance.

protected async override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);
    string htmlPage = "";
    using (var client = new HttpClient())
    {
        htmlPage = await client.GetStringAsync("http://www.azlyrics.com/lyrics/paparoach/coffeethoughts.html/");
    }

    HtmlDocument htmlDocument = new HtmlDocument();
    htmlDocument.LoadHtml(htmlPage);

    List<Lyrics> lyrics = new List<Lyrics>();

    foreach (var div in htmlDocument.DocumentNode.SelectNodes("//div[@style='margin-left:10px;margin-right:10px']"))
    {
        Lyrics newMovie = new Lyrics();
        newMovie.Summary = div.SelectSingleNode("br\\").InnerText.Trim();
        //newMovie.Summary = div.SelectSingleNode(".//div[@id='lyrics']").InnerText.Trim();
        //newMovie.Title = div.SelectSingleNode(".//div[@class='title']").InnerText.Trim();
        lyrics.Add(newMovie);
    }

    lstMovies.ItemsSource = lyrics;
}

Your query is wrong.

//div[@style='margin-left:10px;margin-right:10px']

should be

//div[@id='main']/div[3]
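
A minimal sketch of how that query might be applied, assuming the lyrics still sit in the third div under the element with id "main" (the page layout may have changed since). The class name LyricsScraper and the method name ExtractLyrics are made up for illustration and are not part of the original code. The sketch also guards against a null result, since SelectSingleNode returns null when nothing matches:

```csharp
using HtmlAgilityPack;

public static class LyricsScraper
{
    // XPath taken from the answer above. It depends on the page markup,
    // so treat it as an assumption that may need adjusting.
    private const string LyricsXPath = "//div[@id='main']/div[3]";

    // Returns the text of the lyrics block, or null if the query matches nothing.
    public static string ExtractLyrics(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath.
        HtmlNode node = document.DocumentNode.SelectSingleNode(LyricsXPath);
        if (node == null)
        {
            return null;
        }

        // InnerText drops the markup; DeEntitize turns entities such as
        // &amp; back into plain characters.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

In OnNavigatedTo, the downloaded htmlPage string could then be passed to LyricsScraper.ExtractLyrics, and a null result handled before binding anything to the list.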

I wrote an article about scraping, if you are interested: Get content from a webpage or “How to Scrape the Sky”.


By the way, azlyrics.com is powered by Musixmatch. Maybe you should check their API instead of scraping? Safe drinking water starts at the source.
