I need to parse an HTML string that I receive from a server.
<html>
<head/>
<body style="margin: 0;padding: 0">
<a href="http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=319737742&mt=8&uo=6" style="margin: 0;padding: 0"><img src="https://s3.amazonaws.com/sportschatter/postcard.jpg" style="margin: 0;padding: 0"/></a>
</body>
</html>
This is the response I get from the server. I need to retrieve the img URL https://s3.amazonaws.com/sportschatter/postcard.jpg as well as the href part. I have HTML Agility Pack for WP7, but I don't know how to write the query to get this information. I tried something like this:
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(htmlString);
var value = document.DocumentNode.Descendants("img src")
    .Select(x => x.InnerText);
This does not give me any value. I also tried Regex:
string parseString = htmlstring;
Regex expression = new Regex(@".*img src=(\d+).*$");
Match match = expression.Match(parseString);
MessageBox.Show(match.Groups[1].Value);
but this does not work either. Please let me know what I am doing wrong.
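You've misunderstood how the LINQ to XML-style syntax is meant to be used (this is the route to take here, since XPath isn't supported on Windows Phone). Descendants takes an element name such as "img", not "img src"; the attribute is read from the node afterwards, and InnerText is empty for an img element anyway.
You need to do something like this instead: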
var image = document.DocumentNode.Descendants("img").First();
var source = image.GetAttributeValue("src", "");
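Since the question also asks for the href, here is a slightly fuller sketch of the same approach. The class and method names (PostcardParser, Extract) are made up for illustration, and it assumes the response contains at most one link and one image of interest; it uses FirstOrDefault so that missing elements produce empty strings instead of an exception.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PostcardParser
{
    // Reads the href of the first <a> and the src of the first <img>.
    // Either value is "" when the element or attribute is missing.
    public static void Extract(string html, out string href, out string src)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Descendants takes a plain element name, not a selector.
        var anchor = document.DocumentNode.Descendants("a").FirstOrDefault();
        var image = document.DocumentNode.Descendants("img").FirstOrDefault();

        href = anchor != null ? anchor.GetAttributeValue("href", "") : "";
        src = image != null ? image.GetAttributeValue("src", "") : "";
    }
}
```

Calling PostcardParser.Extract(htmlString, out href, out src) on the response from the question should leave the iTunes link in href and the postcard.jpg URL in src.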
Use HtmlAgilityPack - do not use regex.
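The argument to Descendants is an element name, not a CSS-like selector, so "img src" matches nothing. Where the build supports XPath, expressions go through SelectNodes or SelectSingleNode instead.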
Here's an example: http://htmlagilitypack.codeplex.com/wikipage?title=Examples
Here's some info about XPath: http://msdn.microsoft.com/en-us/library/ms256086.aspx
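For completeness, a rough sketch of the XPath route. It assumes an HtmlAgilityPack build with XPath support (for example on desktop .NET), which may not apply to Windows Phone; the class and method names are invented for the example.

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class XPathExample
{
    // Returns the src of the first <img> that has one, or null if none is found.
    public static string FirstImageSource(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectSingleNode returns null when the expression matches nothing.
        HtmlNode image = document.DocumentNode.SelectSingleNode("//img[@src]");
        return image != null ? image.GetAttributeValue("src", "") : null;
    }
}
```

The same pattern with "//a[@href]" and GetAttributeValue("href", "") would fetch the link.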