
Need help parsing HTML in C#

For personal use, I am trying to parse a small HTML page that shows the results of the French soccer championship in a simple grid.

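using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

var url = "http://www.lfp.fr/mobile/ligue1/resultat.asp?code_jr_tr=J01";
WebRequest req = WebRequest.Create(url);
using (WebResponse result = req.GetResponse())
using (Stream receiveStream = result.GetResponseStream())
using (StreamReader sr = new StreamReader(receiveStream, Encoding.GetEncoding(0)))
{
    string line;
    // ReadLine() returns null at the end of the stream; calling sr.Read()
    // as the loop test would consume one character before every line.
    while ((line = sr.ReadLine()) != null)
    {
        line = Regex.Replace(line, @"<(.|\n)*?>", " "); // replace tags with spaces
        line = line.Replace("&nbsp;", "");
        line = line.Trim();
        // ...this is where the team names and scores should be extracted
    }
}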

And then I really don't know whether to process the page line by line or read the whole stream at once, or how to retrieve only each team's name together with the number that follows it, which is the score.

In the end I want to put both teams with their scores in a list or in XML so I can use it in a phone application.

If anyone has an idea, that would be great. Thanks!

You could load the stream into an XmlDocument, which lets you query it with something like XPath. Or you could use LINQ to XML with an XDocument.
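A minimal sketch of the XmlDocument/XPath idea, assuming the page is well-formed XHTML with no default namespace and that each result is a table row of three cells (home team, score, away team). That row layout is a hypothetical stand-in; check the actual markup of the LFP page and adjust the XPath expressions to match it.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XPathScoreParser
{
    // Assumed (hypothetical) row shape: <tr><td>Home</td><td>2 - 1</td><td>Away</td></tr>
    public static List<(string Home, string Score, string Away)> ParseRows(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed XML

        var rows = new List<(string Home, string Score, string Away)>();
        // "//tr" also finds rows nested inside an optional <tbody>
        foreach (XmlNode row in doc.SelectNodes("//tr"))
        {
            XmlNodeList cells = row.SelectNodes("td");
            if (cells.Count != 3) continue; // skip header (<th>) or spacer rows
            rows.Add((cells[0].InnerText.Trim(),
                      cells[1].InnerText.Trim(),
                      cells[2].InnerText.Trim()));
        }
        return rows;
    }
}
```

If the page declares xmlns="http://www.w3.org/1999/xhtml", unprefixed names such as //tr will not match; you would then need an XmlNamespaceManager that maps a prefix to that namespace and use the prefix in the expressions.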

It's not perfect, though, because HTML files aren't always well-formed XML (don't we know it!), but it's a simple solution using what's already available in the framework.
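To illustrate that caveat, here is a LINQ to XML sketch using XDocument that returns null instead of throwing when the input is not well-formed XML. An undeclared entity such as &nbsp; (which the question's page uses) or an unclosed <td> is enough to make XDocument.Parse throw an XmlException. The three-cell row layout is again a hypothetical assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class LinqScoreParser
{
    // Parses rows of a hypothetical three-cell table into (home, score, away) tuples.
    // Returns null when the input is not well-formed XML, which is common for real-world HTML.
    public static List<(string Home, string Score, string Away)> TryParse(string html)
    {
        XDocument doc;
        try
        {
            doc = XDocument.Parse(html);
        }
        catch (XmlException)
        {
            return null; // e.g. an unclosed <td> or a bare &nbsp; entity
        }

        return doc.Descendants("tr")
                  .Select(tr => tr.Elements("td").Select(td => td.Value.Trim()).ToList())
                  .Where(cells => cells.Count == 3) // keep only rows with exactly three cells
                  .Select(cells => (Home: cells[0], Score: cells[1], Away: cells[2]))
                  .ToList();
    }
}
```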

You will need an SgmlReader, which provides an XML-like API over any SGML document (and an HTML document is really an SGML document).

You could use the Regex.Match method to pull out the team names and scores. Examine the HTML to see how each row is built up. This is a common technique in screen scraping.
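A sketch of the Regex.Match approach, assuming (hypothetically) that each result is three adjacent <td> cells: home team, a score such as "2 - 1", and away team. It matches against the whole page, for example the string returned by StreamReader.ReadToEnd(), rather than line by line, so a row split across several lines still matches. It loops with Match.NextMatch() and then writes the list out as XML with XElement, one possible format for a phone app. The MatchResult record and the element and attribute names are illustrative choices, not anything the site defines.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public record MatchResult(string Home, int HomeGoals, string Away, int AwayGoals);

public static class RegexScoreParser
{
    // Hypothetical row shape: <td>Home</td><td>2 - 1</td><td>Away</td>,
    // with optional attributes on <td> and optional whitespace/newlines between cells.
    private static readonly Regex RowPattern = new Regex(
        @"<td[^>]*>\s*(?<home>[^<]+?)\s*</td>\s*" +
        @"<td[^>]*>\s*(?<hg>[0-9]+)\s*-\s*(?<ag>[0-9]+)\s*</td>\s*" +
        @"<td[^>]*>\s*(?<away>[^<]+?)\s*</td>",
        RegexOptions.IgnoreCase);

    // Scans the whole page, collecting one MatchResult per matching row.
    public static List<MatchResult> Parse(string html)
    {
        var results = new List<MatchResult>();
        for (Match m = RowPattern.Match(html); m.Success; m = m.NextMatch())
        {
            results.Add(new MatchResult(
                WebUtility.HtmlDecode(m.Groups["home"].Value).Trim(), // decode entities like &amp;
                int.Parse(m.Groups["hg"].Value),
                WebUtility.HtmlDecode(m.Groups["away"].Value).Trim(),
                int.Parse(m.Groups["ag"].Value)));
        }
        return results;
    }

    // Serializes to <results><match home=".." homeGoals=".." away=".." awayGoals=".."/></results>.
    public static XElement ToXml(IEnumerable<MatchResult> results) =>
        new XElement("results",
            results.Select(r => new XElement("match",
                new XAttribute("home", r.Home),
                new XAttribute("homeGoals", r.HomeGoals),
                new XAttribute("away", r.Away),
                new XAttribute("awayGoals", r.AwayGoals))));
}
```

A pattern tied to the markup breaks as soon as the page layout changes, so it is worth re-checking it against the live HTML whenever the results stop coming through.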
