How to read a line of HTML using C#
I know how to read a line from a txt file, but for some reason C# is not detecting the end of line in HTML files. This code opens the HTML file and reads it line by line, searching for the specified string. Even when I just try to print the first line of text in the HTML file, nothing is displayed.
using (StreamReader sr = new StreamReader("\\\\server\\myFile.html"))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        if(line == ("<td><strong>String I wantstrong></td>"))
        {
            Label1.Text = "Text Found";
            break;
        }
    }
}
I have tried this with a plain txt file and it works perfectly, just not when trying to parse an HTML file.
Thanks.
By far the best way is to use the HTML Agility Pack.
More about this can be found in a previous Stack Overflow question.
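As a rough illustration of the HTML Agility Pack approach, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the helper name `HtmlHasStrongCell` and the sample markup are made up for this example and are not from the original post.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper: true if any <strong> directly inside a <td>
// contains exactly the given text (surrounding whitespace ignored).
static bool HtmlHasStrongCell(string html, string text)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // SelectNodes returns null, not an empty collection, when nothing matches.
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td/strong");
    if (nodes == null)
    {
        return false;
    }

    foreach (HtmlNode node in nodes)
    {
        if (node.InnerText.Trim() == text)
        {
            return true;
        }
    }
    return false;
}

// Made-up sample standing in for the contents of myFile.html.
string sampleHtml =
    "<table>\n  <tr>\n    <td><strong>String I want</strong></td>\n  </tr>\n</table>";

Console.WriteLine(HtmlHasStrongCell(sampleHtml, "String I want")
    ? "Text Found"
    : "Text not found");
```

To read from disk instead of a string, `doc.Load(path)` can replace `doc.LoadHtml(html)`. Because the parser works on elements rather than lines, indentation and line breaks in the file no longer affect the match.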
You don't need to reinvent the wheel. A much better way to parse HTML is to use an HTML parser:
http://htmlagilitypack.codeplex.com/ or http://www.justagile.com/linq-to-html.aspx
A similar question has also been asked here: What is the best way to parse html in C#?
Hope it helps.
If you know the HTML you are parsing is XHTML, why not use System.Xml to parse it as XML?
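A minimal sketch of that idea, using `System.Xml.Linq` from the standard library. It only works when the input is well-formed XML; otherwise `XDocument.Parse` throws an `XmlException`. The helper name `XhtmlHasStrongCell` and the sample document are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: true if any <strong> whose parent is a <td>
// has exactly the given text. Local names are compared so that the
// XHTML namespace on the root element does not get in the way.
static bool XhtmlHasStrongCell(string xhtml, string text)
{
    XDocument doc = XDocument.Parse(xhtml);
    return doc.Descendants()
              .Where(e => e.Name.LocalName == "strong"
                       && e.Parent != null
                       && e.Parent.Name.LocalName == "td")
              .Any(e => e.Value.Trim() == text);
}

// Made-up, well-formed XHTML sample.
string sampleXhtml =
    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table><tr>\n" +
    "  <td><strong>String I want</strong></td>\n" +
    "</tr></table></body></html>";

Console.WriteLine(XhtmlHasStrongCell(sampleXhtml, "String I want")
    ? "Text Found"
    : "Text not found");
```

For pages that are not guaranteed to be valid XHTML, a tolerant HTML parser is likely the safer choice.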
Your outer loop that reads lines works fine. My guess is that one of the following is taking place:
In either case, you won't see anything printed.
Now, to your loop:
You likely don't see what you expect, because
if(line == ("<td><strong>String I wantstrong></td>"))
{
    Label1.Text = "Text Found";
    break;
}
looks for an EXACT match. If this is your actual code, you're missing the opening </ on </strong>, and you're likely forgetting that there is white space (indentation) in your HTML content.
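To illustrate both points, here is a minimal sketch that restores the closing `</strong>` tag and trims each line before comparing. A `StringReader` over made-up sample markup stands in for the `StreamReader` on `myFile.html`, and the helper name `ContainsMarkupLine` is hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper: true if any line equals the target markup
// once leading and trailing whitespace (indentation) is removed.
static bool ContainsMarkupLine(TextReader reader, string target)
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Trim() == target)
        {
            return true;
        }
    }
    return false;
}

// Made-up sample standing in for the contents of myFile.html.
string sampleHtml =
    "<table>\n  <tr>\n    <td><strong>String I want</strong></td>\n  </tr>\n</table>";

using (var reader = new StringReader(sampleHtml))
{
    bool found = ContainsMarkupLine(reader, "<td><strong>String I want</strong></td>");
    Console.WriteLine(found ? "Text Found" : "Text not found"); // prints "Text Found"
}
```

Trimming only handles indentation. If the cell can share a line with other markup, `line.Contains(target)` is a looser check, and a real HTML parser is more robust still.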