
How to read a line of HTML using C#

I know how to read a line in a txt file, but for some reason C# is not detecting the end of line in HTML files. This code opens the HTML file and reads it line by line in search of the specified string. Even when I just try to print the first line of text in the HTML file, nothing is displayed.

using (StreamReader sr = new StreamReader("\\\\server\\myFile.html"))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        if(line == ("<td><strong>String I wantstrong></td>"))
        {
            Label1.Text = "Text Found";
            break;
        }
    }
}

I have tried this with a plain txt file and it works perfectly; it only fails when I try to parse an HTML file.

Thanks.

The best way by far is to use the HTML Agility Pack.

More about this can be found in a previous Stack Overflow question:

Looking for C# HTML parser

You don't need to reinvent the wheel. A much better way to parse HTML is to use an HTML parser:

http://htmlagilitypack.codeplex.com/ or http://www.justagile.com/linq-to-html.aspx

There is also a similar question here: What is the best way to parse html in C#?

Hope it helps.

If you know that the HTML you are parsing is XHTML, why not use System.XML to parse it as XML?
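A minimal sketch of that idea, assuming the file really is well-formed XHTML with no HTML-only entities such as &nbsp;. It uses LINQ to XML (System.Xml.Linq) rather than the older XmlDocument API, which is a choice made for this example and not something the answer specifies. The names XhtmlSearch and HasStrongCell are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlSearch
{
    // Returns true if some <strong> element directly inside a <td>
    // has exactly 'text' as its (trimmed) text content.
    public static bool HasStrongCell(string xhtml, string text)
    {
        // XDocument.Parse throws XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        // Comparing LocalName ignores the XHTML namespace, if one is declared.
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "strong"
                              && e.Parent != null
                              && e.Parent.Name.LocalName == "td")
                  .Any(e => e.Value.Trim() == text);
    }
}
```

Because the document is parsed into a tree, line breaks and indentation in the file no longer affect the search. The file contents could be passed in with File.ReadAllText.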

Your outer loop that reads lines works fine. My guess is that one of the following is taking place:

  • The HTML file is empty
  • The first line in the HTML file is empty

In either case, you won't see anything printed.

Now, to your loop:

You likely don't see what you expect, because of this check:

 if(line == ("<td><strong>String I wantstrong></td>"))
 {
    Label1.Text = "Text Found";
    break;
 }

This looks for an EXACT match. If this is your actual code, you're missing the opening </ of the </strong> tag, and you're likely forgetting that there is white space (indentation) in your HTML content.
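A minimal sketch of a more forgiving check, assuming the markup you are looking for sits on a single line of the file. Instead of comparing the whole line for equality, it tests whether the line contains the target string, so indentation and neighbouring markup no longer matter. The names LineFinder and ContainsMarkup are hypothetical and used only for illustration.

```csharp
using System;
using System.IO;

public static class LineFinder
{
    // Returns true if any line read from 'reader' contains 'markup'.
    // The comparison is ordinal and ignores letter case.
    public static bool ContainsMarkup(TextReader reader, string markup)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A substring test tolerates leading indentation and any
            // other markup that shares the line with the target.
            if (line.IndexOf(markup, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

In the question's code this could be called inside the existing using block as LineFinder.ContainsMarkup(sr, "<td><strong>String I want</strong></td>"), with the closing tag written out in full. It still fails if the markup is split across several lines, which is one reason an HTML parser is the more robust option.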

