Parsing a complete string in HTML using C#

Question

I have the following two examples of HTML:

<a href="http://foo.com">User</a>: <a style="color:#333" href="http://foo.com/word"></a> blue elephant  &middot;

<a href="http://foo.com">User</a>: <a style="color:#333" href="http://foo.com/word">@<b>word</b></a> blue elephant  &middot;

I am trying to parse this using C# to put into a CSV file, and it works to an extent. However, when the HTML contains the '@' symbol, the CSV cell is either left blank or is missing the word preceded by '@'. The main part I am trying to get is @word blue elephant, but this brings back a blank cell, whereas the first HTML example brings back blue elephant as desired.

I am using the following technique to do this:

string[] comm = System.Text.RegularExpressions.Regex.Split(content[1], "<a");

How can I alter this to work for the second HTML example?

Answer 1

In this situation you want to use a proper HTML parser, such as the one in the HTML Agility Pack (and save yourself from invoking the wrath of Cthulhu).

Some examples of how to use it
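As a rough illustration of the parser-based approach, here is a minimal sketch that assumes the HtmlAgilityPack NuGet package is installed. The class and method names (`CommentExtractor`, `ExtractComment`) are hypothetical. The rule of dropping everything up to the first colon and trimming the trailing middle dot is also an assumption, based only on the two sample strings in the question. It is not the answerer's own code.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack (assumed to be installed)

// Hypothetical helper, written only for the two sample strings in the question.
static class CommentExtractor
{
    public static string ExtractComment(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText concatenates every text node in document order, so the
        // pieces "@", "word" (inside <b>) and " blue elephant" are joined
        // without any tag-splitting logic.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Assumption: the leading "User:" label is not wanted in the CSV cell,
        // so everything up to and including the first colon is dropped.
        int colon = text.IndexOf(':');
        if (colon >= 0)
        {
            text = text.Substring(colon + 1);
        }

        // Assumption: the trailing middle dot (decoded from &middot;) is only
        // a separator and can be trimmed together with surrounding whitespace.
        return text.Trim().TrimEnd('\u00B7').TrimEnd();
    }
}
```

Calling `CommentExtractor.ExtractComment` on the first sample should give `blue elephant`, and on the second sample `@word blue elephant`. Because the parser flattens nested tags into text, the `<b>` inside the link no longer needs special handling.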

Answer 1
6 · accepted · 2011-10-24 21:53:04
