
How do I use HTML Agility Pack to edit an HTML snippet?

I have an HTML snippet that I want to modify using C#:

<div>
This is a specialSearchWord that I want to link to
<img src="anImage.jpg" />
<a href="foo.htm">A hyperlink</a>
Some more text and that specialSearchWord again.
</div>

and I want to transform it into this:

<div>
This is a <a class="special" href="http://mysite.com/search/specialSearchWord">specialSearchWord</a> that I want to link to
<img src="anImage.jpg" />
<a href="foo.htm">A hyperlink</a>
Some more text and that <a class="special" href="http://mysite.com/search/specialSearchWord">specialSearchWord</a> again.
</div>

I'm going to use HTML Agility Pack based on the many recommendations here, but I don't know where to start. In particular:

  1. How do I load a partial snippet as a string, instead of a full HTML document?
  2. How do I edit it?
  3. How do I then return the text string of the edited object?

  1. The same way as a full HTML document. It doesn't matter.
  2. There are two options: you can edit the InnerHtml property directly (or Text on text nodes), or modify the DOM tree using e.g. AppendChild, PrependChild, etc.
  3. You can use the HtmlDocument.DocumentNode.OuterHtml property or the HtmlDocument.Save method (personally I prefer the second option).

As for parsing, I select the text nodes inside your div that contain the search term, and then use the string.Replace method to replace it:

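// Requires: using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.LoadHtml(html); // html is the snippet as a string
// Select only the div's direct text nodes that contain the search term.
var textNodes = doc.DocumentNode.SelectNodes("/div/text()[contains(.,'specialSearchWord')]");
if (textNodes != null) // SelectNodes returns null when nothing matches
    foreach (HtmlTextNode node in textNodes)
        node.Text = node.Text.Replace("specialSearchWord", "<a class='special' href='http://mysite.com/search/specialSearchWord'>specialSearchWord</a>");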

And saving the result to a string:

string result = null;
using (StringWriter writer = new StringWriter())
{
    doc.Save(writer);
    result = writer.ToString();
}
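
The second editing option mentioned above (modifying the DOM tree rather than assigning markup to Text) could look roughly like the sketch below. It is only an illustration, not part of the original answer: the class SnippetLinker, the method LinkWord, and its parameters are hypothetical names, and it assumes the search word appears only in direct text children of a top-level div, as in the question. Each matching text node is split around the word, and real a elements are inserted between the pieces.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class SnippetLinker
{
    // Hypothetical helper: wraps every occurrence of `word` found in the
    // direct text children of a top-level <div> in an <a class="special"> link.
    public static string LinkWord(string html, string word, string hrefPrefix)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var textNodes = doc.DocumentNode.SelectNodes("/div/text()");
        if (textNodes == null)
            return html; // nothing matched, return the input unchanged

        // Iterate over a copy, since the parent's children change below.
        foreach (HtmlTextNode node in textNodes.ToList())
        {
            string text = node.Text;
            if (!text.Contains(word))
                continue;

            HtmlNode parent = node.ParentNode;
            string[] parts = text.Split(new[] { word }, StringSplitOptions.None);

            for (int i = 0; i < parts.Length; i++)
            {
                // Re-insert the plain text that surrounded the word.
                if (parts[i].Length > 0)
                    parent.InsertBefore(doc.CreateTextNode(parts[i]), node);

                // Between two pieces there was one occurrence of the word.
                if (i < parts.Length - 1)
                {
                    HtmlNode link = doc.CreateElement("a");
                    link.SetAttributeValue("class", "special");
                    link.SetAttributeValue("href", hrefPrefix + word);
                    link.AppendChild(doc.CreateTextNode(word));
                    parent.InsertBefore(link, node);
                }
            }

            // The original text node has been replaced by the pieces above.
            parent.RemoveChild(node);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling SnippetLinker.LinkWord(html, "specialSearchWord", "http://mysite.com/search/") would return the edited snippet as a string. Compared with string.Replace on Text, this is more code, but the links exist as real nodes in the tree, so later XPath queries can find them.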

Answers:

  1. There may be a way to do this, but I don't know how. I suggest loading the entire document.
  2. Use a combination of XPath and regular expressions.
  3. See the code below for a contrived example. You may have other constraints not mentioned, but this code sample should get you started.

Note that your XPath expression may need to be more complex to find the div that you want.

HtmlDocument doc = new HtmlDocument();

doc.Load(yourHtmlFile);
HtmlNode divNode = doc.DocumentNode.SelectSingleNode("//div[2]");
string newDiv = Regex.Replace(divNode.InnerHtml, @"specialSearchWord", 
"<a class='special' href='http://etc'>specialSearchWord</a>");
divNode.InnerHtml = newDiv;
Console.WriteLine(doc.DocumentNode.OuterHtml);
