HtmlAgility - extract and replace plain text parts (outside any tags) from HTML

I use the Html Agility Pack and I want to extract and replace each plain text part (text that is not inside a tag) in an HTML document.

<html><body>bla bla 1<br />bla bla 2<br />bla bla 3<img src="img.jpg" /></body></html>

The output should be a list containing: bla bla 1 ; bla bla 2 ; bla bla 3

node.InnerText does not apply here, because it returns the text as one concatenated string rather than as separate parts.

This is what I tried:

// loop over innerhtml and process
var thenode = document.DocumentNode.Descendants().Where(n => n.Name == "body").FirstOrDefault();
if (thenode != null)
{
    // InnerHtml replaces <br /> with <br>
    String[] strings = thenode.InnerHtml.Split(new string[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries);
    foreach (String str in strings)
    {
        String lstr = str.Trim();
        if (lstr != String.Empty && !lstr.StartsWith("<"))
        {
            // do processing
            String loutput = Processing(lstr);
            thenode.InnerHtml = thenode.InnerHtml.Replace(lstr, loutput);
        }
    }
}

One possible way to replace all text nodes that are direct children of the <body> tag with some new text:

// Select all text nodes that are direct children of <body> and not whitespace-only.
var textNodes = doc.DocumentNode.SelectNodes("//body/text()[normalize-space()]");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (textNodes != null)
{
    foreach (HtmlNode textNode in textNodes)
    {
        // Replace each text node with "new text" for the sake of the demo.
        textNode.ParentNode.ReplaceChild(HtmlNode.CreateNode("new text"), textNode);
    }
}
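
The snippet above can be combined with the question's processing step. What follows is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name TextNodeDemo, the method ReplaceBodyText, and the upper-casing body of Processing are hypothetical stand-ins, since the question does not show what Processing does. It collects each text part into a list and replaces the node with the processed text.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TextNodeDemo
{
    // Hypothetical stand-in for the question's Processing method.
    public static string Processing(string text) => text.ToUpperInvariant();

    // Collects the text of each non-empty text node directly under <body>
    // and replaces that node with the processed text.
    public static List<string> ReplaceBodyText(HtmlDocument doc)
    {
        var extracted = new List<string>();

        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection textNodes =
            doc.DocumentNode.SelectNodes("//body/text()[normalize-space()]");
        if (textNodes == null)
        {
            return extracted;
        }

        foreach (HtmlNode textNode in textNodes)
        {
            string original = textNode.InnerText.Trim();
            extracted.Add(original);

            // Swap the old text node for a new one holding the processed text.
            HtmlNode replacement = doc.CreateTextNode(Processing(original));
            textNode.ParentNode.ReplaceChild(replacement, textNode);
        }

        return extracted;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body>bla bla 1<br />bla bla 2<br />bla bla 3<img src=\"img.jpg\" /></body></html>");

        List<string> parts = ReplaceBodyText(doc);
        Console.WriteLine(string.Join(" ; ", parts));
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Working on the node tree avoids the string-based Replace on InnerHtml, which would also rewrite any other occurrence of the same text, including one inside an attribute value.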

Side note: I don't see the text nodes as being outside any tag, because they are inside the <body> tag. I see them as direct children of the <body> tag.
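
To illustrate the distinction in the side note, here is a small sketch (again assuming the HtmlAgilityPack package; the class name TextNodeScopeDemo, the helper CountTextNodes, and the sample markup are made up for this example). It compares the direct-child expression //body/text() with the descendant expression //body//text(), which would also match text nested inside other elements.

```csharp
using System;
using HtmlAgilityPack;

public static class TextNodeScopeDemo
{
    // Counts the nodes matched by an XPath expression in the given HTML.
    public static int CountTextNodes(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Hypothetical sample: one text node directly under <body>,
        // two more nested inside <div> and <b>.
        const string html =
            "<html><body>outer<div>inner <b>deep</b></div></body></html>";

        // Direct children of <body> only.
        Console.WriteLine(CountTextNodes(html, "//body/text()[normalize-space()]"));

        // Text nodes at any depth below <body>.
        Console.WriteLine(CountTextNodes(html, "//body//text()[normalize-space()]"));
    }
}
```

For the HTML in the question both expressions select the same nodes, because every text part there is a direct child of <body>.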
