
Add newline in HTML source code using HTML Agility Pack

I am modifying an HTML file using the HTML Agility Pack.

Here is an example for an HTML file containing tables:

Dim document As New HtmlDocument
Dim tables As Array

document.Load(path_html)

Dim div1 As HtmlNode = HtmlNode.CreateNode("<div></div>")
Dim div2 As HtmlNode = HtmlNode.CreateNode("<div></div>")

tables = document.DocumentNode.Descendants("table").ToArray()

For Each tr As HtmlNode In tables.Descendants("tr").ToArray
   tr.AppendChild(div1)
   tr.AppendChild(div2)
Next

document.save(path_html)

And here is the result in the HTML file:

<div></div><div></div>

What I would like is:

<div></div>
<div></div>

I think this should be the default behaviour, because the single-line output makes my HTML file hard to read.

I saw this question (which is my exact issue) here, but the answer is not working for me (maybe because I am using VB.NET and the answer is in C#).

Can anyone help?

I haven't written any VB.NET in a long time, so I first tried this in C#:

var document = new HtmlDocument();
var div = HtmlNode.CreateNode("<div></div>");
var newline = HtmlNode.CreateNode("\r\n");
div.AppendChild(newline);
for (int i = 0; i < 2; ++i)
{
    div.AppendChild(HtmlNode.CreateNode("<div></div>"));
    div.AppendChild(newline);
}
document.DocumentNode.AppendChild(div);
Console.WriteLine(document.DocumentNode.WriteTo());

This works as expected. The output:

<div>
<div></div>
<div></div>
</div>

Then I thought, "no way... it can't be". Note the commented lines in the VB.NET version:

Dim document = New HtmlDocument()
Dim div = HtmlNode.CreateNode("<div></div>")
' this writes the literal string...
Dim newline = HtmlNode.CreateNode("\r\n")
' this works!
' Dim newline = HtmlNode.CreateNode(Environment.NewLine)
div.AppendChild(newline)
For i = 1 To 2
    div.AppendChild(HtmlNode.CreateNode("<div></div>"))
    div.AppendChild(newline)
Next
document.DocumentNode.AppendChild(div)
Console.WriteLine(document.DocumentNode.WriteTo())

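Unfortunately it is so, which is probably why the question you linked to was never marked as answered. VB.NET string literals do not process backslash escape sequences, so "\r\n" stays as four literal characters. The output: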

<div>\r\n<div></div>\r\n<div></div>\r\n</div>

Finally, instead of the string "\r\n" I tried Environment.NewLine, which does work and outputs:

<div>
<div></div>
<div></div>
</div>

In C#, both "\r\n" and Environment.NewLine work.
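To make the difference visible outside the HTML Agility Pack, here is a minimal sketch in plain VB.NET. The module name NewlineDemo is only illustrative. It compares the length of the literal "\r\n" with the built-in vbCrLf constant, which is another way to get a real carriage return plus line feed in VB.NET:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module NewlineDemo
    Sub Main()
        ' VB.NET does not interpret backslash escapes inside string literals,
        ' so this is four characters: backslash, r, backslash, n.
        Dim literal As String = "\r\n"
        Console.WriteLine(literal.Length)              ' 4

        ' vbCrLf is a real carriage return followed by a line feed.
        Dim crlf As String = vbCrLf
        Console.WriteLine(crlf.Length)                 ' 2
        Console.WriteLine(crlf = ControlChars.CrLf)    ' True
    End Sub
End Module
```

Environment.NewLine equals vbCrLf on Windows but is platform-dependent elsewhere, so vbCrLf may be the safer choice if the file must always use CRLF line endings.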

Based on this answer, you would need to add a node that represents a carriage return (\r) and a line feed (\n):

Dim newLineNode As HtmlNode = HtmlNode.CreateNode("\r\n")

Based on your comment:

I tried this but it adds '\r\n' to my HTML; it does not start a new line.

You've already tried this, and it prints the string literal "\r\n" instead. I have managed to replicate this issue too.

Instead, consider using a <br> tag, which is a line break. Note that it breaks the line in the rendered page rather than in the HTML source:

Dim newLineNode As HtmlNode = HtmlNode.CreateNode("<br>")

Based on your example code, your code would look something like this:

Dim newLineNode As HtmlNode = HtmlNode.CreateNode("<br>")

For Each tr As HtmlNode In tables.Descendants("tr").ToArray
   tr.AppendChild(div1)
   tr.AppendChild(newLineNode)
   tr.AppendChild(div2)
Next

However, tables.Descendants("tr").ToArray gave me a compile error. As that is outside the scope of this question and you haven't raised it as an issue, I'll assume that it works for you.
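If the goal is a newline in the HTML source itself, the loop could be sketched as follows. This is an untested sketch under a few assumptions: the file path is passed as the first command-line argument (hypothetical), the rows are queried directly from document.DocumentNode to sidestep the compile error mentioned above, and vbCrLf is used in place of "\r\n" because VB.NET does not process backslash escapes. New nodes are created inside the loop so that no single node instance is shared between several rows:

```vbnet
Imports System.Linq
Imports HtmlAgilityPack
Imports Microsoft.VisualBasic

Module AppendWithNewlines
    Sub Main(args As String())
        ' Hypothetical: path of the HTML file taken from the command line.
        Dim pathHtml As String = args(0)

        Dim document As New HtmlDocument()
        document.Load(pathHtml)

        ' ToArray takes a snapshot, so appending nodes does not
        ' disturb the enumeration of the rows.
        For Each tr As HtmlNode In document.DocumentNode.Descendants("tr").ToArray()
            tr.AppendChild(HtmlNode.CreateNode("<div></div>"))
            ' A text node holding a real CR+LF, not the literal "\r\n".
            tr.AppendChild(HtmlNode.CreateNode(vbCrLf))
            tr.AppendChild(HtmlNode.CreateNode("<div></div>"))
            tr.AppendChild(HtmlNode.CreateNode(vbCrLf))
        Next

        document.Save(pathHtml)
    End Sub
End Module
```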

