
Avoid spaceless concatenation with JSoup

Suppose I have a div like this:

<div>
This is a paragraph
written by someone
on the internet.
</div>

The problem is that when JSoup parses this, it puts everything on one line, so that when I call text() it reads like this:

This is a paragraphwritten by someoneon the internet.

Now, I realize this isn't really a JSoup problem, in that the actual HTML doesn't contain a space. However, is there any way to make JSoup add a space between lines as it parses (perhaps an override, or an option I haven't seen)? I imagine it must be possible (I can inspect the element in Chrome, unselect word wrap, and get what I want), but I'm not sure JSoup can do this.

Any thoughts?

Can you provide a full example of your code? What version of jsoup are you using?

In the current version (1.6.1), this code:

Document doc = Jsoup.parse("<div>\n" +
    "This is a paragraph\n" +
    "written by someone\n" +
    "on the internet.\n" +
    "</div>");
System.out.println(doc.text());

Produces:

This is a paragraph written by someone on the internet.

I.e., \n (and \r\n, etc.) are converted to spaces in the text output.
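To make the behaviour described above concrete without depending on jsoup, here is a minimal plain-Java sketch of that kind of whitespace normalization. The class WhitespaceSketch and the helper normalizeWhitespace are hypothetical names made up for illustration; this is not jsoup's implementation, only an approximation of the effect the answer describes (runs of whitespace, including \n and \r\n, become a single space).

```java
// Plain-Java sketch; it does not call jsoup and only imitates the described effect.
final class WhitespaceSketch {

    // Hypothetical helper: collapse every run of whitespace (spaces, tabs,
    // \n, \r\n, ...) into one space, then trim the ends.
    static String normalizeWhitespace(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // The inner text of the div from the question, with mixed line endings.
        String inner = "\nThis is a paragraph\r\nwritten by someone\non the internet.\n";

        // Prints: This is a paragraph written by someone on the internet.
        System.out.println(normalizeWhitespace(inner));
    }
}
```

If the text you get back has no spaces at all between the lines, a useful check is whether the line breaks were already stripped from the HTML string before it was handed to the parser.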

Happy to fix or improve it, if I can replicate :)

The following post shows how to get everything, including the line breaks:

Removing HTML entities while preserving line breaks with JSoup

The answer to the following question, and the comment on it, describe another way (read the comment):

Remove HTML tags from a String

And this one offers yet another way, if you check all the answers and comments:

How do I preserve line breaks when using jsoup to convert html to plain text?
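For readers who want the opposite behaviour (keep one output line per source line rather than joining them with spaces), the idea can be sketched in plain Java on text that still contains its line breaks. This is only an illustration under that assumption: LineBreakSketch and normalizeKeepingLines are hypothetical names, and the code is not taken from the linked posts or from jsoup's API.

```java
// Plain-Java sketch; assumes the input text still contains its line breaks.
final class LineBreakSketch {

    // Hypothetical helper: tidy whitespace inside each line, drop empty lines,
    // and join the remaining lines with a single '\n'.
    static String normalizeKeepingLines(String raw) {
        StringBuilder out = new StringBuilder();
        // \R matches any line terminator (\n, \r\n, \r, ...) in Java 8 and later.
        for (String line : raw.split("\\R")) {
            String cleaned = line.replaceAll("\\s+", " ").trim();
            if (cleaned.isEmpty()) {
                continue; // skip blank lines such as the one right after <div>
            }
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(cleaned);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String inner = "\nThis is  a paragraph\r\nwritten by someone\non the internet.\n";
        // Prints three lines, one per source line, with inner whitespace tidied.
        System.out.println(normalizeKeepingLines(inner));
    }
}
```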
