How can I get the actual displayed text from an HTML TextNode instead of the HTML markup?
I'm trying to turn a DOM node and all its children into a plain-text markup of my own design. I can use node.childNodes to get a list of all the content and recursively turn it into my string format.
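For readers who want something concrete, the recursive walk described above might look roughly like the sketch below. The output format (wrapping <em> content in asterisks) and the helper names toPlainMarkup, text, and el are hypothetical, and the tree is built from plain objects that only mimic the nodeType / nodeName / nodeValue / childNodes shape of real DOM nodes so that it runs without a browser.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

// Recursively convert a node (real or mock) into a plain-text markup string.
// The "*...*" convention for <em> is an invented example format.
function toPlainMarkup(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue; // raw text, including markup whitespace
  }
  const inner = Array.from(node.childNodes).map((child) => toPlainMarkup(child)).join("");
  return node.nodeName === "EM" ? `*${inner}*` : inner;
}

// Hypothetical helpers that build mock nodes for the demo.
const text = (nodeValue) => ({ nodeType: 3, nodeName: "#text", nodeValue, childNodes: [] });
const el = (nodeName, ...childNodes) => ({ nodeType: 1, nodeName, nodeValue: null, childNodes });

// Mimics: <p>\n    <em>text.</em> moretext\n</p>
const tree = el("P", text("\n    "), el("EM", text("text.")), text(" moretext\n"));

// JSON.stringify makes the leftover indentation and newlines visible.
console.log(JSON.stringify(toPlainMarkup(tree)));
```

The result still carries the leading indentation and trailing newline from the markup, which is exactly the problem the rest of the question is about.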
However, when I take the text out of a TextNode, it includes newlines and spaces that aren't visible on the page. For plain text I want the same appearance as the rendered HTML: there shouldn't be lots of indentation before the text or newlines after it, even if they were in the HTML markup, because my browser stripped those out when it rendered the HTML.
The obvious answer would be to .trim() the string myself, except that this can take out spaces that are supposed to exist in the text, in the case of something like <em>text.</em> moretext. The latter text node loses the space before it.
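To make the failure concrete, here is a small illustration. The three strings are an assumption about what the text node values would be for markup like <p>, a newline and indentation, <em>text.</em> moretext, a newline, and </p>.

```javascript
// Assumed text node values, in document order, for the markup described above.
const nodeValues = ["\n    ", "text.", " moretext\n"];

// Naive approach: trim every text node independently, then join.
const trimmed = nodeValues.map((s) => s.trim()).join("");

console.log(trimmed); // "text.moretext" (the space between the words is gone)
```

Trimming removes the unwanted indentation, but it also removes the one space that the browser does render.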
Even if that worked, it's also philosophically unappealing. I want this algorithm to be based on the text presented to the user. The webpage conceals implementation details like spaces, tabs, and newlines in the underlying markup, and I would like to remain within that abstraction, using whatever the browser used to trim them down rather than the approximation granted by trim(). Ideally there would be an equivalent of node.textContent that somehow has a list of both plain text and child elements.
I haven't been able to find anything about this, and I can't see a good way to code it to be smart about those spaces (short of comparing the .textContent and .nodeValue strings, or parsing innerHTML myself, or something). Help?
document.getElementById("someid").innerText.replace(/\s+/g," ")
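The one-liner above flattens the whole element into a single string, so it cannot say which text belongs to which child node. If per-node text is needed, one rough alternative is to collapse whitespace across a run of adjacent inline text nodes by hand. The sketch below is only an approximation of the default white-space: normal behaviour, not the browser's actual algorithm; the function name collapseInlineRun is hypothetical, and it assumes all the strings belong to one block with no block-level elements or <br> between them. It deliberately matches only spaces, tabs, and line breaks rather than \s, because \s in JavaScript also matches the non-breaking space, which browsers do not collapse.

```javascript
// Collapse whitespace across adjacent inline text node values (sketch only).
// Returns one collapsed string per input value, so results map back to nodes.
function collapseInlineRun(values) {
  const pieces = [];
  let prevEndsWithSpace = true; // at the start of a block, leading space is dropped

  for (const value of values) {
    // Runs of spaces, tabs, and line breaks become a single space.
    let piece = value.replace(/[ \t\n\r\f]+/g, " ");
    // A space that follows another space (or the block start) is not rendered.
    if (prevEndsWithSpace && piece.startsWith(" ")) {
      piece = piece.slice(1);
    }
    if (piece.length > 0) {
      prevEndsWithSpace = piece.endsWith(" ");
    }
    pieces.push(piece);
  }

  // A space at the very end of the block is not rendered either.
  for (let i = pieces.length - 1; i >= 0; i--) {
    if (pieces[i].length === 0) continue;
    if (pieces[i].endsWith(" ")) pieces[i] = pieces[i].slice(0, -1);
    break;
  }
  return pieces;
}

const pieces = collapseInlineRun(["\n    ", "text.", " moretext\n"]);
console.log(pieces);          // [ '', 'text.', ' moretext' ]
console.log(pieces.join("")); // "text. moretext"
```

Unlike per-node trim(), this keeps the space before "moretext" while still dropping the indentation and the trailing newline. It ignores CSS such as white-space: pre and hidden elements, so treat it as a starting point rather than a faithful renderer.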
The trim method removes whitespace at the start and end of a string, but not whitespace in the middle.
I've written an implementation of exactly this in the TextRange module of my Rangy library, but it's a lot of code to include just for this.
var displayedText = rangy.innerText(node);