How do I get the text actually displayed from an HTML TextNode, rather than the HTML markup?

Question

I'm trying to turn a DOM node and all its children into a plain-text markup format of my own design. I can use node.childNodes to get a list of all the content and recursively turn it into my string format.
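A minimal sketch of that recursive walk, assuming a hypothetical output format where `<em>` becomes `*...*` and every other element just contributes its children. It only reads the standard `nodeType`, `nodeName`, `nodeValue` and `childNodes` properties, so it also runs on plain objects shaped like DOM nodes (the `text` and `el` helpers are illustrative stand-ins, not DOM APIs). It copies each text node's value verbatim, which is exactly where the invisible markup whitespace leaks in.

```javascript
// Standard DOM nodeType constants.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical markup: wrap <em> content in asterisks, pass everything else through.
function toMarkup(node) {
  if (node.nodeType === TEXT_NODE) {
    // Raw nodeValue: indentation and newlines from the source HTML are kept.
    return node.nodeValue;
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return ""; // ignore comments, processing instructions, etc.
  }
  let inner = "";
  for (const child of node.childNodes) {
    inner += toMarkup(child);
  }
  return node.nodeName.toUpperCase() === "EM" ? "*" + inner + "*" : inner;
}

// Tiny stand-ins for DOM nodes so the sketch runs outside a browser.
const text = (value) => ({ nodeType: TEXT_NODE, nodeName: "#text", nodeValue: value, childNodes: [] });
const el = (name, ...children) => ({ nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null, childNodes: children });

// Roughly <p>\n    <em>text.</em> moretext\n  </p>
const p = el("P", text("\n    "), el("EM", text("text.")), text(" moretext\n  "));
console.log(JSON.stringify(toMarkup(p))); // "\n    *text.* moretext\n  "
```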

However, when I take the text out of a TextNode, it includes newlines and spaces that aren't visible on the page. For plain text I want the same appearance the HTML had: there shouldn't be lots of indentation before the text or newlines after it, even if they were in the HTML markup, because my browser stripped those out when it rendered the HTML.

The obvious answer would be to .trim() the string myself, except that this can remove spaces that are supposed to exist in the text, as in <em>text.</em> moretext. The latter text node loses the space before it.
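The difference between trimming each node and what the browser does can be sketched with strings alone. Under the default `white-space: normal`, a run of spaces, tabs and line breaks collapses to a single space, and the collapsing carries across adjacent text nodes instead of restarting at each one. The `collapseInline` function below is a hypothetical simplification, not the browser's algorithm: it assumes all segments are inline text inside one block and ignores `<br>`, block boundaries, `<pre>` and the other `white-space` values.

```javascript
// Per-node trim: the approach the question warns against.
function trimEach(segments) {
  return segments.map((s) => s.trim()).join("");
}

// Only ASCII space, tab and line breaks collapse under CSS;
// &nbsp; (U+00A0) is not in this set, unlike JavaScript's \s.
const CSS_SPACE = /[ \t\n\r]/;

// Simplified `white-space: normal` collapsing across inline text segments.
// Assumption: every segment belongs to the same block and is inline text.
function collapseInline(segments) {
  let out = "";
  let lastWasSpace = true; // true at block start, so leading whitespace is dropped
  for (const segment of segments) {
    for (const ch of segment) {
      if (CSS_SPACE.test(ch)) {
        if (!lastWasSpace) {
          out += " ";
          lastWasSpace = true;
        }
      } else {
        out += ch;
        lastWasSpace = false;
      }
    }
  }
  // Trailing whitespace at the end of the block is not rendered either.
  return out.endsWith(" ") ? out.slice(0, -1) : out;
}

// Text-node values for <p>\n    <em>text.</em> moretext\n  </p>
const segments = ["\n    ", "text.", " moretext\n  "];
console.log(JSON.stringify(trimEach(segments)));       // "text.moretext"
console.log(JSON.stringify(collapseInline(segments))); // "text. moretext"
```

The key design choice is that the "previous character was a space" flag lives outside the per-segment loop, which is what keeps the space before "moretext" while still dropping the indentation.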

Even if that worked, it's also philosophically unappealing. I want this algorithm to be based on the text presented to the user. The webpage conceals implementation details like spaces, tabs, and newlines in the underlying markup, and I would like to stay within that abstraction, using whatever the browser itself used to trim them down, rather than the approximation that trim() gives. Ideally there would be an equivalent of node.textContent that somehow gives a list of both plain text and child elements.

I haven't been able to find anything about this, and I can't see a good way to code it to be smart about those spaces (short of comparing the .textContent and .nodeValue strings, parsing innerHTML myself, or something similar). Help?

Answer 1

document.getElementById("someid").innerText.replace(/\s+/g," ")

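The trim method removes whitespace at the beginning and end of a string, but not in the middle; replace(/\s+/g, " ") collapses every run of whitespace, wherever it occurs, into a single space.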
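What the one-liner does to the string can be checked without a browser. Since `innerText` is a DOM property, the sketch below starts from a made-up string standing in for its result (an assumption for illustration). In current browsers `innerText` already applies the CSS whitespace rules, so the extra `replace` mainly flattens the line breaks it reports for blocks and `<br>`; note also that `\s` matches non-breaking spaces, so `&nbsp;` gets collapsed too.

```javascript
// Stand-in for element.innerText; in a browser you would read it from the element.
const rendered = "  Some   text.\n\n moretext\t ";

// Answer 1: collapse every run of whitespace (including newlines and &nbsp;) to one space.
const collapsed = rendered.replace(/\s+/g, " ");

// String.prototype.trim only touches the ends, leaving the inner runs alone.
const trimmed = rendered.trim();

console.log(JSON.stringify(collapsed)); // " Some text. moretext "
console.log(JSON.stringify(trimmed));   // "Some   text.\n\n moretext"
```

Chaining `.trim()` after the `replace` would also drop the single leading and trailing spaces that remain.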

Answer 2

I've written an implementation of exactly this in the TextRange module of my Rangy library, but it's a lot of code to include just for this:

var displayedText = rangy.innerText(node);

2 answers

Answer 1: score 0, posted 2013-02-19 03:23:55
Answer 2: score 0, posted 2013-02-19 10:55:45