
How can I get the actual displayed text from an HTML TextNode instead of the HTML markup?

I'm trying to turn a DOM node and all its children into a plain-text markup of my own design. I can use node.childNodes to get a list of all the content and recursively turn it into my string format.

However, when I take text out of a TextNode, it includes newlines and spaces that aren't visible on the page. For the plain text I want the same appearance the rendered HTML had, so there shouldn't be lots of indentation before the text or newlines after it, even if they were in the HTML markup, because my browser stripped those out when it rendered the HTML.

The obvious answer would be to .trim() the string myself, except that this can take out spaces that are supposed to exist in the text, in the case of something like <em>text.</em> moretext. The latter text node loses the space before it.
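To make the lost space concrete, here is a small sketch. The three strings are assumed text-node values for indented markup wrapping `<em>text.</em> moretext`; they are illustrative and not taken from a real page.

```javascript
// Assumed nodeValue strings, in document order, for markup like:
//   <p>
//     <em>text.</em> moretext
//   </p>
const nodeValues = ["\n  ", "text.", " moretext\n"];

// Trimming each text node on its own also drops the space after "text.".
const trimmedEach = nodeValues.map((s) => s.trim()).join("");

// Collapsing whitespace over the joined text keeps a single space there.
const collapsedWhole = nodeValues.join("").replace(/\s+/g, " ").trim();

console.log(JSON.stringify(trimmedEach));
console.log(JSON.stringify(collapsedWhole));
```

The difference is that the second version only trims at the outer edges of the whole text, so whitespace between sibling nodes survives as one space.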

Even if that worked, it's also philosophically unappealing. I want this algorithm to be based on the text presented to the user. The webpage conceals implementation details like spaces, tabs, and newlines in the underlying markup, and I would like to stay within that abstraction by using whatever the browser used to trim them down, rather than the approximation granted by trim(). Ideally there would be an equivalent of node.textContent that somehow gives a list of both plain text and child elements.

I haven't been able to find anything about this, and I can't see a good way to code it to be smart about those spaces (short of comparing the .textContent and .nodeValue strings, or parsing innerHTML myself, or something). Help?

document.getElementById("someid").innerText.replace(/\s+/g," ")

The trim method removes whitespace at the beginning and end of a string, but not whitespace in the middle.
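If the collapsed text is needed per text node while recursing over childNodes, the same whitespace collapsing can be carried across node boundaries. The following is only a rough sketch under stated assumptions: `collectDisplayedPieces` is a hypothetical helper name, the sample tree is a plain-object stand-in for real DOM nodes, and CSS is ignored entirely (no handling of `white-space: pre`, block-level line breaks, `<br>`, or hidden elements). JavaScript's `\s` also matches non-breaking spaces, which browsers do not collapse.

```javascript
// nodeType values as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: returns [{ node, text }] for each text node that
// contributes visible characters, collapsing whitespace across nodes.
function collectDisplayedPieces(root) {
  const pieces = [];
  let pendingSpace = false; // a whitespace run was seen but not yet emitted
  let emittedAny = false;   // while false, leading whitespace is dropped

  function visit(node) {
    if (node.nodeType === TEXT_NODE) {
      const value = node.nodeValue;
      const core = value.replace(/\s+/g, " ").trim();
      if (core === "") {
        // Whitespace-only (or empty) node: remember the gap, emit nothing.
        if (value.length > 0) pendingSpace = true;
        return;
      }
      const needsLead = emittedAny && (pendingSpace || /^\s/.test(value));
      pieces.push({ node, text: (needsLead ? " " : "") + core });
      emittedAny = true;
      pendingSpace = /\s$/.test(value);
      return;
    }
    if (node.nodeType === ELEMENT_NODE) {
      // Array.from accepts both a real NodeList and a plain array.
      for (const child of Array.from(node.childNodes || [])) visit(child);
    }
  }

  visit(root);
  return pieces;
}

// Plain-object stand-in for: <p>\n  <em>text.</em> moretext\n</p>
const sampleTree = {
  nodeType: ELEMENT_NODE,
  childNodes: [
    { nodeType: TEXT_NODE, nodeValue: "\n  " },
    { nodeType: ELEMENT_NODE, childNodes: [{ nodeType: TEXT_NODE, nodeValue: "text." }] },
    { nodeType: TEXT_NODE, nodeValue: " moretext\n" },
  ],
};

const displayedPieces = collectDisplayedPieces(sampleTree);
console.log(JSON.stringify(displayedPieces.map((p) => p.text).join("")));
```

In a browser the function could be given a real element, for example the result of document.getElementById("someid"). Each returned piece keeps a reference to its source text node, so a custom markup can be built from the pieces in order.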

I've written an implementation of exactly this in the TextRange module of my Rangy library, although it's a lot of code to include just for this.

var displayedText = rangy.innerText(node);

