Retrieve inner text with spacing

I want to extract text out of an arbitrary block of HTML. Naive attempt:

$('<div><p>Some</p>Inner<div>Text</div></div>').text()

This gives SomeInnerText, but I want Some Inner Text.

What is a better way to extract text out of HTML, while maintaining some concept of the visual structure with which the HTML would be rendered?

In the example above, new lines between block elements would be great, and spaces would be acceptable as a sort of "flattened" output.

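You can insert "&nbsp;" into the markup. Note that .text() then returns non-breaking spaces (U+00A0) rather than ordinary spaces: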

$('<div><p>Some&nbsp;</p>Inner&nbsp;<div>Text</div></div>').text();

Well, you can extend jQuery to do that:

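$.fn.textRespectingBlocks = function() {
    return this.map(function() {
        var $this = $(this);
        var display = $this.css('display');
        // Treat every display value other than 'none' and the inline-level ones as a block.
        var isBlock = display !== 'none' && display !== 'inline' && display !== 'inline-block' && display !== 'inline-flex' && display !== 'inline-table';
        var childText = Array.prototype.map.call(this.childNodes, function(node) {
            // Element node: recurse so nested blocks get their own padding.
            if (node.nodeType === 1) {
                return $(node).textRespectingBlocks();
            }

            // Text node: take its content as-is.
            if (node.nodeType === 3) {
                return node.nodeValue;
            }

            // Comments and other node types contribute no text.
            return '';
        }).join('');

        // Pad block-level content with a space on each side.
        return isBlock ? ' ' + childText + ' ' : childText;
    }).get().join('');
};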

Do a .replace(/^\s+|\s+$|\s(?=\s)/g, '') on the result, if you like. This trims the ends and collapses each run of whitespace to a single character.
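As a rough illustration of that cleanup step, the sketch below wraps the pattern (written with single backslashes, as a regex literal needs) in a helper. The name collapseWhitespace is hypothetical and not part of jQuery; it only assumes the input is a plain string such as the one returned by textRespectingBlocks().

```javascript
// Hypothetical helper (not part of jQuery): removes leading and trailing
// whitespace, and keeps only the last character of each whitespace run.
function collapseWhitespace(text) {
    return text.replace(/^\s+|\s+$|\s(?=\s)/g, '');
}

console.log(collapseWhitespace('  Some  Inner   Text '));
// prints "Some Inner Text"
```

Because the lookahead drops every whitespace character that is followed by another one, a run such as a newline followed by a space is reduced to its final character, so mixed runs are not always turned into a plain space.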

Use a regular expression to inject a space before every tag (opening and closing alike):

$('<div><p>Some</p>Inner<div>Text</div></div>'.replace(/</g, ' <')).text();
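To see what the injection does without a DOM, here is a minimal sketch in plain JavaScript. The function name textWithSpacing is hypothetical, and the regex-based tag stripping is only a crude stand-in for jQuery's .text(): it does not decode entities and would be confused by a ">" inside an attribute value.

```javascript
// Hypothetical DOM-free approximation of the one-liner above.
function textWithSpacing(html) {
    return html
        .replace(/</g, ' <')      // inject a space before every tag
        .replace(/<[^>]*>/g, '')  // naive tag stripping (assumption: simple, well-formed markup)
        .replace(/\s+/g, ' ')     // flatten whitespace runs to one space
        .trim();
}

console.log(textWithSpacing('<div><p>Some</p>Inner<div>Text</div></div>'));
// prints "Some Inner Text"
```

Since the space goes before every tag, inline markup gets split as well: textWithSpacing('<b>bo</b>ld') yields "bo ld". That may be acceptable for a flattened output, but it does not distinguish block elements from inline ones.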

Fiddle: http://jsfiddle.net/mattdlockyer/uau6S/

Simply adding the spaces yourself will do the trick. However, because browsers differ in how they parse HTML, the resulting whitespace may vary from one browser to another.

$('<div> <p>Some</p> Inner <div>Text</div></div>').text()
