
How to get part of a big text without losing HTML tags with PHP?

I get a large block of content from an API, something like this:

Lorem <div class="highlighted">ipsum dolor</div> 
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit 
esse cillum dolore eu fugiat nulla pariatur

I want to show around 10 words from this content, and I do not want to lose the <div class="highlighted">ipsum dolor</div> part. That is, the div and its class="highlighted" attribute should not be removed.

I tried this function:

function getPartialContent($content, $words_number)
{
    $no_tags_content = preg_replace("/\r|\n/", "", html_entity_decode(filter_var($content, FILTER_SANITIZE_STRING)));

    $words = explode(" ", $no_tags_content);
    $result = implode(" ", array_splice($words, 0, $words_number));
    return $result;
}

The only problem is that this function removes all HTML tags first. If I don't strip the tags first, the result is something like this (the div is not closed):

Lorem sed do eiusmod tempor incididunt is that this <div class="highlighted">ipsum

which is not what I want.

I expect the result to have properly closed tags or no tags at all. Usually there are one or two words in the div. The exact number of words in the result is not that important; I just want it to be short, around 10 to 15 words.

You could try something like this:

$rgxp = '/^(\W*(<[^>]+>\W*)?\w+(\W*<[^>]+>)?\W*){10,15}/';
preg_match($rgxp, $text, $mtch);
echo "\n",$mtch[0], "\n";
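
To try the pattern on the sample from the question, here is a minimal runnable sketch. The snippet above does not define `$text`, so assigning the question's sample to it and trimming the match into `$excerpt` are assumptions added for illustration.

```php
<?php
// Sample text from the question, assigned to $text
// (an assumption: the snippet above expects $text to exist already).
$text = <<<'HTML'
Lorem <div class="highlighted">ipsum dolor</div>
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
HTML;

$rgxp = '/^(\W*(<[^>]+>\W*)?\w+(\W*<[^>]+>)?\W*){10,15}/';

if (preg_match($rgxp, $text, $mtch)) {
    // $mtch[0] holds the first 10 to 15 words plus any tags between them;
    // trim() drops the whitespace the pattern consumes after the last word.
    $excerpt = trim($mtch[0]);
    echo $excerpt, "\n";
}
```

For this sample the excerpt keeps the whole highlighted div and stops after the fifteenth word, because the quantifier is greedy.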

The same pattern, expanded with comments:

$rgxp = '/
^             # start of string
(             # group to quantify
\W*           # skip whitespace and punctuation
(<[^>]+>\W*)? # optional opening tag group
\w+           # the word to count
(\W*<[^>]+>)? # optional closing tag group
\W*           # skip whitespace and punctuation
){10,15}      # repeat for 10 to 15 words
/x';
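
One caveat: the pattern counts words, not tag pairs, so if the fifteenth word falls inside an element, the closing tag is cut off. The question accepts "closed tags or no tags at all", so one possible safeguard is to fall back to plain text when the tags in the excerpt do not balance. The sketch below assumes a hypothetical helper name, `excerptWithSafeTags`, and a deliberately simple balance check (counting opening and closing tags); neither comes from the original answer.

```php
<?php
// Hypothetical helper (not part of the original answer): returns a short
// excerpt and keeps its tags only when opening and closing tags balance.
function excerptWithSafeTags(string $text, int $min = 10, int $max = 15): string
{
    $rgxp = '/^(\W*(<[^>]+>\W*)?\w+(\W*<[^>]+>)?\W*){' . $min . ',' . $max . '}/';

    if (!preg_match($rgxp, $text, $mtch)) {
        // No match (for example fewer than $min words): return plain text.
        return trim(strip_tags($text));
    }

    $excerpt = trim($mtch[0]);

    // Simplistic balance check: count opening and closing tags.
    // Void elements such as <br> count as unclosed, so they also
    // trigger the plain-text fallback.
    $opened = preg_match_all('/<[a-z][^>]*>/i', $excerpt);
    $closed = preg_match_all('/<\/[a-z][^>]*>/i', $excerpt);

    return $opened === $closed ? $excerpt : strip_tags($excerpt);
}

// The <b> element is still open at the 15th word, so tags are dropped.
$cut = 'one two three four five six seven eight nine <b>ten eleven twelve '
     . 'thirteen fourteen fifteen sixteen seventeen</b> end';
echo excerptWithSafeTags($cut), "\n";
```

Counting tags is a rough heuristic. For arbitrary HTML, a real parser such as PHP's DOM extension would be a more robust way to close or remove dangling elements.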
