
How to get part of a large text without losing HTML tags in PHP?

I get a large block of content from an API, something like this:

Lorem <div class="highlighted">ipsum dolor</div> 
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit 
esse cillum dolore eu fugiat nulla pariatur

I want to show around 10 words from this content without losing the <div class="highlighted">ipsum dolor</div> part: the div and its class="highlighted" attribute should not be removed.

I tried this function:

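function getPartialContent($content, $words_number)
{
    // Strip all tags, decode entities, and drop line breaks.
    // Note: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1;
    // strip_tags() is the usual replacement.
    $no_tags_content = preg_replace("/\r|\n/", "", html_entity_decode(filter_var($content, FILTER_SANITIZE_STRING)));

    // Keep only the first $words_number space-separated words.
    $words = explode(" ", $no_tags_content);
    $result = implode(" ", array_slice($words, 0, $words_number));
    return $result;
}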

The only problem is that this function removes all HTML tags first. If I don't strip the tags (the filter_var call), the result looks something like this (the div is not closed):

Lorem sed do eiusmod tempor incididunt is that this <div class="highlighted">ipsum

which is not what I want.

I expect the result to have closed tags or no tags at all. Usually there are one or two words in the div. The number of words in the result is not that important; I just want it to be short, around 10 to 15 words.

You could try something like this:

$rgxp = '/^(\W*(<[^>]+>\W*)?\w+(\W*<[^>]+>)?\W*){10,15}/';
// $text holds the HTML content returned by the API
if (preg_match($rgxp, $text, $mtch)) {
    echo "\n", $mtch[0], "\n";
}

Expanded:

$rgxp = '/
^             # start of string (no m flag)
(             # group to quantify
\W*           # ignore space & punctuation
(<[^>]+>\W*)? # optional opening tag group
\w+           # the words to count
(\W*<[^>]+>)? # optional closing tag group
\W*           # ignore space & punctuation
) {10,15}     # quantifier
/x';
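
The pattern keeps a tag together with the word next to it, but it does not guarantee that every tag it opens is also closed within the first 15 words, for example when the div holds more words than remain before the cut. Since the question accepts a result "without any tags at all" in that case, one possible way to wrap the regex is sketched below. The helper names excerptWithTags and plainExcerpt are hypothetical, and the balance check is deliberately rough (it only counts tags, so void elements such as <br> trigger the fallback).

```php
<?php
// Hypothetical helpers, not part of the original answer.

// Plain-text fallback: strip all tags and keep the first $max words.
function plainExcerpt(string $html, int $max): string
{
    $words = preg_split('/\s+/', trim(strip_tags($html)), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($words, 0, $max));
}

function excerptWithTags(string $html, int $min = 10, int $max = 15): string
{
    // Content that is already short enough is returned untouched.
    if (preg_match_all('/\w+/', strip_tags($html)) <= $max) {
        return trim($html);
    }

    // Same pattern as above, with the word limits interpolated.
    $rgxp = '/^(\W*(<[^>]+>\W*)?\w+(\W*<[^>]+>)?\W*){' . $min . ',' . $max . '}/';
    if (!preg_match($rgxp, $html, $mtch)) {
        return plainExcerpt($html, $max);
    }
    $excerpt = trim($mtch[0]);

    // Rough balance check: count opening and closing tags, and look for
    // a tag that was cut off before its closing ">".
    $opened   = preg_match_all('/<[a-z][^>]*>/i', $excerpt);
    $closed   = preg_match_all('/<\/[a-z][^>]*>/i', $excerpt);
    $dangling = preg_match('/<[^>]*$/', $excerpt);

    if ($opened !== $closed || $dangling) {
        // Assumption: an excerpt without tags is acceptable here.
        return plainExcerpt($html, $max);
    }
    return $excerpt;
}
```

With the sample content from the question, excerptWithTags($content) should keep the highlighted div intact, because the div closes within the first few words. Note that where the content starts with a tag, or two tags sit next to each other, the pattern can count a tag name as a word, so the result may come out a word or two shorter than expected.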
