How to split long HTML content into multiple divs without breaking words or formatting in PHP

So far I have:

public static function splitContent($string, $length, $maxCols){
    if (strlen($string) < ($length * $maxCols) && strlen($string) > $length){
        $string = wordwrap($string, $length, "||"); // assumes the string doesn't contain `||`
        $parts = explode("||", $string);
        $result = '';
        foreach ($parts as $part){
            $result = $result.'<div>'.$part.'</div>';
        }
        return $result;
    }
    return $string;
}

It works well when it comes to not breaking words, but it often splits HTML formatting tags, producing output like <span </div><div> style=....>. How can I prevent that? I see there are many problems like this when splitting an HTML-formatted string. Does anyone know of a library that does this without hassle? It would be great if it counted only visible characters.

As far as I know, this cannot be achieved by simple string splitting because, as you have already found out, there is a very high chance of breaking the HTML.

However, you could:

1) Read the HTML string char by char and track the tag structure

2) Load the HTML as an object and count the elements' text nodes

2.1) For loading you could use

  1. DOM - http://php.net/manual/en/book.dom.php
  2. SimpleXML - http://php.net/manual/en/book.simplexml.php (it parses XML, so the input must be well-formed XHTML)
  3. There are many more PHP libraries that handle HTML loading

2.2) Go through the loaded elements and count their text nodes

  1. Use an algorithm that walks through the node tree
  2. Count the characters in the text nodes until the count reaches the desired length
  3. After that, remove all text nodes that would come next in the display
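
Approach 2) could look roughly like the sketch below. It is only an illustration under a few assumptions: the names truncateHtml() and walkAndTrim() are made up for this example, the input is assumed to be a UTF-8 HTML fragment, and the mbstring extension is assumed to be available. It keeps the first $limit characters of text and removes every node that comes after them, so the tags always stay balanced.

```php
<?php
// Hypothetical helper: trims text nodes in document order until $remaining is used up.
function walkAndTrim(DOMNode $node, int &$remaining): void
{
    // Copy the child list first, because removing nodes from a live list skips items.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child);          // budget used up: drop what follows
        } elseif ($child instanceof DOMText) {
            $len = mb_strlen($child->data, 'UTF-8');
            if ($len > $remaining) {
                $child->data = mb_substr($child->data, 0, $remaining, 'UTF-8');
                $len = $remaining;
            }
            $remaining -= $len;
        } elseif ($child instanceof DOMElement) {
            walkAndTrim($child, $remaining);     // descend into nested elements
        }
    }
}

// Hypothetical helper: returns $html cut to at most $limit characters of text.
function truncateHtml(string $html, int $limit): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);            // tolerate sloppy markup quietly
    // The XML declaration is a common way to hint UTF-8; the wrapper div gives one root.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    $root = $doc->getElementsByTagName('div')->item(0);  // the wrapper added above
    walkAndTrim($root, $limit);

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);          // serialize without the wrapper
    }
    return $out;
}

echo truncateHtml('<p>Hello <b>big world</b> again</p>', 9);
```

With this input the text is cut after "Hello big", and the <b> and <p> elements remain properly closed because the serializer writes out whatever is left of the tree. Calling the function repeatedly on the remainder would be one way to build several columns.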

As for visible characters: PHP itself doesn't know what CSS applies to your elements, but if you load the HTML as an object you could, for example, call getAttribute('style') and search it for your "hiding" CSS :)
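
A minimal sketch of that idea, assuming the only "hiding" CSS you care about is an inline display:none or visibility:hidden. The helper names isHiddenInline() and visibleTextLength() are invented for this example, and rules coming from stylesheets or classes are not detected at all.

```php
<?php
// Hypothetical helper: true when the element's inline style hides it.
function isHiddenInline(DOMElement $el): bool
{
    // getAttribute() returns an empty string when the attribute is missing
    $style = strtolower($el->getAttribute('style'));
    return (bool) preg_match('/display\s*:\s*none|visibility\s*:\s*hidden/', $style);
}

// Hypothetical helper: counts text characters, skipping hidden subtrees.
function visibleTextLength(DOMNode $node): int
{
    $total = 0;
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $total += mb_strlen($child->data, 'UTF-8');
        } elseif ($child instanceof DOMElement && !isHiddenInline($child)) {
            $total += visibleTextLength($child);
        }
    }
    return $total;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<div>abc<span style="display: none">hidden</span><b>de</b></div>');
libxml_clear_errors();

$div = $doc->getElementsByTagName('div')->item(0);
echo visibleTextLength($div); // counts "abc" and "de" only
```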

Note: both approaches 1) and 2) cost some performance, so if you are applying this to a higher-traffic site you should consider some kind of caching for the results.
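
One possible shape for such a cache is sketched below. Everything in it is an assumption made for illustration: the function name cachedSplit(), the file-per-result layout, and the use of a writable directory. A real site would more likely plug into whatever cache it already has.

```php
<?php
// Hypothetical helper: runs $splitter once per distinct (content, length) pair.
function cachedSplit(string $html, int $length, callable $splitter, string $cacheDir): string
{
    // The key must depend on the parameters as well as on the content.
    $key  = hash('sha256', $length . '|' . $html);
    $file = rtrim($cacheDir, '/') . '/split_' . $key . '.html';

    if (is_file($file)) {
        return (string) file_get_contents($file);   // cache hit
    }

    $result = $splitter($html, $length);             // cache miss: do the real work
    file_put_contents($file, $result, LOCK_EX);
    return $result;
}
```

The splitter is passed in as a callable, so any of the approaches described here can be wrapped without changes; stale files have to be cleaned up separately.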

EDIT: regarding 1)

I've created an example function showing how to track open tags.

NOTE: this function assumes XHTML (self-closing tags such as <img> must be written as <img />). Also note that I wrote this quickly, so it may be neither the best nor the most efficient way to do it :)

You can see it working at http://ideone.com/erSDlg

//PHP
function closeTags( $html, $length = 20 ){
    $htmlLength = strlen($html);
    $unclosed = array();   // stack of tags opened but not yet closed
    $counter = 0;          // visible (non-tag) characters counted so far
    $i=0;
    while( ($i<$htmlLength) && ($counter<$length) ){
        if( $html[$i]=="<" ){
            $currentTag = "";
            $i++;
            if( ($i<$htmlLength) && ($html[$i]!="/") ){
                // opening or self-closing tag: read everything up to ">"
                // (stopping at the first "/" would misread attributes such as href='/x')
                while( ($i<$htmlLength) && ($html[$i]!=">") ){
                    $currentTag .= $html[$i];
                    $i++;
                }
                // a trailing "/" means self-closing (<img />), so nothing to close later
                if( substr($currentTag, -1) != "/" ){
                    $currentTag = explode(" ", $currentTag);
                    $unclosed[] = $currentTag[0];
                }
            } elseif( ($i<$htmlLength) && ($html[$i]=="/") ){
                // closing tag: drop the most recently opened tag and skip to ">"
                array_pop($unclosed);
                do{ $i++; } while( ($i<$htmlLength) && ($html[$i]!=">") );
            }
        } else{
            $counter++;
        }
        $i++;
    }
    // $i is the index just past the last processed character
    $result = substr($html, 0, $i);
    $unclosed = array_reverse( $unclosed );
    foreach( $unclosed as $tag ) $result .= '</'.$tag.'>';
    return $result;
}

$html = "<div>123890<span>1234<img src='i.png' /></span>567890<div><div style='test' class='nice'>asfaasf";
echo closeTags( $html, 20 );
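
The same tag-tracking idea can be pushed further to produce several divs, which is what the question asks for. The sketch below is an assumption-laden illustration, not a finished library: the names closeAll() and splitHtmlIntoDivs() are invented, the XHTML assumption from closeTags() still applies, lengths are counted in bytes, and a chunk ends at the first space after at least $length visible bytes (so chunks are at least that long, not at most). Open tags are closed at the end of a chunk and reopened, with their attributes, at the start of the next one.

```php
<?php
// Hypothetical helper: closing tags for a stack of opening tags, innermost first.
function closeAll(array $open): string
{
    $out = '';
    foreach (array_reverse($open) as $tag) {
        preg_match('/^<([^\s>]+)/', $tag, $m);   // tag name from e.g. '<span class="x">'
        $out .= '</' . $m[1] . '>';
    }
    return $out;
}

// Hypothetical helper: wraps word-safe, tag-safe chunks of $html in <div> elements.
function splitHtmlIntoDivs(string $html, int $length): string
{
    $chunks  = [];   // finished chunks
    $current = '';   // chunk being built
    $open    = [];   // stack of full opening tags
    $count   = 0;    // visible bytes in the current chunk
    $n       = strlen($html);

    for ($i = 0; $i < $n; $i++) {
        if ($html[$i] === '<') {
            // copy the whole tag verbatim so it can never be split
            $end = strpos($html, '>', $i);
            if ($end === false) {
                $end = $n - 1;                   // unterminated tag: take the rest
            }
            $tag = substr($html, $i, $end - $i + 1);
            if (substr($tag, 0, 2) === '</') {
                array_pop($open);                // closing tag
            } elseif (substr($tag, -2) !== '/>') {
                $open[] = $tag;                  // opening tag (not self-closing)
            }
            $current .= $tag;
            $i = $end;
            continue;
        }
        if ($count >= $length && $html[$i] === ' ') {
            // long enough and at a word boundary: finish this chunk
            $chunks[] = $current . closeAll($open);
            $current  = implode('', $open);      // reopen the same tags
            $count    = 0;
            continue;                            // the space itself is dropped
        }
        $current .= $html[$i];
        $count++;
    }
    if ($count > 0 || !$chunks) {
        $chunks[] = $current;                    // remainder (tags closed by the input)
    }

    $result = '';
    foreach ($chunks as $chunk) {
        $result .= '<div>' . $chunk . '</div>';
    }
    return $result;
}

echo splitHtmlIntoDivs('<p>one two <b>three four</b> five</p>', 8);
// <div><p>one two <b>three</b></p></div><div><p><b>four</b> five</p></div>
```

Because a break only ever happens at an ASCII space outside a tag, neither words, tags, nor multi-byte UTF-8 characters get cut in half. Comments, doctype declarations and ">" inside attribute values are not handled.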

I had to split arbitrary HTML text into 2 equal parts to display them in 2 columns next to each other.

The logic below splits the HTML into 2 parts, taking into account word boundaries and HTML tags. With a bit more effort you can extend it to split the HTML into multiple divs.

I have used @jave.web's logic to close the unclosed HTML tags.

// splitHtmlTextIntoTwoEqualColumnsTrait.php
<?php

trait splitHtmlTextIntoTwoEqualColumnsTrait
{
    protected function splitHtmlTextIntoTwoEqualColumns(string $htmlText): array
    {
        // removes unnecessary characters and HTML tags
        $htmlText = str_replace("\xc2\xa0",' ',$htmlText);
        $pureText = $this->getPureText($htmlText);

        // calculates the length of the text in characters, matching the
        // mb_strlen() used in getPositionOfMiddleWord()
        $fullLength = mb_strlen($pureText, 'UTF-8');
        $halfLength = (int) ceil($fullLength / 2);

        $words = explode(' ', $pureText);

        // finds the word which is in the middle of the text
        $middleWordPosition = $this->getPositionOfMiddleWord($words, $halfLength);

        // iterates through the HTML and split the text into 2 parts when it reaches the middle word.
        $columns = $this->splitHtmlStringInto2Strings($htmlText, $middleWordPosition);

        return $this->closeUnclosedHtmlTags($columns, $halfLength*2);
    }

    private function getPureText(string $htmlText): string
    {
        $pureText = strip_tags($htmlText);
        $pureText = preg_replace('/[\x00-\x1F\x7F]/', '', $pureText);

        return str_replace(["\r\n", "\r", "\n"], ['','',''], $pureText);
    }

    /**
     * finds the word which is in the middle of the text
     */
    private function getPositionOfMiddleWord(array $words, int $halfLength): int
    {
        $wordPosition = 0;
        $stringLength = 0;
        for ($p=0; $p<count($words); $p++) {
            $stringLength += mb_strlen($words[$p], 'UTF-8') + 1;
            if ($stringLength > $halfLength) {
                $wordPosition = $p;
                break;
            }
        }

        return $wordPosition;
    }

    /**
     * iterates through the HTML and split the text into 2 parts when it reaches the middle word.
     */
    private function splitHtmlStringInto2Strings(string $htmlText, int $wordPosition): array
    {
        $columns = [
            1 => '',
            2 => '',
        ];
        $columnId = 1;
        $wordCounter = 0;
        $inHtmlTag = false;
        for ($s=0; $s <= strlen($htmlText)-1; $s++) {
            if ($inHtmlTag === false && $htmlText[$s] === '<') {
                $inHtmlTag = true;
            }

            if ($inHtmlTag === true) {
                $columns[$columnId] .= $htmlText[$s];
                if ($htmlText[$s] === '>') {
                    $inHtmlTag = false;
                }
            } else {
                if ($htmlText[$s] === ' ') {
                    $wordCounter++;
                }
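                // switch to the second column only once; incrementing again would
                // write to a non-existent third column and lose that text
                if ($columnId === 1 && $wordCounter > $wordPosition) {
                    $columnId = 2;
                }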

                $columns[$columnId] .= $htmlText[$s];
            }
        }

        return array_map('trim', $columns);
    }

    private function closeUnclosedHtmlTags(array $columns, int $maxLength): array
    {
        $column1 = $columns[1];
        $unclosedTags = $this->getUnclosedHtmlTags($columns[1], $maxLength);
        foreach (array_reverse($unclosedTags) as $tag) {
            $column1 .= '</' . $tag . '>';
        }

        $column2 = '';
        foreach ($unclosedTags as $tag) {
            $column2 .= '<' . $tag . '>';
        }
        $column2 .= $columns[2];

        return [$column1, $column2];
    }

    /**
     * https://stackoverflow.com/a/26175271/5356216
     */
    private function getUnclosedHtmlTags(string $html, int $maxLength = 250): array
    {
        $htmlLength = strlen($html);
        $unclosed   = [];
        $counter    = 0;
        $i          = 0;
        while (($i < $htmlLength) && ($counter < $maxLength)) {
            if ($html[$i] == "<") {
                $currentTag = "";
                $i++;
                if (($i < $htmlLength) && ($html[$i] != "/")) {
                    // read the whole tag; stopping at "/" would misread attributes such as href="/x"
                    while (($i < $htmlLength) && ($html[$i] != ">")) {
                        $currentTag .= $html[$i];
                        $i++;
                    }
                    // a trailing "/" marks a self-closing tag (<img />), which stays off the stack
                    if (substr($currentTag, -1) != "/") {
                        $currentTag = explode(" ", $currentTag);
                        $unclosed[] = $currentTag[0];
                    }
                } elseif (($i < $htmlLength) && ($html[$i] == "/")) {
                    array_pop($unclosed);
                    do {
                        $i++;
                    } while (($i < $htmlLength) && ($html[$i] != ">"));
                }
            } else {
                $counter++;
            }
            $i++;
        }

        return $unclosed;
    }

}

How to use it:

// yourClass.php
<?php
declare(strict_types=1);

class yourClass
{
    use splitHtmlTextIntoTwoEqualColumnsTrait;

    public function do()
    {
        // your logic
        $htmlString = '';
        [$column1, $column2] = $this->splitHtmlTextIntoTwoEqualColumns($htmlString);
    }

}
