PHP: Accurate string splitting by words and tags into array

The task is to split a string into an array of chunks of about 500 characters each. I've done this with str_split, but I've run into a problem. Of course the text must be split on word boundaries, or it becomes unreadable. More than that, the text contains links, and links (in fact any HTML) will break if I split inside them. So a split should happen only where a tag has already ended or has not started yet, and the same goes for words. Being off by ±100 characters is not a problem.

I would really appreciate a piece of code to do that. I'm not very good with regexps.

EDIT: Example

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec ac diam non nisl interdum tempus. Nam id ipsum id nunc tempus varius. Suspendisse ut neque a velit elementum placerat. Curabitur lobortis, lorem sit <a href="#">amet tincidunt ultricies,</a> eros ante feugiat dui, sit amet lacinia metus risus a magna. Duis velit dui, sollicitudin at aliquet et, elementum at dui. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;

Script:

<?php

$str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href=\"http://example.com\">Phasellus condimentum
facilisis ipsum</a>, quis elementum urna ornare non. Cras nisi libero, dapibus sed euismod id, pharetra eu libero.
Maecenas mi nulla, ultrices in congue in, viverra ac massa. Quisque <br/>at turpis nulla. Suspendisse semper urna eu
augue aliquet dictum. Mauris at purus in lectus varius bibendum. <em>Fusce hendrerit <strong>posuere ante</strong></em>,
at pellentesque odio lobortis at. Integer quis urna eget ipsum dictum volutpat quis et leo. Etiam hendrerit eleifend
ornare. Phasellus eget justo elit.";

$str = str_split($str, 200);

var_dump($str);

Output:

array(4) {
  [0]=>
  string(200) "Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href="http://example.com">Phasellus condimentum 
facilisis ipsum</a>, quis elementum urna ornare non. Cras nisi libero, dapibus sed euismod "
  [1]=>
  string(200) "id, pharetra eu libero. 
Maecenas mi nulla, ultrices in congue in, viverra ac massa. Quisque <br/>at turpis nulla. Suspendisse semper urna eu 
augue aliquet dictum. Mauris at purus in lectus varius bi"
  [2]=>
  string(200) "bendum. <em>Fusce hendrerit <strong>posuere ante</strong></em>, 
at pellentesque odio lobortis at. Integer quis urna eget ipsum dictum volutpat quis et leo. Etiam hendrerit eleifend 
ornare. Phasellus"
  [3]=>
  string(17) " eget justo elit."
}

It's a hard split at fixed character positions: half of a word ends up in $str[1]. And if a link had been at that position, it would have been corrupted.

It would probably be best to do this not with regexes but with PHP's native XML/HTML parsing capabilities. Something like the following code may well do what you want:

<?php

$str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href=\"http://example.com\">Phasellus condimentum facilisis ipsum</a>, quis elementum urna ornare non. Cras nisi libero, dapibus sed euismod id, pharetra eu libero. Maecenas mi nulla, ultrices in congue in, viverra ac massa. Quisque <br/>at turpis nulla. Suspendisse semper urna eu augue aliquet dictum. Mauris at purus in lectus varius bibendum. <em>Fusce hendrerit <strong>posuere ante</strong></em>, at pellentesque odio lobortis at. Integer quis urna eget ipsum dictum volutpat quis et leo. Etiam hendrerit eleifend ornare. Phasellus eget justo elit.";

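$dom = new DOMDocument;

// Load the markup into a fragment; appendXML() expects well-formed XML.
$root = $dom->createDocumentFragment();
$root->appendXML($str);

$bits = array();

foreach ($root->childNodes as $node) {
    if ($node->nodeType == XML_TEXT_NODE) {
        // Plain text between tags: split it into words.
        $bits = array_merge($bits, explode(' ', $node->nodeValue));
    } elseif ($node->nodeType == XML_ELEMENT_NODE) {
        // An element: attach a deep copy to the document, serialize it
        // as one unbreakable piece, then detach it again.
        $dom->appendChild($newnode = $node->cloneNode(true));
        $bits[] = $dom->saveHTML();
        $dom->removeChild($newnode);
    }
}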

var_dump($bits);
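
The code above only produces the indivisible pieces (single words and whole top-level elements); it does not yet group them into chunks of roughly 500 characters. One possible way to do that grouping is sketched below. The function name packBits and its $limit parameter are hypothetical and not part of the original answer, and the sketch assumes that joining pieces with a single space is acceptable, which normalizes the whitespace around tags.

```php
<?php

/**
 * Hypothetical helper (an illustration, not from the original answer):
 * greedily packs pieces into chunks of at most $limit characters,
 * never cutting inside a piece.
 */
function packBits(array $bits, int $limit = 500): array
{
    $chunks = array();
    $current = '';

    foreach ($bits as $bit) {
        $bit = trim($bit);
        if ($bit === '') {
            continue; // explode(' ') yields empty strings next to tags
        }

        // What the current chunk would look like with this piece added.
        $candidate = ($current === '') ? $bit : $current . ' ' . $bit;

        if ($current !== '' && strlen($candidate) > $limit) {
            // Adding the piece would exceed the limit: close the chunk
            // and start a new one with this piece.
            $chunks[] = $current;
            $current = $bit;
        } else {
            $current = $candidate;
        }
    }

    if ($current !== '') {
        $chunks[] = $current; // flush the last, partially filled chunk
    }

    return $chunks;
}

// Sample input shaped like $bits: plain words plus one whole element.
$sample = array('Lorem', 'ipsum', '<a href="#">dolor sit</a>', 'amet,', 'consectetur');

// A small limit of 30 makes the grouping visible; this yields three chunks:
// "Lorem ipsum", the whole link, and "amet, consectetur".
print_r(packBits($sample, 30));
```

A single piece longer than $limit (for example a long link) is kept whole and simply produces an oversized chunk, which fits the ±100 character tolerance mentioned in the question. Note that strlen counts bytes; for multibyte text, mb_strlen would be the closer match to "characters".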
