
PHP - How to split a string into an array by regular expression at whitespace and HTML tags?

Here is an example string:

$string = '<strong>Lorem ipsum dolor</strong> sit <img src="test.png" /> amet <span class="test" style="color:red">consec<i>tet</i>uer</span>.';

I want to split the string into an array so that a split happens whenever whitespace or an HTML tag is encountered (ignoring whitespace inside HTML tags). For example:

Array
(
    [0] => <strong>
    [1] => Lorem
    [2] => ipsum
    [3] => dolor
    [4] => </strong>
    [5] => sit
    [6] => <img src="test.png" />
    [7] => amet
    [8] => <span class="test" style="color:red">
    [9] => consec
    [10] => <i>
    [11] => tet
    [12] => </i>
    [13] => uer
    [14] => </span>
    [15] => .
)

But I am unable to achieve this. I used preg_split to implement the idea, but I think my regular expressions are wrong. Below are some expressions I tried, but the results are not what I want.

$chars = preg_split('/(<[^>]*[^\/]>)/i', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

/* Results */

Array
(
    [0] => <strong>
    [1] => Lorem ipsum dolor
    [2] => </strong>
    [3] =>  sit <img src="test.png" /> amet 
    [4] => <span class="test" style="color:red">
    [5] => consec
    [6] => <i>
    [7] => tet
    [8] => </i>
    [9] => uer
    [10] => </span>
    [11] => .
)
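Two things go wrong with this first pattern. It has no whitespace alternative, so the words between tags stay together. Also, the [^\/] just before > rejects any tag whose last character before > is a slash, so the self-closing <img src="test.png" /> is never used as a delimiter. Here is a minimal PHP check of the second point, using two tags from the example string (the variable names are only for illustration):

```php
<?php
// First delimiter pattern from the question: "<", anything but ">",
// then one character that is not "/", then ">".
$pattern = '/<[^>]*[^\/]>/';

// A normal opening tag matches: the character before ">" is "g".
$matchesStrong = preg_match($pattern, '<strong>');

// A self-closing tag does not match: its only ">" is preceded by "/",
// and [^>]* cannot extend past ">" to find another ending.
$matchesImg = preg_match($pattern, '<img src="test.png" />');

var_dump($matchesStrong); // int(1)
var_dump($matchesImg);    // int(0)
```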

And here is the result of another regular expression:

$chars = preg_split('/\s+(?![^<>]*>)/x', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

/* Results */
Array
(
    [0] => <strong>Lorem
    [1] => ipsum
    [2] => dolor</strong>
    [3] => sit
    [4] => <img src="test.png" />
    [5] => amet
    [6] => <span class="test" style="color:red">consec<i>tet</i>uer</span>.
)
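This pattern handles the whitespace correctly, but it never splits at a tag. The negative lookahead (?![^<>]*>) rejects a whitespace run when the text after it reaches a > before any <. That is what happens for whitespace inside a tag, so the spaces in <span class="test" style="color:red"> are preserved. Nothing in the pattern matches a tag boundary, though, so tags stay attached to the neighbouring text. Below is a small sketch on a shortened input, which is an assumption chosen only for illustration:

```php
<?php
// Second pattern from the question: whitespace NOT followed by text that
// reaches ">" before any "<" (roughly: whitespace outside a tag).
$pattern = '/\s+(?![^<>]*>)/x';

// Hypothetical shortened input: spaces inside the tag, one space in the text.
$input = '<span class="test" style="color:red">a b</span>';

$parts = preg_split($pattern, $input, -1, PREG_SPLIT_NO_EMPTY);

// Only the space between "a" and "b" is a split point, so the result is
// '<span class="test" style="color:red">a' and 'b</span>': the tags
// are never separated from the neighbouring text.
print_r($parts);
```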

And here is the result of a third expression, which is quite close:

$chars = preg_split('/\s*(<[^>]*>)/i', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

/* Results */
Array
(
    [0] => <strong>
    [1] => Lorem ipsum dolor
    [2] => </strong>
    [3] =>  sit
    [4] => <img src="test.png" />
    [5] =>  amet
    [6] => <span class="test" style="color:red">
    [7] => consec
    [8] => <i>
    [9] => tet
    [10] => </i>
    [11] => uer
    [12] => </span>
    [13] => .
)

You're almost there. Change <[^>]*> to the more specific regex <\/?\w+[^<>]*>. It only accepts a < followed by an optional / and a tag name, and it cannot run past another < or >. Then add an alternation for whitespace, |\s+, outside the capture group so the whitespace itself is discarded. You don't need the i flag either, because the pattern contains no literal letters:

// -1 means "no limit"; passing null for $limit is deprecated as of PHP 8.1
$chars = preg_split('/(<\/?\w+[^<>]*>)|\s+/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
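Here is a self-contained version of the answer applied to the example string from the question. The print_r output should list the same 16 elements as the desired array above:

```php
<?php
$string = '<strong>Lorem ipsum dolor</strong> sit <img src="test.png" /> amet <span class="test" style="color:red">consec<i>tet</i>uer</span>.';

// Alternative 1 (captured): an opening, closing or self-closing tag, i.e.
//   "<", an optional "/", a tag name, then anything except "<" or ">".
// Alternative 2 (not captured): a run of whitespace between words/tags.
// PREG_SPLIT_DELIM_CAPTURE returns the captured tags as elements;
// PREG_SPLIT_NO_EMPTY drops the empty strings between adjacent delimiters.
$tokens = preg_split(
    '/(<\/?\w+[^<>]*>)|\s+/',
    $string,
    -1, // no limit
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

// Prints the 16 elements listed as the desired output in the question.
print_r($tokens);
```

Whitespace inside <span class="test" style="color:red"> is never split. The first alternative matches each tag starting at its <, and that position comes before any of the spaces inside the tag. This is a regex heuristic rather than an HTML parser: an attribute value containing > or a comment such as <!-- ... --> is not kept as one token.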

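Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original source. For any questions, contact: yoyou2525@163.com.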

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM