
I'm trying to parse some text in HTML with custom nested tags

I would like to parse some text into an array.

My text looks like this:

You've come to the {right; correct; appropriate} place! Start by {searching; probing; inquiring} our site below, or {browse; {search; lookup; examine}} our list of popular support articles.

The third group of words has nested tags. How can I ignore the opening and closing nested tags to get an array such as:

$tags[0][0] = 'right';
$tags[0][1] = 'correct';
$tags[0][2] = 'appropriate';
$tags[1][0] = 'searching';
$tags[1][1] = 'probing';
$tags[1][2] = 'inquiring';
$tags[2][0] = 'browse';
$tags[2][1] = 'search';
$tags[2][2] = 'lookup';
$tags[2][3] = 'examine';

Essentially, I want to ignore the nesting of the tags. Any help would be greatly appreciated.

My only current idea is to traverse the text character by character until I find a {, which would increment a "depth" variable. I would capture the words in between until I find a }, which decrements the depth variable, and stop capturing words once it returns to zero. I was just wondering if there's a much easier way of doing this. Thanks.
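The depth-counter idea described above can be sketched roughly as follows. This is only an illustration under a few assumptions: `parseGroups()` is a hypothetical helper name, alternatives are separated by `;`, and any text outside braces is ignored.

```php
<?php

/**
 * Collect the ';'-separated words of each top-level {...} group,
 * flattening any nested groups into their parent.
 * parseGroups() is a hypothetical helper name, not a PHP built-in.
 */
function parseGroups(string $text): array
{
    $groups = [];   // one list of words per top-level group
    $depth  = 0;    // current brace nesting level
    $buffer = '';   // characters of the word currently being read

    // Store the buffered word (if any) in the most recent group.
    $flush = function () use (&$groups, &$buffer) {
        $word = trim($buffer);
        if ($word !== '') {
            $groups[count($groups) - 1][] = $word;
        }
        $buffer = '';
    };

    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $ch = $text[$i];
        if ($ch === '{') {
            if ($depth === 0) {
                $groups[] = [];   // a new top-level group starts
            } else {
                $flush();         // a nested brace also ends a word
            }
            $depth++;
        } elseif ($ch === '}' && $depth > 0) {
            $flush();
            $depth--;
        } elseif ($ch === ';' && $depth > 0) {
            $flush();
        } elseif ($depth > 0) {
            $buffer .= $ch;       // only text inside braces is kept
        }
    }

    return $groups;
}

$text = "You've come to the {right; correct; appropriate} place! "
      . "Start by {searching; probing; inquiring} our site below, or "
      . "{browse; {search; lookup; examine}} our list of popular support articles.";

$tags = parseGroups($text);
print_r($tags);
```

Because nested braces only end the current word and never start a new group, the third group comes out flat, which is the shape of the array asked for above.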

Thanks for your excellent help. I modified it a bit to come up with the following solution.

$code = "You've come to {the right; the correct; the appropriate} place! 
    Start by {searching; probing; inquiring} our site below, or 
    {browse; {search; {foo; bar}; lookup}; examine} our list of 
    popular support articles.";
echo $code."\r\n\r\n";

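// Match each top-level {...} group; (?R) recurses into nested braces.
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $code, $matches);
$arr = array();
$r = array('{','}');

foreach ($matches[1] as $k1 => $m)
{
    // Drop the nested braces, then split the group on ';'.
    $ths = explode(';', str_replace($r, '', $m));
    foreach ($ths as $key => $val)
    {
        $val = trim($val);
        if ($val != '')
        {
            $arr[$k1][$key] = $val;
        }
    }
    // Swap the whole group for a numbered placeholder (once per group).
    $code = str_replace($matches[0][$k1], '[[rep'.$k1.']]', $code);
}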
echo $code;

Returns:

You've come to {the right; the correct; the appropriate} place! Start by {searching; probing; inquiring} our site below, or {browse; {search; {foo; bar}; lookup}; examine} our list of popular support articles.

You've come to [[rep0]] place! Start by [[rep1]] our site below, or [[rep2]] our list of popular support articles.
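The post does not say what the `[[repN]]` placeholders are used for afterwards. Assuming the goal is to substitute one alternative back into each slot, a minimal sketch could look like this; the `$code` and `$arr` values below are hypothetical stand-ins shaped like the ones the loop above produces.

```php
<?php

// Hypothetical inputs, shaped like $code and $arr after the loop has run.
$code = "You've come to [[rep0]] place! Start by [[rep1]] our site below, "
      . "or [[rep2]] our list of popular support articles.";
$arr = [
    0 => ['the right', 'the correct', 'the appropriate'],
    1 => ['searching', 'probing', 'inquiring'],
    2 => ['browse', 'search', 'foo', 'bar', 'lookup', 'examine'],
];

// Build a placeholder => replacement map, picking one alternative per group.
$map = [];
foreach ($arr as $k => $choices) {
    // array_rand() returns an existing key, so gaps in the keys are harmless.
    $map['[[rep' . $k . ']]'] = $choices[array_rand($choices)];
}

// strtr() replaces every placeholder in a single pass.
$result = strtr($code, $map);
echo $result, "\n";
```

The choice is random here, so the output differs between runs; a fixed index could be used instead if a deterministic result is wanted.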

My only current idea is to traverse the text character by character until I find a {, which would increment a "depth" variable. I would capture the words in between until I find a }, which decrements the depth variable, and stop capturing words once it returns to zero. I was just wondering if there's a much easier way of doing this.

That sounds like a reasonable way to do it. Another way is to use a bit of regex, although that might result in a solution that is (far) less readable (and therefore less maintainable) than your own.

<?php

$text = "You've come to the {right; correct; appropriate} place! 
    Start by {searching; probing; inquiring} our site below, or 
    {browse; {search; {foo; bar}; lookup}; examine} our list of 
    popular support articles. {the right; the correct; the appropriate}";

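// Match each top-level {...} group; (?R) recurses so nested braces stay balanced.
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $text, $matches);

$arr = array();

foreach($matches[1] as $m) {
  // Pull out every run of word characters (inner whitespace allowed),
  // which skips the ';', '{' and '}' separators.
  preg_match_all('/\w([\w\s]*\w)?/', $m, $words);
  $arr[] = $words[0];
}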

print_r($arr);

?>

would produce:

Array
(
    [0] => Array
        (
            [0] => right
            [1] => correct
            [2] => appropriate
        )

    [1] => Array
        (
            [0] => searching
            [1] => probing
            [2] => inquiring
        )

    [2] => Array
        (
            [0] => browse
            [1] => search
            [2] => foo
            [3] => bar
            [4] => lookup
            [5] => examine
        )

    [3] => Array
        (
            [0] => the right
            [1] => the correct
            [2] => the appropriate
        )

)
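To make the role of `(?R)` a little more concrete, here is a small sketch that prints what the first `preg_match_all` call captures for a single nested group, before any words are split out. The shortened `$text` is an assumption chosen for illustration.

```php
<?php

// Shortened sample input (an assumption for illustration).
$text = "or {browse; {search; {foo; bar}; lookup}; examine} our list";

// (?R) re-enters the whole pattern, so each nested {...} is consumed
// as part of the outermost match instead of being reported separately.
preg_match_all('/{((?:[^{}]*|(?R))*)}/', $text, $matches);

echo $matches[0][0], "\n";      // full match, including the outer braces
echo $matches[1][0], "\n";      // group 1: the content between the outer braces
echo count($matches[0]), "\n";  // number of top-level groups found
```

Group 1 still contains the inner braces, which is why both solutions above strip or skip `{` and `}` in a second step.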
