
Parse Tokens with Regex in PHP

I am looking to parse a token file that looks something like the one below, to grab each token name/value pair. The token/value/nesting relationships are already defined, so I can't change the way the token files are made. It would seem that a context-free grammar might be the best way to go, but I have no experience writing or implementing one. Is it possible to do it with regex? I've not had any luck with the nested multi-line tokens (like Master1, Servant2).

;token1 = I am a top level single line token  
;token2 {  
    I am a top level  
    multiline line token  
}  

master1 {  
;servant1 = I am Master1, Servant1 single line token  
;servant2 {  
    I am Master1, Servant2.   
    A mulit line token.  
}  
;servant3 = I am Master1, Servant3  
}  
master2 {  
;servant1 = I am Master2, Servant1  
;servant2 {  
    I am Master2, Servant2  
A mulit line token.  
}  
;servant3 = I am Master2, Servant3  
}
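On the regex question: the single-line pairs on their own can be picked out with one pattern, as in the minimal sketch below. It assumes token names are word characters only and that a value never spans lines; the abbreviated `$input` is a hypothetical sample. It ignores the `{ ... }` blocks entirely, so nesting is lost and a name such as servant1 that appears under two masters would collide.

```php
<?php
// Hypothetical sample input, abbreviated from the token file above.
$input = <<<TXT
;token1 = I am a top level single line token
master1 {
;servant1 = I am Master1, Servant1 single line token
}
TXT;

// Match lines of the form ";name = value".
// The m flag makes ^ and $ match at the start and end of each line.
$pattern = '/^;(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$/m';
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$pairs = array();
foreach ($matches as $m) {
    $pairs[$m[1]] = $m[2]; // $m[1] is the name, $m[2] the trimmed value
}
print_r($pairs);
```

This only covers the flat case; the multi-line and nested tokens are what make a pure regex approach awkward.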

PHP has a function to tokenize strings:

  • strtok - splits a string (str) into smaller strings (tokens), with each token delimited by any character from the delimiter argument (token). That is, if you have a string like "This is an example string", you could tokenize it into its individual words by using the space character as the delimiter.
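The word-splitting case described above might look like the following sketch. The variable names are illustrative; the first call to `strtok` takes the string and the delimiter, and later calls take only the delimiter.

```php
<?php
// Split a sentence into words, using the space character as the delimiter.
$string = "This is an example string";

$words = array();
$tok = strtok($string, " ");   // first call: pass the string and the delimiter
while ($tok !== false) {
    $words[] = $tok;
    $tok = strtok(" ");        // later calls: pass only the delimiter
}
print_r($words);
```

Note that `strtok` only splits on delimiter characters; it has no notion of nesting, so on its own it would not recover the master/servant structure of the token file.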

Here's a reasonably simple line-walking parser. I originally tried to write a regex for it, but the lack of a leading ; at the start of the multi-line masters (master1, master2) made that much harder; if that ; were not missing, a regex would be reasonably easy to write. I gave up and wrote this:

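/**
 * Parse a token file into a nested array of name => value pairs.
 * Multi-line blocks are buffered and then parsed recursively.
 */
function getTokens($string) {
    $string = trim($string);
    $lines = explode("\n", $string);
    $data = array();
    $key = '';      // name of the block currently being buffered
    $open = 0;      // current brace nesting depth
    $buffer = '';   // raw lines collected for the open block
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') { // empty() would also skip a line containing only "0"
            continue;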
        } elseif (strpos($line, '}') === 0) {
            $open--;
            if ($open == 0) {
                $data[$key] = getTokens($buffer);
                $buffer = '';
            } elseif ($open < 0) {
                throw new Exception('Unmatched }');
            } else {
                $buffer .= "\n" . $line;
            }
        } elseif ($open > 0) {
            if (strpos($line, '{') !== false) {
                $open++;
            }
            $buffer .= "\n" . $line;
        } elseif ($line[0] == ';') {
            if (strpos($line, "=") !== false) {
                list ($key, $value) = explode("=", $line, 2);
                $key = trim(substr($key, 1));
                $value = trim($value);
                $data[$key] = $value;
            } elseif (strpos($line, "{") !== false) {
                $open++;
                list ($key, $value) = explode("{", $line, 2);
                $key = trim(substr($key, 1));
            } else {
                throw new Exception('Unmatched token ;');
            }
        } elseif (strpos($line, '{') !== false) {
            $open++;
            list ($key, $value) = explode("{", $line, 2);
            $key = trim($key);
        } else {
            $buffer .= "\n" . $line;
        }
    }
    if ($open > 0) {
        throw new Exception('Unmatched {');
    } elseif (empty($data) && !empty($buffer)) {
        return trim($buffer);
    }
    return $data;
}

When I feed it your string as input, I get:

Array(
    "token1" => "I am a top level single line token",
    "token2" => "I am a top level
                    multiline line token",
    "master1" => Array(
        "servant1" => "I am Master1, Servant1 single line token",
        "servant2" => "I am Master1, Servant2.
                            A mulit line token.",
        "servant3" => "I am Master1, Servant3",
    ),
    "master2" => Array(
        "servant1" => "I am Master2, Servant1",
        "servant2" => "I am Master2, Servant2
                            A mulit line token.",
        "servant3" => "I am Master2, Servant3",
    ),
)
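Once the file is in this nested form, individual values can be read with ordinary array access. As a small sketch, the helper below looks a value up by a dot-separated path; `getTokenByPath` is a hypothetical name and not part of the answer above, and the `$tokens` array is an abbreviated copy of the result.

```php
<?php
// Hypothetical helper (not part of the original answer): look up a value
// in the nested result by a dot-separated path such as "master1.servant3".
function getTokenByPath(array $tokens, $path) {
    $node = $tokens;
    foreach (explode('.', $path) as $part) {
        if (!is_array($node) || !array_key_exists($part, $node)) {
            return null; // the path does not exist
        }
        $node = $node[$part];
    }
    return $node;
}

// Abbreviated copy of the parsed structure shown above.
$tokens = array(
    'token1'  => 'I am a top level single line token',
    'master1' => array(
        'servant1' => 'I am Master1, Servant1 single line token',
        'servant3' => 'I am Master1, Servant3',
    ),
);

echo getTokenByPath($tokens, 'master1.servant3'), "\n";
var_dump(getTokenByPath($tokens, 'master2.servant1')); // missing path
```

Returning null for a missing path is one possible choice; throwing an exception, as the parser does for unmatched braces, would be equally reasonable.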


 