简体   繁体   English

将平面文件数据库信息解析为多维数组

[英]Parsing flat-file database information into a multidimensional array

I want to make a class for parsing flat-file database information into one large analogous multidimensional array. 我想创建一个类,用于将平面文件数据库信息解析为一个大型的类似多维数组。 I had the idea of formatting the database in a sort of python-esque format as follows: 我想到了以某种python-esque格式格式化数据库的想法,如下所示:

"tree #1":
    "key" "value"
    "sub-tree #1":
        "key" "value"
        "key #2" "value"
        "key #3" "value"

I am trying to make it parse this and build and array while parsing it to throw the keys/values into, and I want it to be very dynamic and expandable. 我试图使其解析并构建和数组,同时解析它以将键/值放入其中,并且我希望它具有很高的动态性和可扩展性。 I've tried many different techniques and I've been stumped in each of these attempts. 我尝试了许多不同的技术,但每次尝试都让我感到困惑。 This is my most recent: 这是我最近的:

function parse($file=null) {
    $file = $file ? $file : $this->dbfile;

    ### character variables

    # get values of 
    $src = file_get_contents($file);
    # current character number
    $p = 0;

    ### array variables

    # temp shit
    $a = array();
    # set $ln keys
    $ln = array("q"=>0,"k"=>null,"v"=>null,"s"=>null,"p"=>null);
    # indent level
    $ilvl = 0;

    ### go time

    while (strlen($src) > $p) {
        $chr = $src[$p];
        # quote
        if ($chr == "\"") {
            if ($ln["q"] == 1) { // quote open?
                $ln["q"] = 0; // close it
                if (!$ln["k"]) { // key yet?
                    $ln["k"] = $ln["s"]; // set key
                    $ln["s"] = null;
                    $a[$ln["k"]] = $ln["v"]; // write to current array
                } else { // value time
                    $ln["v"] = $ln["s"]; // set value
                    $ln["s"] = null;
                }
            } else {
                $ln["q"] = 1; // open quote
            }
        }

        elseif ($chr == "\n" && $ln["q"] == 0) {
            $ln = array("q"=>0,"k"=>null,"v"=>null,"s"=>null,"p"=>null);
            $llvl = $ilvl;

        }
        # beginning of subset
        elseif ($chr == ":" && $ln["q"] == 0) {
            $ilvl++;
            if (!array_key_exists($ilvl,$a)) { $a[$ilvl] = array(); }
            $a[$ilvl][$ln["k"]] = array("@mbdb-parent"=> $ilvl-1 .":".$ln["k"]);
            $ln = array("q"=>0,"k"=>null,"v"=>null,"s"=>null,"p"=>null);
            $this->debug("INDENT++",$ilvl);
        }
        # end of subset
        elseif ($chr == "}") {
            $ilvl--;
            $this->debug("INDENT--",$ilvl);
        }
        # other characters
        else {
            if ($ln["q"] == 1) {
                $ln["s"] .= $chr;
            } else {
                # error
            }
        }
        $p++;
    }
    var_dump($a);
}

I honestly have no idea where to go from here. 老实说,我不知道从这里去哪里。 The thing troubling me most is setting the multidimensional values like $this->c["main"]["sub"]["etc"] the way I have it here. 让我最困扰的是我在这里设置多维值的方法,例如$this->c["main"]["sub"]["etc"] Can it even be done? 还能做到吗? How can I actually nest the arrays as the data is nested in the db file? 当数据嵌套在db文件中时,如何实际嵌套数组?

This is all going to depend on how human-readable you want your "flat file" to be. 这一切都取决于您希望“平面文件”具有怎样的可读性。

Want human-readable? 想要人类可读?

  • XML XML格式
  • Yaml Yaml

Semi-human-readable? 半人类可读?

  • JSON JSON格式

Not really human-readable? 真的不可读吗?

  • Serialized PHP (also PHP-only) 序列化的PHP(也仅限PHP)
  • Mysql Dump MySQL转储

Writing your own format is going to be painful. 编写自己的格式会很痛苦。 Unless you want to do this purely for the academic experience, then I say don't bother. 除非您纯粹是出于学术经验,否则我不会打扰。

Looks like JSON might be a happy medium for you. 看起来JSON可能是您的理想选择。

$configData = array(
    'tree #1' => array(
        'key'         => 'value'
      , 'sub-tree #1' => array(
          'key'    => 'value'
        , 'key #2' => 'value'
        , 'key #3' => 'value'
      )
  )
);

//  Save config data
file_put_contents( 'path/to/config.json', json_format( json_encode( $configData ) ) );

//  Load it back out
$configData = json_decode( file_get_contents( 'path/to/config.json' ), true );

//  Change something
$configData['tree #1']['sub-tree #1']['key #2'] = 'foo';

//  Re-Save (same as above)
file_put_contents( 'path/to/config.json', json_format( json_encode( $configData ) ) );

You can get the json_format() function here , which just pretty-formats for easier human-reading. 您可以在此处获得json_format()函数,该函数的格式很漂亮,便于人们阅读。 If you don't care about human-readability, you can skip it. 如果您不关心人类可读性,则可以跳过它。

Well, you could use serialize and unserialize but that would be no fun, right? 好吧,您可以使用序列化和反序列化,但这不会很有趣,对吗? You should be using formats specifically designed for this purpose, but for sake of exercise, I'll try and see what I can come up with. 您应该使用专门为此目的设计的格式,但是为了便于练习,我将尝试看看能提供些什么。

There seems to be two kinds of datatypes in your flatfile, key-value pairs and arrays. 平面文件中似乎有两种数据类型,即键值对和数组。 key-value pairs are denoted with two sets of quotes and arrays with one pair of quotes and a following colon. 键值对用两组引号和带有一对引号和后跟冒号的数组表示。 As you go through the file, you must parse each row and determine what it represents. 浏览文件时,必须分析每一行并确定它代表什么。 That's easy with regular expressions. 使用正则表达式很容易。 The hard part is to keep track of the level we're going at and act accordingly. 困难的部分是跟踪我们要达到的水平并采取相应的措施。 Here's a function that parses the tree you provided: 这是一个解析您提供的树的函数:

function parse_flatfile($filename) {
    $file = file($filename);

    $result = array();
    $open = false;
    foreach($file as $row) {
        $level = strlen($row) - strlen(ltrim($row));
        $row = rtrim($row);
        // Regular expression to catch key-value pairs
        $isKeyValue = preg_match('/"(.*?)" "(.*?)"$/', $row, $match);        
        if($isKeyValue == 1) {
            if($open && $open['level'] < $level) {
                $open['item'][$match[1]] = $match[2];
            } else {
                $open = array('level' => $level - 1, 'item' => &$open['parent']);                
                if($open) {
                    $open['item'][$match[1]] = $match[2];
                } else {
                    $result[$match[1]] = $match[2];
                }
            }
        // Regular expression to catch arrays
        } elseif(($isArray = preg_match('/"(.*?)":$/', $row, $match)) > 0) {
            if($open && $open['level'] < $level) {
                $open['item'][$match[1]] = array();
                $open = array('level' => $level, 'item' => &$open['item'][$match[1]], 'parent' => &$open['item']);
            } else {
                $result[$match[1]] = array();
                $open = array('level' => $level, 'item' => &$result[$match[1]], 'parent' => false);
            }
        }    
    }    
    return $result;
}

I won't go into greater detail on how that works, but it short, as we progress deeper into the array, the previous level is stored in a reference $open and so on. 我不会更详细地介绍它是如何工作的,但是总之,随着我们深入到数组中,上一层存储在引用$open ,依此类推。 Here's a more complex tree using your notation: 这是使用您的符号表示的更复杂的树:

"tree_1":
    "key" "value"
    "sub_tree_1":
        "key" "value"
        "key_2" "value"
        "key_3" "value"
    "key_4" "value"
    "key_5" "value"
"tree_2":
   "key_6" "value"
    "sub_tree_2":
        "sub_tree_3":
            "sub_tree_4":
                "key_6" "value"
                "key_7" "value"
                "key_8" "value"
                "key_9" "value"
                "key_10" "value"

And to parse that file you could use: 要解析该文件,您可以使用:

$result = parse_flatfile('flat.txt');
print_r($result);

And that would output: 那会输出:

Array
(
[tree_1] => Array
    (
    [key] => value
    [sub_tree_1] => Array
        (
        [key] => value
        [key_2] => value
        [key_3] => value
        )    
    [key_4] => value
    [key_5] => value
    )    
[tree_2] => Array
    (
    [key_6] => value
    [sub_tree_2] => Array
        (
        [sub_tree_3] => Array
            (
            [sub_tree_4] => Array
                (
                [key_6] => value
                [key_7] => value
                [key_8] => value
                [key_9] => value
                [key_10] => value
                )    
            )    
        )    
    )    
)

I guess my test file covers all the bases, and it should work without breaking. 我猜我的测试文件涵盖了所有基础,并且应该可以正常工作。 But I won't give any guarantees. 但是我不会提供任何保证。

Transforming a multidimensional array to flatfile using this notation will be left as an exercise to the reader :) 使用此符号将多维数组转换为平面文件将留给读者练习:)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM