
PHP | json_decode huge JSON file

I'm trying to decode a large (222 MB) JSON file.

I understand I can't simply read the whole file with file_get_contents() and pass the whole string to json_decode, because that consumes a lot of memory and returns nothing (which is what it's doing so far).
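As a diagnostic aside: by default json_decode returns null on failure, which is one way to end up with "nothing". The minimal sketch below (the helper name decodeJsonFile is hypothetical; JSON_THROW_ON_ERROR needs PHP 7.3+) makes the failure reason visible. It does not help with running out of memory: exhausting memory_limit is a fatal error that no try/catch can intercept, so this only surfaces syntax, depth and encoding errors.

```php
<?php
// Hypothetical helper: decode a whole JSON file, but report *why* it failed
// instead of silently returning null. Only suitable for files that fit in memory.
function decodeJsonFile(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // JSON_THROW_ON_ERROR turns the silent null into a JsonException
    // carrying the error message (syntax error, depth exceeded, bad UTF-8, ...)
    $data = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Top-level JSON value is not an array or object');
    }
    return $data;
}
```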

So I tried some libraries. The one I tried most recently is JSONParser, which reads the objects in a JSON array one by one.

But since it lacks documentation, I want to ask here whether anyone has worked with this library.

This is the example test code from GitHub:

// initialise the parser object
$parser = new JSONParser();

// set the callbacks: each string names a callback function
// (arrayStart, arrayEnd, objStart, objEnd, property, scalar) that must be defined (not shown here)
$parser->setArrayHandlers('arrayStart', 'arrayEnd');
$parser->setObjectHandlers('objStart', 'objEnd');
$parser->setPropertyHandler('property');
$parser->setScalarHandler('scalar');

/*
echo "Parsing top level object document...\n";
// parse the document
$parser->parseDocument(__DIR__ . '/data.json');
*/

$parser->initialise();

// echo "Parsing top level array document...\n";
// parse the top level array
$parser->parseDocument(__DIR__ . '/array.json');

How can I use a loop to save each object in a PHP variable that I can easily decode into a PHP array for further use?

This would take some time, since it processes every object of the JSON array one by one, but the question stands: how do I loop over it using this library, or is there no such option?

Or are there any other, better options or libraries for this sort of job?

One alternative here is to use salsify/jsonstreamingparser.

You need to create your own listener and pass it to the parser together with a file stream:

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader (assumes a Composer install)

$testfile = '/path/to/file.json';
$listener = new MyListener();
$stream = fopen($testfile, 'r');
try {
    $parser = new \JsonStreamingParser\Parser($stream, $listener);
    $parser->parse();
} finally {
    // close the stream whether or not parsing threw
    fclose($stream);
}

To keep things simple, I'm using this JSON as an example:

JSON input

{
    "objects": [
    {
        "propertyInt": 1,
        "propertyString": "string",
        "propertyObject": { "key": "value" }            
    },
    {
        "propertyInt": 2,
        "propertyString": "string2",
        "propertyObject": { "key": "value2" }
    }]
}

You need to implement your own listener. In this case, I just want to get the objects inside the array.

PHP

class MyListener extends \JsonStreamingParser\Listener\InMemoryListener
{
    // nesting depth: tells the elements of the top-level array apart
    // from objects nested deeper inside them
    protected $level = 0;

    protected function startComplexValue($type)
    {
        // entering an array or object: one level deeper
        $this->level++;
        parent::startComplexValue($type);
    }

    protected function endComplexValue()
    {
        // leaving an array or object: one level up
        $this->level--;
        $obj = array_pop($this->stack);
        // If the value stack is now empty, we're done parsing the document, so we can
        // move the result into place so that getJson() can return it. Otherwise, we
        // associate the value with its parent.
        if (empty($this->stack)) {
            $this->result = $obj['value'];
        } else {
            // insert every finished value (arrays too), as the parent listener does;
            // skipping arrays would drop them and leave their key on the key stack
            $this->insertValue($obj['value']);
            if ($obj['type'] === 'object') {
                // HERE I call the custom function to do what I want
                $this->insertObj($obj);
            }
        }
    }

    // custom function: do whatever you need with each finished object
    protected function insertObj($obj)
    {
        // level <= 2: an element of the "objects" array, not a nested object
        if ($this->level <= 2) {
            echo "<pre>";
            var_dump($obj);
            echo "</pre>";
        }
    }
}

Output

array(2) {
  ["type"]=>
  string(6) "object"
  ["value"]=>
  array(3) {
    ["propertyInt"]=>
    int(1)
    ["propertyString"]=>
    string(6) "string"
    ["propertyObject"]=>
    array(1) {
      ["key"]=>
      string(5) "value"
    }
  }
}
array(2) {
  ["type"]=>
  string(6) "object"
  ["value"]=>
  array(3) {
    ["propertyInt"]=>
    int(2)
    ["propertyString"]=>
    string(7) "string2"
    ["propertyObject"]=>
    array(1) {
      ["key"]=>
      string(6) "value2"
    }
  }
}

I tested it against a 166 MB JSON file and it works. You may need to adapt the listener to your needs.
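To see why the `level <= 2` test selects only the elements of the "objects" array, the hypothetical, library-free sketch below replays the listener's level counting over a decoded sample (the function name objectsSelectedByLevel is an assumption for illustration; it is not part of salsify/jsonstreamingparser). Because it calls json_decode on the whole string, it is only meant for small samples like the one above.

```php
<?php
// Hypothetical, library-free replay of MyListener's $level bookkeeping.
// Decodes the whole string, so use it only on small samples.
function objectsSelectedByLevel(string $json, int $maxLevel = 2): array
{
    $selected = [];
    $level = 0;
    $walk = function ($value, bool $isRoot) use (&$walk, &$level, &$selected, $maxLevel): void {
        $isObject = $value instanceof stdClass;
        if (!$isObject && !is_array($value)) {
            return; // scalars never touch the level counter
        }
        $level++; // like startComplexValue()
        foreach ((array) $value as $child) {
            $walk($child, false);
        }
        $level--; // like endComplexValue()
        // like insertObj(): non-root objects whose level, after the decrement, is <= $maxLevel
        if ($isObject && !$isRoot && $level <= $maxLevel) {
            $selected[] = json_decode(json_encode($value), true);
        }
    };
    $walk(json_decode($json, false, 512, JSON_THROW_ON_ERROR), true);
    return $selected;
}
```

For the sample document this selects exactly the two array elements, while each "propertyObject" ends at level 3 and is skipped. If the top-level value were an array instead (for example `[{"a":{"b":1}}]`), the elements would end at level 1 and nested objects at level 2, so a threshold of 2 would also pick up the nested objects; that is one case where the listener needs adapting. Note also that the listener still inserts every object into its parent, so the whole document is still built up in memory; for very large files you may want to skip insertValue() for objects you have already handled.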

Another alternative is to use halaxa/json-machine.

Iterating over JSON with it works the same way as iterating over the result of json_decode, but it will not hit the memory limit no matter how big your file is. There is no need to implement anything, just your foreach.

Example:

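require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// In json-machine 1.x the equivalent entry point is \JsonMachine\Items::fromFile()
$users = \JsonMachine\JsonMachine::fromFile('500MB-users.json');

foreach ($users as $id => $user) {
    // process $user as usual
}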

See the GitHub README for more details.
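To illustrate the idea behind iterating a huge file with a plain foreach, here is a hypothetical, library-free sketch built on a PHP generator; it is not how json-machine or JSONParser is implemented. It assumes the file holds a single top-level JSON array whose elements are objects or arrays (scalar elements are silently skipped), reads the file in fixed-size chunks, and keeps only the element currently being collected in memory. The per-byte loop is slow in PHP, so for real workloads one of the libraries above is the better choice.

```php
<?php
// Hypothetical helper: yield each element of a top-level JSON array
// (e.g. [{...}, {...}]) as a PHP array, one element at a time.
// Assumes elements are objects or arrays; scalar elements are skipped.
function streamJsonArray(string $path, int $chunkSize = 65536): Generator
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $depth = 0;        // nesting depth; 1 = directly inside the top-level array
        $inString = false; // currently inside a JSON string?
        $escaped = false;  // previous byte was a backslash inside a string?
        $buffer = '';      // raw text of the element being collected
        while (!feof($fh)) {
            $chunk = fread($fh, $chunkSize);
            if ($chunk === false) {
                break;
            }
            $len = strlen($chunk);
            for ($i = 0; $i < $len; $i++) {
                $c = $chunk[$i];
                if ($depth >= 2) {
                    $buffer .= $c; // inside an element: keep every byte
                }
                if ($inString) {
                    // brackets inside strings must not change the depth
                    if ($escaped) {
                        $escaped = false;
                    } elseif ($c === '\\') {
                        $escaped = true;
                    } elseif ($c === '"') {
                        $inString = false;
                    }
                    continue;
                }
                if ($c === '"') {
                    $inString = true;
                } elseif ($c === '{' || $c === '[') {
                    $depth++;
                    if ($depth === 2) {
                        $buffer = $c; // an element starts here
                    }
                } elseif ($c === '}' || $c === ']') {
                    $depth--;
                    if ($depth === 1) { // the element is complete: decode and hand it out
                        yield json_decode($buffer, true, 512, JSON_THROW_ON_ERROR);
                        $buffer = '';
                    }
                }
            }
        }
    } finally {
        fclose($fh);
    }
}
```

Usage mirrors the foreach above: `foreach (streamJsonArray('/path/to/array.json') as $obj) { /* $obj is a PHP array */ }`. State lives outside the chunk loop, so elements that straddle chunk boundaries are handled, and multi-byte UTF-8 sequences never match the ASCII bytes the scanner looks for.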

You still need to use json_decode and file_get_contents to get the full JSON (json_decode can't parse partial JSON). Just increase PHP's memory limit to a bigger value using ini_set('memory_limit', '500M');

Processing will also take longer, so use set_time_limit(0);
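Before raising the limit, it can help to know what it currently is. The hypothetical helpers below (names are assumptions) convert a php.ini shorthand value such as "500M" into bytes and run a lower-bound check: file_get_contents() alone needs at least the file's size, and the decoded array needs considerably more on top, so if even that bound fails, decoding certainly will. The sketch only understands the k/m/g suffixes and the special value -1 (no limit).

```php
<?php
// Hypothetical helper: convert a php.ini size value ("500M", "2G", "128k", "-1") to bytes.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1; // -1 means "no limit"
    }
    $number = (int) $value; // leading digits only
    switch (strtolower(substr($value, -1))) {
        case 'g': return $number * 1024 ** 3;
        case 'm': return $number * 1024 ** 2;
        case 'k': return $number * 1024;
        default:  return $number;
    }
}

// Lower-bound check only: true means even the raw string cannot fit.
function memoryLimitTooSmallFor(string $path): bool
{
    $size = filesize($path);
    if ($size === false) {
        throw new RuntimeException("Cannot stat $path");
    }
    $limit = iniSizeToBytes((string) ini_get('memory_limit'));
    return $limit !== -1 && $limit < $size + memory_get_usage();
}
```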

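Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.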

 
© 2020-2024 STACKOOM.COM