
Is there a more efficient (faster) way of searching and matching values in a JSON file in PHP?

I have an API that allows searching for information based on an order number, like so:

file_get_contents('https://www.example.com/api/status.php?os='.$orderId)

status.php has the following code:

    if (isset($_GET['os'])) {
        $orderId = strip_tags(htmlspecialchars($_GET['os']));
    }

    if (isset($orderId)) {
        try {
            // using JSON Machine https://github.com/halaxa/json-machine
            // ($file holds the path to the JSON file, defined elsewhere)
            $array = \JsonMachine\JsonMachine::fromFile($file);

            $result = getOrderStatus($orderId, $array);

            if (empty($result)) {
                http_response_code(206);
            } else {
                echo json_encode($result, JSON_UNESCAPED_UNICODE);
            }
        } catch (Exception $e) {
            echo json_encode(array(
                'error' => array(
                    'code' => $e->getCode(),
                    'message' => $e->getMessage()
                )
            ));
        }
    }

The above loads the JSON file (currently 1.4 MB, approx. 3000 objects, 15 key/value pairs per object) and passes the whole decoded JSON to the getOrderStatus function:

    function getOrderStatus($orderId, $array) {
        $resultArray = array();

        foreach ($array as $val) {
            // "0108491 A_PRODUCT_NAME" -> "0108491"
            $oid = explode(' ', $val['ORDER TITLE']);
            $oid = $oid[0];
            $oid = explode('_', $oid);
            $oid = $oid[0];

            if ($oid == $orderId) {
                $resultArray['status'] = $val['STAT ID'];
                $resultArray['email']  = $val['STAT CTRL'];
                $resultArray['mzId']   = $val['ORDER ID'];
                $resultArray['mcId']   = $val['COMPANY ID'];
                $resultArray['count']  = $val['NUM PCS'];
            }
        }

        return $resultArray;
    }

if ($oid == $orderId) checks whether the id passed to the API matches the order number extracted from the "ORDER TITLE" value; if it does, the rest of the information in that object is relevant to the search.
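Since the order number is derived purely from "ORDER TITLE", one common speedup is to build a lookup table keyed by the extracted order number once, so each subsequent lookup is a hash access instead of a full scan. A minimal sketch (the helper names are mine, not from the original code):

```php
<?php
// Extract the order number from an "ORDER TITLE" value,
// mirroring the explode() logic used in getOrderStatus().
function extractOrderId(string $title): string {
    // "0108491 A_PRODUCT_NAME" -> "0108491"
    $oid = explode(' ', $title)[0];
    return explode('_', $oid)[0];
}

// Build the index once per request: order number => full object.
function buildIndex(iterable $orders): array {
    $index = [];
    foreach ($orders as $val) {
        $index[extractOrderId($val['ORDER TITLE'])] = $val;
    }
    return $index;
}
```

With the index built, each lookup becomes `$index[$orderId] ?? null` instead of iterating over all 3000 objects again.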

Here is an example of two objects from the JSON file:

[
   {
      "INTERNAL ID": "914693",
      "ORDER TITLE": "0108491 A_PRODUCT_NAME",
      "COMPANY ID": "",
      "STAT ID": "1.2",
      "STAT CTRL": "example@example.com",
      "POST ID": "Post",
      "SML": "Transfer",
      "UPDATE": "17.06.2019 10:52:45",
      "TOTAL": "0",
      "NUM PCS": "1",
      "PAID": "",
      "TEXT": "",
      "PROBLEM": ""
   },
   {
      "INTERNAL ID": "914694",
      "ORDER TITLE": "0108494 A_PRODUCT_NAME",
      "COMPANY ID": "",
      "STAT ID": "1.2",
      "STAT CTRL": "example@example.com",
      "POST ID": "Post",
      "SML": "Transfer",
      "UPDATE": "17.06.2019 10:52:45",
      "TOTAL": "0",
      "NUM PCS": "1",
      "PAID": "",
      "TEXT": "",
      "PROBLEM": ""
   }
]

The problem is that I have no control over the format of the JSON file, since it is an export from external software.

Currently, I am searching and matching approx. 100 order numbers. This means that with the above code, for each order number the API is called, the JSON file is opened, each of the 3000 objects is tested for a match, and an array with the info is returned, all 100 times over. This is why I think the process takes 40 seconds. While this is bearable, there could be as much as 10x more objects in the JSON file in the future.

What can I do to make the search faster? I was thinking of calling the API with an array of order numbers, then opening the file once and matching each order number. Would this be the correct approach?
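That batching idea could look like the sketch below, assuming status.php is changed to accept a comma-separated list in the `os` parameter (the parameter format and function name are assumptions, not part of the original API):

```php
<?php
// Client side: one request for all order numbers instead of ~100 requests.
$orderIds = ['0108491', '0108494'];
$url = 'https://www.example.com/api/status.php?os=' . urlencode(implode(',', $orderIds));
// $json = file_get_contents($url);

// Server side: open the file once and match every requested id in a single pass.
function getOrderStatuses(array $orderIds, iterable $array): array {
    $wanted = array_flip($orderIds);   // O(1) membership checks
    $results = [];
    foreach ($array as $val) {
        $oid = explode('_', explode(' ', $val['ORDER TITLE'])[0])[0];
        if (isset($wanted[$oid])) {
            $results[$oid] = [
                'status' => $val['STAT ID'],
                'email'  => $val['STAT CTRL'],
                'mzId'   => $val['ORDER ID'],
                'mcId'   => $val['COMPANY ID'],
                'count'  => $val['NUM PCS'],
            ];
        }
    }
    return $results;
}
```

This turns 100 file opens and 100 scans into one of each; the per-request HTTP overhead disappears as well.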

What I ended up doing may be overly complicated, but it serves the purpose well and should be resilient to the main JSON file getting larger in the future. For some background: the web app is used for an overview of orders and detailed information about them, which means touching a few local and external APIs. Approx. 5 people (5 browsers) may be using the app at any given time.

The external software uploads a large JSON file with most of the information via FTP.

To automate the entire process, I found out about incron, which is similar to cron, but instead of acting on a schedule, it acts on file/directory events (creation, update, deletion, etc.). This of course requires SSH and root access to the local server.

sudo apt install incron

incrontab -e

/route/to/json/file/big.json IN_CLOSE_WRITE /usr/bin/php /route/to/php/load.php

With the incrontab entry above, load.php runs every time big.json is changed.

To shorten the load time, I created a database table with columns based on big.json. load.php reads big.json and saves each object as a row in the database table. The part of the code that provides this info via the API now reads from the database instead of big.json, as suggested in the comments. This brought the time down to 22s instead of 44s.
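The import step in load.php can be sketched roughly as follows; the table and column names here are illustrative (the original post does not give the schema), and the actual PDO calls are shown commented out since they need real credentials:

```php
<?php
// load.php - re-imports big.json into MySQL whenever incron fires.
// Maps one big.json object to a flat row for the (hypothetical) order_status table.
function toRow(array $obj): array {
    return [
        $obj['INTERNAL ID'],
        $obj['ORDER TITLE'],
        $obj['STAT ID'],
        $obj['STAT CTRL'],
        $obj['NUM PCS'],
    ];
}

$file = '/route/to/json/file/big.json';
if (is_readable($file)) {
    $rows = json_decode(file_get_contents($file), true);
    // $pdo = new PDO('mysql:host=localhost;dbname=orders;charset=utf8mb4', $dbUser, $dbPass);
    // $pdo->beginTransaction();
    // $pdo->exec('TRUNCATE TABLE order_status');  // replace the previous import wholesale
    // $stmt = $pdo->prepare('INSERT INTO order_status
    //     (internal_id, order_title, stat_id, stat_ctrl, num_pcs) VALUES (?, ?, ?, ?, ?)');
    // foreach ($rows as $obj) { $stmt->execute(toRow($obj)); }
    // $pdo->commit();
}
```

Wrapping the truncate-and-insert in one transaction keeps readers from ever seeing a half-imported table.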

Although just converting JSON to MySQL helped, I thought it was unnecessary for the individual browsers to keep repeating the load process, so instead incron does pretty much the same thing, but only once, and only when the file changes.

load.php can filter out unneeded data; it also calls all the external APIs and combines their data with the data from big.json, then saves everything to a new file called data-full.json. The resulting file is about 16x smaller than big.json. The browser app with datatables.js then loads this file in a matter of milliseconds.

So in short:

  • Load the large JSON file only upon its update, with incron
  • Convert the large JSON file to MySQL
  • Create a new JSON file combining filtered MySQL data and external API data
  • Ajax-call the "static" new JSON file in datatables.js
  • Add a timestamp of when the new JSON file was last updated in PHP with filemtime()
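The last bullet, exposing the file's modification time via filemtime(), can be a few lines; the path and the response key here are illustrative:

```php
<?php
// Report when data-full.json was last regenerated, so the browser app
// can show how fresh the data is.
$file = '/route/to/json/file/data-full.json';
if (is_file($file)) {
    header('Content-Type: application/json');
    echo json_encode(
        ['updated' => date('d.m.Y H:i:s', filemtime($file))],
        JSON_UNESCAPED_UNICODE
    );
}
```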
