
How can I handle a lot of data with timestamps in ArangoDB?

I am new to handling a lot of data.

Every 100 ms I write 4 JSON blocks into a collection in my ArangoDB.

The content of the JSON looks something like this:

{
  "maintenence": {
    "holder_1": 1,
    "holder_2": 0,
    "holder_3": 0,
    "holder_4": 0,
    "holder_5": 0,
    "holder_6": 0
  },
  "error": 274,
  "pos": {
    "left": [
      21.45, // changing every 100ms
      38.36, // changing every 100ms
      10.53 // changing every 100ms
    ],
    "center": [
      0.25, // changing every 100ms
      0, // changing every 100ms
      2.42 // changing every 100ms
    ],
    "right": [
      0, // changing every 100ms
      0, // changing every 100ms
      0 // changing every 100ms
    ]
  },
  "sub": [
    {
      "type": 23,
      "name": "plate 01",
      "sensors": [
        {
          "type": 45,
          "name": "sensor 01",
          "state": {
            "open": 1,
            "close": 0,
            "middle": 0
          }
        },
        {
          "type": 34,
          "name": "sensor 02",
          "state": {
            "on": 1
          }
        }
      ]
    }
  ],
  "timestamp": "2018-02-18 01:56:08.423",
  "device": "12227225"
}

Every block comes from a different device.

In only 2 days there are ~6 million datasets in the collection.

If I want to get the data to draw a line graph of "device 1, position left[0]" with:

FOR d IN device
  FILTER d.timestamp >= "2018-02-18 04:30:00.000" && d.timestamp <= "2018-02-18 04:35:00.000"
  RETURN d.pos.left[0]

it takes a very long time to search through these ~6 million datasets.

My question is: is this normal, and can only more machine power fix this problem, or is my way of handling this set of data wrong?

I think ~6 million datasets is not BIG DATA, but if I already fail at this scale, how can I handle it when I add 50 more devices and collect data not for 2 days but for 30 days?

Converting the timestamps to Unix timestamps (numbers) helps a lot.
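
The conversion can be done as a one-off migration in AQL itself. This is only a sketch: the numeric target field name ts is my own choice, and since DATE_TIMESTAMP expects ISO 8601 input, the space in the stored timestamp strings is first replaced with a "T":

// Hypothetical one-off migration: store the timestamp as a numeric
// Unix timestamp (milliseconds since epoch) in a new field `ts`.
FOR d IN device
  /* DATE_TIMESTAMP expects ISO 8601, so turn
     "2018-02-18 01:56:08.423" into "2018-02-18T01:56:08.423" first */
  UPDATE d WITH { ts: DATE_TIMESTAMP(SUBSTITUTE(d.timestamp, " ", "T")) } IN device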

I added a skiplist index over timestamp & device.

Now, with 13 million datasets, my query runs in 920 ms.
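
For reference, the reworked query might look like the sketch below (assuming the index covers the device field and the numeric ts field from the migration above). Filtering on an equality for device plus a numeric range lets the skiplist index do the work instead of a full collection scan:

FOR d IN device
  // device id taken from the sample document above
  FILTER d.device == "12227225"
    && d.ts >= DATE_TIMESTAMP("2018-02-18T04:30:00.000")
    && d.ts <= DATE_TIMESTAMP("2018-02-18T04:35:00.000")
  RETURN d.pos.left[0]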

Thank you!
