I am new to handling large amounts of data.
Every 100 ms I currently write 4 JSON blocks to a collection in my ArangoDB database.
The content of each JSON block looks something like this:
{
"maintenence": {
"holder_1": 1,
"holder_2": 0,
"holder_3": 0,
"holder_4": 0,
"holder_5": 0,
"holder_6": 0
},
"error": 274,
"pos": {
"left": [
21.45, // changing every 100ms
38.36, // changing every 100ms
10.53 // changing every 100ms
],
"center": [
0.25, // changing every 100ms
0, // changing every 100ms
2.42 // changing every 100ms
],
"right": [
0, // changing every 100ms
0, // changing every 100ms
0 // changing every 100ms
]
},
"sub": [
{
"type": 23,
"name": "plate 01",
"sensors": [
{
"type": 45,
"name": "sensor 01",
"state": {
"open": 1,
"close": 0,
"middle": 0
}
},
{
"type": 34,
"name": "sensor 02",
"state": {
"on": 1
}
}
]
}
],
"timestamp": "2018-02-18 01:56:08.423",
"device": "12227225"
}
Each block comes from a different device.
After only 2 days there are ~6 million datasets in the collection.
If I want to get the data to draw a line graph of "device 1 position left[0]"
with:
FOR d IN device
FILTER d.timestamp >= "2018-02-18 04:30:00.000" && d.timestamp <= "2018-02-18 04:35:00.000"
RETURN d.pos.left[0]
it takes a very long time to search through these ~6 million datasets.
My question is: is this normal, so that only more machine power can fix the problem, or is my way of handling this data wrong?
I don't think ~6 million datasets counts as big data, but if I already fail here, how can I handle it when I add 50 more devices and collect data for 30 days instead of 2?
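For a rough sense of the volumes involved, here is a small arithmetic sketch in Python. It assumes every device writes exactly one dataset every 100 ms around the clock, which matches the 4 blocks per 100 ms described above; the function name `datasets_written` is only illustrative.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def datasets_written(devices: int, days: int, interval_ms: int = 100) -> int:
    """Number of datasets written, assuming one dataset per device per interval."""
    per_device_per_day = MS_PER_DAY // interval_ms  # 864,000 at 100 ms
    return devices * days * per_device_per_day


# Current setup: 4 devices for 2 days
print(datasets_written(4, 2))    # 6912000, in line with the ~6 million observed

# Planned setup: 4 + 50 devices for 30 days
print(datasets_written(54, 30))  # 1399680000
```

Under these assumptions the planned setup would hold roughly 200 times as many datasets as the current collection, so the query has to avoid scanning the whole collection.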
Converting the timestamps to Unix timestamps (numbers) helps a lot.
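As an illustration of that conversion, here is a minimal Python sketch. It assumes the timestamp strings are in UTC (the sample data does not state a time zone) and that millisecond precision should be kept; the helper name `to_unix_ms` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ms(ts: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS.mmm' to Unix milliseconds.

    Assumption: the string is in UTC. Adjust tzinfo if the devices
    log local time instead.
    """
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


print(to_unix_ms("2018-02-18 01:56:08.423"))  # 1518918968423
```

Storing this number instead of the string means the range filter compares numbers rather than strings.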
I also added a skiplist index on timestamp and device.
Now, with 13 million datasets, my query runs in 920 ms.
Thank you!