
Which NoSQL DB (Azure Tables, Document DB, Mongo DB or others) is suitable for data analysis reports of Big Data?

I am working on an IoT project in which every device will send a 15 KB JSON document to the database once per minute. I thought of using Azure DocumentDB, but I am a little worried about it.

  1. Suppose there are 5000 IoT devices. How many RUs should I expect to have to purchase?

  2. What happens when the number of requests rises above the fixed Request Units (say the customer count increases to 7000)? Will it make the app slow?

  3. What happens when we execute long-running queries in DocumentDB (like complex reports)?

  4. Can anyone suggest other NoSQL DBs that suit the above profile?

Thanks in advance

While there's no objective answer to which database you should use, I can objectively answer your specific questions about Cosmos DB (the DocumentDB API in your case, but this applies to any of the supported APIs).

How many RUs should I expect to have to purchase?

This will require some benchmarking on your part to determine how much RU to allocate. Each operation returns its RU cost in a response header (x-ms-request-charge). Based on that cost, you can calculate the RU/sec required for your sustained write workload. You can also lower the per-operation RU cost slightly by changing your indexing policy from consistent indexing to lazy (deferred) indexing.
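As a rough illustration of that calculation, here is a minimal sketch in Python. The function name and the 100 RU per write figure are placeholders of mine, not measured values; you would substitute the request charge you actually observe when inserting one of your 15 KB documents.

```python
import math

def required_ru_per_second(devices, writes_per_device_per_minute, ru_per_write, headroom=1.0):
    """Estimate the RU/sec needed for a sustained write workload.

    ru_per_write must come from your own benchmark (the request charge
    returned with each write); headroom is an optional safety multiplier.
    """
    writes_per_second = devices * writes_per_device_per_minute / 60
    return math.ceil(writes_per_second * ru_per_write * headroom)

# Placeholder only: NOT a measured cost for a 15 KB document.
ASSUMED_RU_PER_WRITE = 100.0

# One document per device per minute, as described in the question.
print(required_ru_per_second(5000, 1, ASSUMED_RU_PER_WRITE))  # 5000 devices
print(required_ru_per_second(7000, 1, ASSUMED_RU_PER_WRITE))  # 7000 devices
```

Note that this only covers the steady write load. Reads and reporting queries consume RUs from the same allocation, so they would need to be benchmarked and added on top.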

What happens when the number of requests rises above the fixed Request Units (say the customer count increases to 7000)? Will it make the app slow?

Typically, if you exceed the allotted RU/sec, you will be throttled for a period of time: requests are rejected with HTTP status 429 along with a suggested retry delay. So, for example, if you have a 1000 RU/sec setting and you consume 3000 RU during an insert, you'll be throttled for about 2-3 seconds.
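From the client's side, handling throttling usually amounts to waiting for the suggested delay and trying again. The sketch below shows that pattern in plain Python. `ThrottledError` and `call_with_retry` are hypothetical names standing in for whatever your client library raises on a 429 response; the official SDKs typically perform this kind of retry for you already.

```python
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 (request rate too large) response."""
    def __init__(self, retry_after_ms):
        super().__init__(f"throttled, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms

def call_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation(); when throttled, wait the suggested delay and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up and surface the throttling to the caller
            sleep(err.retry_after_ms / 1000)

# Demo with a fake operation that is throttled twice before succeeding.
# The waits are recorded instead of slept so the demo runs instantly.
attempts = {"count": 0}
recorded_waits = []

def fake_insert():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError(retry_after_ms=2000)
    return "inserted"

print(call_with_retry(fake_insert, sleep=recorded_waits.append), recorded_waits)
```

So the direct answer to "will it make the app slow" is yes in effect: throttled writes are not lost if you retry them, but each one is delayed by the wait.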

To avoid (or minimize) such throttling, you can enable per-minute RU burst, which gives you a 10x RU buffer over each 60-second interval. In my previous example of 1000 RU/sec, that would give you 10,000 RU of headroom, spread across a 1-minute period. That way, if transient RU spikes push you over your allotted baseline RU, you'd have reserve RU to consume, which prevents throttling.
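To make the difference between a transient spike and sustained overload concrete, here is a small simulation. It is a deliberately simplified model of the description above (a per-second baseline plus a 10x reserve that resets every 60 seconds) and is not the service's actual accounting; the function name and the sample workloads are made up for illustration.

```python
def throttled_seconds(usage_per_second, baseline_ru, burst_factor=10):
    """Return the indices of seconds whose RU demand cannot be served.

    Simplified model: each second may consume up to baseline_ru; anything
    above that is drawn from a reserve of burst_factor * baseline_ru that
    is reset at the start of every 60-second window.
    """
    throttled = []
    reserve = 0
    for second, demand in enumerate(usage_per_second):
        if second % 60 == 0:
            reserve = burst_factor * baseline_ru  # new minute, fresh reserve
        excess = max(0, demand - baseline_ru)
        if excess <= reserve:
            reserve -= excess
        else:
            throttled.append(second)
    return throttled

# A single 3000 RU spike in an otherwise quiet minute, at 1000 RU/sec.
spike = [500] * 10 + [3000] + [500] * 49
print(throttled_seconds(spike, 1000))                   # with the burst reserve
print(throttled_seconds(spike, 1000, burst_factor=0))   # without it

# Sustained demand at 3x the baseline exhausts the reserve quickly.
print(throttled_seconds([3000] * 60, 1000)[:1])
```

Under this model the reserve absorbs an occasional spike, but a workload that stays above the baseline still ends up throttled, so the baseline has to be sized for the sustained load.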

What happens when we execute long-running queries in DocumentDB (like complex reports)?

I'm not sure exactly what you mean by long-running queries, but it's just like I described above: if you consume more than your RU/sec, you'll be throttled before your next query can run (unless you enable per-minute RU).

Regarding your questions, David has all the answers. I would like to zoom in on number four a bit.

IMHO you are asking the wrong questions. (NoSQL) DBs are just for storage (BTW, why limit yourself to NoSQL stores anyway?). Instead, focus on what analysis you are interested in and dive into the services that offer that type of analysis, like Azure Data Lake Analytics, Azure Stream Analytics, etc. Once you have a clear picture of the analysis you require, it is easier to determine which Azure service is the best fit.

I suggest writing down your exact requirements and then thinking about the right type of storage, without limiting your scope to just NoSQL services. There are also Azure SQL Data Warehouse, Azure Analysis Services, Blob storage and such.

Analysis tools like Power BI or Azure Data Lake Analytics can work with many types of Azure databases and storage services.

By the way, there is also Azure IoT guidance that can be found at https://azure.microsoft.com/en-us/services/iot-hub/

EDIT: I know this might not address all the questions as extensively as David's answer does, but in my opinion, to pick the right type of storage you first need to know what kind of analysis needs to be performed.
