简体   繁体   中英

Should I use a key/value database to store my API logs?

I get a lot of logs from my API. I analyse those logs to get interesting information like how many users for the API in this month or what type of activities they do.

All of the analysis I do depend on a period. So the timestamp is very important for me.

In fact, actually I use indexes on the timestamp. The problem is that timestamp is continue.

My question is which database is the more appropriate for my use case?

I heard about key/value databases, is it interesting to use the timestamp as a key?

Thanks.

This is a two-year-old article from IBM that talks more about SQL implementation, but it is also possibly something to keep in mind when you do a NoSQL implementation:

Of course, your app would be different, I'm not sure of the granularity of your time-stamping, but it is possible to have two items logfiled at the same timestamp.

You might be better off creating some other form of unique key algorithm for your key-value store, adding some sort of serialization per timestamp. So the first item at a timestamp is ".1", the second ".2", etc. So you'd have some sort of timestamp.serialid format.

The other thought I have is: are you merging API log files from multiple applications/processes or machines? You might be able to do some sort of elementid.appid.timestamp.serialid to make unique key.

It all depends on your use case, so I can't say more for sure. I also wonder what you want to do with your key-value store in terms of reads/analysis after-the-fact, as that might highly alter your NoSQL solution. If you are planning to do a lot of log analysis, then, yes, there's a good reason to put that into a NoSQL database, especially if you want to do something like fast analysis of data, and then push some of the older items back into disk for storage.

As for databases, obviously each vendor will stick up for their product; but choose the best tool for the job. Best to try before you buy, and test things out for your specific setup. I'm from Aerospike, so I'm obviously biased towards it as a Key-Value store: http://www.aerospike.com/


Talked to a Very Smart Guy today, and he also suggested that you might want to use something like "milliseconds since date-time 'x'" as a primary key. Depending on what you are logging, there might still be a chance of collision with that as a primary key.

Therefore, another suggestion would be to take all entries for that primary key (ex: all log entries for that millisecond) and load them into the same record, in a kind of "bucket." You'd need application logic to parse out the multiple log entries under the same primary key, but that's another way to skin the cat.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM