We are planning to use HBase in one of our projects.
We receive browsing data from our internal systems in the format shown below.
Our requirement is to support 3 different types of searches.
I am thinking of creating 3 HBase tables.
If I go with the above approach, it will cost us a lot of space to store this data.
S IP DateTime Method URL - ResponseCode - D IP -
176.204.134.111 20140421093842 GET http://googleads.g.doubleclick.net/pagead/adview?ai=CAbmt4K5UU47XB5GS8wPOi4C4CKH1-ZwCkbiU7inAjbcBEAEgptSKH1D0-ev7B2CRdsgBAakC4V3k_lZFkj6oAwHIA4oEqgSQAU_QtfygurroekV-h5dYCoVP70qKDV1sAkiI60NNZiQ1wICQkqb5XMC3TllLKrhD0KxX0kb9-LnGkCDTqGmDE3Do-UdLGIyluqQ7MwoAcuTJMUajYKOflKPd2ZDj6RlKUAI9pbdkb96-k-XTVpON9rjUM2vUkvjwW3BwSfQk656GjoyUcEwsjwWId7p7obHcTsAEqf_DzQKSBQQIBBgBkgUECAUYBJAGAdgGAoAHueeCC5gHAQ&sigh=7zrG0DRVvMA 0 TCP_MISS/200 - 173.194.66.155 - 0
2.50.165.129 20140421093842 GET http://www.alquds.co.uk/wp-content/uploads/2014/04/1217.jpg 0 TCP_MISS/200 - 46.165.251.78 - 0
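For reference, a line in this format can be split into named fields roughly as follows. This is only a sketch: the field positions are inferred from the header and the two sample lines above, the names `BrowseRecord` and `parse_line` are made up for illustration, and the unnamed `-` and `0` columns are simply skipped.

```python
from datetime import datetime
from typing import NamedTuple


class BrowseRecord(NamedTuple):
    """One parsed log line (hypothetical structure, not from the original system)."""
    source_ip: str
    timestamp: datetime
    method: str
    url: str
    response_code: str
    dest_ip: str


def parse_line(line: str) -> BrowseRecord:
    """Split a whitespace-separated log line into the named columns.

    Assumes the column order seen in the samples:
    S IP, DateTime, Method, URL, <unnamed>, ResponseCode, <unnamed>, D IP, ...
    """
    parts = line.split()
    if len(parts) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(parts)}")
    return BrowseRecord(
        source_ip=parts[0],
        # DateTime looks like yyyyMMddHHmmss in the samples.
        timestamp=datetime.strptime(parts[1], "%Y%m%d%H%M%S"),
        method=parts[2],
        url=parts[3],
        response_code=parts[5],
        dest_ip=parts[7],
    )


sample = (
    "2.50.165.129 20140421093842 GET "
    "http://www.alquds.co.uk/wp-content/uploads/2014/04/1217.jpg "
    "0 TCP_MISS/200 - 46.165.251.78 - 0"
)
record = parse_line(sample)
```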
What would be a good schema design for the requirements above?
Consider using OpenTSDB, which is optimized for storing small key-value time series data.
Even if you don't choose to use it, definitely read this slide deck discussing the schema design decisions that went into it.
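To give a flavour of that kind of design, here is a minimal sketch of a compact, fixed-width row key that packs the source IP and the timestamp into bytes instead of storing them as text. It is not OpenTSDB's actual key layout. It assumes one of the searches is "all requests from a source IP within a time range", that the addresses are IPv4, and that the DateTime field is in UTC; `make_row_key` and `scan_bounds` are hypothetical helper names.

```python
import calendar
import ipaddress
import struct
from datetime import datetime


def make_row_key(source_ip: str, date_time: str) -> bytes:
    """Build an 8-byte key: 4 bytes of IPv4 address + 4 bytes of epoch seconds.

    Big-endian packing keeps keys for the same IP sorted by time, which is
    what a range scan over lexicographically ordered row keys needs.
    """
    ip_bytes = ipaddress.IPv4Address(source_ip).packed
    parsed = datetime.strptime(date_time, "%Y%m%d%H%M%S")
    # Assumption: the log timestamps are UTC.
    epoch_seconds = calendar.timegm(parsed.timetuple())
    return ip_bytes + struct.pack(">I", epoch_seconds)


def scan_bounds(source_ip: str, start: str, end: str) -> tuple:
    """Return (start_key, stop_key) for a scan over one IP and a time range."""
    return make_row_key(source_ip, start), make_row_key(source_ip, end)


key = make_row_key("176.204.134.111", "20140421093842")
start_key, stop_key = scan_bounds(
    "176.204.134.111", "20140421000000", "20140422000000"
)
```

The key takes 8 bytes, whereas the same two fields as text take up to 15 + 14 characters. Note that two requests from the same IP in the same second would produce the same key, so a real design would need an extra suffix or would have to keep such requests in separate columns of the same row.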