
Round Robin like SQL table for tracking recent activity

We need to track user activity over different time periods, such as 24 hours, 7 days, etc. We don't anticipate a very large number of different periods, but the number of users will be very large, probably in the millions. A nightly cron job that summarizes the stats for each user doesn't sound reasonable. In the past I've tracked network usage like this with RRD tables, but those were just BerkeleyDB files and needed one file per statistic, which wouldn't work here. Still, that idea seems like what I'm after. Is there a pattern or best practice that I'm overlooking?

It depends on which architecture you want to use and which hardware you can afford.

For massive data analysis I would go for a cluster-based framework like Hadoop and build map/reduce functions to process your data.

See http://hadoop.apache.org/.

User activities can be stored in daily files, which are uploaded to the Hadoop cluster and then processed.
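To make the map/reduce idea concrete, here is a minimal sketch in plain Python. It does not use the Hadoop API; it only imitates the map and reduce phases in memory. The record format (one user ID per event in each daily file), the sample data, and the function names `map_events`, `reduce_counts`, and `activity_in_window` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def map_events(day, user_ids):
    """Map phase: emit ((user_id, day), 1) for each event in one daily file."""
    for user_id in user_ids:
        yield (user_id, day), 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted values for each (user_id, day) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def activity_in_window(daily_totals, user_id, end_day, days):
    """Sum a user's daily counts over the `days` days ending at end_day (inclusive)."""
    return sum(
        daily_totals.get((user_id, end_day - timedelta(days=offset)), 0)
        for offset in range(days)
    )


# Hypothetical daily files: each maps a day to the user IDs seen in its events.
daily_files = {
    date(2024, 1, 1): ["alice", "bob", "alice"],
    date(2024, 1, 2): ["alice"],
    date(2024, 1, 8): ["bob", "bob"],
}

emitted = [
    pair
    for day, users in daily_files.items()
    for pair in map_events(day, users)
]
daily_totals = reduce_counts(emitted)
```

With per-day totals in hand, any period that is a whole number of days (7 days, 30 days) can be answered by summing the matching daily rows, for example `activity_in_window(daily_totals, "alice", date(2024, 1, 2), 7)`. A 24-hour window that is not aligned to calendar days would need finer-grained buckets, such as hourly files.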

Such solutions can give you the necessary scalability while requiring only commodity hardware.

