
Saving Data on GAE: logging vs. datastore

I have a Google App Engine app that has to deal with a lot of data collection. The data I gather is on the order of millions of records per day. As I see it, there are two simple approaches to handling this so that the data can be analyzed:

    1. Use the logging API to generate App Engine logs, and then try to load these into BigQuery (or, more simply, export to CSV and do the analysis with Excel).
    2. Save the data in the App Engine datastore (ndb), and then download that data later and/or try to load it into BigQuery.
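For approach 1, one common pattern is to emit each collected record as a single structured log line, since one-record-per-line output is easy to export later as CSV rows or BigQuery JSON rows. A minimal sketch, using only the standard `logging` and `json` modules; the `record_event` helper and its fields are illustrative assumptions, not part of any GAE API:

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("collector")

def record_event(event_type, payload):
    """Emit one collected record as a single JSON log line.

    Keeping one record per line means each line can later become a CSV
    row or a BigQuery JSON row without further parsing work.
    """
    line = json.dumps(dict(payload, event=event_type), sort_keys=True)
    log.info(line)
    return line  # returned so callers/tests can inspect what was logged

# Hypothetical usage:
row = record_event("page_view", {"path": "/home", "user_id": 42})
```

On App Engine the `log.info(...)` call would land in the request logs, which can then be exported in bulk.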

Is there any preferable method of doing this?

Thanks!

BigQuery has a new Streaming API, which they claim was designed for high-volume real-time data collection.
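For reference, the Streaming API accepts rows via the `tabledata.insertAll` endpoint, where each row carries an `insertId` that BigQuery uses to de-duplicate retried requests. A minimal sketch of shaping records into that request body; the record fields are illustrative, and the actual POST (via an API client) is only indicated in a comment:

```python
import uuid

def make_streaming_rows(records):
    """Shape records into a tabledata.insertAll request body.

    Each row gets a unique insertId so BigQuery can de-duplicate rows
    if a failed request has to be re-sent.
    """
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            {"insertId": str(uuid.uuid4()), "json": rec} for rec in records
        ],
    }

# The resulting body would then be POSTed to the insertAll endpoint for
# the target table, e.g. through an authenticated BigQuery API client.
body = make_streaming_rows([{"path": "/home", "user_id": 42}])
```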

Advice from practice: we are currently logging 20M+ multi-event records a day via method 1 as described above. It works pretty well, except when the batch uploader is not called (normally every 5 minutes); then we need to detect this and re-run the importer. Also, we are currently in the process of migrating to the new Streaming API, but it is not yet in production, so I can't say how reliable it is.
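The detect-and-rerun step above can be sketched as a small watchdog that compares the time of the last successful batch upload against a staleness threshold; the threshold, function names, and `rerun` hook are assumptions for illustration:

```python
import time

# Uploader normally runs every 5 minutes; treat silence twice that long
# as a stall (an assumed threshold, tune to your schedule).
STALL_THRESHOLD = 10 * 60

def check_uploader(last_success_ts, now=None, rerun=lambda: None):
    """Trigger a re-run if the batch uploader appears stalled.

    last_success_ts: epoch seconds of the last successful batch upload.
    Returns True if a re-run was triggered, False otherwise.
    """
    now = time.time() if now is None else now
    if now - last_success_ts > STALL_THRESHOLD:
        rerun()  # e.g. enqueue the importer task again
        return True
    return False
```

Such a check could itself run from a cron job, re-enqueuing the importer whenever the uploader has been silent too long.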
