Data catalog and metadata management in AWS for a Data Lake architecture

We are setting up a data platform loosely based on the Data Lake architecture. We are evaluating candidates that provide a centralized data catalog, metadata management, and tagging. Glue seems very promising, but it's still not out for public consumption, so we looked into:

  • Ground
  • Waterline
  • Zaloni

Ground is fairly DIY. It seems we would have to extend it extensively to make it work for us (scavenging from S3, writing to Titan).
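To give a rough idea of what the "scavenging from S3" part of a DIY extension might involve, here is a minimal sketch. It is not Ground's API or anything we have built: `CatalogEntry`, `build_catalog` and `tag_entries` are hypothetical names, the listing is hard-coded in the shape of S3 object listings (`Key`, `Size`), and a plain in-memory dict stands in for the graph store (Titan in our case).

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class CatalogEntry:
    """One dataset in the hypothetical catalog, keyed by prefix and format."""
    prefix: str
    file_format: str
    object_count: int = 0
    total_bytes: int = 0
    tags: set = field(default_factory=set)


def build_catalog(objects):
    """Group object listings by parent prefix and file extension.

    `objects` is an iterable of dicts with "Key" and "Size", assumed to
    have been fetched from S3 elsewhere.
    """
    catalog = {}
    for obj in objects:
        path = PurePosixPath(obj["Key"])
        prefix = str(path.parent)
        file_format = path.suffix.lstrip(".") or "unknown"
        entry = catalog.setdefault(
            (prefix, file_format), CatalogEntry(prefix, file_format)
        )
        entry.object_count += 1
        entry.total_bytes += obj["Size"]
    return catalog


def tag_entries(catalog, rules):
    """Apply tags; each rule maps a prefix substring to a tag (assumed scheme)."""
    for entry in catalog.values():
        for needle, tag in rules.items():
            if needle in entry.prefix:
                entry.tags.add(tag)
    return catalog


# Hard-coded sample listing; real input would come from an S3 listing call.
listing = [
    {"Key": "raw/clickstream/2017-01-01/part-0000.json", "Size": 1200},
    {"Key": "raw/clickstream/2017-01-01/part-0001.json", "Size": 800},
    {"Key": "curated/orders/orders.parquet", "Size": 5000},
]
catalog = tag_entries(
    build_catalog(listing), {"raw/": "raw", "clickstream": "events"}
)
```

Even this toy version leaves out schema inference, incremental rescans, and persistence to a graph store, which is roughly the amount of extension work we are worried about.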

Waterline and Zaloni are packaged, full-blown solutions that might not be what we are looking for, since we prefer open-source point solutions.

Are there any alternatives that we should look at? We like the MetaModel available in Ground and are leaning towards using it with Kinesis schema management.

It might be worth reconsidering the DIY route. You'll be wasting a lot of time on building the product you want, and supporting it, instead of using it. I know it's a little marketing fluff, but Zaloni's page says 650% ROI vs. building your own. There's got to be at least a little something in that.
