
Processing data in CouchDB with Hadoop + MapReduce

I have a very, very large amount of data in CouchDB, but I only recently found out how crippled CouchDB's MapReduce functions are (no chaining).

So I had the idea of running MapReduce queries over the CouchDB database with Hadoop, and ideally storing the final result in another CouchDB database.
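For the last step of this plan, one option is CouchDB's _bulk_docs endpoint, which accepts a POST body of the form {"docs": [...]}. Below is a minimal Python sketch, assuming the Hadoop job writes Hadoop Streaming style "key<TAB>JSON value" lines. The one-document-per-key layout and the function names are hypothetical, not part of any library.

```python
import json
import urllib.request


def bulk_docs_payload(result_lines):
    """Turn "key<TAB>json-value" lines (Hadoop Streaming's default output
    shape) into a CouchDB _bulk_docs request body."""
    docs = []
    for line in result_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        # Hypothetical layout: the reducer key becomes the document _id.
        docs.append({"_id": key, "value": json.loads(value)})
    return {"docs": docs}


def post_bulk_docs(base_url, db, payload):
    """POST the payload to /{db}/_bulk_docs and return CouchDB's per-doc results."""
    req = urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage might look like post_bulk_docs("http://localhost:5984", "results", bulk_docs_payload(open("part-00000"))), with the URL and database name as placeholders and authentication omitted. Re-running into the same database returns "conflict" errors for ids that already exist unless their current _rev is included, so writing each run into a fresh database is the simpler choice.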

Is this too crazy? I know I could set up HBase to do this, but I don't want to migrate my data from CouchDB to HBase. And I love Couch as a data store.

Apparently CouchDB is supposed to be able to stream data to Hadoop via Sqoop, but I didn't find any information beyond that link. Worst case, you can write your own input reader that reads from CouchDB, or export your data regularly, put it on HDFS, and run your jobs from there.
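The "export regularly" route can be as simple as paging through _all_docs?include_docs=true and writing one JSON document per line, a format Hadoop jobs read easily. This is a minimal sketch under those assumptions; the function names are hypothetical. It pages with startkey and limit+1 instead of skip, because large skip values get slow in CouchDB.

```python
import json
import urllib.parse
import urllib.request


def fetch_all_docs_page(base_url, db, startkey, limit):
    """Fetch one page of /{db}/_all_docs with the documents included."""
    params = {"include_docs": "true", "limit": str(limit)}
    if startkey is not None:
        # CouchDB expects JSON-encoded key parameters, e.g. startkey="abc".
        params["startkey"] = json.dumps(startkey)
    url = f"{base_url}/{db}/_all_docs?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())


def export_docs(fetch_page, out, page_size=1000):
    """Write every non-design document to `out` as one JSON object per line.

    fetch_page(startkey, limit) must return a parsed _all_docs response.
    Each request asks for page_size + 1 rows; the extra row's key becomes
    the startkey of the next page, so no row is written twice.
    Returns the number of documents written.
    """
    startkey = None
    count = 0
    while True:
        rows = fetch_page(startkey, page_size + 1)["rows"]
        for row in rows[:page_size]:
            if row["id"].startswith("_design/"):
                continue  # design docs hold view code, not data
            out.write(json.dumps(row["doc"]) + "\n")
            count += 1
        if len(rows) <= page_size:
            return count
        startkey = rows[page_size]["key"]
```

For a real export you would pass something like lambda k, n: fetch_all_docs_page("http://localhost:5984", "mydb", k, n) (placeholder URL and database), then copy the file into HDFS with hdfs dfs -put.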

The MapReduce functions in CouchDB are constrained to simplify caching of the results. Rather than having to track down which views are affected by a change, views were designed to be self-contained.
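To see why self-contained views make caching cheap, here is a deliberately simplified Python model (not CouchDB's actual implementation): each document is mapped on its own and the index remembers which rows each document emitted, so changing one document only means replacing that document's rows. A chained second view would instead depend on the output of many documents at once, which breaks this per-document bookkeeping.

```python
class IncrementalView:
    """Toy model of a CouchDB-style view index (a simplification)."""

    def __init__(self, map_fn):
        self.map_fn = map_fn      # doc -> iterable of (key, value) pairs
        self.rows_by_doc = {}     # doc id -> rows that doc emitted

    def update(self, doc):
        # Only this document is re-mapped; no other rows are touched.
        self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))

    def delete(self, doc_id):
        self.rows_by_doc.pop(doc_id, None)

    def query(self):
        # Rows sorted by key, then by document id.
        rows = [
            (key, value, doc_id)
            for doc_id, emitted in self.rows_by_doc.items()
            for key, value in emitted
        ]
        return sorted(rows, key=lambda r: (r[0], r[2]))
```

For example, with a map function that yields (tag, 1) for each tag of a document, updating one document's tags changes only that document's rows in query().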

So if you have complex MapReduce code, you can use a tool like CouchApp to embed shared functions within your map and reduce functions. I'm having trouble finding the reference for this, but you can use the !code macro to embed JavaScript functions in views. See the question "Using require() or // !json, !code in CouchDB?".

This could give you some of the productivity benefit of chaining without actually chaining: put most of the code in shared functions and merely call those functions from the different views. If what you're after is the performance benefit of chaining, you may be better off just moving to HBase.
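As a rough model of the macro approach, the Python sketch below assumes that a line of the form "// !code path/to/file.js" is replaced, at push time, with the text of that file. That is our understanding of CouchApp's behavior, not a reimplementation of it. The files dictionary stands in for the app's directory tree, and the helper names are hypothetical.

```python
import re

# Matches a whole line like "// !code lib/helpers.js" (optionally indented).
CODE_MACRO = re.compile(r"^\s*//\s*!code\s+(\S+)\s*$")


def expand_code_macros(source, files):
    """Replace each "// !code path" line in a view's source with files[path].

    `files` maps paths to file contents; an unknown path raises KeyError.
    """
    out = []
    for line in source.splitlines():
        match = CODE_MACRO.match(line)
        if match:
            out.append(files[match.group(1)].rstrip("\n"))
        else:
            out.append(line)
    return "\n".join(out)
```

With files = {"lib/norm.js": "function norm(s) { return s.toLowerCase(); }"}, two different map functions can each contain "// !code lib/norm.js" and call norm(...). The helper is written once, but every view still carries its own copy after expansion, which is why this recovers the code reuse of chaining but not its performance.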
