
hadoop - what is the best way to fetch data from a very big sequence file?

I have a very big Hadoop sequence file in HDFS. What is the best way to fetch data from it, i.e., select specific records, etc.?

Can it be done with Hive? How can I create a Hive table from a sequence file?

Thanks

If you need 'quick' access to the data, you should consider loading it into a datastore of some sort (a relational DB, or a NoSQL store such as HBase or Accumulo).

Another option (if you can rewrite your data) is to use a MapFile: this builds an index over the keys of your sequence file and gives much quicker keyed access to the data than a full file scan.
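A minimal sketch of that MapFile approach, assuming the Hadoop 2.x client libraries are on the classpath; the path `/data/mymapfile` and the key/value contents are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path dir = new Path("/data/mymapfile"); // hypothetical location

        // Rewrite keyed records into a MapFile; note that MapFile
        // requires keys to be appended in sorted order.
        try (MapFile.Writer writer = new MapFile.Writer(conf, dir,
                MapFile.Writer.keyClass(Text.class),
                MapFile.Writer.valueClass(Text.class))) {
            writer.append(new Text("key1"), new Text("value1"));
            writer.append(new Text("key2"), new Text("value2"));
        }

        // Keyed lookup: the reader binary-searches the index file and
        // seeks into the data file, instead of scanning it end to end.
        try (MapFile.Reader reader = new MapFile.Reader(dir, conf)) {
            Text value = new Text();
            if (reader.get(new Text("key2"), value) != null) {
                System.out.println(value);
            }
        }
    }
}
```

The trade-off is that the data must be rewritten and kept sorted by key, but a lookup then touches only the index plus one seek rather than the whole file.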

Otherwise, if you want to use Hive, there is a thread on the Hive mailing list about this exact subject.
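For the Hive route, a sketch of an external table declared over an existing sequence file directory; the column names/types and the HDFS path are hypothetical and must match how your records were serialized (with the default SerDe, Hive reads each sequence file *value* as a delimited text row and ignores the key):

```sql
-- Point an external table at the directory holding the sequence file(s)
CREATE EXTERNAL TABLE my_seq_table (
  id    STRING,
  value STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
LOCATION '/user/me/myseqdata';

-- Then select records like any other table
SELECT * FROM my_seq_table WHERE id = 'key1' LIMIT 10;
```

Keep in mind that without partitioning or an index, such a query still scans the full file under the hood; Hive mainly adds the SQL-style selection on top.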
