How to Store Tweets in HDFS?

How can I store tweets about a particular website in HDFS?

Suppose there is a website, www.abcd.com, and I want to collect all users' tweets about it and store them in HDFS or Hive.

I understand that Flume and Sqoop are also helpful for storing data. Could anyone explain how Flume and Sqoop work when storing tweets in HDFS?

Sqoop was not made for this purpose; it is designed for bulk transfers between relational databases and Hadoop. Flume is the tool for this kind of need. You can write a custom Flume source that pulls the tweets and writes them to HDFS. See this for example: it shows how to use Flume to collect data from the Twitter Streaming API and forward it to HDFS.
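
As a rough illustration of the source → channel → sink wiring involved, here is a minimal sketch of a Flume agent configuration. It assumes the experimental Twitter source bundled with Apache Flume rather than a custom one. The agent and component names (`TwitterAgent`, `Twitter`, `MemChannel`, `HDFS`), the NameNode address, the target path, and the credential placeholders are all hypothetical and need to be adapted. As far as I know, the bundled source only delivers a sample of the public stream and does not filter by keyword, so restricting the data to tweets about www.abcd.com would still require a custom source as described above.

```properties
# Hypothetical agent name: TwitterAgent. Component names are illustrative.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Experimental Twitter source shipped with Apache Flume (sampled stream, Avro-encoded events).
# Replace the placeholders with your own Twitter API credentials.
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YOUR_CONSUMER_KEY
TwitterAgent.sources.Twitter.consumerSecret = YOUR_CONSUMER_SECRET
TwitterAgent.sources.Twitter.accessToken = YOUR_ACCESS_TOKEN
TwitterAgent.sources.Twitter.accessTokenSecret = YOUR_ACCESS_TOKEN_SECRET

# In-memory channel buffering events between source and sink.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

# HDFS sink; the NameNode host, port, and directory are assumptions.
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode:8020/user/flume/tweets/%Y/%m/%d
# Needed so the date escapes in the path resolve without a timestamp header.
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
# Roll files every 10 minutes only; size- and count-based rolling are disabled.
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
```

Assuming the file is saved as `twitter.conf` (a hypothetical name), the agent could be started with `flume-ng agent --conf conf --conf-file twitter.conf --name TwitterAgent`. Once the files are in HDFS, a Hive external table can be defined over the same directory if you want to query the tweets from Hive.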

You can find more in the official documentation.
