Best possible way to extract data from a news web API in 'Near Real Time' on a Big Data platform
I have a use case where the first step is ingesting data from news APIs or news aggregator APIs into HDFS. This data fetch is to be done on an NRT basis (say, every 15 minutes). Presently I am working on 2 approaches:
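For reference, the periodic fetch step being described can be sketched in plain Python. The endpoint URL, interval, and local staging directory below are assumptions for illustration; in a real deployment the landed files would be pushed to HDFS (e.g. via `hdfs dfs -put` or a WebHDFS client), and the loop would be driven by a scheduler rather than `time.sleep`:

```python
import time
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

NEWS_API_URL = "https://example.com/news"  # hypothetical news API endpoint
FETCH_INTERVAL_SECS = 15 * 60              # poll every 15 minutes (NRT)
STAGING_DIR = Path("staging")              # stand-in for an HDFS landing path

def fetch_articles(url: str = NEWS_API_URL) -> bytes:
    """Fetch one batch of articles from the news API."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

def land_batch(payload: bytes, out_dir: Path = STAGING_DIR) -> Path:
    """Write a batch to a timestamped file so successive batches never collide."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = out_dir / f"news_{stamp}.json"
    path.write_bytes(payload)
    return path

def run_forever() -> None:
    """Naive NRT loop; in practice cron, Oozie, or NiFi would drive this."""
    while True:
        land_batch(fetch_articles())
        time.sleep(FETCH_INTERVAL_SECS)
```

The timestamped filenames matter on HDFS, which favors append-only, immutable files over in-place updates.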
It would be great to have a few more suggestions for an approach that would be platform independent and could be used across different Hadoop distributions (Cloudera, HW, etc.).
Thanks.
Apache NiFi can definitely handle your process, and it runs well on Windows, macOS, and most Linux distributions (I've run it on Ubuntu, Red Hat, CentOS, Amazon Linux, and Raspbian). It doesn't need Hadoop, but it can work with either the Hortonworks or Cloudera Hadoop distribution.
I built an RSS viewer with NiFi that fetched, extracted, and saved RSS to disk using GetHTTP -> TransformXML -> PutFile . NiFi then listened for browser requests and returned the RSS as an HTML table using HandleHttpRequest -> GetFile -> TransformXML -> HandleHttpResponse .
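The TransformXML processor in that flow applies an XSLT stylesheet; outside NiFi, the same RSS-to-HTML-table conversion can be sketched with the Python standard library (the sample feed and the choice of fields are illustrative, not part of the original flow):

```python
import xml.etree.ElementTree as ET

def rss_to_html_table(rss_xml: str) -> str:
    """Convert RSS <item> entries into rows of a simple HTML table."""
    root = ET.fromstring(rss_xml)
    rows = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        rows.append(f'<tr><td><a href="{link}">{title}</a></td></tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"

# Minimal sample feed to exercise the transform
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>First story</title><link>https://example.com/1</link></item>
<item><title>Second story</title><link>https://example.com/2</link></item>
</channel></rss>"""
```

In NiFi itself this logic would live in an XSLT file referenced by TransformXML, so the flow stays declarative and the transform can be swapped without redeploying code.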