
Flume to stream in weather data

I am new to Flume, but I want to stream weather data from a website into my HDFS location. So I have created the sink, source and channel as below:

weather.channels= memory-channel
weather.channels.memory-channel.capacity=10000
weather.channels.memory-channel.type = memory
weather.sinks = hdfs-write
weather.sinks.hdfs-write.channel=memory-channel
weather.sinks.hdfs-write.type = logger
weather.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/hadoop/flume
weather.sinks.hdfs-write.rollInterval = 1200
weather.sinks.hdfs-write.hdfs.writeFormat=Text
weather.sinks.hdfs-write.hdfs.fileType=DataStream
weather.sources= Weather
weather.sources.Weather.bind =  api.openweathermap.org/data/2.5/forecast/city?id=524901&APPID=********************************
weather.sources.Weather.channels=memory-channel
weather.sources.Weather.type = netcat
weather.sources.Weather.port = 80

So I am using an API here to make this work. What else can I use to stream in weather data, which online website can I use, or which API should I use to configure the source? While executing the flume-ng command to start the agent, I get the following:

15/03/18 11:13:28 ERROR lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner:{
 source:org.apache.flume.source.http.HTTPSource{name:Weather,state:IDLE} } - Exception follows.
java.lang.IllegalStateException: Running HTTP Server found in source: Weather before I started one. Will not attempt to start.
at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:189)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745) 
15/03/18 11:13:31 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 10
15/03/18 11:13:31 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping
15/03/18 11:13:31 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory-channel stopped

The "lifecycle" error you see is caused by a previous error while trying to start the HTTP server.

The original error is likely due to trying to bind to the privileged port 80 as a non-root user. Change the port to something above 1024, e.g. 8080.
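As a sketch, the source block would then look something like the following (assuming the HTTP source that the stack trace shows Flume actually instantiated; 8080 is just an example unprivileged port):

```
weather.sources = Weather
weather.sources.Weather.type = http
weather.sources.Weather.bind = 0.0.0.0
weather.sources.Weather.port = 8080
weather.sources.Weather.channels = memory-channel
```

Separately, note that the `hdfs.*` settings on your sink only take effect once the sink type is `hdfs` rather than `logger`, and the roll interval property is spelled `hdfs.rollInterval`.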

However, it won't work the way you are trying to use it. An HTTP or netcat source listens for incoming calls; it doesn't go and fetch the URL you are setting in `bind`.

I see two options:

  1. Create a Linux daemon (e.g. a cron job) that does a wget or curl to that URL at regular intervals and saves the result to a file, then configure Flume with the spooling-directory source.
  2. Create your own Flume source that polls that URL at regular intervals.
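A minimal sketch of option 1, assuming a POSIX shell; the spool directory path, function name and fetch command are placeholders, and in production the function would be invoked from cron with a real curl command:

```shell
#!/bin/sh
# fetch_weather_once FETCH_CMD: run FETCH_CMD and drop its output into
# $SPOOL_DIR, so Flume's spooling-directory source can pick it up.
SPOOL_DIR="${SPOOL_DIR:-/var/spool/flume/weather}"

fetch_weather_once() {
    fetch_cmd="$1"
    mkdir -p "$SPOOL_DIR"
    ts=$(date +%Y%m%d%H%M%S)
    tmp="$SPOOL_DIR/.weather-$ts.tmp"
    # Write to a hidden temp file first, then rename: a spooldir source
    # must only ever see complete, immutable files.
    $fetch_cmd > "$tmp" && mv "$tmp" "$SPOOL_DIR/weather-$ts.json"
}

# In production this would run from cron every 20 minutes, e.g.:
# fetch_weather_once "curl -s http://api.openweathermap.org/data/2.5/forecast/city?id=524901&APPID=YOUR_KEY"
```

The matching Flume source would then be of type `spooldir`, with its `spoolDir` property pointing at the same directory the script writes to.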
