
Flume NoSuchMethodError pulling Twitter data into HDFS

I can't pull Twitter data into HDFS with Flume because of an error I can't get rid of.

Command:

bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

Console:

2020-12-14 11:38:08,662 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:154)] Unhandled error
java.lang.NoSuchMethodError: 'boolean twitter4j.conf.Configuration.isStallWarningsEnabled()'
    at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:60)
    at twitter4j.TwitterStreamFactory.<clinit>(TwitterStreamFactory.java:40)
    at org.apache.flume.source.twitter.TwitterSource.configure(TwitterSource.java:110)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:325)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:105)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

flume-env.sh (I manually added flume-sources-1.0-SNAPSHOT.jar to flume/lib):

 export JAVA_HOME=/usr/lib/jvm/default-java
 export JAVA_OPTS="-Xms500m -Xmx2000m -Dcom.sun.management.jmxremote"
# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "

FLUME_CLASSPATH="/home/jb/flume/lib/flume-sources-1.0-SNAPSHOT.jar"

twitter.conf:

# Naming the components on the current agent. 
TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel 
TwitterAgent.sinks = HDFS
  
# Describing/Configuring the source 
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx 
TwitterAgent.sources.Twitter.accessToken = xxx 
TwitterAgent.sources.Twitter.accessTokenSecret = xxx
TwitterAgent.sources.Twitter.keywords = tutorials point,java, bigdata, mapreduce, mahout, hbase, nosql
  
# Describing/Configuring the sink 

TwitterAgent.sinks.HDFS.type = hdfs 
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000 
TwitterAgent.sinks.HDFS.hdfs.minBlockReplicas = 1
 
# Describing/Configuring the channel 
TwitterAgent.channels.MemChannel.type = memory 
TwitterAgent.channels.MemChannel.capacity = 100 
TwitterAgent.channels.MemChannel.transactionCapacity = 100
  
# Binding the source and sink to the channel 
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel

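OS: Ubuntu, Flume: v1.9.0, Hadoop: v3.3.0

I managed to get it working. For anyone who wants to know how, here is what I did.

First, change the Flume version. I am now using Flume 1.7.0 (https://flume.apache.org/releases/1.7.0.html). A newer version might work too, but I didn't want to break my setup :)

Second, clone this repo: https://github.com/cloudera/cdh-twitter-example. Inside it there is a flume.conf file. I configured it like this: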

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'TwitterAgent'

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xx
TwitterAgent.sources.Twitter.consumerSecret = xx
TwitterAgent.sources.Twitter.accessToken = xx
TwitterAgent.sources.Twitter.accessTokenSecret = xx
TwitterAgent.sources.Twitter.keywords =  hadoop, bigdata
TwitterAgent.sources.Twitter.locations = -54.5247541978, 2.05338918702, 9.56001631027, 51.1485061713
TwitterAgent.sources.Twitter.language = fr

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/twitter_data/%Y/%m/%d/%H/
#It specifies the File format. File formats that are currently supported are SequenceFile, DataStream or CompressedStream.
#The DataStream will not compress the output file and please don’t set codeC. The CompressedStream requires set hdfs.codeC with an available codeC
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
# It specifies the suffix to append to file. For  eg, .avro 
TwitterAgent.sinks.HDFS.hdfs.fileSuffix = .json
#It specifies the number of events written to file before it is flushed to HDFS.
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
# It specifies the file size to trigger roll, in bytes. If it is equal to 0 then it means never roll based on file size.
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
#It specifies the number of events written to the file before it rolled. If it is equal to 0 then it means never roll based on the number of events.
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
#It specifies the number of seconds to wait before rolling the current file. If it is equal to 0 then it means never roll based on the time interval.
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 60
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
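
With the %Y/%m/%d/%H escapes in hdfs.path and useLocalTimeStamp = true, the sink buckets files by the agent's local hour. A minimal sketch of how to work out which directory the current hour maps to (the function name hdfs_bucket_path is made up for illustration; the base path is the one from the config above):

```shell
# Print the HDFS directory the sink would use for the current local hour,
# given the base path from hdfs.path (without the %Y/%m/%d/%H part).
# hdfs_bucket_path is a hypothetical helper, not part of Flume or Hadoop.
hdfs_bucket_path() {
  base="$1"
  # Strip one trailing slash so the result has no double slash.
  printf '%s/%s/\n' "${base%/}" "$(date +%Y/%m/%d/%H)"
}

hdfs_bucket_path "hdfs://localhost:9000/user/Hadoop/twitter_data"
```

Once the agent has been running for a while, passing that path to hdfs dfs -ls should show the rolled .json files, assuming the agent and the shell use the same time zone.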

Then, modify the twitter4j version in pom.xml:

<dependency>
  <groupId>org.twitter4j</groupId>
  <artifactId>twitter4j-stream</artifactId>
  <version>3.0.3</version>
</dependency>

Package it with Maven:

cd flume-sources
mvn package

This creates target/flume-sources-1.0-SNAPSHOT.jar. Copy it into <YOUR_FLUME_HOME>/lib:

cp ./target/flume-sources-1.0-SNAPSHOT.jar ~/flume/lib

I set the CLASSPATH in the file mentioned earlier (flume-env.sh):

FLUME_CLASSPATH="/home/jb/flume/lib/flume-sources-1.0-SNAPSHOT.jar"
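
Since this jar ends up on the agent's classpath next to everything in lib/, it can help to see which jars actually ship twitter4j.conf.Configuration, the class named in the NoSuchMethodError. A rough sketch, assuming unzip is installed (find_class_in_jars is a hypothetical helper name):

```shell
# Print every jar in a directory that contains the given class file.
# find_class_in_jars is a made-up helper; it relies on `unzip -l` to list
# jar contents and silently skips files that are not valid archives.
find_class_in_jars() {
  dir="$1"
  class_path="$2"   # e.g. twitter4j/conf/Configuration.class
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -F "$class_path" >/dev/null; then
      echo "$jar"
    fi
  done
}
```

For example, find_class_in_jars ~/flume/lib twitter4j/conf/Configuration.class. If more than one jar is printed, the copies may come from different twitter4j versions, which would be consistent with the error above.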

Copy the conf/flume.conf we just wrote to <YOUR_FLUME_HOME>/conf.

Third, check that the twitter4j-core, twitter4j-media-support and twitter4j-stream jars in lib/ are version 3.0.3. If they are not, download that version.

Finally:
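
A quick way to do that check from the shell, as a sketch only: it looks at the version in each file name rather than inside the jar, and check_twitter4j_versions is a hypothetical helper name.

```shell
# List the twitter4j jars in a lib directory and flag any whose file name
# does not end in the expected version (3.0.3 by default, as used here).
# Returns 0 if all match, 1 on a mismatch, 2 if no twitter4j jars exist.
check_twitter4j_versions() {
  lib_dir="$1"
  expected="${2:-3.0.3}"
  mismatch=0
  for jar in "$lib_dir"/twitter4j-*.jar; do
    # An unmatched glob stays literal, so test that the file really exists.
    [ -e "$jar" ] || { echo "no twitter4j jars found in $lib_dir"; return 2; }
    name=$(basename "$jar")
    case "$name" in
      *-"$expected".jar) echo "OK       $name" ;;
      *)                 echo "MISMATCH $name"; mismatch=1 ;;
    esac
  done
  return "$mismatch"
}
```

In my setup this would be called as check_twitter4j_versions ~/flume/lib; any line starting with MISMATCH points at a jar to replace.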

cd $FLUME_HOME
bin/flume-ng agent --conf ./conf/ -f ./conf/flume.conf -Dflume.root.logger=INFO,console -n TwitterAgent
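
Note that the value passed to -n has to match the agent prefix used in the config file (TwitterAgent here). A small sketch for listing the agent names a properties file defines, in case the two get out of sync (agent_names is a hypothetical helper name):

```shell
# Print the agent names declared in a Flume properties file, based on the
# "<agent>.sources = ..." lines. agent_names is a made-up helper.
agent_names() {
  sed -n 's/^[[:space:]]*\([A-Za-z0-9_-]*\)\.sources[[:space:]]*=.*/\1/p' "$1" | sort -u
}
```

Running agent_names conf/flume.conf against the file above should print TwitterAgent, which is what goes after -n.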

Hallelujah:

2020-12-18 02:48:38,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 100 docs
2020-12-18 02:48:40,777 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 200 docs
2020-12-18 02:48:42,017 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 300 docs
2020-12-18 02:48:44,772 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 400 docs
2020-12-18 02:48:46,779 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 500 docs
2020-12-18 02:48:47,875 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 600 docs
2020-12-18 02:48:49,852 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 700 docs
2020-12-18 02:48:52,789 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 800 docs
2020-12-18 02:48:54,791 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 900 docs
2020-12-18 02:48:56,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.onStatus(TwitterSource.java:173)] Processed 1 000 docs
2020-12-18 02:48:56,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(TwitterSource.java:295)] Total docs indexed: 1 000, total skipped docs: 0
2020-12-18 02:48:56,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(TwitterSource.java:297)]     47 docs/second
2020-12-18 02:48:56,805 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(TwitterSource.java:299)] Run took 21 seconds and processed:
2020-12-18 02:48:56,806 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(TwitterSource.java:301)]     0,013 MB/sec sent to index
2020-12-18 02:48:56,807 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(TwitterSource.java:303)]     0,266 MB text sent to index
2020-12-18 02:48:56,807 (Twitter4J Async Dispatcher[0]) [INFO - org.apache.flume.source.twitter.TwitterSource.logStats(TwitterSource.java:305)] There were 0 exceptions ignored:
