
How to write Twitter tweets to HDFS using the Spark Streaming Java API

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkTwitterHelloWorldExample");
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(60000));
System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
System.setProperty("twitter4j.oauth.accessToken", accessToken);
System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
String[] filters = new String[] {"Narendra Modi"};
JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc,filters);

// Without filter: Output text of all tweets
JavaDStream<String> statuses = twitterStream.map(
        new Function<Status, String>() {
            public String call(Status status) { return status.getText(); }
        }
);
statuses.print();
statuses.saveAsHadoopFiles("hdfs://HadoopSystem-150s:8020/Spark_Twitter_out","txt");

I am able to fetch the tweets, but I get an error when writing them to HDFS.

Can someone help me save the tweets to HDFS using Java?

Here is the error I am getting:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project SparkTwitterHelloWorldExample: Compilation failure
[ERROR] /home/Hadoop/Mani/SparkTwitterHelloWorldExample-master/src/main/java/de/michaelgoettsche/SparkTwitterHelloWorldExample.java:[58,17] cannot find symbol
[ERROR] symbol : method saveAsHadoopFiles(java.lang.String,java.lang.String)
[ERROR] location: class org.apache.spark.streaming.api.java.JavaDStream

Use saveAsTextFiles() instead. The compiler error occurs because JavaDStream has no saveAsHadoopFiles() method: the Hadoop output formats apply only to JavaPairDStream, since they require a key and a value. saveAsTextFiles() is defined on the underlying Scala DStream, which you can reach from a JavaDStream through dstream().

The solution is:

statuses.dstream().saveAsTextFiles(prefix, suffix);
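
To illustrate what the two arguments do: the Spark documentation describes the output name for each batch as prefix-TIME_IN_MS.suffix. The sketch below reproduces only that naming rule in plain Java, without Spark. The class BatchOutputPathSketch, the method batchPath, and the sample batch time are hypothetical and chosen purely for illustration; the prefix and the 60-second interval are taken from the question.

```java
// Minimal sketch (not Spark code): mimics the documented
// "prefix-TIME_IN_MS.suffix" naming used by saveAsTextFiles.
class BatchOutputPathSketch {

    // Hypothetical helper: builds the output name for one batch.
    static String batchPath(String prefix, String suffix, long batchTimeMs) {
        String result = Long.toString(batchTimeMs);
        if (prefix != null && !prefix.isEmpty()) {
            result = prefix + "-" + result;
        }
        if (suffix != null && !suffix.isEmpty()) {
            result = result + "." + suffix;
        }
        return result;
    }

    public static void main(String[] args) {
        // Prefix and batch interval as used in the question.
        String prefix = "hdfs://HadoopSystem-150s:8020/Spark_Twitter_out";
        long batchIntervalMs = 60000L;

        // Arbitrary batch time, assumed here only for illustration.
        long firstBatchMs = 1500000000000L;

        // Print the names that three consecutive batches would get.
        for (int i = 0; i < 3; i++) {
            System.out.println(batchPath(prefix, "txt", firstBatchMs + i * batchIntervalMs));
        }
    }
}
```

Each batch is written to its own output directory, so with the 60-second batch interval from the question you should expect one new directory per minute rather than a single growing file.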
