How to write Twitter tweets to HDFS using the Spark Streaming Java API
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkTwitterHelloWorldExample");
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(60000));
System.setProperty("twitter4j.oauth.consumerKey", consumerKey);
System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret);
System.setProperty("twitter4j.oauth.accessToken", accessToken);
System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret);
String[] filters = new String[] {"Narendra Modi"};
JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc,filters);
// Without filter: Output text of all tweets
JavaDStream<String> statuses = twitterStream.map(
    new Function<Status, String>() {
        public String call(Status status) { return status.getText(); }
    }
);
statuses.print();
statuses.saveAsHadoopFiles("hdfs://HadoopSystem-150s:8020/Spark_Twitter_out","txt");
I am able to fetch the tweets from Twitter, but I get an error when writing them to HDFS.
Can someone help me save the tweets to HDFS using Java?
Here is the error I am getting:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project SparkTwitterHelloWorldExample: Compilation failure
[ERROR] /home/Hadoop/Mani/SparkTwitterHelloWorldExample-master/src/main/java/de/michaelgoettsche/SparkTwitterHelloWorldExample.java:[58,17] cannot find symbol
[ERROR] symbol : method saveAsHadoopFiles(java.lang.String,java.lang.String)
[ERROR] location: class org.apache.spark.streaming.api.java.JavaDStream
You need to use the saveAsTextFiles() method instead. Hadoop output formats are applicable only to JavaPairDStream (they require a key and a value), which is why the compiler cannot find saveAsHadoopFiles() on JavaDStream.
The solution is:
statuses.dstream().saveAsTextFiles(prefix, suffix);
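The prefix and suffix arguments above decide where each batch is written: Spark Streaming documents the output name for every batch interval as prefix-TIME_IN_MS[.suffix]. The plain-Java sketch below has no Spark dependency and only mirrors that naming rule to show where the tweets from the question would end up. BatchFileNames and batchPath are hypothetical names introduced for illustration, not part of the Spark API.

```java
// Illustration only: mirrors the "prefix-TIME_IN_MS[.suffix]" naming that
// Spark Streaming documents for saveAsTextFiles. BatchFileNames and batchPath
// are hypothetical names, not part of the Spark API.
final class BatchFileNames {
    private BatchFileNames() {}

    /** Returns the per-batch output path for a batch time given in milliseconds. */
    static String batchPath(String prefix, String suffix, long batchTimeMs) {
        // A missing or empty prefix leaves just the batch timestamp.
        String base = (prefix == null || prefix.isEmpty())
                ? Long.toString(batchTimeMs)
                : prefix + "-" + batchTimeMs;
        // A missing or empty suffix adds no extension.
        return (suffix == null || suffix.isEmpty()) ? base : base + "." + suffix;
    }
}
```

With the arguments from the question, batchPath("hdfs://HadoopSystem-150s:8020/Spark_Twitter_out", "txt", t) yields hdfs://HadoopSystem-150s:8020/Spark_Twitter_out-t.txt for a batch at time t. Each such path is a directory of part files, one directory per 60-second batch, rather than a single growing text file.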