
How can I run spark-streaming without interruptions?

I'm trying to save tweets from Twitter using twitter-streaming, but I have one problem: my program stops working after some period of time. How long depends on the batch interval; with an interval of 1 ms it stops after about 4-5 seconds. Could you help me solve this? What is wrong?

When the batch interval is around 100 ms, I see log records like these:

19/08/06 23:45:26 INFO BlockRDD: Removing RDD 103 from persistence list
19/08/06 23:45:26 INFO BlockManager: Removing RDD 103
19/08/06 23:45:26 INFO TwitterInputDStream: Removing blocks of RDD BlockRDD[103] at createStream at Twitter.java:35 of time 1565124324340 ms
19/08/06 23:45:26 INFO ReceivedBlockTracker: Deleting batches: 1565124324320 ms
19/08/06 23:45:26 INFO InputInfoTracker: remove old batch metadata: 1565124324320 ms
-------------------------------------------
Time: 1565124325500 ms

When the batch interval is "big" and no data is available, I only see the messages about the Spark UI starting and finishing.

package TwitterAnalysis;

import org.apache.spark.*;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.twitter.*;

import twitter4j.Status;



public class Twitter {

    private static void setTwitterOAuth() {
        System.setProperty("twitter4j.oauth.consumerKey", TwitterOAuthKey.consumerKey);
        System.setProperty("twitter4j.oauth.consumerSecret", TwitterOAuthKey.consumerSecret);
        System.setProperty("twitter4j.oauth.accessToken", TwitterOAuthKey.accessToken);
        System.setProperty("twitter4j.oauth.accessTokenSecret", TwitterOAuthKey.accessTokenSecret);
    }



    public static void main(String [] args) {

        setTwitterOAuth();

        SparkConf conf = new SparkConf().setMaster("local[*]")
                                         .setAppName("SparkTwitter");

      //  JavaSparkContext sparkContext = new JavaSparkContext(conf);
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(10000));


        JavaReceiverInputDStream<Status> twitterStream = TwitterUtils.createStream(jssc);

        //Stream that contains just tweets in english
        JavaDStream<Status> enTweetsDStream=twitterStream.filter((status) -> "en".equalsIgnoreCase(status.getLang()));
        enTweetsDStream.persist(StorageLevel.MEMORY_AND_DISK());


        enTweetsDStream.print();
        jssc.start();


    }

}

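According to this answer ("Twitter streaming driver for Spark 2.0.0 is no longer available"), there is no twitter-streaming driver available for Spark 2.0 and later. As a solution, choose an earlier version of Spark.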
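Separately from the version issue, one thing in the posted code stands out: `main` returns right after `jssc.start()`. Spark Streaming programs normally follow `start()` with `jssc.awaitTermination()` so that the driver blocks while batches are processed. Whether that is what stops this program is an assumption; the question does not confirm it. The plain-Java sketch below uses no Spark at all, and `AwaitDemo` and `runBatches` are hypothetical names. It only illustrates the general effect: background work on a daemon thread gets no chance to finish unless the caller blocks and waits for it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitDemo {

    // Starts a daemon worker that "processes" the given number of batches,
    // sleeping intervalMillis before each one. If await is true, the caller
    // blocks until the worker is done (the role awaitTermination() plays in
    // a streaming driver); otherwise it returns immediately.
    // Returns how many batches were processed by the time the method returns.
    static int runBatches(int batches, long intervalMillis, boolean await) {
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < batches; i++) {
                    Thread.sleep(intervalMillis);
                    processed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                done.countDown();
            }
        });
        // Daemon threads do not keep the JVM alive once main returns.
        worker.setDaemon(true);
        worker.start();

        if (await) {
            try {
                done.await(); // block until the worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("with await:    " + runBatches(3, 50, true));
        System.out.println("without await: " + runBatches(3, 50, false));
    }
}
```

With `await` set to `true` the method always sees all batches processed. With `false` it usually returns before the first batch completes, and if `main` ended at that point the JVM would exit without waiting for the worker. In the posted program the analogous change would be adding `jssc.awaitTermination();` after `jssc.start();` (in the Java API this call can throw `InterruptedException`, so `main` would need to declare or handle it).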

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address.
