Spark streaming one log file with Java doesn't generate any output

I want to stream a log file using Java and Spark. My code is simple:

    String base = "c:/test";

    SparkConf conf = new SparkConf().setAppName("First_App").setMaster("local[2]");
    JavaStreamingContext ssc= new JavaStreamingContext(conf, Seconds.apply(1));

    JavaDStream<String> line = ssc.textFileStream(base);
    line.map(new Function<String, Integer>()
    {
        @Override
        public Integer call(String v1) throws Exception
        {
            System.out.println(v1);
            int l =  v1.length();
            return l;
        }
    });

    line.print();

    ssc.start();
    ssc.awaitTermination();

In c:/test there is a log file generated by Logback. Its content is:

INFO:Data=Do Save Entity
INFO:Data=Do Delete Entity

But when I run my app, the following output is printed in the console:

18/02/18 19:55:30 INFO JobScheduler: Added jobs for time 1518971130000 ms
18/02/18 19:55:30 INFO JobScheduler: Starting job streaming job 1518971130000 ms.0 from job set of time 1518971130000 ms
18/02/18 19:55:30 INFO JobScheduler: Finished job streaming job 1518971130000 ms.0 from job set of time 1518971130000 ms
18/02/18 19:55:30 INFO JobScheduler: Total delay: 0.291 s for time 1518971130000 ms (execution: 0.002 s)
-------------------------------------------
Time: 1518971130000 ms
-------------------------------------------

18/02/18 19:55:30 INFO FileInputDStream: Cleared 0 old files that were older than 1518971070000 ms: 
18/02/18 19:55:30 INFO ReceivedBlockTracker: Deleting batches: 
18/02/18 19:55:30 INFO InputInfoTracker: remove old batch metadata: 
18/02/18 19:55:31 INFO FileInputDStream: Finding new files took 16 ms
18/02/18 19:55:31 INFO FileInputDStream: New files at time 1518971131000 ms:

-------------------------------------------
Time: 1518971131000 ms
-------------------------------------------

and this output continues. My aim is simple: stream a log file and print its content to the console. This is temporary, of course, because eventually I want to save the file in the database.

The reason you don't see any output is that JavaStreamingContext.textFileStream monitors a directory for newly created files (see the docs) and does not react to changes in existing files. Some ideas on how to deal with the situation you describe are mentioned here.
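
One possible workaround, sketched here under assumptions rather than taken from the question, is to write each batch of log lines to a staging directory and then move the finished file into the monitored directory, so that it shows up there as a new, complete file. The class name LogBatchPublisher, the publish helper, and the staging directory are hypothetical; only java.nio.file is used, and the staging and monitored directories are assumed to be on the same file system so that the atomic move is supported.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Spark or of the original question.
public class LogBatchPublisher {

    /**
     * Writes the lines to a file in stagingDir, then moves that file into
     * watchedDir in a single atomic step. Returns the path of the moved file.
     * Assumes both directories live on the same file system; otherwise
     * ATOMIC_MOVE may throw AtomicMoveNotSupportedException.
     */
    public static Path publish(Path stagingDir, Path watchedDir,
                               String fileName, List<String> lines) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(watchedDir);

        // Write the complete file outside the monitored directory first.
        Path staged = stagingDir.resolve(fileName);
        Files.write(staged, lines, StandardCharsets.UTF_8);

        // Move it in, so the monitored directory never contains a half-written file.
        return Files.move(staged, watchedDir.resolve(fileName),
                StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: LogBatchPublisher <stagingDir> <watchedDir>");
            return;
        }
        // A unique name per batch, so every publish creates a new file.
        String fileName = "batch-" + System.currentTimeMillis() + ".log";
        Path moved = publish(Paths.get(args[0]), Paths.get(args[1]), fileName,
                Arrays.asList("INFO:Data=Do Save Entity", "INFO:Data=Do Delete Entity"));
        System.out.println("Published " + moved);
    }
}
```

With the streaming application already running against c:/test, publishing a batch into that directory should make its lines appear in a later batch printout. Each batch needs its own file name, since appending to a file that is already there is not picked up.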

A second (unrelated) issue in your code is that the call to line.map returns a new JavaDStream, and you should call print on that returned stream to see the result of the transformation. Calling print directly on line shows the contents of the stream without the transformation.
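
A minimal sketch of that second fix, assuming the Spark Streaming Java API is on the classpath: the stream returned by map is kept in a variable and printed. The class name LineLengthStream and the lineLength helper are made up for illustration; the directory, app name, and master setting are taken from the question.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Hypothetical class name; the setup mirrors the code in the question.
public class LineLengthStream {

    // The transformation itself, kept as a plain static method.
    public static Integer lineLength(String line) {
        return line.length();
    }

    public static void main(String[] args) throws Exception {
        String base = "c:/test"; // monitored directory from the question

        SparkConf conf = new SparkConf().setAppName("First_App").setMaster("local[2]");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));

        JavaDStream<String> lines = ssc.textFileStream(base);

        // map does not modify 'lines'; it returns a new stream, so keep the reference.
        JavaDStream<Integer> lengths = lines.map(LineLengthStream::lineLength);

        lines.print();   // the raw lines of each batch
        lengths.print(); // the transformed values (line lengths)

        ssc.start();
        ssc.awaitTermination();
    }
}
```

This still prints nothing until a new file appears in the monitored directory, which is the first issue described above.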
