Spark streaming one log file with java doesn't generate any output
I want to stream a log file with Java and Spark. My code is simple:
String base = "c:/test";
SparkConf conf = new SparkConf().setAppName("First_App").setMaster("local[2]");
JavaStreamingContext ssc = new JavaStreamingContext(conf, Seconds.apply(1));
JavaDStream<String> line = ssc.textFileStream(base);
line.map(new Function<String, Integer>()
{
    @Override
    public Integer call(String v1) throws Exception
    {
        System.out.println(v1);
        int l = v1.length();
        return l;
    }
});
line.print();
ssc.start();
ssc.awaitTermination();
In c:/test there is a log file generated with Logback. Its content is:
INFO:Data=Do Save Entity
INFO:Data=Do Delete Entity
but when I run my app, the following output is printed to the console:
18/02/18 19:55:30 INFO JobScheduler: Added jobs for time 1518971130000 ms
18/02/18 19:55:30 INFO JobScheduler: Starting job streaming job 1518971130000 ms.0 from job set of time 1518971130000 ms
18/02/18 19:55:30 INFO JobScheduler: Finished job streaming job 1518971130000 ms.0 from job set of time 1518971130000 ms
18/02/18 19:55:30 INFO JobScheduler: Total delay: 0.291 s for time 1518971130000 ms (execution: 0.002 s)
-------------------------------------------
Time: 1518971130000 ms
-------------------------------------------
18/02/18 19:55:30 INFO FileInputDStream: Cleared 0 old files that were older than 1518971070000 ms:
18/02/18 19:55:30 INFO ReceivedBlockTracker: Deleting batches:
18/02/18 19:55:30 INFO InputInfoTracker: remove old batch metadata:
18/02/18 19:55:31 INFO FileInputDStream: Finding new files took 16 ms
18/02/18 19:55:31 INFO FileInputDStream: New files at time 1518971131000 ms:
-------------------------------------------
Time: 1518971131000 ms
-------------------------------------------
and this output continues. My aim is simple: stream a log file and then print its content to the console. Of course, this is temporary, because in the end I want to save the data to a database.
The reason why you don't see any output is that JavaStreamingContext.textFileStream monitors a directory for newly created files ( docs ) and does not react to changes in existing files. Some ideas on how to deal with the situation you describe are mentioned here .
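Because textFileStream only picks up files that newly appear in the watched directory, a file that a logger keeps appending to in place is never seen as "new". One common workaround is to write the file somewhere else and atomically move it into the watched directory once it is complete. A minimal sketch (the class and method names here are hypothetical, for illustration only):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicDrop {
    // Writing a file in place (as a logger does) looks like a modification
    // to Spark, not a new file. Writing elsewhere and atomically moving the
    // finished file into the watched directory makes it appear "new".
    public static Path dropIntoWatchedDir(Path finishedFile, Path watchedDir) throws IOException {
        Path target = watchedDir.resolve(finishedFile.getFileName());
        // ATOMIC_MOVE makes the file appear in one step, so Spark never
        // observes a half-written file.
        return Files.move(finishedFile, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging");
        Path watched = Files.createTempDirectory("watched");
        Path log = Files.write(staging.resolve("app.log"),
                "INFO:Data=Do Save Entity\n".getBytes());
        Path moved = dropIntoWatchedDir(log, watched);
        System.out.println(Files.exists(moved)); // prints "true"
    }
}
```

Note that ATOMIC_MOVE only works when source and target are on the same filesystem; across filesystems you would copy to a temporary name in the watched directory and rename at the end.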
A second (unrelated) issue in your code is that the call to line.map returns a new JavaDStream on which you should call print to see the result of the transformation. Calling print directly on line shows you the contents of the stream without the transformation.
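Concretely, the mapped stream has to be kept and printed; a corrected sketch of that part of the pipeline (assuming the same SparkConf/JavaStreamingContext setup as in the question, and Java 8+ for the lambda) might look like this:

```java
JavaDStream<String> line = ssc.textFileStream(base);
// map() does not modify `line`; it returns a NEW DStream that
// must be kept, otherwise the transformation is discarded.
JavaDStream<Integer> lengths = line.map(v1 -> v1.length());
lengths.print(); // prints the line lengths, not the raw lines
ssc.start();
ssc.awaitTermination();
```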