Print the content of streams (Spark Streaming) on a Windows system

Question

I just want to print the content of a stream to the console. I wrote the following code, but it does not print anything. Can anyone help me read a text file as a stream in Spark? Is this problem related to Windows?

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public static void main(String[] args) throws Exception {

    SparkConf sparkConf = new SparkConf().setAppName("My app")
        .setMaster("local[2]")
        .setSparkHome("C:\\Spark\\spark-1.5.1-bin-hadoop2.6")
        .set("spark.executor.memory", "2g");

    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));

    JavaDStream<String> dataStream = jssc.textFileStream("C://testStream//copy.csv");
    dataStream.print();

    jssc.start();
    jssc.awaitTermination();
}

UPDATE: The content of copy.csv is:

0,0,12,5,0
0,0,12,5,0
0,1,2,0,42
0,0,0,0,264
0,0,12,5,0

Answer 1

textFileStream is for monitoring Hadoop-compatible directories. It watches the given directory, and as you add new files to it, it reads/streams the data from the newly added files.
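To make the "only new files" behavior concrete, here is a simplified model in plain Java (no Spark). NewFileTracker is a hypothetical class, not Spark's actual FileInputDStream: it only tracks file names, whereas Spark also looks at modification times. Files already present when the tracker starts are ignored, and each poll() plays the role of one batch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of directory monitoring (hypothetical; NOT Spark's implementation).
class NewFileTracker {
    private final Path dir;
    private final Set<Path> seen = new HashSet<>();

    NewFileTracker(Path dir) throws IOException {
        this.dir = dir;
        // Files that already exist when monitoring starts are treated as "old" and never reported.
        seen.addAll(listFiles());
    }

    // Returns the regular files that appeared since the previous poll (one "batch").
    List<Path> poll() throws IOException {
        List<Path> fresh = new ArrayList<>();
        for (Path p : listFiles()) {
            if (seen.add(p)) { // add() returns true only the first time a path is seen
                fresh.add(p);
            }
        }
        return fresh;
    }

    // Lists regular files directly inside dir; throws NotDirectoryException if dir is a file.
    private List<Path> listFiles() throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        return files;
    }
}
```

In this model, a copy.csv that is already in the folder before monitoring starts never shows up in any batch; only a file added afterwards does.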

You cannot point textFileStream at a single existing text/CSV file; or rather, if you are only reading files that already exist, you do not need streaming at all.

My suggestion would be to monitor a directory (HDFS or the local file system), add files to it, and capture the content of those new files using textFileStream.

In your code, you could replace "C://testStream//copy.csv" with "C://testStream"; once your Spark Streaming job is up and running, add copy.csv to the C://testStream folder and look at the output in the Spark console.
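The Spark Streaming guide recommends that files appear in the monitored directory atomically, i.e. written somewhere else and then moved or renamed into it. A minimal sketch of doing that from Java, assuming the staging directory is on the same file system as the watched one (otherwise ATOMIC_MOVE fails with AtomicMoveNotSupportedException). DropIntoWatchedDir and drop are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: makes a file appear in the watched directory in a single step.
class DropIntoWatchedDir {
    // Copies source into stagingDir, then atomically moves that copy into watchedDir.
    // The original source file is left untouched. Returns the path inside watchedDir.
    static Path drop(Path source, Path stagingDir, Path watchedDir) throws IOException {
        Path staged = stagingDir.resolve(source.getFileName());
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);
        Path target = watchedDir.resolve(source.getFileName());
        // ATOMIC_MOVE: the monitor never observes a half-written file.
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With the job watching C://testStream, calling drop on copy.csv with a staging folder on the same drive should make its lines appear in the next batch's print() output.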

OR

you could write a separate command-line Scala/Java program that reads the files and sends their content over a socket (on a certain port); you can then use socketTextStream to capture and read the data. Once you have read the data, you can apply further transformations or output operations.
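A minimal sketch of the socket approach: a small Java server that reads a file and writes its lines, newline-terminated, to the first client that connects. The class name and the "serve one client, then exit" behavior are assumptions for illustration. Spark's socket receiver connects as a client, so start this program before the streaming job.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper program: serves the lines of one file over TCP.
class FileToSocketServer {
    // Reads all lines of file, waits for one client on server, sends each line followed by '\n'.
    static void serveOnce(Path file, ServerSocket server) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        try (Socket client = server.accept();
             Writer out = new BufferedWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.write('\n');
            }
        } // closing the writer flushes it and closes the connection
    }

    // Usage: java FileToSocketServer <file> <port>
    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        int port = Integer.parseInt(args[1]);
        try (ServerSocket server = new ServerSocket(port)) {
            serveOnce(file, server);
        }
    }
}
```

On the Spark side, jssc.socketTextStream("localhost", 9999) (with 9999 as the example port) would replace textFileStream, followed by print(). Because this sketch exits after one client, expect the receiver to log reconnection attempts once the file has been sent.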

You could also consider using Flume.

Refer to the API documentation for more details.

Answer 2

This worked for me on Windows 7 and Spark 1.6.3 (I removed the rest of the code; the important part is how the folder to monitor is defined):

val ssc = ...
val lines = ssc.textFileStream("file:///D:/tmp/data")
...
lines.print()
...

This monitors the directory D:/tmp/data; ssc is my streaming context.

Steps:

Create a file, say 1.txt, in D:/tmp/data
Enter some text
Start the Spark application
Rename the file to data.txt (I believe any arbitrary name will do, as long as it is changed while the directory is being monitored by Spark)

One other thing I noticed: I had to change the line separator to Unix style (I used Notepad++); otherwise the file wasn't getting picked up.
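If you would rather do that line-ending conversion in code than in Notepad++, here is a minimal Java sketch. It assumes the file is UTF-8 (or plain ASCII, like copy.csv) and small enough to read into memory. It is safest to convert the file before it goes into the monitored directory, since Spark may not reread a file it has already processed. ToUnixLineEndings is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: rewrites a text file with Unix (LF) line endings.
class ToUnixLineEndings {
    // Replaces every Windows CRLF ("\r\n") with LF ("\n") and writes the result back in place.
    static void convert(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String unix = text.replace("\r\n", "\n");
        Files.write(file, unix.getBytes(StandardCharsets.UTF_8));
    }
}
```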

Answer 3

Try the following code; it works:

JavaDStream<String> dataStream = jssc.textFileStream("file:///C:/testStream/");
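The file:/// URI form used here and in the previous answer can be built from an ordinary Windows directory path. A hypothetical helper, doing plain string manipulation only and assuming an absolute drive-letter path with no spaces or other characters that would need URI escaping:

```java
// Hypothetical helper: turns a Windows directory path such as C:\testStream into file:///C:/testStream/
class WindowsPathToUri {
    static String toFileUri(String windowsDir) {
        String forward = windowsDir.replace('\\', '/'); // backslashes -> forward slashes
        if (!forward.endsWith("/")) {
            forward += "/";                              // mark it as a directory
        }
        return "file:///" + forward;
    }
}
```

For example, jssc.textFileStream(WindowsPathToUri.toFileUri("C:\\testStream")) passes the same string as the line above.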

3 answers

Answer 1: score 4, 2016-02-02 08:51:34
Answer 2: score 1, 2017-01-05 22:21:38
Answer 3: score 0, 2018-10-30 06:56:43