
How to read a file using Spark Streaming and write to a simple file using Scala?

I'm trying to read a file with a Scala Spark Streaming program. The file is stored in a directory on my local machine, and I'm trying to write it out as a new file on the same machine. But whenever I write my stream and store it as Parquet, I end up with empty folders.

This is my code:

 Logger.getLogger("org").setLevel(Level.ERROR)
 val spark = SparkSession
             .builder()
             .master("local[*]")
             .appName("StreamAFile")
             .config("spark.sql.warehouse.dir", "file:///C:/temp")
             .getOrCreate()
 
         
 import spark.implicits._            
 val schemaforfile = new StructType().add("SrNo",IntegerType).add("Name",StringType).add("Age",IntegerType).add("Friends",IntegerType)
             
 val file = spark.readStream.schema(schemaforfile).csv("C:\\SparkScala\\fakefriends.csv")  

 file.writeStream.format("parquet").start("C:\\Users\\roswal01\\Desktop\\streamed") 
 
 spark.stop()
 

Is anything missing from my code, or is there somewhere I've gone wrong?

I also tried reading this file from an HDFS location, but the same code does not create any output folders on HDFS either.

The mistake is here:

val file = spark.readStream.schema(schemaforfile).csv("C:\\SparkScala\\fakefriends.csv")  

The csv() function should take a directory path as its argument, not the path of a single file. It scans that directory and reads new files as they are moved into it.
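
Since the stream picks up files as they arrive in the directory, one way to feed it is to write each file somewhere else first and then move it in, so the reader never sees a half-written file. The sketch below illustrates that idea with plain java.nio. The helper name dropInto, the staging directory, and the sample row are all hypothetical and not part of the original question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

object DropFile {
  // Hypothetical helper: write the content into a staging directory, then
  // move the finished file into the directory that the stream is watching.
  // ATOMIC_MOVE assumes both directories are on the same file system;
  // otherwise Files.move throws AtomicMoveNotSupportedException.
  def dropInto(watchedDir: Path, stagingDir: Path, fileName: String, content: String): Path = {
    Files.createDirectories(watchedDir)
    Files.createDirectories(stagingDir)
    val staged = stagingDir.resolve(fileName)
    Files.write(staged, content.getBytes(StandardCharsets.UTF_8))
    // Returns the path of the file inside the watched directory
    Files.move(staged, watchedDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE)
  }
}
```

For example, DropFile.dropInto(inputDir, stagingDir, "fakefriends.csv", "1,Alice,30,100\n") would place a one-row file (an illustrative row matching the SrNo, Name, Age, Friends schema) into inputDir, where inputDir is the directory passed to csv().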

For checkpointing, you should add:

.option("checkpointLocation", "path/to/HDFS/dir")

For example:

val query = file.writeStream.format("parquet")
    .option("checkpointLocation", "path/to/HDFS/dir")
    .start("C:\\Users\\roswal01\\Desktop\\streamed") 

query.awaitTermination()
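
Putting the pieces together, a minimal sketch of the whole program might look like the following. It assumes the CSV files live in a directory named C:\SparkScala\input and that C:\SparkScala\checkpoint is usable as a checkpoint location. Both paths are placeholders chosen for illustration, not taken from the question. Note that start() returns immediately, so the sketch blocks on awaitTermination() instead of calling spark.stop() right after starting the query.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

object StreamAFile {
  // Same schema as in the question
  val schemaForFile: StructType = new StructType()
    .add("SrNo", IntegerType)
    .add("Name", StringType)
    .add("Age", IntegerType)
    .add("Friends", IntegerType)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("StreamAFile")
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")

    // Assumption: a directory that receives CSV files, not a single file
    val input = spark.readStream
      .schema(schemaForFile)
      .csv("C:\\SparkScala\\input")

    val query = input.writeStream
      .format("parquet")
      // Assumption: placeholder checkpoint directory
      .option("checkpointLocation", "C:\\SparkScala\\checkpoint")
      .start("C:\\Users\\roswal01\\Desktop\\streamed")

    // Block until the query is stopped, so the session stays alive
    // while files are being processed
    query.awaitTermination()
  }
}
```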
