how to read and write (update) the same file using spark (scala)
I want to update a CSV file depending on some condition. To do that, I read the file and made all the needed updates, but when I try to write it back I get a FileNotFoundException.

I think it is due to the writing process, because when I check the path (where the input/output file was located) I find it empty. Is there a better way to update a file? If not, how can I resolve the FileNotFoundException?
This happens because Spark evaluates lazily: `mode("overwrite")` deletes the output path before the plan re-reads the input, so the read then fails with a FileNotFoundException. You can avoid it either by writing to a temporary table/CSV first or by using checkpointing.

This works:
import org.apache.spark.sql.functions.lit

spark.sparkContext.setCheckpointDir("tmp") // directory for checkpoint files
spark.read.csv("test.csv")                 // read existing csv
  .withColumn("test", lit(1))              // modify
  .checkpoint(eager = true)                // checkpoint: materialize to disk, breaking the lineage
  .write.mode("overwrite")
  .csv("test.csv")                         // now safe to write to the same location
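The temporary-file alternative mentioned above can be sketched like this (a minimal sketch, assuming a running `SparkSession` named `spark`; the path `test_tmp.csv` is an illustrative name, not from the original answer):

```scala
import org.apache.spark.sql.functions.lit

val tmpPath = "test_tmp.csv" // illustrative temporary location

// Write the modified data to a temporary location first...
spark.read.csv("test.csv")
  .withColumn("test", lit(1))
  .write.mode("overwrite")
  .csv(tmpPath)

// ...then read the temporary copy back and overwrite the original path.
// This second plan's lineage points at tmpPath, not "test.csv",
// so deleting "test.csv" during the overwrite no longer breaks the read.
spark.read.csv(tmpPath)
  .write.mode("overwrite")
  .csv("test.csv")
```

This trades an extra full write for avoiding the lineage problem; checkpointing does essentially the same thing but manages the intermediate files for you under the checkpoint directory (which Spark does not clean up automatically by default).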