
How to read and write (update) the same file using Spark (Scala)

I want to update a CSV file depending on some condition. To do that I read the file and made all the needed updates, but when I try to write the result back I get a FileNotFoundException.

I think this is due to the writing process, because when I access the path (where the input/output file was located) I find it empty.

Is there a better way to update a file? And if not, how can I resolve the FileNotFoundException?

You can do it either by first writing to a temporary table/CSV, or by using checkpointing. Spark reads lazily, so writing with `overwrite` to the same path deletes the input files before they have actually been read; both approaches force the data to be materialized somewhere else first.

This works:

sparkSession.sparkContext.setCheckpointDir("tmp") // directory for checkpoint files

sparkSession.read.csv("test.csv") // read the existing csv
  .withColumn("test", lit(1))     // modify
  .checkpoint(eager = true)       // eagerly materialize to the checkpoint dir
  .write.mode("overwrite")
  .csv("test.csv")                // now safe to write to the same location
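The temporary-file alternative mentioned above can be sketched as follows. This is a minimal sketch, assuming a local filesystem path; `test_tmp.csv` is a hypothetical temporary location (on HDFS or S3 you would pick a path on the same filesystem):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("csv-update").getOrCreate()

// Write the modified data to a temporary location first, so the
// original input files are still intact while Spark reads them.
spark.read.csv("test.csv")
  .withColumn("test", lit(1))       // the same modification as above
  .write.mode(SaveMode.Overwrite)
  .csv("test_tmp.csv")              // hypothetical temporary path

// Only after that write has fully completed, read the temporary copy
// back and overwrite the original location.
spark.read.csv("test_tmp.csv")
  .write.mode(SaveMode.Overwrite)
  .csv("test.csv")
```

Compared with checkpointing, this avoids configuring a checkpoint directory, but it leaves the temporary output behind unless you delete it yourself afterwards.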
