
Writing dataframe into an existing csv file scala

I have the following dataframe:

+---------------------------+-------+
|sport                      |value  |
+---------------------------+-------+
|table tennis               |12     |
+---------------------------+-------+

and I want to write this dataframe into an existing csv file. My code is as follows:

val existingSparkSession = SparkSession.builder().getOrCreate()
import existingSparkSession.implicits._
val data = Seq((inputSentence, analysedCategoryLabel))
val emojiRdd = existingSparkSession.sparkContext.parallelize(data)
val finalEmojiAnalyzedDataFrame = emojiRdd.toDF("sport", "value")
finalEmojiAnalyzedDataFrame.write
  .format("com.springml.spark.sftp")
  .option("delimiter", ";")
  .mode(SaveMode.Append)
  .save("./src/main/resources/sportsData.csv")

But this code isn't working and I'm getting the following error:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.springml.spark.sftp.

To write into a csv file continuously, do I need to use com.springml.spark.sftp? Are there any other ways of doing it? If this is the only way, do I need to add this library as a dependency in my Scala build file?

It's not possible to save to a single file across multiple partitions, because Spark is meant to be a distributed processing library that writes to a shared filesystem.

In other words, the output path needs to be a directory, for example:
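A minimal sketch using Spark's built-in csv data source (the sportsData directory name and the ";" separator are just assumptions carried over from the question; coalesce(1) is optional and only there to produce a single part file):

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// sportsData is a directory, not a single .csv file; Spark writes
// one part-*.csv file per partition inside it
val df = Seq(("table tennis", 12)).toDF("sport", "value")
df.coalesce(1)                       // optional: collapse to a single partition / part file
  .write
  .option("sep", ";")
  .mode(SaveMode.Append)
  .csv("./src/main/resources/sportsData")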

Otherwise, you would need to collect the dataframe to row objects, then use non-Spark methods to write/append to a single, local file, as sketched below.
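A minimal sketch of that approach, assuming the dataframe is small enough to collect to the driver (the file path and ";" separator are taken from the question):

import java.io.{FileWriter, PrintWriter}

// collect the rows to the driver, then append them to a single local CSV file
val rows = finalEmojiAnalyzedDataFrame.collect()
val writer = new PrintWriter(new FileWriter("./src/main/resources/sportsData.csv", true)) // true = append
try {
  rows.foreach(row => writer.println(s"${row.getString(0)};${row.getInt(1)}"))
} finally {
  writer.close()
}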
