Writing/storing a DataFrame in a text file

Question

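I am trying to write a DataFrame to a text file. If the DataFrame contains a single column, I can write it to a text file. If it contains multiple columns, I get the following error:

Text data source supports only a single column, and you have 2 columns.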

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object replace {

  def main(args: Array[String]): Unit = {

    Logger.getLogger("org").setLevel(Level.ERROR)

    val spark = SparkSession.builder.master("local[1]").appName("Decimal Field Validation").getOrCreate()

    var sourcefile = spark.read.option("header", "true").text("C:/Users/phadpa01/Desktop/inputfiles/decimalvalues.txt")

    // Prepend a 1-based row number (PRGREFNBR) to every row
    val rowRDD = sourcefile.rdd.zipWithIndex().map(indexedRow => Row.fromSeq((indexedRow._2.toLong + 1) +: indexedRow._1.toSeq))

    // Add the PRGREFNBR column to the schema
    val newstructure = StructType(Array(StructField("PRGREFNBR", LongType)) ++ sourcefile.schema.fields)

    // Create a new DataFrame containing PRGREFNBR
    sourcefile = spark.createDataFrame(rowRDD, newstructure)

    // This write fails: the DataFrame now has two columns
    val op = sourcefile.write.mode("overwrite").format("text").save("C:/Users/phadpa01/Desktop/op")

  }

}

Answer 1

You can convert the DataFrame to an RDD, convert each row to a string, and write the last line as

 val op= sourcefile.rdd.map(_.toString()).saveAsTextFile("C:/Users/phadpa01/Desktop/op")

Edited

As @philantrovert and @Pravinkumar pointed out, the above adds [ and ] around each line in the output file, which is true. The fix is to replace them with empty strings:

val op= sourcefile.rdd.map(_.toString().replace("[","").replace("]", "")).saveAsTextFile("C:/Users/phadpa01/Desktop/op")

You could even use a regex.
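A minimal sketch of the regex variant, assuming each line comes from Row.toString and is therefore wrapped in exactly one pair of square brackets. The names BracketStrip and stripOuterBrackets are hypothetical, not part of Spark. The pattern is anchored so that only the outer brackets are removed.

```scala
object BracketStrip {
  // Matches a "[" at the very start of the line or a "]" at the very end
  private val OuterBrackets = "^\\[|\\]$".r

  // Removes only the outer brackets; brackets inside the data are kept
  def stripOuterBrackets(line: String): String =
    OuterBrackets.replaceAllIn(line, "")
}
```

In the job above this could be applied as sourcefile.rdd.map(r => BracketStrip.stripOuterBrackets(r.toString)) before calling saveAsTextFile.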

Answer 2

I would suggest using CSV or another delimited format. Below is an example of writing a .tsv in the most concise/elegant way in Spark 2+:

import org.apache.spark.sql.SaveMode

val tsvWithHeaderOptions: Map[String, String] = Map(
  ("delimiter", "\t"), // Uses "\t" delimiter instead of default ","
  ("header", "true"))  // Writes a header record with column names

df.coalesce(1)         // Writes to a single file
  .write
  .mode(SaveMode.Overwrite)
  .options(tsvWithHeaderOptions)
  .csv("output/path")

Answer 3

You can save it as a CSV text file (.format("csv")).

The result will be a text file in CSV format, with the columns separated by commas.

val op = sourcefile.write.mode("overwrite").format("csv").save("C:/Users/phadpa01/Desktop/op")

More information can be found in the Spark programming guide.

Answer 4

I think using substring is the better fit for all scenarios.

Please check the code below.

sourcefile.rdd
.map(r =>  { val x = r.toString; x.substring(1, x.length-1)})
.saveAsTextFile("C:/Users/phadpa01/Desktop/op")
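To illustrate why substring can be the safer choice, here is a small Spark-free sketch that compares the two approaches on a line whose data itself contains brackets. The object name StripCompare and the sample value are hypothetical. The sample line mimics what Row.toString produces for the fields 1 and "a[0]".

```scala
object StripCompare {
  // Global replace: removes every bracket, including those inside the data
  def viaReplace(line: String): String =
    line.replace("[", "").replace("]", "")

  // Substring: drops only the first and the last character
  def viaSubstring(line: String): String =
    line.substring(1, line.length - 1)

  def main(args: Array[String]): Unit = {
    val line = "[1,a[0]]"
    println(viaReplace(line))   // prints 1,a0 (the data was altered)
    println(viaSubstring(line)) // prints 1,a[0] (the data is intact)
  }
}
```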

Answer 5

I used the Databricks API to save my DF output to a text file.

myDF.write.format("com.databricks.spark.csv").option("header", "true").save("output.csv")

5 solutions

Solution 1
10 Accepted 2017-06-14 07:40:28

Solution 2
4 2017-06-14 16:37:15

Solution 3
2 2017-06-14 07:29:13

Solution 4
2 2019-10-25 03:08:56

Solution 5
1 2017-06-14 10:33:40
