Scala: writing a DataFrame to a file as binary

Question

I have a Hive table stored as Parquet, with a contents column that holds various documents as base64-encoded strings.

Now I need to read that column and write it to files in HDFS, so that the base64 value in each row is converted back into the original document.

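// sqlContext is the SQLContext provided by spark-shell (Spark 1.6)
val profileDF = sqlContext.read.parquet("/hdfspath/profiles/")
profileDF.registerTempTable("profiles")
// unbase64 decodes the base64 string column into a binary column
val contentsDF = sqlContext.sql("select unbase64(contents) as contents from profiles where file_name = 'file1'")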

contentsDF now holds the binary form of a document in each row, and I need to write that to a file. I have tried several options but could not get the DataFrame content back out as a file.

I would appreciate any help with this.

Answer 1

I would suggest saving it as Parquet:

https://spark.apache.org/docs/1.6.3/api/java/org/apache/spark/sql/DataFrameWriter.html#parquet(java.lang.String)
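
A minimal sketch of the Parquet option, assuming a DataFrame with a binary contents column such as the contentsDF from the question. The object name, method name, and output path are hypothetical and only there for illustration. Note that this keeps the decoded bytes inside Parquet files; it does not produce one standalone document per row.

```scala
import org.apache.spark.sql.DataFrame

object ParquetContentWriter {
  // Hypothetical helper: writes the DataFrame (including its binary
  // "contents" column) as Parquet files under outputPath.
  // DataFrameWriter.parquet fails if outputPath already exists,
  // unless a save mode such as mode("overwrite") is set first.
  def saveDecodedAsParquet(contentsDF: DataFrame, outputPath: String): Unit =
    contentsDF.write.parquet(outputPath)
}
```

It could then be called as ParquetContentWriter.saveDecodedAsParquet(contentsDF, "/hdfspath/decoded_profiles"), where the target path is an assumed example.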

Or convert it to an RDD and save it as an object file:

https://spark.apache.org/docs/1.6.3/api/java/org/apache/spark/rdd/RDD.html#saveAsObjectFile(java.lang.String)
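
A rough sketch of the RDD option, again assuming a DataFrame whose binary column is named contents. The object name, method name, and output path are hypothetical.

```scala
import org.apache.spark.sql.DataFrame

object ObjectFileContentWriter {
  // Hypothetical helper: pulls the decoded bytes out of each Row and
  // saves them with RDD.saveAsObjectFile, which writes a SequenceFile
  // of Java-serialized objects under outputPath.
  def saveDecodedAsObjectFile(contentsDF: DataFrame, outputPath: String): Unit =
    contentsDF.rdd
      .map(row => row.getAs[Array[Byte]]("contents"))
      .saveAsObjectFile(outputPath)
}
```

Because saveAsObjectFile stores serialized objects rather than raw document bytes, the output is not directly openable as the original document. It would need to be read back in Spark, for example with sc.objectFile[Array[Byte]](outputPath).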
