Scala - Writing dataframe to a file as binary

I have a Hive table stored as Parquet, with a column Content that stores various documents as base64-encoded strings.

Now I need to read that column and write it to files in HDFS, so that the base64 value in each row is converted back into the original document.

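// Load the Parquet data and expose it to SQL as a temporary table
val profileDF = sqlContext.read.parquet("/hdfspath/profiles/")
profileDF.registerTempTable("profiles")
// Decode the base64 column to binary for the selected file
val contentsDF = sqlContext.sql("select unbase64(contents) as contents from profiles where file_name = 'file1'")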

contentsDF now holds the binary form of a document in each row, and I need to write that to a file. I tried different options but couldn't get the DataFrame content written out to a file.

I'd appreciate any help with this.
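
For illustration, one possible approach is to stream the rows to the driver and write each decoded value through the Hadoop FileSystem API. This is only a minimal sketch, not a confirmed solution: the object name BinaryColumnWriter, the method writeRows, and the document_N file naming are hypothetical, and it assumes the first column of the DataFrame is binary (as produced by unbase64) and that each document fits in driver memory.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark or Hadoop.
object BinaryColumnWriter {

  // Writes the first column of every row (assumed to be BinaryType) to its own
  // file under outputDir and returns the paths that were written.
  def writeRows(df: DataFrame, outputDir: String): Seq[String] = {
    // Resolve the file system from the output path, using Spark's Hadoop settings
    val conf = df.sqlContext.sparkContext.hadoopConfiguration
    val fs = new Path(outputDir).getFileSystem(conf)

    // toLocalIterator pulls one partition at a time to the driver
    df.rdd.toLocalIterator.zipWithIndex
      .filter { case (row, _) => !row.isNullAt(0) } // skip rows without content
      .map { case (row, i) =>
        val target = new Path(outputDir, s"document_$i")
        val out = fs.create(target, true) // true = overwrite an existing file
        try out.write(row.getAs[Array[Byte]](0))
        finally out.close()
        target.toString
      }
      .toList // force the lazy iterator so every file is actually written
  }
}
```

With the DataFrame from the question this could be called as BinaryColumnWriter.writeRows(contentsDF, "/hdfspath/output/file1"), where the output directory is an assumed path. Writing from the driver keeps the sketch simple; for a large number of rows, writing inside foreachPartition on the executors would probably scale better.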
