
Spark: converting Array[Byte] data to RDD or DataFrame

I have data in the form of Array[Byte] which I want to convert into a Spark RDD or DataFrame, so that I can write it directly to a Google Cloud Storage bucket as a file. I am not able to write Array[Byte] to a Google bucket directly, hence the need for this conversion.

The code below writes the data to the local filesystem, but not to a Google bucket:

import java.io.FileOutputStream
import org.bouncycastle.openpgp.PGPPublicKey

// Encrypt the payload, then write the resulting bytes to a local file.
val encrypted = encrypt(original, readPublicKey(pubKey), outFile, true, true)
val fos = new FileOutputStream(outFile)
fos.write(encrypted)
fos.close()

// Returns the encrypted payload as a byte array.
def encrypt(clearData: Array[Byte], encKey: PGPPublicKey, fileName: String, withIntegrityCheck: Boolean, armor: Boolean): Array[Byte] = {
  ...
}

So could someone please help me convert the Array[Byte] data to an RDD or DataFrame? I am using Scala.

Thanks in advance for your help.

Just use .toDF() (or .toDF().rdd if you need an RDD). Note that calling .toDF() on a Seq requires import spark.implicits._, which the spark-shell imports automatically:

scala> val arr: Array[Byte] = Array(192.toByte, 168.toByte, 1.toByte, 4.toByte)
arr: Array[Byte] = Array(-64, -88, 1, 4)

scala> val df = arr.toSeq.toDF()
df: org.apache.spark.sql.DataFrame = [value: tinyint]

scala> df.show()
+-----+
|value|
+-----+
|  -64|
|  -88|
|    1|
|    4|
+-----+


scala> df.printSchema()
root
 |-- value: byte (nullable = false)
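If the end goal is simply to land the encrypted bytes in the bucket as a single binary file, an alternative worth noting is to skip the RDD/DataFrame conversion entirely and write through the Hadoop FileSystem API, reusing the configuration Spark already carries. The following is a minimal sketch, not a definitive implementation: it assumes the GCS connector is on the classpath, that spark and encrypted are the session and byte array from the question, and that outPath is a hypothetical destination you would replace with your own bucket and object name.

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical destination; replace with your own bucket and object name.
val outPath = "gs://your-bucket/encrypted.pgp"

// Reuse Spark's Hadoop configuration, which carries the GCS connector settings.
val conf = spark.sparkContext.hadoopConfiguration
val fs = FileSystem.get(new URI(outPath), conf)

// Stream the byte array straight into the bucket as a single file.
val out = fs.create(new Path(outPath))
try out.write(encrypted)
finally out.close()

Because the write is routed through whichever filesystem implementation matches the URI scheme, the same code also works against hdfs:// paths or the local filesystem, so it can be tested locally before pointing it at the bucket.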
