Convert Spark column of nested JSON into string
I'm relatively new to Spark/Scala and I've got the following problem I'm hoping you can help me out with. In order for my hashing algorithm to work, I need to convert an array-type field into a string. The schema below is similar to what I'm dealing with:
+-----------------+----------------+
| records | Partition |
+-----------------+----------------+
| [{data:[{...}..]| 20200101 |
+-----------------+----------------+
| [{data:[{...}..]| 20200102 |
+-----------------+----------------+
The field types are: {records: array, partition: string}.
All I want is to convert the records field into a string, along the lines of:
[{data:[{...}..] --> "[{data:[{...}..]"
Any help on this would be greatly appreciated. Thanks.
It would be useful to have your data schema (using df.printSchema). But with a simple example I managed to do it using a plain cast like this:
import org.apache.spark.sql.types._

// Cast the array column to Spark's string rendering of its contents
val castDf = df.withColumn("ArrayToString", $"myColName".cast(StringType))
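Note that casting an array column to StringType gives Spark's own rendering (e.g. `[[a, 1], [b, 2]]`), which is not valid JSON and may not be stable across versions; for a hashing use case, `to_json` produces a deterministic JSON string instead. Below is a minimal, self-contained sketch under an assumed toy schema (tuples standing in for your real structs; the column names `records`, `partition`, and `recordsAsString` are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_json

object ToJsonDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("to_json-demo")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for the real schema: an array of structs per row
    val df = Seq(
      (Seq(("a", 1), ("b", 2)), "20200101"),
      (Seq(("c", 3)), "20200102")
    ).toDF("records", "partition")

    // to_json serializes the whole array column into one JSON string per row
    val withString = df.withColumn("recordsAsString", to_json($"records"))
    withString.select("recordsAsString").show(false)

    spark.stop()
  }
}
```

With this approach the string is proper JSON, so it can also be parsed back later with `from_json` if needed.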