
Convert Spark column of nested JSON into string

I'm relatively new to Spark/Scala and I've got the following problem I'm hoping you can help me out with. In order for my hashing algorithm to work, I need to convert an array-type field into a string. The schema below is similar to what I'm dealing with:

+-----------------+----------------+
| records         | Partition      |
+-----------------+----------------+
| [{data:[{...}..]| 20200101       |
+-----------------+----------------+
| [{data:[{...}..]| 20200102       |
+-----------------+----------------+

The field types are: {records: array, partition: string}

All I want is to convert the records field into a string, in the vein of:

[{data:[{...}..] --> "[{data:[{...}..]"

Any help on this would be greatly appreciated.

Thanks.

It would be useful to have your data schema (using df.printSchema). But with a simple example, I managed to do it using a plain cast like this:

import org.apache.spark.sql.types._
import spark.implicits._ // for the $"..." column syntax (spark is your SparkSession)

// Casting a complex column to StringType uses Spark's built-in string rendering
val castDf = df.withColumn("ArrayToString", $"myColName".cast(StringType))
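
If you need an actual JSON string rather than Spark's default rendering of complex types, to_json may be a better fit for hashing, since it produces a stable, well-defined serialization. A minimal sketch, assuming Spark 2.4+ (where to_json accepts array columns) and the records column from the schema above:

import org.apache.spark.sql.functions.{col, to_json}

// Serialize the nested array column as JSON text; a stable JSON
// representation is generally safer input for a hashing algorithm.
val jsonDf = df.withColumn("recordsAsJson", to_json(col("records")))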
