SparkR, split a column of nested JSON strings into columns

I am coming from R, new to SparkR, and trying to split a SparkDataFrame column of JSON strings into separate columns. The columns in the SparkDataFrame are arrays with a schema like this:

> printSchema(tst)
root
 |-- FromStation: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- ToStation: array (nullable = true)
 |    |-- element: string (containsNull = true)

If I look at the data in the viewer with View(head(tst$FromStation)), I can see that the SparkDataFrame's FromStation column has a form like this in each row:

list("{\"Code\":\"ABCDE\",\"Name\":\"StationA\"}", "{\"Code\":\"WXYZP\",\"Name\":\"StationB\"}", "{...

where the ... indicates that the pattern repeats an unknown number of times.

My Question

How do I extract this information and put it in a flat data frame? Ideally, I would like to create a FromStationCode and a FromStationName column, with one row for each observation in the nested array column. I have tried various combinations of explode and getItem, but to no avail: I keep getting a data type mismatch error. I've searched for examples from other people facing this challenge in Spark, but SparkR examples are scarce. I'm hoping someone with more experience using Spark/SparkR can provide some insight.

Many thanks, nate
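
One way to stay inside Spark might be to explode the array first and then read the JSON fields with get_json_object. Going by the schema above, the array elements are plain strings rather than structs, which may be why field access with getItem raises a type mismatch. The following is only a minimal sketch: the sample tst built with sql() is a hypothetical stand-in for the real data, and the column name station_json is made up for illustration.

```r
library(SparkR)
sparkR.session(master = "local[1]")

# Hypothetical stand-in for the real `tst`: one row holding an array of JSON strings.
tst <- sql(paste0(
  "SELECT array(",
  "'{\"Code\":\"ABCDE\",\"Name\":\"StationA\"}', ",
  "'{\"Code\":\"WXYZP\",\"Name\":\"StationB\"}'",
  ") AS FromStation"
))

# Step 1: explode() emits one row per array element, so each row now holds a
# single JSON string. Rows whose array is NULL or empty are dropped.
exploded <- select(tst, alias(explode(tst$FromStation), "station_json"))

# Step 2: pull the individual fields out of each JSON string by JSON path.
flat <- select(
  exploded,
  alias(get_json_object(exploded$station_json, "$.Code"), "FromStationCode"),
  alias(get_json_object(exploded$station_json, "$.Name"), "FromStationName")
)

head(flat)
```

The same two steps would apply to ToStation. Note that exploding both columns in one pipeline would produce every From/To combination per row, so it is probably safer to flatten each column separately.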

I guess you need to convert tst into a regular R object:

df = collect(tst)

Then you can work with df as you would with any other R data.frame.
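
As a rough sketch of what that could look like once the data is local, the snippet below parses each JSON string with the jsonlite package and stacks the results into a flat data.frame. It assumes the collected data fits in the driver's memory and that every JSON string has both a Code and a Name field. The sample df and the helper flatten_stations() are hypothetical, introduced only for illustration.

```r
library(jsonlite)

# Hypothetical stand-in for df <- collect(tst): a list column in which each
# row holds a list of JSON strings, as shown in the question.
df <- data.frame(id = 1:2)
df$FromStation <- list(
  list("{\"Code\":\"ABCDE\",\"Name\":\"StationA\"}",
       "{\"Code\":\"WXYZP\",\"Name\":\"StationB\"}"),
  list("{\"Code\":\"ABCDE\",\"Name\":\"StationA\"}")
)

# Illustrative helper (not part of SparkR): one output row per array element,
# with `row` recording which original row the element came from.
flatten_stations <- function(col, prefix) {
  rows <- lapply(seq_along(col), function(i) {
    parsed <- lapply(col[[i]], fromJSON)
    data.frame(
      row  = rep(i, length(parsed)),
      code = vapply(parsed, function(p) p$Code, character(1)),
      name = vapply(parsed, function(p) p$Name, character(1)),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  names(out) <- c("row", paste0(prefix, c("Code", "Name")))
  out
}

from_flat <- flatten_stations(df$FromStation, "FromStation")
print(from_flat)
```

Calling flatten_stations(df$ToStation, "ToStation") would give the matching ToStationCode and ToStationName columns in the same way.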
