Reading an XML column in a DataFrame in Spark
I have the following dataframe:
+---+---------------------------------------------------------------------+
|ID |xml                                                                  |
+---+---------------------------------------------------------------------+
|1  |<root><line><colX>1</colX></line><line><colX>2</colX></line></root>  |
|2  |<root><line><colX>3</colX></line><line><colX>4</colX></line> </root> |
+---+---------------------------------------------------------------------+
How do I convert it to the following in raw Spark SQL, using spark-xml from Databricks?
+---+----+
|ID |colx|
+---+----+
|1  |1   |
|1  |2   |
|2  |3   |
|2  |4   |
+---+----+
You can use Spark SQL's built-in xpath function to select the elements into an array, then explode the resulting array into one row per element:
df2 = df.selectExpr('ID', "explode(xpath(xml, 'root/line/colX/text()')) as colx")
df2.show()
+---+----+
| ID|colx|
+---+----+
| 1| 1|
| 1| 2|
| 2| 3|
| 2| 4|
+---+----+
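To make the two steps concrete without a Spark session, here is a minimal plain-Python sketch of what the expression above does per row: collect the matching text nodes into a list (the xpath step), then emit one output row per list element (the explode step). The helper name xpath_then_explode and the rows list are hypothetical, introduced only for illustration; this uses the standard library's xml.etree.ElementTree, not Spark.

```python
import xml.etree.ElementTree as ET

# Sample rows mirroring the question's dataframe: (ID, xml string).
rows = [
    (1, "<root><line><colX>1</colX></line><line><colX>2</colX></line></root>"),
    (2, "<root><line><colX>3</colX></line><line><colX>4</colX></line> </root>"),
]

def xpath_then_explode(rows, path="line/colX"):
    """Hypothetical helper imitating explode(xpath(xml, 'root/line/colX/text()'))."""
    out = []
    for row_id, xml_text in rows:
        root = ET.fromstring(xml_text)  # parsed element is <root> itself
        # "xpath" step: gather the text of every matching element into a list.
        values = [el.text for el in root.findall(path)]
        # "explode" step: one output row per list element, ID repeated.
        for value in values:
            out.append((row_id, value))
    return out

result = xpath_then_explode(rows)
for row in result:
    print(row)
```

As in Spark, the extracted values are strings. Since the question asks for raw SQL: the same expression should also work in a plain query once the dataframe is registered as a temporary view, for example SELECT ID, explode(xpath(xml, 'root/line/colX/text()')) AS colx FROM t, where the view name t is an assumption.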