
Reading an XML column in a DataFrame in Spark

I have the following DataFrame:

+---+--------------------------------------------------------------------+
|ID |xml                                                                 |
+---+--------------------------------------------------------------------+
|1  |<root><line><colX>1</colX></line><line><colX>2</colX></line></root> |
|2  |<root><line><colX>3</colX></line><line><colX>4</colX></line> </root>|
+---+--------------------------------------------------------------------+

How do I convert it to the following in raw Spark SQL, using spark-xml from Databricks?

+---+----+
|ID |colx|
+---+----+
|1  |1   |
|1  |2   |
|2  |3   |
|2  |4   |
+---+----+

You can use xpath to select the elements into an array, then explode the resulting array:

df2 = df.selectExpr('ID', "explode(xpath(xml, 'root/line/colX/text()')) as colx")

df2.show()
+---+----+
| ID|colx|
+---+----+
|  1|   1|
|  1|   2|
|  2|   3|
|  2|   4|
+---+----+
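
To make the two steps in the answer concrete, here is a minimal plain-Python sketch of what xpath followed by explode does to each row. It is not Spark code: it uses the standard library's xml.etree.ElementTree, and the names rows, xpath_texts and explode_rows are hypothetical helpers introduced only for illustration. Because ElementTree paths are relative to the root element, 'line/colX' here plays the role of 'root/line/colX/text()' in the Spark expression.

```python
import xml.etree.ElementTree as ET

# Sample rows mirroring the question's DataFrame: (ID, xml).
rows = [
    (1, "<root><line><colX>1</colX></line><line><colX>2</colX></line></root>"),
    (2, "<root><line><colX>3</colX></line><line><colX>4</colX></line> </root>"),
]


def xpath_texts(xml_string, path):
    """Step 1: collect the text of every element matching `path` into a list.

    `path` is evaluated relative to the document's root element.
    """
    root = ET.fromstring(xml_string)
    return [node.text for node in root.findall(path)]


def explode_rows(rows, path):
    """Step 2: emit one (ID, value) pair per list element.

    A row whose list is empty contributes no output rows.
    """
    return [
        (row_id, value)
        for row_id, xml_string in rows
        for value in xpath_texts(xml_string, path)
    ]


exploded = explode_rows(rows, "line/colX")
for row_id, value in exploded:
    print(row_id, value)
```

The values come back as strings, much as Spark's xpath yields an array of strings, so a cast would be needed if numeric values are wanted.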


 