Reading XML File Through Dataframe
I have an XML file in the format below.
<nt:vars>
<nt:var id="1.3.0" type="TimeStamp"> 89:19:00.01</nt:var>
<nt:var id="1.3.1" type="OBJECT ">1.9.5.67.2</nt:var>
<nt:var id="1.3.9" type="STRING">AB-CD-EF</nt:var>
</nt:vars>
I built a DataFrame on it using the code below. The code displays 3 rows and retrieves the id and type fields, but it does not display the actual values, which are 89:19:00.01, 1.9.5.67.2, and AB-CD-EF.
spark.read.format("xml").option("rootTag","nt:vars").option("rowTag","nt:var").load("/FileStore/tables/POC_DB.xml").show()
Do I need to add any other options to the line above to bring in the values as well?
You can instead specify rowTag as nt:vars:
df = spark.read.format("xml").option("rowTag","nt:vars").load("file.xml")
df.printSchema()
root
|-- nt:var: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _VALUE: string (nullable = true)
| | |-- _id: string (nullable = true)
| | |-- _type: string (nullable = true)
df.show(truncate=False)
+-------------------------------------------------------------------------------------------+
|nt:var |
+-------------------------------------------------------------------------------------------+
|[[ 89:19:00.01, 1.3.0, TimeStamp], [1.9.5.67.2, 1.3.1, OBJECT ], [AB-CD-EF, 1.3.9, STRING]]|
+-------------------------------------------------------------------------------------------+
To get the values as separate rows, you can explode the array of structs:
import pyspark.sql.functions as F
df.select(F.explode('nt:var')).show(truncate=False)
+--------------------------------+
|col |
+--------------------------------+
|[ 89:19:00.01, 1.3.0, TimeStamp]|
|[1.9.5.67.2, 1.3.1, OBJECT ] |
|[AB-CD-EF, 1.3.9, STRING] |
+--------------------------------+
Or if you just want the values:
df.select(F.explode('nt:var._VALUE')).show()
+------------+
| col|
+------------+
| 89:19:00.01|
| 1.9.5.67.2|
| AB-CD-EF|
+------------+