

How can I write the command on Databricks to read an XML file (based on Python code)?

Suppose I have this XML structure:

<?xml version="1.0" encoding="UTF-8"?>
<a>
    <title = "Kurosaki Ichigo"
            tel = "123-456, 234-567"
          class = "Employee"
             id = "EM-02"/>
    <title = "Abarai Renji"
            tel = "345-678, 456-789"
          class = "Employee"
             id = "EM-03"/>
    <title = "Aizen Sosuke"
            tel = "567-890, 012-345"
          class = "Employee"
             id = "EM-04"/>
</a>

I want Databricks (PySpark) to read this data. How should I set the options and other parameters?

To read an XML file, follow this syntax:

Make sure the com.databricks:spark-xml library is installed on the cluster; Spark needs it to read XML files.

Reference 1

Sample XML

<catalog>
<book id="bk101" name="vv" address="hyd">
    
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
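spark-xml's rowTag option names the element that represents one record; in this sample that is <book>. To see roughly what that means, the sketch below mimics the mapping in plain Python with xml.etree.ElementTree: each <book> becomes one row, its attributes become columns prefixed with "_" (spark-xml's default attributePrefix), and each child element becomes a column named after its tag. This is only an illustration, not spark-xml itself: type inference, nested structs, and repeated child elements are ignored, and rows_for_tag is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<catalog>
<book id="bk101" name="vv" address="hyd">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>"""


def rows_for_tag(xml_text, row_tag, attribute_prefix="_"):
    """Return one dict per <row_tag> element, roughly as spark-xml flattens it."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree, so row elements are found at any depth
    for elem in root.iter(row_tag):
        # attributes become prefixed columns: id -> _id, name -> _name, ...
        row = {attribute_prefix + key: value for key, value in elem.attrib.items()}
        for child in elem:
            # simple child elements become plain string columns here
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


for row in rows_for_tag(SAMPLE_XML, "book"):
    print(row)
```

Pointing rowTag at a tag that never appears (for example "chapter") yields no rows at all, which is a common reason for an empty DataFrame.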

Reading the XML file

df = (spark.read.format("com.databricks.spark.xml")
      .option("rowTag", "book")  # each <book> element becomes one row
      .load("dbfs:/FileStore/xmlvalidator.xml"))
display(df)  # Databricks notebook helper for rendering a DataFrame
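The same reader should work for the attribute-only records in the question once the XML is well-formed. As written, lines such as <title = "Kurosaki Ichigo" .../> have no valid element name, so this sketch assumes each record is rewritten as <employee title="Kurosaki Ichigo" tel="123-456, 234-567" class="Employee" id="EM-02"/>. The element name employee, the helper read_employees, and the file path are hypothetical. spark-xml exposes attributes as columns prefixed with "_" by default, so the sketch renames them and splits the comma-separated tel string into an array. It assumes spark-xml is installed on the cluster and that your version reads self-closing row elements.

```python
# Reader options for spark-xml. "employee" is a hypothetical row element name:
# the question's snippet does not give a valid one.
EMPLOYEE_XML_OPTIONS = {
    "rowTag": "employee",    # each <employee .../> element becomes one row
    "attributePrefix": "_",  # spark-xml's default prefix for attribute columns
}

# Spark SQL expressions: drop the "_" prefix and turn "123-456, 234-567"
# into an array of phone numbers.
EMPLOYEE_COLUMNS = [
    "_title AS title",
    "split(_tel, ', ') AS tel",
    "_class AS `class`",
    "_id AS id",
]


def read_employees(spark, path):
    """Hypothetical helper: load employee records from an XML file via spark-xml."""
    return (spark.read.format("com.databricks.spark.xml")
            .options(**EMPLOYEE_XML_OPTIONS)
            .load(path)
            .selectExpr(*EMPLOYEE_COLUMNS))


# In a Databricks notebook (hypothetical path):
# display(read_employees(spark, "dbfs:/FileStore/employees.xml"))
```

Keeping the options in a dict makes it easy to add further spark-xml settings later without touching the reader chain.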



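Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.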

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM