How can I write the command on Databricks to read an XML file (based on Python code)?
Suppose I have this XML structure:
<?xml version = "1.0" encoding = "UTF-8"?>
<a>
<title = "Kurosaki Ichigo"
tel = "123-456, 234-567"
class = "Employee"
id = "EM-02"/>
<title = "Abarai Renji"
tel = "345-678, 456-789"
class = "Employee"
id = "EM-03"/>
<title = "Aizen Sosuke"
tel = "567-890, 012-345"
class = "Employee"
id = "EM-04"/>
</a>
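I want Databricks (PySpark) to read this data. How should I set the options and other parameters?
To read an XML file, follow the syntax below.
First make sure the com.databricks:spark-xml library is installed on the cluster;
it is required to read XML files.
Sample XML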
<catalog>
<book id="bk101" name="vv" address="hyd">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
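As a rough local illustration (plain Python with the standard library, not Spark), the sample above can be flattened into one record per <book> element. This is approximately the row shape to expect when rowTag is set to "book": attributes become columns carrying a prefix ("_" is spark-xml's default attributePrefix) and child elements become columns named after their tags. The helper name book_rows is made up for this sketch, and unlike spark-xml it keeps every value as a string instead of inferring types.

```python
import xml.etree.ElementTree as ET

# The sample document from the answer, inlined so the sketch is self-contained.
SAMPLE_XML = """<catalog>
<book id="bk101" name="vv" address="hyd">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>"""


def book_rows(xml_text, row_tag="book", attribute_prefix="_"):
    """Return one dict per <row_tag> element.

    Attributes are stored under attribute_prefix + name; child elements are
    stored under their tag name. This is a hypothetical helper that only
    imitates the column layout, not spark-xml itself.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.iter(row_tag):
        # Attributes first, e.g. id="bk101" becomes {"_id": "bk101"}.
        row = {attribute_prefix + name: value for name, value in element.attrib.items()}
        # Then each direct child element, e.g. <title>...</title>.
        for child in element:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


rows = book_rows(SAMPLE_XML)
print(sorted(rows[0].keys()))
```

Attribute-only elements, like the ones in the question, would end up entirely in the prefixed columns.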
Read the XML file
df = spark.read.format("com.databricks.spark.xml").option("rowTag", "book").load("dbfs:/FileStore/xmlvalidator.xml")
display(df)
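If several files or row tags need to be read, the same call can be wrapped in a small helper. The following is a minimal sketch, assuming a Databricks notebook where `spark` is already defined and the spark-xml library is attached to the cluster. The function name read_xml is hypothetical; rowTag and attributePrefix are spark-xml reader options, and any further options are passed through unchanged.

```python
def read_xml(spark, path, row_tag, **reader_options):
    """Load an XML file into a DataFrame, one row per <row_tag> element.

    `spark` is the active SparkSession (predefined in Databricks notebooks).
    Extra keyword arguments are forwarded as spark-xml reader options.
    """
    return (
        spark.read.format("com.databricks.spark.xml")
        .options(rowTag=row_tag, **reader_options)
        .load(path)
    )


# Example usage in a notebook (not executed here):
# df = read_xml(spark, "dbfs:/FileStore/xmlvalidator.xml", "book")
# df.select("_id", "title", "price").show()
```

With the default attributePrefix, XML attributes such as id show up as columns with a leading underscore (_id), which is why the commented usage selects "_id" rather than "id".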