How to access an XML file from Azure Data Lake Gen2 and transform it into a dataframe in Azure Databricks?
We need to access an XML file located in Azure Data Lake Gen2 and transform it into a dataframe as shown below.
Sample XML data:
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Body>
<ns2:getProjectsResponse
xmlns:ns2="http://www.logic8.com/eq/webservices/generated">
<ns2:Project>
<ns2:fileName>P10001</ns2:fileName>
<ns2:alias>project1</ns2:alias>
</ns2:Project>
<ns2:Project>
<ns2:fileName>P10002</ns2:fileName>
<ns2:alias>project2</ns2:alias>
</ns2:Project>
<ns2:Project>
<ns2:fileName>P10003</ns2:fileName>
<ns2:alias>project3</ns2:alias>
</ns2:Project>
</ns2:getProjectsResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Expected dataframe output:
Can anyone help me with this?
First, you need to learn how to read data from Azure Data Lake Gen2 into Azure Databricks.
There are many tutorials you can learn from.
Second, for the XML data itself, you need to use the Databricks spark-xml library that @Axel R pointed to in a comment.
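As a minimal sketch of that first step, assuming access with a storage account key (only one of several authentication options) and made-up account, container, and file names, the file location can be written as an abfss:// URI. The helper names below are illustrative rather than part of any library, and `spark` is the session object that Databricks notebooks predefine.

```python
def account_key_conf(storage_account: str) -> str:
    """Spark config key under which the ABFS driver looks up the account key."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


def abfss_uri(container: str, storage_account: str, path: str) -> str:
    """Build an abfss:// URI for a file in an ADLS Gen2 container."""
    clean_path = path.lstrip("/")
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{clean_path}"


def configure_access(spark, storage_account: str, account_key: str) -> None:
    """Register the account key on the session (assumes key-based auth)."""
    spark.conf.set(account_key_conf(storage_account), account_key)


# Hypothetical names, for illustration only.
xml_path = abfss_uri("mycontainer", "mystorageaccount", "xmls/projects.xml")
```

In a notebook, `configure_access(spark, "mystorageaccount", key)` would be called once before reading from `xml_path`, with the key ideally taken from a secret scope rather than pasted into the notebook.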
# rowTag names the element that spark-xml turns into one row when reading
xmldata = spark.read.format('xml').option("rowTag","note").load('dbfs:/mnt/mydatafolder/xmls/note.xml')
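The line above reads a generic note.xml example. For the sample in the question, each `ns2:Project` element is one record, so a sketch adapted to it might look like the following. It assumes the spark-xml library is attached to the cluster, that the row tag is matched together with its `ns2:` prefix, and that the child columns come back with that prefix too; `read_projects` and `strip_ns_prefix` are hypothetical helper names, and this has not been run against a real cluster.

```python
def strip_ns_prefix(name: str) -> str:
    """Drop an XML namespace prefix: 'ns2:fileName' -> 'fileName'."""
    return name.split(":", 1)[-1]


def read_projects(spark, path: str):
    """Read each <ns2:Project> element as one row of a DataFrame."""
    df = (
        spark.read.format("xml")
        .option("rowTag", "ns2:Project")  # element that becomes a row
        .load(path)
    )
    # Rename columns such as 'ns2:fileName' to plain 'fileName'.
    return df.toDF(*[strip_ns_prefix(c) for c in df.columns])
```

Calling `read_projects(spark, path).show()` with the path to the file should then list one row per project, with `fileName` and `alias` columns, if the assumptions above hold.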
Please refer to: How can I read a XML file Azure Databricks Spark.
If you combine these documents, I think you can work out your problem. I don't know much about Azure Databricks, so I'm sorry that I can't test this for you.
Hope this helps.
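Since the Spark snippet is untested, one way to sanity-check which rows the dataframe should contain is to parse the sample locally with Python's standard library. This is only a plain-Python sketch of the expected records, not a replacement for the Spark read; `project_rows` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the sample XML in the question.
NS = {"ns2": "http://www.logic8.com/eq/webservices/generated"}

SAMPLE = """<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Body>
<ns2:getProjectsResponse
 xmlns:ns2="http://www.logic8.com/eq/webservices/generated">
<ns2:Project>
<ns2:fileName>P10001</ns2:fileName>
<ns2:alias>project1</ns2:alias>
</ns2:Project>
<ns2:Project>
<ns2:fileName>P10002</ns2:fileName>
<ns2:alias>project2</ns2:alias>
</ns2:Project>
<ns2:Project>
<ns2:fileName>P10003</ns2:fileName>
<ns2:alias>project3</ns2:alias>
</ns2:Project>
</ns2:getProjectsResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def project_rows(xml_text: str):
    """Return one (fileName, alias) tuple per <ns2:Project> element."""
    root = ET.fromstring(xml_text)
    return [
        (
            project.findtext("ns2:fileName", namespaces=NS),
            project.findtext("ns2:alias", namespaces=NS),
        )
        for project in root.iterfind(".//ns2:Project", NS)
    ]


rows = project_rows(SAMPLE)
# rows == [("P10001", "project1"), ("P10002", "project2"), ("P10003", "project3")]
```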