We need to access an XML file located in Azure Data Lake Gen2 and transform it into a DataFrame as shown below.
Sample XML data:
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Body>
<ns2:getProjectsResponse
xmlns:ns2="http://www.logic8.com/eq/webservices/generated">
<ns2:Project>
<ns2:fileName>P10001</ns2:fileName>
<ns2:alias>project1</ns2:alias>
</ns2:Project>
<ns2:Project>
<ns2:fileName>P10002</ns2:fileName>
<ns2:alias>project2</ns2:alias>
</ns2:Project>
<ns2:Project>
<ns2:fileName>P10003</ns2:fileName>
<ns2:alias>project3</ns2:alias>
</ns2:Project>
</ns2:getProjectsResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Expected Dataframe output:
Can anyone help me with this?
First, you need to learn how to read data from Azure Data Lake Gen2 into Azure Databricks.
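One common approach is to authenticate with the storage account access key and address files through the abfss:// scheme. The sketch below rests on assumptions. The helper names, account, container and path are hypothetical. spark and dbutils are the objects a Databricks notebook provides, so the notebook lines are shown as comments rather than run.

```python
# Minimal sketch: build the ADLS Gen2 (abfss://) URI and the Spark config key
# used for account-key authentication. All names here are hypothetical placeholders.


def abfss_path(container: str, account: str, path: str) -> str:
    """Return an abfss:// URI for a file inside an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def account_key_conf(account: str) -> str:
    """Return the Spark config key that holds the storage account access key."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


def configure_account_key(spark, account: str, key: str) -> None:
    """Register the account key on the active Spark session (run inside Databricks)."""
    spark.conf.set(account_key_conf(account), key)


# Usage inside a Databricks notebook (not executed here); the secret scope
# and key names are hypothetical:
# key = dbutils.secrets.get(scope="my-scope", key="storage-key")
# configure_account_key(spark, "mystorageaccount", key)
# xml_path = abfss_path("mycontainer", "mystorageaccount", "xmls/projects.xml")
```

Keeping the key in a Databricks secret scope rather than in the notebook text avoids exposing it. A service principal or a mount point are other options covered by the tutorials.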
There are many tutorials you can learn from:
Second, for the XML data, you need to use the Databricks spark-xml library, which @Axel R mentioned in a comment.
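# When reading, spark-xml uses "rowTag" (the element that becomes one row); "rootTag" only applies when writing.
xmldata = spark.read.format('xml').option("rowTag", "note").load('dbfs:/mnt/mydatafolder/xmls/note.xml')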
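Applied to the sample above, a spark-xml read could look like the following sketch. It assumes the library is installed on the cluster and that rowTag must include the ns2: prefix. It also assumes the child columns come back prefixed (e.g. ns2:fileName), so they are renamed afterwards. The helper names and the path are hypothetical.

```python
# Sketch: read the sample SOAP response with spark-xml, one row per <ns2:Project>.
# Assumes spark-xml is attached to the cluster; helper names and path are hypothetical.


def strip_prefix(name: str) -> str:
    """Drop an XML namespace prefix such as 'ns2:' from a column name."""
    return name.split(":", 1)[-1]


def read_projects(spark, path: str):
    """Load each <ns2:Project> element as a row and drop the 'ns2:' prefixes."""
    df = (
        spark.read.format("xml")
        .option("rowTag", "ns2:Project")  # assumed: the row tag keeps its prefix
        .load(path)
    )
    # Assumed column names such as 'ns2:fileName' are renamed to 'fileName'.
    return df.toDF(*[strip_prefix(c) for c in df.columns])


# In a Databricks notebook (not executed here):
# projects = read_projects(spark, "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/xmls/projects.xml")
# projects.show()
```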
For reference, see the question "How can I read a XML file Azure Databricks Spark".
By combining these resources, I think you can solve your problem. I don't know much about Azure Databricks, so unfortunately I can't test this for you.
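If the files are small, there is a library-free alternative, which is not what this answer proposes. You can parse the XML with Python's built-in xml.etree.ElementTree and hand the rows to Spark. This sketch assumes each file fits in driver memory. parse_projects and SAMPLE are hypothetical names, and the Spark lines are commented out because they need a Databricks session.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the sample response in the question.
NS = {"ns2": "http://www.logic8.com/eq/webservices/generated"}


def parse_projects(xml_text: str) -> list:
    """Return one dict per <ns2:Project> element in a SOAP response."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "fileName": project.findtext("ns2:fileName", namespaces=NS),
            "alias": project.findtext("ns2:alias", namespaces=NS),
        }
        for project in root.findall(".//ns2:Project", NS)
    ]


# Hypothetical copy of the sample response, compacted onto fewer lines.
SAMPLE = """
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <ns2:getProjectsResponse xmlns:ns2="http://www.logic8.com/eq/webservices/generated">
      <ns2:Project><ns2:fileName>P10001</ns2:fileName><ns2:alias>project1</ns2:alias></ns2:Project>
      <ns2:Project><ns2:fileName>P10002</ns2:fileName><ns2:alias>project2</ns2:alias></ns2:Project>
      <ns2:Project><ns2:fileName>P10003</ns2:fileName><ns2:alias>project3</ns2:alias></ns2:Project>
    </ns2:getProjectsResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

rows = parse_projects(SAMPLE)

# In a Databricks notebook (not executed here), read the whole file as one
# string and turn the parsed rows into a Spark DataFrame:
# xml_text = spark.read.text(xml_path, wholetext=True).first()[0]
# df = spark.createDataFrame(parse_projects(xml_text))
```

Because this approach loads each file into a single string, spark-xml remains the better choice for large files.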
Hope this helps.