
How to use Azure Databricks to read and write Excel data with multiple sheets from ADLS Gen2

I want to implement the following logic in Azure Databricks using PySpark. I have an Excel file that has multiple sheets in it. The file is stored on ADLS Gen2. I want to read the data from all sheets into a separate file and write that file to some location in ADLS Gen2 itself.

Note: All sheets have the same schema (Id, Name).

My final output file should contain the data from all the sheets. I also need to create an additional column that stores the sheet name.


You can use the following logic:

  • Use pandas to read all the worksheets of the workbook.
  • Concatenate the resulting pandas DataFrames into a single DataFrame.
  • Convert the pandas DataFrame into a PySpark DataFrame.
  • Apply whatever business logic you want to implement.
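
The steps above could be sketched roughly as follows. This is only a minimal sketch, not a tested Databricks job: the function names `combine_sheets` and `excel_to_adls`, the column name `SheetName`, and the example paths are all hypothetical. It assumes the ADLS Gen2 container is mounted so that pandas can open the workbook through a local-style path, that `openpyxl` is installed on the cluster for `.xlsx` files, and that CSV is an acceptable output format.

```python
import pandas as pd


def combine_sheets(sheets):
    """Stack per-sheet DataFrames and record each row's source sheet.

    `sheets` is a dict of sheet name -> DataFrame, which is the shape
    returned by pd.read_excel(path, sheet_name=None).
    """
    # Tag every row with the name of the sheet it came from (hypothetical
    # column name "SheetName"), then stack the sheets into one DataFrame.
    tagged = [df.assign(SheetName=name) for name, df in sheets.items()]
    return pd.concat(tagged, ignore_index=True)


def excel_to_adls(spark, src_path, dst_path):
    """Read all sheets with pandas, combine them, and write via Spark."""
    # sheet_name=None loads every worksheet into a dict keyed by sheet name.
    sheets = pd.read_excel(src_path, sheet_name=None)
    combined = combine_sheets(sheets)

    # Convert the pandas DataFrame to a Spark DataFrame.
    sdf = spark.createDataFrame(combined)

    # Apply any business logic here, then write the result as CSV files
    # under dst_path (Spark writes a directory of part files).
    sdf.write.mode("overwrite").option("header", True).csv(dst_path)
    return sdf


# In a Databricks notebook, where `spark` already exists, a call might look
# like this (both paths are assumptions about how the storage is mounted):
# excel_to_adls(spark, "/dbfs/mnt/raw/input.xlsx", "dbfs:/mnt/curated/combined")
```

Keeping the concatenation in its own function makes it possible to check the sheet-name column on small in-memory DataFrames before running anything against the storage account. Because pandas loads the whole workbook on the driver, this approach fits workbooks that comfortably fit in driver memory.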
