How to extract data from large NetCDF files stored on Azure File shares and send to Azure Web Page

I have downloaded about 40 years of weather data (wind, wave, etc.) in NetCDF format and stored the files in Azure File shares, about 8 TB in total. Each file holds one weather parameter (for example, wind speed) for one year over the whole earth surface and is about 35 GB.

Next, I developed a simple Azure website using Python and the Dash package, where the user can define a location (latitude, longitude), select a weather parameter and a date range, and submit the request. See the website picture below:

[Image: screenshot of the website]

Now, I would like to run a script when the user clicks the submit button that extracts the specified data, saves it to a CSV file, and provides a download link to the file.

The Azure Storage File Share client library for Python (azure-storage-file-share) allows connecting to the share and downloading a file. Since one year of data is a 35 GB file, downloading each year of data to extract a single grid point is not an option.

Is there any way I can run scripts directly on Azure File shares to extract the required data and then retrieve it from the web page?

I am trying to avoid the situation where I need to extract data from NetCDF files and push it into a SQL database, which a website can access easily.

You could mount the file share in an Azure VM. Here's an example of how someone did the same locally: "Read NetCDF file from Azure file storage".
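As a rough sketch of what the script on the VM could look like once the share is mounted: the code below opens each yearly file lazily with xarray and pulls out only the nearest grid point. The mount point `/mnt/weatherdata`, the `<parameter>_<year>.nc` naming scheme, the coordinate names `latitude`, `longitude` and `time`, and a variable named after the parameter with dimensions (time, latitude, longitude) are all assumptions, not details from the question. xarray with a NetCDF backend must be installed.

```python
import csv
from pathlib import Path

# Hypothetical mount point of the Azure file share on the VM.
MOUNT_ROOT = Path("/mnt/weatherdata")


def yearly_files(parameter, start_year, end_year, root=MOUNT_ROOT):
    """List one file per year, assuming a '<parameter>_<year>.nc' naming scheme."""
    return [root / f"{parameter}_{year}.nc" for year in range(start_year, end_year + 1)]


def extract_point_to_csv(parameter, lat, lon, start, end, csv_path, root=MOUNT_ROOT):
    """Write the series at the grid point nearest (lat, lon) to csv_path.

    start and end are ISO date strings such as '1990-01-01'.
    """
    import xarray as xr  # third-party, imported lazily

    start_year, end_year = int(start[:4]), int(end[:4])
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["time", parameter])
        for path in yearly_files(parameter, start_year, end_year, root):
            # open_dataset does not load the data up front; values are read
            # only when .values is accessed on the selected point.
            with xr.open_dataset(path) as ds:
                # Assumed coordinate names; check them with print(ds) first.
                point = ds[parameter].sel(latitude=lat, longitude=lon, method="nearest")
                series = point.sel(time=slice(start, end))
                for t, v in zip(series["time"].values, series.values):
                    writer.writerow([str(t), float(v)])
```

The Dash callback behind the submit button could call something like `extract_point_to_csv("wind", 55.0, 3.5, "1990-01-01", "1991-12-31", "out.csv")` and then offer the resulting file for download. How much of each 35 GB file is actually read depends on how the NetCDF files are chunked and compressed, so it is worth timing one request before relying on this.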

Or, you might want to look at Azure Blob Storage instead. As far as I know, Azure File Storage is really meant as a replacement for networked file shares, like on a local area network. Azure Blob Storage, on the other hand, is much better suited for streaming large files like this from the cloud. There's a good set of examples for when to use which in the Azure Blob Storage overview.

Here's an example of how to download a blob from the Azure Blob Python reference:

import os

# Download the blob to a local file.
# Add 'DOWNLOAD' before the .txt extension so you can see both files in the data directory.
# local_path, local_file_name and blob_client are assumed to be defined earlier, as in the referenced example.
download_file_path = os.path.join(local_path, local_file_name.replace('.txt', 'DOWNLOAD.txt'))
print("\nDownloading blob to \n\t" + download_file_path)

with open(download_file_path, "wb") as download_file:
    # readall() loads the entire blob into memory before it is written out
    download_file.write(blob_client.download_blob().readall())
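For a 35 GB file, `readall()` on the whole blob is not practical. A minimal sketch of two alternatives follows, assuming `blob_client` is an azure-storage-blob v12 `BlobClient`: `download_blob` accepts `offset` and `length` to fetch only a byte range, and the returned downloader exposes `chunks()` to iterate over the content piece by piece. The helper names `read_blob_range` and `stream_blob_to_file` are made up for illustration.

```python
def read_blob_range(blob_client, offset, length):
    """Return `length` bytes starting at `offset` without fetching the rest of the blob."""
    return blob_client.download_blob(offset=offset, length=length).readall()


def stream_blob_to_file(blob_client, destination_path):
    """Copy a whole blob to disk chunk by chunk; returns the number of bytes written."""
    written = 0
    with open(destination_path, "wb") as destination:
        # chunks() yields the blob in pieces, so memory use stays bounded.
        for chunk in blob_client.download_blob().chunks():
            destination.write(chunk)
            written += len(chunk)
    return written
```

A ranged read only helps if you know where the bytes for a grid point live inside the NetCDF file, so in practice it is usually driven by a library that understands the format rather than called by hand.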
