Read .nc files from Azure Datalake Gen2 in Azure Databricks
Trying to read .nc (netCDF4) files in Azure Databricks. I have never worked with .nc files before.

Mounted the above files to "/mnt/eco_dailyRain".
Can list the content of the mount using dbutils.fs.ls("/mnt/eco_dailyRain").

OUTPUT:
Out[76]: [FileInfo(path='dbfs:/mnt/eco_dailyRain/2000.daily_rain.nc', name='2000.daily_rain.nc', size=429390127), FileInfo(path='dbfs:/mnt/eco_dailyRain/2001.daily_rain.nc', name='2001.daily_rain.nc', size=428217143), FileInfo(path='dbfs:/mnt/eco_dailyRain/2002.daily_rain.nc', name='2002.daily_rain.nc', size=428218181), FileInfo(path='dbfs:/mnt/eco_dailyRain/2003.daily_rain.nc', name='2003.daily_rain.nc', size=428217139), FileInfo(path='dbfs:/mnt/eco_dailyRain/2004.daily_rain.nc', name='2004.daily_rain.nc', size=429390143), FileInfo(path='dbfs:/mnt/eco_dailyRain/2005.daily_rain.nc', name='2005.daily_rain.nc', size=428217137), FileInfo(path='dbfs:/mnt/eco_dailyRain/2006.daily_rain.nc', name='2006.daily_rain.nc', size=428217127), FileInfo(path='dbfs:/mnt/eco_dailyRain/2007.daily_rain.nc', name='2007.daily_rain.nc', size=428217143), FileInfo(path='dbfs:/mnt/eco_dailyRain/2008.daily_rain.nc', name='2008.daily_rain.nc', size=429390137), FileInfo(path='dbfs:/mnt/eco_dailyRain/2009.daily_rain.nc', name='2009.daily_rain.nc', size=428217127), FileInfo(path='dbfs:/mnt/eco_dailyRain/2010.daily_rain.nc', name='2010.daily_rain.nc', size=428217134), FileInfo(path='dbfs:/mnt/eco_dailyRain/2011.daily_rain.nc', name='2011.daily_rain.nc', size=428218181), FileInfo(path='dbfs:/mnt/eco_dailyRain/2012.daily_rain.nc', name='2012.daily_rain.nc', size=429390127), FileInfo(path='dbfs:/mnt/eco_dailyRain/2013.daily_rain.nc', name='2013.daily_rain.nc', size=428217143), FileInfo(path='dbfs:/mnt/eco_dailyRain/2014.daily_rain.nc', name='2014.daily_rain.nc', size=428218104), FileInfo(path='dbfs:/mnt/eco_dailyRain/2015.daily_rain.nc', name='2015.daily_rain.nc', size=428217134), FileInfo(path='dbfs:/mnt/eco_dailyRain/2016.daily_rain.nc', name='2016.daily_rain.nc', size=429390127), FileInfo(path='dbfs:/mnt/eco_dailyRain/2017.daily_rain.nc', name='2017.daily_rain.nc', size=428217223), FileInfo(path='dbfs:/mnt/eco_dailyRain/2018.daily_rain.nc', name='2018.daily_rain.nc', size=418143765), 
FileInfo(path='dbfs:/mnt/eco_dailyRain/2019.daily_rain.nc', name='2019.daily_rain.nc', size=370034113), FileInfo(path='dbfs:/mnt/eco_dailyRain/Consignments.parquet', name='Consignments.parquet', size=237709917), FileInfo(path='dbfs:/mnt/eco_dailyRain/test.nc', name='test.nc', size=428217137)]
Just to test whether I can read from the mount:
spark.read.parquet('dbfs:/mnt/eco_dailyRain/Consignments.parquet')
This confirms that the parquet file can be read.

Output:
Out[83]: DataFrame[CONSIGNMENT_PK: int, CERTIFICATE_NO: string, ACTOR_NAME: string, GENERATOR_FK: int, TRANSPORTER_FK: int, RECEIVER_FK: int, REC_POST_CODE: string, WASTEDESC: string, WASTE_FK: int, GEN_LICNUM: string, VOLUME: int, MEASURE: string, WASTE_TYPE: string, WASTE_ADD: string, CONTAMINENT1_FK: int, CONTAMINENT2_FK: int, CONTAMINENT3_FK: int, CONTAMINENT4_FK: int, TREATMENT_FK: int, ANZSICODE_FK: int, VEH1_REGNO: string, VEH1_LICNO: string, VEH2_REGNO: string, VEH2_LICNO: string, GEN_SIGNEE: string, GEN_DATE: timestamp, TRANS_SIGNEE: string, TRANS_DATE: timestamp, REC_SIGNEE: string, REC_DATE: timestamp, DATECREATED: timestamp, DISCREPANCY: string, APPROVAL_NUMBER: string, TR_TYPE: string, REC_WASTE_FK: int, REC_WASTE_TYPE: string, REC_VOLUME: int, REC_MEASURE: string, DATE_RECEIVED: timestamp, DATE_SCANNED: timestamp, HAS_IMAGE: string, LASTMODIFIED: timestamp]
But trying to read the netCDF4 files fails with No such file or directory.
Code:
import datetime as dt # Python standard library datetime module
import numpy as np
from netCDF4 import Dataset # http://code.google.com/p/netcdf4-python/
import matplotlib.pyplot as plt
rootgrp = Dataset("dbfs:/mnt/eco_dailyRain/2001.daily_rain.nc","r", format="NETCDF4")
Error:
FileNotFoundError: [Errno 2] No such file or directory: b'dbfs:/mnt/eco_dailyRain/2001.daily_rain.nc'
Any clues?
According to the API reference of the netCDF4 module for class Dataset, as in the figure below.
The value of the path parameter for Dataset should be a Unix-style directory path, but dbfs:/mnt/eco_dailyRain/2001.daily_rain.nc is a PySpark-style URI as far as I know, so you got the error FileNotFoundError: [Errno 2] No such file or directory: b'dbfs:/mnt/eco_dailyRain/2001.daily_rain.nc'.
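The translation between the two path styles is mechanical: the dbfs:/ scheme maps onto the /dbfs FUSE mount that local-file libraries like netCDF4 can open. A minimal sketch in plain Python (the helper name to_fuse_path is hypothetical, not part of any Databricks API):

```python
def to_fuse_path(dbfs_uri: str) -> str:
    """Translate a dbfs:/... URI into its /dbfs/... local equivalent."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError(f"not a DBFS URI: {dbfs_uri}")
    # Spark reads dbfs:/mnt/...; local libraries read /dbfs/mnt/...
    return "/dbfs/" + dbfs_uri[len(prefix):]

print(to_fuse_path("dbfs:/mnt/eco_dailyRain/2001.daily_rain.nc"))
# → /dbfs/mnt/eco_dailyRain/2001.daily_rain.nc
```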
The fix is to change the path value dbfs:/mnt/eco_dailyRain/2001.daily_rain.nc to the equivalent Unix path /dbfs/mnt/eco_dailyRain/2001.daily_rain.nc, as in the code below.
rootgrp = Dataset("/dbfs/mnt/eco_dailyRain/2001.daily_rain.nc","r", format="NETCDF4")
You can check it via the code below:
%sh
ls /dbfs/mnt/eco_dailyRain
Of course, you can also list your netCDF4-format data files via dbutils.fs.ls('/mnt/eco_dailyRain') if you have mounted it.