How to read S3 files in Lambda using Xarray?

Question

I am trying to read netCDF files stored in my S3 bucket using Xarray. The sample code below runs fine if the same file is in a local folder, e.g. ~/downloads/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc, but I am new to S3 and not sure what I am missing.

My goal is to read the netCDF file with Xarray and convert it to CSV. Boto3 on its own doesn't work for reading netCDF4 and converting it to CSV.

Below is my Lambda function:

import xarray

def handler(event, context):
    
    filename = 's3://netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc'
    ds= xarray.open_dataset(filename)
    for varname in ds:
        print(varname)

    tas0=ds['wet_bulb_potential_temperature']
    tas0

    return {
        'statusCode': 200,
        'message': 'Hello from Python Lambda Function!'
    }

I get the error below: my S3 path isn't recognized, and Lambda instead tries to find the file on a local path. Error message from the CloudWatch logs:

File "/opt/python/lib/python3.6/site-packages/xarray/backends/file_manager.py", line 204, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success

FileNotFoundError: [Errno 2] No such file or directory: b'/var/task/s3:/netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc'
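The mangled path in the error is consistent with the s3:// string being treated as a relative local path: it is joined onto the Lambda working directory (/var/task) and normalized, which collapses the double slash after "s3:". The sketch below only mimics that resolution with the standard library, assuming /var/task as the working directory; resolve_like_local_path is a hypothetical helper, not xarray's internal code.

```python
import posixpath


def resolve_like_local_path(path: str, cwd: str = "/var/task") -> str:
    # Hypothetical helper: join a relative path onto the working directory,
    # then normalize it. posixpath.normpath collapses repeated slashes, so
    # "s3://bucket/key" ends up as "s3:/bucket/key".
    return posixpath.normpath(posixpath.join(cwd, path))


print(resolve_like_local_path("s3://netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc"))
# /var/task/s3:/netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc
```

The output matches the path in the FileNotFoundError, which is why the fix is to open the object through an S3-aware library instead of passing the URL string.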

Answer 1

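As far as I know, Xarray does not open s3:// paths directly: the string is treated as a local file path, which is why Lambda looks for it under /var/task. You can use s3fs to open the object and pass the resulting file object to Xarray instead: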

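import xarray
import s3fs

def handler(event, context):

    # anon=False signs requests with the Lambda execution role's credentials;
    # anon=True only works for publicly readable buckets.
    fs = s3fs.S3FileSystem(anon=False)

    # s3fs paths are "bucket/key". Pass the open file object f to xarray, not a path string.
    # Reading netCDF4 from a file object needs the h5netcdf engine, so h5netcdf must be
    # installed in the Lambda package or layer (netCDF3 files would use engine='scipy').
    with fs.open('netcdf-files/60e0489fcab82c714f516064b4e6b7acf724b7b9.nc', 'rb') as f:
        ds = xarray.open_dataset(f, engine='h5netcdf')
        for varname in ds:
            print(varname)

        # xarray reads lazily, so load the data before the S3 file is closed
        tas0 = ds['wet_bulb_potential_temperature'].load()

    print(tas0)

    return {
        'statusCode': 200,
        'message': 'Hello from Python Lambda Function!'
    }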
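To finish the CSV conversion the question asks for, a minimal sketch, assuming tas0 is the loaded DataArray from the handler above and that the output goes under /tmp (the only writable directory in Lambda); save_variable_as_csv and the file name are hypothetical. DataArray.to_dataframe() flattens the variable into a pandas DataFrame indexed by its dimensions, and pandas writes it with to_csv().

```python
import os
import tempfile

import numpy as np
import xarray


def save_variable_as_csv(da: xarray.DataArray, path: str) -> str:
    """Hypothetical helper: write one xarray variable to a CSV file."""
    # Flatten to a pandas DataFrame: dimension coordinates become the
    # (Multi)Index and the values become a column named after the variable.
    # to_dataframe() needs a named DataArray; ds['...'] already has a name.
    df = da.to_dataframe()
    df.to_csv(path)
    return path


if __name__ == "__main__":
    # Small in-memory stand-in for ds['wet_bulb_potential_temperature'].
    demo = xarray.DataArray(
        np.array([[280.0, 281.5], [279.2, 282.1]]),
        dims=("time", "lat"),
        coords={"time": [0, 1], "lat": [10.0, 20.0]},
        name="wet_bulb_potential_temperature",
    )
    # tempfile.gettempdir() is normally /tmp on Linux.
    out = save_variable_as_csv(demo, os.path.join(tempfile.gettempdir(), "wet_bulb.csv"))
    with open(out) as fh:
        print(fh.read())
```

Inside the handler this would be called as save_variable_as_csv(tas0, '/tmp/wet_bulb.csv'). For large variables the flattened table can be much bigger than the netCDF file, so selecting a subset first may be worthwhile.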

Answer 1 (score: 2), 2020-07-17 06:40:55

如何使用 Xarray 读取 lambda 中的 S3 文件？

问题描述

1 个解决方案

解决方案1 2 2020-07-17 06:40:55

解决方案1
2 2020-07-17 06:40:55