Reading netCDF files stored on a remote filesystem in R?

I need to read a netCDF file into R that is stored on a remote filesystem. I have ssh access to the filesystem, but the files are too big to store on my local computer.

I tried the advice from "Can R read from a file through an ssh connection?", as follows:

library(ncdf)
d = open.ncdf(pipe('ssh hostname "path/to/file/foo.nc"'))

However, I keep getting the error:

bash: path/to/file/foo.nc: Permission denied

Any ideas on how to fix this?

It is not possible to open the file directly from within R using ssh, but there are a few options available to you.

1. Mount the remote server as a local filesystem over ssh.

There are packages that let you mount remote machines as local filesystems over ssh; on Linux, for example, you might use sshfs, whereas on Windows you might use win-sshfs. Once you've mounted the remote filesystem, you can access the netCDF files from R just as you would any other file, although I'm not sure what the performance implications may be.
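On Linux, the sshfs route might look like the sketch below. The function name mount_remote_nc and both of its arguments are placeholders made up for illustration; the sketch assumes sshfs is installed locally and that an ordinary ssh login to the host already works.

```shell
# Sketch only: mount_remote_nc, the remote spec and the mount point are
# hypothetical names chosen for illustration.
mount_remote_nc() {
    remote="$1"       # e.g. hostname:path/to/file (the directory holding foo.nc)
    mountpoint="$2"   # e.g. "$HOME/remote_nc"
    mkdir -p "$mountpoint" || return 1
    # -o ro mounts read-only, so the remote data cannot be changed by accident.
    sshfs -o ro "$remote" "$mountpoint"
}
```

After something like mount_remote_nc hostname:path/to/file "$HOME/remote_nc", R can open foo.nc under the mount point by its ordinary path, and fusermount -u "$HOME/remote_nc" detaches it again. sshfs fetches data on demand, so the whole file does not have to fit on the local disk, but every read goes over the ssh connection.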

2. Break the larger files down into smaller files.

On the server, use the command-line ncdump utility to extract the variables you need from the large files into smaller files that fit on your local filesystem.

$ ncdump -v [var1],[var2] big.nc > smaller.cdl

smaller.cdl will be a text file; you can generate a binary netCDF .nc file from it by using ncgen:

$ ncgen -b -o smaller.nc smaller.cdl
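The two commands above can be driven from the local machine, so that only the rebuilt file is copied over. This is a sketch under assumptions: fetch_nc_subset and its four arguments are invented for illustration, and it presumes ncdump and ncgen are installed on the server.

```shell
# Sketch only: fetch_nc_subset and its arguments are hypothetical.
# Paths and variable names must not contain spaces or shell metacharacters,
# because they are pasted into the remote command line unquoted.
fetch_nc_subset() {
    host="$1"    # e.g. hostname
    big="$2"     # remote input file, e.g. path/to/file/foo.nc
    vars="$3"    # comma-separated variable list, e.g. var1,var2
    small="$4"   # remote output file, e.g. smaller.nc
    # Dump the chosen variables to CDL and rebuild a binary file, all on the server.
    ssh "$host" "ncdump -v $vars $big > $small.cdl && ncgen -b -o $small $small.cdl" || return 1
    # Copy only the rebuilt file into the current local directory.
    scp "$host:$small" .
}
```

A call might look like fetch_nc_subset hostname path/to/file/foo.nc var1,var2 smaller.nc. Because the CDL written by ncdump -v still declares every variable in the file and only leaves out the other variables' data, it is worth checking the size of the rebuilt file on the server before copying it.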

3. Use an OPeNDAP service on the remote server.

Unless your remote server is already set up to provide an OPeNDAP service, this is probably overkill. But if it is, you can combine R's OPeNDAP access with netCDF's OPeNDAP subset service to retrieve data subsets on the fly. You can also use ncdump on your local machine to request a subset of data from the server.
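Requesting data with a local ncdump could be wrapped as below. This is only a sketch: dap_header and dap_variable are made-up helper names, the dataset URL is whatever the server publishes, and it assumes the local netCDF tools were built with DAP support.

```shell
# Sketch only: dap_header, dap_variable and the URL passed in are hypothetical.

# Print just the header (dimensions, variables, attributes) of a remote dataset.
dap_header() {
    ncdump -h "$1"
}

# Print the header plus the data values of the named variable(s).
dap_variable() {
    ncdump -v "$2" "$1"
}
```

Looking at the header first shows which variables exist, so that only the ones needed are requested with dap_variable.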

I'd try to arrange a Samba or NFS share. After that you can simply treat the file like any other.

It is not possible to do it via ssh. The pipe command executes a shell command. Here that command asks ssh to execute path/to/file/foo.nc on the remote host, which fails with "Permission denied" because the file is not an executable. The examples in the question you linked read text that a remote command writes to its standard output, which R then parses. This is not the same thing.

The closest you could get is to run ncdump on the remote machine. It converts variables from the files into a text version, which you may be able to parse.
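A remote ncdump streamed over ssh might be sketched like this. remote_var_values is a hypothetical helper, and the sed filter relies on the CDL layout in which a line starting with "data:" introduces the values.

```shell
# Sketch only: remote_var_values and its arguments are hypothetical.
# Prints the data section of the CDL that ncdump produces for one variable.
remote_var_values() {
    host="$1"   # e.g. hostname
    file="$2"   # e.g. path/to/file/foo.nc
    var="$3"    # name of the variable to dump
    # ncdump runs on the server; sed drops everything before the "data:" line.
    ssh "$host" "ncdump -v $var $file" | sed -n '/^data:/,$p'
}
```

The ssh command inside the function is the kind of string that R's pipe() can run, since it writes text to standard output instead of trying to execute the .nc file. CDL text is much bulkier than the binary data, so this is likely to be practical only for small variables.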
