Averaging multiple netCDF4 files using Python

Question

I am a bit of a noob with netCDF in Python, so please excuse this noob question.

I have a folder filled with circa 3650 netCDF4 files, one file per day for a decade. The files are named yyyymmdd.nc (e.g. 20100101, 20100102, 20100103, etc.). Each .nc file contains latitude, longitude, and temperature at a single time point for the same area, a section of the Tonga EEZ.

What I am trying to do is compute the average temperature for each lat and lon across all files, i.e. I want to end up with one .nc file that has all the same lats and lons and the average temperature across 10 years.

I have tried different versions of code, and they usually end up looking something like this:

from glob import glob
import numpy as np
import xarray as xr
files = glob('*.nc')
ds = xr.open_mfdataset(files)
mean = np.mean(ds['temp'][:, 0].values)

This code gives me the average temperature within each .nc file, for all .nc files, and not the average temperature at each lat and lon across a decade's worth of files.

Any and all help is much appreciated.

Thank you.

Answer 1

Assuming you are working on Linux/macOS, this can be done easily using my nctoolkit package (see details here).

The following will calculate the mean across all files and then plot the results:

from glob import glob
import nctoolkit as nc
files = glob('*.nc')
ds = nc.open_data(files)
ds.ensemble_mean()
ds.plot()

nctoolkit uses CDO as a back-end by default, but can use NCO as well, which can result in a performance improvement. So the following might be faster:

from glob import glob
import nctoolkit as nc
files = glob('*.nc')
ds = nc.open_data(files)
ds.ensemble_mean(nco=True)
ds.plot()

Answer 2

You can use the cdo package to do this, using a wildcard in the input file name. I've only tested it with a small number of files, though; the caveat is that you might hit a system limit on the number of open files.

from cdo import Cdo
cdo = Cdo()
cdo.ensmean(input='*.nc', output='ensmean.nc')

This is basically the equivalent of the command-line call to cdo:

cdo ensmean *.nc ensmean.nc
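On the open-files caveat mentioned above: a minimal sketch for checking, before running an ensemble operator, whether the number of input files could exceed the per-process limit. It assumes a Unix-like system (the standard-library resource module is not available on Windows), and the helper name exceeds_open_file_limit is made up for illustration.

```python
import resource
from glob import glob


def exceeds_open_file_limit(n_files):
    """Return True if n_files reaches the soft limit on open file descriptors.

    Hypothetical helper for illustration; Unix only.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return False  # no soft limit configured
    return n_files >= soft_limit


files = sorted(glob('*.nc'))
if exceeds_open_file_limit(len(files)):
    print(f"{len(files)} files may exceed the open-file limit; "
          "consider merging them first or raising the limit (ulimit -n).")
```

If the check trips, the merge-then-average route below may be the safer option, although that is worth confirming on your own system.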

That said, it sounds to me like it would be better to cat them together and then use timmean:

cdo.timmean(input=cdo.mergetime(input='*.nc'),output='timmean.nc')

which again is the Python equivalent of

cdo mergetime *.nc all.nc
cdo timmean all.nc timmean.nc

Try both and see which one works/is fastest :-)
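To make clear what a per-grid-cell mean looks like compared with the single overall mean from the question, here is a small NumPy sketch. The array is made-up data standing in for three daily files on a 2x2 lat/lon grid; it is not the real dataset, and the axis order (time, lat, lon) is an assumption.

```python
import numpy as np

# Made-up stand-in for three daily files, each a 2x2 (lat, lon) temperature grid.
daily_grids = np.array([
    [[20.0, 22.0], [24.0, 26.0]],  # day 1
    [[21.0, 23.0], [25.0, 27.0]],  # day 2
    [[22.0, 24.0], [26.0, 28.0]],  # day 3
])  # shape: (time, lat, lon)

# Averaging over everything collapses the grid to a single number.
overall_mean = daily_grids.mean()

# Averaging along the time axis only keeps one value per lat/lon cell.
per_cell_mean = daily_grids.mean(axis=0)

print(overall_mean)         # 24.0
print(per_cell_mean.shape)  # (2, 2)
```

With xarray the analogous reduction would be along the time dimension, e.g. ds['temp'].mean(dim='time'), assuming the files carry a dimension named time.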

Answer 1: score 1, accepted, 2021-06-07 14:11:12

Answer 2: score 0, 2021-06-09 12:11:23
