How to read NetCDF variable float data into a Numpy array with the same precision and scale as the original NetCDF float values?

I have a NetCDF file which contains a variable with float values with precision/scale == 7/2, i.e. the possible values range from -99999.99 to 99999.99.

When I take a slice of the values from the NetCDF variable and look at them in my debugger, I see that the values I now have in my array have more precision/scale than what I see in the original NetCDF. For example, when I look at the values in the ToolsUI/ncdump viewer they display as '-99999.99' or '12.45', but when I look at the values in the slice array they look like '-99999.9921875' (a greater scale length). So if I'm using '-99999.99' as the expected value to indicate a missing data point, I won't get a match with what gets pulled into the slice array, since those values have a greater scale length and the additional digits in the scale are not just zeros for padding.

For example, I see this if I do an ncdump on a point within the NetCDF dataset:

Variable: precipitation(0:0:1, 40:40:1, 150:150:1)

float precipitation(time=1348, lat=180, lon=360);
  :units = "mm/month";
  :long_name = "precipitation totals";

 data:

  {
    {
      {-99999.99}
    }
  }

However, if I get a slice of the data from the variable like so:

value = precipitationVariable[0:1:1, 40:41:1, 150:151:1]

then I see it like this in my debugger (Eclipse/PyDev):

value == ndarray: [[[-99999.9921875]]]

So it seems as if the NetCDF dataset values that I read into a Numpy array are not being read with the same precision/scale as the original values in the NetCDF file. Or perhaps the values within the NetCDF file are actually the same as what I'm seeing when I read them, and what ncdump shows me is being truncated due to some format settings in the ncdump program itself.

Can anyone advise as to what's happening here? Thanks in advance for your help.

BTW I'm developing this code using Python 2.7.3 on a Windows XP machine, with the Python module for the NetCDF4 API provided here: https://code.google.com/p/netcdf4-python/

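There is no simple way of doing what you want. The variable is stored as single-precision (32-bit) floats, and numpy keeps that type. -99999.99 cannot be represented exactly in single precision, so what you get is the nearest representable value, -99999.9921875, trailing digits included. ncdump shows -99999.99 only because it prints floats with about seven significant digits by default.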
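To see where the extra digits come from, here is a minimal sketch. It assumes only that the variable is a 32-bit float, as the `float precipitation` declaration in the ncdump output suggests. The names `sentinel32`, `stored`, `matches_float64` and `matches_float32` are made up for illustration. Explicit NumPy types are used on both sides of each comparison so the result does not depend on how a given NumPy version promotes bare Python floats.

```python
import numpy as np

# -99999.99 has no exact 32-bit representation, so the nearest float32 is used.
sentinel32 = np.float32(-99999.99)

# Widening to a 64-bit Python float exposes the value that is actually stored.
stored = float(sentinel32)
print(repr(stored))  # -99999.9921875

# Against the 64-bit sentinel the comparison fails: the two values differ.
matches_float64 = bool(sentinel32 == np.float64(-99999.99))

# Against the sentinel rounded to 32 bits the comparison succeeds.
matches_float32 = bool(sentinel32 == np.float32(-99999.99))

print(matches_float64, matches_float32)  # False True
```

So the data was not altered on the way into the array. The mismatch appears only when a 32-bit value is compared with a 64-bit version of the same decimal literal.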

However, netCDF already provides a mechanism for missing data (see the best practices guide). How was the netCDF file written in the first place? missing_value is a special variable attribute that should be used to indicate which values are missing. In the C and Fortran interfaces, variables are pre-filled with a fill value when they are created, so anything that is never written reads back as missing. If you wrote the variable all in one go, you can instead set the missing_value attribute to the value (or values) that mark missing data. See more about the fill values in the C and Fortran interfaces. This is the recommended approach. The Python netCDF4 module plays well with these missing values, and such arrays are read as masked arrays in numpy.

If you must work with the file you currently have, then I'd suggest creating a mask to cover values around your missing value:

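import numpy as np
# Read the whole variable into memory (precipitationVariable comes from the question).
value = precipitationVariable[:]
# Flag everything in a narrow window around the nominal missing value -99999.99.
mask = (value < -99999.98) & (value > -100000.00)
# Wrap the data so the flagged entries are ignored by masked-array operations.
value = np.ma.MaskedArray(value, mask=mask)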
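As an alternative to the hand-written window, and not part of the original answer, the following sketch shows two other ways to build the same mask with numpy alone. The small `value` array is a hypothetical stand-in for data read from the file, and the names `sentinel`, `exact` and `approx` are illustrative.

```python
import numpy as np

# Hypothetical stand-in for data read from the file: 32-bit floats.
value = np.array([[[-99999.99, 12.45, 0.0]]], dtype=np.float32)

# Option 1: round the sentinel to the array's own dtype, then match exactly.
sentinel = value.dtype.type(-99999.99)
exact = np.ma.masked_equal(value, sentinel)

# Option 2: mask anything approximately equal to the sentinel. With the
# default rtol=1e-5 this covers roughly +/-1.0 around -99999.99.
approx = np.ma.masked_values(value, -99999.99)

print(exact.mask.tolist())   # [[[True, False, False]]]
print(approx.mask.tolist())  # [[[True, False, False]]]
```

Option 1 relies on the missing values having been written from the same decimal literal as 32-bit floats. Option 2 is more forgiving, but would also mask a genuine reading that happened to fall within the tolerance.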
