How to read a NetCDF file and write to CSV using Python

Question

My aim is to access data from a NetCDF file and write it to a CSV file in the following format:

Latitude  Longitude Date1  Date2  Date3
100       200       <-- MIN_SFC values -->

So far I have accessed the variables, written the header to the file, and populated the lat/lons.

How can I access the MIN_SFC values for specified lon/lat coordinates and dates, and then write them to a CSV file?

I'm a Python newbie, so if there is a better way to go about this, please let me know.

NetCDF file info:

Dimensions:
  time = 7
  latitude = 292
  longitude = 341

Variables:
  float MIN_SFC (time=7, latitude = 292, longitude = 341)

Here's what I've tried:

from netCDF4 import Dataset, num2date

filename = "C:/filename.nc"
nc = Dataset(filename, 'r', Format='NETCDF4')
print nc.variables
print 'Variable List'

for var in nc.variables:
    print var, var.units, var.shape

# get coordinates variables
lats = nc.variables['latitude'][:]
lons = nc.variables['longitude'][:]
sfc= nc.variables['Min_SFC'][:]
times = nc.variables['time'][:]

# convert date, how to store date only strip away time?
print "Converting Dates"
units = nc.variables['time'].units
dates = num2date (times[:], units=units, calendar='365_day')
#print [dates.strftime('%Y%m%d%H') for date in dates]

header = ['Latitude', 'Longitude']

# append dates to header string
for d in dates:
    print d
    header.append(d)

# write to file
import csv

with open('Output.csv', 'wb') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    outputwriter.writerow(header)
    for lat, lon in zip(lats, lons):
        outputwriter.writerow( [lat, lon] )

# close the output file
csvFile.close()

# close netcdf
nc.close()

UPDATE:

I've updated the code that writes the CSV file. There's now an attribute error, because the lat/lon are doubles:

AttributeError: 'numpy.float32' object has no attribute 'append'

Is there any way to cast to a string in Python? Do you think it'll work?

I've noticed a number of values returned as "--" when I printed values to the console. I'm wondering if this represents the fillValue or missingValue defined as -32767.0.

I'm also wondering whether the variables of the 3D dataset should be accessed by lats = nc.variables['latitude'][:][:] or lats = nc.variables['latitude'][:][:,:]?

# the csv file is closed when you leave the block
with open('output.csv', 'wb') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    for time_index, time in enumerate(times): # pull the dates out for the header
        t = num2date(time, units = units, calendar='365_day')
        header.append(t)
    outputwriter.writerow(header)
    for lat_index, lat in enumerate(lats):
        content = lat
        print lat_index
        for lon_index, lon in enumerate(lons):
            content.append(lon)
            print lon_index
            for time_index, time in enumerate(times): # for a date
                # pull out the data
                data = sfc[time_index,lat_index,lon_index]
                content.append(data)
                outputwriter.writerow(content)

Answer 1

I would load the data into Pandas, which facilitates the analysis and plotting of time series data, as well as writing to CSV.

So here's a real working example which pulls a time series of wave heights at a specified lon/lat location out of a global forecast model dataset.

Note: here we access an OPeNDAP dataset so we can extract just the data we need from a remote server without downloading files. But netCDF4 works exactly the same for a remote OPeNDAP dataset or a local NetCDF file, which is a very useful feature!

import datetime as dt  # needed for dt.datetime and dt.timedelta below
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt

# NetCDF4-Python can read a remote OPeNDAP dataset or a local NetCDF file:
url='http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/WW3/Global/Best'
nc = netCDF4.Dataset(url)
nc.variables.keys()

lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)

# determine what longitude convention is being used [-180,180], [0,360]
print(lon.min(), lon.max())

# specify some location to extract time series
lati = 41.4; loni = -67.8 +360.0  # Georges Bank

# find closest index to specified value
def near(array,value):
    idx=(abs(array-value)).argmin()
    return idx

# Find nearest point to desired location (could also interpolate, but more work)
ix = near(lon, loni)
iy = near(lat, lati)

# Extract desired times.      
# 1. Select -+some days around the current time:
start = dt.datetime.utcnow()- dt.timedelta(days=3)
stop = dt.datetime.utcnow()+ dt.timedelta(days=3)
#       OR
# 2. Specify the exact time period you want:
#start = dt.datetime(2013,6,2,0,0,0)
#stop = dt.datetime(2013,6,3,0,0,0)

istart = netCDF4.date2index(start,time_var,select='nearest')
istop = netCDF4.date2index(stop,time_var,select='nearest')
print(istart, istop)

# Get all time records of variable [vname] at indices [iy,ix]
vname = 'Significant_height_of_wind_waves_surface'
#vname = 'surf_el'
var = nc.variables[vname]
hs = var[istart:istop,iy,ix]
tim = dtime[istart:istop]

# Create Pandas time series object
ts = pd.Series(hs,index=tim,name=vname)

# Use Pandas time series plot method
ts.plot(figsize=(12,4),
   title='Location: Lon=%.2f, Lat=%.2f' % ( lon[ix], lat[iy]),legend=True)
plt.ylabel(var.units)

#write to a CSV file
ts.to_csv('time_series_from_netcdf.csv')

which both creates a plot to verify that you've got the data you wanted:

[image: time series plot of the extracted variable]

and also writes the desired CSV file time_series_from_netcdf.csv to disk.

You can also view, download and/or run this example on Wakari.
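The nearest-point lookup and the CSV step can be tried without a network connection. The following is a minimal sketch on synthetic data, not the forecast dataset above: the 1-degree grid, the variable name `demo_var`, and the in-memory buffer standing in for a file are all assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-ins for the NetCDF coordinate variables (assumed 1-degree
# grid, longitudes in the [0, 360) convention)
lat = np.arange(-90.0, 91.0, 1.0)
lon = np.arange(0.0, 360.0, 1.0)
times = pd.date_range("2013-06-02", periods=4, freq="D")

# Fake (time, lat, lon) variable: every grid point holds its time index
data = np.zeros((times.size, lat.size, lon.size))
data += np.arange(times.size)[:, None, None]


def near(array, value):
    """Return the index of the element of `array` closest to `value`."""
    return int(np.abs(array - value).argmin())


# A longitude given as -67.8 must be shifted to 292.2 to match a [0, 360) grid
lati, loni = 41.4, (-67.8) % 360.0
iy, ix = near(lat, lati), near(lon, loni)

# One value per time step at the nearest grid point
ts = pd.Series(data[:, iy, ix], index=times, name="demo_var")

buf = io.StringIO()  # stands in for a file path such as 'time_series.csv'
ts.to_csv(buf)
csv_text = buf.getvalue()
```

Checking `lon.min()` and `lon.max()` first, as the answer does, tells you whether the shift by 360 is needed at all.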

Answer 2

Rich Signell's answer was incredibly helpful! Just as a note, it's important to also import datetime, and when extracting times, it's necessary to use the following code:

import datetime
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt

...

# 2. Specify the exact time period you want:
start = datetime.datetime(2005,1,1,0,0,0)
stop = datetime.datetime(2010,12,31,0,0,0)

I then looped over all the regions that I needed for my dataset.
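Looping over several regions might look roughly like the sketch below. It runs on synthetic data, and the region names, coordinates, and grid spacing are hypothetical placeholders rather than values from this answer.

```python
import numpy as np
import pandas as pd

# Synthetic grid and (time, lat, lon) values standing in for the NetCDF data
lat = np.arange(30.0, 50.0, 0.5)
lon = np.arange(280.0, 300.0, 0.5)
times = pd.date_range("2005-01-01", periods=3, freq="D")
rng = np.random.default_rng(0)
values = rng.random((times.size, lat.size, lon.size))

# Hypothetical regions: name -> (latitude, longitude in [0, 360))
regions = {
    "georges_bank": (41.4, 292.2),
    "gulf_of_maine": (43.0, 291.0),
}

columns = {}
for name, (lati, loni) in regions.items():
    # Nearest grid point to the requested location
    iy = int(np.abs(lat - lati).argmin())
    ix = int(np.abs(lon - loni).argmin())
    columns[name] = values[:, iy, ix]

# One column per region, one row per time step
df = pd.DataFrame(columns, index=times)
```

Calling `df.to_csv(...)` on the result would then write all regions to a single file, with one column each.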

Answer 3

Not sure what you're still having trouble with; this looks good. I do see:

# convert date, how to store date only strip away time?
print "Converting Dates"
units = nc.variables['time'].units
dates = num2date (times[:], units=units, calendar='365_day')

You now have the dates as Python datetime objects.

#print [dates.strftime('%Y%m%d%H') for date in dates]

This is close to what you need if you want them as strings, but strftime has to be called on each date rather than on dates. If you only want the day, remove the %H:

date_strings = [date.strftime('%Y%m%d') for date in dates]

If you want the year, month, and day as numbers, datetime objects have attributes for that:

dt.year, dt.month, dt.day
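A small illustration of both options, using two hand-written datetime values in place of the output of num2date (the values themselves are made up for the example):

```python
from datetime import datetime

# Stand-ins for the datetime objects returned by num2date
dates = [datetime(2015, 2, 10, 11, 32), datetime(2015, 2, 11, 18, 33)]

# Day-resolution strings, suitable for CSV header cells
date_strings = [d.strftime("%Y%m%d") for d in dates]

# The same with the hour appended
with_hours = [d.strftime("%Y%m%d%H") for d in dates]

# Numeric components, if strings are not wanted
parts = [(d.year, d.month, d.day) for d in dates]
```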

As for your sfc variable: it is a 3-D array, so to get a particular value, you can do:

sfc[time_index, lat_index, lon_index]

Since it is 3-D, there is more than one way to write it to a CSV file, but I'm guessing you might want something like:

for time_index, time in enumerate(times):
    # pull out the data for that time
    data = sfc[time_index, :, :]
    # write the date to the file (maybe)
    # ....
    # Now loop through the "rows"
    for row in data:
        outputwriter.writerow([str(val) for val in row])

Or something like that....
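On the "--" values mentioned in the question: by default netCDF4 returns NumPy masked arrays, and masked entries (those matching the variable's fill or missing value) print as `--`. The sketch below mimics that with a hand-built array. The -32767.0 fill value comes from the question, while the sample numbers and the choice to write masked cells as empty strings are assumptions.

```python
import numpy as np

fill_value = -32767.0

# Raw values as they might be stored on disk, fill values included
raw = np.array([[12.5, fill_value], [fill_value, 3.0]], dtype=np.float32)

# Mask every entry equal to the fill value
sfc = np.ma.masked_values(raw, fill_value)

print(sfc)  # masked entries are displayed as --

# Option 1: write an empty cell for each masked value in a row
row = ["" if v is np.ma.masked else float(v) for v in sfc[0]]

# Option 2: replace masked values with NaN before writing
filled = sfc.filled(np.nan)
```

Either option keeps the literal `--` out of the CSV file. Which one is preferable depends on what will read the file afterwards.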

Answer 4

The attribute error occurs because content needs to be a list, but you initialize it with lat, which is just a number. You need to put that number into a list.

Regarding the 3D variables, lats = nc.variables['latitude'][:] is sufficient to read all the data.
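The effect of the extra slices can be checked on a plain NumPy array, which is what `[:]` gives you once the variable has been read. This sketch assumes an array with the question's dimensions (7, 292, 341), filled with dummy values:

```python
import numpy as np

# Dummy array shaped like MIN_SFC: (time, latitude, longitude)
sfc = np.arange(7 * 292 * 341, dtype=np.float32).reshape(7, 292, 341)

whole = sfc[:]             # the full array
again = sfc[:][:]          # a second [:] changes nothing
first_step = sfc[0, :, :]  # the 2-D field for the first time step
value = sfc[2, 10, 20]     # a single value: time 2, lat 10, lon 20
```

A trailing `[:,:]` only works on arrays with at least two dimensions. On a one-dimensional coordinate array it raises an IndexError.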

Update: Iterate over lon/lat together

Here's your code with the modifications for the list and the iteration:

# the csv file is closed when you leave the block
# Python 3: text mode with newline='' (use 'wb' on Python 2)
with open('output.csv', 'w', newline='') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    for time_index, time in enumerate(times): # pull the dates out for the header
        t = num2date(time, units = units, calendar='365_day')
        header.append(t)
    outputwriter.writerow(header)

    for latlon_index, (lat, lon) in enumerate(zip(lats, lons)):
        content = [lat, lon] # Put lat and lon into list
        print(latlon_index)
        for time_index, time in enumerate(times): # for a date
            # pull out the data; lat and lon are paired, so they share one index
            data = sfc[time_index, latlon_index, latlon_index]
            content.append(data)
        # write one row per location, after all dates have been appended
        outputwriter.writerow(content)

I haven't actually tried to run this, so there may be other problems lurking.
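If the goal is one row for every grid point rather than for paired lat/lon values, nested loops over both coordinates are needed. The following sketch produces the layout requested in the question (Latitude, Longitude, then one column per date) from small synthetic arrays. The array contents and the in-memory buffer are assumptions for illustration.

```python
import csv
import io
from datetime import datetime

import numpy as np

# Synthetic stand-ins for the arrays read from the NetCDF file
lats = np.array([10.0, 20.0])
lons = np.array([100.0, 110.0, 120.0])
dates = [datetime(2015, 2, 10), datetime(2015, 2, 11)]
sfc = np.arange(len(dates) * lats.size * lons.size, dtype=float)
sfc = sfc.reshape(len(dates), lats.size, lons.size)  # (time, lat, lon)

header = ["Latitude", "Longitude"] + [d.strftime("%Y%m%d") for d in dates]

# For a real file, use: open("output.csv", "w", newline="")
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)

for lat_index, lat in enumerate(lats):
    for lon_index, lon in enumerate(lons):
        # All time steps at this grid point become the date columns
        series = sfc[:, lat_index, lon_index].tolist()
        writer.writerow([float(lat), float(lon)] + series)

rows = out.getvalue().splitlines()
```

With the question's dimensions this would write 292 × 341 data rows of 7 values each.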

4 answers

Answer 1
6 2015-02-10 11:32:02

Answer 2
1 2017-06-05 14:25:38

Answer 3
0 2015-02-10 00:39:49

Answer 4
0 2015-02-11 18:33:35