
Convert time series data from CSV to netCDF in Python

The main problem in this process is the following line:

precip[:] = orig

It produces this error:

ValueError: cannot reshape array of size 5732784 into shape (39811,144,144)

I have two CSV files. One contains all the actual data for a variable (precipitation), with one column per station; the stations' coordinates are in the second, separate CSV file. My sample data is available on Google Drive.

In case you want to look at the data itself: my first CSV file has shape (39811, 144) and the second has shape (171, 10), but note that I only use a slice of the second one, with shape (144, 2).

This is the code:

stations = pd.read_csv(stn_precip)
stncoords = stations.iloc[:,[0,1]][:144]
orig = pd.read_csv(orig_precip, skiprows = 1, names = stations['Code'][:144])

lons = stncoords['X']
lats = stncoords['Y']

ncout = netCDF4.Dataset('Precip_1910-2018_homomod.nc', 'w')

ncout.createDimension('longitude',lons.shape[0])
ncout.createDimension('latitude',lats.shape[0])
ncout.createDimension('precip',orig.shape[1])
ncout.createDimension('time',orig.shape[0])

lons_out = lons.tolist()
lats_out = lats.tolist()
time_out = orig.index.tolist()

lats = ncout.createVariable('latitude',np.dtype('float32').char,('latitude',))
lons = ncout.createVariable('longitude',np.dtype('float32').char,('longitude',))
time = ncout.createVariable('time',np.dtype('float32').char,('time',))
precip = ncout.createVariable('precip',np.dtype('float32').char,('time', 'longitude','latitude'))

lats[:] = lats_out
lons[:] = lons_out
time[:] = time_out
precip[:] = orig
ncout.close()

I'm mostly basing my code on this post: convert-csv-to-netcdf. However, it does not include the variable 'TIME' as a third dimension, so that's where I'm failing. I thought I should expect the precipitation variable to have the shape (39811, 144, 144), but the error suggests otherwise.

I'm not exactly sure how to deal with this; any input is appreciated.

As you have data from different stations, I would suggest using a single station dimension in your netCDF file rather than separate lon and lat dimensions. You can still save the longitude and latitude of each station as separate variables along that dimension.

Here is one possible solution, using your code as an example:
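To see why the original assignment fails, note that the CSV holds one value per (time, station) pair, i.e. 39811 × 144 = 5732784 values, while a ('time', 'longitude', 'latitude') variable expects 39811 × 144 × 144 values. The following is a small sketch with toy sizes (5 time steps and 3 stations, chosen only for illustration) that reproduces the same kind of error with plain NumPy:

```python
import numpy as np

# Toy sizes standing in for the real 39811 time steps and 144 stations
ntime, nstations = 5, 3
data = np.zeros((ntime, nstations), dtype="float32")

# A (time, longitude, latitude) variable needs ntime * nstations * nstations values,
# but the table only has ntime * nstations of them, so the reshape fails
try:
    data.reshape(ntime, nstations, nstations)
    reshape_ok = True
except ValueError as err:
    reshape_ok = False
    print(err)  # same kind of message as in the question

# A (time, station) layout matches the table exactly
matches_2d = data.shape == (ntime, nstations)

# Size reported in the question's error message
real_size = 39811 * 144
```

Station data is a list of points rather than a regular grid, which is why a (time, station) layout fits it without any padding.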

#!/usr/bin/env ipython
import pandas as pd
import numpy as np
import netCDF4

stn_precip = 'Precip_1910-2018_stations.csv'
orig_precip = 'Precip_1910-2018_origvals.csv'

# Station metadata: the first two columns hold the coordinates; only the first 144 rows are used
stations = pd.read_csv(stn_precip)
stncoords = stations.iloc[:, [0, 1]][:144]
# Data table: one row per time step, one column per station
orig = pd.read_csv(orig_precip, skiprows=1, names=stations['Code'][:144])

lons = stncoords['X']
lats = stncoords['Y']
nstations = np.size(lons)

ncout = netCDF4.Dataset('Precip_1910-2018_homomod.nc', 'w')

# A single 'station' dimension replaces the separate longitude/latitude dimensions
ncout.createDimension('station', nstations)
ncout.createDimension('time', orig.shape[0])

lons_out = lons.tolist()
lats_out = lats.tolist()
time_out = orig.index.tolist()  # row index 0..N-1 is used as the time coordinate

# Coordinates are 1-D variables along 'station'; precipitation is 2-D (time, station)
lats = ncout.createVariable('latitude', np.dtype('float32').char, ('station',))
lons = ncout.createVariable('longitude', np.dtype('float32').char, ('station',))
time = ncout.createVariable('time', np.dtype('float32').char, ('time',))
precip = ncout.createVariable('precip', np.dtype('float32').char, ('time', 'station'))

lats[:] = lats_out
lons[:] = lons_out
time[:] = time_out
# Pass the underlying NumPy array; its shape (time, station) matches the variable
precip[:, :] = orig.values
ncout.close()

The header of the output file (ncdump -h Precip_1910-2018_homomod.nc) then looks like this: [screenshot of the ncdump output]
