Convert netCDF files to CSV

Question

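I'm struggling to convert several Berkeley Earth netCDF files to CSV or some other tabular format. I realize similar questions have been asked before, but I couldn't apply any of the solutions I came across.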

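The ncdump tool from the netCDF utilities doesn't seem to produce an actual CSV file, and I couldn't find any instructions on how to make it do so.
I tried loading the data into a pandas DataFrame with Dataset.to_dataframe() from xarray, but my laptop can't allocate the required memory: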

In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: nc = xr.open_dataset('Complete_TAVG_Daily_EqualArea.nc')

In [4]: nc
Out[4]:
<xarray.Dataset>
Dimensions:      (map_points: 5498, time: 50769)
Dimensions without coordinates: map_points, time
Data variables:
    longitude    (map_points) float32 ...
    latitude     (map_points) float32 ...
    date_number  (time) float64 ...
    year         (time) float64 ...
    month        (time) float64 ...
    day          (time) float64 ...
    day_of_year  (time) float64 ...
    land_mask    (map_points) float64 ...

In [5]: df = nc.to_dataframe()
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
(...)

MemoryError: Unable to allocate 532. MiB for an array with shape (279127962,) and data type int16

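I also tried converting with Panoply. Its CSV export seems to write only a single variable (which I'd expect to see as a column) into a single-row file.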

I must be missing something. Can anyone help?

Answer 1

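What you are missing is that netCDF is a much more complex format than CSV. A netCDF file can contain multiple arrays of arbitrary shape and size. A CSV file can only hold a single array of at most two dimensions (or a set of 1D arrays, provided they all have the same length). So you cannot simply convert an arbitrary netCDF file to CSV.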
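The MemoryError in the question follows from this mismatch. As far as I know, Dataset.to_dataframe() indexes its result by the Cartesian product of all the dataset's dimensions, so variables that only run along time or only along map_points are broadcast to one row per (map_points, time) pair. The short calculation below uses the dimension sizes copied from the question's output; the resulting row count is exactly the array shape reported in the error message.

```python
# Dimension sizes copied from the repr of Complete_TAVG_Daily_EqualArea.nc in the question.
map_points = 5498
time_steps = 50769

# One row per (map_points, time) combination, even for 1D variables.
n_rows = map_points * time_steps
print(n_rows)  # 279127962, the shape in the MemoryError

# A single int16 array of that length (the dtype named in the error):
bytes_needed = n_rows * 2
print(f"{bytes_needed / 2**20:.1f} MiB")  # 532.4 MiB

# Every float64 column of that length would need roughly four times as much again.
print(f"{n_rows * 8 / 2**30:.2f} GiB per float64 column")  # 2.08 GiB
```

So the DataFrame fails long before the data itself is large; most of it would be repeated values.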

Let's look at the example file you provided. I'm repeating the information here with my version of Xarray, which seems to be a bit more verbose...

In [16]: ds = xr.open_dataset('Complete_TAVG_EqualArea.nc')

In [17]: ds
Out[17]:
<xarray.Dataset>
Dimensions:      (map_points: 5498, month_number: 12, time: 3240)
Coordinates:
    longitude    (map_points) float32 ...
    latitude     (map_points) float32 ...
  * time         (time) float64 1.75e+03 1.75e+03 1.75e+03 ... 2.02e+03 2.02e+03
Dimensions without coordinates: map_points, month_number
Data variables:
    land_mask    (map_points) float64 ...
    temperature  (time, map_points) float32 ...
    climatology  (month_number, map_points) float32 ...
Attributes:
    Conventions:          Berkeley Earth Internal Convention (based on CF-1.5)
    title:                Native Format Berkeley Earth Surface Temperature An...
    history:              16-Jan-2020 06:51:38
    institution:          Berkeley Earth Surface Temperature Project
    source_file:          Complete_TAVG.50985s.20200116T064041.mat
    source_history:       13-Jan-2020 17:22:52
    source_data_version:  ca6f26341938dae0ea7dd619bce6f15e
    comment:              This file contains Berkeley Earth surface temperatu...

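There are three data variables (land_mask, temperature, climatology), plus three coordinate vectors (longitude, latitude, time). You could perhaps put the coordinate vectors in the first row and first column of the CSV files, but even then, each netCDF file would need at least three separate CSV files.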
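Following that reasoning, here is a minimal sketch that writes every data variable of a Dataset to its own CSV file. To keep it runnable without the real file, it builds a small made-up Dataset that only borrows the variable and dimension names shown above; the helper name write_one_csv_per_variable is hypothetical. DataArray.to_pandas() only handles arrays with up to two dimensions and leaves out non-index coordinates such as longitude and latitude, so those would have to be written separately.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Toy stand-in for the Berkeley Earth file: same names, made-up sizes and values.
ds = xr.Dataset(
    data_vars={
        "land_mask": ("map_points", np.array([1.0, 0.0, 1.0])),
        "temperature": (("time", "map_points"), np.zeros((4, 3), dtype="float32")),
        "climatology": (("month_number", "map_points"), np.ones((12, 3), dtype="float32")),
    },
    coords={
        "longitude": ("map_points", np.array([10.0, 20.0, 30.0], dtype="float32")),
        "latitude": ("map_points", np.array([-5.0, 0.0, 5.0], dtype="float32")),
        "time": ("time", np.array([1750.042, 1750.125, 1750.208, 1750.292])),
    },
)


def write_one_csv_per_variable(ds, out_dir):
    """Hypothetical helper: write each (1D or 2D) data variable to <name>.csv."""
    paths = []
    for name, da in ds.data_vars.items():
        # to_pandas() returns a Series for 1D and a DataFrame for 2D arrays.
        obj = da.to_pandas()
        path = os.path.join(out_dir, f"{name}.csv")
        obj.to_csv(path)
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
paths = write_one_csv_per_variable(ds, out_dir)
print(paths)
```

With the real data this would give land_mask.csv (one value per map point), temperature.csv (one row per time step) and climatology.csv (one row per month).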

For example, for the climatology variable you can write a CSV file as follows:

In [31]: clim = ds['climatology']  

In [32]: clim.to_pandas().to_csv('clim.csv')

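So clim is an xarray.DataArray, which in principle can be written to a CSV file. Unfortunately, the xarray.DataArray class has no to_csv method. The pandas.DataFrame class does, however, so we first convert clim to a pandas.DataFrame with to_pandas(). See the pandas documentation for DataFrame.to_csv for the parameters that let you adjust the resulting output file.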
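For the much larger daily file from the question, converting a whole variable at once may still not fit in memory. One option is to convert it slice by slice and use the mode and header parameters of to_csv to append each slice to the same file. This is a minimal sketch, assuming the daily file contains a 2D variable laid out as (time, map_points) like temperature above (the question's repr doesn't show one, so check the actual variable name); the helper dataarray_to_csv_in_chunks and the toy array are hypothetical.

```python
import os
import tempfile

import numpy as np
import xarray as xr


def dataarray_to_csv_in_chunks(da, path, dim="time", chunk_size=1000, float_format="%.3f"):
    """Hypothetical helper: write a 2D DataArray to one CSV, chunk_size steps along dim at a time."""
    n = da.sizes[dim]
    if dim not in da.coords:
        # Without a coordinate, each chunk's index would restart at 0; use global positions.
        da = da.assign_coords({dim: np.arange(n)})
    for start in range(0, n, chunk_size):
        # isel() on a lazily opened file should read only this slice from disk.
        block = da.isel({dim: slice(start, start + chunk_size)}).to_pandas()
        first = start == 0
        block.to_csv(
            path,
            mode="w" if first else "a",  # overwrite on the first chunk, append afterwards
            header=first,                # write the column header only once
            float_format=float_format,
        )


# Toy stand-in: 10 time steps x 3 map points of made-up values, no time coordinate.
temperature = xr.DataArray(
    np.arange(30, dtype="float32").reshape(10, 3),
    dims=("time", "map_points"),
    name="temperature",
)
path = os.path.join(tempfile.mkdtemp(), "temperature.csv")
dataarray_to_csv_in_chunks(temperature, path, chunk_size=4)
print(open(path).read())
```

With a real file you would pass something like ds["temperature"] from xr.open_dataset(...) and a larger chunk_size; the output is still one row per time step and one column per map point.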

Answer 2

You can use the CDO package to convert a .nc file to .csv.

Example code (you will need to adjust some of the outputtab parameters):

cdo -outputtab,date,lon,lat,value infile.nc | awk 'FNR==1{ row=$2","$3","$4","$5;print row  } FNR!=1{ row=$1","$2","$3","$4; print row}' > outfile.csv
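The awk stage above only reformats text: on the first line it joins fields 2 to 5 (so the header's first token, presumably a leading "#", is dropped), and on every other line it joins fields 1 to 4 with commas. For readers who would rather post-process in Python, here is a minimal stdlib sketch of the same transformation. The function name is hypothetical, and the sample string is made up to match the shape the awk program expects; it is not real CDO output.

```python
import csv
import io


def outputtab_to_csv(text):
    """Hypothetical helper: turn whitespace-separated outputtab text into CSV, like the awk program."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, line in enumerate(text.splitlines()):
        fields = line.split()
        if not fields:
            continue  # skip blank lines (the awk version would print ",,," for them)
        # First line: drop the leading token (assumed '#'), as in awk's FNR==1 branch.
        row = fields[1:5] if i == 0 else fields[0:4]
        writer.writerow(row)
    return out.getvalue()


# Made-up sample, not real CDO output.
sample = """#   date     lon     lat   value
 1850-01-01  0.5  -89.5  -1.23
 1850-01-01  1.5  -89.5  -1.19
"""
print(outputtab_to_csv(sample), end="")
# date,lon,lat,value
# 1850-01-01,0.5,-89.5,-1.23
# 1850-01-01,1.5,-89.5,-1.19
```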


Answer 1: score 3, accepted, 2020-04-17 19:37:41

Answer 2: score 1, 2020-12-06 18:11:13
