How to calculate Pearson's Correlation for two N-dimensional xarrays?

I have two NetCDF files, imported as xarrays (please see the summaries below), containing seasonal precipitation data (lat, lon, season, precip) over Africa, regridded to the same grid. I would like to compare each season by calculating a Pearson's correlation coefficient (a pattern correlation) to be used in a Taylor diagram (one per season). I have tried numpy's corrcoef, but this returns a matrix and I need a single value. I have also tried scipy's pearsonr, but it raises an error (The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()). I am new to Python and NetCDF files, so I would appreciate any guidance.

File 1:

<xarray.DataArray 'pre' (season: 4, lat: 162, lon: 162)>
array([[[       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        [       nan,        nan,        nan, ...,        nan,
                nan,        nan],
        ...,
        [       nan,        nan,        nan, ..., 21.462164 ,
         21.921623 , 20.583786 ],
        [       nan,        nan,        nan, ..., 22.240545 ,
         21.24054  , 21.135136 ],
        [       nan,        nan,        nan, ..., 20.78919  ,
         20.45946  , 18.62973  ]]], dtype=float32)
Coordinates:
  * lon   (lon) float64 -20.25 -19.75 -19.25 -18.75 ... 59.25 59.75 60.25
  * lat   (lat) float64 -40.25 -39.75 -39.25 -38.75 ... 39.25 39.75 40.25
  * season(season) object 'DJF' 'JJA' 'MAM' 'SON'

File 2:

<xarray.DataArray 'tp' (season: 4, lat: 162, lon: 162)>
array([[[        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        [        nan,         nan,         nan, ...,         nan,
                 nan,         nan],
        ...,
        [        nan,         nan,         nan, ..., 21.7096725 ,
         21.09724263, 19.69123712],
        [        nan,         nan,         nan, ..., 21.2375123 ,
         20.71120389, 20.73519617],
        [        nan,         nan,         nan, ..., 20.80748653,
         19.70237051, 18.9941896 ]]])
Coordinates:
  * lon   (lon) float64 -20.25 -19.75 -19.25 -18.75 ... 59.25 59.75 60.25
  * lat   (lat) float64 -40.25 -39.75 -39.25 -38.75 ... 39.25 39.75 40.25
  * season(season) object 'DJF' 'JJA' 'MAM' 'SON'

If you're just looking for a single correlation coefficient across all the values, between one array and the other, try flattening the arrays and then taking one of the off-diagonal values of the result (e.g. [0, 1]):

>>> res = np.corrcoef(arr1.values.flat, arr2.values.flat)
>>> res
array([[1.       , 0.9985713],
       [0.9985713, 1.       ]])

>>> res[0, 1]
0.9985713

corrcoef always returns a matrix: it treats each row of its input as a separate variable and gives the correlation between every pair of rows, so two 1-D vectors produce a 2x2 result. If you're looking for the correlation across all elements of the two arrays, you need to reshape them to vectors first.
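The arrays printed in the question contain NaN, and np.corrcoef returns nan whenever any input value is NaN. A minimal sketch of one way around that, assuming you want to keep only the grid cells that are finite in both arrays and want one coefficient per season. The helper name pattern_correlation and the synthetic arr1/arr2 are made up for illustration; they are not from the question.

```python
import numpy as np

def pattern_correlation(a, b):
    """Pearson correlation over all cells where both inputs are finite."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    valid = np.isfinite(a) & np.isfinite(b)  # drop cells that are NaN in either array
    return np.corrcoef(a[valid], b[valid])[0, 1]

# Synthetic stand-ins for the two (season, lat, lon) arrays (assumption).
rng = np.random.default_rng(0)
arr1 = rng.random((4, 5, 6))
arr2 = arr1 + 0.1 * rng.random((4, 5, 6))
arr1[:, 0, :] = np.nan  # mimic the NaN cells seen in the printed arrays

# One coefficient per season: correlate the matching 2-D slices.
per_season = [pattern_correlation(arr1[i], arr2[i]) for i in range(arr1.shape[0])]
print(per_season)
```

With real data, arr1 and arr2 would be the .values of the two DataArrays, indexed along the season axis in the same way.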

You can do this:

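    import numpy as np
    import xarray as xr

    pearson = xr.corr(file1, file2)  # no dim given: correlate over all dimensions
    pearson = np.array(pearson)      # convert to a NumPy array so you can see the value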
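Since the question asks for one value per season, a possible variation is to pass dim to xr.corr so that the correlation is taken over lat and lon only. This is a rough sketch on synthetic data; the small random arrays below merely stand in for the real file1 and file2 and are an assumption, not the asker's data.

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for the two DataArrays in the question (assumption).
rng = np.random.default_rng(0)
coords = {
    "season": ["DJF", "JJA", "MAM", "SON"],
    "lat": np.arange(5.0),
    "lon": np.arange(6.0),
}
dims = ("season", "lat", "lon")
file1 = xr.DataArray(rng.random((4, 5, 6)), coords=coords, dims=dims, name="pre")
file2 = (file1 + 0.1 * rng.random((4, 5, 6))).rename("tp")
file1 = file1.where(file1.lat > 0)  # introduce some NaN cells, as in the real data

# Correlate over the spatial dimensions only; the season dimension is kept.
per_season = xr.corr(file1, file2, dim=["lat", "lon"])
print(per_season)

# A single season's coefficient as a plain Python float.
djf = per_season.sel(season="DJF").item()
```

The result is a DataArray with one coefficient per season, which should be convenient for building one Taylor diagram per season.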

