
Converting time series data to a gridded 3D array in Python

I have a dataframe with time series data spanning many years. It contains the values of a variable at different lat/lon locations every day, and the set of locations that report varies from day to day. The following is a snippet of the dataframe, which I am reading with pandas:

               lat      lon         variable  
Date                                                            
2017-12-31  12.93025  59.9239     10.459373     
2019-12-31  12.53044  43.9229     12.730064     
2019-02-28  12.37841  33.9245     37.487683  

I want to:

  1. Grid it to 2x2.5 degrees resolution.
  2. Make a 3D array that includes the gridded data as well as its time variation. I want to get a gridded dataset as an array with the shape (time, lat, lon). This is because the gridded dataframe has to be compared with global meteorology data at a resolution of 2x2.5 degrees. (Also, my dataset does not record data at all locations on all days, so I will have to take care of the missing data while creating the final array.)

I have looked into geopandas, xarray and histogram2d for gridding the data, and I have successfully gridded the data using the histogram2d function. However, I could only obtain a 2D array, which lacks time information and makes my analysis a challenge. I know that ideally I should add the time dimension to my 2D array, but I am struggling with how exactly to do so, given that not all locations record data at all times.

This is how I used the histogram2d function to create 1-degree grid cells:


import numpy as np

# Grid the data with histogram2d
df = df_in.loc['2019']  # taking one year at a time (partial string indexing on the DatetimeIndex)
# Test data, globally distributed
lat_r = df['lat']
lon_r = df['lon']
z_r = df['variable']
lat = np.array(lat_r)
lon = np.array(lon_r)
z = np.array(z_r)
    
# Create binning
binlon = np.linspace(-180,180, 361)
binlat = np.linspace(-90, 90, 181)
# The removed `normed` keyword is dropped; its old value (False) was the default behaviour
zz, xx, yy = np.histogram2d(lon, lat, bins=(binlon, binlat), weights=z)
counts, _, _ = np.histogram2d(lon, lat, bins=(binlon, binlat))

# Workaround for zero-count cells to avoid dividing by zero.
# Where counts == 0, zi = 0, else zi = zz/counts
zi = np.zeros_like(zz)
zi[counts.astype(bool)] = zz[counts.astype(bool)]/counts[counts.astype(bool)]
zi = np.ma.masked_equal(zi, 0)

#Final, gridded data:
hist = zi.T # shape(180,360)


Any help in this regard will be much appreciated.

I ended up making sample data and worked through both the 2D and the 3D case. I'll start with the 2D case that you already have working, because the extension to the 3D case is then very simple.

2D

First, let's create some random sample data. Note that I import everything I need for later here.

import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.feature  # explicit import so cartopy.feature.LAND is always available
from cartopy.crs import PlateCarree
from matplotlib.colors import Normalize

def create2Ddata():
    '''Makes some random data'''

    N = 2000
    lat = 10 * np.random.rand(N) + 40
    lon = 25 * np.random.rand(N) - 80
    z = np.sin(4*np.pi*lat/180.0*np.pi) + np.cos(8*np.pi*lon/180.0*np.pi)

    return lat, lon, z

# Create Data
lat, lon, z = create2Ddata()

This will serve as random, scattered geospatial data that we want to grid using the histogram function. The next step is to create bins that make sense and then do the actual binning.


def make2dhist(lon, lat, z, latbins, lonbins):
    '''Takes the inputs and creates 2D histogram'''
    # The removed `normed` keyword is dropped; its old value (False) was the default
    zz, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins), weights=z)
    counts, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins))

    # Mean per cell: zi = zz/counts wherever counts > 0.
    # Empty cells stay at 0 and are masked afterwards; masking on counts
    # (rather than on zi == 0) keeps cells whose mean is genuinely zero.
    zi = np.zeros_like(zz)
    filled = counts > 0
    zi[filled] = zz[filled] / counts[filled]
    zi = np.ma.masked_where(counts == 0, zi)

    return lonbins, latbins, zi

# Make bins
latbins = np.linspace(np.min(lat), np.max(lat), 75)
lonbins = np.linspace(np.min(lon), np.max(lon), 75)

# Bin the data
_, _, zi = make2dhist(lon, lat, z, latbins, lonbins)
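
The bins above simply span the range of the sample data. For a fixed global grid such as the 2x2.5 degrees mentioned in the question, the edges can be written down directly. The following is a minimal sketch, assuming that 2 degrees refers to latitude and 2.5 degrees to longitude and that the cell edges start at -90 and -180; the reference dataset may use an offset grid, so check its coordinates first. The three points are the rows shown in the question.

```python
import numpy as np

# Assumed convention: 2 degrees in latitude, 2.5 degrees in longitude,
# with cell edges starting at -90 and -180.
lat_edges = np.linspace(-90.0, 90.0, 91)      # 91 edges -> 90 cells
lon_edges = np.linspace(-180.0, 180.0, 145)   # 145 edges -> 144 cells

# The three sample rows from the question
lat = np.array([12.93025, 12.53044, 12.37841])
lon = np.array([59.9239, 43.9229, 33.9245])
z = np.array([10.459373, 12.730064, 37.487683])

# Passing lat first gives an array ordered (lat, lon)
sums, _, _ = np.histogram2d(lat, lon, bins=(lat_edges, lon_edges), weights=z)
counts, _, _ = np.histogram2d(lat, lon, bins=(lat_edges, lon_edges))

# Mean per cell, NaN where no observation fell into the cell
grid = np.full(sums.shape, np.nan)
filled = counts > 0
grid[filled] = sums[filled] / counts[filled]

print(grid.shape)  # (90, 144)
```

Using NaN instead of a masked array is only a design choice here; NaN tends to be easier to carry into xarray or pandas later on.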

Then, we plot both the scattered data and the binned data as follows.


def plotmap():
    '''background map plotting'''

    ax = plt.gca()
    ax.add_feature(cartopy.feature.LAND, zorder=0, edgecolor='None',
                   linewidth=0.5, facecolor=(0.8, 0.8, 0.8))
    ax.spines['geo'].set_linewidth(0.75)


fig = plt.figure()

# Just plot the scattered data
ax = plt.subplot(211, projection=PlateCarree())
plotmap()
plt.scatter(lon, lat, s=7, c=z, cmap='rainbow')

# Plot the binned 2D data
ax = plt.subplot(212, projection=PlateCarree())
plotmap()
plt.pcolormesh(
    lonbins, latbins, zi.T, shading='auto', transform=PlateCarree(),
    cmap='rainbow')
plt.show()

Figure 2D, not allowed to embed figures yet...

Top: the scattered data. Bottom: the binned data.


3D

Let's continue with the 3D case. Again, let's create some random scattered data, this time varying in time as well:


def create3Ddata():
    ''' Make random 3D data '''
    N = 8000
    lat = 10 * np.random.rand(N) + 40
    lon = 25 * np.random.rand(N) - 80
    t = 10 * np.random.rand(N)

    # Linearly changes sign of the cos+sin wavefield
    z = (t/5 - 1) * (np.sin(2*2*np.pi*lat/180.0*np.pi)
                     + np.cos(4*2*np.pi*lon/180.0*np.pi))

    return lat, lon, t, z


# Create Data
lat, lon, t, z = create3Ddata()

Now, instead of using histogram2d, we will use histogramdd, which is just the N-dimensional version of the same function.
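
As a quick illustration of what histogramdd returns, here is a tiny sketch with four made-up samples and two bins per dimension. It takes an (N, D) array of coordinates and one array of edges per dimension; calling it once with weights and once without gives the per-cell sum and the per-cell count, and their ratio is the per-cell mean.

```python
import numpy as np

# Four hypothetical samples: (lon, lat, t) coordinates and one value each
points = np.array([
    [0.5, 0.5, 0.5],
    [0.6, 0.4, 0.2],   # falls into the same cell as the first sample
    [1.5, 0.5, 0.5],
    [0.5, 1.5, 1.5],
])
values = np.array([1.0, 3.0, 5.0, 7.0])

edges = [np.array([0.0, 1.0, 2.0])] * 3   # two bins per dimension

sums, _ = np.histogramdd(points, bins=edges, weights=values)
counts, _ = np.histogramdd(points, bins=edges)

print(sums.shape)        # (2, 2, 2)
print(sums[0, 0, 0])     # 4.0 (1.0 + 3.0)
print(counts[0, 0, 0])   # 2.0
```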


def make3dhist(lon, lat, t, z, latbins, lonbins, tbins):
    '''Takes the inputs and creates 3D histogram just as the 2D histogram
    function'''
    sample = np.vstack((lon, lat, t)).T  # shape (N, 3)
    bins = (lonbins, latbins, tbins)
    # The removed `normed` keyword is dropped; its old value (False) was the default
    zz, _ = np.histogramdd(sample, bins=bins, weights=z)
    counts, _ = np.histogramdd(sample, bins=bins)

    # Mean per cell: zi = zz/counts wherever counts > 0.
    # Empty cells stay at 0 and are masked afterwards; masking on counts
    # (rather than on zi == 0) keeps cells whose mean is genuinely zero.
    zi = np.zeros_like(zz)
    filled = counts > 0
    zi[filled] = zz[filled] / counts[filled]
    zi = np.ma.masked_where(counts == 0, zi)
    return lonbins, latbins, tbins, zi

# Create bins
latbins = np.linspace(np.min(lat), np.max(lat), 75)
lonbins = np.linspace(np.min(lon), np.max(lon), 75)
tbins = np.linspace(np.min(t), np.max(t), 5)

# Bin the data
_, _, _, zi = make3dhist(lon, lat, t, z, latbins, lonbins, tbins)
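
The array zi returned here is ordered (lon, lat, t), so np.transpose(zi, (2, 1, 0)) would give the (time, lat, lon) layout asked for in the question. For a DataFrame indexed by date, the same idea could look like the sketch below. It rests on several assumptions that are not in the original post: grid_dataframe is a hypothetical helper, the column names are taken from the question's snippet, one time bin per calendar day is wanted, the grid edges start at -90 and -180, and the dates in the example are made up.

```python
import numpy as np
import pandas as pd


def grid_dataframe(df, lat_edges, lon_edges):
    """Hypothetical helper: bin df into a (time, lat, lon) array of cell means.

    Assumes a DatetimeIndex and columns 'lat', 'lon' and 'variable'.
    """
    day_of_row = df.index.normalize()
    # Continuous daily axis, so days without any data become all-NaN slices
    days = pd.date_range(day_of_row.min(), day_of_row.max(), freq='D')
    day_idx = days.get_indexer(day_of_row)
    # One time bin per day, centred on the integer day position
    t_edges = np.arange(len(days) + 1) - 0.5

    sample = np.column_stack(
        [day_idx, df['lat'].to_numpy(), df['lon'].to_numpy()])
    bins = (t_edges, lat_edges, lon_edges)
    sums, _ = np.histogramdd(sample, bins=bins,
                             weights=df['variable'].to_numpy())
    counts, _ = np.histogramdd(sample, bins=bins)

    cube = np.full(sums.shape, np.nan)   # NaN marks cells with no data
    filled = counts > 0
    cube[filled] = sums[filled] / counts[filled]
    return days, cube


# Values from the question's snippet, with made-up dates for illustration
df = pd.DataFrame(
    {'lat': [12.93025, 12.53044, 12.37841],
     'lon': [59.9239, 43.9229, 33.9245],
     'variable': [10.459373, 12.730064, 37.487683]},
    index=pd.to_datetime(['2019-02-28', '2019-02-28', '2019-03-02']),
)
lat_edges = np.linspace(-90.0, 90.0, 91)      # 2 degree cells (assumed)
lon_edges = np.linspace(-180.0, 180.0, 145)   # 2.5 degree cells (assumed)

days, cube = grid_dataframe(df, lat_edges, lon_edges)
print(cube.shape)  # (3, 90, 144)
```

The day in the middle has no observations, so its whole slice is NaN, which is one way of handling the missing data mentioned in the question.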

Finally, we plot the scattered data and the binned data side by side for each time bin. The color normalization is fixed so that variations in time are easy to see. Note that there are three loops (I could have merged them into a single one, but this is nicer for readability).

  1. The first loop selects the scattered points that fall into each time bin and plots them, one panel per bin.
  2. The second loop selects the points in each time bin, bins them in space using the 2D histogram function from earlier, and plots one slice per time bin.
  3. The third loop uses the 3D-binned data from above and plots the same slices by indexing into the 3D array.

# Normalize the colors so that variations in time are easily seen
norm = Normalize(vmin=-1.0, vmax=1.0)

fig = plt.figure(figsize=(12, 10))

# The scattered data in time bins
# Left column
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 1, projection=PlateCarree())
    plotmap()

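    # Find points in time bins (half-open like NumPy's bins;
    # only the last bin includes its right edge)
    upper = (t <= tbins[i+1]) if i == 3 else (t < tbins[i+1])
    pos = np.where((tbins[i] <= t) & upper)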

    # Plot scatter points
    plt.title(f'{tbins[i]:0.2f} < t < {tbins[i+1]:0.2f}')
    plt.scatter(lon[pos], lat[pos], c=z[pos], s=7, cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)

# Center column
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 2, projection=PlateCarree())
    plotmap()
    plt.title(f'{tbins[i]:0.2f} < t < {tbins[i+1]:0.2f}')

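    # Find data points in time bins (half-open like NumPy's bins;
    # only the last bin includes its right edge)
    upper = (t <= tbins[i+1]) if i == 3 else (t < tbins[i+1])
    pos = np.where((tbins[i] <= t) & upper)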

    # Bin the data for each time bin separately
    _, _, zt = make2dhist(lon[pos], lat[pos], z[pos], latbins, lonbins)
    plt.pcolormesh(
        lonbins, latbins, zt.T, shading='auto', transform=PlateCarree(),
        cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)

# Right column
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 3, projection=PlateCarree())
    plotmap()
    plt.title(f'{tbins[i]:0.2f} < t < {tbins[i+1]:0.2f}')
    plt.pcolormesh(
        lonbins, latbins, zi[:, :, i].T, shading='auto', transform=PlateCarree(),
        cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)

plt.show()

Figure 3D, not allowed to embed figures yet...

The left column shows the scattered, random geospatial data, with the titles indicating the time bins. The center column shows the 2D histograms of data that was binned in time "by hand". The right column shows the slices of the 3D histogram. As expected, the center and right columns show the same thing.
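
The agreement between the two approaches can also be checked numerically rather than by eye. The following is a small sketch with freshly generated random data (the seed, sample size and bin counts are arbitrary choices for illustration). It compares the weighted sums from one histogramdd call against per-time-bin histogram2d calls, using NumPy's convention that bins are half-open and only the last one includes its right edge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
lon = rng.uniform(-80, -55, n)
lat = rng.uniform(40, 50, n)
t = rng.uniform(0, 10, n)
z = rng.normal(size=n)

lon_edges = np.linspace(-80, -55, 11)
lat_edges = np.linspace(40, 50, 11)
t_edges = np.linspace(0, 10, 5)

# One 3D histogram of weighted sums, ordered (lon, lat, t)
sample = np.column_stack([lon, lat, t])
sums3d, _ = np.histogramdd(
    sample, bins=(lon_edges, lat_edges, t_edges), weights=z)

diffs = []
n_tbins = len(t_edges) - 1
for i in range(n_tbins):
    # Half-open bin [lo, hi); the last bin also includes its right edge
    in_bin = (t >= t_edges[i]) & (t < t_edges[i + 1])
    if i == n_tbins - 1:
        in_bin |= t == t_edges[-1]
    sums2d, _, _ = np.histogram2d(
        lon[in_bin], lat[in_bin],
        bins=(lon_edges, lat_edges), weights=z[in_bin])
    # Largest absolute difference between the 2D result and the 3D slice
    diffs.append(np.abs(sums2d - sums3d[:, :, i]).max())

print(max(diffs))  # expected to be at floating-point rounding level
```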

Hope this solves your problem.
