Converting time series data to gridded 3D array in Python
I have a dataframe with time series data spanning many years, holding values of a variable at different lat/lon locations every day. On a given day, the variable is recorded at several different locations. Following is a snippet of the dataframe, which I am reading with Python pandas:
                 lat      lon   variable
Date
2017-12-31  12.93025  59.9239  10.459373
2019-12-31  12.53044  43.9229  12.730064
2019-02-28  12.37841  33.9245  37.487683
I want to:
I have looked into geopandas, xarray and histogram2d for gridding the data, and I have successfully gridded it with the histogram2d function. However, I could only get a 2D array, which lacks the time information and makes my analysis difficult. I know that ideally I should add the time dimension to my 2D array, but I am struggling with how exactly to do so, given that not all locations record data at all times.
This is how I used the histogram2d function to create 1-degree grid cells:
import numpy as np

# Grid the data with histogram2d
# df_in is the full dataframe with a DatetimeIndex; take one year at a time
df = df_in.loc['2019']
lat = df['lat'].to_numpy()
lon = df['lon'].to_numpy()
z = df['variable'].to_numpy()
# Create 1-degree bins
binlon = np.linspace(-180, 180, 361)
binlat = np.linspace(-90, 90, 181)
zz, xx, yy = np.histogram2d(lon, lat, bins=(binlon, binlat), weights=z)
counts, _, _ = np.histogram2d(lon, lat, bins=(binlon, binlat))
# Workaround for zero-count cells, to avoid dividing by zero.
# Where counts == 0, zi = 0, else zi = zz/counts
zi = np.zeros_like(zz)
has_data = counts > 0
zi[has_data] = zz[has_data] / counts[has_data]
# Mask the empty cells
zi = np.ma.masked_where(~has_data, zi)
# Final, gridded data:
hist = zi.T  # shape (180, 360)
Any help in this regard will be much appreciated.
I ended up making sample data and worked through both the 2D and the 3D case. I'll start with the 2D case, which you already have working, because the extension to the 3D case is then very simple.
First, let's create some random sample data. Note that I import everything I need for later here:
import numpy as np
import matplotlib.pyplot as plt
import cartopy.feature  # also binds the name `cartopy`
from cartopy.crs import PlateCarree
from matplotlib.colors import Normalize


def create2Ddata():
    '''Makes some random data'''
    N = 2000
    lat = 10 * np.random.rand(N) + 40
    lon = 25 * np.random.rand(N) - 80
    z = np.sin(4*np.pi*lat/180.0*np.pi) + np.cos(8*np.pi*lon/180.0*np.pi)
    return lat, lon, z


# Create data
lat, lon, z = create2Ddata()
This will serve as random, scattered, geospatial data that we want to grid using the histogram functions. The next step is to create bins that make sense and then do the actual binning.
def make2dhist(lon, lat, z, latbins, lonbins):
    '''Takes the inputs and creates a 2D histogram of the cell means'''
    zz, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins), weights=z)
    counts, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins))
    # Workaround for zero-count cells, to avoid dividing by zero.
    # Where counts == 0, zi = 0, else zi = zz/counts
    zi = np.zeros_like(zz)
    has_data = counts > 0
    zi[has_data] = zz[has_data] / counts[has_data]
    # Mask empty cells (masking by counts keeps genuine zero means visible)
    zi = np.ma.masked_where(~has_data, zi)
    return lonbins, latbins, zi


# Make bins
latbins = np.linspace(np.min(lat), np.max(lat), 75)
lonbins = np.linspace(np.min(lon), np.max(lon), 75)
# Bin the data
_, _, zi = make2dhist(lon, lat, z, latbins, lonbins)
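The workaround inside make2dhist computes a per-cell mean: the weighted histogram gives the sum of z in each cell, the unweighted one gives the number of points, and their ratio is the average. Below is a tiny, hand-checkable sketch; the coordinates and values are made up for illustration. It masks empty cells using the counts rather than the value 0 (for example with np.ma.masked_equal), because masking on the value would also hide a cell whose true mean is 0.

```python
import numpy as np

# Made-up points: the first two share a 1-degree cell, the third is elsewhere
lon = np.array([10.2, 10.7, 12.5])
lat = np.array([45.1, 45.9, 47.3])
z = np.array([2.0, 4.0, 0.0])   # note the genuine zero value

lonbins = np.arange(10, 14)     # edges 10, 11, 12, 13 -> 3 cells
latbins = np.arange(45, 49)     # edges 45, 46, 47, 48 -> 3 cells

sums, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins), weights=z)
counts, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins))

# Mean per cell; divide only where there is data
mean = np.zeros_like(sums)
np.divide(sums, counts, out=mean, where=counts > 0)

# Mask by counts, not by value, so the real 0.0 in cell (2, 2) survives
masked = np.ma.masked_where(counts == 0, mean)
print(masked[0, 0], masked[2, 2])   # 3.0 0.0
```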
Then, we plot both the scattered data and the binned data as follows:
def plotmap():
    '''Background map plotting'''
    ax = plt.gca()
    ax.add_feature(cartopy.feature.LAND, zorder=0, edgecolor='None',
                   linewidth=0.5, facecolor=(0.8, 0.8, 0.8))
    ax.spines['geo'].set_linewidth(0.75)


fig = plt.figure()
# Just plot the scattered data
ax = plt.subplot(211, projection=PlateCarree())
plotmap()
plt.scatter(lon, lat, s=7, c=z, cmap='rainbow')
# Plot the binned 2D data
ax = plt.subplot(212, projection=PlateCarree())
plotmap()
plt.pcolormesh(
    lonbins, latbins, zi.T, shading='auto', transform=PlateCarree(),
    cmap='rainbow')
plt.show()
[Figure 2D, not embedded: at the top, the scattered data; at the bottom, the binned data.]
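A note on orientation: because lon is passed first to histogram2d, the result is indexed [lon, lat], while pcolormesh(X, Y, C) expects C with one row per Y (lat) cell, hence the .T in the plotting code. The sketch below uses different numbers of edges per axis (an arbitrary choice here) so the two axes cannot be confused. It also shows that edges spanning exactly the data's min and max keep every point, since the last bin of a NumPy histogram includes its right edge.

```python
import numpy as np

rng = np.random.default_rng(0)
lat = 10 * rng.random(2000) + 40
lon = 25 * rng.random(2000) - 80

# Different numbers of edges per axis so the axes are easy to tell apart
latbins = np.linspace(lat.min(), lat.max(), 75)   # 74 lat cells
lonbins = np.linspace(lon.min(), lon.max(), 50)   # 49 lon cells

counts, _, _ = np.histogram2d(lon, lat, bins=(lonbins, latbins))
print(counts.shape)     # (49, 74): first axis follows lon, the first argument
print(counts.T.shape)   # (74, 49): one row per lat cell, as pcolormesh expects
print(counts.sum())     # 2000.0: the last bin includes the maximum, so no point is lost
```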
Let's continue with the 3D case. Again, let's create some random scattered data, this time varying in time:
def create3Ddata():
    ''' Make random 3D data '''
    N = 8000
    lat = 10 * np.random.rand(N) + 40
    lon = 25 * np.random.rand(N) - 80
    t = 10 * np.random.rand(N)
    # Linearly changes sign of the cos+sin wavefield
    z = (t/5 - 1) * (np.sin(2*2*np.pi*lat/180.0*np.pi)
                     + np.cos(4*2*np.pi*lon/180.0*np.pi))
    return lat, lon, t, z


# Create data
lat, lon, t, z = create3Ddata()
Now, instead of histogram2d, we will use histogramdd, which is the N-dimensional version of the same function. It takes all coordinates as a single (N, D) array, with one column per dimension.
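To make that input layout concrete, here is a small sketch with three made-up points: histogramdd takes one row per sample and one column per dimension, and returns the counts plus a list of edge arrays, one per dimension.

```python
import numpy as np

lon = np.array([-79.5, -60.2, -60.1])
lat = np.array([40.5, 49.0, 49.5])
t = np.array([0.5, 9.0, 9.5])

# One row per sample, one column per dimension: shape (N, 3)
sample = np.column_stack((lon, lat, t))   # same as np.vstack((lon, lat, t)).T

lonbins = np.linspace(-80, -55, 6)   # 5 cells
latbins = np.linspace(40, 50, 3)     # 2 cells
tbins = np.linspace(0, 10, 5)        # 4 cells

hist, edges = np.histogramdd(sample, bins=(lonbins, latbins, tbins))
print(sample.shape, hist.shape, len(edges))   # (3, 3) (5, 2, 4) 3
```

The last two points land in the same (lon, lat, t) cell, so hist[3, 1, 3] is 2.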
def make3dhist(lon, lat, t, z, latbins, lonbins, tbins):
    '''Takes the inputs and creates a 3D histogram, just like the 2D
    histogram function'''
    sample = np.vstack((lon, lat, t)).T  # shape (N, 3)
    zz, _ = np.histogramdd(sample, bins=(lonbins, latbins, tbins), weights=z)
    counts, _ = np.histogramdd(sample, bins=(lonbins, latbins, tbins))
    # Workaround for zero-count cells, to avoid dividing by zero.
    # Where counts == 0, zi = 0, else zi = zz/counts
    zi = np.zeros_like(zz)
    has_data = counts > 0
    zi[has_data] = zz[has_data] / counts[has_data]
    zi = np.ma.masked_where(~has_data, zi)
    return lonbins, latbins, tbins, zi


# Create bins
latbins = np.linspace(np.min(lat), np.max(lat), 75)
lonbins = np.linspace(np.min(lon), np.max(lon), 75)
tbins = np.linspace(np.min(t), np.max(t), 5)
# Bin the data
_, _, _, zi = make3dhist(lon, lat, t, z, latbins, lonbins, tbins)
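The question's data has dates rather than a float t. A sketch, assuming one time slice per calendar day is wanted: convert the dates to days since the first date and put the bin edges at whole days. The records below are made up; the 1-degree edges are the ones from the question. Days without any record still get a slice, with zero counts everywhere (masked after dividing as in make3dhist).

```python
import numpy as np

# Made-up records: two on 2019-01-01, none on 2019-01-02, one on 2019-01-03
dates = np.array(["2019-01-01", "2019-01-01", "2019-01-03"], dtype="datetime64[D]")
lat = np.array([12.4, 12.4, 12.5])
lon = np.array([33.9, 33.9, 43.9])

# Days since the first date, as floats for histogramdd
t = (dates - dates.min()).astype(float)       # values 0.0, 0.0, 2.0

# One time bin per calendar day: edges 0, 1, 2, 3
n_days = int(t.max()) + 1
tbins = np.arange(n_days + 1, dtype=float)

# 1-degree global edges, as in the question
latbins = np.linspace(-90, 90, 181)
lonbins = np.linspace(-180, 180, 361)

counts, _ = np.histogramdd(np.column_stack((lon, lat, t)),
                           bins=(lonbins, latbins, tbins))

# Calendar date of each time slice of the 3D array
slice_dates = dates.min() + np.arange(n_days)
print(counts.shape)          # (360, 180, 3)
print(counts[..., 1].sum())  # 0.0: no records on 2019-01-02
```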
Finally, we plot the scattered data and the binned data side by side for each time bin. Note the normalization, which applies the same color scale to every panel so that variations in time are easy to see. There are three loops (I could have put them in a single one, but this is more readable).
# Normalize the colors so that variations in time are easily seen
norm = Normalize(vmin=-1.0, vmax=1.0)
# Time-bin index of each point, following histogramdd's convention:
# bins are [a, b), except the last one, which also includes its right edge
tidx = np.clip(np.searchsorted(tbins, t, side='right') - 1, 0, len(tbins) - 2)

fig = plt.figure(figsize=(12, 10))
# Left column: the scattered data in time bins
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 1, projection=PlateCarree())
    plotmap()
    # Find points in this time bin
    pos = tidx == i
    # Plot scatter points
    plt.title(f'{tbins[i]:0.2f} <= t < {tbins[i+1]:0.2f}')
    plt.scatter(lon[pos], lat[pos], c=z[pos], s=7, cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)
# Center column: 2D histograms of the "by hand" time-binned data
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 2, projection=PlateCarree())
    plotmap()
    plt.title(f'{tbins[i]:0.2f} <= t < {tbins[i+1]:0.2f}')
    # Find data points in this time bin
    pos = tidx == i
    # Bin the data for each time bin separately
    _, _, zt = make2dhist(lon[pos], lat[pos], z[pos], latbins, lonbins)
    plt.pcolormesh(
        lonbins, latbins, zt.T, shading='auto', transform=PlateCarree(),
        cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)
# Right column: slices of the 3D histogram
for i in range(4):
    ax = plt.subplot(4, 3, 3*i + 3, projection=PlateCarree())
    plotmap()
    plt.title(f'{tbins[i]:0.2f} <= t < {tbins[i+1]:0.2f}')
    plt.pcolormesh(
        lonbins, latbins, zi[:, :, i].T, shading='auto',
        transform=PlateCarree(), cmap='rainbow', norm=norm)
    plt.colorbar(orientation='horizontal', pad=0.0)
plt.show()
[Figure 3D, not embedded.] In the left column, the scattered, random, geospatial data, where the titles indicate the time bins. In the center column, the 2D histograms of the "by hand" time-binned data. In the right column, the slices of the 3D histogram. As expected, the center and right columns show exactly the same thing.
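The claim that the center and right columns agree can also be checked numerically instead of visually. The sketch below regenerates similar random data (using np.random.default_rng and made-up standard-normal values for z), bins it once in 3D and once per time slice in 2D, and compares the cell means. It only passes if the per-slice selection reproduces histogramdd's convention for the time axis: bins are half-open, except the last one, which also includes its right edge.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8000
lat = 10 * rng.random(N) + 40
lon = 25 * rng.random(N) - 80
t = 10 * rng.random(N)
z = rng.standard_normal(N)   # made-up values

latbins = np.linspace(lat.min(), lat.max(), 75)
lonbins = np.linspace(lon.min(), lon.max(), 75)
tbins = np.linspace(t.min(), t.max(), 5)

# 3D: mean per (lon, lat, t) cell, NaN where empty
sample = np.column_stack((lon, lat, t))
sums3, _ = np.histogramdd(sample, bins=(lonbins, latbins, tbins), weights=z)
counts3, _ = np.histogramdd(sample, bins=(lonbins, latbins, tbins))
with np.errstate(invalid="ignore", divide="ignore"):
    mean3 = sums3 / counts3

# Time-bin index per point, matching histogramdd:
# [a, b) bins, with the right edge of the last bin included
t_idx = np.clip(np.searchsorted(tbins, t, side="right") - 1, 0, len(tbins) - 2)

for i in range(len(tbins) - 1):
    sel = t_idx == i
    sums2, _, _ = np.histogram2d(lon[sel], lat[sel], bins=(lonbins, latbins),
                                 weights=z[sel])
    counts2, _, _ = np.histogram2d(lon[sel], lat[sel], bins=(lonbins, latbins))
    with np.errstate(invalid="ignore", divide="ignore"):
        mean2 = sums2 / counts2
    # equal_nan treats empty cells (0/0 -> NaN) as matching
    assert np.allclose(mean2, mean3[:, :, i], equal_nan=True)
print("slices match")
```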
Hope this solves your problem.
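A different technique, not used in the answer above, builds the (time, lat, lon) array directly from dates without histogram functions: accumulate sums and counts per cell with np.add.at and divide where there is data. This is a sketch under assumptions: daily dates, a global grid of `res` degrees, and NaN for cells without data on a given day; `grid_daily` is a hypothetical helper name, and only days that appear in the data get a slice. The sample records reuse values from the question's snippet plus one made-up point.

```python
import numpy as np


def grid_daily(dates, lat, lon, values, res=1.0):
    """Hypothetical helper: average scattered daily records into a
    (n_days, n_lat, n_lon) array on a global grid of `res` degrees.
    Cells with no observation on a given day are NaN."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    days = np.unique(dates)                      # sorted unique days = time axis
    t_idx = np.searchsorted(days, dates)         # time index of each record
    n_lat = int(round(180 / res))
    n_lon = int(round(360 / res))
    # Cell indices; clip so lat=90 or lon=180 fall in the last cell
    i_lat = np.clip(((np.asarray(lat) + 90) // res).astype(int), 0, n_lat - 1)
    i_lon = np.clip(((np.asarray(lon) + 180) // res).astype(int), 0, n_lon - 1)

    sums = np.zeros((days.size, n_lat, n_lon))
    counts = np.zeros((days.size, n_lat, n_lon))
    # np.add.at accumulates correctly when several records hit the same cell
    np.add.at(sums, (t_idx, i_lat, i_lon), np.asarray(values, dtype=float))
    np.add.at(counts, (t_idx, i_lat, i_lon), 1)

    grid = np.full(sums.shape, np.nan)
    filled = counts > 0
    grid[filled] = sums[filled] / counts[filled]
    return days, grid


days, grid = grid_daily(
    ["2019-02-28", "2019-02-28", "2019-12-31"],
    [12.37841, 12.9, 12.53044],
    [33.9245, 33.1, 43.9229],
    [37.487683, 2.5, 12.730064],
)
print(grid.shape)   # (2, 180, 360)
```

Using NaN instead of a masked array keeps the result easy to hand to other tools; the first two records share a cell and day, so that cell holds their mean.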
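Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.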