
How to calculate and plot the onset date of the rainy season, under specific conditions, from NetCDF data over a certain area?

I have NetCDF daily precipitation data with dimensions time: 153 (I cropped the NC file so that 1 August is the first date), longitude: 401, latitude: 121.

I want to calculate and plot, over a certain area, the first day of the rainy season under this condition: the onset date of the rainy season is defined as the first 5 consecutive rainy days receiving a minimum of 40 mm in total, which is not followed by a dry spell (10 consecutive days receiving 5 mm or less in total) within the 30 days after the onset date. The calculation starts after August 1st.

I have tried to plot it spatially, but I guess it is going to take a very long time for even one year of data, and I have 10 years to process. So I am looking for a more convenient way to do this. Currently I have working code for just one point (I want the dates plotted spatially over a certain area), like below:

import pandas as pd
import xarray as xr
import numpy as np
file='CMA.nc'

data = xr.open_dataset(file)
precip = data['tp']

#Single point 
point = precip.sel(lon=106.11, lat=-6.11, method='nearest')
point.plot()

def wet_onset_date(data):
    array = data.values

    # rolling 5-day rainfall totals and the index each window starts on
    count1 = 0
    count2 = 5
    wet_onset = []
    onset_date = []
    while count2 <= array.size:
        wet_onset.append(array[count1:count2].sum())
        onset_date.append(count1)
        count1 += 1
        count2 += 1

    # dry spell: check the 30 days following each candidate wet pentad
    count3 = 5
    count4 = 5 + 30
    dry_spell = []
    while count4 <= array.size:
        following_30 = array[count3:count4]
        # total rainfall of every 10-day run inside those 30 days
        count5 = 0
        count6 = 10
        run_sum = []
        while count6 <= following_30.size:
            run_sum.append(following_30[count5:count6].sum())
            count5 += 1
            count6 += 1
        # a 10-day run totalling <= 5 mm marks a dry spell
        dry_spell.append(np.min(run_sum) <= 5)
        count3 += 1
        count4 += 1

    # keep only candidates that have a full 30-day follow-up period
    wet_onset_final = wet_onset[:len(dry_spell)]
    onset_final_date = onset_date[:len(dry_spell)]

    for rain, is_dry, date in zip(wet_onset_final, dry_spell, onset_final_date):
        if (rain >= 40) and not is_dry:
            return data.isel(time=date).time.values
on = wet_onset_date(point)
print(on)

>> 2017-11-27T00:00:00.000000000

Let's start with a Minimal Reproducible Example (MRE) for this problem. You need a dataset with a precipitation array with at least a full year of daily time series data, along with a couple other dimensions:

import xarray as xr, pandas as pd, numpy as np

x = np.arange(-110.5, 100)
y = np.arange(30.5, 40)
time = pd.date_range('2020-01-01', '2022-12-31', freq='D')

# generate random precip-ish data
random_lognorm = np.exp(np.random.random(size=(len(time), len(y), len(x)))) * 200

# random seasonal-ish mask
raining = (
    (time.dayofyear.values.reshape(-1, 1, 1)
    * np.random.random(size=random_lognorm.shape)) > 40
)

# finally, precip is the rain array * the "is raining" array
pr = random_lognorm * raining

# now we can construct an xarray Dataset with this data to form our MRE
ds = xr.Dataset(
    {'pr': (('time', 'lat', 'lon'), pr)},
    coords={'lat': y, 'lon': x, 'time': time},
)

Here's what that looks like:

In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions:  (time: 1096, lat: 10, lon: 211)
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2022-12-31
Data variables:
    pr       (time, lat, lon) float64 0.0 0.0 0.0 0.0 ... 413.6 308.0 386.9

As with numpy and pandas, the key to performance with large arrays in xarray is to use whole-array operations rather than looping over the elements. This is definitely true of windowed/rolling operations. Check out the Rolling Window Operations section of xarray's User Guide - it's a helpful introduction to this topic.

I don't totally understand all the conditions you're trying to apply here, but I can throw a couple things into a quick demo that is hopefully helpful.

One really helpful feature in xarray is the construct method of rolling objects. This method of DataArrayRolling and DatasetRolling objects returns a restructured DataArray/Dataset (respectively) with a rolling window into the original array. So below, I specify the rolling window time=30. The construct method gives a reshaped "view" into the array - a memory-efficient way of restructuring the data - with a new dimension (I name it "window" below) along which you can work with the rolled data.

In [8]: rolled = ds.pr.rolling(time=30, min_periods=30).construct('window')

In [9]: rolled
Out[9]:
<xarray.DataArray 'pr' (time: 1096, lat: 10, lon: 211, window: 30)>
array([[[[         nan,          nan,          nan, ...,          nan,
                   nan,   0.        ],
         [         nan,          nan,          nan, ...,          nan,
                   nan,   0.        ],
         [         nan,          nan,          nan, ...,          nan,
                   nan,   0.        ],
...
         ...,
         [443.96641513, 524.82969347, 419.95639311, ...,   0.        ,
          500.87393858, 413.55965161],
         [352.36603332, 427.1653476 , 236.46898157, ..., 469.71452213,
          235.31558598, 308.02273055],
         [396.360887  , 520.49089188, 242.73958665, ..., 234.32972887,
          252.48534392, 386.93237596]]]])
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2022-12-31
Dimensions without coordinates: window

We can work with this window dimension as if it is each group of 30 days within our dataset. So now we can define an arbitrarily complex function to reduce our window dimension:

def complex_condition(rolled):
    # each of the first 5 days exceeds 40 mm
    first_5d_over_40mm = (rolled.isel(window=slice(None, 5)) > 40).all(dim='window')
    # each of the 30 days exceeds 5 mm
    all_30d_over_5mm = (rolled > 5).all(dim='window')
    # result is True when both conditions are met
    return first_5d_over_40mm & all_30d_over_5mm

This can simply be applied to the rolled dataset:

In [11]: meets_criteria = complex_condition(rolled)

In [12]: meets_criteria
Out[12]:
<xarray.DataArray 'pr' (time: 1096, lat: 10, lon: 211)>
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
...
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False]]])
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2022-12-31

Now, we can find the first index which meets these conditions with idxmax (making sure to mask out any cells which never meet the condition):

In [13]: meets_criteria.idxmax(dim='time').where(meets_criteria.any(dim='time'))
Out[13]:
<xarray.DataArray 'time' (lat: 10, lon: 211)>
array([[                          'NaT',                           'NaT',
                                  'NaT', ...,
                                  'NaT',                           'NaT',
        '2022-12-02T00:00:00.000000000'],
       ['2020-12-14T00:00:00.000000000',                           'NaT',
        '2020-12-20T00:00:00.000000000', ...,
                                  'NaT', '2021-09-22T00:00:00.000000000',
        '2021-10-20T00:00:00.000000000'],
       ['2021-12-24T00:00:00.000000000',                           'NaT',
        '2021-12-26T00:00:00.000000000', ...,
                                  'NaT', '2022-12-18T00:00:00.000000000',
                                  'NaT'],
       ...,
       ['2021-08-21T00:00:00.000000000',                           'NaT',
                                  'NaT', ...,
        '2021-08-06T00:00:00.000000000', '2020-11-07T00:00:00.000000000',
        '2022-10-04T00:00:00.000000000'],
       [                          'NaT', '2020-12-11T00:00:00.000000000',
                                  'NaT', ...,
        '2020-12-18T00:00:00.000000000', '2022-10-31T00:00:00.000000000',
                                  'NaT'],
       ['2021-09-28T00:00:00.000000000', '2020-11-18T00:00:00.000000000',
                                  'NaT', ...,
        '2021-10-14T00:00:00.000000000',                           'NaT',
                                  'NaT']], dtype='datetime64[ns]')
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5

One thing to note is that by default the rolling window labels each result with the index of the end of the window. If you want the start of the window, you can reindex the meets_criteria results with da.shift.
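Here's a minimal 1-D sketch of that reindexing (the toy array and window length are illustrative): shifting the boolean result back by window - 1 moves each flag from the label at the window's end to the window's start.

```python
import numpy as np
import pandas as pd
import xarray as xr

# a tiny 1-D example: one 30-day window flagged True at its *end* label
time = pd.date_range('2020-01-01', periods=60, freq='D')
flags = xr.DataArray(np.zeros(60, dtype=bool), coords={'time': time}, dims='time')
flags[29] = True  # window covering days 0-29, labeled at day 29

# shift flags back by (window - 1) so each flag sits at its window's start
window = 30
starts = flags.shift(time=-(window - 1), fill_value=False)

print(starts.idxmax(dim='time').values)  # -> 2020-01-01, the window *start* date
```

The same shift works on the full (time, lat, lon) boolean array before taking idxmax.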

There are a number of other things you mention in the question, but that's a lot of scope for a single question. Hopefully this points you in the right direction!
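For instance, the question's actual thresholds (a 5-day total of at least 40 mm, no 10-day run totalling 5 mm or less within the following 30 days, and a search starting August 1st) can be expressed with the same rolling tools. This is a 1-D sketch under my reading of those conditions - the synthetic data and variable names are made up, and the window alignments (the shift offsets) are the part to double-check against your definition:

```python
import numpy as np
import pandas as pd
import xarray as xr

# small synthetic daily precipitation series (illustrative only)
time = pd.date_range('2020-06-01', '2020-12-31', freq='D')
rng = np.random.default_rng(42)
pr = xr.DataArray(rng.gamma(0.6, 12.0, size=len(time)),
                  coords={'time': time}, dims='time', name='pr')

# 5-day totals, shifted so each is labeled at its window's *start* day
wet5 = pr.rolling(time=5).sum().shift(time=-4)

# 10-day totals, labeled at each run's start day
run10 = pr.rolling(time=10).sum().shift(time=-9)

# driest 10-day run fitting inside the 30 days after the wet pentad:
# for onset day i, those runs start on days i+5 .. i+25 (21 candidates)
driest = run10.rolling(time=21).min().shift(time=-25)

# onset: wet pentad >= 40 mm, no 10-day dry spell (<= 5 mm), on/after 1 Aug
is_onset = (wet5 >= 40) & (driest > 5) & (pr['time'] >= np.datetime64('2020-08-01'))

# first date meeting all conditions (NaT if none do)
onset_date = is_onset.idxmax(dim='time').where(is_onset.any(dim='time'))
```

Because every step is a vectorized rolling reduction, the identical code runs unchanged on a (time, lat, lon) array and returns a 2-D map of onset dates.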

Also, just a heads up - when you plot a map of times, you're going to get the numeric representation of each datetime object, which is in units of nanoseconds since 1970, so the result is going to be a ridiculously large number. If you like, you can instead get the day of year from each datetime object's dayofyear attribute, e.g.:

In [14]: (
    ...:     meets_criteria
    ...:     .groupby('time.year')
    ...:     .apply(lambda x: x.idxmax(dim='time').dt.dayofyear.where(x.any(dim='time')))
    ...: )
Out[14]:
<xarray.DataArray 'dayofyear' (year: 3, lat: 10, lon: 211)>
array([[[ nan,  nan,  nan, ...,  nan,  nan,  nan],
        [349.,  nan, 355., ...,  nan,  nan,  nan],
        [ nan,  nan,  nan, ...,  nan,  nan,  nan],
        ...,
        [ nan,  nan,  nan, ...,  nan, 312.,  nan],
        [ nan, 346.,  nan, ..., 353.,  nan,  nan],
        [ nan, 323.,  nan, ...,  nan,  nan,  nan]],

       [[ nan,  nan,  nan, ...,  nan,  nan,  nan],
        [ nan,  nan,  nan, ...,  nan, 265., 293.],
        [358.,  nan, 360., ...,  nan,  nan,  nan],
        ...,
        [233.,  nan,  nan, ..., 218., 278.,  nan],
        [ nan,  nan,  nan, ...,  nan,  nan,  nan],
        [271.,  nan,  nan, ..., 287.,  nan,  nan]],

       [[ nan,  nan,  nan, ...,  nan,  nan, 336.],
        [ nan,  nan,  nan, ...,  nan,  nan,  nan],
        [ nan,  nan, 305., ...,  nan, 352.,  nan],
        ...,
        [217.,  nan,  nan, ...,  nan,  nan, 277.],
        [ nan, 357.,  nan, ...,  nan, 304.,  nan],
        [267., 314.,  nan, ...,  nan,  nan,  nan]]])
Coordinates:
  * lat      (lat) float64 30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5
  * lon      (lon) float64 -110.5 -109.5 -108.5 -107.5 ... 96.5 97.5 98.5 99.5
  * year     (year) int64 2020 2021 2022
