Getting Seasonal Means for the season December-January-February

We occasionally get questions about seasonal reduction operations (for example, seasonal means).

Say you have a dataset that spans several years, and you want to compare the seasonal means of the individual years with each other.

For three of the four seasons this isn’t a big deal: you cut the dataset into yearly chunks, then take the mean over the season.
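As a minimal sketch of this "yearly chunks, then seasonal mean" recipe for a season that does not cross the year boundary (here JJA), assuming the same toy dataset that is built in the next section (monthly values 0 to 59, January 2000 to December 2004). Selecting with `where(..., drop=True)` and grouping by `time.year` is one way to do it, not necessarily the approach used later in this article:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset (assumed): 60 monthly values 0..59, January 2000 to December 2004.
time = pd.date_range(start="2000-01-01", end="2004-12-31", freq="MS")
data = xr.DataArray(np.arange(len(time)), coords={"time": time}, dims="time")

# Keep only the June/July/August timesteps ...
jja = data.where(data.time.dt.season == "JJA", drop=True)
# ... then cut into calendar years and average within each year.
jja_means = jja.groupby("time.year").mean(dim="time")
print(jja_means.values)
```

Each year's value is the mean of that year's June, July and August, e.g. (5 + 6 + 7) / 3 = 6 for 2000.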

But for the fourth season (DJF) this doesn’t work: it bunches together the January, February, and December of the same calendar year, when in fact you want the December from the year before.
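Applying the same recipe to DJF shows the problem. This sketch again assumes the toy dataset from the next section; the dictionary `groups` is only there to list which months land in each calendar-year group:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset (assumed): 60 monthly values 0..59, January 2000 to December 2004.
time = pd.date_range(start="2000-01-01", end="2004-12-31", freq="MS")
data = xr.DataArray(np.arange(len(time)), coords={"time": time}, dims="time")

# Same recipe as for the other seasons: keep DJF months, group by calendar year.
djf = data.where(data.time.dt.season == "DJF", drop=True)

# Record which months end up in each calendar-year group.
groups = {
    int(year): group.time.dt.strftime("%Y-%m").values.tolist()
    for year, group in djf.groupby("time.year")
}
print(groups[2001])

# The naive per-year "DJF" means.
wrong_djf_means = djf.groupby("time.year").mean(dim="time")
print(wrong_djf_means.values)
```

Each group holds January, February and December of one calendar year, so the 2001 value (12 + 13 + 23) / 3 = 16 mixes the end of the 2000/2001 winter with the start of the 2001/2002 winter.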

Let’s have a look at an overly simplified example.

Our example dataset

We use only a single dimension and only monthly values. What’s more, the values of our “dataset” are strictly ascending.

This keeps the dataset small and makes it easy to see what’s going on.

The code works the same way with far more complex datasets.

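import xarray as xr
import numpy as np
import pandas as pd

# 60 monthly timestamps (first day of each month), January 2000 to December 2004
time = xr.DataArray(
    pd.date_range(start='2000-01-01', end='2004-12-31', freq='MS'),
    dims=('time',)
)

# Strictly ascending values 0..59, one per month, indexed by time
data = xr.DataArray(
    np.arange(len(time)),
    coords={'time': time},
)
data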

<xarray.DataArray (time: 60)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-02-01 ... 2004-12-01
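Because the values simply count the months since January 2000, each value can be decoded by hand, which makes the results that follow easy to check. A small sketch of that arithmetic; `month_of` is a hypothetical helper introduced only for this illustration:

```python
import pandas as pd


def month_of(value):
    """Hypothetical helper: (year, month) of a value in the toy dataset."""
    return 2000 + value // 12, value % 12 + 1


# Check the rule against the actual time axis of the toy dataset.
time = pd.date_range(start="2000-01-01", end="2004-12-31", freq="MS")
for value, timestamp in enumerate(time):
    assert month_of(value) == (timestamp.year, timestamp.month)

print(month_of(11), month_of(12))  # December 2000, then January 2001
```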

Initial idea: groupby

The .groupby method can group values by a given key, and the season, available through the datetime accessor as data.time.dt.season, is one such key. Let’s try it:

data.groupby(data.time.dt.season).mean(dim='time')

<xarray.DataArray (season: 4)>
array([28., 30., 27., 33.])
Coordinates:
  * season   (season) object 'DJF' 'JJA' 'MAM' 'SON'

This isn’t what we wanted. It has averaged over all Decembers, Januaries, and Februaries of the whole record and stored the result as a single value for “DJF”, and likewise for “MAM”, “JJA”, and “SON”.
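To confirm what groupby computed here, the sketch below pools every December, January and February of the five years and averages them. It rebuilds the toy dataset from above and should reproduce the single “DJF” value:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the toy dataset: values 0..59, monthly from January 2000.
time = pd.date_range(start="2000-01-01", end="2004-12-31", freq="MS")
data = xr.DataArray(np.arange(len(time)), coords={"time": time}, dims="time")

seasonal = data.groupby(data.time.dt.season).mean(dim="time")

# Pool all 15 December/January/February values across the five years ...
all_djf = data.where(data.time.dt.month.isin([12, 1, 2]), drop=True)
# ... and average them in one go.
print(float(all_djf.mean()), float(seasonal.sel(season="DJF")))
```

Both numbers are 28.0: the grouping only looks at the season label, not at which winter a month belongs to.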

Note that since this is a one-dimensional array, we could have omitted the dim='time' option from the call to mean(). You will probably have other dimensions as well, though, and passing dim='time' explicitly guarantees that only the time dimension is collapsed.
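A minimal sketch of the explicit dim='time' with more than one dimension. The second dimension `station` and its two made-up series are purely hypothetical; the point is that mean(dim='time') keeps `station` and only collapses time within each season:

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range(start="2000-01-01", end="2004-12-31", freq="MS")

# Hypothetical second dimension "station": station "a" is the toy series,
# station "b" is the same series shifted by 100.
data2d = xr.DataArray(
    np.stack([np.arange(len(time)), np.arange(len(time)) + 100], axis=1),
    coords={"time": time, "station": ["a", "b"]},
    dims=("time", "station"),
)

# Only the time axis is collapsed within each season; "station" survives.
per_station = data2d.groupby(data2d.time.dt.season).mean(dim="time")
print(dict(per_station.sizes))
print(per_station.sel(season="DJF").values)
```

Since station b is station a plus 100, its seasonal means are those of station a plus 100.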
