Basic event statistics

Basic event statistics#

Scott Wales, CLEX CMS

We have a set of timeseries from multiple models, and we’d like to compute some basic statistics on where the values exceed some threshold

import numpy
import xarray
import pandas

Sample data#

Some random data as a sample, we have dimensions of ‘model’ and ‘time’

da = xarray.DataArray(numpy.random.random((5,10))-0.5, dims=['model','time'])
da['time'] = pandas.date_range('20010101', periods=da.sizes['time'], freq='MS')
da['model'] = ['A','B','C','D','E']
da

<xarray.DataArray (model: 5, time: 10)>
array([[-0.32756177, -0.26286746,  0.40030123,  0.35854755,  0.43113588,
         0.19164228, -0.24890014, -0.16822426,  0.0288118 ,  0.30851645],
       [ 0.02352896,  0.01700063,  0.40107743,  0.09688021, -0.34731748,
         0.33646959, -0.22218415,  0.00075536,  0.41923523,  0.46450296],
       [-0.36606797,  0.13691949, -0.24407689,  0.26593951, -0.36044558,
        -0.09755949,  0.23047747, -0.30123992,  0.49773007, -0.48146704],
       [-0.28529412, -0.4137749 , -0.49678032, -0.20963922, -0.37296851,
         0.32198888, -0.48008865, -0.35185213, -0.19133728,  0.48969387],
       [ 0.35537076, -0.34188436, -0.46208259,  0.41417063, -0.07029039,
        -0.39533084,  0.21379516, -0.18512164, -0.43433092, -0.29407475]])
Coordinates:
  * time     (time) datetime64[ns] 2001-01-01 2001-02-01 ... 2001-10-01
  * model    (model) <U1 'A' 'B' 'C' 'D' 'E'

Count below threshold for each model#

First select only the values below our threshold, then get the count of unmasked values along the time dimension

da.where(da < 0).count('time')

<xarray.DataArray (model: 5)>
array([4, 2, 6, 8, 7])
Coordinates:
  * model    (model) <U1 'A' 'B' 'C' 'D' 'E'

Minimum values for each model#

The same idea, this time using ‘min’ instead of ‘count’

da.where(da < 0).min('time')

<xarray.DataArray (model: 5)>
array([-0.32756177, -0.34731748, -0.48146704, -0.49678032, -0.46208259])
Coordinates:
  * model    (model) <U1 'A' 'B' 'C' 'D' 'E'

Climtas event detection#

The climtas library has some functions for event detection - it will find runs where a condition is true for multiple times, then let you run a function to calculate statistics for each event

import climtas

events = climtas.event.find_events(da < 0, min_duration=2)
events

	time	model	event_duration
0	0	0	2
1	1	4	2
2	0	3	5
3	4	2	2
4	4	4	2
5	6	0	2
6	6	3	3
7	7	4	3

coords = climtas.event.event_coords(da, events)

def stat_func(event_data):
    return {'min': event_data.min().values,
            'min_date': event_data.time[event_data.argmin()].values}
    
minimum = climtas.event.map_events(da, events, stat_func)

coords.join(minimum)

	model	time	event_duration	min	min_date
0	A	2001-01-01	31 days	-0.32756177046349233	2001-01-01
1	E	2001-02-01	28 days	-0.4620825909697166	2001-03-01
2	D	2001-01-01	120 days	-0.49678032011641426	2001-03-01
3	C	2001-05-01	31 days	-0.36044557563610247	2001-05-01
4	E	2001-05-01	31 days	-0.395330843938117	2001-06-01
5	A	2001-07-01	31 days	-0.2489001360755888	2001-07-01
6	D	2001-07-01	62 days	-0.48008864559888853	2001-07-01
7	E	2001-08-01	61 days	-0.4343309190867375	2001-09-01

Basic event statistics

Contents

Basic event statistics#

Sample data#

Count below threshold for each model#

Minimum values for each model#

Date of minimum value for each model#

Putting it together#

Climtas event detection#