Basic event statistics
Contents
Basic event statistics¶
Scott Wales, CLEX CMS¶
We have a set of timeseries from multiple models, and we’d like to compute some basic statistics on where the values exceed some threshold
import numpy
import xarray
import pandas
Sample data¶
Some random data as a sample, we have dimensions of ‘model’ and ‘time’
da = xarray.DataArray(numpy.random.random((5,10))-0.5, dims=['model','time'])
da['time'] = pandas.date_range('20010101', periods=da.sizes['time'], freq='MS')
da['model'] = ['A','B','C','D','E']
da
<xarray.DataArray (model: 5, time: 10)> array([[-0.32756177, -0.26286746, 0.40030123, 0.35854755, 0.43113588, 0.19164228, -0.24890014, -0.16822426, 0.0288118 , 0.30851645], [ 0.02352896, 0.01700063, 0.40107743, 0.09688021, -0.34731748, 0.33646959, -0.22218415, 0.00075536, 0.41923523, 0.46450296], [-0.36606797, 0.13691949, -0.24407689, 0.26593951, -0.36044558, -0.09755949, 0.23047747, -0.30123992, 0.49773007, -0.48146704], [-0.28529412, -0.4137749 , -0.49678032, -0.20963922, -0.37296851, 0.32198888, -0.48008865, -0.35185213, -0.19133728, 0.48969387], [ 0.35537076, -0.34188436, -0.46208259, 0.41417063, -0.07029039, -0.39533084, 0.21379516, -0.18512164, -0.43433092, -0.29407475]]) Coordinates: * time (time) datetime64[ns] 2001-01-01 2001-02-01 ... 2001-10-01 * model (model) <U1 'A' 'B' 'C' 'D' 'E'
- model: 5
- time: 10
- -0.3276 -0.2629 0.4003 0.3585 ... 0.2138 -0.1851 -0.4343 -0.2941
array([[-0.32756177, -0.26286746, 0.40030123, 0.35854755, 0.43113588, 0.19164228, -0.24890014, -0.16822426, 0.0288118 , 0.30851645], [ 0.02352896, 0.01700063, 0.40107743, 0.09688021, -0.34731748, 0.33646959, -0.22218415, 0.00075536, 0.41923523, 0.46450296], [-0.36606797, 0.13691949, -0.24407689, 0.26593951, -0.36044558, -0.09755949, 0.23047747, -0.30123992, 0.49773007, -0.48146704], [-0.28529412, -0.4137749 , -0.49678032, -0.20963922, -0.37296851, 0.32198888, -0.48008865, -0.35185213, -0.19133728, 0.48969387], [ 0.35537076, -0.34188436, -0.46208259, 0.41417063, -0.07029039, -0.39533084, 0.21379516, -0.18512164, -0.43433092, -0.29407475]])
- time(time)datetime64[ns]2001-01-01 ... 2001-10-01
array(['2001-01-01T00:00:00.000000000', '2001-02-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000', '2001-04-01T00:00:00.000000000', '2001-05-01T00:00:00.000000000', '2001-06-01T00:00:00.000000000', '2001-07-01T00:00:00.000000000', '2001-08-01T00:00:00.000000000', '2001-09-01T00:00:00.000000000', '2001-10-01T00:00:00.000000000'], dtype='datetime64[ns]')
- model(model)<U1'A' 'B' 'C' 'D' 'E'
array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')
Count below threshold for each model¶
First select only the values below our threshold, then get the count of unmasked values along the time dimension
da.where(da < 0).count('time')
<xarray.DataArray (model: 5)> array([4, 2, 6, 8, 7]) Coordinates: * model (model) <U1 'A' 'B' 'C' 'D' 'E'
- model: 5
- 4 2 6 8 7
array([4, 2, 6, 8, 7])
- model(model)<U1'A' 'B' 'C' 'D' 'E'
array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')
Minimum values for each model¶
The same idea, this time using ‘min’ instead of ‘count’
da.where(da < 0).min('time')
<xarray.DataArray (model: 5)> array([-0.32756177, -0.34731748, -0.48146704, -0.49678032, -0.46208259]) Coordinates: * model (model) <U1 'A' 'B' 'C' 'D' 'E'
- model: 5
- -0.3276 -0.3473 -0.4815 -0.4968 -0.4621
array([-0.32756177, -0.34731748, -0.48146704, -0.49678032, -0.46208259])
- model(model)<U1'A' 'B' 'C' 'D' 'E'
array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')
Date of minimum value for each model¶
Start off by finding the index of the minimum value - argmin.
min_index = da.where(da < 0).argmin('time')
min_index
<xarray.DataArray (model: 5)> array([0, 4, 9, 2, 2]) Coordinates: * model (model) <U1 'A' 'B' 'C' 'D' 'E'
- model: 5
- 0 4 9 2 2
array([0, 4, 9, 2, 2])
- model(model)<U1'A' 'B' 'C' 'D' 'E'
array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')
These indices can be converted to time values all at once by using min_index
as an array index
da.time[min_index]
<xarray.DataArray 'time' (model: 5)> array(['2001-01-01T00:00:00.000000000', '2001-05-01T00:00:00.000000000', '2001-10-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: time (model) datetime64[ns] 2001-01-01 2001-05-01 ... 2001-03-01 * model (model) <U1 'A' 'B' 'C' 'D' 'E'
- model: 5
- 2001-01-01 2001-05-01 2001-10-01 2001-03-01 2001-03-01
array(['2001-01-01T00:00:00.000000000', '2001-05-01T00:00:00.000000000', '2001-10-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000'], dtype='datetime64[ns]')
- time(model)datetime64[ns]2001-01-01 ... 2001-03-01
array(['2001-01-01T00:00:00.000000000', '2001-05-01T00:00:00.000000000', '2001-10-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000'], dtype='datetime64[ns]')
- model(model)<U1'A' 'B' 'C' 'D' 'E'
array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')
There’s a “ghost” time axis in the result which might cause problems, let’s get rid of it. Our returned values aren’t time dependent
da.time[min_index].drop('time')
<xarray.DataArray 'time' (model: 5)> array(['2001-01-01T00:00:00.000000000', '2001-05-01T00:00:00.000000000', '2001-10-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * model (model) <U1 'A' 'B' 'C' 'D' 'E'
- model: 5
- 2001-01-01 2001-05-01 2001-10-01 2001-03-01 2001-03-01
array(['2001-01-01T00:00:00.000000000', '2001-05-01T00:00:00.000000000', '2001-10-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000'], dtype='datetime64[ns]')
- model(model)<U1'A' 'B' 'C' 'D' 'E'
array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')
Putting it together¶
It could be helpful to put all of the results into a single dataset
event_stats = xarray.Dataset({
'count': da.where(da < 0).count('time'),
'min': da.where(da < 0).min('time'),
'min_date': da.time[min_index].drop('time'),
})
event_stats
<xarray.Dataset> Dimensions: (model: 5) Coordinates: * model (model) <U1 'A' 'B' 'C' 'D' 'E' Data variables: count (model) int64 4 2 6 8 7 min (model) float64 -0.3276 -0.3473 -0.4815 -0.4968 -0.4621 min_date (model) datetime64[ns] 2001-01-01 2001-05-01 ... 2001-03-01
- model: 5
- model(model)<U1'A' 'B' 'C' 'D' 'E'
array(['A', 'B', 'C', 'D', 'E'], dtype='<U1')
- count(model)int644 2 6 8 7
array([4, 2, 6, 8, 7])
- min(model)float64-0.3276 -0.3473 ... -0.4968 -0.4621
array([-0.32756177, -0.34731748, -0.48146704, -0.49678032, -0.46208259])
- min_date(model)datetime64[ns]2001-01-01 ... 2001-03-01
array(['2001-01-01T00:00:00.000000000', '2001-05-01T00:00:00.000000000', '2001-10-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000', '2001-03-01T00:00:00.000000000'], dtype='datetime64[ns]')
That way you can e.g. get all the values for a specific model
event_stats.sel(model='A')
<xarray.Dataset> Dimensions: () Coordinates: model <U1 'A' Data variables: count int64 4 min float64 -0.3276 min_date datetime64[ns] 2001-01-01
- model()<U1'A'
array('A', dtype='<U1')
- count()int644
array(4)
- min()float64-0.3276
array(-0.32756177)
- min_date()datetime64[ns]2001-01-01
array('2001-01-01T00:00:00.000000000', dtype='datetime64[ns]')
Climtas event detection¶
The climtas library has some functions for event detection - it will find runs where a condition is true for multiple times, then let you run a function to calculate statistics for each event
import climtas
events = climtas.event.find_events(da < 0, min_duration=2)
events
time | model | event_duration | |
---|---|---|---|
0 | 0 | 0 | 2 |
1 | 1 | 4 | 2 |
2 | 0 | 3 | 5 |
3 | 4 | 2 | 2 |
4 | 4 | 4 | 2 |
5 | 6 | 0 | 2 |
6 | 6 | 3 | 3 |
7 | 7 | 4 | 3 |
coords = climtas.event.event_coords(da, events)
def stat_func(event_data):
return {'min': event_data.min().values,
'min_date': event_data.time[event_data.argmin()].values}
minimum = climtas.event.map_events(da, events, stat_func)
coords.join(minimum)
model | time | event_duration | min | min_date | |
---|---|---|---|---|---|
0 | A | 2001-01-01 | 31 days | -0.32756177046349233 | 2001-01-01 |
1 | E | 2001-02-01 | 28 days | -0.4620825909697166 | 2001-03-01 |
2 | D | 2001-01-01 | 120 days | -0.49678032011641426 | 2001-03-01 |
3 | C | 2001-05-01 | 31 days | -0.36044557563610247 | 2001-05-01 |
4 | E | 2001-05-01 | 31 days | -0.395330843938117 | 2001-06-01 |
5 | A | 2001-07-01 | 31 days | -0.2489001360755888 | 2001-07-01 |
6 | D | 2001-07-01 | 62 days | -0.48008864559888853 | 2001-07-01 |
7 | E | 2001-08-01 | 61 days | -0.4343309190867375 | 2001-09-01 |