Effect of rounding on coordinates with Xarray#

Claire Carouge, CLEX CMS

This notebook demonstrates how to work with two datasets that should have the same dimensional coordinates, but for whatever reason don’t.

When doing operations involving the 2 DataArrays, Xarray will either return the results on the common locations only or return an indexing error. The solution is then to replace one set of coordinates with the other set.

The problem#

To demonstrate, we create two DataArrays that have a slightly different coordinate array. Note how the difference is so small that Jupyter’s default presentation obscures it by rounding.

import xarray as xr
import numpy as np
arr1 = xr.DataArray([1,2,3],dims="x",coords={"x":[1.1,2.1,3.1]})
arr2 = xr.DataArray([1,2,3],dims="x",coords={"x":[1.1,2.099999999,3.1]})
display(arr1, arr2)
<xarray.DataArray (x: 3)>
array([1, 2, 3])
Coordinates:
  * x        (x) float64 1.1 2.1 3.1
<xarray.DataArray (x: 3)>
array([1, 2, 3])
Coordinates:
  * x        (x) float64 1.1 2.1 3.1

You can specify a precision for numpy to suppress the rounding. We are using np.printoptions() here to easily swap between the default rounding and a specific precision. You can use np.set_printoptions() to set a specific precision for your whole notebook. Note that Xarray may override the precision set, make sure you print the .data array if you need to double-check the precision in the print:

with np.printoptions(precision=9):
    print(arr2.x.data)
[1.1         2.099999999 3.1        ]

When we make element-wise calculations with these two DataArrays, we would expect that the resulting array would have the same length as the two input arrays, i.e. arr1 - arr2 == [0, 0, 0].

But instead we only get a length 2 array, containing only the points where the coordinates were truly identical.

arr1-arr2
<xarray.DataArray (x: 2)>
array([0, 0])
Coordinates:
  * x        (x) float64 1.1 3.1

There was no error, but xarray automatic broadcasting has silently ignored the second value where the coordinates are different for the 2 arrays.

Solutions#

Re-assign coordinates#

Usually, these issues arise when the coordinates should be identical, but aren’t. In that case, the easiest, straight-forward method is simply assign the coordinate array from one to the other:

arr2=arr2.assign_coords({"x":arr1.x})
with np.printoptions(precision=9):
    print(arr2.x.data)
[1.1 2.1 3.1]
arr1-arr2
<xarray.DataArray (x: 3)>
array([0, 0, 0])
Coordinates:
  * x        (x) float64 1.1 2.1 3.1

Interpolate the data#

If the coordinates are so different that this would lead to unacceptable errors, we can also use xarray’s interpolation routine to force the arrays on the same grid.

Let’s first re-assign the coordinates from the beginning.

arr2=arr2.assign_coords({"x":[1.1,2.099999999,3.1]})
with np.printoptions(precision=9):
    print(arr2.x.data)
[1.1         2.099999999 3.1        ]

Now we interpolate using the x-coordinate from arr1 as the target for the interpolation.

arr2 = arr2.interp(x=arr1.x)
display(arr2)
<xarray.DataArray (x: 3)>
array([1., 2., 3.])
Coordinates:
  * x        (x) float64 1.1 2.1 3.1

The two arrays now have the same coordinate array, but the values have ever so slightly changed. In this case, the rounding obscures this fact again, but the difference is no longer all-zeros:

arr1 - arr2
<xarray.DataArray (x: 3)>
array([ 0.00000000e+00, -1.00000008e-09,  0.00000000e+00])
Coordinates:
  * x        (x) float64 1.1 2.1 3.1

Why is Xarray doing this#

Although this behaviour may seem annoying in this example, it is something that is very useful for Xarray. For example, if you have a field defined on a subset of a grid (e.g surface field) and one on the full grid (e.g on all model levels), you can get the difference, sum, etc. of the two fields directly. You don’t need to create a subset of the full grid field. Or if you have fields covering different but overlapping time periods, to subset the fields to the common period, you can do:

    array1 = array1.sel(time=array2.time)
    array2 = array2.sel(time=array1.time)

That is true as long as the subset grid has a coordinate along the dimensions that are subsetted even if the dimensions do not appear in the subsetted field.