# Merge arrays with missing data

**Claire Carouge, CLEX CMS**

Let’s say you have two datasets that come from different sources but represent the same quantity. You’d like to merge them into a single dataset by averaging. Unfortunately, both datasets have missing data at different times and places. Accordingly, we want the merged dataset to follow these rules:

- If both original datasets have data, we take the mean of the two values.
- If only one dataset has data, we take that value.
- If the data is missing in both original datasets, we keep the value missing.
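As a minimal sketch, the three rules can be written out for a single pair of values in plain Python. The helper name `merge_value` is hypothetical and used only for illustration; the xarray approach applies the same logic to whole arrays at once.

```python
import math


def merge_value(a, b):
    """Merge two scalar readings following the three rules above.

    Hypothetical helper for illustration only; NaN marks a missing value.
    """
    a_missing = math.isnan(a)
    b_missing = math.isnan(b)
    if not a_missing and not b_missing:
        # Both sources have data: take the mean.
        return (a + b) / 2
    if a_missing and b_missing:
        # Neither source has data: stay missing.
        return math.nan
    # Exactly one source has data: keep it.
    return b if a_missing else a


print(merge_value(4.0, 7.0))            # 5.5
print(merge_value(1.0, math.nan))       # 1.0
print(merge_value(math.nan, math.nan))  # nan
```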

The strategy with xarray is to load each dataset into a DataArray, concatenate the two arrays along a new dimension, and then average along that dimension.

```
import xarray as xr
import numpy as np
```

First, define two arrays with the same dimensions and with missing data at different places:

```
aa = xr.DataArray([[0,1,2],[3,4,np.nan]],dims=('x','y'))
bb = xr.DataArray([[5,np.nan,6],[np.nan,7,np.nan]],dims=('x','y'))
```

```
aa
```

```
<xarray.DataArray (x: 2, y: 3)>
array([[ 0.,  1.,  2.],
       [ 3.,  4., nan]])
Dimensions without coordinates: x, y
```

```
bb
```

```
<xarray.DataArray (x: 2, y: 3)>
array([[ 5., nan,  6.],
       [nan,  7., nan]])
Dimensions without coordinates: x, y
```

Now, if we simply add the arrays together, we do not get what we want. Missing values take precedence: if either array has a missing value at a point, the sum is missing there too. So summing and dividing by the number of arrays won’t work.

```
aa+bb
```

```
<xarray.DataArray (x: 2, y: 3)>
array([[ 5., nan,  8.],
       [nan, 11., nan]])
Dimensions without coordinates: x, y
```
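For comparison, a sum-based average can be made to work, but only by counting the valid values at each point by hand. The sketch below reuses the `aa` and `bb` arrays from above (redefined so the block runs on its own); the names `total`, `count` and `manual_mean` are illustrative.

```python
import numpy as np
import xarray as xr

aa = xr.DataArray([[0, 1, 2], [3, 4, np.nan]], dims=("x", "y"))
bb = xr.DataArray([[5, np.nan, 6], [np.nan, 7, np.nan]], dims=("x", "y"))

# Sum of the valid values only: missing entries contribute 0.
total = aa.fillna(0) + bb.fillna(0)

# Number of valid values at each point (0, 1 or 2).
count = aa.notnull().astype(int) + bb.notnull().astype(int)

# Mask points with no valid value so the division yields NaN there.
manual_mean = total / count.where(count > 0)

print(manual_mean.values)
```

This gives the desired result but takes three steps and a mask, which is why a NaN-aware mean is the more convenient route.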

In contrast, a mean works, because missing values are then ignored (mean(1, nan) = 1). To take a mean, we first need to combine the arrays into a single array. For this we’ll use the `xarray.concat()` function.
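A quick check of this behaviour on a two-element array, as a small sketch: xarray’s `mean` skips NaN by default for floating-point data, while `skipna=False` (or plain `numpy.mean`) lets the missing value propagate. The variable name `pair` is illustrative.

```python
import numpy as np
import xarray as xr

pair = xr.DataArray([1.0, np.nan], dims="z")

# Default behaviour for float data: NaN is skipped.
print(float(pair.mean()))              # 1.0

# With skipna=False the missing value propagates, as it does in a sum.
print(float(pair.mean(skipna=False)))  # nan

# Plain NumPy mean also propagates NaN.
print(np.mean(pair.values))            # nan
```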

Concatenate the arrays along a new dimension, which we’ll call `z`:

```
cc = xr.concat((aa,bb),'z')
```

```
cc
```

```
<xarray.DataArray (z: 2, x: 2, y: 3)>
array([[[ 0.,  1.,  2.],
        [ 3.,  4., nan]],

       [[ 5., nan,  6.],
        [nan,  7., nan]]])
Dimensions without coordinates: z, x, y
```

As you can see above, the concatenation places the two arrays side by side in a new array. Now we take advantage of the way xarray handles missing data: by default, `mean` skips missing values in floating-point data, so they are not counted in the average.

```
cc.mean(dim='z')
```

```
<xarray.DataArray (x: 2, y: 3)>
array([[2.5, 1. , 4. ],
       [3. , 5.5, nan]])
Dimensions without coordinates: x, y
```
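If it is useful to know how many sources contributed to each merged value, `count` along the same dimension gives that directly. This is an optional extra rather than part of the original recipe; the arrays are redefined so the block runs on its own, and `n_valid` is an illustrative name.

```python
import numpy as np
import xarray as xr

aa = xr.DataArray([[0, 1, 2], [3, 4, np.nan]], dims=("x", "y"))
bb = xr.DataArray([[5, np.nan, 6], [np.nan, 7, np.nan]], dims=("x", "y"))
cc = xr.concat((aa, bb), "z")

# Number of non-missing values along z at each (x, y) point.
n_valid = cc.count(dim="z")
print(n_valid.values)
# [[2 1 2]
#  [1 2 0]]
```

A count of 0 marks the points that stay missing in the merged array.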

In practice, the concatenation and the mean are usually chained, since there is no need to store the result of the `concat` operation.

```
xr.concat((aa,bb),'z').mean(dim='z')
```

```
<xarray.DataArray (x: 2, y: 3)>
array([[2.5, 1. , 4. ],
       [3. , 5.5, nan]])
Dimensions without coordinates: x, y
```
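The same pattern extends to arrays that carry coordinates and do not cover exactly the same points. The sketch below is one possible generalisation, not part of the original recipe: `merge_mean` is a hypothetical helper, the `time` coordinate and its values are made up for illustration, and `join="outer"` is passed explicitly so that the union of coordinate labels is kept and gaps are filled with NaN before averaging.

```python
import numpy as np
import xarray as xr


def merge_mean(arrays, dim="source"):
    """Average DataArrays point by point, ignoring missing values.

    Hypothetical helper: `dim` is the name of the temporary dimension
    used to stack the inputs.
    """
    # join="outer" keeps every coordinate label found in any input and
    # fills the points an input does not cover with NaN.
    stacked = xr.concat(arrays, dim=dim, join="outer")
    return stacked.mean(dim=dim, skipna=True)


# Two made-up series that overlap only partly in time.
a = xr.DataArray([1.0, 2.0, np.nan], dims="time", coords={"time": [0, 1, 2]})
b = xr.DataArray([4.0, np.nan, 6.0], dims="time", coords={"time": [1, 2, 3]})

merged = merge_mean([a, b])
print(merged.values)  # [ 1.  3. nan  6.]
```

Because the helper takes a list, it works unchanged for three or more sources.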