.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "advanced_examples/plot_variogram_estimation_modelling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_advanced_examples_plot_variogram_estimation_modelling.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_advanced_examples_plot_variogram_estimation_modelling.py:


Estimation and modelling of spatial variograms
==============================================

Digital elevation models have errors that are often `correlated in space <https://en.wikipedia.org/wiki/Spatial_analysis#Spatial_auto-correlation>`_.
While many DEM studies used solely short-range `variogram <https://en.wikipedia.org/wiki/Variogram>`_ to
estimate the correlation of elevation measurement errors (e.g., `Howat et al. (2008) <https://doi.org/10.1029/2008GL034496>`_
, `Wang and Kääb (2015) <https://doi.org/10.3390/rs70810117>`_), recent studies show that variograms of multiple ranges
provide larger, more reliable estimates of spatial correlation for DEMs.

Here, we show an example in which we estimate the spatial correlation for a DEM difference at Longyearbyen, and its
impact on the standard error with averaging area. We first estimate an empirical variogram with
:func:`xdem.spatialstats.sample_empirical_variogram` based on routines of `scikit-gstat
<https://mmaelicke.github.io/scikit-gstat/index.html>`_. We then fit the empirical variogram with a sum of variogram
models using :func:`xdem.spatialstats.fit_sum_model_variogram`. Finally, we perform spatial propagation for a range of
averaging area using :func:`xdem.spatialstats.number_effective_samples`, and empirically validate the improved
robustness of our results using :func:`xdem.spatialstats.patches_method`, an intensive Monte-Carlo sampling approach.

**Reference:** `Hugonnet et al. (2022) <https://doi.org/10.1109/jstars.2022.3188922>`_, Figure 5 and Equations 13–16.

.. GENERATED FROM PYTHON SOURCE LINES 21-29

.. code-block:: default

    import geoutils as gu

    import matplotlib.pyplot as plt
    import numpy as np

    import xdem
    from xdem.spatialstats import nmad


.. GENERATED FROM PYTHON SOURCE LINES 31-32

We load example files.

.. GENERATED FROM PYTHON SOURCE LINES 32-37

.. code-block:: default


    dh = xdem.DEM(xdem.examples.get_path("longyearbyen_ddem"))
    glacier_outlines = gu.Vector(xdem.examples.get_path("longyearbyen_glacier_outlines"))
    mask_glacier = glacier_outlines.create_mask(dh)


.. GENERATED FROM PYTHON SOURCE LINES 38-39

We exclude values on glacier terrain in order to isolate stable terrain, our proxy for elevation errors.

.. GENERATED FROM PYTHON SOURCE LINES 39-41

.. code-block:: default

    dh.set_mask(mask_glacier)


.. GENERATED FROM PYTHON SOURCE LINES 42-44

We estimate the average per-pixel elevation error on stable terrain, using both the standard deviation
and normalized median absolute deviation. For this example, we do not account for elevation heteroscedasticity.

.. GENERATED FROM PYTHON SOURCE LINES 44-47

.. code-block:: default

    print(f"STD: {np.nanstd(dh.data):.2f} meters.")
    print(f"NMAD: {xdem.spatialstats.nmad(dh.data):.2f} meters.")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    STD: 3.22 meters.
    NMAD: 2.51 meters.


.. GENERATED FROM PYTHON SOURCE LINES 48-52

The two measures of dispersion are quite similar showing that, on average, there is a small influence of outliers on the
elevation differences. The per-pixel precision is about :math:`\pm` 2.5 meters.
**Does this mean that every pixel has an independent measurement error of** :math:`\pm` **2.5 meters?**
Let's plot the elevation differences to visually check the quality of the data.

.. GENERATED FROM PYTHON SOURCE LINES 52-55

.. code-block:: default

    plt.figure(figsize=(8, 5))
    dh.show(ax=plt.gca(), cmap="RdYlBu", vmin=-4, vmax=4, cbar_title="Elevation differences (m)")


.. image-sg:: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_001.png
   :alt: plot variogram estimation modelling
   :srcset: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 56-59

We clearly see that the residual elevation differences on stable terrain are not random. The positive and negative
differences (blue and red, respectively) appear correlated over large distances. These correlated errors are what
we want to estimate and model.

.. GENERATED FROM PYTHON SOURCE LINES 61-64

Additionally, we notice that the elevation differences are still polluted by unrealistically large elevation
differences near glaciers, probably because the glacier inventory is more recent than the data, hence with too small outlines.
To remedy this, we filter large elevation differences outside 4 NMAD.

.. GENERATED FROM PYTHON SOURCE LINES 64-66

.. code-block:: default

    dh.set_mask(np.abs(dh.data) > 4 * xdem.spatialstats.nmad(dh.data))


.. GENERATED FROM PYTHON SOURCE LINES 67-68

We plot the elevation differences after filtering to check that we successively removed glacier signals.

.. GENERATED FROM PYTHON SOURCE LINES 68-71

.. code-block:: default

    plt.figure(figsize=(8, 5))
    dh.show(ax=plt.gca(), cmap="RdYlBu", vmin=-4, vmax=4, cbar_title="Elevation differences (m)")


.. image-sg:: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_002.png
   :alt: plot variogram estimation modelling
   :srcset: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 72-80

To quantify the spatial correlation of the data, we sample an empirical variogram.
The empirical variogram describes the variance between the elevation differences of pairs of pixels depending on their
distance. This distance between pairs of pixels if referred to as spatial lag.

To perform this procedure effectively, we use improved methods that provide efficient pairwise sampling methods for
large grid data in `scikit-gstat <https://mmaelicke.github.io/scikit-gstat/index.html>`_, which are encapsulated
conveniently by :func:`xdem.spatialstats.sample_empirical_variogram`:
Dowd's variogram is used for robustness in conjunction with the NMAD (see :ref:`robuststats-corr`).

.. GENERATED FROM PYTHON SOURCE LINES 80-85

.. code-block:: default


    df = xdem.spatialstats.sample_empirical_variogram(
        values=dh.data, gsd=dh.res[0], subsample=100, n_variograms=10, estimator="dowd", random_state=42
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/numpy/lib/nanfunctions.py:1215: RuntimeWarning: Mean of empty slice
      return np.nanmean(a, axis, out=out, keepdims=keepdims)


.. GENERATED FROM PYTHON SOURCE LINES 86-88

*Note: in this example, we add a* ``random_state`` *argument to yield a reproducible random sampling of pixels within
the grid.*

.. GENERATED FROM PYTHON SOURCE LINES 90-91

We plot the empirical variogram:

.. GENERATED FROM PYTHON SOURCE LINES 91-93

.. code-block:: default

    xdem.spatialstats.plot_variogram(df)


.. image-sg:: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_003.png
   :alt: plot variogram estimation modelling
   :srcset: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 94-101

With this plot, it is hard to conclude anything! Properly visualizing the empirical variogram is one of the most
important step. With grid data, we expect short-range correlations close to the resolution of the grid (~20-200
meters), but also possibly longer range correlation due to instrument noise or alignment issues (~1-50 km).

To better visualize the variogram, we can either change the axis to log-scale, but this might make it more difficult
to later compare to variogram models. # Another solution is to split the variogram plot into subpanels, each with
its own linear scale. Both are shown below.

.. GENERATED FROM PYTHON SOURCE LINES 103-104

**Log scale:**

.. GENERATED FROM PYTHON SOURCE LINES 104-106

.. code-block:: default

    xdem.spatialstats.plot_variogram(df, xscale="log")


.. image-sg:: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_004.png
   :alt: plot variogram estimation modelling
   :srcset: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 107-108

**Subpanels with linear scale:**

.. GENERATED FROM PYTHON SOURCE LINES 108-110

.. code-block:: default

    xdem.spatialstats.plot_variogram(df, xscale_range_split=[100, 1000, 10000])


.. image-sg:: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_005.png
   :alt: plot variogram estimation modelling
   :srcset: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_005.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 111-119

We identify:
  - a short-range (i.e., correlation length) correlation, likely due to effects of resolution. It has a large partial sill (correlated variance), meaning that the elevation measurement errors are strongly correlated until a range of ~100 m.
  - a longer range correlation, with a smaller partial sill, meaning the part of the elevation measurement errors remain correlated over a longer distance.

In order to show the difference between accounting only for the most noticeable, short-range correlation, or adding the
long-range correlation, we fit this empirical variogram with two different models: a single spherical model, and
the sum of two spherical models (two ranges). For this, we use :func:`xdem.spatialstats.fit_sum_model_variogram`, which
is based on `scipy.optimize.curve_fit <https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html>`_:

.. GENERATED FROM PYTHON SOURCE LINES 119-134

.. code-block:: default

    func_sum_vgm1, params_vgm1 = xdem.spatialstats.fit_sum_model_variogram(
        list_models=["Spherical"], empirical_variogram=df
    )

    func_sum_vgm2, params_vgm2 = xdem.spatialstats.fit_sum_model_variogram(
        list_models=["Spherical", "Spherical"], empirical_variogram=df
    )

    xdem.spatialstats.plot_variogram(
        df,
        list_fit_fun=[func_sum_vgm1, func_sum_vgm2],
        list_fit_fun_label=["Single-range model", "Double-range model"],
        xscale_range_split=[100, 1000, 10000],
    )


.. image-sg:: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_006.png
   :alt: plot variogram estimation modelling
   :srcset: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_006.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/scipy/optimize/_lsq/trf.py:234: RuntimeWarning: divide by zero encountered in divide
      Delta = norm(x0 * scale_inv / v**0.5)
    /home/docs/checkouts/readthedocs.org/user_builds/xdem-erikm/conda/latest/lib/python3.11/site-packages/scipy/optimize/_lsq/trf.py:363: RuntimeWarning: invalid value encountered in scalar divide
      alpha *= Delta / Delta_new


.. GENERATED FROM PYTHON SOURCE LINES 135-143

The sum of two spherical models fits better, accouting for the small partial sill at longer ranges. Yet this longer
range partial sill (correlated variance) is quite small...

**So one could wonder: is it really important to account for this small additional "bump" in the variogram?**

To answer this, we compute the precision of the DEM integrated over a certain surface area based on spatial integration of the
variogram models using :func:`xdem.spatialstats.neff_circ`, with areas varying from pixel size to grid size.
Numerical and exact integration of variogram is fast, allowing us to estimate errors for a wide range of areas rapidly.

.. GENERATED FROM PYTHON SOURCE LINES 143-161

.. code-block:: default


    areas = np.linspace(20, 10000, 50) ** 2

    list_stderr_singlerange, list_stderr_doublerange, list_stderr_empirical = ([] for i in range(3))
    for area in areas:

        # Number of effective samples integrated over the area for a single-range model
        neff_singlerange = xdem.spatialstats.number_effective_samples(area, params_vgm1)

        # For a double-range model
        neff_doublerange = xdem.spatialstats.number_effective_samples(area, params_vgm2)

        # Convert into a standard error
        stderr_singlerange = nmad(dh.data) / np.sqrt(neff_singlerange)
        stderr_doublerange = nmad(dh.data) / np.sqrt(neff_doublerange)
        list_stderr_singlerange.append(stderr_singlerange)
        list_stderr_doublerange.append(stderr_doublerange)


.. GENERATED FROM PYTHON SOURCE LINES 162-165

We add an empirical error based on intensive Monte-Carlo sampling ("patches" method) to validate our results.
This method is implemented in :func:`xdem.spatialstats.patches_method`. Here, we sample fewer areas to avoid for the
patches method to run over long processing times, increasing from areas of 5 pixels to areas of 10000 pixels exponentially.

.. GENERATED FROM PYTHON SOURCE LINES 165-187

.. code-block:: default


    areas_emp = [4000 * 2 ** (i) for i in range(10)]
    df_patches = xdem.spatialstats.patches_method(dh, gsd=dh.res[0], areas=areas_emp)


    fig, ax = plt.subplots()
    plt.plot(np.asarray(areas) / 1000000, list_stderr_singlerange, label="Single-range spherical model")
    plt.plot(np.asarray(areas) / 1000000, list_stderr_doublerange, label="Double-range spherical model")
    plt.scatter(
        df_patches.exact_areas.values / 1000000,
        df_patches.nmad.values,
        label="Empirical estimate",
        color="black",
        marker="x",
    )
    plt.xlabel("Averaging area (km²)")
    plt.ylabel("Uncertainty in the mean elevation difference (m)")
    plt.xscale("log")
    plt.yscale("log")
    plt.legend()
    plt.show()


.. image-sg:: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_007.png
   :alt: plot variogram estimation modelling
   :srcset: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_007.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 188-190

*Note: in this example, we add a* ``random_state`` *argument to the patches method to yield a reproducible random
sampling, and set* ``n_patches`` *to reduce computing time.*

.. GENERATED FROM PYTHON SOURCE LINES 192-203

Using a single-range variogram highly underestimates the measurement error integrated over an area, by over a factor
of ~100 for large surface areas. Using a double-range variogram brings us closer to the empirical error.

**But, in this case, the error is still too small. Why?**
The small size of the sampling area against the very large range of the noise implies that we might not verify the
assumption of second-order stationarity (see :ref:`spatialstats`). Longer range correlations might be omitted by
our analysis, due to the limits of the variogram sampling. In other words, a small part of the variance could be
fully correlated over a large part of the grid: a vertical bias.

As a first guess for this, let's examine the difference between mean and median to gain some insight on the central
tendency of our sample:

.. GENERATED FROM PYTHON SOURCE LINES 203-207

.. code-block:: default


    diff_med_mean = np.nanmean(dh.data.data) - np.nanmedian(dh.data.data)
    print(f"Difference mean/median: {diff_med_mean:.3f} meters.")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Difference mean/median: -0.823 meters.


.. GENERATED FROM PYTHON SOURCE LINES 208-209

If we now express it as a percentage of the dispersion:

.. GENERATED FROM PYTHON SOURCE LINES 209-212

.. code-block:: default


    print(f"{diff_med_mean/np.nanstd(dh.data)*100:.1f} % of STD.")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    -30.3 % of STD.


.. GENERATED FROM PYTHON SOURCE LINES 213-216

There might be a significant bias of central tendency, i.e. almost fully correlated measurement error across the grid.
If we assume that around 5% of the variance is fully correlated, and re-calculate our elevation measurement errors
accordingly.

.. GENERATED FROM PYTHON SOURCE LINES 216-250

.. code-block:: default


    list_stderr_doublerange_plus_fullycorrelated = []
    for area in areas:

        # For a double-range model
        neff_doublerange = xdem.spatialstats.neff_circular_approx_numerical(area=area, params_variogram_model=params_vgm2)

        # About 5% of the variance might be fully correlated, the other 95% has the random part that we quantified
        stderr_fullycorr = np.sqrt(0.05 * np.nanvar(dh.data))
        stderr_doublerange = np.sqrt(0.95 * np.nanvar(dh.data)) / np.sqrt(neff_doublerange)
        list_stderr_doublerange_plus_fullycorrelated.append(stderr_fullycorr + stderr_doublerange)

    fig, ax = plt.subplots()
    plt.plot(np.asarray(areas) / 1000000, list_stderr_singlerange, label="Single-range spherical model")
    plt.plot(np.asarray(areas) / 1000000, list_stderr_doublerange, label="Double-range spherical model")
    plt.plot(
        np.asarray(areas) / 1000000,
        list_stderr_doublerange_plus_fullycorrelated,
        label="5% fully correlated,\n 95% double-range spherical model",
    )
    plt.scatter(
        df_patches.exact_areas.values / 1000000,
        df_patches.nmad.values,
        label="Empirical estimate",
        color="black",
        marker="x",
    )
    plt.xlabel("Averaging area (km²)")
    plt.ylabel("Uncertainty in the mean elevation difference (m)")
    plt.xscale("log")
    plt.yscale("log")
    plt.legend()
    plt.show()


.. image-sg:: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_008.png
   :alt: plot variogram estimation modelling
   :srcset: /advanced_examples/images/sphx_glr_plot_variogram_estimation_modelling_008.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 251-256

Our final estimation is now very close to the empirical error estimate.

Some take-home points:
  1. Long-range correlations are very important to reliably estimate measurement errors integrated in space, even if they have a small partial sill i.e. correlated variance,
  2. Ideally, the grid must only contain correlation patterns significantly smaller than the grid size to verify second-order stationarity. Otherwise, be wary of small biases of central tendency, i.e. fully correlated measurement errors!


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 42.362 seconds)


.. _sphx_glr_download_advanced_examples_plot_variogram_estimation_modelling.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_variogram_estimation_modelling.py <plot_variogram_estimation_modelling.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_variogram_estimation_modelling.ipynb <plot_variogram_estimation_modelling.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_