Fitting across multiple dimensions#

Suppose you have to fit a single model to multiple data points across some dimension, or even multiple dimensions. The accessor can handle this with ease.

To demonstrate, let’s extend our previous example of fitting a 1D Gaussian peak on a linear background to 2D, where each row contains a Gaussian peak with a different center.

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import lmfit
import xarray_lmfit
# Define coordinates
x = np.linspace(-5.0, 5.0, 100)
y = np.arange(3)

# Center of the peaks along y
center = np.array([-2.0, 0.0, 2.0])[:, np.newaxis]

# Gaussian peak on a linear background
z = -0.1 * x + 2 + 3 * np.exp(-((x - center) ** 2) / (2 * 1**2))

# Add some noise with fixed seed for reproducibility
rng = np.random.default_rng(5)
zerr = np.full_like(z, 0.1)
z = rng.normal(z, zerr)

# Construct DataArray
darr = xr.DataArray(z, dims=["y", "x"], coords={"y": y, "x": x})
darr.plot()
<matplotlib.collections.QuadMesh at 0x702b48685c10>
../_images/f3851089dbf5e32b37d7f48016855fb5dcf8e79d1a3ef77fd1a56e1d998fd72e.png

xarray.DataArray.xlm.modelfit() will automatically broadcast the model parameters across the non-fitting dimensions, allowing you to fit all rows in one go.

model = lmfit.models.GaussianModel() + lmfit.models.LinearModel()

params = {"center": 0.0, "slope": -0.1}

result_ds = darr.xlm.modelfit(coords="x", model=model, params=params)
result_ds
<xarray.Dataset> Size: 8kB
Dimensions:                (y: 3, param: 5, cov_i: 5, cov_j: 5, fit_stat: 9,
                            x: 100)
Coordinates:
  * y                      (y) int64 24B 0 1 2
  * x                      (x) float64 800B -5.0 -4.899 -4.798 ... 4.899 5.0
  * param                  (param) <U9 180B 'amplitude' 'center' ... 'intercept'
  * fit_stat               (fit_stat) <U8 288B 'nfev' 'nvarys' ... 'rsquared'
  * cov_i                  (cov_i) <U9 180B 'amplitude' 'center' ... 'intercept'
  * cov_j                  (cov_j) <U9 180B 'amplitude' 'center' ... 'intercept'
Data variables:
    modelfit_results       (y) object 24B <lmfit.model.ModelResult object at ...
    modelfit_coefficients  (y, param) float64 120B 7.324 -2.0 ... -0.1044 1.993
    modelfit_stderr        (y, param) float64 120B 0.1349 0.011 ... 0.01735
    modelfit_covariance    (y, cov_i, cov_j) float64 600B 0.01819 ... 0.0003009
    modelfit_stats         (y, fit_stat) float64 216B 51.0 5.0 ... -450.2 0.9895
    modelfit_data          (y, x) float64 2kB 2.453 2.402 2.515 ... 1.404 1.64
    modelfit_best_fit      (y, x) float64 2kB 2.553 2.553 2.557 ... 1.526 1.504

Note that xarray.DataArray.xlm.modelfit() also allows params to be provided as a dictionary, structured like the keyword arguments to lmfit.model.Model.make_params() or lmfit.parameter.create_params().

Providing initial guesses#

What if you want to provide different initial guesses for each row? Using the powerful broadcasting capabilities of xarray, you can provide initial guesses and bounds for the fitting parameters as xarray.DataArrays.

For instance, if we want to provide different initial guesses for the peak positions along y, we can do so by passing a dictionary of DataArrays to the params argument.

model = lmfit.models.GaussianModel() + lmfit.models.LinearModel()

params = {
    "center": xr.DataArray([-2, 0, 2], coords=[darr.y]),
    "slope": -0.1,
}

result_ds = darr.xlm.modelfit(coords="x", model=model, params=params)
result_ds
<xarray.Dataset> Size: 8kB
Dimensions:                (y: 3, param: 5, cov_i: 5, cov_j: 5, fit_stat: 9,
                            x: 100)
Coordinates:
  * y                      (y) int64 24B 0 1 2
  * x                      (x) float64 800B -5.0 -4.899 -4.798 ... 4.899 5.0
  * param                  (param) <U9 180B 'amplitude' 'center' ... 'intercept'
  * fit_stat               (fit_stat) <U8 288B 'nfev' 'nvarys' ... 'rsquared'
  * cov_i                  (cov_i) <U9 180B 'amplitude' 'center' ... 'intercept'
  * cov_j                  (cov_j) <U9 180B 'amplitude' 'center' ... 'intercept'
Data variables:
    modelfit_results       (y) object 24B <lmfit.model.ModelResult object at ...
    modelfit_coefficients  (y, param) float64 120B 7.324 -2.0 ... -0.1044 1.993
    modelfit_stderr        (y, param) float64 120B 0.1349 0.011 ... 0.01735
    modelfit_covariance    (y, cov_i, cov_j) float64 600B 0.01819 ... 0.000301
    modelfit_stats         (y, fit_stat) float64 216B 31.0 5.0 ... -450.2 0.9895
    modelfit_data          (y, x) float64 2kB 2.453 2.402 2.515 ... 1.404 1.64
    modelfit_best_fit      (y, x) float64 2kB 2.553 2.553 2.557 ... 1.526 1.504

Let’s overlay the fitted peak positions on the data.

result_ds.modelfit_data.plot()
result_center = result_ds.sel(param="center")

plt.plot(result_center.modelfit_coefficients, result_center.y, "o-")
[<matplotlib.lines.Line2D at 0x702b46163f20>]
../_images/e4948f45732a11ed876ef669702dbd5adeb572c02237a49e75c2efeb452fbe3d.png

The same can be done with all parameter attributes that can be passed to lmfit.parameter.create_params() (e.g., vary, min, max, etc.). For example:

model = lmfit.models.GaussianModel() + lmfit.models.LinearModel()

params = {
    "center": {
        "value": xr.DataArray([-2, 0, 2], coords=[darr.y]),
        "min": -5.0,
        "max": xr.DataArray([0, 2, 5], coords=[darr.y]),
    },
    "slope": -0.1,
}

result_ds = darr.xlm.modelfit(coords="x", model=model, params=params)
result_ds
<xarray.Dataset> Size: 8kB
Dimensions:                (y: 3, param: 5, cov_i: 5, cov_j: 5, fit_stat: 9,
                            x: 100)
Coordinates:
  * y                      (y) int64 24B 0 1 2
  * x                      (x) float64 800B -5.0 -4.899 -4.798 ... 4.899 5.0
  * param                  (param) <U9 180B 'amplitude' 'center' ... 'intercept'
  * fit_stat               (fit_stat) <U8 288B 'nfev' 'nvarys' ... 'rsquared'
  * cov_i                  (cov_i) <U9 180B 'amplitude' 'center' ... 'intercept'
  * cov_j                  (cov_j) <U9 180B 'amplitude' 'center' ... 'intercept'
Data variables:
    modelfit_results       (y) object 24B <lmfit.model.ModelResult object at ...
    modelfit_coefficients  (y, param) float64 120B 7.324 -2.0 ... -0.1044 1.993
    modelfit_stderr        (y, param) float64 120B 0.1349 0.011 ... 0.01735
    modelfit_covariance    (y, cov_i, cov_j) float64 600B 0.01819 ... 0.000301
    modelfit_stats         (y, fit_stat) float64 216B 31.0 5.0 ... -450.2 0.9895
    modelfit_data          (y, x) float64 2kB 2.453 2.402 2.515 ... 1.404 1.64
    modelfit_best_fit      (y, x) float64 2kB 2.553 2.553 2.557 ... 1.526 1.504

Parallelization#

The accessors are tightly integrated with xarray, so passing a dask array will parallelize the fitting process. See Parallel Computing with Dask for background on how xarray and dask work together.

Assuming you have dask installed with a dask scheduler set up, you can fit large datasets in parallel with ease.

For example, recall the previous example where we created a DataArray with 3 Gaussian peaks, each with a different center, except this time we will create not just 3, but 300 peaks.

# Define coordinates
x = np.linspace(-5.0, 5.0, 100)
y = np.arange(300)

# Center of the peaks along y
center = np.linspace(-2.0, 2.0, 300)[:, np.newaxis]

# Gaussian peak on a linear background
z = -0.1 * x + 2 + 3 * np.exp(-((x - center) ** 2) / (2 * 1**2))

# Construct DataArray
darr = xr.DataArray(z, dims=["y", "x"], coords={"y": y, "x": x})
darr.plot()
<matplotlib.collections.QuadMesh at 0x702b4615e300>
../_images/f878c35e2371a82a615a250672795565718b4eaa309f9fc16e0301cc42dcc64e.png

Now, let’s try chunking the data and converting it to a dask array before fitting:

darr = darr.chunk({"y": 50})
darr
<xarray.DataArray (y: 300, x: 100)> Size: 240kB
dask.array<xarray-<this-array>, shape=(300, 100), dtype=float64, chunksize=(50, 100), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 2kB 0 1 2 3 4 5 6 7 8 ... 292 293 294 295 296 297 298 299
  * x        (x) float64 800B -5.0 -4.899 -4.798 -4.697 ... 4.798 4.899 5.0

When xarray.DataArray.xlm.modelfit() is called on a dask array, the fitting is not performed immediately. Instead, a dask graph is created that represents the computation to be performed:

model = lmfit.models.GaussianModel() + lmfit.models.LinearModel()

params = {
    "center": xr.DataArray(np.linspace(-2.0, 2.0, 300), coords=[darr.y]),
    "slope": -0.1,
}

result = darr.xlm.modelfit(coords="x", model=model, params=params)
result
<xarray.Dataset> Size: 592kB
Dimensions:                (y: 300, param: 5, cov_i: 5, cov_j: 5, fit_stat: 9,
                            x: 100)
Coordinates:
  * y                      (y) int64 2kB 0 1 2 3 4 5 ... 294 295 296 297 298 299
  * x                      (x) float64 800B -5.0 -4.899 -4.798 ... 4.899 5.0
  * param                  (param) <U9 180B 'amplitude' 'center' ... 'intercept'
  * fit_stat               (fit_stat) <U8 288B 'nfev' 'nvarys' ... 'rsquared'
  * cov_i                  (cov_i) <U9 180B 'amplitude' 'center' ... 'intercept'
  * cov_j                  (cov_j) <U9 180B 'amplitude' 'center' ... 'intercept'
Data variables:
    modelfit_results       (y) object 2kB dask.array<chunksize=(50,), meta=np.ndarray>
    modelfit_coefficients  (y, param) float64 12kB dask.array<chunksize=(50, 5), meta=np.ndarray>
    modelfit_stderr        (y, param) float64 12kB dask.array<chunksize=(50, 5), meta=np.ndarray>
    modelfit_covariance    (y, cov_i, cov_j) float64 60kB dask.array<chunksize=(50, 5, 5), meta=np.ndarray>
    modelfit_stats         (y, fit_stat) float64 22kB dask.array<chunksize=(50, 9), meta=np.ndarray>
    modelfit_data          (y, x) float64 240kB dask.array<chunksize=(50, 100), meta=np.ndarray>
    modelfit_best_fit      (y, x) float64 240kB dask.array<chunksize=(50, 100), meta=np.ndarray>

You can call .compute() on the entire Dataset, or on individual variables to perform the fitting in parallel.

result.compute()
<xarray.Dataset> Size: 592kB
Dimensions:                (y: 300, param: 5, cov_i: 5, cov_j: 5, fit_stat: 9,
                            x: 100)
Coordinates:
  * y                      (y) int64 2kB 0 1 2 3 4 5 ... 294 295 296 297 298 299
  * x                      (x) float64 800B -5.0 -4.899 -4.798 ... 4.899 5.0
  * param                  (param) <U9 180B 'amplitude' 'center' ... 'intercept'
  * fit_stat               (fit_stat) <U8 288B 'nfev' 'nvarys' ... 'rsquared'
  * cov_i                  (cov_i) <U9 180B 'amplitude' 'center' ... 'intercept'
  * cov_j                  (cov_j) <U9 180B 'amplitude' 'center' ... 'intercept'
Data variables:
    modelfit_results       (y) object 2kB <lmfit.model.ModelResult object at ...
    modelfit_coefficients  (y, param) float64 12kB 7.52 -2.0 1.0 ... -0.1 2.0
    modelfit_stderr        (y, param) float64 12kB 6.986e-15 ... 1.121e-15
    modelfit_covariance    (y, cov_i, cov_j) float64 60kB 4.881e-29 ... 1.256...
    modelfit_stats         (y, fit_stat) float64 22kB 13.0 5.0 ... 1.0
    modelfit_data          (y, x) float64 240kB 2.533 2.535 2.54 ... 1.555 1.533
    modelfit_best_fit      (y, x) float64 240kB 2.533 2.535 2.54 ... 1.555 1.533