Fitting across multiple dimensions#

Suppose you have to fit a single model to multiple data points across some dimension, or even multiple dimensions. The accessor can handle this with ease.

To demonstrate, let’s extend our previous example of fitting a 1D Gaussian peak on a linear background to 2D, where each row contains a Gaussian peak with a different center.

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import lmfit
import xarray_lmfit

# Define coordinates
x = np.linspace(-5.0, 5.0, 100)
y = np.arange(3)

# Center of the peaks along y
center = np.array([-2.0, 0.0, 2.0])[:, np.newaxis]

# Gaussian peak on a linear background
z = -0.1 * x + 2 + 3 * np.exp(-((x - center) ** 2) / (2 * 1**2))

# Add some noise with fixed seed for reproducibility
rng = np.random.default_rng(5)
zerr = np.full_like(z, 0.1)
z = rng.normal(z, zerr)

# Construct DataArray
darr = xr.DataArray(z, dims=["y", "x"], coords={"y": y, "x": x})
darr.plot()

<matplotlib.collections.QuadMesh at 0x702b48685c10>

../_images/f3851089dbf5e32b37d7f48016855fb5dcf8e79d1a3ef77fd1a56e1d998fd72e.png

xarray.DataArray.xlm.modelfit() will automatically broadcast the model parameters across the non-fitting dimensions, allowing you to fit all rows in one go.

Note that xarray.DataArray.xlm.modelfit() also allows params to be provided as a dictionary, structured like the keyword arguments to lmfit.model.Model.make_params() or lmfit.parameter.create_params().

Parallelization#

The accessors are tightly integrated with xarray, so passing a dask array will parallelize the fitting process. See Parallel Computing with Dask for background on how xarray and dask work together.

Assuming you have dask installed with a dask scheduler set up, you can fit large datasets in parallel with ease.

For example, recall the previous example where we created a DataArray with 3 Gaussian peaks, each with a different center, except this time we will create not just 3, but 300 peaks.

# Define coordinates
x = np.linspace(-5.0, 5.0, 100)
y = np.arange(300)

# Center of the peaks along y
center = np.linspace(-2.0, 2.0, 300)[:, np.newaxis]

# Gaussian peak on a linear background
z = -0.1 * x + 2 + 3 * np.exp(-((x - center) ** 2) / (2 * 1**2))

# Construct DataArray
darr = xr.DataArray(z, dims=["y", "x"], coords={"y": y, "x": x})
darr.plot()

<matplotlib.collections.QuadMesh at 0x702b4615e300>

../_images/f878c35e2371a82a615a250672795565718b4eaa309f9fc16e0301cc42dcc64e.png

Now, let’s try chunking the data and converting it to a dask array before fitting:

darr = darr.chunk({"y": 50})
darr

<xarray.DataArray (y: 300, x: 100)> Size: 240kB
dask.array<xarray-<this-array>, shape=(300, 100), dtype=float64, chunksize=(50, 100), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 2kB 0 1 2 3 4 5 6 7 8 ... 292 293 294 295 296 297 298 299
  * x        (x) float64 800B -5.0 -4.899 -4.798 -4.697 ... 4.798 4.899 5.0

When xarray.DataArray.xlm.modelfit() is called on a dask array, the fitting is not performed immediately. Instead, a dask graph is created that represents the computation to be performed:

model = lmfit.models.GaussianModel() + lmfit.models.LinearModel()

params = {
    "center": xr.DataArray(np.linspace(-2.0, 2.0, 300), coords=[darr.y]),
    "slope": -0.1,
}

result = darr.xlm.modelfit(coords="x", model=model, params=params)
result

<xarray.Dataset> Size: 592kB
Dimensions:                (y: 300, param: 5, cov_i: 5, cov_j: 5, fit_stat: 9,
                            x: 100)
Coordinates:
  * y                      (y) int64 2kB 0 1 2 3 4 5 ... 294 295 296 297 298 299
  * x                      (x) float64 800B -5.0 -4.899 -4.798 ... 4.899 5.0
  * param                  (param) <U9 180B 'amplitude' 'center' ... 'intercept'
  * fit_stat               (fit_stat) <U8 288B 'nfev' 'nvarys' ... 'rsquared'
  * cov_i                  (cov_i) <U9 180B 'amplitude' 'center' ... 'intercept'
  * cov_j                  (cov_j) <U9 180B 'amplitude' 'center' ... 'intercept'
Data variables:
    modelfit_results       (y) object 2kB dask.array<chunksize=(50,), meta=np.ndarray>
    modelfit_coefficients  (y, param) float64 12kB dask.array<chunksize=(50, 5), meta=np.ndarray>
    modelfit_stderr        (y, param) float64 12kB dask.array<chunksize=(50, 5), meta=np.ndarray>
    modelfit_covariance    (y, cov_i, cov_j) float64 60kB dask.array<chunksize=(50, 5, 5), meta=np.ndarray>
    modelfit_stats         (y, fit_stat) float64 22kB dask.array<chunksize=(50, 9), meta=np.ndarray>
    modelfit_data          (y, x) float64 240kB dask.array<chunksize=(50, 100), meta=np.ndarray>
    modelfit_best_fit      (y, x) float64 240kB dask.array<chunksize=(50, 100), meta=np.ndarray>

You can call .compute() on the entire Dataset, or on individual variables to perform the fitting in parallel.

Fitting across multiple dimensions#

Providing initial guesses#

Parallelization#

This Page