Tutorial
Introduction to Nonlinear Regression
Assume that, for the $i$th observation, the relationship between the independent variable $\mathbf{x}_i$ and the dependent variable $Y_i$ follows:

$$Y_i = m(\mathbf{x}_i, \boldsymbol{\gamma}) + \epsilon_i$$

where $m$ is a non-linear model function that depends on the independent variable $\mathbf{x}_i$ and the parameter vector $\boldsymbol{\gamma}$, and $\epsilon_i$ is the error term. In order to find the parameter $\boldsymbol{\gamma}$ that "best" fits our data, we choose the parameter which minimizes the sum of squared residuals, i.e. solves the problem:

$$\underset{\boldsymbol{\gamma}}{\min} \; S(\boldsymbol{\gamma}) = \sum_{i=1}^{n} \left[ m(\mathbf{x}_i, \boldsymbol{\gamma}) - Y_i \right]^2$$
Given that the function $m$ is non-linear, there is no analytical solution for the best $\boldsymbol{\gamma}$. We have to use computational tools, LsqFit.jl in this tutorial, to find the least squares solution.
One example of a non-linear model is the exponential model, which takes a one-element predictor variable $t$. The model function is:

$$m(t, \boldsymbol{\gamma}) = \gamma_1 \exp(\gamma_2 t)$$

and the model becomes:

$$Y_i = \gamma_1 \exp(\gamma_2 t_i) + \epsilon_i$$
To fit data using LsqFit.jl, pass the defined model function (m), the data (tdata and ydata) and the initial parameter values (p0) to curve_fit(). For now, LsqFit.jl only supports the Levenberg-Marquardt algorithm.
julia> # t: array of independent variables
julia> # p: array of model parameters
julia> m(t, p) = p[1] * exp.(p[2] * t)
julia> p0 = [0.5, 0.5]
julia> fit = curve_fit(m, tdata, ydata, p0)
It will return a composite type LsqFitResult, with some interesting values (a short usage sketch follows the list):

- dof(fit): degrees of freedom
- coef(fit): best fit parameters
- fit.resid: vector of residuals
- fit.jacobian: estimated Jacobian at the solution
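As a quick check of these fields, here is a minimal sketch that generates synthetic data for the exponential model above and refits it; the data (tdata, ydata), the noise level, and the "true" parameters [1.0, 2.0] are made-up values used only for illustration:

using LsqFit

m(t, p) = p[1] * exp.(p[2] * t)                      # same model as above
tdata = collect(range(0, stop=1, length=50))         # hypothetical design points
ydata = m(tdata, [1.0, 2.0]) + 0.01 * randn(length(tdata))   # noisy observations

p0  = [0.5, 0.5]
fit = curve_fit(m, tdata, ydata, p0)

println(dof(fit))              # degrees of freedom: n minus number of parameters
println(coef(fit))             # best fit parameters, close to [1.0, 2.0]
println(sum(abs2, fit.resid))  # sum of squared residuals at the solution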
Jacobian Calculation
The Jacobian $J_f(\mathbf{x})$ of a vector function $f(\mathbf{x}): \mathbb{R}^p \rightarrow \mathbb{R}^n$ is defined as the matrix with elements:

$$[J_f(\mathbf{x})]_{ij} = \frac{\partial f_i(\mathbf{x})}{\partial x_j}$$

The matrix is therefore:

$$J_f(\mathbf{x}) = \begin{bmatrix} \frac{\partial f_1(\mathbf{x})}{\partial x_1} & \cdots & \frac{\partial f_1(\mathbf{x})}{\partial x_p} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n(\mathbf{x})}{\partial x_1} & \cdots & \frac{\partial f_n(\mathbf{x})}{\partial x_p} \end{bmatrix}$$

The Jacobian of the exponential model function with respect to $\boldsymbol{\gamma}$ is:

$$J_m(\mathbf{t}, \boldsymbol{\gamma}) = \begin{bmatrix} \exp(\gamma_2 t_1) & t_1 \gamma_1 \exp(\gamma_2 t_1) \\ \vdots & \vdots \\ \exp(\gamma_2 t_n) & t_n \gamma_1 \exp(\gamma_2 t_n) \end{bmatrix}$$
By default, finite differences are used (see NLSolversBase.jl for more information) to approximate the Jacobian for the data fitting algorithm and the covariance computation. Alternatively, a function which calculates the Jacobian can be supplied to curve_fit() for faster and/or more accurate results.
function j_m(t,p)
    J = Array{Float64}(undef, length(t), length(p))
    J[:,1] = exp.(p[2] .* t)        #df/dp[1]
    J[:,2] = t .* p[1] .* J[:,1]    #df/dp[2]
    J
end
fit = curve_fit(m, j_m, tdata, ydata, p0)
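Before supplying j_m to curve_fit(), it can be sanity-checked against a crude central-difference approximation; this is just a sketch, evaluated at an arbitrary made-up parameter vector:

p = [1.0, 2.0]                            # arbitrary parameter values for the check
t = collect(range(0, stop=1, length=10))

J_analytic = j_m(t, p)

δ = 1e-6
J_fd = similar(J_analytic)
for k in eachindex(p)
    pp = copy(p); pm = copy(p)
    pp[k] += δ; pm[k] -= δ
    J_fd[:, k] = (m(t, pp) - m(t, pm)) / (2δ)   # central difference in parameter k
end

maximum(abs.(J_analytic - J_fd))          # should be tiny (truncation/round-off only)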
Linear Approximation
The non-linear function $m$ can be approximated as a linear function by Taylor expansion:

$$m(\mathbf{x}_i, \boldsymbol{\gamma} + \boldsymbol{h}) \approx m(\mathbf{x}_i, \boldsymbol{\gamma}) + \nabla m(\mathbf{x}_i, \boldsymbol{\gamma})' \boldsymbol{h}$$

where $\boldsymbol{\gamma}$ is a fixed parameter vector and $\boldsymbol{h}$ is a small perturbation.

Consider the residual vector function $r(\boldsymbol{\gamma})$ with entries:

$$r_i(\boldsymbol{\gamma}) = m(\mathbf{x}_i, \boldsymbol{\gamma}) - Y_i$$

Each entry's linear approximation can hence be written as:

$$r_i(\boldsymbol{\gamma} + \boldsymbol{h}) \approx r_i(\boldsymbol{\gamma}) + \nabla m(\mathbf{x}_i, \boldsymbol{\gamma})' \boldsymbol{h}$$

Since the $i$th row of the Jacobian $J(\boldsymbol{\gamma})$ is exactly $\nabla m(\mathbf{x}_i, \boldsymbol{\gamma})'$, the approximation in matrix form is:

$$r(\boldsymbol{\gamma} + \boldsymbol{h}) \approx r(\boldsymbol{\gamma}) + J(\boldsymbol{\gamma}) \boldsymbol{h}$$

which is a linear function on $\boldsymbol{h}$ since $\boldsymbol{\gamma}$ is fixed.
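The quality of this linear approximation is easy to check numerically; the sketch below reuses the model m, the Jacobian j_m, and the synthetic tdata/ydata introduced in the earlier sketches (all illustrative assumptions):

γ  = [1.0, 2.0]
h  = [1e-3, -2e-3]                  # a small perturbation of the parameters

res(γ) = m(tdata, γ) - ydata        # residual vector function r(γ)
J = j_m(tdata, γ)

exact  = res(γ + h)                 # residuals at the perturbed parameters
approx = res(γ) + J * h             # first-order (linear) approximation

maximum(abs.(exact - approx))       # small, of second order in h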
Goodness of Fit
The linear approximation of the non-linear least squares problem leads to an approximation of the covariance matrix of the parameter estimates, from which we can perform regression analysis.
Consider a least squares solution $\boldsymbol{\gamma}^*$, which is a local minimizer of the non-linear problem:

$$\boldsymbol{\gamma}^* = \underset{\boldsymbol{\gamma}}{\operatorname{argmin}} \; \sum_{i=1}^{n} \left[ m(\mathbf{x}_i, \boldsymbol{\gamma}) - Y_i \right]^2$$

Set $\boldsymbol{\gamma} = \boldsymbol{\gamma}^* + \boldsymbol{h}$. Using the linear approximation $r(\boldsymbol{\gamma}^* + \boldsymbol{h}) \approx r(\boldsymbol{\gamma}^*) + J(\boldsymbol{\gamma}^*)\boldsymbol{h}$, minimizing the sum of squared residuals around $\boldsymbol{\gamma}^*$ is essentially the linear least squares problem:

$$\underset{\boldsymbol{h}}{\min} \; \lVert J\boldsymbol{h} + r \rVert^2$$

where $J = J(\boldsymbol{\gamma}^*)$ and $r = r(\boldsymbol{\gamma}^*)$, with the analytical solution $\boldsymbol{h}^* = -(J'J)^{-1} J' r$.

The covariance matrix for the analytical solution is:

$$\operatorname{Cov}(\boldsymbol{\gamma}^*) = \sigma^2 (J'J)^{-1}$$

Note that $\sigma^2$, the variance of the errors, is unknown and has to be estimated from the data.

Assume the errors in each sample are independent, normally distributed with zero mean and the same variance, i.e. $\epsilon \sim N(0, \sigma^2 I)$. Then $\sigma^2$ is estimated by:

$$\hat{\sigma}^2 = \frac{\lVert r(\boldsymbol{\gamma}^*) \rVert^2}{n - p}$$

where $n - p$ is the degrees of freedom returned by dof(fit).
In LsqFit.jl, the covariance matrix calculation uses QR decomposition ($J = QR$) to be more computationally stable. The estimate then has the form:

$$\operatorname{Cov}(\boldsymbol{\gamma}^*) = \hat{\sigma}^2 R^{-1} (R^{-1})'$$
vcov() computes the covariance matrix of the fit:
julia> cov = vcov(fit)
2×2 Array{Float64,2}:
0.000116545 0.000174633
0.000174633 0.00258261
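Under the homoskedastic-error assumption, the same matrix can be reproduced by hand from the stored Jacobian and residuals. This is only a sketch of the formula above; internally LsqFit.jl uses the QR form:

J   = fit.jacobian
r   = fit.resid
mse = sum(abs2, r) / dof(fit)    # estimated error variance
mse * inv(J' * J)                # ≈ vcov(fit), up to numerical differences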
The standard error is then the square root of each diagonal element of the covariance matrix. stderror() returns the standard error of each parameter:
julia> se = stderror(fit)
2-element Array{Float64,1}:
0.0114802
0.0520416
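Equivalently, the standard errors are the square roots of the diagonal of the covariance matrix; the one-liner below (diag comes from the standard library LinearAlgebra) should reproduce stderror(fit):

using LinearAlgebra

sqrt.(diag(vcov(fit)))   # same values as stderror(fit)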
margin_error() computes the product of the standard error and the critical value of each parameter at a certain significance level (the default is 5%) from the t-distribution. The margin of error at the 10% significance level can be computed by:
julia> margin_of_error = margin_error(fit, 0.1)
2-element Array{Float64,1}:
0.0199073
0.0902435
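The formula behind margin_error() can be spelled out with Distributions.jl; the package is not required by LsqFit.jl and is only used here to show where the critical value comes from:

using Distributions

α    = 0.1                                   # significance level
crit = quantile(TDist(dof(fit)), 1 - α/2)    # two-sided t critical value
crit .* stderror(fit)                        # ≈ margin_error(fit, 0.1)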
confint() returns the confidence interval of each parameter at a certain significance level, which is essentially the estimated value ± the margin of error. To get the confidence interval at the 10% significance level, run:
julia> confidence_intervals = confint(fit; level=0.9)
2-element Array{Tuple{Float64,Float64},1}:
(0.976316, 1.01613)
(1.91047, 2.09096)
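The same intervals can be assembled by hand from the estimates and the margin of error, which makes the "estimate ± margin of error" relationship explicit:

lower = coef(fit) .- margin_error(fit, 0.1)
upper = coef(fit) .+ margin_error(fit, 0.1)
collect(zip(lower, upper))   # ≈ confint(fit; level=0.9)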
Weighted Least Squares
curve_fit() also accepts a weight parameter (wt) to perform Weighted Least Squares and General Least Squares, where the parameters are estimated by minimizing a weighted sum of squared residuals.

The weight parameter (wt) is an array or a matrix of weights for each sample. To perform Weighted Least Squares, pass the weight array [w_1, w_2, ..., w_n] or the weight matrix W:

$$W = \begin{bmatrix} w_1 & & \\ & \ddots & \\ & & w_n \end{bmatrix}$$
The weighted least squares problem becomes:

$$\underset{\boldsymbol{\gamma}}{\min} \; \sum_{i=1}^{n} w_i \left[ m(\mathbf{x}_i, \boldsymbol{\gamma}) - Y_i \right]^2$$

or, in matrix form:

$$\underset{\boldsymbol{\gamma}}{\min} \; r(\boldsymbol{\gamma})' W r(\boldsymbol{\gamma})$$

where $r(\boldsymbol{\gamma})$ is the residual vector function with entries:

$$r_i(\boldsymbol{\gamma}) = m(\mathbf{x}_i, \boldsymbol{\gamma}) - Y_i$$
The algorithm in LsqFit.jl will then provide a least squares solution $\boldsymbol{\gamma}^*$ to this weighted problem.

In LsqFit.jl, the residual function passed to levenberg_marquardt() is in a different format. If the weight is a vector:
r(p) = sqrt.(wt) .* ( model(xpts, p) - ydata )
lmfit(r, g, p0, wt; kwargs...)
A Cholesky decomposition, which is effectively a square root of a matrix, will be performed if the weight is a matrix:
u = chol(wt)
r(p) = u * ( model(xpts, p) - ydata )
lmfit(r, p0, wt; kwargs...)
The solution will be the same as that of the weighted least squares problem stated above.
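Because a weight vector and the corresponding diagonal weight matrix describe the same objective, both forms of wt should lead to the same fit. The sketch below uses a hypothetical vector yvar of known error variances, introduced here only for illustration:

using LinearAlgebra

yvar = fill(0.01, length(tdata))     # hypothetical known error variances

w = 1 ./ yvar                        # weight vector
W = Matrix(Diagonal(w))              # equivalent diagonal weight matrix

fit_vec = curve_fit(m, tdata, ydata, w, p0)
fit_mat = curve_fit(m, tdata, ydata, W, p0)

coef(fit_vec) ≈ coef(fit_mat)        # expected to hold: both solve the same problem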
Set $\boldsymbol{\gamma} = \boldsymbol{\gamma}^* + \boldsymbol{h}$ as before. The analytical solution to the linear approximation of the weighted problem is:

$$\boldsymbol{h}^* = -(J' W J)^{-1} J' W r$$

Assume the errors in each sample are independent, normally distributed with zero mean and different variances (heteroskedastic error), i.e. $\epsilon \sim N(0, \Sigma)$ with $\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$.

If we know the error variances and set the weight as the inverse of the variance (the optimal weight), i.e. $W = \Sigma^{-1}$, the covariance matrix is now:

$$\operatorname{Cov}(\boldsymbol{\gamma}^*) = (J' W J)^{-1}$$
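This covariance formula can be evaluated directly from the analytic Jacobian at the weighted solution; the snippet below is only a sketch of the formula, reusing the hypothetical yvar from the previous sketch:

using LinearAlgebra

fit_w = curve_fit(m, tdata, ydata, 1 ./ yvar, p0)

Jw = j_m(tdata, coef(fit_w))
W  = Diagonal(1 ./ yvar)
inv(Jw' * W * Jw)    # (J' W J)⁻¹ under the optimal-weight assumption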
If we only know the relative ratios of the different variances, i.e. $\epsilon \sim N(0, \sigma^2 W^{-1})$ where $\sigma^2$ would still have to be estimated, curve_fit() currently does not support this implementation. curve_fit() assumes the weight is the inverse of the error covariance matrix rather than a ratio of it, i.e. the covariance of the estimated parameters is calculated as covar = inv(J'*fit.wt*J).
Passing a vector of ones as the weight vector will therefore cause mistakes in the covariance estimation.
Pass the vector of 1 ./ var(ε) or the matrix inv(covar(ε)) as the weight parameter (wt) to the function curve_fit():
julia> wt = 1 ./ yvar
julia> fit = curve_fit(m, tdata, ydata, wt, p0)
julia> cov = vcov(fit)
If the weight matrix is not a diagonal matrix, General Least Squares will be performed.
General Least Squares
Assume the errors in each sample are correlated, normally distributed with zero mean and different variances (heteroskedastic and autocorrelated error), i.e. $\epsilon \sim N(0, \Sigma)$ where $\Sigma$ is no longer a diagonal matrix.

Set the weight matrix as the inverse of the error covariance matrix (the optimal weight), i.e. $W = \Sigma^{-1}$.
Pass the matrix inv(covar(ε)) as the weight parameter (wt) to the function curve_fit():

julia> wt = inv(cov_ε)
julia> fit = curve_fit(m, tdata, ydata, wt, p0)
julia> cov = vcov(fit)
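The matrix cov_ε is not defined anywhere above; the following sketch builds a hypothetical AR(1)-style error covariance (the values of ρ and σ² are made up) so the example can actually be run:

n = length(tdata)
ρ, σ² = 0.5, 0.01
cov_ε = [σ² * ρ^abs(i - j) for i in 1:n, j in 1:n]   # hypothetical correlated errors

wt = inv(cov_ε)
wt = (wt + wt') / 2                  # symmetrize to guard against round-off
fit_gls = curve_fit(m, tdata, ydata, wt, p0)
vcov(fit_gls)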
Estimate the Optimal Weight
In most cases, the variances of the errors are unknown. To perform Weighted Least Squares, we need to estimate the error variances first; they can be estimated by the squared residuals of an unweighted fit. Unweighted fitting (OLS) will return the residuals we need, since the OLS estimator is unbiased. Then pass the reciprocal of the squared residuals as the estimated optimal weight to perform Weighted Least Squares:
julia> fit_OLS = curve_fit(m, tdata, ydata, p0)
julia> wt = 1 ./ fit_OLS.resid.^2
julia> fit_WLS = curve_fit(m, tdata, ydata, wt, p0)
julia> cov = vcov(fit_WLS)