
Logistic regression


This tutorial was generated using Literate.jl. Download the source as a .jl file.

This tutorial was originally contributed by François Pacaud.

This tutorial shows how to solve a logistic regression problem with JuMP. Logistic regression is a well-known method in machine learning, useful when we want to classify binary variables with the help of a given set of features. To this end, we find the combination of features that maximizes the (log-)likelihood over a training set. From a modern optimization standpoint, the resulting problem is convex and differentiable; better still, it is conic representable.

Formulating the logistic regression problem

Suppose we have a set of training data points $i = 1, \dots, n$, where for each $i$ we have a vector of features $x_i \in \mathbb{R}^p$ and a categorical observation $y_i \in \{-1, 1\}$.

The log-likelihood is given by

$$l(\theta) = \sum_{i=1}^n \log\left(\frac{1}{1 + \exp(-y_i \theta^\top x_i)}\right),$$

and the optimal $\theta$ minimizes the logistic loss function:

$$\min_{\theta} \; \sum_{i=1}^n \log(1 + \exp(-y_i \theta^\top x_i))$$

Most of the time, instead of solving the previous optimization problem directly, we prefer to add a regularization term:

$$\min_{\theta} \; \sum_{i=1}^n \log(1 + \exp(-y_i \theta^\top x_i)) + \lambda \|\theta\|,$$

with $\lambda > 0$ a penalty parameter and $\|\cdot\|$ a norm function. By adding such a regularization term, we avoid overfitting on the training set and usually achieve a greater score in cross-validation.

Reformulation as a conic optimization problem

By introducing the auxiliary variables $t_1, \dots, t_n$ and $r$, the optimization problem is equivalent to

$$\begin{aligned}
\min_{t, r, \theta} \; & \sum_{i=1}^n t_i + \lambda r \\
\text{subject to} \quad & t_i \geq \log(1 + \exp(-y_i \theta^\top x_i)), \quad i = 1, \dots, n, \\
& r \geq \|\theta\|.
\end{aligned}$$

Now, the trick is to reformulate the constraints $t_i \geq \log(1 + \exp(-y_i \theta^\top x_i))$ with the help of the exponential cone

$$K_{\exp} = \{ (x, y, z) \in \mathbb{R}^3 : \; y \exp(x / y) \leq z, \; y > 0 \}.$$

Indeed, by passing to the exponential, we see that for all $i = 1, \dots, n$, the constraint $t_i \geq \log(1 + \exp(u_i))$ is equivalent to

$$\exp(-t_i) + \exp(u_i - t_i) \leq 1,$$

with $u_i = -y_i \theta^\top x_i$. Then, by adding two auxiliary variables $z_{i,1}$ and $z_{i,2}$ such that $z_{i,1} \geq \exp(u_i - t_i)$ and $z_{i,2} \geq \exp(-t_i)$, we get the equivalent formulation

$$\left\{
\begin{aligned}
(u_i - t_i, 1, z_{i,1}) & \in K_{\exp}, \\
(-t_i, 1, z_{i,2}) & \in K_{\exp}, \\
z_{i,1} + z_{i,2} & \leq 1.
\end{aligned}
\right.$$
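As a quick sanity check (not part of the original tutorial), the equivalence between $t \geq \log(1 + \exp(u))$ and $\exp(-t) + \exp(u - t) \leq 1$ can be verified numerically for a sample pair $(u, t)$:

# Illustrative only: both conditions agree for any choice of u and t.
u, t = 1.3, 2.0
println(log(1 + exp(u)) <= t)          # original constraint: t ≥ log(1 + exp(u))
println(exp(-t) + exp(u - t) <= 1)     # equivalent form: exp(-t) + exp(u - t) ≤ 1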

In this setting, the conic version of the logistic regression problem writes out as

$$\begin{aligned}
\min_{t, z, r, \theta} \; & \sum_{i=1}^n t_i + \lambda r \\
\text{subject to} \quad & (u_i - t_i, 1, z_{i,1}) \in K_{\exp}, \\
& (-t_i, 1, z_{i,2}) \in K_{\exp}, \\
& z_{i,1} + z_{i,2} \leq 1, \\
& u_i = -y_i \theta^\top x_i, \\
& r \geq \|\theta\|,
\end{aligned}$$

and thus encompasses $3n + p + 1$ variables and $4n + 1$ constraints ($u_i = -y_i \theta^\top x_i$ is only a virtual constraint used to clarify the notation). Thus, if $n \gg 1$, we get a large number of variables and constraints.

Fitting logistic regression with a conic solver

It is now time to move on to the implementation. We choose SCS as the conic solver.

using JuMP
import Random
import SCS

This tutorial uses sets from MathOptInterface. By default, JuMP exports the MOI symbol as an alias for the MathOptInterface.jl package. We recommend making this more explicit in your code by adding the following lines:

import MathOptInterface as MOI
Random.seed!(2713);

We start by implementing a function to generate a fake dataset in which we can tune the correlation between the feature variables. The function is a direct transcription of the one used in this blog post.

function generate_dataset(n_samples = 100, n_features = 10; shift = 0.0)
    X = randn(n_samples, n_features)
    w = randn(n_features)
    y = sign.(X * w)
    X .+= 0.8 * randn(n_samples, n_features) # add noise
    X .+= shift # shift the points in the feature space
    X = hcat(X, ones(n_samples, 1))
    return X, y
end
generate_dataset (generic function with 3 methods)

We write a softplus function to formulate each constraint with two exponential cones.

function softplus(model, t, u)
    # Model t ≥ log(1 + exp(u)) via z[1] ≥ exp(u - t), z[2] ≥ exp(-t), z[1] + z[2] ≤ 1.
    z = @variable(model, [1:2], lower_bound = 0.0)
    @constraint(model, sum(z) <= 1.0)
    @constraint(model, [u - t, 1, z[1]] in MOI.ExponentialCone())
    @constraint(model, [-t, 1, z[2]] in MOI.ExponentialCone())
end
softplus (generic function with 1 method)

ℓ2 regularized logistic regression

Then, with the help of the softplus function, we could write our optimization model. In the ℓ2 regularization case, the constraint $r \geq \|\theta\|_2$ rewrites as a second order cone constraint.

function build_logit_model(X, y, λ)
    n, p = size(X)
    model = Model()
    @variable(model, θ[1:p])
    @variable(model, t[1:n])
    for i in 1:n
        u = -(X[i, :]' * θ) * y[i]
        softplus(model, t[i], u)
    end
    # Add ℓ2 regularization
    @variable(model, 0.0 <= reg)
    @constraint(model, [reg; θ] in SecondOrderCone())
    # Define objective
    @objective(model, Min, sum(t) + λ * reg)
    return model
end
build_logit_model (generic function with 1 method)

We generate the dataset.

Be careful here: for large $n$ and $p$, SCS could fail to converge.

n, p = 200, 10
X, y = generate_dataset(n, p; shift = 10.0);

# We could now solve the logistic regression problem
λ = 10.0
model = build_logit_model(X, y, λ)
set_optimizer(model, SCS.Optimizer)
set_silent(model)
optimize!(model)
@assert is_solved_and_feasible(model)
θ♯ = value.(model[:θ])
11-element Vector{Float64}:
  0.02041320313399326
  0.1613990324656828
  0.35700771838432327
 -0.3078808218759407
 -0.39391842182094333
 -0.059140401509826857
  0.3471736591163848
 -0.8812123235444669
  0.20125660912000315
  0.5409398851901865
  0.08090419699796665
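As a quick follow-up (not part of the original tutorial), we can check how well the fitted coefficients classify the training set itself; `X`, `y`, and `θ♯` are the objects defined above.

# Training accuracy of the fitted ℓ2-regularized classifier (illustrative check).
y_pred = sign.(X * θ♯)                       # predicted labels in {-1, 1}
println("Training accuracy: ", sum(y_pred .== y) / length(y))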

It appears that the speed of convergence is not that impacted by the correlation of the dataset, nor by the penalty $\lambda$.
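If SCS does struggle on larger instances, as cautioned above, one option is to allow more iterations or loosen its stopping tolerances before re-solving. The sketch below is not part of the original tutorial; the option names follow the SCS.jl documentation and the values are arbitrary examples.

# Hedged sketch: relax SCS stopping criteria for harder instances.
set_attribute(model, "max_iters", 200_000)  # allow more iterations
set_attribute(model, "eps_abs", 1e-3)       # looser absolute tolerance
set_attribute(model, "eps_rel", 1e-3)       # looser relative tolerance
optimize!(model)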

ℓ1 regularized logistic regression

We now formulate the logistic problem with an ℓ1 regularization term. The ℓ1 regularization ensures sparsity in the optimal solution of the resulting optimization problem. Luckily, the ℓ1 norm is implemented as a set in MathOptInterface. Thus, we could formulate the sparse logistic regression problem with the help of a MOI.NormOneCone set.

function build_sparse_logit_model(X, y, λ)
    n, p = size(X)
    model = Model()
    @variable(model, θ[1:p])
    @variable(model, t[1:n])
    for i in 1:n
        u = -(X[i, :]' * θ) * y[i]
        softplus(model, t[i], u)
    end
    # Add ℓ1 regularization
    @variable(model, 0.0 <= reg)
    @constraint(model, [reg; θ] in MOI.NormOneCone(p + 1))
    # Define objective
    @objective(model, Min, sum(t) + λ * reg)
    return model
end

# Auxiliary function to count non-zero components:
count_nonzero(v::Vector; tol = 1e-6) = sum(abs.(v) .>= tol)

# We solve the sparse logistic regression problem on the same dataset as before.
λ = 10.0
sparse_model = build_sparse_logit_model(X, y, λ)
set_optimizer(sparse_model, SCS.Optimizer)
set_silent(sparse_model)
optimize!(sparse_model)
@assert is_solved_and_feasible(sparse_model)
θ♯ = value.(sparse_model[:θ])
println(
    "Number of non-zero components: ",
    count_nonzero(θ♯),
    " (out of ",
    p,
    " features)",
)
Number of non-zero components: 8 (out of 10 features)
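For comparison (this check is not part of the original tutorial), applying the same count to the ℓ2-regularized coefficients obtained earlier typically shows a fully dense vector, illustrating the sparsity induced by the ℓ1 penalty.

# Hedged comparison: non-zero count for the ℓ2-regularized solution from above.
θ_l2 = value.(model[:θ])
println(
    "ℓ2 model non-zero components: ",
    count_nonzero(θ_l2),
    " (out of ",
    p,
    " features)",
)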

Extensions

A direct extension would be to consider the sparse logistic regression with hard thresholding, which, contrary to the soft version using an ℓ1 regularization, adds an explicit cardinality constraint to its formulation:

$$\begin{aligned}
\min_{\theta} \; & \sum_{i=1}^n \log(1 + \exp(-y_i \theta^\top x_i)) \\
\text{subject to} \quad & \|\theta\|_0 \leq k,
\end{aligned}$$

where $k$ is the maximum number of non-zero components in the vector $\theta$, and $\|\cdot\|_0$ is the $\ell_0$ pseudo-norm:

$$\|x\|_0 = \#\{i : \; x_i \neq 0\}.$$

The cardinality constraint $\|\theta\|_0 \leq k$ could be reformulated with binary variables. Thus, the hard sparse regression problem could be solved by any solver supporting mixed-integer conic problems.
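As an illustration of that binary reformulation, here is a minimal sketch (not part of the original tutorial). It assumes a valid big-M bound `M` on the magnitude of each coefficient and a cardinality budget `k`; solving the resulting model would require a mixed-integer conic solver.

# Hedged sketch: enforce ‖θ‖₀ ≤ k with binary indicator variables and big-M bounds.
function add_cardinality_constraint!(model, θ, k; M = 10.0)
    p = length(θ)
    # b[j] = 0 forces θ[j] = 0; b[j] = 1 leaves θ[j] free within [-M, M].
    b = @variable(model, [1:p], Bin)
    @constraint(model, [j in 1:p], θ[j] <= M * b[j])
    @constraint(model, [j in 1:p], -M * b[j] <= θ[j])
    # At most k coefficients may be non-zero.
    @constraint(model, sum(b) <= k)
    return b
end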