Bijectors

Index

Bijectors.Bijector
Bijectors.CorrBijector
Bijectors.Coupling
Bijectors.Inverse
Bijectors.LeakyReLU
Bijectors.NamedCoupling
Bijectors.NamedTransform
Bijectors.OrderedBijector
Bijectors.PartitionMask
Bijectors.PartitionMask
Bijectors.Permute
Bijectors.RationalQuadraticSpline
Bijectors.Reshape
Bijectors.Stacked
Bijectors.Transform
Bijectors.VecCholeskyBijector
Bijectors.VecCorrBijector
Bijectors._inv_link_chol_lkj
Bijectors._link_chol_lkj
Bijectors.bijector
Bijectors.cholesky_lower
Bijectors.cholesky_upper
Bijectors.columnwise
Bijectors.combine
Bijectors.compute_r
Bijectors.couple
Bijectors.coupling
Bijectors.elementwise
Bijectors.find_alpha
Bijectors.get_u_hat
Bijectors.has_constant_bijector
Bijectors.isclosedform
Bijectors.isinvertible
Bijectors.logabsdetjac
Bijectors.logabsdetjac!
Bijectors.logabsdetjacinv
Bijectors.ordered
Bijectors.output_length
Bijectors.output_size
Bijectors.partition
Bijectors.transform
Bijectors.transform!
Bijectors.transformed
Bijectors.triu1_to_vec
Bijectors.triu_mask
Bijectors.triu_to_vec
Bijectors.vec_to_triu
Bijectors.vec_to_triu1
Bijectors.with_logabsdet_jacobian!

Functions

# Bijectors._inv_link_chol_lkj — _Method

_inv_link_chol_lkj(y)

Inverse link function for cholesky factor.

# Bijectors._link_chol_lkj — _Method

function _link_chol_lkj(w)

Link function for cholesky factor.

An alternative and maybe more efficient implementation was considered:

for i=2:K, j=(i+1):K
    z[i, j] = (w[i, j] / w[i-1, j]) * (z[i-1, j] / sqrt(1 - z[i-1, j]^2))
end

But this implementation will not work when w[i-1, j] = 0. Though it is a zero measure set, unit matrix initialization will not work.

For equivalence, following explanations is given by @torfjelde:

For (i, j) in the loop below, we define

z₍ᵢ₋₁, ⱼ₎ = w₍ᵢ₋₁,ⱼ₎ * ∏ₖ₌₁ⁱ⁻² (1 / √(1 - z₍ₖ,ⱼ₎²))

and so

z₍ᵢ,ⱼ₎ = w₍ᵢ,ⱼ₎ * ∏ₖ₌₁ⁱ⁻¹ (1 / √(1 - z₍ₖ,ⱼ₎²))
        = (w₍ᵢ,ⱼ₎ * / √(1 - z₍ᵢ₋₁,ⱼ₎²)) * (∏ₖ₌₁ⁱ⁻² 1 / √(1 - z₍ₖ,ⱼ₎²))
        = (w₍ᵢ,ⱼ₎ * / √(1 - z₍ᵢ₋₁,ⱼ₎²)) * (w₍ᵢ₋₁,ⱼ₎ * ∏ₖ₌₁ⁱ⁻² 1 / √(1 - z₍ₖ,ⱼ₎²)) / w₍ᵢ₋₁,ⱼ₎
        = (w₍ᵢ,ⱼ₎ * / √(1 - z₍ᵢ₋₁,ⱼ₎²)) * (z₍ᵢ₋₁,ⱼ₎ / w₍ᵢ₋₁,ⱼ₎)
        = (w₍ᵢ,ⱼ₎ / w₍ᵢ₋₁,ⱼ₎) * (z₍ᵢ₋₁,ⱼ₎ / √(1 - z₍ᵢ₋₁,ⱼ₎²))

which is the above implementation.

# Bijectors.bijector — Method

bijector(d::Distribution)

Returns the constrained-to-unconstrained bijector for distribution d.

# Bijectors.cholesky_lower — Method

cholesky_lower(X)

Return the lower triangular Cholesky factor of X as a Matrix rather than LowerTriangular.

This is a thin wrapper around cholesky(Hermitian(X)).L that returns a Matrix rather than LowerTriangular.

# Bijectors.cholesky_upper — Method

cholesky_upper(X)

Return the upper triangular Cholesky factor of X as a Matrix rather than UpperTriangular.

This is a thin wrapper around cholesky(Hermitian(X)).U that returns a Matrix rather than UpperTriangular.

# Bijectors.columnwise — Method

Alias for Base.Fix1(eachcolmaphcat, f).

Represents a function f which is applied to each column of an input.

# Bijectors.combine — Method

combine(m::PartitionMask, x_1, x_2, x_3)

Combines x_1, x_2, and x_3 into a single vector.

# Bijectors.compute_r — Method

compute_r(y_minus_z0::AbstractVector{<:Real}, α, α_plus_β_hat)

Compute the unique solution to the equation

subject to and .

Since and , the solution is unique and given by

where . For details see appendix A.2 of the reference.

References

\D. Rezende, S. Mohamed (2015): Variational Inference with Normalizing Flows. arXiv:1505.05770

# Bijectors.couple — Method

Returns the coupling law constructed from x.

# Bijectors.coupling — Method

Returns the constructor of the coupling law.

# Bijectors.elementwise — Method

elementwise(f)

Alias for Base.Fix1(broadcast, f).

In the case where f::ComposedFunction, the result is Base.Fix1(broadcast, f.outer) ∘ Base.Fix1(broadcast, f.inner) rather than Base.Fix1(broadcast, f).

# Bijectors.find_alpha — Method

find_alpha(wt_y, wt_u_hat, b)

Compute an (approximate) real-valued solution to the equation

The uniqueness of the solution is guaranteed since . For details see appendix A.1 of the reference.

Initial bracket

For all , we have

Thus

which implies ]. To avoid floating point issues if , we use the more conservative interval ] as initial bracket, at the cost of one additional iteration step.

References

\D. Rezende, S. Mohamed (2015): Variational Inference with Normalizing Flows. arXiv:1505.05770

# Bijectors.get_u_hat — Method

get_u_hat(u::AbstractVector{<:Real}, w::AbstractVector{<:Real})

Return a tuple of vector that guarantees invertibility of the planar layer, and scalar .

Mathematical background

According to appendix A.1, vector defined by

guarantees that the planar layer is invertible for all and . We can rewrite as

Additionally, we obtain

References

\D. Rezende, S. Mohamed (2015): Variational Inference with Normalizing Flows. arXiv:1505.05770

# Bijectors.has_constant_bijector — Method

has_constant_bijector(dist_type::Type)

Returns true if the distribution type dist_type has a constant bijector, i.e. the return-value of bijector does not depend on runtime information.

# Bijectors.isclosedform — Method

isclosedform(b::Transform)::bool
isclosedform(b⁻¹::Inverse{<:Transform})::bool

Returns true or false depending on whether or not evaluation of b has a closed-form implementation.

Most transformations have closed-form evaluations, but there are cases where this is not the case. For example the inverse evaluation of PlanarLayer requires an iterative procedure to evaluate.

# Bijectors.isinvertible — Method

isinvertible(t)

Return true if t is invertible, and false otherwise.

# Bijectors.logabsdetjac! — Method

logabsdetjac!(b, x[, logjac])

Compute log(abs(det(J(b, x)))) and store the result in logjac, where J(b, x) is the jacobian of b at x.

# Bijectors.logabsdetjac — Method

logabsdetjac(b, x)

Return log(abs(det(J(b, x)))), where J(b, x) is the jacobian of b at x.

# Bijectors.logabsdetjacinv — Method

logabsdetjacinv(b, y)

Just an alias for logabsdetjac(inverse(b), y).

# Bijectors.ordered — Method

ordered(d::Distribution)

Return a Distribution whose support are ordered vectors, i.e., vectors with increasingly ordered elements.

This transformation is currently only supported for otherwise unconstrained distributions.

# Bijectors.output_length — Method

output_length(f, len::Int)

Returns the output length of f given the input length len.

# Bijectors.output_size — Method

output_size(f, sz)

Returns the output size of f given the input size sz.

# Bijectors.partition — Method

partition(m::PartitionMask, x)

Partitions x into 3 disjoint subvectors.

# Bijectors.transform! — Method

transform!(b, x[, y])

Transform x using b, storing the result in y.

If y is not provided, x is used as the output.

# Bijectors.transform — Method

transform(b, x)

Transform x using b, treating x as a single input.

# Bijectors.transformed — Method

transformed(d::Distribution)
transformed(d::Distribution, b::Bijector)

Couples distribution d with the bijector b by returning a TransformedDistribution.

If no bijector is provided, i.e. transformed(d) is called, then transformed(d, bijector(d)) is returned.

# Bijectors.triu1_to_vec — Method

triu1_to_vec(X::AbstractMatrix{<:Real})

Extracts elements from upper triangle of X with offset 1 and returns them as a vector.

# Bijectors.triu_mask — Method

triu_mask(X::AbstractMatrix, k::Int)

Return a mask for elements of X above the kth diagonal.

# Bijectors.triu_to_vec — Method

triu_to_vec(X::AbstractMatrix{<:Real})

Extracts elements from upper triangle of X and returns them as a vector.

# Bijectors.vec_to_triu — Method

vec_to_triu(x::AbstractVector{<:Real})

Constructs a matrix from a vector x by filling the upper triangle.

# Bijectors.vec_to_triu1 — Method

vec_to_triu1(x::AbstractVector{<:Real})

Constructs a matrix from a vector x by filling the upper triangle with offset 1.

# Bijectors.with_logabsdet_jacobian! — Method

with_logabsdet_jacobian!(b, x[, y, logjac])

Compute transform(b, x) and logabsdetjac(b, x), storing the result in y and logjac, respetively.

If y is not provided, then x will be used in its place.

Defaults to calling with_logabsdet_jacobian(b, x) and updating y and logjac with the result.

# InverseFunctions.inverse — Method

inverse(t::Transform)

Returns the inverse of transform t.

Types

# Bijectors.Bijector — Type

Abstract type of a bijector, i.e. differentiable bijection with differentiable inverse.

# Bijectors.CorrBijector — Type

CorrBijector <: Bijector

A bijector implementation of Stan’s parametrization method for Correlation matrix: https://mc-stan.org/docs/2_23/reference-manual/correlation-matrix-transform-section.html

Basically, a unconstrained strictly upper triangular matrix y is transformed to a correlation matrix by following readable but not that efficient form:

K = size(y, 1)
z = tanh.(y)

for j=1:K, i=1:K
    if i>j
        w[i,j] = 0
    elseif 1==i==j
        w[i,j] = 1
    elseif 1<i==j
        w[i,j] = prod(sqrt(1 .- z[1:i-1, j].^2))
    elseif 1==i<j
        w[i,j] = z[i,j]
    elseif 1<i<j
        w[i,j] = z[i,j] * prod(sqrt(1 .- z[1:i-1, j].^2))
    end
end

It is easy to see that every column is a unit vector, for example:

w3' w3 ==
w[1,3]^2 + w[2,3]^2 + w[3,3]^2 ==
z[1,3]^2 + (z[2,3] * sqrt(1 - z[1,3]^2))^2 + (sqrt(1-z[1,3]^2) * sqrt(1-z[2,3]^2))^2 ==
z[1,3]^2 + z[2,3]^2 * (1-z[1,3]^2) + (1-z[1,3]^2) * (1-z[2,3]^2) ==
z[1,3]^2 + z[2,3]^2 - z[2,3]^2 * z[1,3]^2 + 1 -z[1,3]^2 - z[2,3]^2 + z[1,3]^2 * z[2,3]^2 ==
1

And diagonal elements are positive, so w is a cholesky factor for a positive matrix.

x = w' * w

Consider block matrix representation for x

x = [w1'; w2'; ... wn'] * [w1 w2 ... wn] ==
[w1'w1 w1'w2 ... w1'wn;
 w2'w1 w2'w2 ... w2'wn;
 ...
]

The diagonal elements are given by wk’wk = 1, thus x is a correlation matrix.

Every step is invertible, so this is a bijection(bijector).

The implementation doesn’t follow their "manageable expression" directly, because their equation seems wrong (7/30/2020). Insteadly it follows definition above the "manageable expression" directly, which is also described in above doc.

# Bijectors.Coupling — Type

Coupling{F, M}(θ::F, mask::M)

Implements a coupling-layer as defined in [1].

Examples

julia> using Bijectors: Shift, Coupling, PartitionMask, coupling, couple

julia> m = PartitionMask(3, [1], [2]); # <= going to use x[2] to parameterize transform of x[1]

julia> cl = Coupling(Shift, m); # <= will do `y[1:1] = x[1:1] + x[2:2]`;

julia> x = [1., 2., 3.];

julia> cl(x)
3-element Vector{Float64}:
 3.0
 2.0
 3.0

julia> inverse(cl)(cl(x))
3-element Vector{Float64}:
 1.0
 2.0
 3.0

julia> coupling(cl) # get the `Bijector` map `θ -> b(⋅, θ)`
Shift

julia> couple(cl, x) # get the `Bijector` resulting from `x`
Shift([2.0])

julia> with_logabsdet_jacobian(cl, x)
([3.0, 2.0, 3.0], 0.0)

References

[1] Kobyzev, I., Prince, S., & Brubaker, M. A., Normalizing flows: introduction and ideas, CoRR, (), (2019).

# Bijectors.Inverse — Type

inverse(b::Transform)
Inverse(b::Transform)

A Transform representing the inverse transform of b.

# Bijectors.LeakyReLU — Type

LeakyReLU{T}(α::T) <: Bijector

Defines the invertible mapping

x ↦ x if x ≥ 0 else αx

where α > 0.

# Bijectors.NamedCoupling — Type

NamedCoupling{target, deps, F} <: AbstractNamedTransform

Implements a coupling layer for named bijectors.

See also: Coupling

Examples

julia> using Bijectors: NamedCoupling, Scale

julia> b = NamedCoupling(:b, (:a, :c), (a, c) -> Scale(a + c));

julia> x = (a = 1., b = 2., c = 3.);

julia> b(x)
(a = 1.0, b = 8.0, c = 3.0)

julia> (a = x.a, b = (x.a + x.c) * x.b, c = x.c)
(a = 1.0, b = 8.0, c = 3.0)

# Bijectors.NamedTransform — Type

NamedTransform <: AbstractNamedTransform

Wraps a NamedTuple of key -> Bijector pairs, implementing evaluation, inversion, etc.

Examples

julia> using Bijectors: NamedTransform, Scale

julia> b = NamedTransform((a = Scale(2.0), b = exp));

julia> x = (a = 1., b = 0., c = 42.);

julia> b(x)
(a = 2.0, b = 1.0, c = 42.0)

julia> (a = 2 * x.a, b = exp(x.b), c = x.c)
(a = 2.0, b = 1.0, c = 42.0)

# Bijectors.OrderedBijector — Type

OrderedBijector()

A bijector mapping ordered vectors in ℝᵈ to unordered vectors in ℝᵈ.

See also

Stan’s documentation
- Note that this transformation and its inverse are the opposite of in this reference.

# Bijectors.PartitionMask — Type

PartitionMask{A}(A_1::A, A_2::A, A_3::A) where {A}

This is used to partition and recombine a vector into 3 disjoint "subvectors".

Implements

partition(m::PartitionMask, x): partitions x into 3 disjoint "subvectors"
combine(m::PartitionMask, x_1, x_2, x_3): combines 3 disjoint vectors into a single one

Note that PartitionMask is not a Bijector. It is indeed a bijection, but does not follow the Bijector interface.

Its main use is in Coupling where we want to partition the input into 3 parts, one part to transform, one part to map into the parameter-space of the transform applied to the first part, and the last part of the vector is not used for anything.

Examples

julia> using Bijectors: PartitionMask, partition, combine

julia> m = PartitionMask(3, [1], [2]) # <= assumes input-length 3
PartitionMask{Bool,SparseArrays.SparseMatrixCSC{Bool,Int64}}(
  [1, 1]  =  true,
  [2, 1]  =  true,
  [3, 1]  =  true)

julia> # Partition into 3 parts; the last part is inferred to be indices `[3, ]` from
       # the fact that `[1]` and `[2]` does not make up all indices in `1:3`.
       x1, x2, x3 = partition(m, [1., 2., 3.])
([1.0], [2.0], [3.0])

julia> # Recombines the partitions into a vector
       combine(m, x1, x2, x3)
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

Note that the underlying SparseMatrix is using Bool as the element type. We can also specify this to be some other type using the sp_type keyword:

julia> m = PartitionMask{Float32}(3, [1], [2])
PartitionMask{Float32,SparseArrays.SparseMatrixCSC{Float32,Int64}}(
  [1, 1]  =  1.0,
  [2, 1]  =  1.0,
  [3, 1]  =  1.0)

# Bijectors.PartitionMask — Method

PartitionMask(n::Int, indices)

Assumes you want to split the vector, where indices refer to the parts of the vector you want to apply the bijector to.

# Bijectors.Permute — Type

Permute{A} <: Bijector

A bijector implementation of a permutation. The permutation is performed using a matrix of type A. There are a couple of different ways to construct Permute:

Permute([0 1; 1 0])          # will map [1, 2] => [2, 1]
Permute([2, 1])              # will map [1, 2] => [2, 1]
Permute(2, 2 => 1, 1 => 2)   # will map [1, 2] => [2, 1]
Permute(2, [1, 2] => [2, 1]) # will map [1, 2] => [2, 1]

If this is not clear, the examples might be of help.

Examples

A simple example is permuting a vector of size 3.

julia> b1 = Permute([
           0 1 0;
           1 0 0;
           0 0 1
       ])
Permute{Array{Int64,2}}([0 1 0; 1 0 0; 0 0 1])

julia> b2 = Permute([2, 1, 3])           # specify all elements at once
Permute{SparseArrays.SparseMatrixCSC{Float64,Int64}}(

  [2, 1]  =  1.0
  [1, 2]  =  1.0
  [3, 3]  =  1.0)

julia> b3 = Permute(3, 2 => 1, 1 => 2)    # element-wise
Permute{SparseArrays.SparseMatrixCSC{Float64,Int64}}(
  [2, 1]  =  1.0
  [1, 2]  =  1.0
  [3, 3]  =  1.0)

julia> b4 = Permute(3, [1, 2] => [2, 1])  # block-wise
Permute{SparseArrays.SparseMatrixCSC{Float64,Int64}}(
  [2, 1]  =  1.0
  [1, 2]  =  1.0
  [3, 3]  =  1.0)

julia> b1.A == b2.A == b3.A == b4.A
true

julia> b1([1., 2., 3.])
3-element Array{Float64,1}:
 2.0
 1.0
 3.0

julia> b2([1., 2., 3.])
3-element Array{Float64,1}:
 2.0
 1.0
 3.0

julia> b3([1., 2., 3.])
3-element Array{Float64,1}:
 2.0
 1.0
 3.0

julia> b4([1., 2., 3.])
3-element Array{Float64,1}:
 2.0
 1.0
 3.0

julia> inverse(b1)
Permute{LinearAlgebra.Transpose{Int64,Array{Int64,2}}}([0 1 0; 1 0 0; 0 0 1])

julia> inverse(b1)(b1([1., 2., 3.]))
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

# Bijectors.RationalQuadraticSpline — Type

RationalQuadraticSpline{T} <: Bijector

Implementation of the Rational Quadratic Spline flow [1].

Outside of the interval [minimum(widths), maximum(widths)], this mapping is given by the identity map.
Inside the interval it’s given by a monotonic spline (i.e. monotonic polynomials connected at intermediate points) with endpoints fixed so as to continuously transform into the identity map.

For the sake of efficiency, there are separate implementations for 0-dimensional and 1-dimensional inputs.

Notes

There are two constructors for RationalQuadraticSpline:

RationalQuadraticSpline(widths, heights, derivatives): it is assumed that widths,

heights, and derivatives satisfy the constraints that makes this a valid bijector, i.e.

widths: monotonically increasing and length(widths) == K,
heights: monotonically increasing and length(heights) == K,
derivatives: non-negative and derivatives[1] == derivatives[end] == 1.
RationalQuadraticSpline(widths, heights, derivatives, B): other than than the lengths, no assumptions are made on parameters. Therefore we will transform the parameters s.t.:
widths_new ∈ [-B, B]ᴷ⁺¹, where K == length(widths),
heights_new ∈ [-B, B]ᴷ⁺¹, where K == length(heights),
derivatives_new ∈ (0, ∞)ᴷ⁺¹ with derivatives_new[1] == derivates_new[end] == 1, where (K - 1) == length(derivatives).

Examples

Univariate

julia> using StableRNGs: StableRNG; rng = StableRNG(42);  # For reproducibility.

julia> using Bijectors: RationalQuadraticSpline

julia> K = 3; B = 2;

julia> # Monotonic spline on '[-B, B]' with `K` intermediate knots/"connection points".
       b = RationalQuadraticSpline(randn(rng, K), randn(rng, K), randn(rng, K - 1), B);

julia> b(0.5) # inside of `[-B, B]` → transformed
1.1943325397834206

julia> b(5.) # outside of `[-B, B]` → not transformed
5.0

julia> b = RationalQuadraticSpline(b.widths, b.heights, b.derivatives);

julia> b(0.5) # inside of `[-B, B]` → transformed
1.1943325397834206

julia> d = 2; K = 3; B = 2;

julia> b = RationalQuadraticSpline(randn(rng, d, K), randn(rng, d, K), randn(rng, d, K - 1), B);

julia> b([-1., 1.])
2-element Vector{Float64}:
 -1.5660106244288925
  0.5384702734738573

julia> b([-5., 5.])
2-element Vector{Float64}:
 -5.0
  5.0

julia> b([-1., 5.])
2-element Vector{Float64}:
 -1.5660106244288925
  5.0

References

[1] Durkan, C., Bekasov, A., Murray, I., & Papamakarios, G., Neural Spline Flows, CoRR, arXiv:1906.04032 [stat.ML], (2019).

# Bijectors.Reshape — Type

Reshape(in_shape, out_shape)

A Bijector that reshapes the input to the output shape.

Example

julia> b = Reshape((2, 3), (3, 2)) Reshape{Tuple{Int64, Int64}, Tuple{Int64, Int64}}((2, 3), (3, 2))

julia> Array(transform(b, reshape(1:6, 2, 3))) 3×2 Matrix{Int64}:  1  4  2  5  3  6

# Bijectors.Stacked — Type

Stacked(bs)
Stacked(bs, ranges)
stack(bs::Bijector...)

A Bijector which stacks bijectors together which can then be applied to a vector where bs[i]::Bijector is applied to x[ranges[i]]::UnitRange{Int}.

Arguments

bs can be either a Tuple or an AbstractArray of 0- and/or 1-dimensional bijectors
- If bs is a Tuple, implementations are type-stable using generated functions
- If bs is an AbstractArray, implementations are not type-stable and use iterative methods
ranges needs to be an iterable consisting of UnitRange{Int}
- length(bs) == length(ranges) needs to be true.

Examples

b1 = Logit(0.0, 1.0)
b2 = identity
b = stack(b1, b2)
b([0.0, 1.0]) == [b1(0.0), 1.0]  # => true

# Bijectors.Transform — Type

Abstract type for a transformation.

Implementing

A subtype of Transform of should at least implement transform(b, x).

If the Transform is also invertible:

Required:
- Either of the following:
  - transform(::Inverse{<:MyTransform}, x): the transform for its inverse.
  - InverseFunctions.inverse(b::MyTransform): returns an existing Transform.
- logabsdetjac: computes the log-abs-det jacobian factor.
Optional:
- with_logabsdet_jacobian: transform and logabsdetjac combined. Useful in cases where we can exploit shared computation in the two.

For the above methods, there are mutating versions which can optionally be implemented:

with_logabsdet_jacobian!
logabsdetjac!
with_logabsdet_jacobian!

# Bijectors.VecCholeskyBijector — Type

VecCholeskyBijector <: Bijector

A bijector to transform a Cholesky factor of a correlation matrix to an unconstrained vector.

Fields

mode :Symbol. Controls the inverse tranformation :
- if mode === :U returns a LinearAlgebra.Cholesky holding the UpperTriangular factor
- if mode === :L returns a LinearAlgebra.Cholesky holding the LowerTriangular factor

Reference

https://mc-stan.org/docs/reference-manual/cholesky-factors-of-correlation-matrices-1