
ReinforcementLearningCore.jl

RLBase.plan!(p::AbstractExplorer, x[, mask])

Define how to select an action based on action values.

AbstractHook

A hook is called at different stages during a run to allow users to inject customized runtime logic. By default, an AbstractHook does nothing. One can customize the behavior by implementing the following methods:

  • Base.push!(hook::YourHook, ::PreActStage, agent, env)

  • Base.push!(hook::YourHook, ::PostActStage, agent, env)

  • Base.push!(hook::YourHook, ::PreEpisodeStage, agent, env)

  • Base.push!(hook::YourHook, ::PostEpisodeStage, agent, env)

  • Base.push!(hook::YourHook, ::PostExperimentStage, agent, env)

By convention, the Base.getindex(h::YourHook) is implemented to extract the metrics we are interested in. Users can compose different AbstractHooks with +.
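A minimal sketch of a custom hook (the name MyRewardHook is illustrative, not part of RLCore); it records the reward after every action and composes with another hook via +:

struct MyRewardHook <: AbstractHook
    rewards::Vector{Float64}
end
MyRewardHook() = MyRewardHook(Float64[])
Base.push!(h::MyRewardHook, ::PostActStage, agent, env) = push!(h.rewards, reward(env))
Base.getindex(h::MyRewardHook) = h.rewards

composed_hook = MyRewardHook() + StepsPerEpisode()  # run both hooks during the same experiment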

AbstractLearner

Abstract type for a learner.

ActorCritic(;actor, critic, optimizer=Adam())

The actor part must return logits (Do not use softmax in the last layer!), and the critic part must return a state value.

Agent(;policy, trajectory) <: AbstractPolicy

A wrapper of an AbstractPolicy. Generally speaking, it does nothing but update the trajectory and the policy appropriately at different stages. Agent is callable; its call method accepts varargs and keyword arguments, which are passed on to the policy.

BatchExplorer(explorer::AbstractExplorer)
BatchStepsPerEpisode(batchsize::Int; tag = "TRAINING")

Similar to StepsPerEpisode, but specific to environments that return a Vector of rewards (a typical case with MultiThreadEnv).

CategoricalNetwork(model)([rng,] state::AbstractArray [, mask::AbstractArray{Bool}]; is_sampling::Bool=false, is_return_log_prob::Bool = false)

CategoricalNetwork wraps a model (typically a neural network) that takes a state input and outputs logits for a categorical distribution. The optional argument mask must be an Array of Bool with the same size as state, except for the first dimension, which must have the length of the action vector. Actions mapped to false by mask have a logit of -Inf and/or a zero probability of being sampled.

  • rng::AbstractRNG=Random.default_rng()

  • is_sampling::Bool=false, whether to sample from the obtained categorical distribution (returns a Flux.OneHotArray z).

  • is_return_log_prob::Bool=false, whether to return the logits (i.e. the unnormalized log-probabilities) of getting the sampled actions in the given state.

This only applies if is_sampling is true, in which case z, logits is returned.

If is_sampling = false, returns only the logits obtained by a simple forward pass through model.
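
A hedged usage sketch, assuming a Flux Chain mapping a 4-dimensional state to 3 action logits (all sizes are illustrative):

using Flux
cn = CategoricalNetwork(Chain(Dense(4, 3)))
s = rand(Float32, 4, 1)                                          # a batch containing one state
logits = cn(s)                                                   # plain forward pass
z, logp = cn(s; is_sampling = true, is_return_log_prob = true)   # one-hot sample and logits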

(model::CategoricalNetwork)([rng::AbstractRNG,] state::AbstractArray{<:Any, 3}, [mask::AbstractArray{Bool},] action_samples::Int)

Sample action_samples actions from each state. Returns a 3D tensor with dimensions (action_size x action_samples x batchsize). The logits of each action are always returned as well, in a tensor with the same dimensions. The optional argument mask must be an Array of Bool with the same size as state, except for the first dimension, which must have the length of the action vector. Actions mapped to false by mask have a logit of -Inf and/or a zero probability of being sampled.

CovGaussianNetwork(;pre=identity, μ, Σ)

Returns μ and Σ when called, where μ is the mean and Σ is a covariance matrix. Unlike GaussianNetwork, the output is 3-dimensional: μ has dimensions (action_size x 1 x batchsize) and Σ has dimensions (action_size x action_size x batchsize). The Σ head of the CovGaussianNetwork should not directly return a square matrix but a vector of length action_size * (action_size + 1) ÷ 2, containing the elements of the upper triangular Cholesky decomposition of the covariance matrix, from which the matrix is then reconstructed. Sample from MvNormal.(μ, Σ).
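
A construction sketch, assuming a 4-dimensional state and a 2-dimensional action (sizes and layer widths are illustrative):

using Flux
ns, na = 4, 2
net = CovGaussianNetwork(
    pre = Dense(ns, 64, relu),
    μ   = Dense(64, na),
    Σ   = Dense(64, na * (na + 1) ÷ 2),   # vector form of the Cholesky factor, as required above
)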

(model::CovGaussianNetwork)(state::AbstractArray, action::AbstractArray)

Return the logpdf of the model sampling action when in state. state must be a 3D tensor with dimensions (state_size x 1 x batchsize). Multiple actions may be taken per state; action must have dimensions (action_size x action_samples_per_state x batchsize). Returns a 3D tensor with dimensions (1 x action_samples_per_state x batchsize).

If given 2D matrices as input, will return a 2D matrix of logpdf. States and actions are paired column-wise, one action per state.

(model::CovGaussianNetwork)(rng::AbstractRNG, state::AbstractArray{<:Any, 3}, action_samples::Int)

Sample action_samples actions per state in state and return actions, logpdf(actions). This function is compatible with a multidimensional action space. The outputs are 3D tensors with dimensions (action_size x action_samples x batchsize) and (1 x action_samples x batchsize) for actions and logpdf respectively.

(model::CovGaussianNetwork)(rng::AbstractRNG, state::AbstractArray{<:Any, 3}; is_sampling::Bool=false, is_return_log_prob::Bool=false)

This function is compatible with a multidimensional action space. To work with covariance matrices, the outputs are 3D tensors. If sampling, returns an actions tensor with dimensions (action_size x action_samples x batchsize) and a logp_π tensor with dimensions (1 x action_samples x batchsize). If not sampling, returns μ with dimensions (action_size x 1 x batchsize) and L, the lower triangular factor of the Cholesky decomposition of the covariance matrix, with dimensions (action_size x action_size x batchsize). The covariance matrices can be retrieved with Σ = stack(map(l -> l*l', eachslice(L, dims=3)); dims=3)

  • rng::AbstractRNG=Random.default_rng()

  • is_sampling::Bool=false, whether to sample from the obtained normal distribution.

  • is_return_log_prob::Bool=false, whether to calculate the conditional probability of getting actions in the given state.

(model::CovGaussianNetwork)(rng::AbstractRNG, state::AbstractMatrix; is_sampling::Bool=false, is_return_log_prob::Bool=false)

Given a Matrix of states, will return actions, μ and logpdf in matrix format. The batch of Σ remains a 3D tensor.

CurrentPlayerIterator(env::E) where {E<:AbstractEnv}

CurrentPlayerIterator is an iterator that iterates over the players in the environment, returning the current_player for each iteration. This is only necessary for MultiAgent environments. After each iteration, RLBase.next_player! is called to advance the current_player. As long as RLBase.next_player! is defined for the environment, this iterator will work correctly in the Base.run function.

DoEveryNEpisodes(f; n=1, t=0)

Execute f(t, agent, env) every n episodes. t is a counter of episodes.

DoEveryNSteps(f; n=1, t=0)

Execute f(t, agent, env) every n steps. t is a counter of steps.
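
For illustration (the logging body is an assumption, not from the docs), a hook that prints the step counter every 1000 steps:

log_hook = DoEveryNSteps((t, agent, env) -> println("step $t"); n = 1_000)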

DoOnExit(f)

Call the lambda function f at the end of an Experiment.

DuelingNetwork(;base, val, adv)

A dueling network automatically produces separate estimates of the state value function and the advantage function. The expected output size of val is 1, and that of adv is the size of the action space.
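
A construction sketch with illustrative sizes (a 4-dimensional state and 2 actions):

using Flux
ns, na = 4, 2
q_net = DuelingNetwork(
    base = Dense(ns, 64, relu),
    val  = Dense(64, 1),    # value head: output size 1
    adv  = Dense(64, na),   # advantage head: one output per action
)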

Nothing but a placeholder.

EpsilonGreedyExplorer{T}(;kwargs...)
EpsilonGreedyExplorer(ϵ) -> EpsilonGreedyExplorer{:linear}(; ϵ_stable = ϵ)

Epsilon-greedy strategy: the best lever is selected for a proportion 1 - ϵ of the trials, and a lever is selected at random (with uniform probability) for a proportion ϵ. (See Multi-armed bandit.)

Two kinds of epsilon-decreasing strategy are implemented here (linear and exp).

Epsilon-decreasing strategy: similar to the epsilon-greedy strategy, except that the value of ϵ decreases as the experiment progresses, resulting in highly explorative behaviour at the start and highly exploitative behaviour at the finish. (See Multi-armed bandit.)

Keywords

  • T::Symbol: defines how to calculate the epsilon in the warmup steps. Supported values are linear and exp.

  • step::Int = 1: record the current step.

  • ϵ_init::Float64 = 1.0: initial epsilon.

  • warmup_steps::Int=0: the number of steps to use ϵ_init.

  • decay_steps::Int=0: the number of steps for epsilon to decay from ϵ_init to ϵ_stable.

  • ϵ_stable::Float64: the epsilon after warmup_steps + decay_steps.

  • is_break_tie=false: if set to true, randomly select one of the actions that share the maximum value.

  • rng=Random.default_rng(): set the internal RNG.

Example

s_lin = EpsilonGreedyExplorer(kind=:linear, ϵ_init=0.9, ϵ_stable=0.1, warmup_steps=100, decay_steps=100)
plot([RLCore.get_ϵ(s_lin, i) for i in 1:500], label="linear epsilon")
s_exp = EpsilonGreedyExplorer(kind=:exp, ϵ_init=0.9, ϵ_stable=0.1, warmup_steps=100, decay_steps=100)
plot!([RLCore.get_ϵ(s_exp, i) for i in 1:500], label="exp epsilon")

Experiment(policy::AbstractPolicy, env::AbstractEnv, stop_condition::AbstractStopCondition, hook::AbstractHook)

A struct to hold the information of an experiment. It is used to run an experiment with the given policy, environment, stop condition and hook.

FluxApproximator(model, optimiser)

Wraps a Flux trainable model and implements the RLBase.optimise!(::FluxApproximator, ::Gradient) interface. See the RLCore documentation for more information on proper usage.

FluxApproximator(; model, optimiser, usegpu=false)

Constructs a FluxApproximator object for reinforcement learning.

Arguments

  • model: The model used for approximation.

  • optimiser: The optimizer used for updating the model.

  • usegpu: A boolean indicating whether to use GPU for computation. Default is false.

Returns

A FluxApproximator object.
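
A construction sketch; the model architecture is an illustrative assumption:

using Flux
approx = FluxApproximator(
    model     = Chain(Dense(4, 32, relu), Dense(32, 2)),
    optimiser = Adam(),
    usegpu    = false,
)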

(model::GaussianNetwork)(rng::AbstractRNG, state::AbstractArray{<:Any, 3}, action_samples::Int)

Sample action_samples actions from each state. Returns a 3D tensor with dimensions (action_size x action_samples x batchsize). state must be a 3D tensor with dimensions (state_size x 1 x batchsize). The logpdf of each action is always returned as well.

This function is compatible with a multidimensional action space.

  • rng::AbstractRNG=Random.default_rng()

  • is_sampling::Bool=false, whether to sample from the obtained normal distribution.

  • is_return_log_prob::Bool=false, whether to calculate the conditional probability of getting actions in the given state.

MultiAgentHook(hooks::NT) where {NT<: NamedTuple}

MultiAgentHook is a hook struct that contains <:AbstractHook structs indexed by the player’s symbol.

MultiAgentPolicy(agents::NT) where {NT<: NamedTuple}

MultiAgentPolicy is a policy struct that contains <:AbstractPolicy structs indexed by the player’s symbol.

OfflineAgent(policy::AbstractPolicy, trajectory::Trajectory, offline_behavior::OfflineBehavior = OfflineBehavior()) <: AbstractAgent

OfflineAgent is an AbstractAgent that, unlike the usual online Agent, does not interact with the environment during training in order to collect data. Just like Agent, it contains an AbstractPolicy to be trained and a Trajectory that contains the training data. The difference is that the trajectory is filled prior to training and is not updated afterwards. An OfflineBehavior can optionally be provided to supply a second "behavior agent" that will generate the training data at the PreExperimentStage. By default it does nothing.

OfflineBehavior(; agent:: Union{<:Agent, Nothing}, steps::Int, reset_condition)

Used to provide an OfflineAgent with a "behavior agent" that will generate the training data at the PreExperimentStage. If agent is nothing (the default), it does nothing. The trajectory of agent should be the same as that of the parent OfflineAgent. steps is the number of data elements to generate; it defaults to the capacity of the trajectory. reset_condition is the episode reset condition for the data generation (defaults to ResetIfEnvTerminated()).

The behavior agent will interact with the main environment of the experiment to generate the data.

This function accepts state and action, and then outputs actions after disturbance.

PlayerTuple

A NamedTuple that maps players to their respective values.

Stage that is executed after the Agent acts.

Stage that is executed after the Episode is over.

Stage that is executed after the Experiment is over.

Stage that is executed before the Agent acts.

Stage that is executed before the Episode starts.

Stage that is executed before the Experiment starts.

QBasedPolicy(;learner, explorer)

Wraps a learner and an explorer. The learner is a struct that should predict the Q-value of each legal action of an environment at its current state. It is typically a table or a neural network. QBasedPolicy can be queried for an action with RLBase.plan!; the explorer will affect the action selection accordingly.
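
A hedged sketch combining components documented on this page (a tabular Q-learner with an epsilon-greedy explorer; the sizes are illustrative):

policy = QBasedPolicy(
    learner  = TDLearner(; approximator = TabularQApproximator(n_state = 10, n_action = 4), method = :SARS),
    explorer = EpsilonGreedyExplorer(0.1),
)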

RandomPolicy(action_space=nothing; rng=Random.default_rng())

If action_space is nothing, then it will use the legal_action_space at runtime to randomly select an action. Otherwise, a random element within action_space is selected.

You should always set action_space=nothing when dealing with environments of FULL_ACTION_SET.
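
A usage sketch, assuming env is an AbstractEnv you have already constructed:

p = RandomPolicy()            # will sample from legal_action_space(env) at runtime
a = RLBase.plan!(p, env)      # pick a random legal action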

ResetAfterNSteps(n)

A reset condition that resets the environment after n steps.

ResetIfEnvTerminated()

A reset condition that resets the environment if is_terminated(env) is true.

RewardsPerEpisode(; rewards = Vector{Vector{Float64}}())

Store each reward of each step in every episode in the field of rewards.

SoftGaussianNetwork(;pre=identity, μ, σ, min_σ=0f0, max_σ=Inf32, squash = tanh)

Like GaussianNetwork but with a differentiable reparameterization trick. Mainly used for SAC. Returns μ and σ when called. Create a distribution to sample from using Normal.(μ, σ). min_σ and max_σ are used to clip the output from σ. pre is a shared body before the two heads of the NN. σ should be > 0. You may enforce this using a softplus output activation. Actions are squashed by a tanh and a correction is applied to the logpdf.

(model::SoftGaussianNetwork)(rng::AbstractRNG, state::AbstractArray{<:Any, 3}, action_samples::Int)

Sample action_samples actions from each state. Returns a 3D tensor with dimensions (action_size x action_samples x batchsize). state must be a 3D tensor with dimensions (state_size x 1 x batchsize). The logpdf of each action is always returned as well.

This function is compatible with a multidimensional action space.

  • rng::AbstractRNG=Random.default_rng()

  • is_sampling::Bool=false, whether to sample from the obtained normal distribution.

  • is_return_log_prob::Bool=false, whether to calculate the conditional probability of getting actions in the given state.

StackFrames(::Type{T}=Float32, d::Int...)

Use a pre-initialized CircularArrayBuffer to store the latest several states specified by d. Before processing any observation, the buffer is filled with zero(T) by default.
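
A usage sketch with illustrative frame dimensions; pushing a new frame drops the oldest one:

sf = StackFrames(Float32, 84, 84, 4)     # keep the 4 latest 84×84 frames
push!(sf, rand(Float32, 84, 84))         # a new 84×84 frame is appended as the latest slice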

StepsPerEpisode(; steps = Int[], count = 0)

Store steps of each episode in the field of steps.

StopAfterNEpisodes(episode; cur = 0, is_show_progress = true)

Return true after being called episode times. If is_show_progress is true, a ProgressMeter will be used to show the progress.

StopAfterNSeconds

Parameter:

  1. time budget

Stop training after N seconds.

StopAfterNSteps(step; cur = 1, is_show_progress = true)

Return true after being called step times.

StopAfterNoImprovement()

Stop training when a monitored metric has stopped improving.

Parameters:

fn: a closure that returns a scalar value indicating the performance of the policy (the higher the better), e.g.

  1. () -> reward(env)

  2. () -> total_reward_per_episode.reward

patience: Number of epochs with no improvement after which training will be stopped.

δ: minimum change in the monitored quantity to qualify as an improvement; an absolute change of less than δ counts as no improvement.

Return true after the monitored metric has stopped improving.

AnyStopCondition(stop_conditions...)

The result of stop_conditions is reduced by any.

StopIfEnvTerminated()

Return true if the environment is terminated.

StopSignal()

Create a stop signal initialized with a value of false. You can manually set it to true by s[] = true to stop the running loop at any time.
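
A minimal sketch of the pattern described above:

s = StopSignal()
# ... later, e.g. from a hook or another task:
s[] = true    # the run loop stops at its next check of the condition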

TDLearner(;approximator, method, γ=1.0, α=0.01, n=0)

Use the temporal-difference method to estimate a state value or state-action value.

Fields

  • approximator is <:TabularApproximator.

  • γ=1.0, discount rate.

  • method: only :SARS (Q-learning) is supported for the time being.

  • n=0: the number of time steps used minus 1.

TabularApproximator(table<:AbstractArray)

For a 1-d table, it serves as a state value approximator; for a 2-d table, it serves as a state-action value approximator.

For a 2-d table, the first dimension is the action and the second is the state.

TabularQApproximator(; n_state, n_action, init = 0.0)

Create a TabularQApproximator with n_state states and n_action actions.

TargetNetwork(network::FluxApproximator; sync_freq::Int = 1, ρ::Float32 = 0f0)

Wraps a FluxApproximator to hold a target network that is updated towards the model of the approximator.

  • sync_freq is the number of updates of network between each update of the target.

  • ρ (rho) is "how much of the target is kept when updating it".

The two common usages of TargetNetwork are

  • use ρ = 0 to totally replace target with network every sync_freq updates.

  • use ρ < 1 (but close to one) and sync_freq = 1 to let the target follow network with polyak averaging.

Implements the RLBase.optimise!(::TargetNetwork, ::Gradient) interface to update the model with the gradient and the target with weights replacement or Polyak averaging.

Note to developers: model(::TargetNetwork) returns the trainable Flux model, target(::TargetNetwork) returns the target model, and target(::FluxApproximator) returns the non-trainable Flux model. See the RLCore documentation.

TargetNetwork(network; sync_freq = 1, ρ = 0f0, use_gpu = false)

Constructs a target network for reinforcement learning.

Arguments

  • network: The main network used for training.

  • sync_freq: The frequency (in number of calls to optimise!) at which the target network is synchronized with the main network. Default is 1.

  • ρ: The interpolation factor used for updating the target network. Must be in the range [0, 1]. Default is 0 (the old weights are completely replaced by the new ones).

  • use_gpu: Specifies whether to use GPU for the target network. Default is false.

Returns

A TargetNetwork object.
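
A hedged sketch of the two usages listed above; the wrapped model is an illustrative assumption, and model/target may need to be qualified with RLCore:

using Flux
approx = FluxApproximator(model = Chain(Dense(4, 32, relu), Dense(32, 2)), optimiser = Adam())
tn = TargetNetwork(approx; sync_freq = 1, ρ = 0.99f0)   # Polyak-averaged target
# tn = TargetNetwork(approx; sync_freq = 100)           # alternative: hard copy every 100 updates
Q  = model(tn)     # trainable network
Qt = target(tn)    # target network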

TimePerStep(;max_steps=100)
TimePerStep(times::CircularVectorBuffer{Float64}, t::Float64)

Store the time cost in seconds of the latest max_steps steps in the times field.

TotalRewardPerEpisode(; is_display_on_exit = true)

Store the total reward of each episode in the field of rewards. If is_display_on_exit is set to true, a unicode plot will be shown at the PostExperimentStage.
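
A usage sketch, assuming env is an AbstractEnv you have already constructed:

hook = TotalRewardPerEpisode(is_display_on_exit = false)
run(RandomPolicy(), env, StopAfterNEpisodes(10), hook)
hook.rewards    # total reward of each of the 10 episodes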

UCBExplorer(na; c=2.0, ϵ=1e-10, step=1, seed=nothing)

Arguments

  • na is the number of actions, used to create an internal counter.

  • t is used to store the current time step.

  • c is used to control the degree of exploration.

  • seed: sets the seed of the internal RNG.

VAE(;encoder, decoder, latent_dims)
WeightedExplorer(;is_normalized::Bool, rng=Random.default_rng())

is_normalized indicates whether the action values fed to the explorer are already normalized to sum to 1.0.

Elements are assumed to be >=0.

WeightedSoftmaxExplorer(;rng=Random.default_rng())

See also: WeightedExplorer

When pushing a StackFrames into a CircularArrayBuffer of the same dimension, only the latest frame is pushed. If the StackFrames is one dimension lower, then it is treated as a general AbstractArray and is pushed in as a frame.

Base.run(
    multiagent_policy::MultiAgentPolicy,
    env::E,
    stop_condition,
    hook::MultiAgentHook,
    reset_condition,
) where {E<:AbstractEnv, H<:AbstractHook}

This run function dispatches games using MultiAgentPolicy and MultiAgentHook to the appropriate run function based on the Sequential or Simultaneous trait of the environment.

Base.run(
    multiagent_policy::MultiAgentPolicy,
    env::E,
    ::Sequential,
    stop_condition,
    hook::MultiAgentHook,
    reset_condition,
) where {E<:AbstractEnv, H<:AbstractHook}

This run function handles MultiAgent games with the Sequential trait. It iterates over the current_player for each turn in the environment, and runs the full run loop, like in the SingleAgent case. If the stop_condition is met, the function breaks out of the loop and calls optimise! on the policy again. Finally, it calls optimise! on the policy one last time and returns the MultiAgentHook.

Base.run(
    multiagent_policy::MultiAgentPolicy,
    env::E,
    ::Simultaneous,
    stop_condition,
    hook::MultiAgentHook,
    reset_condition,
) where {E<:AbstractEnv, H<:AbstractHook}

This run function handles MultiAgent games with the Simultaneous trait. It iterates over the players in the environment, and for each player, it selects the appropriate policy from the MultiAgentPolicy. All agent actions are collected before the environment is updated. After each player has taken an action, it calls optimise! on the policy. If the stop_condition is met, the function breaks out of the loop and calls optimise! on the policy again. Finally, it calls optimise! on the policy one last time and returns the MultiAgentHook.

RLBase.plan!(x::BatchExplorer, values::AbstractMatrix)

Apply inner explorer to each column of values.

RLBase.plan!(s::EpsilonGreedyExplorer, values; step) where T

If multiple values share the maximum value, a random one among them will be returned when is_break_tie==true.

`NaN` will be filtered out unless all the values are `NaN`; in that case, a random one will be returned.

prob(p::AbstractExplorer, x, mask)

Similar to prob(p::AbstractExplorer, x), but here only the masked elements are considered.

prob(p::AbstractExplorer, x) -> AbstractDistribution

Get the action distribution given action values.

prob(s::EpsilonGreedyExplorer, values) -> Categorical
prob(s::EpsilonGreedyExplorer, values, mask) -> Categorical

Return the probability of selecting each action given the estimated values of each action.

assuming rewards and new_rewards are Vector

assuming rewards and advantages are Vector

bellman_update!(app::TabularApproximator, s::Int, s_plus_one::Int, a::Int, α::Float64, π_::Float64, γ::Float64)

Update the Q-value of the given state-action pair.

Inject customized checks here by overriding this function.

cholesky_matrix_to_vector_index(i, j)

Return the position in a cholesky_vec (of length da) of the element of the lower triangular matrix at coordinates (i,j).

For example if cholesky_vec = [1,2,3,4,5,6], the corresponding lower triangular matrix is

L = [1 0 0
     2 4 0
     3 5 6]

and cholesky_matrix_to_vector_index(3, 2) == 5

diagnormkldivergence(μ1, σ1, μ2, σ2)

GPU-differentiable implementation of the KL divergence between two multivariate Gaussian distributions with mean vectors μ1, μ2 and diagonal standard deviations σ1, σ2 respectively. Arguments must be Vectors or arrays of column vectors.

diagnormlogpdf(μ, σ, x; ϵ = 1.0f-8)

GPU-compatible and automatically differentiable version of the logpdf function for normal distributions with diagonal covariance. An epsilon value is added to guarantee numerical stability if σ is exactly zero (e.g. if relu is used in the output layer). Accepts arguments of the same shape: vectors, matrices, or 3D arrays (with dimension 2 of size 1).

discount_rewards(rewards::VectorOrMatrix, γ::Number;kwargs...)

Calculate the discounted return starting from the current step with discount rate γ. rewards can be a matrix.

Keyword arguments

  • dims=:, if rewards is a Matrix, then dims can only be 1 or 2.

  • terminal=nothing, specify whether each reward is followed by a terminal state. nothing means the game is not terminated yet. If terminal is provided, its size must be the same as rewards.

  • init=nothing, init can be used to provide the reward estimate of the last state.

Example
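
A minimal sketch (not from the original docstring), using the default dims and no terminal:

rewards = [1.0, 1.0, 1.0]
discount_rewards(rewards, 0.9)    # ≈ [2.71, 1.9, 1.0]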

flatten_batch(x::AbstractArray)

Merge the last two dimensions.

Example

julia> x = reshape(1:12, 2, 2, 3)
2×2×3 reshape(::UnitRange{Int64}, 2, 2, 3) with eltype Int64:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

[:, :, 3] =
  9  11
 10  12

julia> flatten_batch(x)
2×6 reshape(::UnitRange{Int64}, 2, 6) with eltype Int64:
 1  3  5  7   9  11
 2  4  6  8  10  12

generalized_advantage_estimation(rewards::VectorOrMatrix, values::VectorOrMatrix, γ::Number, λ::Number;kwargs...)

Calculate the generalized advantage estimate starting from the current step, with discount rate γ and GAE parameter λ. rewards and values can be matrices.

Keyword arguments

  • dims=:, if rewards is a Matrix, then dims can only be 1 or 2.

  • terminal=nothing, specify whether each reward is followed by a terminal state. nothing means the game is not terminated yet. If terminal is provided, its size must be the same as rewards.

Example
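
A hedged sketch; it assumes values carries one extra bootstrap estimate for the state after the last reward (length(values) == length(rewards) + 1):

rewards = [1.0, 1.0]
values  = [0.0, 0.0, 0.0]    # assumed to include the bootstrap value of the final state
generalized_advantage_estimation(rewards, values, 0.9, 0.95)    # ≈ [1.855, 1.0]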

logdetLorU(LorU::AbstractMatrix)

Log-determinant of the positive semi-definite matrix A = L*U (Cholesky lower and upper triangular factors), given L or U. Has a sign uncertainty for non-PSD matrices.

mvnormkldivergence(μ1, L1, μ2, L2)

GPU-differentiable implementation of the KL divergence between two multivariate Gaussian distributions with mean vectors μ1, μ2 and with Cholesky decompositions L1, L2 of their covariance matrices.

mvnormlogpdf(μ::AbstractVecOrMat, L::AbstractMatrix, x::AbstractVecOrMat)

GPU-compatible and automatically differentiable version of the logpdf function for multivariate normal distributions. Takes as inputs μ, the mean vector; L, the lower triangular matrix of the Cholesky decomposition of the covariance matrix; and x, a matrix of samples where each column is a sample. Returns a Vector containing the logpdf of each column of x for the MvNormal parametrized by μ and Σ = L*L'.

mvnormlogpdf(μ::A, LorU::A, x::A; ϵ = 1f-8) where A <: AbstractArray

Batch version that takes 3D tensors as input, where each slice along the 3rd dimension is a batch sample. μ is an (action_size x 1 x batchsize) tensor, L is (action_size x action_size x batchsize), and x is (action_size x action_samples x batchsize). Returns a 3D tensor of size (1 x action_samples x batchsize).

normkldivergence(μ1, σ1, μ2, σ2)

GPU-differentiable implementation of the KL divergence between two univariate Gaussian distributions with means μ1, μ2 and standard deviations σ1, σ2 respectively.

normlogpdf(μ, σ, x; ϵ = 1.0f-8)

GPU-compatible and automatically differentiable version of the logpdf function for a univariate normal distribution. An epsilon value is added to guarantee numerical stability if σ is exactly zero (e.g. if relu is used in the output layer).

Transform a vector containing the non-zero elements of a lower triangular da x da matrix into that matrix.

In addition to containing the run loop, RLCore is a collection of pre-implemented components that are frequently used in RL.

QBasedPolicy

QBasedPolicy is an AbstractPolicy that wraps a Q-Value learner (tabular or approximated) and an explorer. Use this wrapper to implement a policy that directly uses a Q-value function to decide its next action. In that case, instead of creating an AbstractPolicy subtype for your algorithm, define an AbstractLearner subtype and specialize RLBase.optimise!(::YourLearnerType, ::Stage, ::Trajectory). This way you will not have to code the interaction between your policy and the explorer yourself. RLCore provides the most common explorers (such as epsilon-greedy, UCB, etc.). You can find many examples of QBasedPolicies in the DQNs section of RLZoo.

Parametric approximators

Approximator

If your algorithm uses a neural network or a linear approximator trained with Flux.jl to approximate a function, use the Approximator. It wraps a Flux model and an Optimiser (such as Adam or SGD). Your optimise!(::PolicyOrLearner, batch) function will probably consist of computing a gradient and then calling RLBase.optimise!(app::Approximator, gradient::Flux.Grads).

Approximator implements the model(::Approximator) and target(::Approximator) interface. Both return the underlying Flux model. The advantage of this interface is explained in the TargetNetwork section below.

TargetNetwork

The use of a target network is frequent in state- or action-value-based RL. The principle is to hold a main approximator, which is trained using a gradient, and a copy of it that is either only partially updated or updated less frequently. TargetNetwork is constructed by wrapping an Approximator. Set the sync_freq keyword argument to a value greater than one to copy the main model into the target every sync_freq updates, or set the ρ parameter to a value greater than 0 (usually 0.99f0) to let the target be partially updated towards the main model at every update. RLBase.optimise!(tn::TargetNetwork, gradient::Flux.Grads) will take care of updating the target for you.

The other advantage of TargetNetwork is that it uses Julia’s multiple dispatch to let your algorithm be agnostic to the presence or absence of a target network. For example, the DQNLearner in RLZoo has an approximator field typed as Union{Approximator, TargetNetwork}. When computing the temporal difference error, the learner calls Q = model(learner.approximator) and Qt = target(learner.approximator). If learner.approximator is an Approximator, no target network is used because both calls point to the same neural network; if it is a TargetNetwork, the automatically managed target is returned.
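
A sketch of that dispatch pattern; td_prediction is a hypothetical helper, and approximator may be either an Approximator or a TargetNetwork without changing the code:

function td_prediction(approximator, s, s′)
    Q  = model(approximator)     # network used for the current-state prediction
    Qt = target(approximator)    # target network, or the same network when no target is used
    return Q(s), Qt(s′)
end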

Architectures

Common model architectures are also provided, such as the GaussianNetwork for continuous policies with diagonal covariance, and the CovGaussianNetwork for full covariance (very slow on GPUs at the moment).