
ReinforcementLearningBase.jl

Basic player type for a random step in the game.

Rewards of all players sum to a constant

There is no ChancePlayer in the environment, and the game is fully deterministic.

Usually used to describe extensive-form games. The environment contains a chance player, and the corresponding probabilities are known. Therefore, prob(env, player=chance_player(env)) must be defined.
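For illustration, a minimal sketch of what an EXPLICIT_STOCHASTIC environment is expected to provide. The environment name DiceRollEnv and its details are hypothetical; only the trait declaration and the players/action_space/prob methods reflect the requirement stated above.

using ReinforcementLearningBase

struct DiceRollEnv <: AbstractEnv end   # hypothetical environment, for illustration only

RLBase.ChanceStyle(::DiceRollEnv) = EXPLICIT_STOCHASTIC
RLBase.players(::DiceRollEnv) = (1, 2, CHANCE_PLAYER)
RLBase.chance_player(::DiceRollEnv) = CHANCE_PLAYER
RLBase.action_space(::DiceRollEnv, ::ChancePlayer) = 1:6
# The distribution of the chance player's actions is known exactly:
RLBase.prob(::DiceRollEnv, ::ChancePlayer) = fill(1 // 6, 6)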

Alias for FullActionSet()

Total rewards of all players may be different in each step

Every player gets the same reward

The inner state of some players' observations may be different

Alias for MinimalActionSet()

All players observe the same state

The environment contains a chance player and the probability is unknown. Usually only a dummy action is allowed in this case.

The chance player (chance_player(env)) must appear in the result of RLBase.players(env), and the result of action_space(env, chance_player) should contain only one dummy action.

Environments with the DynamicStyle of SEQUENTIAL must take actions from different players one by one.

Environments with the DynamicStyle of SIMULTANEOUS must take in actions from some (or all) players at the same time.
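A minimal sketch of a SIMULTANEOUS environment. The environment name and the convention of passing the joint action as a tuple (one entry per player) are assumptions made for illustration, not requirements of RLBase.

using ReinforcementLearningBase

mutable struct MatchingPenniesEnv <: AbstractEnv   # hypothetical environment
    last_actions::Union{Nothing,Tuple{Symbol,Symbol}}
end
MatchingPenniesEnv() = MatchingPenniesEnv(nothing)

RLBase.NumAgentStyle(::MatchingPenniesEnv) = MultiAgent(2)
RLBase.DynamicStyle(::MatchingPenniesEnv) = SIMULTANEOUS
RLBase.action_space(::MatchingPenniesEnv, player) = (:heads, :tails)
RLBase.is_terminated(env::MatchingPenniesEnv) = env.last_actions !== nothing

# A single act! call carries one action per player, applied at the same time.
function RLBase.act!(env::MatchingPenniesEnv, actions::Tuple{Symbol,Symbol})
    env.last_actions = actions
end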

SPECTATOR

Spectator is a special player who doesn’t take any action.

Alias for StepReward()

There is no chance player in the environment, but the game is stochastic. To help increase reproducibility, these environments should generally accept an AbstractRNG as a keyword argument. For some third-party environments, at least a seed is exposed in the constructor.
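A sketch of the recommended constructor pattern, using a hypothetical NoisyBanditEnv; the point is only that the rng is accepted as a keyword argument so a run can be reproduced by passing a seeded AbstractRNG.

using ReinforcementLearningBase
using Random

mutable struct NoisyBanditEnv{R<:AbstractRNG} <: AbstractEnv   # hypothetical environment
    rng::R
    reward::Float64
end

# Accept the rng as a keyword argument for reproducibility.
NoisyBanditEnv(; rng = Random.default_rng()) = NoisyBanditEnv(rng, 0.0)

RLBase.ChanceStyle(::NoisyBanditEnv) = STOCHASTIC   # the default; shown here for clarity
RLBase.reward(env::NoisyBanditEnv) = env.reward
RLBase.act!(env::NoisyBanditEnv, action) = env.reward = action + randn(env.rng)

# A seeded rng makes the run reproducible:
env = NoisyBanditEnv(rng = MersenneTwister(123))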

Rewards are only obtained at the end of an episode.

Rewards of all players sum to 0. A special case of CONSTANT_SUM.

act!(env::AbstractEnv, action, player=current_player(env))

Super type of all reinforcement learning environments.
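Below is a minimal sketch of a concrete subtype, assuming the default single-agent, sequential traits. The environment itself (CoinFlipEnv, a ten-step ±1 counting game) is hypothetical; it is only meant to show which interface methods are typically defined.

using ReinforcementLearningBase

mutable struct CoinFlipEnv <: AbstractEnv   # hypothetical example environment
    total::Int
    steps::Int
end
CoinFlipEnv() = CoinFlipEnv(0, 0)

RLBase.action_space(::CoinFlipEnv) = (:heads, :tails)
RLBase.state(env::CoinFlipEnv) = env.total
RLBase.state_space(::CoinFlipEnv) = -10:10
RLBase.reward(env::CoinFlipEnv) = env.total
RLBase.is_terminated(env::CoinFlipEnv) = env.steps >= 10

function RLBase.reset!(env::CoinFlipEnv)
    env.total = 0
    env.steps = 0
end

function RLBase.act!(env::CoinFlipEnv, action)
    env.steps += 1
    env.total += (action == :heads ? 1 : -1)
end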

Describe how to model a reinforcement learning environment.

TODO: needs more investigation. Ref: https://bair.berkeley.edu/blog/2019/12/12/mbpo/

plan!(π::AbstractPolicy, env) -> action

The policy is the most basic concept in reinforcement learning. Here, an agent's action is determined by plan!, which takes a policy and an environment and returns an action.

See discussions here if you are wondering why we define the input as AbstractEnv instead of state.

The policy π may change its internal state, but it shouldn't change env. When that is really necessary, remember to make a copy of env to keep the original env untouched.
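A minimal sketch of a custom policy (UniformRandomPolicy is a hypothetical name used only for illustration): plan! samples an action from the environment's legal action space and leaves env itself untouched.

using ReinforcementLearningBase
using Random

struct UniformRandomPolicy{R<:AbstractRNG} <: AbstractPolicy   # hypothetical policy
    rng::R
end
UniformRandomPolicy() = UniformRandomPolicy(Random.default_rng())

# plan! only reads from env; the policy's own rng is the only state that advances.
# Assumes the legal action space supports rand (e.g. a vector or a range).
RLBase.plan!(p::UniformRandomPolicy, env::AbstractEnv) = rand(p.rng, legal_action_space(env))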

AbstractUtilityStyle for environments where the sum of all players' rewards is constant.

AbstractChanceStyle for fully deterministic games without a ChancePlayer.

The environment will terminate within a finite number of steps.

The action space of the environment may contain illegal actions. For environments of FULL_ACTION_SET, legal_action_space and legal_action_space_mask must also be defined.

AbstractUtilityStyle for environments where the sum of all players' rewards is not constant.

Use it to represent the goal state

AbstractUtilityStyle for environments where all players get the same reward.

Other players' actions are not known by the other players.

See the definition of information set

Use it to represent the internal state.

All actions in the action space of the environment are legal

MultiAgent(n::Integer) -> MultiAgent{n}()

n must be ≥ 2.

The environment can run infinitely.

Sometimes people from different fields talk about the same thing under different names. Here we set Observation{Any}() as the default state style in this package.

See discussions here

All players' actions are visible to the other players.

Players act one after the other.

Players act at the same time.

AbstractNumAgentStyle for environments with a single agent

We can get reward after each step

Stochastic()

Default ChanceStyle.

Rewards are only obtained at the end of an episode.

AbstractUtilityStyle for environments where the sum of all players' rewards is equal to zero.

Base.:(==)(env1::T, env2::T) where T<:AbstractEnv

Only check the state of all players in the env.

Make an independent copy of env. Note that the rng (if the env has one) is also copied!

Set the seed of internal rng

ActionStyle(env::AbstractEnv)

For environments with discrete actions, specify whether the current state of env contains a full action set or a minimal action set. By default, MINIMAL_ACTION_SET is returned.

ChanceStyle(env) = STOCHASTIC

Specify the role that chance plays in the env. Possible returns are DETERMINISTIC, STOCHASTIC (the default), EXPLICIT_STOCHASTIC, and SAMPLED_STOCHASTIC.

Specify the default state style when calling state(env).

DynamicStyle(env::AbstractEnv) = SEQUENTIAL

Only valid in environments with a NumAgentStyle of MultiAgent. Determine whether the players can act simultaneously or not. Possible returns are SEQUENTIAL (the default) and SIMULTANEOUS.

InformationStyle(env) = IMPERFECT_INFORMATION

Distinguish between PERFECT_INFORMATION and IMPERFECT_INFORMATION environments. IMPERFECT_INFORMATION is returned by default.

NumAgentStyle(env)

Number of agents involved in the env. Possible returns are SINGLE_AGENT and MultiAgent{N}().

Specify whether we can get a reward after each step or only at the end of a game. Possible values are STEP_REWARD (the default) and TERMINAL_REWARD.

Environments of TERMINAL_REWARD style can be viewed as a subset of environments of STEP_REWARD style. For some algorithms, like MCTS, we may have a more efficient implementation for environments of TERMINAL_REWARD style.

StateStyle(env::AbstractEnv)

Define the possible styles of state(env). Possible values are Observation, InternalState, InformationSet, and GoalState, or a tuple containing several of them.

This is useful for environments which provide more than one kind of state.

UtilityStyle(env::AbstractEnv)

Specify the utility style in multi-agent environments. Possible values are ZERO_SUM, CONSTANT_SUM, GENERAL_SUM, and IDENTICAL_UTILITY.
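As an illustration of how the traits above fit together, a hypothetical two-player, zero-sum, sequential, perfect-information board game might declare:

using ReinforcementLearningBase

struct BoardGameEnv <: AbstractEnv end   # hypothetical environment, traits only

RLBase.NumAgentStyle(::BoardGameEnv) = MultiAgent(2)
RLBase.DynamicStyle(::BoardGameEnv) = SEQUENTIAL
RLBase.InformationStyle(::BoardGameEnv) = PERFECT_INFORMATION
RLBase.ChanceStyle(::BoardGameEnv) = DETERMINISTIC
RLBase.RewardStyle(::BoardGameEnv) = TERMINAL_REWARD
RLBase.UtilityStyle(::BoardGameEnv) = ZERO_SUM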

action_space(env, player=current_player(env))

Get all available actions from the environment. See also: legal_action_space

chance_player(env)

Only valid for environments with a chance player.

child(env::AbstractEnv, action)

Treat the env as a game tree. Create an independent child after applying action.
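Conceptually, child is equivalent to copying the environment and applying the action to the copy; the helper below (a hypothetical name, not part of RLBase) just spells that out.

using ReinforcementLearningBase

# Copy first, then act on the copy, so the original env is left untouched.
function child_sketch(env::AbstractEnv, action)
    new_env = copy(env)
    act!(new_env, action)
    return new_env
end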

current_player(env)

Return the next player to take an action. For extensive-form games, a chance player may be returned (see also chance_player). For SIMULTANEOUS environments, a simultaneous player is always returned (see also simultaneous_player).

is_terminated(env, player=current_player(env))

legal_action_space(env, player=current_player(env))

For environments of MINIMAL_ACTION_SET, the result is the same as action_space.

legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}

Required for environments of FULL_ACTION_SET. As a default implementation, legal_action_space_mask creates a Boolean mask over action_space marking the entries that belong to legal_action_space.
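A sketch of the extra methods a FULL_ACTION_SET environment is expected to define, using a hypothetical card game whose legal actions shrink as cards are played:

using ReinforcementLearningBase

mutable struct CardGameEnv <: AbstractEnv   # hypothetical environment
    cards_in_hand::Vector{Int}              # indices into the full action space 1:10
end

RLBase.ActionStyle(::CardGameEnv) = FULL_ACTION_SET
RLBase.action_space(::CardGameEnv) = Base.OneTo(10)
RLBase.legal_action_space(env::CardGameEnv) = env.cards_in_hand
RLBase.legal_action_space_mask(env::CardGameEnv) =
    [a in env.cards_in_hand for a in action_space(env)]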

next_player!(env::E) where {E<:AbstractEnv}

Advance to the next player. This is a no-op for single-player and simultaneous games. Sequential MultiAgent games should implement this method.

RLBase.optimise!(π::AbstractPolicy, experience)

Optimise the policy π with online/offline experience or parameters.

players(env::RLBaseEnv)

Players in the game. This is a no-op for single-player games. MultiAgent games should implement this method.

priority(π::AbstractPolicy, experience)

Usually used in offline policies to evaluate the priorities of the experience.

prob(env, player=chance_player(env))

Get the action distribution of the chance player.

Only valid for environments of EXPLICIT_STOCHASTIC style. The current player of env must be the chance player.

prob(π::AbstractPolicy, env, action)

Only valid for environments with discrete actions.

prob(π::AbstractPolicy, env) -> Distribution

Get the probability distribution of actions based on policy π given an env.

Reset the internal state of an environment

reward(env, player=current_player(env))

simultaneous_player(env)

Only valid for environments of SIMULTANEOUS style.

spectator_player(env)

Used in imperfect-information multi-agent environments.

state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])

The state can be of any type. However, most neural-network-based algorithms assume an AbstractArray is returned. For environments that provide many different kinds of states (inner state, information state, etc.), users need to provide the style argument to declare which kind of state they want.

The state may be reused and mutated at each step. Always remember to make a copy if this is not what you expect.
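A short usage sketch, reusing the hypothetical CoinFlipEnv from the AbstractEnv example above; the two points worth noting are the documented default arguments and the defensive copy.

env = CoinFlipEnv()            # hypothetical env from the AbstractEnv sketch
s = state(env)                 # same as state(env, DefaultStateStyle(env), current_player(env))
s_snapshot = copy(state(env))  # defensive copy, in case the env reuses/mutates its state buffer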

state_space(env, style=[DefaultStateStyle(env)], player=[current_player(env)])

Describe all possible states.

Call this function after writing your customized environment to make sure that all the necessary interfaces are implemented correctly and consistently.

walk(f, env::AbstractEnv)

Call f with env and its descendants. Only use it with small games.
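A conceptual sketch (not the actual implementation, and the helper name is hypothetical) of the traversal walk performs, expressed with the interface functions documented above:

using ReinforcementLearningBase

# Depth-first traversal of the game tree via child, calling f on every reachable node.
function walk_sketch(f, env::AbstractEnv)
    f(env)
    if !is_terminated(env)
        for a in legal_action_space(env)
            walk_sketch(f, child(env, a))
        end
    end
end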