ReinforcementLearningBase.jl
#
ReinforcementLearningBase.CHANCE_PLAYER
— Constant
Basic player type representing a random step in the game.
#
ReinforcementLearningBase.CONSTANT_SUM
— Constant
Rewards of all players sum to a constant
#
ReinforcementLearningBase.DETERMINISTIC
— Constant
No ChancePlayer in the environment; the game is fully deterministic.
#
ReinforcementLearningBase.EXPLICIT_STOCHASTIC
— Constant
Usually used to describe extensive-form games. The environment contains a chance player and the corresponding probabilities are known. Therefore, prob(env, player=chance_player(env)) must be defined.
#
ReinforcementLearningBase.FULL_ACTION_SET
— Constant
Alias for FullActionSet()
#
ReinforcementLearningBase.GENERAL_SUM
— Constant
The total reward of all players may differ from step to step.
#
ReinforcementLearningBase.IDENTICAL_UTILITY
— Constant
Every player gets the same reward
#
ReinforcementLearningBase.IMPERFECT_INFORMATION
— Constant
Players' observations of the environment's inner state may differ.
#
ReinforcementLearningBase.MINIMAL_ACTION_SET
— Constant
Alias for MinimalActionSet()
#
ReinforcementLearningBase.PERFECT_INFORMATION
— Constant
All players observe the same state
#
ReinforcementLearningBase.SAMPLED_STOCHASTIC
— Constant
The environment contains a chance player whose action probabilities are unknown. Usually only a dummy action is allowed in this case.
#
ReinforcementLearningBase.SEQUENTIAL
— Constant
Environments with a DynamicStyle of SEQUENTIAL must take actions from different players one by one.
#
ReinforcementLearningBase.SIMULTANEOUS
— Constant
Environments with a DynamicStyle of SIMULTANEOUS must take in actions from some (or all) players at the same time.
#
ReinforcementLearningBase.SPECTATOR
— Constant
SPECTATOR
A spectator is a special player who doesn't take any action.
#
ReinforcementLearningBase.STEP_REWARD
— Constant
Alias for StepReward()
#
ReinforcementLearningBase.STOCHASTIC
— Constant
No chance player in the environment, but the game is stochastic. To help increase reproducibility, these environments should generally accept an AbstractRNG as a keyword argument. For some third-party environments, at least a seed is exposed in the constructor.
#
ReinforcementLearningBase.TERMINAL_REWARD
— Constant
The reward is only obtained at the end of the environment (i.e. at the end of an episode).
#
ReinforcementLearningBase.ZERO_SUM
— Constant
Rewards of all players sum to 0. A special case of CONSTANT_SUM.
#
ReinforcementLearningBase.AbstractEnv
— Type
act!(env::AbstractEnv, action, player=current_player(env))
Super type of all reinforcement learning environments.
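A custom environment is defined by subtyping AbstractEnv and adding methods for the query and mutation functions documented below. A minimal sketch, assuming the act!-based interface shown above (CoinTossEnv and its fields are illustrative names, not part of the package):

```julia
using ReinforcementLearningBase  # also exports the RLBase alias used below

# CoinTossEnv is illustrative only: the player guesses the outcome of a coin toss.
mutable struct CoinTossEnv <: AbstractEnv
    reward::Float64
    done::Bool
end

CoinTossEnv() = CoinTossEnv(0.0, false)

RLBase.action_space(::CoinTossEnv) = [:heads, :tails]
RLBase.state(env::CoinTossEnv) = env.done
RLBase.state_space(::CoinTossEnv) = [false, true]
RLBase.reward(env::CoinTossEnv) = env.reward
RLBase.is_terminated(env::CoinTossEnv) = env.done
RLBase.reset!(env::CoinTossEnv) = (env.reward = 0.0; env.done = false; nothing)

function RLBase.act!(env::CoinTossEnv, action)
    # Reward +1 for a correct guess, -1 otherwise; the episode ends after one step.
    env.reward = rand([:heads, :tails]) == action ? 1.0 : -1.0
    env.done = true
end
```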
#
ReinforcementLearningBase.AbstractEnvironmentModel
— Type
TODO: Describe how to model a reinforcement learning environment. TODO: need more investigation.
Ref: https://bair.berkeley.edu/blog/2019/12/12/mbpo/
- Analytic gradient computation
- Sampling-based planning
- Model-based data generation
- Value-equivalence prediction
See also: Model-based Reinforcement Learning: A Survey; Tutorial on Model-Based Methods in Reinforcement Learning.
#
ReinforcementLearningBase.AbstractPolicy
— Type
plan!(π::AbstractPolicy, env) -> action
The policy is the most basic concept in reinforcement learning. Here an agent's action is determined by plan!, which takes a policy and an environment and returns an action.
See discussions here if you are wondering why we define the input as an AbstractEnv instead of a state.
The policy π may change its internal state, but it should not change the env in any way.
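A minimal sketch of the plan!-based interface (RandomLegalPolicy is an illustrative name, not part of the package):

```julia
using ReinforcementLearningBase

# RandomLegalPolicy is illustrative only.
struct RandomLegalPolicy <: AbstractPolicy end

# plan! receives the whole environment rather than just a state, so the policy
# can query things like the legal action set of the current player.
RLBase.plan!(::RandomLegalPolicy, env::AbstractEnv) = rand(legal_action_space(env))
```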
#
ReinforcementLearningBase.ConstantSum
— Type
AbstractUtilityStyle for environments where the sum of all players' rewards is constant.
#
ReinforcementLearningBase.Deterministic
— Type
AbstractChanceStyle for fully deterministic games without a ChancePlayer.
#
ReinforcementLearningBase.Episodic
— Type
The environment will terminate in finite steps.
#
ReinforcementLearningBase.FullActionSet
— Type
The action space of the environment may contain illegal actions. For environments of FULL_ACTION_SET, legal_action_space and legal_action_space_mask must also be defined.
#
ReinforcementLearningBase.GeneralSum
— Type
AbstractUtilityStyle for environments where the sum of all players' rewards is not constant.
#
ReinforcementLearningBase.GoalState
— Type
Use it to represent the goal state
#
ReinforcementLearningBase.IdenticalUtility
— Type
AbstractUtilityStyle for environments where all players get the same reward.
#
ReinforcementLearningBase.ImperfectInformation
— Type
Players' actions are not visible to the other Players.
#
ReinforcementLearningBase.InformationSet
— Type
See the definition of information set
#
ReinforcementLearningBase.InternalState
— Type
Use it to represent the internal state.
#
ReinforcementLearningBase.MinimalActionSet
— Type
All actions in the action space of the environment are legal
#
ReinforcementLearningBase.MultiAgent
— Method
MultiAgent(n::Integer) -> MultiAgent{n}()
n must be ≥ 2.
#
ReinforcementLearningBase.NeverEnding
— Type
The environment can run infinitely.
#
ReinforcementLearningBase.PerfectInformation
— Type
All Players' actions are visible to the other Players.
#
ReinforcementLearningBase.Sequential
— Type
Players act one after the other.
#
ReinforcementLearningBase.Simultaneous
— Type
Players act at the same time.
#
ReinforcementLearningBase.SingleAgent
— Type
AbstractNumAgentStyle for environments with a single agent
#
ReinforcementLearningBase.StepReward
— Type
We can get reward after each step
#
ReinforcementLearningBase.TerminalReward
— Type
The reward is only obtained at the end of the environment (i.e. at the end of an episode).
#
ReinforcementLearningBase.ZeroSum
— Type
AbstractUtilityStyle for environments where the sum of all players' rewards is equal to zero.
#
Base.:==
— Method
Base.:(==)(env1::T, env2::T) where T<:AbstractEnv
Only check the state of all players in the env.
#
Random.seed!
— Method
Set the seed of the internal rng.
#
ReinforcementLearningBase.ActionStyle
— Method
ActionStyle(env::AbstractEnv)
For environments with discrete actions, specify whether the current state of env contains a full action set or a minimal action set. By default, MINIMAL_ACTION_SET is returned.
#
ReinforcementLearningBase.ChanceStyle
— Method
ChanceStyle(env) = STOCHASTIC
Specify which role chance plays in the env. Possible returns are:
- STOCHASTIC. This is the default return.
- DETERMINISTIC
- EXPLICIT_STOCHASTIC
- SAMPLED_STOCHASTIC
#
ReinforcementLearningBase.DefaultStateStyle
— Method
Specify the default state style when calling state(env).
#
ReinforcementLearningBase.DynamicStyle
— Method
DynamicStyle(env::AbstractEnv) = SEQUENTIAL
Only valid in environments with a NumAgentStyle of MultiAgent. Determine whether the players can play simultaneously or not. Possible returns are:
- SEQUENTIAL. This is the default return.
- SIMULTANEOUS
#
ReinforcementLearningBase.InformationStyle
— Method
InformationStyle(env) = IMPERFECT_INFORMATION
Distinguish environments between PERFECT_INFORMATION and IMPERFECT_INFORMATION. IMPERFECT_INFORMATION is returned by default.
#
ReinforcementLearningBase.NumAgentStyle
— Method
NumAgentStyle(env)
Number of agents involved in the env. Possible returns are:
- SingleAgent. This is the default return.
- MultiAgent
#
ReinforcementLearningBase.RewardStyle
— Method
Specify whether we can get the reward after each step or only at the end of a game. Possible values are STEP_REWARD (the default one) or TERMINAL_REWARD.
Environments of TERMINAL_REWARD style can be viewed as a special case of STEP_REWARD style, with a zero reward at every non-terminal step.
#
ReinforcementLearningBase.StateStyle
— Method
StateStyle(env::AbstractEnv)
Define the possible styles of state(env). Possible values are:
- Observation{T}. This is the default return.
- You can also define your customized state style when necessary.
Or a tuple containing several of the above ones. This is useful for environments which provide more than one kind of state.
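For illustration, an environment providing more than one kind of state might declare a tuple of state styles (the environment type below is hypothetical):

```julia
using ReinforcementLearningBase

# Hypothetical environment exposing both an observation and its full internal state.
struct MyCardGameEnv <: AbstractEnv end

RLBase.StateStyle(::MyCardGameEnv) = (Observation{String}(), InternalState{Int}())

# A caller can then request a particular kind of state, e.g.
# state(env, InternalState{Int}())
```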
#
ReinforcementLearningBase.UtilityStyle
— Method
UtilityStyle(env::AbstractEnv)
Specify the utility style in multi-agent environments. Possible values are:
- GENERAL_SUM. The default return.
- ZERO_SUM
- CONSTANT_SUM
- IDENTICAL_UTILITY
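For illustration, a hypothetical two-player environment could declare several of the optional traits documented above like this (KuhnPokerLikeEnv is a made-up name; only the trait functions and constants come from the package):

```julia
using ReinforcementLearningBase

# KuhnPokerLikeEnv is purely illustrative.
struct KuhnPokerLikeEnv <: AbstractEnv end

RLBase.NumAgentStyle(::KuhnPokerLikeEnv) = MultiAgent(2)
RLBase.DynamicStyle(::KuhnPokerLikeEnv) = SEQUENTIAL
RLBase.InformationStyle(::KuhnPokerLikeEnv) = IMPERFECT_INFORMATION
RLBase.ChanceStyle(::KuhnPokerLikeEnv) = EXPLICIT_STOCHASTIC
RLBase.RewardStyle(::KuhnPokerLikeEnv) = TERMINAL_REWARD
RLBase.UtilityStyle(::KuhnPokerLikeEnv) = ZERO_SUM
```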
#
ReinforcementLearningBase.action_space
— Function
action_space(env, player=current_player(env))
Get all available actions from the environment. See also: legal_action_space.
#
ReinforcementLearningBase.chance_player
— Method
chance_player(env)
Only valid for environments with a chance player.
#
ReinforcementLearningBase.child
— Method
child(env::AbstractEnv, action)
Treat the env as a game tree. Create an independent child after applying action.
#
ReinforcementLearningBase.current_player
— Method
current_player(env)
Return the next player to take an action. For Extensive Form Games, a chance player may be returned (see also chance_player). For SIMULTANEOUS environments, a simultaneous player is always returned (see also simultaneous_player).
#
ReinforcementLearningBase.is_terminated
— Method
is_terminated(env, player=current_player(env))
#
ReinforcementLearningBase.legal_action_space
— Function
legal_action_space(env, player=current_player(env))
For environments of MINIMAL_ACTION_SET, the result is the same as action_space.
#
ReinforcementLearningBase.legal_action_space_mask
— Function
legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}
Required for environments of FULL_ACTION_SET. As a default implementation, legal_action_space_mask creates a mask of action_space with the subset legal_action_space.
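An illustrative sketch of the FULL_ACTION_SET contract (the environment type and its action space are made up for the example):

```julia
using ReinforcementLearningBase

# EvenOnlyEnv is hypothetical: its action space is 1:10, but only the even
# actions are legal in the current state.
struct EvenOnlyEnv <: AbstractEnv end

RLBase.ActionStyle(::EvenOnlyEnv) = FULL_ACTION_SET
RLBase.action_space(::EvenOnlyEnv) = 1:10
RLBase.legal_action_space(::EvenOnlyEnv) = 2:2:10
RLBase.legal_action_space_mask(::EvenOnlyEnv) = iseven.(1:10)
```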
#
ReinforcementLearningBase.next_player!
— Method
next_player!(env::E) where {E<:AbstractEnv}
Advance to the next player. This is a no-op for single-player and simultaneous games. Sequential MultiAgent games should implement this method.
#
ReinforcementLearningBase.optimise!
— Method
RLBase.optimise!(π::AbstractPolicy, experience)
Optimise the policy π with online/offline experience or parameters.
#
ReinforcementLearningBase.players
— Method
players(env::RLBaseEnv)
Players in the game. This is a no-op for single-player games. MultiAgent games should implement this method.
#
ReinforcementLearningBase.priority
— Method
priority(π::AbstractPolicy, experience)
Usually used in offline policies to evaluate the priorities of the experience.
#
ReinforcementLearningBase.prob
— Function
Get the action distribution of the chance player. Only valid for environments of EXPLICIT_STOCHASTIC style.
#
ReinforcementLearningBase.prob
— Method
prob(π::AbstractPolicy, env, action)
Only valid for environments with discrete actions.
#
ReinforcementLearningBase.prob
— Method
prob(π::AbstractPolicy, env) -> Distribution
Get the probability distribution of actions based on policy π given an env.
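An illustrative sketch, assuming Distributions.jl is available for the returned Categorical distribution (UniformPolicy is not part of the package):

```julia
using ReinforcementLearningBase
using Distributions  # assumed to be available for Categorical

# UniformPolicy is illustrative: it puts equal probability on every action.
struct UniformPolicy <: AbstractPolicy end

RLBase.prob(::UniformPolicy, env::AbstractEnv) =
    Categorical(fill(1 / length(action_space(env)), length(action_space(env))))

RLBase.prob(::UniformPolicy, env::AbstractEnv, action) =
    1 / length(action_space(env))
```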
#
ReinforcementLearningBase.reset!
— Method
Reset the internal state of an environment
#
ReinforcementLearningBase.reward
— Function
reward(env, player=current_player(env))
#
ReinforcementLearningBase.simultaneous_player
— Method
simultaneous_player(env)
Only valid for environments of SIMULTANEOUS style.
#
ReinforcementLearningBase.spectator_player
— Method
spectator_player(env)
Used in imperfect multi-agent environments.
#
ReinforcementLearningBase.state
— Method
state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])
The state can be of any type. However, most neural-network-based algorithms assume an AbstractArray is returned. For environments which provide many different kinds of state (inner state, information state, etc.), users need to provide the style to declare which kind of state they want.
The state may be reused and mutated at each step. Always remember to make a copy if this is not what you expect.
#
ReinforcementLearningBase.state_space
— Method
state_space(env, style=[DefaultStateStyle(env)], player=[current_player(env)])
Describe all possible states.
#
ReinforcementLearningBase.test_interfaces!
— Method
Call this function after writing your customized environment to make sure that all the necessary interfaces are implemented correctly and consistently.
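Typical usage, assuming env is a freshly implemented environment such as the CoinTossEnv sketch above:

```julia
env = CoinTossEnv()           # any newly written environment
RLBase.test_interfaces!(env)  # runs a set of consistency checks on the environment
```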
#
ReinforcementLearningBase.walk
— Method
walk(f, env::AbstractEnv)
Call f with env and its descendants. Only use it with small games.
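For example, assuming env is an already-constructed environment of a small game, the number of nodes in its game tree could be counted like this (a sketch, not part of the package):

```julia
n = Ref(0)
RLBase.walk(env) do node  # `node` is `env` itself or one of its descendants
    n[] += 1
end
@show n[]
```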