ReinforcementLearningBase.jl
#
ReinforcementLearningBase.CHANCE_PLAYER — Constant
Basic player type representing a random step in the game.
#
ReinforcementLearningBase.CONSTANT_SUM — Constant
Rewards of all players sum to a constant
#
ReinforcementLearningBase.DETERMINISTIC — Constant
There is no chance player in the environment and the game is fully deterministic.
#
ReinforcementLearningBase.EXPLICIT_STOCHASTIC — Constant
Usually used to describe extensive-form games. The environment contains a chance player and the corresponding probabilities are known, so prob(env, player=chance_player(env)) must be defined.
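A minimal sketch of that contract, assuming a hypothetical die-rolling chance node (the DiceEnv name, the six-action space, and the fair-die probabilities are all illustrative, not part of RLBase):

```julia
using ReinforcementLearningBase

# Illustrative sketch only: a chance node that rolls a fair six-sided die.
struct DiceEnv <: AbstractEnv end

RLBase.ChanceStyle(::DiceEnv) = EXPLICIT_STOCHASTIC
RLBase.action_space(::DiceEnv, ::typeof(CHANCE_PLAYER)) = Base.OneTo(6)
# The distribution of the chance player's actions is known and exposed via prob.
RLBase.prob(::DiceEnv, ::typeof(CHANCE_PLAYER)) = fill(1 // 6, 6)
```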
#
ReinforcementLearningBase.FULL_ACTION_SET — Constant
Alias for FullActionSet()
#
ReinforcementLearningBase.GENERAL_SUM — Constant
The sum of all players' rewards may vary from step to step.
#
ReinforcementLearningBase.IDENTICAL_UTILITY — Constant
Every player gets the same reward
#
ReinforcementLearningBase.IMPERFECT_INFORMATION — Constant
Different players may observe different views of the environment's inner state.
#
ReinforcementLearningBase.MINIMAL_ACTION_SET — Constant
Alias for MinimalActionSet()
#
ReinforcementLearningBase.PERFECT_INFORMATION — Constant
All players observe the same state
#
ReinforcementLearningBase.SAMPLED_STOCHASTIC — Constant
The environment contains a chance player, but the corresponding probabilities are unknown. Usually only a dummy action is allowed in this case.
The chance player (see chance_player(env)) must appear in the result of players(env).
#
ReinforcementLearningBase.SEQUENTIAL — Constant
Environments with a DynamicStyle of SEQUENTIAL take actions from the players one at a time.
#
ReinforcementLearningBase.SIMULTANEOUS — Constant
Environments with a DynamicStyle of SIMULTANEOUS take actions from some (or all) players at the same time.
#
ReinforcementLearningBase.SPECTATOR — Constant
A spectator is a special player that does not take any action.
#
ReinforcementLearningBase.STEP_REWARD — Constant
Alias for StepReward()
#
ReinforcementLearningBase.STOCHASTIC — Constant
There is no chance player in the environment, but the game is stochastic. To help improve reproducibility, these environments should generally accept an AbstractRNG as a keyword argument. For some third-party environments, at least a seed is exposed in the constructor.
#
ReinforcementLearningBase.TERMINAL_REWARD — Constant
The reward is only obtained at the end of an episode.
#
ReinforcementLearningBase.ZERO_SUM — Constant
Rewards of all players sum to 0. A special case of CONSTANT_SUM.
#
ReinforcementLearningBase.AbstractEnv — Type
act!(env::AbstractEnv, action, player=current_player(env))
Super type of all reinforcement learning environments.
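A minimal, hedged sketch of implementing this interface for a hypothetical one-step guessing game (all names below are illustrative, not part of RLBase):

```julia
using ReinforcementLearningBase
using Random

# Illustrative sketch only: a one-step, single-player environment in which the
# agent guesses a hidden coin (1 = heads, 2 = tails).
mutable struct CoinGuessEnv <: AbstractEnv
    rng::AbstractRNG
    coin::Int                  # hidden value, 1 or 2
    guess::Union{Int,Nothing}  # the agent's guess, `nothing` before acting
end

CoinGuessEnv(; rng = Random.default_rng()) = CoinGuessEnv(rng, rand(rng, 1:2), nothing)

RLBase.action_space(::CoinGuessEnv) = Base.OneTo(2)
RLBase.state_space(::CoinGuessEnv) = 0:2
RLBase.state(env::CoinGuessEnv) = isnothing(env.guess) ? 0 : env.guess
RLBase.reward(env::CoinGuessEnv) = isnothing(env.guess) ? 0.0 : (env.guess == env.coin ? 1.0 : -1.0)
RLBase.is_terminated(env::CoinGuessEnv) = !isnothing(env.guess)
RLBase.act!(env::CoinGuessEnv, action) = (env.guess = action)

function RLBase.reset!(env::CoinGuessEnv)
    env.coin = rand(env.rng, 1:2)
    env.guess = nothing
end

# Once the interface is in place, RLBase.test_interfaces!(CoinGuessEnv()) can be
# used to check it for consistency (see test_interfaces! below).
```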
#
ReinforcementLearningBase.AbstractEnvironmentModel — Type
TODO: Describe how to model a reinforcement learning environment (needs more investigation). Ref: https://bair.berkeley.edu/blog/2019/12/12/mbpo/
- Analytic gradient computation
- Sampling-based planning
- Model-based data generation
- Value-equivalence prediction
See also: Model-based Reinforcement Learning: A Survey; Tutorial on Model-Based Methods in Reinforcement Learning.
#
ReinforcementLearningBase.AbstractPolicy — Type
plan!(π::AbstractPolicy, env) -> action
The policy is the most basic concept in reinforcement learning. Here an agent's action is determined by plan!, which takes a policy and an environment and returns an action.
See the discussion linked in the package's issue tracker if you are wondering why the input is defined as an AbstractEnv instead of a state.
The policy π may change its own internal state, but it should not change env. If that is really necessary, make a copy of env so that the original env stays untouched.
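A minimal sketch of a custom policy, assuming a uniformly random choice over the legal actions (the UniformRandomPolicy name is illustrative; ReinforcementLearningCore ships a similar RandomPolicy):

```julia
using ReinforcementLearningBase
using Random

# Illustrative sketch only: a policy that picks a uniformly random legal action.
struct UniformRandomPolicy{R<:AbstractRNG} <: AbstractPolicy
    rng::R
end

UniformRandomPolicy(; rng = Random.default_rng()) = UniformRandomPolicy(rng)

# plan! receives the policy and the environment and returns an action.
RLBase.plan!(p::UniformRandomPolicy, env::AbstractEnv) = rand(p.rng, legal_action_space(env))

# A typical interaction loop then looks like (with `env` any AbstractEnv instance):
#
#     policy = UniformRandomPolicy()
#     while !is_terminated(env)
#         act!(env, RLBase.plan!(policy, env))
#     end
```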
#
ReinforcementLearningBase.ConstantSum — Type
AbstractUtilityStyle for environments where the sum of all players' rewards is constant.
#
ReinforcementLearningBase.Deterministic — Type
AbstractChanceStyle for fully deterministic games without a ChancePlayer.
#
ReinforcementLearningBase.Episodic — Type
The environment terminates after a finite number of steps.
#
ReinforcementLearningBase.FullActionSet — Type
The action space of the environment may contain illegal actions. For environments of FULL_ACTION_SET, legal_action_space and legal_action_space_mask must also be defined.
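A partial sketch of the FULL_ACTION_SET contract (the CardDrawEnv type and its can_draw field are assumptions; the rest of the AbstractEnv interface is omitted here):

```julia
using ReinforcementLearningBase

# Illustrative, partial sketch only.
mutable struct CardDrawEnv <: AbstractEnv
    can_draw::Bool
end

RLBase.ActionStyle(::CardDrawEnv) = FULL_ACTION_SET
RLBase.action_space(::CardDrawEnv) = Base.OneTo(3)
# One Bool per action in action_space(env):
RLBase.legal_action_space_mask(env::CardDrawEnv) = [true, true, env.can_draw]
# Only the currently legal actions:
RLBase.legal_action_space(env::CardDrawEnv) = findall(legal_action_space_mask(env))
```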
#
ReinforcementLearningBase.GeneralSum — Type
AbstractUtilityStyle for environments where the sum of all players' rewards is not constant.
#
ReinforcementLearningBase.GoalState — Type
Use it to represent the goal state
#
ReinforcementLearningBase.IdenticalUtility — Type
AbstractUtilityStyle for environments where all players get the same reward.
#
ReinforcementLearningBase.ImperfectInformation — Type
Players do not know the actions taken by the other players.
#
ReinforcementLearningBase.InformationSet — Type
See the definition of information set
#
ReinforcementLearningBase.InternalState — Type
Use it to represent the internal state.
#
ReinforcementLearningBase.MinimalActionSet — Type
All actions in the action space of the environment are legal
#
ReinforcementLearningBase.MultiAgent — Method
MultiAgent(n::Integer) -> MultiAgent{n}()
n must be ≥ 2.
#
ReinforcementLearningBase.NeverEnding — Type
The environment can run infinitely.
#
ReinforcementLearningBase.PerfectInformation — Type
All players' actions are visible to the other players.
#
ReinforcementLearningBase.Sequential — Type
Players act one after the other.
#
ReinforcementLearningBase.Simultaneous — Type
Players act at the same time.
#
ReinforcementLearningBase.SingleAgent — Type
AbstractNumAgentStyle for environments with a single agent
#
ReinforcementLearningBase.StepReward — Type
A reward is available after each step.
#
ReinforcementLearningBase.TerminalReward — Type
The reward is only obtained at the end of an episode.
#
ReinforcementLearningBase.ZeroSum — Type
AbstractUtilityStyle for environments where the sum of all players' rewards is equal to zero.
#
Base.:== — Method
Base.:(==)(env1::T, env2::T) where T<:AbstractEnv
Only the states of all players in the env are checked.
#
Random.seed! — Method
Set the seed of the environment's internal RNG.
#
ReinforcementLearningBase.ActionStyle — Method
ActionStyle(env::AbstractEnv)
For environments with discrete actions, specify whether the current state of env provides a full action set or a minimal action set. By default, MINIMAL_ACTION_SET is returned.
#
ReinforcementLearningBase.ChanceStyle — Method
ChanceStyle(env) = STOCHASTIC
Specify the role that chance plays in the env. Possible returns are:
- STOCHASTIC. This is the default return.
- DETERMINISTIC
- EXPLICIT_STOCHASTIC
- SAMPLED_STOCHASTIC
#
ReinforcementLearningBase.DefaultStateStyle — Method
Specify the default state style when calling state(env).
#
ReinforcementLearningBase.DynamicStyle — Method
DynamicStyle(env::AbstractEnv) = SEQUENTIAL
Only valid in environments with a NumAgentStyle of MultiAgent. Determines whether the players act simultaneously or not. Possible returns are:
- SEQUENTIAL. This is the default return.
- SIMULTANEOUS
#
ReinforcementLearningBase.InformationStyle — Method
InformationStyle(env) = IMPERFECT_INFORMATION
Distinguish between PERFECT_INFORMATION and IMPERFECT_INFORMATION environments. IMPERFECT_INFORMATION is returned by default.
#
ReinforcementLearningBase.NumAgentStyle — Method
NumAgentStyle(env)
Number of agents involved in the env. Possible returns are:
- SingleAgent. This is the default return.
- MultiAgent
#
ReinforcementLearningBase.RewardStyle — Method
Specify whether we can get the reward after each step or only at the end of a game. Possible values are STEP_REWARD (the default one) or TERMINAL_REWARD.
Environments of TERMINAL_REWARD style can be viewed as a special case of STEP_REWARD style.
#
ReinforcementLearningBase.StateStyle — Method
StateStyle(env::AbstractEnv)
Define the possible styles of state(env). Possible values are:
- Observation{T}. This is the default return.
- InternalState{T}, InformationSet{T}, or GoalState{T}.
- A customized state style you define when necessary.
Or a tuple containing several of the above. This is useful for environments which provide more than one kind of state.
#
ReinforcementLearningBase.UtilityStyle — Method
UtilityStyle(env::AbstractEnv)
Specify the utility style in multi-agent environments. Possible values are:
- GENERAL_SUM. This is the default return.
- ZERO_SUM
- CONSTANT_SUM
- IDENTICAL_UTILITY
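Taken together, the trait methods above are typically declared in one block. A hedged sketch for a hypothetical two-player, turn-based, zero-sum board game (the BoardGameEnv name is illustrative):

```julia
using ReinforcementLearningBase

# Illustrative sketch only: trait declarations for a hypothetical board game.
struct BoardGameEnv <: AbstractEnv end

RLBase.NumAgentStyle(::BoardGameEnv) = MultiAgent(2)
RLBase.DynamicStyle(::BoardGameEnv) = SEQUENTIAL
RLBase.InformationStyle(::BoardGameEnv) = PERFECT_INFORMATION
RLBase.ChanceStyle(::BoardGameEnv) = DETERMINISTIC
RLBase.RewardStyle(::BoardGameEnv) = TERMINAL_REWARD
RLBase.UtilityStyle(::BoardGameEnv) = ZERO_SUM
```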
#
ReinforcementLearningBase.action_space — Function
action_space(env, player=current_player(env))
Get all available actions of the environment. See also: legal_action_space
#
ReinforcementLearningBase.chance_player — Method
chance_player(env)
Only valid for environments with a chance player.
#
ReinforcementLearningBase.child — Method
child(env::AbstractEnv, action)
Treat the env as a game tree. Create an independent child after applying action.
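A hedged sketch of how child can be used for a one-step lookahead (it assumes an environment where the reward after a single action is meaningful; greedy_action is an illustrative helper, not part of RLBase):

```julia
using ReinforcementLearningBase

# Illustrative sketch only: pick the action whose immediate child yields the best reward.
function greedy_action(env::AbstractEnv)
    best_action, best_reward = nothing, -Inf
    for a in legal_action_space(env)
        r = reward(RLBase.child(env, a))   # env itself is left untouched
        if r > best_reward
            best_action, best_reward = a, r
        end
    end
    best_action
end
```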
#
ReinforcementLearningBase.current_player — Method
current_player(env)
Return the next player to take action. For Extensive Form Games, a chance player may be returned. (See also chance_player) For SIMULTANEOUS environments, a simultaneous player is always returned. (See also simultaneous_player).
#
ReinforcementLearningBase.is_terminated — Method
is_terminated(env, player=current_player(env))
#
ReinforcementLearningBase.legal_action_space — Function
legal_action_space(env, player=current_player(env))
For environments of MINIMAL_ACTION_SET, the result is the same as action_space.
#
ReinforcementLearningBase.legal_action_space_mask — Function
legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}
Required for environments of FULL_ACTION_SET. The default implementation creates a mask over action_space from the subset returned by legal_action_space.
#
ReinforcementLearningBase.next_player! — Method
next_player!(env::E) where {E<:AbstractEnv}
Advance to the next player. This is a no-op for single-player and simultaneous games. Sequential MultiAgent games should implement this method.
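A hedged sketch of the player bookkeeping in a two-player sequential game (the TurnBasedEnv type, its turn field, and the symbolic player names are assumptions):

```julia
using ReinforcementLearningBase

# Illustrative sketch only: turn switching in a two-player sequential game.
mutable struct TurnBasedEnv <: AbstractEnv
    turn::Int   # 1 or 2
end

RLBase.NumAgentStyle(::TurnBasedEnv) = MultiAgent(2)
RLBase.DynamicStyle(::TurnBasedEnv) = SEQUENTIAL
RLBase.players(::TurnBasedEnv) = (:player1, :player2)
RLBase.current_player(env::TurnBasedEnv) = RLBase.players(env)[env.turn]
RLBase.next_player!(env::TurnBasedEnv) = (env.turn = 3 - env.turn)
```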
#
ReinforcementLearningBase.optimise! — Method
RLBase.optimise!(π::AbstractPolicy, experience)
Optimise the policy π with online/offline experience or parameters.
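A hedged sketch of extending optimise! for a custom policy (the tabular layout, the experience format of (state, reward) pairs, and the 0.1 step size are all assumptions):

```julia
using ReinforcementLearningBase

# Illustrative sketch only: a tabular policy updated from (state, reward) pairs.
struct TabularValuePolicy <: AbstractPolicy
    values::Dict{Any,Float64}
end

function RLBase.optimise!(p::TabularValuePolicy, experience)
    for (s, r) in experience
        old = get(p.values, s, 0.0)
        p.values[s] = old + 0.1 * (r - old)   # simple incremental update
    end
end
```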
#
ReinforcementLearningBase.players — Method
players(env::RLBaseEnv)
Players in the game. This is a no-op for single-player games. MultiAgent games should implement this method.
#
ReinforcementLearningBase.priority — Method
priority(π::AbstractPolicy, experience)
Usually used in offline policies to evaluate the priorities of the experience.
#
ReinforcementLearningBase.prob — Function
Get the action distribution of the chance player.
Only valid for environments of EXPLICIT_STOCHASTIC style.
#
ReinforcementLearningBase.prob — Method
prob(π::AbstractPolicy, env, action)
Only valid for environments with discrete actions.
#
ReinforcementLearningBase.prob — Method
prob(π::AbstractPolicy, env) -> Distribution
Get the probability distribution of actions based on policy π given an env.
#
ReinforcementLearningBase.reset! — Method
Reset the internal state of the environment.
#
ReinforcementLearningBase.reward — Function
reward(env, player=current_player(env))
#
ReinforcementLearningBase.simultaneous_player — Method
simultaneous_player(env)
Only valid for environments of SIMULTANEOUS style.
#
ReinforcementLearningBase.spectator_player — Method
spectator_player(env)
Used in imperfect-information multi-agent environments.
#
ReinforcementLearningBase.state — Method
state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])
The state can be of any type. However, most neural-network-based algorithms assume an AbstractArray is returned. For environments that provide several different kinds of state (inner state, information state, etc.), users need to pass a style to declare which kind of state they want.
The state may be reused and mutated at each step. Always remember to make a copy if this is not what you expect.
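A hedged usage sketch: take a defensive snapshot of the state, optionally in a specific style (whether a given env supports that style is up to the environment; snapshot is an illustrative helper):

```julia
using ReinforcementLearningBase

# Illustrative sketch only: copy the state so later steps cannot mutate it.
snapshot(env::AbstractEnv, style = DefaultStateStyle(env)) = copy(state(env, style))

# e.g. snapshot(env)                       # default state style
#      snapshot(env, Observation{Array}()) # explicit style, if supported by env
#      snapshot(env, InternalState{Int}())
```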
#
ReinforcementLearningBase.state_space — Method
state_space(env, style=[DefaultStateStyle(env)], player=[current_player(env)])
Describe all possible states.
#
ReinforcementLearningBase.test_interfaces! — Method
Call this function after writing your customized environment to make sure that all the necessary interfaces are implemented correctly and consistently.
#
ReinforcementLearningBase.walk — Method
walk(f, env::AbstractEnv)
Call f with env and its descendants. Only use it with small games.
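A hedged sketch of using walk to count the terminal states of a small game tree (count_terminal_states is an illustrative helper, not part of RLBase):

```julia
using ReinforcementLearningBase

# Illustrative sketch only: walk the game tree and count terminal descendants.
function count_terminal_states(env::AbstractEnv)
    n = 0
    RLBase.walk(env) do e
        n += is_terminated(e)
    end
    return n
end
```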