
ReinforcementLearningBase.jl

Basic player type for a random step in the game.

Rewards of all players sum to a constant

There is no ChancePlayer in the environment, and the game is fully deterministic.

Usually used to describe extensive-form games. The environment contains a chance player, and the corresponding probabilities are known. Therefore, prob(env, player=chance_player(env)) must be defined.
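For illustration, a minimal sketch of what an EXPLICIT_STOCHASTIC environment is expected to provide. The environment name DiceRollEnv and its details are hypothetical; only the trait declaration and the players/action_space/prob methods reflect the requirement stated above.

using ReinforcementLearningBase

struct DiceRollEnv <: AbstractEnv end   # hypothetical environment, for illustration only

RLBase.ChanceStyle(::DiceRollEnv) = EXPLICIT_STOCHASTIC
RLBase.players(::DiceRollEnv) = (1, 2, CHANCE_PLAYER)
RLBase.chance_player(::DiceRollEnv) = CHANCE_PLAYER
RLBase.action_space(::DiceRollEnv, ::ChancePlayer) = 1:6
# The distribution of the chance player's actions is known exactly:
RLBase.prob(::DiceRollEnv, ::ChancePlayer) = fill(1 // 6, 6)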

Alias for FullActionSet()

Total rewards of all players may be different in each step

Every player gets the same reward

The inner state of some players' observations may be different

Alias for MinimalActionSet()

All players observe the same state

The environment contains a chance player and the probability is unknown. Usually only a dummy action is allowed in this case.

The chance player (chance_player(env)) must appear in the result of RLBase.players(env), and the result of action_space(env, chance_player) should contain only one dummy action.

Environments with the DynamicStyle of SEQUENTIAL must take actions from different players one by one.

Environments with the DynamicStyle of SIMULTANEOUS must take in actions from some (or all) players at the same time.
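A minimal sketch of a SIMULTANEOUS environment. The environment name and the convention of passing the joint action as a tuple (one entry per player) are assumptions made for illustration, not requirements of RLBase.

using ReinforcementLearningBase

mutable struct MatchingPenniesEnv <: AbstractEnv   # hypothetical environment
    last_actions::Union{Nothing,Tuple{Symbol,Symbol}}
end
MatchingPenniesEnv() = MatchingPenniesEnv(nothing)

RLBase.NumAgentStyle(::MatchingPenniesEnv) = MultiAgent(2)
RLBase.DynamicStyle(::MatchingPenniesEnv) = SIMULTANEOUS
RLBase.action_space(::MatchingPenniesEnv, player) = (:heads, :tails)
RLBase.is_terminated(env::MatchingPenniesEnv) = env.last_actions !== nothing

# A single act! call carries one action per player, applied at the same time.
function RLBase.act!(env::MatchingPenniesEnv, actions::Tuple{Symbol,Symbol})
    env.last_actions = actions
end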

SPECTATOR

Spectator is a special player who doesn’t take any action.

Alias for StepReward()

There is no chance player in the environment, but the game is stochastic. To help increase reproducibility, these environments should generally accept an AbstractRNG as a keyword argument. For some third-party environments, at least a seed is exposed in the constructor.
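A sketch of the recommended constructor pattern, using a hypothetical NoisyBanditEnv; the point is only that the rng is accepted as a keyword argument so a run can be reproduced by passing a seeded AbstractRNG.

using ReinforcementLearningBase
using Random

mutable struct NoisyBanditEnv{R<:AbstractRNG} <: AbstractEnv   # hypothetical environment
    rng::R
    reward::Float64
end

# Accept the rng as a keyword argument for reproducibility.
NoisyBanditEnv(; rng = Random.default_rng()) = NoisyBanditEnv(rng, 0.0)

RLBase.ChanceStyle(::NoisyBanditEnv) = STOCHASTIC   # the default; shown here for clarity
RLBase.reward(env::NoisyBanditEnv) = env.reward
RLBase.act!(env::NoisyBanditEnv, action) = env.reward = action + randn(env.rng)

# A seeded rng makes the run reproducible:
env = NoisyBanditEnv(rng = MersenneTwister(123))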

Rewards are only obtained at the end of an episode.

Rewards of all players sum to 0. A special case of CONSTANT_SUM.

act!(env::AbstractEnv, action, player=current_player(env))

Super type of all reinforcement learning environments.
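Below is a minimal sketch of a concrete subtype, assuming the default single-agent, sequential traits. The environment itself (CoinFlipEnv, a ten-step ±1 counting game) is hypothetical; it is only meant to show which interface methods are typically defined.

using ReinforcementLearningBase

mutable struct CoinFlipEnv <: AbstractEnv   # hypothetical example environment
    total::Int
    steps::Int
end
CoinFlipEnv() = CoinFlipEnv(0, 0)

RLBase.action_space(::CoinFlipEnv) = (:heads, :tails)
RLBase.state(env::CoinFlipEnv) = env.total
RLBase.state_space(::CoinFlipEnv) = -10:10
RLBase.reward(env::CoinFlipEnv) = env.total
RLBase.is_terminated(env::CoinFlipEnv) = env.steps >= 10

function RLBase.reset!(env::CoinFlipEnv)
    env.total = 0
    env.steps = 0
end

function RLBase.act!(env::CoinFlipEnv, action)
    env.steps += 1
    env.total += (action == :heads ? 1 : -1)
end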

Describe how to model a reinforcement learning environment.

TODO: needs more investigation. Ref: https://bair.berkeley.edu/blog/2019/12/12/mbpo/

plan!(π::AbstractPolicy, env) -> action

The policy is the most basic concept in reinforcement learning. Here, an agent's action is determined by plan!, which takes a policy and an environment and returns an action.

See discussions here if you are wondering why we define the input as AbstractEnv instead of state.

The policy π may change its internal state, but it shouldn't change env. When that is really necessary, remember to make a copy of env to keep the original env untouched.
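A minimal sketch of a custom policy (UniformRandomPolicy is a hypothetical name used only for illustration): plan! samples an action from the environment's legal action space and leaves env itself untouched.

using ReinforcementLearningBase
using Random

struct UniformRandomPolicy{R<:AbstractRNG} <: AbstractPolicy   # hypothetical policy
    rng::R
end
UniformRandomPolicy() = UniformRandomPolicy(Random.default_rng())

# plan! only reads from env; the policy's own rng is the only state that advances.
# Assumes the legal action space supports rand (e.g. a vector or a range).
RLBase.plan!(p::UniformRandomPolicy, env::AbstractEnv) = rand(p.rng, legal_action_space(env))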

AbstractUtilityStyle for environments where the sum of all players' rewards is constant.

AbstractChanceStyle for fully deterministic games without a ChancePlayer.

The environment will terminate within a finite number of steps.

The action space of the environment may contain illegal actions. For environments of FULL_ACTION_SET, legal_action_space and legal_action_space_mask must also be defined.

AbstractUtilityStyle for environments where the sum of all players' rewards is not constant.

Use it to represent the goal state

AbstractUtilityStyle for environments where all players get the same reward.

Other players' actions are not known by the other players.

See the definition of information set

Use it to represent the internal state.

All actions in the action space of the environment are legal

MultiAgent(n::Integer) -> MultiAgent{n}()

n must be ≥ 2.

The environment can run infinitely.

Sometimes people from different fields talk about the same thing under different names. Here we set Observation{Any}() as the default state style in this package.

See discussions here

All players' actions are visible to the other players.

Players act one after the other.

Players act at the same time.

AbstractNumAgentStyle for environments with a single agent

We can get reward after each step

Stochastic()

Default ChanceStyle.

Rewards are only obtained at the end of an episode.

AbstractUtilityStyle for environments where the sum of all players' rewards is equal to zero.

Base.:(==)(env1::T, env2::T) where T<:AbstractEnv

Only check the state of all players in the env.

Make an independent copy of env. Note that the rng (if the env has one) is also copied!

Set the seed of internal rng

ActionStyle(env::AbstractEnv)

For environments with discrete actions, specify whether the current state of env contains a full action set or a minimal action set. By default, MINIMAL_ACTION_SET is returned.

ChanceStyle(env) = STOCHASTIC

Specify the role that chance plays in the env. Possible returns are DETERMINISTIC, STOCHASTIC (the default), EXPLICIT_STOCHASTIC, and SAMPLED_STOCHASTIC.

Specify the default state style when calling state(env).

DynamicStyle(env::AbstractEnv) = SEQUENTIAL

Only valid in environments with a NumAgentStyle of MultiAgent. Determine whether the players can act simultaneously or not. Possible returns are SEQUENTIAL (the default) and SIMULTANEOUS.

InformationStyle(env) = IMPERFECT_INFORMATION

Distinguish between PERFECT_INFORMATION and IMPERFECT_INFORMATION environments. IMPERFECT_INFORMATION is returned by default.

NumAgentStyle(env)

Number of agents involved in the env. Possible returns are SINGLE_AGENT and MultiAgent{N}().

Specify whether we can get a reward after each step or only at the end of a game. Possible values are STEP_REWARD (the default) and TERMINAL_REWARD.

Environments of TERMINAL_REWARD style can be viewed as a subset of environments of STEP_REWARD style. For some algorithms, like MCTS, we may have a more efficient implementation for environments of TERMINAL_REWARD style.

StateStyle(env::AbstractEnv)

Define the possible styles of state(env). Possible values are Observation, InternalState, InformationSet, and GoalState, or a tuple containing several of them.

This is useful for environments which provide more than one kind of state.

UtilityStyle(env::AbstractEnv)

Specify the utility style in multi-agent environments. Possible values are ZERO_SUM, CONSTANT_SUM, GENERAL_SUM, and IDENTICAL_UTILITY.
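As an illustration of how the traits above fit together, a hypothetical two-player, zero-sum, sequential, perfect-information board game might declare:

using ReinforcementLearningBase

struct BoardGameEnv <: AbstractEnv end   # hypothetical environment, traits only

RLBase.NumAgentStyle(::BoardGameEnv) = MultiAgent(2)
RLBase.DynamicStyle(::BoardGameEnv) = SEQUENTIAL
RLBase.InformationStyle(::BoardGameEnv) = PERFECT_INFORMATION
RLBase.ChanceStyle(::BoardGameEnv) = DETERMINISTIC
RLBase.RewardStyle(::BoardGameEnv) = TERMINAL_REWARD
RLBase.UtilityStyle(::BoardGameEnv) = ZERO_SUM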

action_space(env, player=current_player(env))

Get all available actions from the environment. See also: legal_action_space

chance_player(env)

Only valid for environments with a chance player.

child(env::AbstractEnv, action)

Treat the env as a game tree. Create an independent child after applying action.
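Conceptually, child is equivalent to copying the environment and applying the action to the copy; the helper below (a hypothetical name, not part of RLBase) just spells that out.

using ReinforcementLearningBase

# Copy first, then act on the copy, so the original env is left untouched.
function child_sketch(env::AbstractEnv, action)
    new_env = copy(env)
    act!(new_env, action)
    return new_env
end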

current_player(env)

Return the next player to take an action. For extensive-form games, a chance player may be returned (see also chance_player). For SIMULTANEOUS environments, a simultaneous player is always returned (see also simultaneous_player).

is_terminated(env, player=current_player(env))

legal_action_space(env, player=current_player(env))

For environments of MINIMAL_ACTION_SET, the result is the same as action_space.

legal_action_space_mask(env, player=current_player(env)) -> AbstractArray{Bool}

Required for environments of FULL_ACTION_SET. As a default implementation, legal_action_space_mask creates a Boolean mask over action_space marking the entries that belong to legal_action_space.
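A sketch of the extra methods a FULL_ACTION_SET environment is expected to define, using a hypothetical card game whose legal actions shrink as cards are played:

using ReinforcementLearningBase

mutable struct CardGameEnv <: AbstractEnv   # hypothetical environment
    cards_in_hand::Vector{Int}              # indices into the full action space 1:10
end

RLBase.ActionStyle(::CardGameEnv) = FULL_ACTION_SET
RLBase.action_space(::CardGameEnv) = Base.OneTo(10)
RLBase.legal_action_space(env::CardGameEnv) = env.cards_in_hand
RLBase.legal_action_space_mask(env::CardGameEnv) =
    [a in env.cards_in_hand for a in action_space(env)]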

next_player!(env::E) where {E<:AbstractEnv}

Advance to the next player. This is a no-op for single-player and simultaneous games. Sequential MultiAgent games should implement this method.

RLBase.optimise!(π::AbstractPolicy, experience)

Optimise the policy π with online/offline experience or parameters.

players(env::RLBaseEnv)

Players in the game. This is a no-op for single-player games. MultiAgent games should implement this method.

priority(π::AbstractPolicy, experience)

Usually used in offline policies to evaluate the priorities of the experience.

prob(env, player=chance_player(env))

Get the action distribution of the chance player.

Only valid for environments of EXPLICIT_STOCHASTIC style. The current player of env must be the chance player.

prob(π::AbstractPolicy, env, action)

Only valid for environments with discrete actions.

prob(π::AbstractPolicy, env) -> Distribution

Get the probability distribution of actions based on policy π given an env.

Reset the internal state of an environment

reward(env, player=current_player(env))

simultaneous_player(env)

Only valid for environments of SIMULTANEOUS style.

spectator_player(env)

Used in imperfect-information multi-agent environments.

state(env, style=[DefaultStateStyle(env)], player=[current_player(env)])

The state can be of any type. However, most neural-network-based algorithms assume an AbstractArray is returned. For environments that provide many different kinds of states (inner state, information state, etc.), users need to provide the style argument to declare which kind of state they want.

The state may be reused and mutated at each step. Always remember to make a copy if this is not what you expect.
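A short usage sketch, reusing the hypothetical CoinFlipEnv from the AbstractEnv example above; the two points worth noting are the documented default arguments and the defensive copy.

env = CoinFlipEnv()            # hypothetical env from the AbstractEnv sketch
s = state(env)                 # same as state(env, DefaultStateStyle(env), current_player(env))
s_snapshot = copy(state(env))  # defensive copy, in case the env reuses/mutates its state buffer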

state_space(env, style=[DefaultStateStyle(env)], player=[current_player(env)])

Describe all possible states.

Call this function after writing your customized environment to make sure that all the necessary interfaces are implemented correctly and consistently.

walk(f, env::AbstractEnv)

Call f with env and its descendants. Only use it with small games.
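A conceptual sketch (not the actual implementation, and the helper name is hypothetical) of the traversal walk performs, expressed with the interface functions documented above:

using ReinforcementLearningBase

# Depth-first traversal of the game tree via child, calling f on every reachable node.
function walk_sketch(f, env::AbstractEnv)
    f(env)
    if !is_terminated(env)
        for a in legal_action_space(env)
            walk_sketch(f, child(env, a))
        end
    end
end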