Package Summary

A set of reinforcement learning agents for learning various tasks, plus a framework for model-based agents in which any model learning and/or planning method can be substituted. Developed by Todd Hester and Peter Stone at the University of Texas at Austin.

This package provides some reinforcement learning (RL) agents.

Documentation

Please take a look at the tutorial on how to install, compile, and use this package.

Check out the code at: http://code.google.com/p/rl-texplore-ros-pkg/source/checkout

This package includes a number of reinforcement learning agents that can be used for learning on robots, or learning with the environments in the accompanying rl_env package.

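The package contains the following agents, selected with the --agent option described below: qlearner, sarsa, modelbased, rmax, texplore, dyna, and savedpolicy.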

In addition to these methods, the package contains a general model-based architecture that can be used with any combination of planner and model learning algorithm. For example, the R-Max implementation is simply the general agent with an R-Max model and Value Iteration for planning, and the TEXPLORE agent is the general agent with a random forest model (Breiman 2001) and UCT (Kocsis and Szepesvari 2006) for planning.
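To illustrate how a model and a planner plug into one general agent, here is a minimal Python sketch of the R-Max combination described above. It is not the package's code: the class names (RMaxModel, ValueIterationPlanner, ModelBasedAgent), their methods, and the two-state toy domain are all assumptions made for illustration, and the model is a textbook-style tabular R-Max.

```python
from collections import defaultdict


class RMaxModel:
    """Hypothetical tabular model; a state-action pair is 'known' after m visits."""

    def __init__(self, n_states, n_actions, m, rmax):
        self.n_states, self.n_actions = n_states, n_actions
        self.m, self.rmax = m, rmax
        self.visits = defaultdict(int)
        self.reward_sum = defaultdict(float)
        self.next_counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, r, s_next):
        self.visits[(s, a)] += 1
        self.reward_sum[(s, a)] += r
        self.next_counts[(s, a)][s_next] += 1

    def known(self, s, a):
        return self.visits[(s, a)] >= self.m

    def predict(self, s, a):
        """Return (mean reward, {next_state: probability}) for a known pair."""
        n = self.visits[(s, a)]
        probs = {s2: c / n for s2, c in self.next_counts[(s, a)].items()}
        return self.reward_sum[(s, a)] / n, probs


class ValueIterationPlanner:
    """Hypothetical planner: repeated Bellman backups over the learned model."""

    def __init__(self, gamma, sweeps=200):
        self.gamma, self.sweeps = gamma, sweeps

    def plan(self, model):
        q = [[0.0] * model.n_actions for _ in range(model.n_states)]
        # Unknown pairs are treated as if they paid rmax forever.
        optimistic = model.rmax / (1.0 - self.gamma)
        for _ in range(self.sweeps):
            for s in range(model.n_states):
                for a in range(model.n_actions):
                    if not model.known(s, a):
                        q[s][a] = optimistic
                    else:
                        r, probs = model.predict(s, a)
                        q[s][a] = r + self.gamma * sum(
                            p * max(q[s2]) for s2, p in probs.items())
        return q


class ModelBasedAgent:
    """General agent: any object with update/known/predict plus any planner."""

    def __init__(self, model, planner):
        self.model, self.planner = model, planner
        self.q = planner.plan(model)

    def observe(self, s, a, r, s_next):
        self.model.update(s, a, r, s_next)
        self.q = self.planner.plan(self.model)

    def act(self, s):
        return self.q[s].index(max(self.q[s]))


agent = ModelBasedAgent(RMaxModel(2, 2, m=1, rmax=1.0),
                        ValueIterationPlanner(gamma=0.9))
# Toy domain: action 1 moves to (or stays in) state 1, which pays reward 1.
for s, a, r, s_next in [(0, 0, 0.0, 0), (0, 1, 0.0, 1),
                        (1, 0, 0.0, 0), (1, 1, 1.0, 1)]:
    agent.observe(s, a, r, s_next)
print(agent.act(0), agent.act(1))
```

Once all four state-action pairs are known, the agent picks action 1 in both states. Swapping in a different model or planner only requires an object with the same methods, which is the idea behind the general architecture.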

Running the agent

The agent can be run with the following command. It should be initialized before the environment is started:

rosrun rl_agent agent --agent type [options]

where the agent type is one of the following:

qlearner sarsa modelbased rmax texplore dyna savedpolicy

There are a number of options to specify particular parameters of the algorithms:

  • --seed value (integer seed for random number generator)
  • --gamma value (discount factor between 0 and 1)
  • --epsilon value (epsilon for epsilon-greedy exploration)
  • --alpha value (learning rate alpha)
  • --initialvalue value (initial q values)
  • --actrate value (action selection rate (Hz))
  • --lamba value (lambda for eligibility traces)
  • --m value (parameter for R-Max)
  • --k value (For Dyna: # of model based updates to do between each real world update)
  • --history value (# steps of history to use for planning with delay)
  • --filename file (file to load saved policy from for savedpolicy agent)
  • --model type (tabular,tree,m5tree)
  • --planner type (vi,pi,sweeping,uct,parallel-uct,delayed-uct,delayed-parallel-uct)
  • --explore type (unknowns,greedy,epsilongreedy)
  • --combo type (average,best,separate)
  • --nmodels value (# of models)
  • --nstates value (optionally discretize the domain into value # of states on each feature)
  • --reltrans (learn relative transitions)
  • --abstrans (learn absolute transitions)
  • --prints (turn on debug printing of actions/rewards)

Example

For example, to run real-time TEXPLORE using 10 continuous trees, at an action rate of 25 Hz, with a discount factor of 0.99, you would call:

rosrun rl_agent agent --agent texplore --planner parallel-uct --nmodels 10 --model m5tree --actrate 25 --gamma 0.99
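When the agent is launched from a script, the same command line can be assembled programmatically. The helper below is a sketch only: build_agent_command is a hypothetical name, it uses only the agent types and flags listed above, and it does not check that a flag is valid for the chosen agent.

```python
# Agent types accepted by --agent, as listed above.
AGENT_TYPES = {"qlearner", "sarsa", "modelbased", "rmax",
               "texplore", "dyna", "savedpolicy"}


def build_agent_command(agent_type, **options):
    """Return the argument list for `rosrun rl_agent agent` (hypothetical helper)."""
    if agent_type not in AGENT_TYPES:
        raise ValueError("unknown agent type: %s" % agent_type)
    cmd = ["rosrun", "rl_agent", "agent", "--agent", agent_type]
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            # Switches such as --prints or --reltrans take no value.
            cmd.append(flag)
        elif value is not False and value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_agent_command("texplore", planner="parallel-uct", nmodels=10,
                          model="m5tree", actrate=25, gamma=0.99)
print(" ".join(cmd))
```

The printed line reproduces the TEXPLORE example above; the list form can be passed directly to Python's subprocess module.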

The General Model-Based Agent

Included in this package is a general model-based agent that can use any model learning or planning method that matches the interface defined in the core.hh file of the rl_common package.

The available model learning methods are selected with the --model option (tabular, tree, or m5tree).

With any of these types of models, multiple models can be combined together using the --nmodels option. For example, a random forest with 10 trees can be created with the options:

--model tree --nmodels 10

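There are also a number of planning methods available, selected with the --planner option (vi, pi, sweeping, uct, parallel-uct, delayed-uct, or delayed-parallel-uct).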

Any of these model learning methods can be combined with any of the planners. It is also easy to write new model learning and planning methods that match the interface defined in rl_common and use those as well. In addition, there are multiple ways of performing exploration:

  • epsilon-greedy: take a random action with probability epsilon, act greedily otherwise
  • greedy: always be greedy
  • unknowns: provide r-max like bonuses to unknown state-action pairs
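The three exploration strategies can be sketched as a single action-selection function. This is an illustrative Python sketch, not the package's implementation: select_action and its parameters are hypothetical, and the way the "unknowns" bonus is applied (adding a fixed bonus to actions visited fewer than m times) is an assumption.

```python
import random


def select_action(q_values, explore="greedy", epsilon=0.1,
                  visits=None, m=1, bonus=1.0, rng=random):
    """Pick an action index from the Q-values of the current state.

    explore is one of "greedy", "epsilongreedy", or "unknowns".
    visits holds per-action visit counts and is only used by "unknowns".
    """
    if explore == "epsilongreedy" and rng.random() < epsilon:
        # Explore: uniformly random action.
        return rng.randrange(len(q_values))
    values = list(q_values)
    if explore == "unknowns":
        # Assumed R-Max-like bonus: inflate actions tried fewer than m times.
        values = [q + bonus if n < m else q for q, n in zip(values, visits)]
    # Greedy choice; ties go to the lowest action index.
    return max(range(len(values)), key=values.__getitem__)


q = [0.1, 0.9, 0.3]
print(select_action(q))                                   # greedy
print(select_action(q, "epsilongreedy", epsilon=0.2))     # mostly greedy
print(select_action(q, "unknowns", visits=[5, 5, 0]))     # favors untried action
```

With these Q-values the greedy call returns action 1, while the "unknowns" call returns action 2 because that action has never been tried.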

How the RL Agent interacts with the Environment

The RL agent can interact with the environment in two ways: it can use the ROS messages defined in the rl_msgs package, or another method can call the agent and environment methods directly, as done in the rl_experiment package.

Using rl_msgs

The rl_msgs package defines a set of ROS messages for the agent and environment to communicate. These are similar to the messages used in RL-Glue (Tanner and White 2009), but simplified and defined in the ROS message format. The environment publishes three types of messages for the agent:

  • rl_msgs/RLEnvDescription: this message describes the environment: the number of actions, the number of features, whether it is episodic, etc.

  • rl_msgs/RLEnvSeedExperience: this message provides an experience seed for the agent to use for learning.

  • rl_msgs/RLStateReward: this is a message from the environment with the agent's new state and reward received on this time step.

The environment subscribes to two types of messages from the agent:

  • rl_msgs/RLAction: this message sends the environment the action that the agent has selected.

  • rl_msgs/RLExperimentInfo: this message provides information on the results of the latest episode of the experiment.

When the environment is created, it sends an RLEnvDescription message to the agent. Then it will send any experience seeds for the agent in a series of RLEnvSeedExperience messages. Then it will send the agent an RLStateReward message with the agent's initial state in the domain. It should then receive an RLAction message, which it can apply to the domain and send a new RLStateReward message. When the episode has ended, the environment will receive an RLExperimentInfo message from the agent, and it will reset the domain and send the agent a new RLStateReward message with its initial state in the new episode.
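The message sequence described above can be simulated without ROS. In the following Python sketch the dataclasses stand in for the rl_msgs types; their field names, the CorridorEnv and AlwaysRightAgent classes, and the run function are all assumptions made for illustration, and seed experiences are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


# Stand-ins for the rl_msgs types; field names are assumed, not the real definitions.
@dataclass
class RLEnvDescription:
    num_actions: int
    num_features: int
    episodic: bool


@dataclass
class RLStateReward:
    state: List[float]
    reward: float
    terminal: bool


@dataclass
class RLAction:
    action: int


@dataclass
class RLExperimentInfo:
    episode_number: int
    episode_reward: float


class CorridorEnv:
    """Toy episodic domain: reach cell `length` starting from cell 0."""

    def __init__(self, length=3):
        self.length, self.pos = length, 0

    def describe(self):
        return RLEnvDescription(num_actions=2, num_features=1, episodic=True)

    def reset(self):
        self.pos = 0
        return RLStateReward([0.0], 0.0, False)

    def on_action(self, msg):
        self.pos = max(0, self.pos + (1 if msg.action == 1 else -1))
        done = self.pos == self.length
        return RLStateReward([float(self.pos)], 1.0 if done else 0.0, done)


class AlwaysRightAgent:
    def __init__(self):
        self.episode, self.episode_reward, self.num_actions = 0, 0.0, 0

    def on_description(self, msg):
        self.num_actions = msg.num_actions

    def on_state_reward(self, msg):
        """Reply with RLAction, or with RLExperimentInfo once the episode ends."""
        self.episode_reward += msg.reward
        if msg.terminal:
            info = RLExperimentInfo(self.episode, self.episode_reward)
            self.episode, self.episode_reward = self.episode + 1, 0.0
            return info
        return RLAction(self.num_actions - 1)


def run(env, agent, episodes):
    log = []
    agent.on_description(env.describe())   # 1. environment description
    msg = env.reset()                      # 2. initial state
    while len(log) < episodes:
        reply = agent.on_state_reward(msg)
        if isinstance(reply, RLExperimentInfo):
            log.append(reply)              # 4. episode over: reset the domain
            msg = env.reset()
        else:
            msg = env.on_action(reply)     # 3. apply action, send new state
    return log


for info in run(CorridorEnv(3), AlwaysRightAgent(), episodes=2):
    print(info)
```

In a real deployment each of these method calls would be a publish on one side and a subscriber callback on the other; the ordering of the exchange is the part this sketch is meant to show.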

Calling methods directly

Experiments can also be run by calling the agent methods directly (as done in the rl_experiment package). The methods that all agents must implement are defined in the Agent interface in the rl_common package. Experience seeds can be given to the agent by calling the seedExp method. After the agent receives a new state and reward, it can be queried for an action by calling next_action(reward, state).
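As a rough illustration of this calling pattern, the Python sketch below mirrors the seedExp and next_action calls with a small tabular Q-learning agent. The real interface is the C++ Agent class in rl_common; everything else here is an assumption for the sketch, including the first_action and last_action method names, the tuple format of a seed, and the toy corridor domain in run_episode.

```python
import random


class QLearner:
    """Hypothetical Python mirror of an agent driven by direct method calls."""

    def __init__(self, num_actions, gamma=0.99, alpha=0.3, epsilon=0.1,
                 initialvalue=0.0, seed=0):
        self.num_actions, self.gamma, self.alpha = num_actions, gamma, alpha
        self.epsilon, self.initialvalue = epsilon, initialvalue
        self.rng = random.Random(seed)
        self.q = {}          # maps state tuple -> list of action values
        self.last = None     # (state, action) awaiting its reward

    def _values(self, state):
        return self.q.setdefault(tuple(state),
                                 [self.initialvalue] * self.num_actions)

    def _update(self, s, a, r, s_next):
        # Standard Q-learning backup; s_next is None at the end of an episode.
        target = r if s_next is None else r + self.gamma * max(self._values(s_next))
        values = self._values(s)
        values[a] += self.alpha * (target - values[a])

    def seedExp(self, seeds):
        # Assumed seed format: (state, action, reward, next_state, terminal).
        for s, a, r, s_next, terminal in seeds:
            self._update(s, a, r, None if terminal else s_next)

    def first_action(self, state):
        values = self._values(state)
        if self.rng.random() < self.epsilon:
            action = self.rng.randrange(self.num_actions)
        else:
            action = values.index(max(values))
        self.last = (tuple(state), action)
        return action

    def next_action(self, reward, state):
        self._update(*self.last, reward, state)
        return self.first_action(state)

    def last_action(self, reward):
        self._update(*self.last, reward, None)


def run_episode(agent, length=3, max_steps=1000):
    """Drive the agent through a toy corridor; True if the goal was reached."""
    pos = 0
    action = agent.first_action([float(pos)])
    for _ in range(max_steps):
        pos = max(0, pos + (1 if action == 1 else -1))
        if pos == length:
            agent.last_action(1.0)
            return True
        action = agent.next_action(0.0, [float(pos)])
    return False


agent = QLearner(num_actions=2, epsilon=0.2)
agent.seedExp([([2.0], 1, 1.0, [3.0], True)])   # one seed: stepping right from cell 2 ends the episode
print(run_episode(agent))
```

The experiment code owns the loop here, so no messages are exchanged; the agent only sees the sequence of first_action, next_action, and last_action calls.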

References

Wiki: rl_agent (last edited 2011-05-23 23:13:57 by MeloneeWise)