syna.utils.rl module

Reinforcement learning helpers.

Simple utilities for RL experiments: a ReplayBuffer and a Trainer class used in DQN-style example scripts.

Trainer expectations:
  • The provided agent must implement select_action(state), update(…), and sync_qnet() for periodic target network syncs (see the minimal agent sketch below).
  • Trainer keeps a simple reward history and plots it at the end of training.
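Example: a minimal stand-in agent satisfying that interface. This is an illustrative sketch, not the DQN agent from the example scripts; the exact signature of update(…) is not documented here, so a plain transition-style signature is assumed.

   import numpy as np

   class RandomAgent:
       """Illustrative agent implementing the interface Trainer expects."""

       def __init__(self, action_size: int = 2, epsilon: float = 0.1):
           self.action_size = action_size
           self.epsilon = epsilon  # Trainer overwrites this for the final evaluation run

       def select_action(self, state: np.ndarray) -> int:
           # A real agent would act epsilon-greedily from its Q-network;
           # here we just pick a random action.
           return int(np.random.randint(self.action_size))

       def update(self, state, action, reward, next_state, done) -> None:
           # Assumed signature: a real agent would store the transition
           # and take a learning step here.
           pass

       def sync_qnet(self) -> None:
           # A real DQN agent would copy online weights to the target network.
           pass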

class syna.utils.rl.ReplayBuffer(buffer_size: int, batch_size: int)[source]

Bases: object

Simple FIFO replay buffer for storing transitions.

Args:

buffer_size: maximum number of transitions to store.
batch_size: number of transitions returned by sample().

add(state: ndarray, action: int, reward: float, next_state: ndarray, done: bool) → None[source]

Add a transition to the buffer.

sample()[source]

Sample a minibatch of transitions. If there are fewer than batch_size transitions available, sample what’s present.

Returns:

state, action, reward, next_state, done (numpy arrays)
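
Example: a minimal usage sketch. The 4-dimensional observations are an assumption (CartPole-shaped states); the argument order follows the add() signature above.

   import numpy as np
   from syna.utils.rl import ReplayBuffer

   buffer = ReplayBuffer(buffer_size=10000, batch_size=32)

   state = np.zeros(4, dtype=np.float32)
   next_state = np.ones(4, dtype=np.float32)

   # Store a few dummy transitions.
   for _ in range(5):
       buffer.add(state, 0, 1.0, next_state, False)

   # Fewer than batch_size transitions are stored, so sample() returns
   # only what is present.
   states, actions, rewards, next_states, dones = buffer.sample()
   print(states.shape, actions.shape)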

class syna.utils.rl.Trainer(env_name='CartPole-v1', num_episodes=300, sync_interval=20, epsilon=0.1, agent=None)[source]

Bases: object

Trainer for running episodes with a provided agent and environment.

Args:

env_name: gymnasium environment id.
num_episodes: number of training episodes.
sync_interval: how often (in episodes) to call agent.sync_qnet().
epsilon: epsilon value set on the agent for the final evaluation run.
agent: object implementing select_action(state), update(…), and sync_qnet().

train()[source]

Run the training loop, plot the reward history, and run one final evaluation episode.
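
Example: wiring an agent into the Trainer. RandomAgent is the illustrative stand-in sketched above; in practice you would pass a DQN-style agent implementing the same interface.

   from syna.utils.rl import Trainer

   agent = RandomAgent(action_size=2, epsilon=0.1)

   trainer = Trainer(
       env_name="CartPole-v1",
       num_episodes=300,
       sync_interval=20,
       epsilon=0.1,
       agent=agent,
   )

   # Runs the training loop, plots the reward history, and performs one
   # final evaluation episode with the configured epsilon.
   trainer.train()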