Learning a BProgram as a gym environment
++++++++++++++++++++++++++++++++++++++++

This example demonstrates how to learn a BProgram as a gym environment, using the package's extension to the OpenAI gym. The extension adds a :code:`localReward` parameter to the yield statement, which expresses the system's preferences.

The :class:`BPEnv` class requires two things: a b-program generator (a function that creates a new instance of the b-program) and the list of program events.

By default, :class:`BPEnv` represents the observation space of the b-program as the Cartesian product of the b-threads' execution points, encoded as a multi-discrete space. Developers who need a different observation space can extend the abstract class :class:`BPObservationSpace`, which provides access to both each b-thread's execution point and its local variables.

The reward at each state is computed by a function that receives the reward statements from all b-threads. By default, the total reward at each yield point is the sum of the individual rewards of all active b-threads.

.. literalinclude:: ../../examples/bp_gym_env.py

Note that not every event is necessarily an action. This distinction separates controllable program behaviors from uncontrollable ones. For instance, the following b-program implements the frozen lake environment:

.. literalinclude:: ../../examples/frozen_lake_env.py
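
The default reward computation described earlier (summing the rewards of all active b-threads at a yield point) can be illustrated with a minimal, library-independent sketch. It is not the package's implementation: the function name :code:`sum_local_rewards`, the representation of each b-thread statement as a plain dictionary, and the event names are assumptions made for illustration only.

.. code-block:: python

    def sum_local_rewards(statements):
        """Sum the ``localReward`` values of the given b-thread statements.

        Hypothetical helper: each statement is assumed to be a dict, and a
        statement without a ``localReward`` entry contributes 0.
        """
        return sum(s.get("localReward", 0) for s in statements)


    # Hypothetical statements of three active b-threads at one yield point.
    statements = [
        {"request": "HOT", "localReward": 1.0},
        {"waitFor": "COLD"},  # expresses no preference
        {"block": "HOT", "localReward": -0.5},
    ]

    total_reward = sum_local_rewards(statements)

A custom reward function would take the same statements as input but combine them differently, for example by taking the maximum instead of the sum.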