A reinforcement learning approach to dynamic norm generation
University of New Brunswick
This thesis proposes a two-level learning framework for dynamic norm generation. The framework uses Bayesian reinforcement learning to extract behavioral norms and domain-dependent knowledge in one environment and later incorporates them into learning agents in different settings. Reinforcement learning (RL) and norms are mutually beneficial: norms can be extracted through RL, and RL can be improved by incorporating behavioral norms as prior probability distributions into learning agents. An agent should be confident about its beliefs before generalizing them to future settings. This confidence is assessed against two conditions: how familiar the agent is with the current world and its dynamics (including the norm system), and whether it has converged to an optimal policy. A Bayesian dynamic programming technique is implemented and compared with methods such as Q-learning and Dyna; the results show that Bayesian RL outperforms these techniques in balancing exploration and exploitation. The thesis demonstrates how an agent can extract behavioral norms and adapt its beliefs based on the domain knowledge acquired during learning. Scenarios with varying degrees of similarity between environments and different goals are examined. The experimental results show that a normative agent trained in an initial environment can adjust its beliefs about the dynamics and behavioral norms of a new environment, and thus converges to the optimal policy more quickly, especially in the early stages of learning.
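The core mechanism described above, maintaining Bayesian beliefs over an environment's dynamics, planning on the expected model, and transferring the learned beliefs as an informative prior into a similar new environment, can be illustrated with a minimal sketch. This is not the thesis's actual implementation or experimental domain: the 4-state MDP, Dirichlet parameterization, reward structure, and all constants below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of model-based Bayesian RL with Dirichlet beliefs over
# transitions. Phase 1 learns in an initial environment from a uniform
# prior; phase 2 reuses the learned counts as an informative prior in a
# perturbed "new" environment. All details are illustrative assumptions.

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

def true_dynamics(seed=0):
    """Random ground-truth transition model P[a, s, s']."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(N_STATES), size=(N_ACTIONS, N_STATES))

def expected_model(counts):
    """Posterior mean of the transition model under Dirichlet counts."""
    return counts / counts.sum(axis=-1, keepdims=True)

def greedy_policy(P, R, iters=200):
    """Bayesian dynamic programming: value iteration on the expected model."""
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = R + GAMMA * P @ V          # Q[a, s]
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

def learn(P_true, R, counts, episodes=300, eps=0.1, seed=1):
    """Interact with the environment, updating Dirichlet counts."""
    rng = np.random.default_rng(seed)
    s = 0
    for _ in range(episodes):
        pi = greedy_policy(expected_model(counts), R)
        a = pi[s] if rng.random() > eps else rng.integers(N_ACTIONS)
        s2 = rng.choice(N_STATES, p=P_true[a, s])
        counts[a, s, s2] += 1.0        # Bayesian belief update
        s = s2
    return counts

R = np.tile(np.array([0.0, 0.0, 0.0, 1.0]), (N_ACTIONS, 1))  # reward at state 3
P_old = true_dynamics(seed=0)

# Phase 1: learn in the initial environment from a uniform prior.
uniform_prior = np.ones((N_ACTIONS, N_STATES, N_STATES))
learned = learn(P_old, R, uniform_prior.copy())

# Phase 2: transfer the learned counts as a prior into a similar
# new environment (here: the old dynamics, mildly perturbed).
P_new = 0.8 * P_old + 0.2 * true_dynamics(seed=1)
P_new /= P_new.sum(axis=-1, keepdims=True)
informed = learn(P_new, R, learned.copy(), episodes=50)
```

Carrying the Dirichlet counts forward is what the abstract calls incorporating behavioral norms as prior probability distributions: the new agent starts with a model concentrated on the old environment's dynamics rather than a uniform one, which is what speeds early convergence when the environments are similar.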