The agent will constantly alter its behavior depending on the best Q value available to it. And as it explores more and more of the environment, it will eventually learn the most optimal state action pairs to perform in almost every situation. | The agent will constantly alter its behavior depending on the best Q value available to it. And as it explores more and more of the environment, it will eventually learn the most optimal state action pairs to perform in almost every situation. |