Reinforcement learning, like many topics whose names end with “ing” such as machine learning and mountaineering, is simultaneously a problem, a class of solution methods that work well on the problem, and the field that studies this problem and its solution methods. It is convenient to use a single name for all three things, but at the same time essential to keep the three conceptually separate. In particular, the distinction between problems and solution methods is very important in reinforcement learning; failing to make this distinction is the source of many confusions.
Like other terms ending in "ing" (machine learning, mountaineering), reinforcement learning refers to three things at once:

- a problem;
- a class of methods that solve the problem;
- the field that studies both the problem and its solution methods.

Keeping these three conceptually separate is essential; many confusions later on arise from failing to distinguish them.
In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act.
In interactive problems, it is often impractical to obtain example behaviors that are both:

- correct, and
- representative of all the situations in which the agent has to act.

Reinforcement learning is also different from what machine learning researchers call unsupervised learning, which is typically about finding structure hidden in collections of unlabeled data. The terms supervised learning and unsupervised learning would seem to exhaustively classify machine learning paradigms, but they do not. Although one might be tempted to think of reinforcement learning as a kind of unsupervised learning because it does not rely on examples of correct behavior, reinforcement learning is trying to maximize a reward signal instead of trying to find hidden structure.
RL's goal is to maximize a reward signal rather than to find hidden structure; it is therefore a third machine learning paradigm, distinct from both supervised and unsupervised learning.
Exploitation: the action of making use of and benefiting from resources.
To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward.
The agent prefers actions it has already tried and found effective at producing reward. But to discover such actions in the first place, it must try actions it has never selected before, and this is the source of the conflict.
The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future.
- The purpose of exploitation: obtain reward now.
- The purpose of exploration: enable better action selections in the future.

Pursuing either one exclusively is likely to make the agent fail at the task.
For now, we simply note that the entire issue of balancing exploration and exploitation does not even arise in supervised and unsupervised learning, at least in the purest forms of these paradigms.
The problem of balancing exploration and exploitation is still unsolved; for now, simply note that it does not even arise in supervised or unsupervised learning, at least in the purest forms of those paradigms.
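To make the trade-off concrete, here is a minimal epsilon-greedy sketch for a toy k-armed bandit (this example and its reward probabilities are my own illustration, not code from the book): with probability epsilon the agent explores a random action, otherwise it exploits the action with the highest estimated value.

```python
import random

random.seed(0)

K = 5                                          # number of actions (arms)
true_reward_prob = [0.1, 0.3, 0.5, 0.7, 0.2]   # hypothetical; unknown to the agent
epsilon = 0.1                                  # fraction of steps spent exploring

value_estimate = [0.0] * K                     # Q(a): running average of rewards
action_count = [0] * K

for step in range(10_000):
    if random.random() < epsilon:
        a = random.randrange(K)                # explore: pick a random action
    else:
        a = max(range(K), key=lambda i: value_estimate[i])  # exploit the best so far

    reward = 1.0 if random.random() < true_reward_prob[a] else 0.0

    # incremental sample average: Q(a) <- Q(a) + (r - Q(a)) / n
    action_count[a] += 1
    value_estimate[a] += (reward - value_estimate[a]) / action_count[a]

print("estimated values:", [round(q, 2) for q in value_estimate])
```

With epsilon = 0 the agent can lock onto a mediocre arm it happened to try first; with epsilon = 1 it never benefits from what it has learned. The balance between those two extremes is exactly the dilemma described above.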
This is in contrast to many approaches that consider subproblems without addressing how they might fit into a larger picture. … Although these approaches (some tricks in machine learning) have yielded many useful results, their focus on isolated subproblems is a significant limitation.
RL states the goal of the whole problem directly, rather than translating it into a set of lower-level subgoals.
Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners—viewed as almost the opposite of planning.
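As a rough illustration of this distinction, the sketch below contrasts a model-free learner (tabular Q-learning, which updates value estimates from sampled transitions by trial and error) with a model-based planner (value iteration, which computes values directly from a known model). The two-state environment is invented for this example and is not from the book.

```python
import random

random.seed(0)
gamma = 0.9

# A tiny deterministic environment, made up for illustration:
# env[state][action] = (next_state, reward).
env = {0: {0: (0, 0.0), 1: (1, 1.0)},
       1: {0: (0, 0.0), 1: (1, 2.0)}}

# --- Model-free: Q-learning learns only from sampled transitions ---
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
Q = {(s, a): 0.0 for s in env for a in (0, 1)}
for _ in range(5000):
    s, a = random.choice(list(env)), random.choice([0, 1])
    s_next, r = env[s][a]                      # observed by trial and error
    best_next = max(Q[(s_next, 0)], Q[(s_next, 1)])
    Q[(s, a)] += 0.1 * (r + gamma * best_next - Q[(s, a)])

# --- Model-based: value iteration plans with the model itself ---
V = {s: 0.0 for s in env}
for _ in range(200):
    V = {s: max(r + gamma * V[s2] for s2, r in env[s].values()) for s in env}

print("model-free Q:", {k: round(v, 1) for k, v in Q.items()})
print("model-based V:", {s: round(v, 1) for s, v in V.items()})
```

The planner uses `env` directly to look ahead; the Q-learner only ever sees one sampled transition at a time, which is what makes it "explicitly trial-and-error".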
The formal definition of state as we use it here is given by the framework of Markov decision processes presented in Chapter 3. More generally, however, we encourage the reader to follow the informal meaning and think of the state as whatever information is available to the agent about its environment.
Beyond the formal definition of state given by the MDP framework of Chapter 3, the authors encourage an informal reading: the state is whatever information about its environment is available to the agent.
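Under that informal reading, a state can be any bundle of information the agent has access to, not only a formal MDP state. A minimal, purely hypothetical sketch (the field names are mine, not the book's):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentState:
    """Whatever information about the environment is available to the agent."""
    sensor_readings: tuple[float, ...]  # direct observations of the environment
    last_action: int                    # remembered information counts as state too
```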