Attention in OpenAI Five / Articles - 墨之科技

Attention in OpenAI Five

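Introduction

So-called attention is, at its core, the problem of how to compute a weight from a relationship, for example the relationship between a query and each of several candidates.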
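As a generic illustration of "computing a weight from a relationship", the sketch below scores each candidate by its dot product with a query and normalizes the scores with a softmax. This is ordinary dot-product attention, not code from OpenAI Five; the names `attention_weights`, `query`, and `keys` are assumptions made for this example.

```python
import numpy as np

def attention_weights(query, keys):
    """Turn the relation between one query and each key into weights that sum to 1."""
    scores = keys @ query              # one dot-product score per key
    scores = scores - scores.max()     # subtract the max to stabilise the exponentials
    exp = np.exp(scores)
    return exp / exp.sum()

# Hypothetical toy data: three candidates, from most to least aligned with the query.
query = np.array([1.0, 0.0])
keys = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
weights = attention_weights(query, keys)
```

The candidate most aligned with the query receives the largest weight, and the weights always sum to 1.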

linear projection

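Because the number of available actions varies while the number of nodes in the FC (fully connected) layer is fixed, a softmax over the FC output alone cannot be used to select an action. Suppose the FC layer has n nodes. Each available action, given in one-hot form, is then converted by an Embedding into a vector of length n. Taking the dot product of each such vector with the FC output yields one score per available action, however many there are, and a softmax over these scores selects the action.

The OpenAI Five paper calls this a linear projection: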

The primary action is chosen via a linear projection over the available actions.
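A minimal NumPy sketch of this embedding-dot selection, under stated assumptions: the sizes `n` and `num_actions`, the randomly initialized table `action_embedding`, and the helper `choose_action` are all hypothetical and are not taken from the paper. It only shows how a fixed-width FC output can score a variable number of available actions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8              # width of the FC output (hypothetical value)
num_actions = 5    # size of the full action vocabulary (hypothetical value)

# Embedding table: row i is the length-n vector for action i.
# Looking up a row is equivalent to multiplying a one-hot vector by this matrix.
action_embedding = rng.normal(size=(num_actions, n))

def choose_action(fc_output, available_ids, rng):
    """Sample one action id from a variable-length array of available action ids."""
    vectors = action_embedding[available_ids]   # shape (k, n); k varies per timestep
    logits = vectors @ fc_output                # shape (k,); one score per available action
    logits = logits - logits.max()              # stabilise the softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    choice = rng.choice(len(available_ids), p=probs)
    return available_ids[choice], probs

fc_output = rng.normal(size=n)
action, probs = choose_action(fc_output, np.array([0, 2, 3]), rng)
_, probs_two = choose_action(fc_output, np.array([1, 4]), rng)
```

The same function handles three available actions in the first call and two in the second, because the softmax is taken over the dot-product scores rather than over a fixed-size output layer.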

Obtaining the available actions

For many of the actions we wrote simple action filters, which determine whether the action is available; these check if there is a valid target nearby, if the ability/item is on cooldown, etc. At each timestep we restrict the set of available actions using these filters and present the final choices to the model.
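The filters described in the quote can be pictured roughly as follows. This is only a sketch: the `AbilityState` fields, the `is_available` rule, and the sample abilities are invented for illustration and do not reflect the actual OpenAI Five game interface.

```python
from dataclasses import dataclass

@dataclass
class AbilityState:
    """Hypothetical per-ability state; the field names are assumptions."""
    name: str
    cooldown_remaining: float   # seconds until the ability can be used again
    needs_target: bool          # whether the ability requires a target
    valid_targets_nearby: int   # number of valid targets in range

def is_available(ability):
    """Simple filter: off cooldown, and a valid target exists if one is needed."""
    if ability.cooldown_remaining > 0:
        return False
    if ability.needs_target and ability.valid_targets_nearby == 0:
        return False
    return True

def available_actions(abilities):
    """Return the indices of the abilities that pass the filter at this timestep."""
    return [i for i, a in enumerate(abilities) if is_available(a)]

abilities = [
    AbilityState("move", 0.0, False, 0),
    AbilityState("attack", 0.0, True, 0),   # no valid target nearby
    AbilityState("stun", 4.5, True, 2),     # still on cooldown
    AbilityState("heal", 0.0, True, 1),
]
available = available_actions(abilities)
```

Here only "move" and "heal" pass the filter, so `available` holds indices 0 and 3, and only those choices would be presented to the model at this timestep.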

attention: weighted by mask

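The two figures are equivalent in effect. Consider the second one. The selection of Units is still implemented as a linear projection through the embedding-dot combination. At the place marked in red, the result is additionally multiplied by a weight; this weight is produced by a sigmoid and therefore lies between 0 and 1.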
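Since the figures are not reproduced here, the following is only a guess at the shape of the computation: a sigmoid output is used as a soft mask that scales each unit's embedding-dot score. Where exactly the weight is applied in the real network is an assumption, and `gate_logits`, `unit_vectors`, and `query` are hypothetical names with random placeholder values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n, k = 8, 4                              # hypothetical sizes: vector width, number of units

unit_vectors = rng.normal(size=(k, n))   # one embedded vector per unit
query = rng.normal(size=n)               # FC output used to select a unit
gate_logits = rng.normal(size=k)         # hypothetical pre-sigmoid values, one per unit

gate = sigmoid(gate_logits)              # soft mask: each entry lies strictly between 0 and 1
scores = unit_vectors @ query            # embedding-dot score per unit
gated_scores = gate * scores             # weight each score by its soft mask

# A hard 0/1 mask is the limiting case of the same multiplication.
hard_mask = np.array([1.0, 0.0, 1.0, 1.0])
masked_scores = hard_mask * scores
```

Scaling the score after the dot product gives the same result as scaling the unit vector before it, since `(gate[:, None] * unit_vectors) @ query` equals `gate * (unit_vectors @ query)`.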

Open question

How is the embedding trained?

墨之科技. All rights reserved. © Copyright 2017-2027

湘ICP备14012786号. Email: ai@inksci.com