Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory (RAM) in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs can solve complex, structured tasks that are inaccessible to neural networks without external read-write memory.
Differentiable Neural Computer (DNC)

Paper notes:
The DNC uses an N × W memory matrix M:
-------------------
The read vector r returned by a read weighting w^r over memory M is a weighted sum over the memory locations:

$\mathbf{r} = \sum_{i=1}^{N} M[i, \cdot]\, \mathbf{w}^{r}[i]$

where the '·' denotes all j = 1, …, W. Similarly, the write operation uses a write weighting w^w to first erase with an erase vector e, then add a write vector v:

$M[i, j] \leftarrow M[i, j]\,\bigl(1 - \mathbf{w}^{w}[i]\,\mathbf{e}[j]\bigr) + \mathbf{w}^{w}[i]\,\mathbf{v}[j]$
-------------------------
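To make the two equations above concrete, here is a minimal numpy sketch of a single erase-then-add write followed by a read; the sizes and the particular weighting values are illustrative, not from the paper.

```python
import numpy as np

# A minimal sketch of the DNC read/write equations quoted above.
# N memory locations, each a row of width W; all values are illustrative.
N, W = 4, 3
M = np.zeros((N, W))                   # memory matrix
w_r = np.array([0.1, 0.7, 0.2, 0.0])   # read weighting (non-negative, sums to <= 1)
w_w = np.array([0.0, 0.0, 0.9, 0.1])   # write weighting
e = np.array([1.0, 0.5, 0.0])          # erase vector, entries in [0, 1]
v = np.array([0.3, -0.2, 1.0])         # write vector

# Write: erase first, then add -- M[i,j] <- M[i,j](1 - w_w[i] e[j]) + w_w[i] v[j]
M = M * (1.0 - np.outer(w_w, e)) + np.outer(w_w, v)

# Read: weighted sum over locations -- r = sum_i M[i,:] w_r[i]
r = M.T @ w_r
print(r)
```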
The functional units that determine and
apply the weightings are called read and write heads. The operation of
the heads is illustrated in Fig. 1 and summarized below; see Methods
for a formal description.
Figure 1 is composed of four regions, a, b, c and d:
a is the controller neural network, b is the read and write heads, c is the memory matrix M, and d is the memory usage and temporal links.
The recurrent architecture inside a.
The interaction between b and c, driven by similarity scores (see the content-lookup sketch after these notes).
The write head in b carries a write key, a write vector and an erase vector.
Each read head in b carries a read key and a read mode (B, C, F: backward, content, forward).
In addition, the read vectors are fed back to region a.
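The similarity scores mentioned above come from the paper's content lookup C(M, k, β): cosine similarity between a key and every memory row, sharpened by a key strength β and normalized. A minimal sketch, with the function name my own:

```python
import numpy as np

def content_weighting(M, key, beta):
    """Content-based addressing C(M, k, beta): cosine similarity between
    the key and every memory row, sharpened by beta, then softmax-normalized."""
    eps = 1e-8  # avoid division by zero for empty rows
    sims = (M @ key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + eps)
    scores = beta * sims
    scores -= scores.max()             # numerical stability for the softmax
    w = np.exp(scores)
    return w / w.sum()
```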
L[i, j] is close to 1 if i was the next location written after j, and is close to 0 otherwise.
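A minimal sketch of how the temporal link matrix could be maintained, following the update equations in the paper's Methods; the precedence weighting p (the degree to which each location was the most recently written) is assumed, and the helper name is mine.

```python
import numpy as np

def update_links(L, p, w_w):
    """Update the N x N temporal link matrix L and precedence weighting p
    after a write with weighting w_w (sketch of the Methods equations)."""
    # L[i,j] <- (1 - w_w[i] - w_w[j]) L[i,j] + w_w[i] p[j]
    L = (1.0 - w_w[:, None] - w_w[None, :]) * L + np.outer(w_w, p)
    np.fill_diagonal(L, 0.0)           # a location is never linked to itself
    # Precedence: p <- (1 - sum(w_w)) p + w_w
    p = (1.0 - w_w.sum()) * p + w_w
    return L, p

# The F (forward) and B (backward) read modes then follow the links:
# f = L @ w_r_prev reads the location written next; b = L.T @ w_r_prev
# reads the location written before.
```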
As well as automatically increasing with each write to a location, usage can be decreased after each read. This allows the controller to reallocate memory that is no longer required.
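A hedged sketch of the usage and allocation mechanics described above, following the Methods equations: free gates emitted by the controller let reads lower usage, and the allocation weighting hands out the least-used locations first. Function names and shapes (R read heads, N locations) are illustrative.

```python
import numpy as np

def update_usage(u, w_w_prev, read_ws_prev, free_gates):
    """Usage update: the previous write raises usage; freed reads lower it.
    read_ws_prev has shape (R, N); free_gates has shape (R,), entries in [0, 1]."""
    # Retention psi: fraction of each location kept after the free gates fire
    psi = np.prod(1.0 - free_gates[:, None] * read_ws_prev, axis=0)
    return (u + w_w_prev - u * w_w_prev) * psi

def allocation_weighting(u):
    """Allocation weighting: prefer the least-used locations first."""
    phi = np.argsort(u)                # free list, ascending usage
    a = np.zeros_like(u)
    cum = 1.0
    for j in phi:
        a[j] = (1.0 - u[j]) * cum
        cum *= u[j]
    return a
```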
the bAbI dataset [26]
DNCs performed much better than both long short-term memory [27] (LSTM; at present the benchmark neural network for most sequence-processing tasks) and the neural Turing machine.