Environments in Deep Reinforcement Learning / Articles - 墨之科技


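Environments in Deep Reinforcement Learning

Deep reinforcement learning algorithms do not depend on the specific object being controlled. They are general-purpose: the same algorithm can be applied to many different tasks.

A deep reinforcement learning algorithm can therefore be applied directly to a concrete control task, provided that an environment has been defined for it.

An environment is described by its states, actions, and rewards.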
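
To make the three ingredients concrete, here is a minimal sketch of an environment in Python. The class `PointEnv`, its `reset`/`step` methods, and the toy task (moving a 2-D point toward the origin) are hypothetical and chosen only for illustration; they do not come from any particular library.

```python
import numpy as np


class PointEnv:
    """Toy environment (hypothetical): move a 2-D point toward the origin."""

    def reset(self):
        # State: the current position of the point.
        self.state = np.array([1.0, 1.0])
        return self.state.copy()

    def step(self, action):
        # Action: a displacement, clipped to a small range.
        action = np.clip(action, -0.1, 0.1)
        self.state = self.state + action
        # Reward: negative distance to the origin (closer is better).
        reward = -float(np.linalg.norm(self.state))
        # Episode ends once the point is very close to the origin.
        done = reward > -0.05
        return self.state.copy(), reward, done


env = PointEnv()
state = env.reset()
next_state, reward, done = env.step(np.array([-0.1, -0.1]))
```

Any algorithm that only calls `reset` and `step` can be reused on a different task by swapping in a different environment class.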

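For a robot arm, it is enough at minimum to define the state as the joint angles and the joint angular velocities.

The paper [Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates] defines the state as: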

State features include the 7 joint angles and their time derivatives, the end-effector position and the target position, totalling 20 dimensions.
That is: 7 joint angles + 7 joint angular velocities + end-effector position (3) + target position (3) = 7 + 7 + 3 + 3 = 20.
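
The dimension count can be checked by assembling such a state vector. This is only a sketch: the function name `build_state` and the sample values are assumptions, and the ordering of the components is not specified by the quoted passage.

```python
import numpy as np


def build_state(joint_angles, joint_velocities, end_effector_pos, target_pos):
    """Concatenate the four feature groups into one flat state vector."""
    state = np.concatenate(
        [joint_angles, joint_velocities, end_effector_pos, target_pos]
    )
    return state.astype(np.float32)


state = build_state(
    joint_angles=np.zeros(7),                      # 7 joint angles
    joint_velocities=np.zeros(7),                  # 7 angular velocities
    end_effector_pos=np.array([0.3, 0.0, 0.5]),    # made-up 3-D position
    target_pos=np.array([0.4, 0.1, 0.2]),          # made-up 3-D target
)
print(state.shape)  # (20,)
```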

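The reward in the paper is given by a formula (not preserved in this text) in which d is the distance between the end effector and the target, and u is the action.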
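
Given only that the reward depends on d and u, one plausible shape is a distance penalty plus a quadratic penalty on the action. The sketch below is an assumption for illustration: the functional form, the function name `reaching_reward`, and the weights `c1` and `c2` are not taken from the paper.

```python
import numpy as np


def reaching_reward(end_effector_pos, target_pos, action, c1=1.0, c2=1e-3):
    """Assumed reward shape: -c1 * d - c2 * u^T u (weights are hypothetical)."""
    # d: Euclidean distance between end effector and target.
    d = np.linalg.norm(np.asarray(end_effector_pos) - np.asarray(target_pos))
    # u: the action vector; large actions are penalised quadratically.
    u = np.asarray(action, dtype=float)
    return float(-c1 * d - c2 * (u @ u))


r = reaching_reward([0.0, 0.0, 0.0], [3.0, 4.0, 0.0], np.zeros(7))
```

With a zero action the reward reduces to the negative distance, so the agent is rewarded for moving the end effector closer to the target while keeping its commands small.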

On the design of the actions:

Both arms were controlled at the level of joint velocities, except the three JACO finger joints which are controlled with torque actuators.

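The paper thus mentions two kinds of actions: control at the level of joint velocities, and torque actuators (used for the three JACO finger joints).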

On the environment design of Pendulum-v0:

https://www.jianshu.com/p/af3a7853268f

https://github.com/openai/gym/wiki/Pendulum-v0

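From the update rule

newthdot = thdot + (-3*g/(2*l) * np.sin(th + np.pi) + 3./(m*l**2)*u) * dt

we can see that the action u is a torque. Multiplying it by 3/(m*l**2), the inverse of the moment of inertia m*l**2/3 of a uniform rod rotating about its end, gives the angular acceleration it produces. The gravity term -3*g/(2*l) * np.sin(th + np.pi) is added to it, and the sum is integrated over dt (one Euler step) to obtain the new angular velocity.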
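
The update can be wrapped into a small step function. This is a sketch rather than the gym source: only the `newthdot` line comes from the formula above, while the default parameter values, the torque and speed clipping, and the angle update are assumptions.

```python
import numpy as np


def pendulum_step(th, thdot, u, g=10.0, m=1.0, l=1.0, dt=0.05,
                  max_torque=2.0, max_speed=8.0):
    """One Euler step of the pendulum; defaults are assumed values."""
    # Limit the applied torque (assumed actuator limit).
    u = float(np.clip(u, -max_torque, max_torque))
    # Angular acceleration: gravity term plus torque / moment of inertia.
    thddot = -3 * g / (2 * l) * np.sin(th + np.pi) + 3.0 / (m * l ** 2) * u
    # Integrate acceleration over dt to get the new angular velocity.
    newthdot = float(np.clip(thdot + thddot * dt, -max_speed, max_speed))
    # Integrate velocity over dt to get the new angle (assumed update).
    newth = th + newthdot * dt
    return newth, newthdot


newth, newthdot = pendulum_step(th=0.0, thdot=0.0, u=1.0)
```

Starting at rest with th = 0, the gravity term vanishes, so a torque of 1 produces an angular acceleration of 3 and, after one step of dt = 0.05, an angular velocity of about 0.15.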



To understand how control works in MuJoCo, we look at the examples in mujoco_py.
1. body_interaction.py
Control statements:
    sim.data.ctrl[0] = math.cos(t / 10.) * 0.01
    sim.data.ctrl[1] = math.sin(t / 10.) * 0.01


    <actuator>
        <motor gear="2000.0" joint="slide0"/>
        <motor gear="2000.0" joint="slide1"/>
    </actuator>

gear appears to act like a gear ratio: it scales (amplifies) the control signal into the actuator output.
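
As a rough illustration of this scaling, the sketch below multiplies the control values from body_interaction.py by the gear value from the XML. The helper names `control_signal` and `motor_force` are hypothetical, and the sketch assumes a plain motor with no control or force clamping, so that the applied force is simply gear times ctrl.

```python
import math

GEAR = 2000.0  # taken from the <motor gear="2000.0" .../> entries above


def control_signal(t):
    """Control values written to ctrl[0] and ctrl[1] at step t."""
    return math.cos(t / 10.0) * 0.01, math.sin(t / 10.0) * 0.01


def motor_force(ctrl, gear=GEAR):
    """Assumed motor model: applied force = gear * ctrl (no clamping)."""
    return gear * ctrl


ctrl0, ctrl1 = control_signal(0)
force0, force1 = motor_force(ctrl0), motor_force(ctrl1)
```

At t = 0 the small control value 0.01 on slide0 becomes a force of about 20, which is why such small numbers in `sim.data.ctrl` are enough to move the body.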


            <joint axis="1 0 0" damping="0.1" name="slide0" pos="0 0 0" type="slide"/>
            <joint axis="0 1 0" damping="0.1" name="slide1" pos="0 0 0" type="slide"/>

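damping is the joint's damping coefficient, and type="slide" indicates a sliding (prismatic) joint.

We are more interested in controlling rotational (hinge) joints.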
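
The effect of damping can be sketched with a one-dimensional model of a slide joint driven by a motor. This is a simplified explicit Euler step, not MuJoCo's integrator; the function name `slide_joint_step` and the `mass` and `dt` values are assumptions, while `gear` and `damping` are taken from the XML above.

```python
def slide_joint_step(velocity, ctrl, dt=0.01, mass=1.0,
                     gear=2000.0, damping=0.1):
    """One explicit Euler step of a damped slide joint (simplified model)."""
    motor_force = gear * ctrl            # actuator pushes the joint
    damping_force = -damping * velocity  # damping opposes the motion
    acceleration = (motor_force + damping_force) / mass
    return velocity + acceleration * dt


v = 0.0
for _ in range(3):
    v = slide_joint_step(v, ctrl=0.01)
```

In this model the velocity stops growing once the damping force cancels the motor force, that is, at velocity = gear * ctrl / damping.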

Position control

    <actuator>
        <position joint="j" kp="2000"/>
    </actuator>


<joint name='wr_jr' type='hinge' pos='0 0 .7' axis='1 0 0' range='-1.57 0.8'/>
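
A position actuator behaves like a proportional controller: the control value is a target joint position, and the actuator force grows with the distance to that target. The sketch below assumes no velocity feedback and no clamping; the function name `position_actuator_force` is hypothetical, and `kp=2000` is taken from the XML above.

```python
def position_actuator_force(target, qpos, kp=2000.0):
    """Proportional position servo: force = kp * (target - qpos)."""
    return kp * (target - qpos)


# Joint is at 0.3 rad and the commanded target is 0.5 rad.
force = position_actuator_force(target=0.5, qpos=0.3)
```

The force is positive when the joint is below the target and negative when it has overshot, so for this actuator type the value written to ctrl is a desired angle rather than a torque.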


A 7-degree-of-freedom humanoid robot arm:

    <actuator>
    
        <!--  ================= Torque actuators ================= /-->
        <!--<motor joint='s_abduction'     name='As_abduction' gear="100"/>
        <motor joint='s_flexion'     name='As_flexion'     gear="100"/>
        <motor joint='s_rotation'     name='As_rotation'     gear="100"/>
        <motor joint='e_flexion'     name='Ae_flexion'     gear="70"/>
        <motor joint='e_pronation'     name='Ae_pronation' gear="70"/>
        <motor joint='w_abduction'     name='Aw_abduction' gear="30"/>
        <motor joint='w_flexion'     name='Aw_flexion'     gear="30"/>
        <motor joint='rc_close'     name='Arc_close'     gear="10"/>
        <motor joint='lc_close'     name='Alc_close'     gear="10"/>-->
        
        <!--  ================= Position actuators ================= /-->
        <position joint='s_abduction'     name='As_abduction' kp="100" ctrlrange='-1.57 .7'/>
        <position joint='s_flexion'     name='As_flexion'     kp="100" ctrlrange='-.85 1.57'/>
        <position joint='s_rotation'     name='As_rotation'     kp="100" ctrlrange='-.85 0.85'/>
        <position joint='e_flexion'     name='Ae_flexion'     kp="70"  ctrlrange='-1.5 1.05'/>
        <position joint='e_pronation'     name='Ae_pronation' kp="70"  ctrlrange='-1.5 1.57'/>
        <position joint='w_abduction'     name='Aw_abduction' kp="30"  ctrlrange='-0.5 0.5'/>
        <position joint='w_flexion'     name='Aw_flexion'     kp="30"  ctrlrange='-1.05 1.05'/>
        <position joint='rc_close'         name='Arc_close'     kp="10"  ctrlrange='-1.05 1.05'/>
        <position joint='lc_close'         name='Alc_close'     kp="10"  ctrlrange='-1.05 1.05'/>

    </actuator>

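A reinforcement learning algorithm can then be used to control this arm.
The control commands are written through ctrl (sim.data.ctrl), as in the body_interaction.py example.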
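
One way to connect a policy to these position actuators is to rescale a normalized action into each actuator's ctrlrange. The sketch below assumes the policy outputs values in [-1, 1], which is a common convention but not something stated in the source; the function name `action_to_ctrl` is hypothetical, and the ranges are copied from the position actuators listed above.

```python
import numpy as np

# (low, high) ctrlrange of each position actuator, in XML order.
CTRL_RANGE = np.array([
    [-1.57, 0.7],    # As_abduction
    [-0.85, 1.57],   # As_flexion
    [-0.85, 0.85],   # As_rotation
    [-1.5, 1.05],    # Ae_flexion
    [-1.5, 1.57],    # Ae_pronation
    [-0.5, 0.5],     # Aw_abduction
    [-1.05, 1.05],   # Aw_flexion
    [-1.05, 1.05],   # Arc_close
    [-1.05, 1.05],   # Alc_close
])


def action_to_ctrl(action):
    """Map an action in [-1, 1] per actuator onto its ctrlrange."""
    action = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    low, high = CTRL_RANGE[:, 0], CTRL_RANGE[:, 1]
    return low + 0.5 * (action + 1.0) * (high - low)


ctrl = action_to_ctrl(np.zeros(9))
# With a loaded mujoco_py simulation one would then assign:
#     sim.data.ctrl[:] = ctrl
```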
