Test (Reward Plot)

An x-axis value of 70 means the result after 70 × 100 = 7,000 rounds of training.

Each plotted value is the reward averaged over 10 test runs.
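The evaluation schedule above (train in blocks of 100, then average the reward over 10 test runs) can be sketched as follows. This is a minimal illustration, not the author's code: it assumes one "round of training" is one call to a training function, and `train_one_episode`, `run_test_episode`, `reward_curve`, and `training_episodes_at` are hypothetical names introduced here.

```python
from statistics import mean
from typing import Callable, List


def training_episodes_at(point: int, train_per_point: int = 100) -> int:
    """Rounds of training completed at a given x-axis position (assumed 100 per point)."""
    return point * train_per_point


def reward_curve(
    train_one_episode: Callable[[], None],  # hypothetical: one round of training
    run_test_episode: Callable[[], float],  # hypothetical: one test run, returns its reward
    num_points: int,
    train_per_point: int = 100,
    test_runs: int = 10,
) -> List[float]:
    """Alternate a block of training with a block of tests; one averaged reward per point."""
    curve: List[float] = []
    for _ in range(num_points):
        # Train for one x-axis unit (100 rounds under the stated assumption).
        for _ in range(train_per_point):
            train_one_episode()
        # Average the reward of 10 independent test runs for this point.
        rewards = [run_test_episode() for _ in range(test_runs)]
        curve.append(mean(rewards))
    return curve
```

With this convention, `training_episodes_at(70)` gives 7000, matching the note that 70 on the axis corresponds to 70 × 100 rounds of training.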

Each episode is capped at 50 steps, and training starts only once the replay memory holds more than 1,000 transitions.
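A minimal sketch of the two settings above, assuming a DQN-style setup where "memory" is an experience-replay buffer. The environment callables (`reset`, `step`), `policy`, `learn`, the `ReplayMemory` class, the buffer capacity of 10,000, and the batch size of 32 are all hypothetical choices for illustration; only the 50-step cap and the 1,000-transition threshold come from the text.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[int, int, float, int, bool]

MAX_STEPS = 50  # episode cap stated in the text
WARMUP = 1000   # learn only when the memory holds MORE than this many transitions


class ReplayMemory:
    """Fixed-capacity FIFO buffer of transitions (hypothetical helper)."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, t: Transition) -> None:
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform random minibatch without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def run_episode(
    reset: Callable[[], int],
    step: Callable[[int, int], Tuple[int, float, bool]],
    policy: Callable[[int], int],
    memory: ReplayMemory,
    learn: Callable[[List[Transition]], None],
    batch_size: int = 32,
) -> Tuple[float, int]:
    """Run one episode of at most MAX_STEPS; return (total reward, number of learn calls)."""
    state = reset()
    total, updates = 0.0, 0
    for _ in range(MAX_STEPS):
        action = policy(state)
        next_state, reward, done = step(state, action)
        memory.push((state, action, reward, next_state, done))
        total += reward
        # Warm-up gate: no gradient updates until the buffer exceeds WARMUP.
        if len(memory) > WARMUP:
            learn(memory.sample(batch_size))
            updates += 1
        state = next_state
        if done:
            break
    return total, updates
```

With an environment that never terminates early, the first 20 episodes fill the memory to exactly 1,000 transitions without any learning; updates begin on the first step of episode 21.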


墨之科技. All rights reserved. © Copyright 2017-2027

ICP filing: 湘ICP备14012786号     Email: ai@inksci.com