70: after 70 × 100 = 7,000 training iterations.
Each reward is the average over 10 test runs.
The maximum number of steps is 50; training starts only once the memory holds more than 1000 entries.
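
The schedule in these notes can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the original training code: the environment and agent are replaced by random rewards, the 50-step cap is assumed to apply per episode, one training update is assumed per episode, and the names `run_episode`, `train_schedule` and the memory capacity are hypothetical.

```python
import random
from collections import deque

MAX_STEPS = 50        # from the notes: step cap (assumed to be per episode)
MIN_MEMORY = 1000     # from the notes: train only when memory is larger than this
EVAL_EVERY = 100      # from the notes: one reported point per 100 training updates
EVAL_EPISODES = 10    # from the notes: each point averages 10 test runs


def run_episode(rng, memory=None):
    """Run one episode of a placeholder task and return its total reward.

    Rewards are random stand-ins for a real environment. If `memory`
    is given, each transition is stored in it.
    """
    total = 0.0
    for step in range(MAX_STEPS):
        reward = rng.random()              # hypothetical reward signal
        if memory is not None:
            memory.append((step, reward))
        total += reward
    return total


def train_schedule(num_episodes, seed=0):
    """Return (updates, curve); curve[i] is the averaged test reward
    recorded after (i + 1) * EVAL_EVERY training updates."""
    rng = random.Random(seed)
    memory = deque(maxlen=10_000)          # capacity is an assumption
    updates = 0
    curve = []
    for _ in range(num_episodes):
        run_episode(rng, memory)
        if len(memory) > MIN_MEMORY:       # warm-up: no training before this
            updates += 1                   # a real agent would learn from a batch here
            if updates % EVAL_EVERY == 0:
                scores = [run_episode(rng) for _ in range(EVAL_EPISODES)]
                curve.append(sum(scores) / EVAL_EPISODES)
    return updates, curve
```

Under these assumptions, the first 20 episodes only fill the memory (20 × 50 = 1000 entries, which is not yet larger than 1000), so the first training update happens after episode 21.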