Sequence models Sample Clauses
Sequence models. Another approach to dealing with partially observable Markov decision processes (POMDPs) is to incorporate sequences of previous states into the model [BB89, BB90]. A recurrent model can be combined with a value-iteration method, provided the replay buffer is modified to also store the hidden states of the LSTM or RNN, which are needed for training. We instead opted to switch to a policy gradient method and deployed PPO-LSTM [BB90] without frame stacking. This method, too, comes with significant downsides: it is famously hard to train and requires careful hyperparameter tuning [BB91]. We managed to get a proof-of-concept implementation running, albeit at a computational cost, but its generalization performance remained severely lacking at around 62%, compared with methods that use multiple defocus frames.
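To illustrate the replay-buffer modification mentioned above, the following is a minimal sketch rather than the implementation used in this work. It assumes that each stored transition simply carries the recurrent state the network held before it processed the observation. The names `RecurrentTransition` and `RecurrentReplayBuffer` are hypothetical, and the hidden state is treated as an opaque value (for an LSTM this would typically be the `(h, c)` pair).

```python
import random
from collections import deque
from typing import Any, List, NamedTuple, Optional


class RecurrentTransition(NamedTuple):
    """One environment step plus the recurrent state needed to replay it."""
    obs: Any
    action: int
    reward: float
    next_obs: Any
    done: bool
    hidden: Any  # assumed: recurrent state held *before* `obs` was processed


class RecurrentReplayBuffer:
    """Fixed-capacity FIFO buffer that keeps hidden states next to transitions."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # A deque with maxlen silently evicts the oldest entry once full.
        self._storage: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._storage)

    def add(self, obs: Any, action: int, reward: float,
            next_obs: Any, done: bool, hidden: Any) -> None:
        self._storage.append(
            RecurrentTransition(obs, action, reward, next_obs, done, hidden)
        )

    def sample(self, batch_size: int) -> List[RecurrentTransition]:
        """Draw a uniform batch without replacement."""
        if batch_size > len(self._storage):
            raise ValueError("not enough transitions stored to sample this batch")
        return self._rng.sample(list(self._storage), batch_size)


if __name__ == "__main__":
    buffer = RecurrentReplayBuffer(capacity=3, seed=0)
    for step in range(5):
        # Placeholder hidden state; a real agent would store its LSTM (h, c) here.
        buffer.add(obs=step, action=0, reward=1.0, next_obs=step + 1,
                   done=False, hidden=(float(step), float(step)))
    print(len(buffer))  # only the three most recent transitions remain
```

During training, the sampled `hidden` value would be used to initialize the recurrent network before it is evaluated on `obs`. One caveat of this design is that stored hidden states were produced by older network weights, so they gradually go stale as training proceeds.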
