Policy Gradient With Serial Markov Chain Reasoning. Edoardo Cetin, Oya Celiktutan. Conference on Neural Information Processing Systems (NeurIPS), 2022. [arXiv] [Site]