How can I set the (argmax / greedy) output of the decoder at time t as the input at time t+1?

I'm trying to do exactly the same thing. Instead of using the high-level TF APIs (Helpers, Decoders), I build the network by chaining two LSTMs together. I got training working (with reinforcement learning), but I cannot find a solution for inference. To be more specific: how can I feed the (argmax / greedy) output of the decoder at time t back in as its input at time t+1?
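
To make the question concrete, here is a framework-agnostic sketch of the inference loop I have in mind. It is not TF code: `step_fn`, `toy_step`, `start_id`, `end_id` and `max_len` are hypothetical names made up for illustration, and `step_fn` stands in for "embed the previous token, run one decoder LSTM step, project to vocabulary logits". In TF 1.x graph mode the same loop would presumably have to be written with something like `tf.while_loop`, which is the part I am unsure about.

```python
from typing import Any, Callable, List, Sequence, Tuple


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def greedy_decode(
    step_fn: Callable[[int, Any], Tuple[Sequence[float], Any]],
    init_state: Any,
    start_id: int,
    end_id: int,
    max_len: int,
) -> List[int]:
    """Greedy decoding: the argmax token at step t is the input at step t+1.

    step_fn(prev_token_id, state) -> (logits, new_state) is a hypothetical
    stand-in for one decoder step plus the output projection.
    """
    tokens: List[int] = []
    state = init_state
    prev_token = start_id  # the first input is the start-of-sequence token
    for _ in range(max_len):
        logits, state = step_fn(prev_token, state)
        prev_token = argmax(logits)  # greedy choice, fed back on the next pass
        if prev_token == end_id:
            break  # stop once the end-of-sequence token is produced
        tokens.append(prev_token)
    return tokens


def toy_step(token_id: int, state: int) -> Tuple[List[float], int]:
    """Toy decoder step (assumption, not a real LSTM): always prefers
    token_id + 1 over a vocabulary of 5 tokens; the state counts steps."""
    vocab_size = 5
    next_id = (token_id + 1) % vocab_size
    logits = [1.0 if i == next_id else 0.0 for i in range(vocab_size)]
    return logits, state + 1


if __name__ == "__main__":
    print(greedy_decode(toy_step, init_state=0, start_id=0, end_id=4, max_len=10))
```

With a real model, `step_fn` would wrap the embedding lookup, the LSTM cell call, and the output layer, and the state would be the LSTM's (c, h) pair rather than an integer.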