Summary
In the World Models paper, the authors propose a generative neural network architecture for tackling different RL environments. They divide the agent into two parts. First, a world model is trained to capture a compressed representation of the environment around the agent. Second, a controller model is trained to perform a task on top of that world model. The two parts are trained separately.

World Model design
The model comprises a Vision model (V: a VAE network) that compresses each visual observation into a small latent vector z. This vector is passed to a Memory model (M: an RNN), which combines the current observation z with its hidden state h, a summary of the previous history, to predict the future z vector. The Controller model (C: a simple linear network) then takes z and h and chooses an action a. Here is the final flow diagram:

[Figure: World Model flow]
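To make the flow concrete, here is a minimal PyTorch-style sketch of one step of the V → M → C loop. All sizes, layer choices, and names are illustrative assumptions; the paper uses a full VAE for V and an MDN-RNN for M, both of which are simplified here:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, loosely following the paper's CarRacing setup:
Z_DIM, H_DIM, A_DIM = 32, 256, 3

class VisionEncoder(nn.Module):
    """Stand-in for the VAE encoder V: compresses a 64x64 RGB frame to z."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(256 * 2 * 2, Z_DIM)  # mean of the latent

    def forward(self, obs):                # obs: (B, 3, 64, 64)
        return self.fc_mu(self.conv(obs))  # z: (B, Z_DIM)

class Memory(nn.Module):
    """Stand-in for M: an RNN that tracks history and predicts the next z."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTMCell(Z_DIM + A_DIM, H_DIM)
        self.to_next_z = nn.Linear(H_DIM, Z_DIM)  # simplified; the paper uses an MDN head

    def forward(self, z, a, state):
        h, c = self.rnn(torch.cat([z, a], dim=-1), state)
        return self.to_next_z(h), (h, c)

class Controller(nn.Module):
    """C: a single linear layer mapping [z, h] to an action."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(Z_DIM + H_DIM, A_DIM)

    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=-1)))

# One step of the V -> M -> C loop:
V, M, C = VisionEncoder(), Memory(), Controller()
state = (torch.zeros(1, H_DIM), torch.zeros(1, H_DIM))
obs = torch.randn(1, 3, 64, 64)        # placeholder for a real frame
z = V(obs)                             # compress the observation
a = C(z, state[0])                     # act on current z and history h
z_pred, state = M(z, a, state)         # update memory, predict next z
```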
Interesting
One interesting part of this paper is that the agent can be trained for an environment inside its own 'dream', without the actual environment. Since M predicts the future latent vector, the predicted frame data can be fed back as the next input to the World Model network, simulating a virtual environment. This way the agent can be trained inside its own dream. The model was trained for VizDoom in the dream and then connected to the actual VizDoom environment, where the agent was able to avoid the fireballs.
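A minimal sketch of such a dream rollout, reusing the V, M, C modules and sizes from the sketch above. Note that the paper samples the next z from the MDN output with an adjustable temperature; this simplified version just feeds back the deterministic prediction:

```python
# Dream rollout: instead of reading frames from the real environment,
# feed M's own prediction back in as the next latent observation.
with torch.no_grad():
    z = V(torch.randn(1, 3, 64, 64))   # seed from one (here: random) frame
    state = (torch.zeros(1, H_DIM), torch.zeros(1, H_DIM))
    for t in range(100):               # hypothetical episode length
        a = C(z, state[0])             # controller acts inside the dream
        z_pred, state = M(z, a, state) # M predicts the next latent frame
        z = z_pred                     # the prediction becomes the next observation
```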
Limitation
One limitation of training in the dream is that the agent can sometimes reach states that are not possible in the actual environment. For example, in the VizDoom environment the agent sometimes extinguishes the fireballs, which is not possible in the actual game. As a result, a policy can perform well in the virtual environment but poorly in the actual environment.
The interactive version of this paper can be found here.