Evolving or learning in an ecology (that is, coevolving with many other species) means dealing with a dynamic environment. Gradient descent/REINFORCE would not only have to work over changes in the policy, but also in the non-stationary environment. What does a [[world model]] look like when the environment is known to be dynamic? Perhaps it is more correct to say, since [[The success of an ecology is defined by its longevity or robustness to perturbation]], that non-stationarity of an ecology is a fundamental property of _robust_ or healthy ecologies.