Individuals encode a mate-selection function that maps information about a potential mate to a scalar preference.
Evaluation is done in a predator-prey environment.
Outcomes:
- Extinction time increases.
- Agents evolve to favor mates with “survival traits”.
Calls the approach an extension of “Evolutionary Reinforcement Learning” from [[Ackey 1991 - Interactions between learning and evolution]]
Agents get an action network and an “evaluation” network, plus a preference network.
The preference network is a linear mapping from the agent’s genome and the potential mate’s genome to a scalar. The weights of this linear mapping are evolved.
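A minimal sketch of such a preference network, assuming a plain dot product over an input encoding and a Gaussian mutation operator for the evolved weights (all names, the genome length, and the mutation step are illustrative assumptions, not details from the paper):

```python
import numpy as np

def preference_score(weights, inputs):
    """Linear preference network: maps an input encoding
    (derived from the agent's and mate's genomes) to a scalar."""
    return float(np.dot(weights, inputs))

def mutate(weights, sigma=0.1, rng=None):
    """Hypothetical evolutionary operator: weights are perturbed,
    not trained by gradient descent."""
    rng = rng or np.random.default_rng()
    return weights + rng.normal(0.0, sigma, size=weights.shape)

n = 8                                 # genome length (assumed)
rng = np.random.default_rng(0)
w = rng.normal(size=n)                # evolved weight vector
mate_encoding = rng.normal(size=n)    # e.g. the mate's genome G_c
score = preference_score(w, mate_encoding)
```

A selection loop would then rank candidate mates by `score` and mutate `w` across generations.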
Trials compared four inputs to the preference network:
- (other) just the potential mate’s genome $G_c$
- (Abs diff) The abs diff between the agent’s and the potential mate’s: $|G_a - G_c| \in R^n$
- (Squared diff): the element-wise squared difference $(G_a - G_c)^2 \in R^n$
- (Euclidean diff): $\|G_a - G_c\| \in R$ (note! Only a scalar for input to the preference network)
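The four input encodings above can be sketched as follows (function and key names are mine; note that only the Euclidean variant collapses to a single scalar):

```python
import numpy as np

def input_encodings(G_a, G_c):
    """Return the four candidate preference-network inputs
    for agent genome G_a and potential-mate genome G_c."""
    return {
        "other": G_c,                      # mate's genome, in R^n
        "abs_diff": np.abs(G_a - G_c),     # |G_a - G_c|, in R^n
        "squared_diff": (G_a - G_c) ** 2,  # element-wise square, in R^n
        "euclidean": np.array([np.linalg.norm(G_a - G_c)]),  # scalar
    }

G_a = np.array([0.2, -1.0, 0.5])
G_c = np.array([0.0,  1.0, 0.5])
enc = input_encodings(G_a, G_c)
```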
Results: Other worked best, with Abs diff and Squared diff following closely.
A limited post-hoc analysis found that preference networks (1) selected for mates that assigned appropriately high values to high-energy states, and (2) selected against mates that attempted to mate while grass (food) was distant.