- While we think of gradient descent as acting on a single (very high-dimensional) loss landscape, each mini-batch effectively defines a slightly different landscape, and hence a different gradient direction.
- Is this part of the reason why NNs avoid getting trapped in local minima? Training on enough data gives a constantly shifting landscape that "massages out" local minima.
- In contrast, GAs are typically evaluated on a single fixed fitness landscape.
- If a GA's fitness landscape contains large neutral networks, that could act similarly to batches, giving the population routes to drift out of what would otherwise be traps.
- Is there any advantage in evaluating GAs on a changing landscape, such as computing fitness on a different random subset of the training data each generation? (A rough sketch of this idea follows below.)
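
A minimal sketch of that last idea, assuming a toy regression task and a simple mutation-only GA; the name `minibatch_fitness` and all hyperparameters are illustrative, not taken from any particular library. Because fitness is scored on a fresh random subset each call, the "landscape" the population sees shifts every generation, loosely analogous to mini-batches in SGD:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = X @ w_true + noise
n_samples, n_features = 1000, 5
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

def minibatch_fitness(w, batch_size=64):
    """Negative MSE on a random subset -- the landscape shifts every call."""
    idx = rng.choice(n_samples, size=batch_size, replace=False)
    err = X[idx] @ w - y[idx]
    return -np.mean(err ** 2)

# Simple mutation-only GA: evaluate on a fresh batch, keep the best, mutate.
pop_size, n_parents, sigma = 50, 10, 0.1
population = rng.normal(size=(pop_size, n_features))

for generation in range(200):
    fitnesses = np.array([minibatch_fitness(w) for w in population])
    parents = population[np.argsort(fitnesses)[-n_parents:]]
    # Offspring are mutated copies of randomly chosen parents.
    choices = rng.integers(n_parents, size=pop_size)
    population = parents[choices] + sigma * rng.normal(size=(pop_size, n_features))

best = population[np.argmax([minibatch_fitness(w) for w in population])]
print("distance to true weights:", np.linalg.norm(best - w_true))
```

Comparing this against the same GA with fitness computed on the full dataset would be one way to test whether the shifting landscape actually helps, or just adds noise to selection.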