## Can we build agents without specifying an objective?

Consider the alignment problem. An agent can be designed or trained to avoid specific undesirable outcomes, but that is a reactive approach. Instead, consider an agent with an internal motivation drawn from its experiences interacting with other agents. Imagine if agents developed an intrinsic, generalized ethical sense indirectly through their interactions with each other. Can ethics emerge? What sorts of ethics emerge in what circumstances?

I'd argue that it's better to have an agent with a deep ethical sense than an agent fine-tuned to avoid specific behaviors or outcomes. Enumerating the outcomes or behaviors to avoid is hard, and it is difficult to be certain the agent will generalize to out-of-distribution (OOD) future circumstances.

## Ecologies and societies

Successful agents that interact with each other through simple games find Nash equilibria that provide a "solution" to the game. But even simple normal-form games become intractable to analyze when played iteratively. A sense of ethics can provide patterns of behavior that let the agents reach a new stable equilibrium faster when the game they are playing changes (a toy sketch of iterated play under a changing game appears at the end of this note).

## Long-term learning

For this approach to work, the agents' policies need to hold "ethical values" more tightly than lessons from the current environmental arrangement. Forgetting and so-called life-long learning are open technical problems in neural networks, so an alternative is needed that lets us investigate this topic without solving the challenging problem of life-long learning.

Ideas:

- Agents have a "collective unconscious" (CU) -- a NN (e.g., a PPO policy) that provides either action suggestions or inputs. The CU is shared across all agents and learns generationally, over environmental-change timespans (see the architecture sketch at the end of this note).

## Similar work

[[Jaques 2019]] looks at using intrinsic motivation through social influence in a multi-agent setting, but doesn't consider how past games/environments can contribute to current games.
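## Sketch: iterated play when the game changes

A toy illustration of the point in "Ecologies and societies", not the proposed mechanism itself: two agents play a 2x2 normal-form game via fictitious play, and when the game is swapped mid-stream they keep their accumulated beliefs, which here stand in for carried-over behavioral habits. The payoff matrices, the fictitious-play rule, and the names `fictitious_play` / `best_response` are illustrative assumptions.

```python
import numpy as np

def best_response(payoffs, belief):
    """Pure action maximising expected payoff against a mixed opponent strategy."""
    return int(np.argmax(payoffs @ belief))

def fictitious_play(payoffs, rounds, counts_a=None, counts_b=None):
    """Two symmetric agents repeatedly play a 2x2 normal-form game, each
    best-responding to the empirical frequency of the other's past actions.
    Passing in prior counts carries beliefs over from an earlier game."""
    counts_a = np.ones(2) if counts_a is None else counts_a  # A's action counts
    counts_b = np.ones(2) if counts_b is None else counts_b  # B's action counts
    joint = []
    for _ in range(rounds):
        a = best_response(payoffs, counts_b / counts_b.sum())
        b = best_response(payoffs, counts_a / counts_a.sum())
        counts_a[a] += 1
        counts_b[b] += 1
        joint.append((a, b))
    return joint, counts_a, counts_b

# Action 0 = cooperate, action 1 = defect; rows = own action, cols = opponent's.
stag_hunt = np.array([[4.0, 0.0], [3.0, 3.0]])          # two pure equilibria
prisoners_dilemma = np.array([[3.0, 0.0], [5.0, 1.0]])  # single equilibrium

joint1, ca, cb = fictitious_play(stag_hunt, rounds=100)
# Change the game mid-stream but keep the accumulated beliefs ("habits").
joint2, _, _ = fictitious_play(prisoners_dilemma, rounds=100, counts_a=ca, counts_b=cb)
print("end of game 1:", joint1[-3:], "end of game 2:", joint2[-3:])
```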
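## Sketch: a shared "collective unconscious" policy

A minimal sketch of the CU idea from "Long-term learning", assuming a discrete action space and an additive combination of the CU's suggestion logits with each agent's own logits. The class names (`CollectiveUnconscious`, `Agent`), the `cu_weight` mixing parameter, and the generation-boundary distillation note are assumptions for illustration, not a settled design.

```python
import torch
import torch.nn as nn

class CollectiveUnconscious(nn.Module):
    """One network shared by every agent: maps an observation to action-suggestion
    logits. Frozen during an agent's lifetime; updated only between generations."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class Agent(nn.Module):
    """Per-agent policy. The CU's suggestion logits are added to the agent's own
    logits, so the shared prior biases behaviour without dictating it."""
    def __init__(self, obs_dim: int, n_actions: int, cu: CollectiveUnconscious,
                 hidden: int = 64, cu_weight: float = 1.0):
        super().__init__()
        self.cu = cu
        self.cu_weight = cu_weight
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, n_actions)
        )

    def act(self, obs: torch.Tensor) -> torch.Tensor:
        own_logits = self.policy(obs)
        with torch.no_grad():  # no gradients flow into the CU within a lifetime
            suggestion = self.cu(obs)
        logits = own_logits + self.cu_weight * suggestion
        return torch.distributions.Categorical(logits=logits).sample()

# Hypothetical usage: several agents share a single CU instance.
obs_dim, n_actions, n_agents = 8, 4, 3
cu = CollectiveUnconscious(obs_dim, n_actions)
agents = [Agent(obs_dim, n_actions, cu) for _ in range(n_agents)]
obs = torch.randn(n_agents, obs_dim)
actions = [agent.act(o) for agent, o in zip(agents, obs)]
# At a generation boundary, the CU could be refit from the agents' lifetime
# policies (e.g., by behaviour cloning) and then frozen again.
```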