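https://arxiv.org/abs/2302.02948
SAC (Soft Actor-Critic)
- off-policy, on-line RL
- off-policy: trains on samples that were not generated by the current policy (e.g. transitions stored in a replay buffer by earlier versions of the policy)
- on-line: keeps collecting new data by interacting with the environment during training, as opposed to offline RL, which learns only from a fixed, pre-collected dataset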
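A minimal sketch of how the two properties combine in one training loop. The agent acts in the environment with its current policy at every step (on-line) but updates from a replay buffer whose transitions were mostly generated by older policies (off-policy). This is NOT SAC itself: it is a tabular soft Q-learning toy. It only borrows SAC's entropy-regularized ("soft") value, written here in its closed form V(s) = alpha * logsumexp(Q(s, .) / alpha) for a discrete-action Boltzmann policy. The chain environment, `ALPHA`, `LR`, `train()`, and all hyperparameter values are hypothetical choices made for illustration.

```python
import math
import random
from collections import deque

# Hypothetical toy environment: a 1-D chain of N_STATES states.
# Action 0 moves left (clamped at 0), action 1 moves right.
# Reaching the last state gives reward 1 and ends the episode.
N_STATES = 5
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.1   # entropy temperature (assumed)
LR = 0.1      # tabular learning rate (assumed)


def env_step(state, action):
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def soft_value(q_row):
    # V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably by factoring out the max.
    m = max(q_row)
    return m + ALPHA * math.log(sum(math.exp((q - m) / ALPHA) for q in q_row))


def sample_action(q_row, rng):
    # Boltzmann policy: pi(a|s) proportional to exp(Q(s, a) / alpha).
    m = max(q_row)
    weights = [math.exp((q - m) / ALPHA) for q in q_row]
    return rng.choices(range(len(q_row)), weights=weights)[0]


def train(total_steps=2000, batch_size=32, capacity=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    buffer = deque(maxlen=capacity)  # replay buffer of past transitions
    policy_version = 0               # incremented after every minibatch update
    sampled = stale = 0
    state = 0

    for _ in range(total_steps):
        # On-line: collect a new transition with the *current* policy.
        action = sample_action(q[state], rng)
        next_state, reward, done = env_step(state, action)
        buffer.append((state, action, reward, next_state, done, policy_version))
        state = 0 if done else next_state

        if len(buffer) < batch_size:
            continue
        # Off-policy: update from stored transitions, most produced by older policy versions.
        for s, a, r, s2, d, version in rng.sample(list(buffer), batch_size):
            sampled += 1
            if version != policy_version:
                stale += 1
            target = r if d else r + GAMMA * soft_value(q[s2])
            q[s][a] += LR * (target - q[s][a])
        policy_version += 1

    return q, stale / sampled


q_table, stale_fraction = train()
print(f"fraction of sampled transitions from older policies: {stale_fraction:.2f}")
print("greedy actions:", [row.index(max(row)) for row in q_table[:-1]])
```

Because only the newest transition in each minibatch can come from the policy that is about to be updated, almost every sample is from an earlier policy; an on-policy method would instead have to discard the buffer after each update.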