1. Reinforcement Learning Basics
2. A2C (Advantage Actor-Critic)
3. TRPO (Trust Region Policy Optimization)
4. PPO (Proximal Policy Optimization)