Data-driven Derivative Hedging with Quadratic Variation Penalty
We consider the problem of hedging a European call option on a discrete rebalancing schedule, with each trade subject to transaction costs. We formulate this as a stochastic optimal control problem that maximizes the hedge P&L, with the quadratic variation of the P&L as a penalty term. We solve the optimization problem numerically using deep stochastic optimal control and deep reinforcement learning, with the market following either the Black-Scholes model or a stochastic volatility model. Furthermore, we show that under the Black-Scholes model delta hedging is not the optimal strategy once the quadratic variation is penalized. Our results show that data-driven methods outperform traditional delta hedging when transaction costs and pathwise variability are taken into account. We observe that these methods are well suited to multi-step optimization problems and can effectively balance hedging costs over time.
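To make the objective concrete, the following is a minimal sketch of how a quadratic-variation-penalized hedging objective could be evaluated by simulation. The abstract does not state the exact form of the objective, so everything here is an assumption for illustration: the objective is taken as the mean over paths of the total P&L minus a penalty weight times the sum of squared P&L increments, the position is a short call marked to the Black-Scholes price, transaction costs are proportional to the traded notional, interest rates are zero, and all function names and parameter values are hypothetical.

```python
import math
import numpy as np

# Standard normal CDF built from math.erf (avoids a SciPy dependency).
_norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))


def simulate_gbm(s0, sigma, maturity, n_steps, n_paths, seed=0):
    """Zero-drift geometric Brownian motion paths, shape (n_paths, n_steps + 1)."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = -0.5 * sigma**2 * dt + sigma * math.sqrt(dt) * z
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_inc, axis=1)], axis=1
    )
    return s0 * np.exp(log_paths)


def bs_call(paths, strike, sigma, maturity):
    """Black-Scholes call values (with payoff at expiry) and deltas, zero rates."""
    n_steps = paths.shape[1] - 1
    tau = maturity * (1.0 - np.arange(n_steps) / n_steps)  # time to expiry, > 0
    s = paths[:, :-1]
    d1 = (np.log(s / strike) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    delta = _norm_cdf(d1)
    values = s * delta - strike * _norm_cdf(d2)
    payoff = np.maximum(paths[:, -1:] - strike, 0.0)
    return np.concatenate([values, payoff], axis=1), delta


def pnl_increments(paths, values, holdings, cost_rate):
    """Per-step P&L of a short call hedged with `holdings` shares.

    Assumes proportional costs on each rebalancing trade, starting from a
    zero stock position; the final liquidation cost is ignored.
    """
    prev = np.concatenate(
        [np.zeros((holdings.shape[0], 1)), holdings[:, :-1]], axis=1
    )
    costs = cost_rate * paths[:, :-1] * np.abs(holdings - prev)
    return holdings * np.diff(paths, axis=1) - np.diff(values, axis=1) - costs


def penalized_objective(increments, penalty):
    """Mean over paths of total P&L minus penalty times realized quadratic variation."""
    total = increments.sum(axis=1)
    quad_var = (increments**2).sum(axis=1)
    return float(np.mean(total - penalty * quad_var))


# Compare delta hedging with leaving the short call unhedged.
paths = simulate_gbm(s0=100.0, sigma=0.2, maturity=0.25, n_steps=20, n_paths=2000)
values, delta = bs_call(paths, strike=100.0, sigma=0.2, maturity=0.25)
for name, holdings in [("delta", delta), ("unhedged", np.zeros_like(delta))]:
    inc = pnl_increments(paths, values, holdings, cost_rate=0.001)
    print(name, penalized_objective(inc, penalty=1.0))
```

In this sketch the hedge is a fixed rule supplied as an array of holdings. A data-driven approach of the kind described above would instead treat the holdings as the output of a parameterized policy and search over its parameters to maximize the same quantity.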