This paper addresses a key issue in the context of the Energy Transition: the computation of trading strategies for valorizing storage units in markets with large renewable energy integration. In particular, we focus on (short-term) intraday markets and propose a reinforcement learning-based approach for computing well-performing strategies.
The first part of the paper proposes a novel modeling framework for the strategic participation of energy storage in the European continuous intraday (CID) market, where exchanges occur through a centralized order book. We show that under specific assumptions, finding an optimal trading strategy is equivalent to solving a finite-time Markov Decision Process (MDP). To give a glimpse of the complexity of this problem, we provide below a schematic view of the timeline of actions for trading agents interacting with the CID market, where products for a specific delivery period can be exchanged up to 30 minutes before its start – see our paper for more information!
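To make the gate-closure constraint concrete, here is a minimal sketch of the trading-window arithmetic for a CID product. The only fact taken from the text is the 30-minute gate closure before delivery; the specific dates and the function name are illustrative.

```python
from datetime import datetime, timedelta

# In the CID market, a product covering a given delivery period can be
# traded up to 30 minutes before that period starts (gate closure).
GATE_CLOSURE = timedelta(minutes=30)

def trading_deadline(delivery_start: datetime) -> datetime:
    """Last moment at which orders for this delivery period can be placed."""
    return delivery_start - GATE_CLOSURE

# Hypothetical example: a product delivering at 14:00 closes at 13:30.
deadline = trading_deadline(datetime(2030, 6, 1, 14, 0))
# deadline == datetime(2030, 6, 1, 13, 30)
```

The agent therefore faces a shrinking set of tradable products as time advances, which is part of what makes the state space of the MDP large.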
The formulated MDP has high-dimensional state and action spaces. To solve the problem efficiently, we then propose high-level actions and an alternative state-space representation. We solve the resulting problem using an asynchronous distributed version of the fitted Q iteration algorithm, a popular method in the Reinforcement Learning (RL) literature. As a result, we obtain a time-variant trading strategy represented by T = 40 Q-functions, where T denotes the length of the trading horizon.
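The core idea of fitted Q iteration on a finite horizon – fitting one Q-function per time step, backward from the end of the horizon, using a batch of transitions and a supervised regressor – can be sketched as follows. This is a toy storage problem, not the paper's model: the state variables, reward, horizon length, and the choice of tree-ensemble regressor (a common one for FQI) are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Toy storage problem (illustrative only): state = (step t, state of
# charge, price); action in {-1, 0, +1} = sell / idle / buy one unit.
# Buying at price p costs p, selling earns p; charge stays in [0, CAP].
T = 5        # short horizon for the sketch (the paper uses T = 40)
CAP = 2.0
ACTIONS = np.array([-1.0, 0.0, 1.0])

def step(soc, price, a):
    new_soc = float(np.clip(soc + a, 0.0, CAP))
    executed = new_soc - soc        # part of the action actually feasible
    return new_soc, -executed * price

# Batch of one-step transitions collected under a random behavior policy.
transitions = []
for _ in range(2000):
    t = int(rng.integers(0, T))
    soc = rng.uniform(0, CAP)
    price = rng.uniform(10, 50)
    a = float(rng.choice(ACTIONS))
    new_soc, r = step(soc, price, a)
    next_price = rng.uniform(10, 50)
    transitions.append((t, soc, price, a, r, new_soc, next_price))

# Fitted Q iteration, backward in time: one regressor per step. The
# target at step t is r + max_a' Q_{t+1}(s', a'); at the last step it is r.
Q = [None] * T
for t in reversed(range(T)):
    batch = [tr for tr in transitions if tr[0] == t]
    X = np.array([[soc, p, a] for (_, soc, p, a, _, _, _) in batch])
    y = []
    for (_, soc, p, a, r, ns, np_) in batch:
        if t == T - 1:
            target = r
        else:
            nxt = np.array([[ns, np_, b] for b in ACTIONS])
            target = r + Q[t + 1].predict(nxt).max()
        y.append(target)
    Q[t] = ExtraTreesRegressor(n_estimators=30, random_state=0).fit(
        X, np.array(y))

def act(t, soc, price):
    """Greedy policy at step t: argmax of the fitted Q over actions."""
    q = Q[t].predict(np.array([[soc, price, b] for b in ACTIONS]))
    return float(ACTIONS[int(np.argmax(q))])
```

The time-variant structure mirrors the paper's output of T Q-functions: each `Q[t]` encodes the strategy for one decision step of the horizon.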
The simulation results provide evidence that reinforcement learning can successfully learn high-performing trading strategies for valorizing storage devices in the European CID market. In particular, the RL strategy is shown to be low-risk and yields on average 2% more profit than the industrial benchmark on the test sets.