This research paper presents a novel deep reinforcement learning (DRL) solution to the decision-making problem at the heart of algorithmic trading in stock markets: selecting the appropriate trading action (buy, hold, or sell shares) without human intervention. Naturally, the core objective is to achieve an appreciable profit while efficiently mitigating the trading risk. This task is particularly complex due to the sequential nature of the problem as well as the stochastic and adversarial aspects of the environment. Moreover, a huge amount of both quantitative and qualitative information, much of which is generally unavailable, influences the dynamics of this environment. Until now, DRL algorithms have mainly focused on well-known environments with specific properties, such as games. This research paper is a pioneering work assessing the ability of this new artificial intelligence (AI) approach to solve a challenging real-life problem, in this case algorithmic trading.
The three main contributions of this scientific article are as follows. Firstly, the research paper rigorously formalizes the particular algorithmic trading problem at hand and links it to the reinforcement learning (RL) formalism. In particular, the state and action spaces are discussed in detail, together with the rewards and the main objective. Secondly, a novel DRL algorithm, denoted Trading Deep Q-Network (TDQN), is thoroughly presented. As its name indicates, it is inspired by the famous DQN algorithm developed to play Atari games, but significantly adapted to the particular algorithmic trading problem at hand. Thirdly, a new and more rigorous methodology is proposed to objectively assess the performance of trading strategies. This procedure is all the more important because multiple contributions in algorithmic trading tend to prioritize impressive results over a properly objective scientific approach.
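To make the RL formulation concrete, the following is a minimal, hypothetical sketch of such a trading environment's transition step: a discrete action space (buy, hold, sell), a portfolio state reduced to cash and shares, and a reward defined as the log-return of the portfolio value. The sizing rule, cost model, and all names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

BUY, HOLD, SELL = 0, 1, 2  # discrete action space (illustrative encoding)

def step(prices, t, action, cash, shares, cost_rate=0.001):
    """Apply a trading action at time t and return the updated portfolio
    (cash, shares) plus the reward, defined here as the log-return of the
    total portfolio value over one time step. Simplified cost model."""
    price = prices[t]
    value_before = cash + shares * price
    if action == BUY:
        # simplistic all-in sizing, accounting for transaction costs
        n = int(cash // (price * (1 + cost_rate)))
        cash -= n * price * (1 + cost_rate)
        shares += n
    elif action == SELL and shares > 0:
        cash += shares * price * (1 - cost_rate)
        shares = 0
    # reward: log-return of total portfolio value at the next price
    value_after = cash + shares * prices[t + 1]
    reward = np.log(value_after / value_before)
    return cash, shares, reward

# tiny synthetic price series, purely for demonstration
prices = np.array([100.0, 101.0, 102.0, 101.5, 103.0])
cash, shares = 1000.0, 0
cash, shares, r0 = step(prices, 0, BUY, cash, shares)   # price rises: r0 > 0
cash, shares, r1 = step(prices, 1, HOLD, cash, shares)  # price rises: r1 > 0
cash, shares, r2 = step(prices, 2, SELL, cash, shares)  # price falls: r2 < 0
print(cash, shares, r0, r1, r2)
```

In a DQN-style agent such as TDQN, a neural network would map an observation of recent market data (and the current position) to Q-value estimates for these three actions; the sketch above only illustrates the environment side of that interaction.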
The performance achieved by the TDQN algorithm is interesting and informative. On the one hand, the DRL algorithm achieves promising results, surpassing on average the benchmark trading strategies considered. Moreover, the TDQN strategy demonstrates multiple benefits compared to more classical approaches, such as appreciable versatility and remarkable robustness to diverse trading costs. On the other hand, the research paper highlights core challenges that DRL techniques still need to overcome, the handling of partially observable environments and the lack of generalization being the main ones.