Handbook of learning and approximate dynamic programming (IEEE press series on computational intelligence)
Si J., Barto A., Powell W., Wunsch D. Wiley-IEEE Press, 2004. Type: Book
--------------------------------------------------------------------------------
Dynamic Programming (I.2.8); Optimization (G.1.6...); Control Theory (I.2.8)
Performance, Theory, Algorithms
--------------------------------------------------------------------------------
Dynamic programming (DP) refers to a collection of algorithms developed to solve sequential, multi-stage decision problems, or to determine optimal control strategies for nonlinear and stochastic dynamic systems. The DP technique is readily applied to problems for which a perfect model of the environment is available as a Markov decision process. Until recently, however, DP had largely been confined to toy problems. Driven by demand from complex application domains, there has been a spurt of research activity in several other disciplines, such as intelligent control and computational intelligence. Approximate dynamic programming (ADP) is a newly coined paradigm representing the research community at large whose main focus is to find high-quality approximate solutions to problems whose exact solutions via classical dynamic programming are not attainable in practice, mainly due to computational complexity and a lack of domain knowledge related to the problem. This book is an edited collection of 23 chapters, based on the 2002 NSF Workshop on Approximate Dynamic Programming. The book is organized into three parts.
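To make the contrast concrete, classical DP with a perfect model can be sketched as value iteration on a small Markov decision process. The two-state MDP below is hypothetical (not from the book), chosen only to show the Bellman backup that ADP must approximate when the state space becomes too large:

```python
# Illustrative sketch (not from the book): classical DP via value iteration
# on a tiny, hypothetical Markov decision process with a perfect model.
GAMMA = 0.9  # discount factor

# transitions[state][action] -> list of (probability, next_state, reward)
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}

def value_iteration(transitions, gamma=GAMMA, tol=1e-8):
    """Repeat the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(transitions)
print(V)  # optimal state values for the toy MDP
```

This exhaustive sweep over all states is exactly what becomes infeasible in the large-scale problems the book targets.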
In the introductory chapter, Werbos provides an excellent roadmap of the field, clearly identifying the relevant mathematical principles and theoretical background, and presenting pointers to challenging future opportunities. Part 1 (chapters 2 through 8) presents an overview of the various ADP paradigms, and discusses how some of these techniques can be used for practical problem solving. Barto and Dietterich, in chapter 2, describe the relationship between reinforcement learning and supervised learning from an ADP perspective. Various aspects, such as training information, behavioral variety, and sequential decision tasks, are presented in detail. In chapter 3, Ferrari and Stengel provide an overview of model-based adaptive critic designs, emphasizing the mathematical background and various ADP designs. Heuristic dynamic programming (HDP), dual heuristic programming (DHP), globalized DHP, and action-dependent designs are presented. Pseudo-code is provided for many aspects of the algorithms, and some simple application examples are presented toward the end of the chapter. In the next chapter, Lendaris and Neidhoffer provide ample guidance for the reader interested in adaptive critics for control. While chapter 3 is theoretically focused, chapter 4 is devoted to problem formulation issues and utility functions.
Si et al., in chapter 5, introduce the direct neural dynamic programming (DNDP) approach, an online learning control paradigm. The methodology is illustrated on the triple-link inverted pendulum, which involves many continuous state variables, and on a wireless network application. The chapter also presents several comparison studies using well-known algorithms, offering the reader additional quantitative insight into several ADP methods. Chapter 6 addresses the curse of dimensionality by treating ADP as a dual of the linear programming problem. De Farias discusses the performance of approximate linear programming methods and the approximation error bounds, with an application to queuing networks. In chapter 7, Grudic and Ungar tackle the curse of dimensionality using the policy gradient reinforcement learning framework. Their action transition policy gradient (ATPG) algorithm is based on an estimate of the gradient in the policy space that increases the reward. Another algorithm presented in this chapter, boundary localized reinforcement learning (BLRL), can be used to improve the rate of convergence. Chapter 8 addresses the use of a semi-Markov decision process model, and the development of hierarchical reinforcement learning (HRL). Ryan begins the chapter with an overview of HRL and the problems associated with standard Markov decision process models, and then presents some actual HRL algorithms and their pseudo-code.
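The idea behind policy gradient methods of the kind discussed in chapter 7 can be sketched generically. The snippet below is not the book's ATPG algorithm; it is a minimal REINFORCE-style update on a hypothetical two-armed bandit, showing what "an estimate of the gradient in the policy space that increases the reward" means in the simplest case:

```python
# Illustrative sketch (not the book's ATPG algorithm): a REINFORCE-style
# policy-gradient ascent on a hypothetical two-armed bandit.
import math
import random

random.seed(0)
theta = 0.0                 # single policy parameter (logit for arm 1)
alpha = 0.1                 # learning rate
true_rewards = [0.2, 0.8]   # hypothetical mean rewards of the two arms

def pi1(theta):
    """Probability of choosing arm 1 under a sigmoid policy."""
    return 1.0 / (1.0 + math.exp(-theta))

for _ in range(2000):
    p = pi1(theta)
    a = 1 if random.random() < p else 0
    r = true_rewards[a] + random.gauss(0.0, 0.1)  # noisy sampled reward
    # Stochastic estimate of the policy gradient: grad log pi(a) * reward
    grad_log = (1 - p) if a == 1 else -p
    theta += alpha * grad_log * r

print(pi1(theta))  # the policy should come to favor the better arm 1
```

The appeal of this family of methods, as the chapter argues, is that the update cost does not depend on the size of the state space.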
Part 2 (chapters 9 through 15) covers recent research results, and presents some pointers toward important future discoveries. Bertsekas et al., in chapter 9, present an iterative temporal difference method that converges without a diminishing step size. The proposed method is validated experimentally, and compared with the least-squares temporal difference (LSTD) method from several other works in the literature, and with Sutton’s TD(λ) method. Readers will be able to appreciate the superiority of the authors’ approach from the analysis and the experimental results, which clearly show that the method is faster, simpler, and more reliable than the other associated methods. In chapter 10, Warren Powell presents an ADP model for high-dimensional resource allocation problems, using numerical examples to illustrate the importance of the ADP method in addressing the complex problem considered. Mahadevan et al., in chapter 11, present a hierarchical probabilistic model for decision making involving concurrent actions, multi-agent coordination, and hidden state estimation in stochastic environments. The reader will be able to clearly understand the various procedures for hierarchical modeling based on multi-resolution statistical modeling of the past history of observations and actions. In chapter 12, Cao presents the learning and optimization of stochastic systems from a system-theoretic perspective, with illustrations using a queuing example. Anderson et al., in chapter 13, present a hybrid combination of robust control and reinforcement learning. Readers will be able to follow the discussion of integral quadratic constraints and stability analysis, and then learn how reinforcement learning can aid the control system. The framework is validated using some simple examples. Supervised actor-critic reinforcement learning is presented by Rosenstein et al. in chapter 14.
The key idea is to incorporate a supervisor into the actor-critic framework for reinforcement learning. The methodology is illustrated using simple examples that any reader will enjoy. In chapter 15, Prokhorov presents backpropagation through time (BPTT) and derivative adaptive critics (DAC) for computing the derivatives needed to train the parameters of recurrent neural networks. A hybrid approach combining BPTT and DAC is provided, with its pseudo-code.
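Two threads running through Part 2 — temporal difference learning with a constant step size and the actor-critic architecture — can be combined in one short sketch. The example below is not taken from any chapter; it is a generic, minimal actor-critic loop on a hypothetical two-state chain, with a tabular TD(0) critic and a softmax actor:

```python
# Illustrative sketch (not from the book): a minimal actor-critic loop with
# a tabular TD(0) critic (constant step size) and a softmax policy actor,
# on a hypothetical 2-state chain where action 1 ("right") is better.
import math
import random

random.seed(1)
n_states, n_actions = 2, 2
V = [0.0] * n_states                                   # critic: state values
theta = [[0.0] * n_actions for _ in range(n_states)]   # actor: preferences
gamma, alpha_v, alpha_pi = 0.9, 0.1, 0.1

def softmax(prefs):
    m = max(prefs)
    e = [math.exp(p - m) for p in prefs]
    z = sum(e)
    return [x / z for x in e]

def env_step(s, a):
    """Action 1 earns reward 1 and moves to the other state; action 0 stays."""
    if a == 1:
        return (1 - s), 1.0
    return s, 0.0

s = 0
for _ in range(5000):
    probs = softmax(theta[s])
    a = 0 if random.random() < probs[0] else 1
    s2, r = env_step(s, a)
    td_error = r + gamma * V[s2] - V[s]   # critic's TD(0) error
    V[s] += alpha_v * td_error            # critic update
    for b in range(n_actions):            # actor update along grad log pi
        grad = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha_pi * td_error * grad
    s = s2

print([softmax(theta[s])[1] for s in range(n_states)])  # P(action 1) per state
```

The critic's TD error serves as the reinforcement signal for the actor, which is the structural idea that chapter 14 extends by adding a supervisor.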
Part 3 (chapters 16 through 23) focuses on the application of ADP to several large and complex real-world problems, in an attempt to provide some insight into selecting suitable paradigms for future problems. Esogbue and Hearnes, in chapter 16, present a scheme that learns to act in a near-optimal manner through reinforcement learning, for problems that either have no model or whose model is very complex. The method is illustrated using some controller applications. In chapter 17, Kang and Bien present a hierarchical reinforcement learning scheme for solving decision-making problems in which more than one goal must be fulfilled. The authors use a neuro-fuzzy logic controller to extend multiple-reward reinforcement learning, and some simulation results are presented. Balakrishnan and Han, in chapter 18, introduce an adaptive critic-based neural network to steer an agile missile with bounds on the angle of attack. The designed adaptive neurocontroller can provide minimum-time solutions even when the initial flight path angle is changed from zero to any positive value. In chapter 19, Venayagamoorthy et al. propose a straightforward application of adaptive critic designs to power system control. The authors also provide some generic guidelines on how to use adaptive critic networks for the specific problems discussed in the chapter. Anderson et al., in chapter 20, present a case study applying a robust reinforcement learning method to the heating, ventilation, and air conditioning control of buildings. In chapter 21, Enns and Si present helicopter flight control using direct neural dynamic programming, while Momoh addresses optimal power flow tools in chapter 22. In the final chapter, Momoh and Zivi present several challenging benchmark problems relevant to power systems.
After reviewing the various problem descriptions and formulations, readers will know where ADP can be applied to some of these challenging problems.
The editors have summarized current developments and future challenges, and have presented a rich collection of contributions in the area. The chapters are well organized, with most of the content very well explained without requiring many additional references. I was particularly impressed by the ADP success stories featured in chapters 16 through 23. This is highly encouraging, since one of the key motives for coining the ADP paradigm was to address the large-scale problems that are important in our society, where we are challenged with complex problems. This edited volume resembles a plane ride over the three different aspects of approximate dynamic programming: an overview of the topic, its technical aspects, and successful applications. I am sure that any passenger interested in ADP will enjoy this magnificent ride. I have only one comment on the organization of the volume: from a reader’s point of view, the book would have been more appealing if all the related chapters had been bundled together, rather than spread across three different sections.
I recommend this book for engineers, scientists, and practitioners who would like a state-of-the-art research overview of approximate dynamic programming. Finally, I would like to congratulate the editors for putting together this wonderful collection of research contributions.