Saturday, February 18, 2023

On "Practical Issues in Temporal Difference Learning"

Just finished studying this seminal (or, perhaps more politically correct, oocytic) paper, published in 1991/1992! It was a breakthrough, if not a game changer (pardon the pun)!

In the 1990s, Big Blue (IBM) was one of the dominant players in the AI & machine learning field. This paper was written by Gerald Tesauro, a researcher at IBM's famous Thomas J. Watson Research Center. In 1997, IBM's Deep Blue beat the world chess champion. How times have changed since then!

This is perhaps one of the first papers to demonstrate empirically, and very convincingly, that a learning algorithm in the form of a neural network, with no prior knowledge, expert input, massive databases, or hand-crafted features, could learn a complex game of chance like backgammon through self-play, defeating or matching the best available computer programs of the time and achieving good results against human champions. As they say, the rest is history!

One of the more interesting discoveries: "... In qualitative terms, the TD nets have developed a style of play emphasizing running and tactical play, whereas the EP nets [supervised training on human expert preferences] favor more quiescent positional play emphasizing blocking rather than racing. This is more in line with human expert play, but it often leads to complex prime vs. prime and back-game situations that are hard for the network to evaluate properly. This suggests one possible advantage of the TD approach over the EP approach: by imitating an expert teacher, the learner may get itself into situations that it can't handle. ..."

Many of the issues that still face AI & machine learning today were already identified and discussed in this paper.

From the abstract:
"This paper examines whether temporal difference methods for training connectionist networks, such as Suttons's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. The hidden units in these network have apparently discovered useful features, a longstanding goal of computer games research. Furthermore, when a set of hand-crafted features is added to the input representation, the resulting networks reach a near-expert level of performance, and have achieved good results against world-class human play."
 
Practical Issues in Temporal Difference Learning
