[This is the third post in a series. The other posts are here: 1 2 4]
On Thursday I wrote about progress in computer chess, and how a graph of Elo rating (which I called the natural measure of playing skill) versus time showed remarkably consistent linear improvement over several decades. I used this to argue that sometimes exponential improvements in the inputs to AI systems (computer speed and algorithms) lead to less-than-exponential improvement in AI performance.
Readers had various objections to this. Some said that linear improvements in Elo rating should really be seen as exponential improvements in quality; and some said that the arrival of the new AI program AlphaZero (which did not appear in my graph and was not discussed in my post) is a game-changer that invalidates my argument. I’ll address those objections in this post.
First, let’s talk about how we measure AI performance. For chess, I used Elo rating, which is defined so that if Player A has a rating 100 points higher than Player B, we should expect A to collect 64% of the points when playing B. (Winning a game is one point, a drawn game is half a point for each player, and losing gets you zero points.)
There is an alternative rating system, which I’ll call ExpElo, which turns out to be equivalent to Elo in its predictions. Your ExpElo rating is determined by exponentiating your Elo rating. Where Elo uses the difference of two player’s ratings to predict win percentage, ExpElo uses a ratio of the ratings. Both Elo and ExpElo are equally compelling from an abstract mathematical standpoint, and they are entirely equivalent in their predictions. But where a graph of improvement in Elo is linear, a graph of improvement in ExpElo would be exponential. So is the growth in chess performance linear or exponential?
Before addressing that question, let’s stop to consider that this situation is not unique to chess. Any linearly growing metric can be rescaled (by exponentiating the metric) to get a new metric that grows exponentially. And any exponentially growing metric can be rescaled (by taking the logarithm) to get a new metric that grows linearly. So for any quantity that is improving, we will always be able to choose between a metric that grows linearly and one that grows exponentially.
The key question for thinking about AI is: which metric is the most natural measure of what we mean by intelligence on this particular task? For chess, I argue that this is Elo (and not ExpElo). Long before this AI debate, Arpad Elo proposed the Elo system and that was the one adopted by chess officials. The U.S. Chess Federation divides players into skill classes (master, expert, A, B, C, and so on) that are evenly spaced, 200 Elo points wide. For classifying human chess performance, Elo was chosen. So why should we switch to a different metric for thinking about AI?
Now here’s the plot twist: the growth in computer chess rating, whether Elo or ExpElo, is likely to level off soon, because the best computers seem to be approaching perfect play, and you can’t get better than perfect.
In every chess position, there is some move (or moves) that is optimal, in the sense of leading to the best possible game outcome. For an extremely strong player, we might ask what that player’s error rate is: in high-level play, for what fraction of the positions it encounters will it make a non-optimal move?
Suppose a player, Alice, has an error rate of 1%, and suppose (again to simplify the explanation) that a chess game lasts fifty moves for each player. Then in the long run Alice will make a non-optimal move once every two games–in half of the games she will play optimally. This implies that if Alice plays a chess match against God (who always makes optimal moves), Alice will get at least 25% of the points, because she will play God evenly in the half of games where she makes all optimal moves, and (worst case) she will lose the games where she errs. And if Alice can score at least 25% against God, then Alice’s Elo rating is no more than 200 points below God’s. The upshot is that there is some rating–the “Rating of God”–that cannot be exceeded, and that is true in both Elo and ExpElo systems.
Clever research by Ken Regan and others has shown that the best chess programs today have fairly low error rates and therefore are approaching the Rating of God. Regan’s research suggests that the RoG is around 3600, which is notable because the best program on my graph, Stockfish, is around 3400, and AlphaZero, the new AI chess player from Google’s DeepMind, may be around 3500. If Regan’s estimate is right, then AlphaZero is playing the majority of its games optimally and would score about 36% against God. The historical growth rate of AI Elo ratings has been about 50 points per year, so it would appear that growth can continue for only a couple of years before leveling off. Whether the growth in chess performance has been linear or exponential so far, it seems likely to flatline within a few years.
Leave a Reply