The Legend of AlphaGo Part III: Epic Battles and the Future of AI

Clare Teng

In the first two articles we’ve learned what AlphaGo is – an AI trained to play Go – and how it works – deep learning – so what’s next? Waiting…

It may not be obvious, but training these models requires a lot of computational power. In part 2 of the series, we heard how AlphaGo used 30 million data points to learn from expert games via supervised learning (SL). This model alone took 50 graphics processing units (GPUs) and 3 weeks to train. Training the part of the model that evaluates moves, after further learning and improvement through self-play, took another week. For context, a personal computer (PC) usually has 1 central processing unit (CPU) and, if fitted, 1 GPU. GPUs are not simply faster CPUs; they are specialised chips that can carry out huge numbers of calculations in parallel, which is exactly what training a neural network demands. So 50 of them working for 4 weeks is indeed a lot of computational power!

So, after waiting for 4 weeks, AlphaGo is finally ready to play. The team first pitted AlphaGo against state-of-the-art computer Go programs – Crazy Stone, Zen and Pachi – and it lost only 1 out of 495 games. That’s a 99.8% success rate. But it gets even better. As discussed in part 1 of the series, the rules of the game allow for a handicap to account for differences in skill level between players. In a second set of matches against the other Go programs, AlphaGo gave its opponents 4 handicap stones, i.e. 4 extra stones to place on the board at the start. The final score? AlphaGo still won more than 77% of the matches.

Okay, so what about a human player? The first opponent was Fan Hui, 3-time European Go champion. In a set of 5 games, AlphaGo won 5-0 without handicap. AlphaGo then went on to play Mr Lee Sedol, considered the greatest Go player of the past decade…

In March 2016, over 6 days of play and with a grand prize of 1 million dollars at stake, AlphaGo won 4-1, watched by an audience of more than 200 million people across the world. Not only did AlphaGo win its matches, it also invented new moves previously unseen by Go experts all over the world. Wow.

So, what’s next? Since AlphaGo, numerous other iterations have been developed including Master, AlphaGo Zero, AlphaZero and MuZero. Each iteration continues to improve on its predecessor, with AlphaGo Zero trained without any expert knowledge at all! 

No doubt the developments will continue, and the arrival of chatbots such as ChatGPT will only accelerate progress, but what are the potential limitations and challenges?

Well, as mentioned above – and discussed in part 2 of the series – AI models need a lot of training data to be able to generalise well. And, again as discussed previously, it takes a long time to train these models, even on very powerful GPUs. The child learning to ride a bicycle from part 2 will almost certainly expend far less energy and time than an AI model! The rules of Go are incredibly well defined (as is usually the case with board games), but real-life environments are messy. This means that building a reinforcement learning (RL) model that learns from real-world scenarios is extremely difficult. The cost of running learn-to-play simulations is not comparable to how efficiently humans learn in complex surroundings; it is orders of magnitude larger.
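
To make the idea of a learn-to-play simulation a little more concrete, here is a minimal sketch of the agent–environment loop that RL is built around. It uses the open-source Gymnasium library and its toy CartPole environment, with a purely random policy standing in for a real learning algorithm – so treat it as an illustration of the loop’s shape, not as AlphaGo’s actual training code.

```python
# A minimal reinforcement learning loop using the open-source Gymnasium library
# (pip install gymnasium). The "policy" here just picks random actions -- a
# stand-in for the learned networks a real RL agent would use.
import gymnasium as gym

env = gym.make("CartPole-v1")          # a small, well-defined simulated environment

for episode in range(5):               # each episode is one "game" against the simulator
    observation, info = env.reset()    # start a fresh episode
    total_reward = 0.0
    done = False
    while not done:
        action = env.action_space.sample()   # random stand-in policy: pick any legal action
        # The environment answers with a new observation and a reward signal --
        # these cues are all the agent ever gets to learn from.
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    print(f"Episode {episode}: total reward = {total_reward}")

env.close()
```

Even this toy loop makes the point above: the agent only ever sees observations and rewards, and while a simulator can hand those cues out millions of times over, gathering them in a messy real-world setting is slow and expensive.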

AlphaGo has shown us that it is possible to build a model that learns entirely from its environment’s cues, albeit with well-defined rules and the ability to run millions of simulations. Several advances have been made in RL since, notably in self-driving cars. Instead of the many millions of iterations you might expect to be needed to reach a reasonable level of competence, recent algorithms have been able to learn lane-following in just 20 minutes. Here the emphasis is on solving a given task through intelligent trial and error rather than ploughing through millions of seemingly random data points.

In this 3-part series on AlphaGo, I hope I’ve given you a flavour of why AlphaGo was significant in the world of AI, a quick overview of how the algorithms work, and a sense of the challenges researchers face today in extending RL algorithms to the real world. Thanks for reading!
