Could Latest Poker AI Breakthrough Revolutionize Final Table Coverage?

A new poker AI achieved "superhuman" performance at six-max no-limit hold'em. Could the tech be used to analyze human play at final tables?

By happenstance, as the WSOP Main Event was moving toward its conclusion, a new paper was released on a poker-playing AI that can win against elite players in six-handed no-limit hold’em.

The technology, dubbed Pluribus, is the latest in a series of poker AIs from computer scientists Noam Brown and Tuomas Sandholm, who hail from Carnegie Mellon University in Pittsburgh. The previous iteration of the poker AI, Libratus, was able to win convincingly against top players in heads-up no-limit hold’em back in 2017. Naturally, adding more players to the table was the next horizon to reach.

“The past two decades have witnessed rapid progress in the ability of AI systems to play increasingly complex forms of poker,” they wrote in a paper published July 11 in the journal Science. “However, all prior breakthroughs have been limited to settings involving only two players. Developing a superhuman AI for multiplayer poker was the widely-recognized main remaining milestone.”

What the bot can do

Pluribus is capable of approaching what is known as Game Theory Optimal poker strategy in six-handed no-limit hold’em poker. In other words, it can play in such a way that humans don’t really have much of a chance to exploit it. The bot doesn’t have lapses in concentration, like mere mortals, and so it plays “superhuman” at every moment. It’s ruthlessly consistent.

This is also called a “Nash equilibrium” strategy. Brown gave a rock-paper-scissors example to illustrate the concept. The Nash equilibrium strategy for that game is to play rock, paper, and scissors with equal probability so your opponent can’t exploit you. Pluribus doesn’t play perfect poker, and it did not solve six-handed no-limit hold’em. What it did was take the game to another level by balancing its hands with a consistency that humans struggle to achieve. Pluribus comes closer to the Nash equilibrium than humans do.
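The rock-paper-scissors point can be checked with a few lines of Python. This is only a minimal sketch of that toy game, not anything taken from Pluribus; the function names and the "rock-heavy" strategy are made up for illustration. It computes the most an opponent can win per round, on average, by always answering with the single best move.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
# Each key beats the move it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Return +1 if `mine` beats `theirs`, -1 if it loses, 0 on a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def best_response_value(strategy):
    """Most an opponent can win per round, on average, against `strategy`."""
    return max(
        sum(prob * payoff(reply, move) for move, prob in strategy.items())
        for reply in MOVES
    )


# The equilibrium strategy: every move with probability 1/3.
uniform = {move: Fraction(1, 3) for move in MOVES}
# A hypothetical unbalanced strategy that throws rock half the time.
rock_heavy = {
    "rock": Fraction(1, 2),
    "paper": Fraction(1, 4),
    "scissors": Fraction(1, 4),
}

print(best_response_value(uniform))     # 0: nothing to exploit
print(best_response_value(rock_heavy))  # 1/4: always playing paper profits
```

Against the equal-probability strategy no reply wins anything on average, while the rock-heavy player gives up a quarter of a unit per round to an opponent who simply plays paper.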

As the computer scientists put it in their stated objective for the project:

“The shortcomings of Nash equilibria outside of two-player zero-sum games have raised the question among researchers of what the right goal should even be in such games. In the case of six-player poker, we take the viewpoint that our goal should not be a specific game-theoretic solution concept, but rather to create an AI that empirically defeats human opponents consistently, including elite human professionals. (This is what’s commonly regarded as “superhuman” performance for AI bots.)”

Pluribus does not adapt its strategy to its opponents, so it also doesn’t exploit the patterns and tendencies that human players unavoidably have. It plays in a way that is said to be nearly unbeatable, regardless of the opponent’s playing style and strategy.

“Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand — being careful to balance its strategy across all the hands so it remains unpredictable to the opponent,” said the research findings. “Once this balanced strategy across all hands is computed, Pluribus then executes an action for the hand it is actually holding.”
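The two-step idea in that quote (first a strategy for every possible hand, then an action for the hand actually held) can be sketched roughly as follows. Everything here is an assumption for illustration: the three hand classes, their frequencies, and the action probabilities are invented numbers, and the real system works out its strategy through search rather than reading it from a fixed table.

```python
import random

# Hypothetical toy strategy: how often to bet or check with each hand class.
# These numbers are invented for illustration, not taken from Pluribus.
STRATEGY = {
    "strong": {"bet": 0.8, "check": 0.2},
    "medium": {"bet": 0.1, "check": 0.9},
    "weak": {"bet": 0.4, "check": 0.6},  # the bets here are bluffs
}
# Hypothetical share of the time each hand class is dealt.
HAND_FREQUENCY = {"strong": 0.2, "medium": 0.5, "weak": 0.3}


def overall_action_frequencies(strategy, hand_frequency):
    """What an opponent sees: how often each action occurs across all hands."""
    totals = {}
    for hand, actions in strategy.items():
        for action, prob in actions.items():
            totals[action] = totals.get(action, 0.0) + hand_frequency[hand] * prob
    return totals


def act(strategy, actual_hand, rng):
    """Pick an action for the hand actually held, using the shared strategy."""
    actions = strategy[actual_hand]
    return rng.choices(list(actions), weights=list(actions.values()), k=1)[0]


frequencies = overall_action_frequencies(STRATEGY, HAND_FREQUENCY)
chosen = act(STRATEGY, "weak", random.Random(0))
print(frequencies, chosen)
```

Because a bet can come from strong hands and from bluffs alike, seeing a bet tells the opponent little about which hand is actually behind it.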

Removing the human element

An AI capable of reading human emotions and body language for poker “tells” would be able to exploit human play (this isn’t possible yet), so in this sense Pluribus is merely an objective arbiter of poker strategy. A human poker player may be able to dominate a specific fellow human more thoroughly than Pluribus can, especially as the sample size of hands played is reduced, but that’s only through an exploitative strategy, which deviates from GTO.

After accounting for variance, the bot was able to demonstrate its superiority in two formats: one in which five copies of Pluribus sat at the table with a single human player, and one in which a single Pluribus faced five humans. Thousands of hands were logged.

“If each chip was worth a dollar, Pluribus would have won an average of about $5 per hand and would have made about $1,000/hour playing against five human players,” wrote Brown, who is a research scientist at Facebook’s AI division, which was also behind the project. “These results are considered a decisive margin of victory by poker professionals.”
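Taken together, the two figures in that quote imply a pace of play. The short calculation below is a back-of-the-envelope sketch; the hands-per-hour number is derived here from the quoted averages and is not a figure reported in the article.

```python
# Figures quoted by Brown, treating each chip as one dollar.
win_per_hand = 5      # dollars won per hand, on average
win_per_hour = 1000   # dollars won per hour, on average

# Derived values (assumptions that follow from the two quoted averages).
implied_hands_per_hour = win_per_hour / win_per_hand
implied_seconds_per_hand = 3600 / implied_hands_per_hour

print(implied_hands_per_hour)    # 200.0
print(implied_seconds_per_hand)  # 18.0
```

In other words, the quoted win rates correspond to roughly 200 hands an hour, or about 18 seconds per hand.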

The bot was especially good at value betting against its human opponents. Its bet sizing was also superior to that of its human competitors, who included world-class poker players Greg Merson, Darren Elias, Chris Ferguson, Anthony Gregg, Nick Petrangelo, and Jason Les.

According to Brown, the bot confirmed some strategy that humans had already worked out, such as avoiding limping into a pot unless you’re in the small blind. By playing against itself repeatedly during eight days of training, Pluribus figured out that limping is not GTO.

“While Pluribus initially experimented with limping when computing its blueprint strategy offline through self play, it gradually discarded this action from its strategy as self play continued,” said the paper.

The AI was also fond of what humans call “donk betting,” which is leading out with a bet at the start of a betting round when your last action in the previous round was a call. It’s often viewed as an amateur’s play, but Pluribus has breathed new life into the technique. Poker pros do “donk bet,” but the computer scientists found that Pluribus treated the play as far closer to optimal than humans give it credit for.

Donk betting might have to be renamed.

A win for TV coverage?

The bot has major implications for online poker sites, which have to guard against cheating. That goes almost without saying. However, Pluribus is not an existential threat to poker. It could not only raise the bar on human skill level, thus enriching the game, but also make the game more spectator friendly.

The chess world has long used computer programs to provide objective analysis of the games of the world’s top players. The chess engines show what the optimal lines are. Poker is also a game of lines, where a decision in an earlier round determines the tree of options available in later rounds. You plan ahead in both chess and poker, and computers can tell you the optimal path.

Brown told US Bets that while the project wasn’t designed to create a commercial product that can analyze human poker play, the technology is capable of doing just that.

“I do think that the AI can provide an interesting objective analysis of what a person should do, and determine that in real time (though since live poker is typically broadcast on a delay, I don’t know if it really needs to strictly be in real time),” Brown told US Bets via email. “That said, I’m really focused on advancing fundamental AI research rather than commercializing this work in the poker space, but I do think this work can be used to improve the way people play poker. Understanding GTO strategy is a huge part of the game and techniques like this can help people understand that aspect better.”

The technology could tell viewers of ESPN’s coverage of the WSOP Main Event final table, for example, what the optimal bet size is for a player in a certain spot. Pluribus was found to play at about twice the speed of human players. If you factor in slow, deliberate human play at a final table with millions of dollars on the line, the AI would have more than enough time to come up with the GTO play or bet size before the human makes a decision. Viewers of the final table could see what an objective AI thinks of someone’s play, in addition to the indispensable human commentary on the broadcast.

Jason Les, who played against Pluribus, told US Bets that he also thinks it could enhance coverage of a tournament final table, but he sees it being more interesting for experienced poker players than for recreational ones. Les also competed in the 2017 Libratus match.

“You could use this tech to build an AI that provides objective analysis of human play in live tournaments,” Les said. “For advanced viewers, it would be fantastic, I know I’d love it. I’m not sure if that would enhance the experience for most spectators though. I think the traditional appeal for poker spectators is watching two guys take up psychological warfare against each other. Playing a game of trying to exploit each other instead of attempting to reach a Nash equilibrium approximation like Pluribus does.”

Training Pluribus required only about $150 worth of cloud computing resources, which makes the tech much more accessible. “This efficiency stands in stark contrast to other recent AI milestone projects, which required the equivalent of millions of dollars’ worth of computing resources to train,” Brown wrote in his research.

