I've spent the last few months, in between searching for jobs, exploring engine analysis of chess games and how parts of that analysis can be used to construct narratives about how the games went, so I opened this article right away. This is a fine first attempt, but I feel it's missing something very important: the win/draw/loss percentages. Chess engines previously used centipawns to compare moves and positions within the search space, but now many, if not most, of the top engines also incorporate win/draw/loss (WDL) percentage estimates that come from neural networks.
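For anyone wondering how the two scales relate: a centipawn evaluation can be mapped to an approximate win probability with a logistic curve. The sketch below uses the coefficient Lichess publishes for its accuracy metric; actual engines fit their own WDL models (often from a neural-network value head), so treat this as illustrative only.

```python
import math

def win_chance(cp: float) -> float:
    """Approximate White's win probability (0-1) from a centipawn eval.

    Uses the logistic fit published by Lichess for its accuracy metric;
    engines derive their own WDL models, so this is only a rough guide.
    """
    return 0.5 + 0.5 * (2 / (1 + math.exp(-0.00368208 * cp)) - 1)

print(win_chance(0))    # equal position -> 0.5
print(win_chance(100))  # roughly one pawn up
```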
To that end, Julian at the Chess Engine Lab has developed a style of narrative analysis that I feel really uses the WDL percentages well.
https://substack.com/@chessenginelab
His series on the 2024 World Chess Championship is great and I haven't seen anything else come close in terms of using a chess engine to craft an accessible analysis of the matches. Take one look at the WDL percentages from Game 14 and it becomes extremely clear what's about to happen and how the game evolved: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr...
https://chessenginelab.substack.com/p/engine-analysis-of-gam... https://chessenginelab.substack.com/p/engine-analysis-of-gam... https://chessenginelab.substack.com/p/engine-analysis-of-gam... https://chessenginelab.substack.com/p/engine-analysis-of-gam... https://chessenginelab.substack.com/p/engine-analysis-of-gam...
Those analyses are interesting.
The overarching story seems less that Ding made a blunder and more that Gukesh missed quite a few opportunities to beat Ding long before the final game.
I analyzed the 2024 World Chess Championship match using an empirical, synthesized approach. I focused on metrics like conversion rates, resilience rates, and the impact of errors on the match outcome. The analysis concentrates on providing an overall outlook on the match rather than a game-by-game breakdown. Let me know your thoughts!
The analysis mentions that the correlation between the played moves and the engine's choices is ~95% for both players. But I recall a credible-seeming YouTube analysis from last year's Hans Niemann cheating scandal which said the best players only average a ~70-75% correlation.
https://youtu.be/jfPzUgzrOcQ?t=222
I'm trying to reconcile these two "facts". Does anyone know if the 2024 championship games simply played out along very well-established lines?
You can't compare those, because they're two different kinds of events. The World Chess Championship is unique among chess events because of its very long time controls (120 minutes per side, an additional 30 minutes after 40 moves, plus a 30-second increment per move starting from move 41) and the huge amount of prep time the players get to face a single opponent.
The prep time means players can stay within the top engine line for many moves because they've memorized it completely. The generous time controls mean the players have plenty of time to calculate the best move once they're out of the prepared line. Lastly, the large time addition after move 40 (30 minutes plus a 30-second increment) means the players should be able to solve for draws or mates in the endgame. This is part of the reason Ding's decisive blunder was so shocking: he had plenty of time but moved too quickly, not realizing that after he offered the rook trade his bishop could be trapped in the corner and traded off into a losing pawn endgame.
I think those are two different definitions. In the video, engine correlation represents the share of moves that matched the top move of a chess engine, as defined here: https://en.chessbase.com/post/let-s-check-engine-correlation... The accuracy metric in the article is defined a bit differently, following how Lichess computes it: https://lichess.org/page/accuracy
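To make the distinction concrete, here's a toy sketch of the ChessBase-style engine correlation: the fraction of a player's moves that match the engine's single top choice. The move data is made up purely for illustration.

```python
def engine_correlation(played_moves, engine_best_moves):
    """ChessBase-style 'Let's Check' correlation: the fraction of a
    player's moves that match the engine's top choice."""
    matches = sum(p == b for p, b in zip(played_moves, engine_best_moves))
    return matches / len(played_moves)

# Hypothetical data: 8 moves, 6 of which match the engine's first line.
played = ["e4", "Nf3", "Bb5", "Ba4", "O-O", "Re1", "Bb3", "c3"]
best   = ["e4", "Nf3", "Bb5", "Ba4", "O-O", "Re1", "d4",  "h3"]
print(engine_correlation(played, best))  # 0.75
```

Lichess accuracy, by contrast, is derived from the win-probability drop per move rather than exact top-move matches, which is why the two numbers aren't directly comparable.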
The scandal was a big nothing in the end (Niemann didn't cheat at the time, though he had admitted to doing so as a younger player), and the video lacks credibility in that regard.
It's not clear where your 70-75% claim comes from, but you would expect higher accuracy in classical than in speed games, for instance.
> The average centipawn loss shows a very slight advantage (less than 1 centipawn) for Gukesh. This connects well with the accuracy metric we got, which showed a negligible advantage for Gukesh.
As I understand it, for average centipawn loss, lower is better. It roughly measures how much worse a player's moves are, on average, compared to the best moves suggested by the engine. Based on your data, Ding has the very slight advantage, not Gukesh. Here is an article from chess.com (https://www.chess.com/blog/raync910/average-centipawn-loss-c...):
> The term average centipawn loss (ACPL) represents how much “value” a player drops by making incorrect moves during a chess game. [...] The lower an ACPL that player has, the more perfectly they played (at least in the eyes of the engine assessing the game).
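In code, ACPL is just the average evaluation drop caused by a player's own moves, with improvements counted as zero loss. A minimal sketch with hypothetical evals (real tools also cap extreme swings):

```python
def acpl(evals_before, evals_after):
    """Average centipawn loss from the mover's perspective.

    evals_before[i] / evals_after[i] are engine evals (in centipawns,
    from the mover's point of view) before and after the i-th move.
    Only evaluation drops count; a move can't have negative loss.
    """
    losses = [max(0, before - after)
              for before, after in zip(evals_before, evals_after)]
    return sum(losses) / len(losses)

# Hypothetical game fragment: one clean move, one 30 cp inaccuracy.
print(acpl([20, 50], [20, 20]))  # 15.0
```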
Thank you, you're right; I corrected this mistake. As the difference in ACPL is negligible anyway, it does not affect the overall conclusions and insights.
Indeed. I'm not normally one to write off an article over a small mistake, but that's such a fundamental one that it calls into question the value of the rest of the analysis.
Wow. This actually disproves a key subtext of the match mentioned by some commentators: that Ding failed to convert winning positions into wins. Instead, it shows that Ding converted more often than Gukesh. The fact that Gukesh won seems more like a statistical anomaly in light of this evidence. We are indeed probably post-hoc rationalizing the winner.
It doesn't really disprove anything. The problem with this type of analysis is that it's based on engines which are many levels above human play.
While watching the commentary, you will often hear comments from super GMs like "the engine suggests move XY, but it's not a move a human player would find/consider". The move may be optimal, but only if you're at Stockfish's 3600-Elo level, because you need to precisely execute a series of 3600-Elo moves to exploit it. A suboptimal move for a 3600-Elo player may be the optimal move for a 2800-Elo player, but Stockfish won't tell you that.
I'm not saying this analysis isn't interesting, but we shouldn't overinterpret it.
Yes. To be honest, when the match was over, I was also left with the feeling that Ding did not capitalize enough on his opportunities. But later after crunching the data I saw that it was actually the other way around.
With all due respect, I don't think this is a very interesting analysis. It misses context, and the categories chosen are too arbitrary to carry much insight.
If you gradually misplay a position, but then your opponent makes one suboptimal move, your opponent gets charged with an inaccuracy while you don't. Low ACPL can indicate that players played well, but also that they chose very safe, boring positions/openings.
Further, engine evaluations can be misleading or useless in human chess. A position might be objectively winning or defensible, but only if you find a sequence of inhuman engine moves that are practically very hard to spot. Simply grouping everything with "evaluation > 1" as a winning advantage to compute a "conversion rate" is pretty uninformative.
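To make that objection concrete, here's roughly what such a conversion-rate metric looks like, with a hypothetical +1.00 threshold and made-up game data. Note it treats a "+1.1, but only via inhuman engine moves" position exactly like an easy extra pawn:

```python
def conversion_rate(games):
    """Fraction of games where a player who reached a 'winning'
    eval (> +1.00 pawns) went on to win. Threshold is arbitrary."""
    reached = [g for g in games if g["max_eval"] > 1.0]
    if not reached:
        return 0.0
    won = sum(g["result"] == "win" for g in reached)
    return won / len(reached)

# Hypothetical match data: three games reached a 'winning' eval, one converted.
games = [
    {"max_eval": 1.4, "result": "draw"},
    {"max_eval": 2.1, "result": "win"},
    {"max_eval": 1.1, "result": "draw"},
    {"max_eval": 0.3, "result": "draw"},  # never 'winning'; excluded
]
print(conversion_rate(games))  # ~0.33
```

The metric itself says nothing about how humanly findable the winning plan was, which is exactly the criticism above.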
The final blunder did not occur out of nowhere. Ding missed a much safer way to draw the game and went into a position that Nakamura judged as 50/50 between a draw and a Gukesh win [1].
I think it is much more informative to actually watch top players comment on the games and the match overall. Keep in mind that Carlsen and Nakamura, who comment on the game in [1], are actually stronger players by Elo than the two finalists of the world championship [2].
[1] https://www.youtube.com/watch?v=uXc7Bc3zd0M
[2] https://2700chess.com/
Thanks for sharing your opinion. I actually addressed many of the points you raised in the conclusions section of my article. I acknowledged the limitations of analyzing a chess match purely through numerical metrics. However, I still believe that looking at the match through this analytical lens offers a valuable perspective, complementing other types of analysis, such as commentary from players and bloggers. It provides a unique angle that, while imperfect, can uncover insights that might otherwise be overlooked. Ultimately, I see this as an additional tool in understanding the match, rather than a replacement for more traditional forms of analysis.
What is the added insight though?
I believe it offers a neutral perspective on the game, without the bias that can creep into any analysis that is not data-driven. Sometimes when a chess commentator dislikes a particular player's style, it can be reflected in their commentary. For me personally, analyzing the match this way changed my view of it, but I completely understand if you do not feel that way.
This nice analysis shows the truth of an old chess saying: a single blunder can throw away a game of 40 perfectly played moves.
So many smaller inaccuracies together don't count as much as a single blunder.
A lot of commentators bring up the question of what if Ding had not blundered under time pressure in the last game. This overlooks the fact that Ding systematically struggled with time management and was under immense time pressure practically every game. Similar to how you can only cry wolf so many times, if you're always out of time, something's gotta give at one point. What was surprising to me was that blunders like that didn't happen more often.
Ding's problem was not his time management but his mental game. It's no secret he had been struggling for the previous ~12 months, and he admitted as much, saying he didn't prep well for this match. Having said that, the guy got himself into favorable positions multiple times and then was happy to trade-off pieces/repeat moves to get the draw.
The last game was where he took it a step too far. Several times during the game he had the opportunity to pressure Gukesh to find the correct sequence of moves, only to take the easy way out and trade a piece to make the game more drawish.
His blunder at the end came from thinking he'd just trade off the rooks and kill off the game, while missing the fact that he basically sac'd his bishop in the process.
> Having said that, the guy got himself into favorable positions multiple times and then was happy to trade-off pieces/repeat moves to get the draw.
According to the engine, he was in a slightly advantageous position, but from the post-game interviews it's clear he didn't realize his advantage.
Often it's also an advantage that only an engine can exploit (via a series of difficult-to-find engine moves).
But I think that just comes back to his mental game and lack of match prep. The Ding of five years ago, who pushed even Magnus Carlsen, wouldn't have been out of prep 5-8 moves into every game and could have afforded himself more time in the middlegame to find the advantage.
It's basically what allowed Gukesh to do exactly that throughout the match. His opening prep was impressive, and because he allowed himself time to think outside of the opening, he at least tried to push in most games.
While Ding did put himself under pressure in many games by taking long thinks in situations where it didn't really seem to benefit him, the pressure he put himself under in the final game was different. He forced a very uncomfortable endgame because he clearly thought he could draw it on autopilot. When he blundered he had 10 minutes on his clock plus a 30-second increment, so he wasn't really under enormous time pressure, but it was a nasty position of his own choosing. Either way, it's hard to have sympathy on a strategic level, as devastated as he clearly was in the moment.
It's still a good "what if", though. He'd made it through the first 13.9 of 14 games with only one tactical blunder. Even if he was overwhelmingly more likely to blunder than Gukesh in the final position (between time management, mental exhaustion, and the fact that the position isn't dangerous for Black at all while it's slightly dangerous for White), he was still an overwhelming favourite at that point to play 10 more reasonable moves and make it to the tie breaks, where several factors would have worked in his favour.
Something doesn't gotta give, when there's only a few moves left in a simplified position.
Did he "struggle with time", or did he just work harder to find the move a chess engine would choose?
Basically, in every single stat Ding played more like the chess engines, and overall he was able to capitalize better on an advantage and recover better from a disadvantage than Gukesh. Just looking at the data, I think it would be reasonable to conclude that Gukesh won mostly by luck: the more probable outcome was that Ding didn't blunder in the final game.
On the other hand, Ding isn't a chess engine; he takes longer and gets tired sooner than a chess engine. One aspect of human chess is management of both time and intellectual energy, so there's certainly an argument to be made that the extra effort Ding put in to play more like a chess engine wasn't the optimal strategy for a human.
I think this misses the forest for the trees. At the end of the day, if you're competing to be the world champion in chess, your goal isn't to play as close to the engine as possible, it's to win games. If you play with 100% accuracy, but lose on time, you don't get to be the champion.
You're discounting the fact that Gukesh could have also sacrificed good time management and spent more compute time on his moves for precision. The fact that he didn't do that doesn't mean he won on luck.
I think it's cool that we can get so technical in this day and age when it comes to chess. But I'll admit, I thought chess was more interesting in the old days, when there wasn't chess engine analysis at your fingertips, there was a bit more mystery, and there were no endgame tablebases that determine perfect play. I honestly do believe there eventually comes a point in human activity where knowing too much detracts from the beauty of the thing, which is very different from what I believed when I was younger.
I think computers do that -- they're fascinating and definitely helpful in knowledge acquisition but they often reveal too much. Maybe it's stuff we just shouldn't know.
This is a big reason why some of the top GMs (Magnus Carlsen in particular) aren't super interested in playing classical chess anymore.
It's at the point now where a reasonably good GM can learn and memorize a set of openings and, more often than not, draw against the top players. So, in order not to lose games or rating points, the game is played relatively conservatively rather than pushing for a win.
Magnus and others are now trying to hype up freestyle chess (Fischer Random/Chess960) to take away the standard openings, avoid this memorization game, and go back to the days when you were forced to calculate over the board.
Yes indeed. I have played Fischer Random myself and I find it a nicer game to play than traditional chess at times, especially with beginners who don't know much about openings.
I can't help but feel that while the concept is interesting, the article gives off too many LLM vibes. The factual errors and long winding sentences feel off.
I thought so too, the writing style has the tone, sentence structure, and word choice of an LLM.
Some examples that stood out to me:
"This allowed me to appreciate the nuances of the match and gain deeper insights into the strategies employed by both players."
"These reflections led me to analyze the match from an empirical and synthesized standpoint, aiming to form a cohesive picture of it as a whole."
I ran it through GPTZero: "We are highly confident this text was ai generated: 100% Probability AI generated"
The same goes for the author's comments and replies: https://news.ycombinator.com/threads?id=maximamel
This is what I would write if I was doing an LLM impression: "Thank you, you're right, I corrected this mistake."
Yeah especially the "rather than a replacement for more traditional forms of analysis" in one of his comments here tipped me off.