I've spent the last few months, in between searching for jobs, exploring engine analysis of chess games and how parts of that analysis can be used to construct narratives about how the games went, so I opened this article right away. This is a fine first attempt, but I feel it's missing something very important: the win/draw/loss percentages. Chess engines previously used centipawns to compare moves and positions within the search space, but now many, if not most, of the top engines also incorporate win/draw/loss (WDL) percentage estimates that come from neural networks.
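For anyone wondering how the two scales relate: a centipawn evaluation can be mapped to an approximate win probability with a logistic curve. The sketch below uses the coefficient Lichess publishes for its accuracy metric; actual engines fit their own WDL models (often from a neural-network value head), so treat this as illustrative only.

```python
import math

def win_chance(cp: float) -> float:
    """Approximate White's win probability (0-1) from a centipawn eval.

    Uses the logistic fit published by Lichess for its accuracy metric;
    engines derive their own WDL models, so this is only a rough guide.
    """
    return 0.5 + 0.5 * (2 / (1 + math.exp(-0.00368208 * cp)) - 1)

print(win_chance(0))    # equal position -> 0.5
print(win_chance(100))  # roughly one pawn up
```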
To that end, Julian at the Chess Engine Lab has developed a style of narrative analysis that I feel really uses the WDL percentages well.
https://substack.com/@chessenginelab
His series on the 2024 World Chess Championship is great and I haven't seen anything else come close in terms of using a chess engine to craft an accessible analysis of the matches. Take one look at the WDL percentages from Game 14 and it becomes extremely clear what's about to happen and how the game evolved: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr...
https://chessenginelab.substack.com/p/engine-analysis-of-gam... https://chessenginelab.substack.com/p/engine-analysis-of-gam... https://chessenginelab.substack.com/p/engine-analysis-of-gam... https://chessenginelab.substack.com/p/engine-analysis-of-gam... https://chessenginelab.substack.com/p/engine-analysis-of-gam...
Those analyses are interesting.
The overarching story seems less that Ding made a blunder and more that Gukesh missed quite a few opportunities to beat Ding long before the final game.
I analyzed the 2024 World Chess Championship match using an empirical, synthesized approach. I focused on metrics like conversion rates, resilience rates, and the impact of errors on the match outcome. The analysis concentrates on providing an overall outlook on the match rather than a game-by-game breakdown. Let me know your thoughts!
The analysis mentions that the correlation between the played moves and the engine's choices is ~95% for both players. But I recall a credible-seeming YouTube analysis from last year's Hans Niemann cheating scandal which said the best players only average a ~70-75% correlation.
https://youtu.be/jfPzUgzrOcQ?t=222
I'm trying to reconcile these two "facts". Does anyone know if the 2024 championship games simply played out along very well-established lines?
You can't compare those, because they're two different kinds of events. The World Chess Championship is unique among chess events because of its very long time controls (120 minutes per side, an additional 30 minutes after 40 moves, plus a 30-second increment per move starting from move 41) and the huge amount of prep time the players get to face a single opponent.
The prep time means players can stay within the top engine line for many moves because they've memorized it completely. The generous time controls mean the players have plenty of time to calculate the best move once they're out of the prepared line. Lastly, the large time addition after move 40 (30 minutes plus a 30-second increment) means the players should be able to solve for draws or mates in the endgame. This is part of the reason Ding's decisive blunder was so shocking: he had plenty of time but moved too quickly, not realizing that after he offered the rook trade his bishop could be trapped in the corner and traded off into a losing pawn endgame.
I think those are two different definitions. In the video, engine correlation represents the share of moves that matched the top move of a chess engine, as defined here: https://en.chessbase.com/post/let-s-check-engine-correlation... The accuracy metric in the article is defined a bit differently, following how Lichess computes it: https://lichess.org/page/accuracy
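To make the distinction concrete, here's a toy sketch of the ChessBase-style engine correlation: the fraction of a player's moves that match the engine's single top choice. The move data is made up purely for illustration.

```python
def engine_correlation(played_moves, engine_best_moves):
    """ChessBase-style 'Let's Check' correlation: the fraction of a
    player's moves that match the engine's top choice."""
    matches = sum(p == b for p, b in zip(played_moves, engine_best_moves))
    return matches / len(played_moves)

# Hypothetical data: 8 moves, 6 of which match the engine's first line.
played = ["e4", "Nf3", "Bb5", "Ba4", "O-O", "Re1", "Bb3", "c3"]
best   = ["e4", "Nf3", "Bb5", "Ba4", "O-O", "Re1", "d4",  "h3"]
print(engine_correlation(played, best))  # 0.75
```

Lichess accuracy, by contrast, is derived from the win-probability drop per move rather than exact top-move matches, which is why the two numbers aren't directly comparable.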
The scandal was a big nothing in the end (Niemann didn't cheat at the time, though he had admitted to doing so as a younger player), and the video lacks credibility in that regard.
It's not clear where your 70-75% claim comes from, but you would expect higher accuracy in classical than in speed games, for instance.
> The average centipawn loss shows a very slight advantage (less than 1 centipawn) for Gukesh. This connects well with the accuracy metric we got, which showed a negligible advantage for Gukesh.
As I understand it, for average centipawn loss, lower is better. It roughly measures how much worse a player's moves are, on average, compared to the best moves suggested by the engine. Based on your data, Ding has the very slight advantage, not Gukesh. Here is an article from chess.com (https://www.chess.com/blog/raync910/average-centipawn-loss-c...):
> The term average centipawn loss (ACPL) represents how much “value” a player drops by making incorrect moves during a chess game. [...] The lower an ACPL that player has, the more perfectly they played (at least in the eyes of the engine assessing the game).
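In code, ACPL is just the average evaluation drop caused by a player's own moves, with improvements counted as zero loss. A minimal sketch with hypothetical evals (real tools also cap extreme swings):

```python
def acpl(evals_before, evals_after):
    """Average centipawn loss from the mover's perspective.

    evals_before[i] / evals_after[i] are engine evals (in centipawns,
    from the mover's point of view) before and after the i-th move.
    Only evaluation drops count; a move can't have negative loss.
    """
    losses = [max(0, before - after)
              for before, after in zip(evals_before, evals_after)]
    return sum(losses) / len(losses)

# Hypothetical game fragment: one clean move, one 30 cp inaccuracy.
print(acpl([20, 50], [20, 20]))  # 15.0
```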
Thank you, you're right; I corrected this mistake. As the difference in ACPL is negligible anyway, it does not affect the overall conclusions and insights.
Indeed. I'm not normally one to write off an article over a small mistake, but that's such a fundamental one that it calls into question the value of the rest of the analysis.
Wow. This actually disproves a key subtext of the match mentioned by some commentators: that Ding failed to convert winning positions into wins. Instead, it shows that Ding converted more often than Gukesh. The fact that Gukesh won seems more like a statistical anomaly in light of this evidence. We are indeed probably post-hoc rationalizing the winner.
It doesn't really disprove anything. The problem with this type of analysis is that it's based on engines which are many levels above human play.
While watching the commentary, you will often hear comments from super GMs like "the engine suggests move XY, but it's not a move a human player would find/consider". The move may be optimal, but only if you're at Stockfish's 3600-Elo level, because you need to precisely execute a series of 3600-Elo moves to exploit it. A suboptimal move for a 3600-Elo player may be the optimal move for a 2800-Elo player, but Stockfish won't tell you that.
I'm not saying this analysis isn't interesting, but we shouldn't overinterpret it.
Yes. To be honest, when the match was over, I was also left with the feeling that Ding did not capitalize enough on his opportunities. But later after crunching the data I saw that it was actually the other way around.
With all due respect, I don't think this is a very interesting analysis. It misses context, and the categories chosen are too arbitrary to carry much insight.
If you gradually misplay a position, but then your opponent makes one suboptimal move, your opponent gets charged with an inaccuracy while you don't. Low ACPL can indicate that players played well, but also that they chose very safe, boring positions/openings.
Further, engine evaluations can be misleading or useless in human chess. A position might be objectively winning or defensible, but only if you find a sequence of inhuman engine moves that are practically very hard to spot. Simply grouping everything with "evaluation > 1" as a winning advantage to compute a "conversion rate" is pretty uninformative.
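To make that objection concrete, here's roughly what such a conversion-rate metric looks like, with a hypothetical +1.00 threshold and made-up game data. Note it treats a "+1.1, but only via inhuman engine moves" position exactly like an easy extra pawn:

```python
def conversion_rate(games):
    """Fraction of games where a player who reached a 'winning'
    eval (> +1.00 pawns) went on to win. Threshold is arbitrary."""
    reached = [g for g in games if g["max_eval"] > 1.0]
    if not reached:
        return 0.0
    won = sum(g["result"] == "win" for g in reached)
    return won / len(reached)

# Hypothetical match data: three games reached a 'winning' eval, one converted.
games = [
    {"max_eval": 1.4, "result": "draw"},
    {"max_eval": 2.1, "result": "win"},
    {"max_eval": 1.1, "result": "draw"},
    {"max_eval": 0.3, "result": "draw"},  # never 'winning'; excluded
]
print(conversion_rate(games))  # ~0.33
```

The metric itself says nothing about how humanly findable the winning plan was, which is exactly the criticism above.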
The final blunder did not occur out of nowhere. Ding missed a much safer way to draw the game and went into a position that Nakamura judged as 50/50 between a draw and a Gukesh win [1].
I think it is much more informative to actually watch top players comment on the games and the match overall. Keep in mind that Carlsen and Nakamura, who comment on the game in [1], are actually stronger players by Elo than the two finalists of the world championship [2].
[1] https://www.youtube.com/watch?v=uXc7Bc3zd0M
[2] https://2700chess.com/
Thanks for sharing your opinion. I actually addressed many of the points you raised in the conclusions section of my article. I acknowledged the limitations of analyzing a chess match purely through numerical metrics. However, I still believe that looking at the match through this analytical lens offers a valuable perspective, complementing other types of analysis, such as commentary from players and bloggers. It provides a unique angle that, while imperfect, can uncover insights that might otherwise be overlooked. Ultimately, I see this as an additional tool in understanding the match, rather than a replacement for more traditional forms of analysis.
What is the added insight though?
I believe it offers a neutral perspective on the game, without the bias that can creep into any analysis that is not data-driven. Sometimes when a chess commentator dislikes a particular player's style, it can be reflected in their commentary. For me personally, analyzing the match this way changed my view of it, but I completely understand if you do not feel that way.
This nice analysis shows the truth of an old chess saying: a single blunder can throw away a game of 40 perfectly played moves.
So many smaller inaccuracies together don't count as much as a single blunder.
A lot of commentators bring up the question of what if Ding had not blundered under time pressure in the last game. This overlooks the fact that Ding systematically struggled with time management and was under immense time pressure practically every game. Similar to how you can only cry wolf so many times, if you're always out of time, something's gotta give at one point. What was surprising to me was that blunders like that didn't happen more often.
Ding's problem was not his time management but his mental game. It's no secret he had been struggling for the previous ~12 months, and he admitted as much, saying he didn't prep well for this match. Having said that, the guy got himself into favorable positions multiple times and then was happy to trade-off pieces/repeat moves to get the draw.
The last game was where he took it a step too far. Several times during the game he had the opportunity to pressure Gukesh to find the correct sequence of moves, only to take the easy way out and trade a piece to make the game more drawish.
His blunder at the end came from thinking he'd just trade off the rooks and kill off the game, while missing the fact that he basically sac'd his bishop in the process.
> Having said that, the guy got himself into favorable positions multiple times and then was happy to trade-off pieces/repeat moves to get the draw.
According to the engine, he was in a slightly advantageous position, but from the post-game interviews it's clear he didn't realize his advantage.
Often it's also an advantage that only an engine can exploit (via a series of difficult-to-find engine moves).
But I think that just comes back to his mental game and lack of match prep. The Ding of five years ago, who pushed even Magnus Carlsen, wouldn't have been out of prep 5-8 moves into every game and could have afforded himself more time in the middlegame to find the advantage.
It's basically what allowed Gukesh to do exactly that throughout the match. His opening prep was impressive, and because he allowed himself time to think outside of the opening, he at least tried to push in most games.
While Ding did put himself under pressure in many games by taking long thinks in situations where it didn't really seem to benefit him, the pressure he put himself under in the final game was different. He forced a very uncomfortable endgame because he clearly thought he could draw it on autopilot. When he blundered he had 10 minutes on his clock plus a 30-second increment, so he wasn't really under enormous time pressure, but it was a nasty position of his own choosing. Either way, it's hard to have sympathy on a strategic level, as devastated as he clearly was in the moment.
It's still a good "what if", though. He'd made it through the first 13.9 of 14 games with only one tactical blunder. Even if he was overwhelmingly more likely to blunder than Gukesh in the final position (between time management, mental exhaustion, and the fact that the position isn't dangerous for Black at all while it's slightly dangerous for White), he was still an overwhelming favourite at that point to play 10 more reasonable moves and make it to the tie breaks, where several factors would have worked in his favour.
Something doesn't gotta give, when there's only a few moves left in a simplified position.
Did he "struggle with time", or did he just work harder to find the move a chess engine would choose?
Basically, in every single stat Ding played more like the chess engines, and overall he was able to capitalize better on an advantage and recover better from a disadvantage than Gukesh. Just looking at the data, I think it would be reasonable to conclude that Gukesh won mostly by luck: the more probable outcome was that Ding didn't blunder in the final game.
On the other hand, Ding isn't a chess engine; he takes longer and gets tired sooner than a chess engine. One aspect of human chess is management of both time and intellectual energy, so there's certainly an argument to be made that the extra effort Ding put in to play more like a chess engine wasn't the optimal strategy for a human.
I think this misses the forest for the trees. At the end of the day, if you're competing to be the world champion in chess, your goal isn't to play as close to the engine as possible, it's to win games. If you play with 100% accuracy, but lose on time, you don't get to be the champion.
You're discounting the fact that Gukesh could have also sacrificed good time management and spent more compute time on his moves for precision. The fact that he didn't do that doesn't mean he won on luck.
I think it's cool that we can get so technical in this day and age when it comes to chess. But I'll admit, I thought chess was more interesting in the old days, when there wasn't chess engine analysis at your fingertips, there was a bit more mystery, and there were no endgame tablebases that determine perfect play. I honestly do believe there eventually comes a point in human activity where knowing too much detracts from the beauty of the thing, which is very different from what I believed when I was younger.
I think computers do that -- they're fascinating and definitely helpful in knowledge acquisition but they often reveal too much. Maybe it's stuff we just shouldn't know.
This is a big reason why some of the top GMs (Magnus Carlsen in particular) aren't super interested in playing classical chess anymore.
It's at the point now where a reasonably good GM can learn and memorize a set of openings and, more often than not, draw against the top players. So, in order not to lose games or rating points, the game is played relatively conservatively rather than pushing for a win.
Magnus and others are now trying to hype up freestyle chess (Fischer Random/Chess960) to take away the standard openings, avoid this memorization game, and go back to the days when you were forced to calculate over the board.
Yes indeed. I have played Fischer Random myself and I find it a nicer game to play than traditional chess at times, especially with beginners who don't know much about openings.
I can't help but feel that while the concept is interesting, the article gives off too many LLM vibes. The factual errors and long winding sentences feel off.
I thought so too, the writing style has the tone, sentence structure, and word choice of an LLM.
Some examples that stood out to me:
"This allowed me to appreciate the nuances of the match and gain deeper insights into the strategies employed by both players."
"These reflections led me to analyze the match from an empirical and synthesized standpoint, aiming to form a cohesive picture of it as a whole."
I ran it through GPTZero: "We are highly confident this text was ai generated: 100% Probability AI generated"
The same goes for the author's comments and replies: https://news.ycombinator.com/threads?id=maximamel
This is what I would write if I was doing an LLM impression: "Thank you, you're right, I corrected this mistake."
Yeah especially the "rather than a replacement for more traditional forms of analysis" in one of his comments here tipped me off.