I tried it out and it couldn't one-shot my test problem, but after some coaching it arrived at the correct answer. I wanted it to search for a few words inside a grid of letters. It only succeeded once it enumerated the grid and counted the position of each letter.
Someone suggested I try asking it for a program to do this search, and it correctly generated the grid and list of letters. The search code had some bugs, but after a few iterations it was able to succeed. It turned out to be a really simple bug too: the letters in the grid were uppercase, while the words in the search list were lowercase. Converting the words to uppercase and stripping any spaces was enough to get it to succeed!
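For anyone curious, here is a rough sketch of what that kind of search might look like with the fix applied. This is not the code the model produced; the grid, the word list, and the names `normalize`, `matches`, and `find_word` are all made up for illustration. The only part taken from the story above is the normalization step: uppercase the word and strip spaces before comparing against an uppercase grid.

```python
from typing import List, Optional, Tuple

# All eight directions as (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

# Hypothetical uppercase grid, invented for this example.
GRID = ["CATS", "ODOG", "WXYZ", "SUNQ"]


def normalize(word: str) -> str:
    """Uppercase the word and drop spaces so it can match an uppercase grid."""
    return word.replace(" ", "").upper()


def matches(grid: List[str], target: str, r: int, c: int, dr: int, dc: int) -> bool:
    """Check whether target is spelled out from (r, c) stepping by (dr, dc)."""
    for i, ch in enumerate(target):
        rr, cc = r + i * dr, c + i * dc
        if rr < 0 or rr >= len(grid) or cc < 0 or cc >= len(grid[rr]):
            return False
        if grid[rr][cc] != ch:
            return False
    return True


def find_word(grid: List[str], word: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (row, col, row step, col step) of the first match, or None."""
    target = normalize(word)
    if not target:
        return None
    for r in range(len(grid)):
        for c in range(len(grid[r])):
            for dr, dc in DIRECTIONS:
                if matches(grid, target, r, c, dr, dc):
                    return (r, c, dr, dc)
    return None


if __name__ == "__main__":
    # Lowercase words with stray spaces, as in the buggy search list.
    for w in ["cat", "cow", "d og", "bird"]:
        print(w, find_word(GRID, w))
```

Without the `normalize` call, a lowercase word like "cat" would never match the uppercase grid, which is the same class of bug described above.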
Overall this really feels like an amazing step forward, and it's the first Google model that has genuinely impressed me. Now I'm very excited for what will be coming next.
You can see the chain of thought! That's nice.
OTOH, I asked it a math question that o1 got right and it messed up.
The link for the docs: https://ai.google.dev/gemini-api/docs/thinking-mode
"Thinking Mode is an experimental model and has the following limitations:
- 32k token input limit
- Text and image input only
- 8k token output limit
- Text only output
- No built-in tool usage like Search or code execution"