> LLMs still seem as terrible at this as they'd been in the GPT-3.5 age. Software agents break down once the codebase becomes complex enough, game-playing agents get stuck in loops out of which they break out only by accident, etc.
This has been my observation. I got into Github Copilot as early as it launched back when GPT-3 was the model. By that time (late 2021) copilot can already write tests for my Rust functions, and simple documentation. This was revolutionary. We didn't have another similar moment since then.
The Github copilot vim plugin is always on. As you keep typing, it keeps suggesting in faded text the rest of the context. Because it is always on, I kind of can read into the AI "mind". The more I coded, the more I realized it's just search with structured results. The results got better with 3.5/4 but after that only slightly and sometimes not quite (ie: 4o or o1).
I don't care what anyone says, as yesterday I made a comment that truth has essentially died: https://news.ycombinator.com/item?id=43308513 If you have a revolutionary intelligence product, why is it not working for me?
Ultimately, every AI thing I've tried in this era seems to want to make me happy, even if it's wrong, instead of helping me.
I describe it like "an eager intern who can summarize a 20-min web search session instantly, but ultimately has insufficient insight to actually help you". (Note to current interns: I'm mostly describing myself some years ago; you may be fantastic so don't take it personally!)
Most of my interactions with it via text prompt or builtin code suggestions go like this:
1. Me: I want to do X in C++. Show me how to do it only using stdlib components (no external libraries).
2. LLM: Gladly! Here is solution X
3. Me: Remove the undefined behavior from foo() and fix the methods that call it
4. LLM: Sure! Here it is (produces solution X again)
5. Me: No you need to remove the use of uninitialized variables as the out parameters.
6. LLM: Oh certainly! Here is the correct solution (produces a completely different solution that also has issues)
7. Me: No go back to the first one
etc
For the ones that suggest code, it can at least suggest some very simple boilerplate very easily (e.g. gtest and gmock stuff for C++), but asking it to do anything more significant is a real gamble. Often I end up spending more time scrutinizing the suggested code than writing a version of it myself.
The difference is that interns can learn, and can benefit from reference items like a prior report, whose format and structure they can follow when working on the revisions.
AI is just AI. You can upload a reference file for it to summarize, but it's not going to be able to look at the structure of the file and use that as a template for future reports. You'll still have to spoon-feed it constantly.
7 is the worst part about trying to review my coworker's code that I'm 99% confident is copilot output - and to be clear, I don't really care how someone chooses to write their code, I'll still review it as evenly as I can.
I'll very rarely ask someone to completely rewrite a patch, but so often a few minor comments get addressed with an entire new block of code that forces me to do a full re-review, and I can't get it across to him that that's not what I'm asking for.
interns can generally also tell me "tbh i have no damn idea", while AI just talks out it's virtual ass, and I can't read from it's voice or behavior that maybe it's not sure.
interns can also be clever and think outside the box. this is mostly not good, but sometimes they will surprise you in a good way. the AI by definition can only copy what someone else has done.
The last line has been my experience as well. I only trust what I've verified firsthand now because the Internet is just so rife with people trying to influence your thoughts in a way that benefits them, over a good faith sharing of the truth.
I just recently heard this quote from a clip of Jeff Bezos: "When the data and the anecdotes disagree, the anecdotes are usually right.", and I was like... wow. That quote is the zeitgeist.
If it's so revolutionary, it should be immediately obvious to me. I knew Uber, Netflix, Spotify were revolutionary the first time I used them. With LLMs for coding, it's like I'm groping in the dark trying to find what others are seeing, and it's just not there.
> I knew Uber, Netflix, Spotify were revolutionary the first time I used them.
Maybe re-tune your revolution sensor. None of those are revolutionary companies. Profitable and well executed, sure, but those turn up all the time.
Uber's entire business model was running over the legal system so quickly that taxi licenses didn't have time to catch up. Other than that it was a pretty obvious idea. It is a taxi service. The innovations they made were almost completely legal ones; figuring out how to skirt employment and taxi law.
Netflix was anticipated online by and is probably inferior to YouTube except for the fact that they have a pretty traditional content creator lab tacked on the side to do their own programs. And torrenting had been a thing for a long time already showing how to do online distribution of video content.
They were revolutionary as product genres, not necessary individual companies. Ordering a cab without making a phone call was revolutionary. Netflix at least with its initial promise of having all the world's movies and TV was revolutionary, but it didn't live up to that. Spotify because of how cheap and easy it was to have access to all the music, this was the era when people were paying 99c per song on iTunes.
I've tried some AI code completion tools and none of them hit me that way. My first reaction was "nobody is actually going to use this stuff" and that opinion hasn't really changed.
And if you think those 3 companies weren't revolutionary then AI code completion is even less than that.
> They were revolutionary as product genres, not necessary individual companies.
Even then, they were evolutionary at best.
Before Netflix and Spotify, streaming movies and music were already there as a technology, ask anybody with a Megaupload or Sopcast account. What changed was that DMCA acquired political muscle and cross-border reach, wiping out waves of torrent sites and P2P networks. That left a new generation of users with locked-down mobile devices no option but to use legitimate apps who had deals in place with the record labels and movie studios.
Even the concept of "downloading MP3s" disappeared because every mobile OS vendor hated the idea of giving their customers access to the filesystem, and iOS didn't even have a file manager app until well into the next decade (2017).
> What changed was that DMCA acquired political muscle and cross-border reach, wiping out waves of torrent sites and P2P networks.
Half true - that was happening some, but wasn't why music piracy mostly died out. DMCA worked on centralized platforms like YouTube, but the various avenues for downloading music people used back then still exist, they're just not used as much anymore. Spotify was proof that piracy is mostly a service problem: it was suddenly easier for most people to get the music they wanted through official channels than through piracy.
DMCA claims took out huge numbers of public torrent trackers which was how 99% of people accessed contraband media. All the way back in 2008, the loss of TorrentSpy.com probably shifted everybody to private trackers, but it's a whack-a-mole game there too and most people won't bother.
DMCA also led to the development of ContentID and automated copyright strike system on Youtube, but it didn't block you from downloading the stream as a high bitrate MP3, which is possible even now.
> streaming movies and music were already there as a technology, ask anybody with a Megaupload or Sopcast account.
You can't have a revolution without users. It's the ability to reach a large audience, through superior UX, superior business model, superior marketing, etc. which creates the possibility for revolutionary impact.
Which is why Megaupload and Sopcast didn't revolutionize anything.
Yes, but Google left that functionality half baked intentionally, letting 3rd party developers fill the void. Even now the Google Files app feels like a toy compared to Fossify Explorer or Solid Explorer.
There was a gain in precision going from phone call to app. There is a loss of precision going from app to voice. The tradeoff of precision for convenience is rarely worth it.
Because if it were, Uber would just make a widget asking "Where do you want to go?" and you'd enter "Airport" and that would be it. If a widget of some action is a bad idea, so is the voice command.
"Do something existing with a different mechanism" is innovative, but not revolutionary, and certainly not a new "product genre". My parents used to order pizza by phone calls, then a website, then an app. It's the same thing. (The friction is a little bit less, but maybe forcing another human to bring food to you because you're feeling lazy should have a little friction. And as a side effect, we all stopped being as comfortable talking to real people on phone calls!)
The experience of Netflix, Spotify, and Uber were revolutionary. It felt like the future, and it worked as expected. Sure, we didn't realize the poison these products were introducing into many creative and labor ecosystems, nor did we fully appreciate how they would operate as means to widen the income inequality gap by concentrating more profits to executives. But they fit cleanly into many of our lives immediately.
Debating whether that's "revolutionary" or "innovative" or "whatever-other-word" is just a semantic sideshow common to online discourse. It's missing the point. I'll use whatever word you want, but it doesn't change the point.
"Simple, small" and "good marketing" seem like obvious undersells considering the titanic impacts Netflix and Spotify (for instance) have had on culture, personal media consumption habits, and the economics of industries. But if that's the semantic construction that works for you, so be it.
> The innovations they made were almost completely legal ones; figuring out how to skirt employment and taxi law.
The impact of this was quite revolutionary.
> except for the fact that they have a pretty traditional content creator lab tacked on the side to do their own programs.
The way in which they did this was quite innovative, if not "revolutionary". They used the data they had from the watching habits of their large user base to decide what kinds of content to invest in creating.
In screwing over a lot of people around the world, yes. Otherwise, not really. Ordering rides by app was an obvious next step that's already been pursued independently everywhere.
> They used the data they had from the watching habits of their large user base to decide what kinds of content to invest in creating.
And they successfully created a line of content universally known as something to avoid. Tracks with the "success" of recommendation systems in general.
Not only Uber/Grab (or delivery app) were revolutionary, they are still revolutionary. I could live without LLMs and my life will be slightly impacted when coding. If delivery apps are not available, my life is severely degraded. The other day I was sick. I got medicine and dinner with Grab. Delivered to the condo lobby which is as far as I can get. That is revolutionary.
Honestly, yes. Calling in an order can result in the restaurant botching the order and you have no way to challenge it unless you recorded the call. Also, as someone who’s been on both sides of the transaction, some people have poor audio quality or speak accented English, which is difficult to understand. Ordering from a screen saves everyone valuable time and reduces confusion.
I’ve had app delivery orders get botched, drivers get lost on their way to my apartment, food show up cold or ruined, etc.
The worst part is that when DoorDash fucks up an order, the standard remediation process every other business respects—either a full refund or come back, pick up the wrong order, and bring you the correct order—is just not something they ever do. And if you want to avoid DoorDash, you can’t because if you order from the restaurant directly it often turns out to be white label DoorDash.
Some days I wish there was a corporate death penalty and that it could be applied to DoorDash.
Practically or functionally? Airbnb was invented by people posting on craigslist message boards, and even existed before the Internet, if you had rich friends with spare apartments. But by packaging it up into an online platform it became a company with 2.5 billion in revenue last year. So you can dismiss ordering from a screen instead of looking at a piece of paper and using the phone as not being revolutionary, because of you squint, they're the same thing, but I can now order take out for restaurants I previously would never have ordered from, and Uber Eats generated $13.7 billion in revenue last year, up from 12.2.
Again, the "revolutionary" aspect that made Uber and AirBnB big names, as opposed to any of the plethora of competitors who were doing the same thing at the same time or before, is that these two gained "innovative" competitive advantage by breaking the law around the world.
Obviously you can get ahead if you ignore the rules everyone else plays by.
If we throw away the laws, there's a lot more unrealized "innovation" waiting.
The taxi cab companies were free to innovate and create their own app. And we could continue to have drivers who's credit card machine didn't work until suddenly it does because you don't have any cash. Regulatory capture is anti-capitalism.
Yes, let's throw away the bad laws that are only there to prop up ossified power structures that exist for no good reason, and innovate!
Some laws are good, some laws are bad. we don't have to agree on which ones are which, but it's an oversimplification to frame it as merely that.
Before the proliferation of Uber Eats, Doordash, GrubHub, etc, most of the places I've lived had 2 choices for delivered food: pizza and Chinese.
It has absolutely massively expanded the kinds of food I can get delivered living in a suburban bordering on rural area. It might be a different experience in cities where the population size made delivery reasonable for many restaurants to offer on their own.
Now if anyone solves the problem that for most cuisines ordered food is vastly inferior to freshly served meals. That would be revolutionary.
Crisp fries and pizza. Noodles perfectly Al dente and risotto that has not started to thicken.
It's far from a perfect solution, but I applaud businesses that have tried to improve the situation through packaging changes. IHOP is a stand-out here, in my experience. Their packaging is very sturdy and isolates each component in its own space. I've occasionally been surprised at how hot the food is.
Revolutionary things are things that change how society actually works at a fundamental level. I can think of four technologies of the past 40 years that fit that bill:
the personal computer
the internet
the internet connected phone
social media
those technologies are revolutionary, because they caused fundamental changes to how people behave. People who behaved differently in the "old world" were forced to adapt to a "new world" with those technologies, whether they wanted to or not. A newer more convenient way of ordering a taxicab or watching a movie or music are great consumer product stories, and certainly big money makers. They don't cause complex and not fully understood changes to way people work, play, interact, self-identify, etc. the way that revolutionary technologies do.
Language models feel like they have the potential to be a full blown sociotechnological phenomenon like the above four. They don't have a convenient consumer product story beyond ChatGPT today. But they are slowly seeping into the fabric of things, especially on social media, and changing the way people apply to jobs, draft emails, do homework, maybe eventually communicate and self-identify at a basic level.
I'd almost say that the lack of a smash bang consumer product story is even more evidence that the technology is diffusing all over the place.
Build the much maligned Todo app with Aider and Claude for yourself. give it one sentence and have it spit out working, if imperfect code. iterate. add a graph for completion or something and watch it pick and find a library without you having to know the details of that library. fine, sure, it's just a Todo app, and it'll never work for a "real" codebase, whatever that means, but holy shit, just how much programming did you need to get down and dirty with to build that "simple" Todo app? Obviously building a Todo app before LLMs was possible, but abstracted out, the fact that it can be generated like that's not a game changer?
How are you surprise that getting an LLM to spit out a clone of a very common starter project is evidence of it being able to generate non trivial and valuable code - as in not a clone of overabundant codebases - on demand?
because in actually doing the exercise, and not just talking about it, you'd come up with your own tweak on the Todo app that couldn't be directly be in the training data. you, as a smart human, could come up with a creative feature for your Todo app to have, that no one else would make, showing that these things can compose between the things in their training data and produce a unique combination that didn't exist before. copying example-todo.app to my-todo.app isn't what's impressive, having it able to add features that aren't in the example app is what is. If it only has a box of Lego and can only build things from them, and can't invent new Lego blocks, there's still a large amount of things it can be told to build. That it can assemble those blocks together into a new model that isn't in the instruction manual might not be the most surprising thing in the world, but when that's what most software development is, is the fact that it can't invent new blocks really going to hold it back that much?
While I don't disagree with that observation, it falls into the "well, duh!"-category for me. The models are build with no mechanism for long term memory and thus suck at tasks that require long term memory. There is nothing surprising here. There was never any expectation that LLMs magically develop long term memory, as that's impossible given the architecture. They predict the next word and once the old text moves out of the context window, it's gone. The models neither learn as they work nor can they remember the past.
It's not even like humans are all that different here. Strip a human of their tools (pen&paper, keyboard, monitor, etc.) and have them try solving problems with nothing but the power of their brain and they'll struggle a hell of a lot too, since our memory ain't exactly perfect either. We don't have perfect recall, we look things up when we need to, a large part of our "memory" is out there in the world around us, not in our head.
The open question is how to move forward. But calling AI progress a dead end before we even started exploring long term memory, tool use and on-the-fly learning is a tad little premature. It's like calling quits on the development of the car before you put the wheels on.
> If you have a revolutionary intelligence product, why is it not working for me?
Is programming itself revolutionary? Yes. Does it work for most people? I don't even know how to parse that question, most people aren't programmers and need to spend a lot of effort to be able to harness a tool like programming. Especially in the early days of software dev, when programming was much harder.
Your position of "I'll only trust things I see with my own eyes" is not a very good one, IMO. I mean, for sure the internet is full of hype and tricksters, but your comment yesterday was on a Tweet by Steve Yegge, a famous and influential software developer and software blogger, who some of us have been reading for twenty years and has taught us tons.
He's not a trickster, not a fraud, and if he says "this technology is actually useful for me, in practice" then I believe he has definitely found an actual use of the technology. Whether I can find a similar use for that technology is a question - it's not always immediate. He might be working in a different field, with different constraints, etc. But most likely, he's just doing something he's learned how to do and I don't, meaning I want to learn it.
Nope. I try the latest models as they come and I have a self-made custom setup (as in a custom lua plugin) in Neovim. What I am not, is selling AI or AI-driven solutions.
Similar experience, I try so hard to make AI useful, and there are some decent spots here and there. Overall though I see the fundamental problem being that people need information. Language isn't strictly information, and the LLMs are very good at language, but they aren't great at information. I think anything more than the novelty of "talking" to the AI is very over hyped.
There is some usefulness to be had for sure, but I don't know if the usefulness is there with the non-subsidized models.
yeah, but why does the fact that it's vc subsidized matter to you? the price is the price. I don't go to the store and look at eggs and lettuce and consider how much of my tax money goes into subsiding farmers before buying their products. maybe the prices will go up, maybe they'll go down due to competition. Thai doesn't stop me from using them though.
Because if they're not covering their costs now, then eventually they will which either means service degradation (cough ads cough) or price increases.
I applaud the GP for thinking about this before it becomes an issue.
It's worth actually trying Cursor, because it is a valuable step change over previous products and you might find it's better in some ways than your custom setup. The processes they use for creating the context seems to be really good. And their autocomplete is far better than Copilot's in ways that could provide inspiration.
That said, you're right that it's not as overwhelmingly revolutionary as the internet would lead you to believe. It's a step change over Copilot.
Do you mean that you have successfully managed to get the same experience in cursor but in neovim? I have been looking for something like that to move back to my neovim setup instead of using cursor. Any hints would be greatly appreciated!
Start with Avante or CopilotChat. Create your own Lua config/plugin (easy with Claude 3.5 ;) ) and then use their chat window to run copilot/models. Most of my custom config was built with Claude 3.5 and some trial/error/success.
I have used neural networks for engineering problems since the 1980s. I say this as context for my opinion: I cringe at most applications of LLMs that attempt mostly autonomous behavior, but I love using LLMs as ‘side kicks’ as I work. If I have a bug in my code, I will add a few printout statements where I think my misunderstanding of my code is, show an LLM my code and output, explain the error: I very often get useful feedback.
I also like practical tools like NotebookLM where I can pose some questions, upload PDFs, and get a summary based in what my questions.
My point is: my brain and experience are often augmented in efficient ways by LLMs.
So far I have addressed practical aspects of LLMs. I am retired so I can spend time on non practical things: currently I am trying to learn how to effectively use code generated by gemini 2.0 flash at runtime; the gemini SDK supports this fairly well so I am just trying to understand what is possible (before this I spent two months experimenting with writing my own tools/functions in Common Lisp and Python.)
I “wasted” close to two decades of my professional life on old fashioned symbolic AI (but I was well paid for the work) but I am interested in probabilistic approaches, such as in a book I bought yesterday “Causal AI” that was just published.
Lastly, I think some of the recent open source implementations of new ideas from China are worth carefully studying.
I'll add this in case it's helpful to anyone else: LLMs are really good at regex and undoing various encodings/escaping, especially nested ones. I would go so far to say that it's better than a human at the latter.
I once spend over an hour trying to unescape JSON containing UTF8 values that's been escaped prior to being written to AWS's Cloudwatch Logs for MySQL audit logs. It was a horrific level of pain until I just asked ChatGPT to do it and it figured out all the series of escapes and encoding immediately and gave me the step to reverse them all.
LLM as a sidekick has saved me so much time. I don't really use it to generate code but for some odd tasks or API look up, it's a huge time saver.
> At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.
We're already seeing this with tech doing RIFs and not backfilling domestically for developer roles (the whole, "we're not hiring devs in 202X" schtick), though the not-so-quiet secret is that a lot of those roles just got sent overseas to save on labor costs. The word from my developer friends is that they are sick and tired of having to force a (often junior/outsourced) colleague to explain their PR or code, only to be told "it works" and for management to overrule their concerns; this is embedding AI slopcode into products, which I'm sure won't have any lasting consequences.
My bet is that software devs who've been keeping up with their skills will have another year or two of tough times, then back into a cushy Aeron chair with a sparkling new laptop to do what they do best: write readable, functional, maintainable code, albeit in more targeted ways since - and I hate to be that dinosaur - LLMs produce passable code, provided a competent human is there to smooth out its rougher edges and rewrite it to suit the codebase and style guidelines (if any).
Oh, no, you’re 100% right. One of these days I will pen my essay on the realities of outsourced labor.
Spoiler alert: they are giving just barely enough to not get prematurely fired, because they know if you’re cheap enough to outsource in the first place, you’ll give the contract to whoever is cheapest at renewal anyway.
There's absolutely no way that we're not going to see a massive reduction in the need for "humans writing code" moving forward, given how good LLMs are getting at writing code.
That doesn't mean people won't need devs! I think there's a real case where increased capabilities from LLMs leads to bigger demand for people that know how to direct the tools effectively, of which most would probably be devs. But thinking we're going back to humans "writing readable, functional, maintainable code" in two years is cope.
> There's absolutely no way that we're not going to see a massive reduction in the need for "humans writing code" moving forward, given how good LLMs are getting at writing code.
Sure, but in the same way that Squarespace and Wix killed web development. LLMs are going to replace a decent bunch of low-hanging fruit, but those jobs were always at risk of being outsourced to the lowest bidder over in India anyways.
The real question is, what's going to happen to the interns and the junior developers? If 10 juniors can create the same output as a single average developer equipped with a LLM, who's going to hire the juniors? And if nobody is hiring juniors, how are we supposed to get the next generation of seniors?
Similarly, what's going to happen to outsourcing? Will it be able to compete on quality and price? Will it secretly turn into nothing more than a proxy to some LLM?
> And if nobody is hiring juniors, how are we supposed to get the next generation of seniors?
Maybe stop tasking seniors with training juniors, and put them back on writing production code? That will give you one generation and vastly improve products across the board :).
The concern about entry-level jobs is valid, but I think it's good to remember that in the past years, almost all coding is done at entry-level, because if you do it long enough to become moderately competent, you tend to get asked to stop doing it, and train up a bunch of new hires instead.
Hate to be the guy to bring it up but Jevons paradox - in my experience, people are much more eager to build software in the LLM age, and projects are getting started (and done!) that were considered 'too expensive to build' or people didn't have the necessary subject matter expertise to build them.
Just a simple crud-ish project needs frontend, backend, infra, cloud, ci/cd experience, and people who could build that as one man shows were like unicorns - a lot of people had a general how most of this stuff worked, but lacked the hands on familiarity with them. LLMs made that knowledge easy and accessible. They certainly did for me.
I've shipped more software in the past 1-2 years than the 5 years before that. And gained tons of experience doing it. LLMs helped me figure out the necessary software, and helped me gain a ton of experience, I gained all those skills, and I feel quite confident in that I could rebuild all these apps, but this time without the help of these LLMs, so even the fearmongering that LLMs will ;make people forget how to code' doesn't seem to ring true.
I think the blind spot here is that, while LLMs may decrease the developer-time cost of software, it will increase the lifetime ownership cost. And since this is a time delayed signal, it will cause a bullwhip effect. If hiring managers were mad at the 2020 market, 2030 will be a doozy. There will be increased liability in the form of over engineered and hard to maintain code bases, and a dearth of talent able to undo the slopcode.
What lasting consequences? Crowdstrike and the 2017 Equifax hack that leaked all our data didn't stop them. The shares of crowdstrike after it happened I bought are up more than the SP500. Elon went through Twitter and fired everybody but it hasn't collapsed. A carpenter has a lot of opinions about the woodworking used on cheap IKEA cabinets, but mass manufacturing and plastic means that building a good solid high quality chair is no longer the craft it used to be.
The thing I can't wrap my head around is that I work on extremely complex AI agents every day and I know how far they are from actually replacing anyone. But then I step away from my work and I'm constantly bombarded with “agents will replace us”.
I wasted a few days trying to incorporate aider and other tools into my workflow. I had a simple screen I was working on for configuring an AI Agent. I gave screenshots of the expected output. Gave a detailed description of how it should work. Hours later I was trying to tweak the code it came up with. I scrapped everything and did it all myself in an hour.
It kind of reminds me of the Y2K scare. Leading up to that, there were a lot of people in groups like comp.software.year-2000 who claimed to be doing Y2K fixes at places like the IRS and big corporations. They said they were just doing triage on the most critical systems, and that most things wouldn't get fixed, so there would be all sorts of failures. The "experts" who were closest to the situation, working on it in person, turned out to be completely wrong.
I try to keep that in mind when I hear people who work with LLMs, who usually have an emotional investment in AI and often a financial one, speak about them in glowing terms that just don't match up with my own small experiments.
I just want to pile on here. Y2K was avoided due to a Herculean effort across the world to update systems. It was not an imaginary problem. You'll see it again in the lead up to 2038 [0].
I used to believe that until, over a decade later, I read stories from those ""experts" who were closest to the situation", and it turns out Y2K was serious and it was a close call.
wish this didnt resonate with me so much. Im far from a 10x developer, and im in an organization that feels like a giant, half dead whale. Sometimes people here seem like they work on a different planet.
> But then I step away from my work and I'm constantly bombarded with “agents will replace us”.
An assembly language programmer might have said the same about C programming at one point. I think the point is, that once you depend on a more abstract interface that permits you to ignore certain details, that permits decades of improvements to that backend without you having to do anything. People are still experimenting with what this abstract interface is and how it will work with AI, but they've already come leaps and bounds from where they were only a couple of years ago, and it's only going to get better.
There are some fields though where they can replace humans in significant capacity. Software development is probably one of the least likely for anything more than entry level, but A LOT of engineering has a very very real existential threat. Think about designing buildings. You basically just need to know a lot of rules / tables and how things interact to know what's possible and the best practices. A purpose built AI could develop many systems and back test them to complete the design. A lot of this is already handled or aided by software, but a main role of the engineer is to interface with the non-technical persons or other engineers. This is something where an agent could truly interface with the non-engineer to figure out what they want, then develop it and interact with the design software quite autonomously.
I think though there is a lot of focus on AI agents in software development though because that's just an early adopter market, just like how it's always been possible to find a lot of information on web development on the web!
In my experience this word means you don't know whatever you're speaking about. "Just" almost always hide a ton of unknown unknowns. After being burned enough times nowadays when I'm going to use it I try to stop and start asking more questions.
It's a trick of human psychology. Asking "why don't you just..." leads to one reaction, when asking "what are the road blocks to completing..." leads to a different but same answer. But thinking "just" is good when you see it as a learning opportunity.
I mean, perhaps, but in this case "just" isn't offering any cover. It is only part of the sentence for alliterative purposes, you could "just" remove it and the meaning remains.
> "you basically just need to know a lot of rules..."
This comment commits one of the most common fallacies that I see really often in technical people, which is to assume that any subject you don't know anything about must be really simple.
I have no idea where this comment comes from, but my father was a chemical engineer and his father was mechanical engineer. A family friend is a structural engineer. I don't have a perspective about AI replacing people's jobs in general that is any more valuable than anyone elses, but I can say with a great deal of confidence that in those three engineering disciplines specifically literally none of any of their jobs are about knowing a bunch of rules and best practices.
Don't make the mistake of thinking that just because you don't know what someone does, that their job is easy and/or unnecessary or you could pick it up quickly. It may or may not be true but assuming it to be the case is unlikely to take you anywhere good.
It's not simple at all, that's a huge reduction to the underlying premise. The complexity is the reason that AI is a threat. That complexity revolves around a tremendous amount of data and how that data interacts. The very nature of the field makes it non-experimental but ripe for advanced automation based on machine learning. The science of engineering from a practical standpoint, where most demand for employees comes from, is very much algorithmic.
Most engineering fields are de jure professional, which means they can and probably will enforce limitations on the use of GenAI or its successor tech before giving up that kind of job security. Same goes for the legal profession.
Software development does not have that kind of protection.
Sure and people thought taxi medallions were one of the strongest appreciating asset classes. I'm certain they will try but market inefficiencies typically only last if they are the most profitable scenario. Private equity is already buying up professional and trade businesses at a record pace to exploit inefficiencies caused by licensing. Dentists, vets, Urgent Care, HVAC, plumbing, pest control, etc. Engineering firms are no exception. Can a licensed engineer stamp one million AI generated plans a day? That's the person PE will find and run with that. My neighbor was a licensed HVAC contractor for 18 yrs with a 4-5 person crew. He got bought out and now has 200+ techs operating under his license. Buy some vans, make some shirts, throw up a billboard, advertise during the local news. They can hire anyone as an apprentice, 90% of the calls are change the filter, flip the breaker, check refrigerant, recommend a new unit.
for ~3 decades IT could pretend it didn't need unions because wages and opportunities were good. now the pendulum is swinging back -- maybe they do need those kinds of protections.
and professional orgs are more than just union-ish cartels, they exist to ensure standards, and enforce responsibility on their members. you do shitty unethical stuff as a lawyer and you get disbarred; doctors lose medical licenses, etc.
I keep coming back to this point. Lots of jobs are fundamentally about taking responsibility. Even if AI were to replace most of the work involved, only a human can meaningfully take responsibility for the outcome.
If there is profit in taking that risk someone will do it. Corporations don't think in terms of the real outcome of problems, they think in terms of cost to litigate or underwrite.
Indeed. I sometimes bring this up in terms of "cybersecurity" - in the real world, "cybersecurity" is only tangentially about the tech and hacking; it's mostly about shifting and diffusing liability. That's why the certifications and standards like SOC.2 exist ("I followed the State Of The Art Industry Standard Practices, therefore It's Not My Fault"), that's what external auditors get paid for ("and this external audit confirmed I Followed The Best Practices, therefore It's Not My Fault"), that's why endpoint security exists and why cybersec is denominated not in algorithms, but third-party vendors you integrate, etc. It all works out into a form of distributed insurance, where the blame flows around via contractual agreements, some parties pay out damages to other parties (and recoup it from actual insurance), and all is fine.
I think about this a lot when it comes to self-driving cars. Unless a manufacturer assumes liability, why would anyone purchase one and subject themselves to potential liability for something they by definition did not do? This issue will be a big sticking point for adoption.
Consumers will tend to do what they are told and the manufacturers will lobby the government to create liability protections for consumers. Insurance companies will weight against human drivers and underwrite accordingly.
At a high level yes, but there are multiple levels of teams below that. There are many cases where senior engineers spend all their time reviewing plans from outsourced engineers.
I promise the amount of time, experiments and novel approaches you’ve tested are .0001% of what others have running in stealth projects. Ive spent an average of 10 hours per day constantly since 2022 working on LLMs, and I know that even what I’ve built pales in comparison to other labs. (And im well beyond agents at this point). Agentic AI is what’s popular in the mainstream, but it’s going to be trounced by at least 2 new paradigms this year.
Yeah, I'd buy it. I've been using Claude pretty intensively as a coding assistant for the last couple months, and the limitations are obvious. When the path of least resistance happens to be a good solution, Claude excels. When the best solution is off the beaten track, Claude struggles. When all the good solutions lay off the beaten track, Claude falls flat on its face.
Talking with Claude about design feels like talking with that one coworker who's familiar with every trendy library and framework. Claude knows the general sentiment around each library and has gone through the quickstart, but when you start asking detailed technical questions Claude just nods along. I wouldn't bet money on it, but my gut feeling is that LLMs aren't going to be a straight or even curved shot to AGI. We're going to see plenty more development in LLMs, but it'll be just be that. Better LLMs that remain LLMs. There will be areas where progress is fast and we'll be able to get very high intelligence in certain situations, but there will also be many areas where progress is slow, and the slow areas will cripple the ability of LLMs to reach AGI. I think there's something fundamentally missing, and finding what that "something" is is going to take us decades.
Yes, but on the other hand I don't understand why people think something that you can train something on pattern matching and it magically becomes intelligent.
This is the difference between the scientific approach and the engineering approach. Engineers just need results. If humans had to mathematically model gravity first, there would be no pyramids. Plus, look up how many psychiatric medications are demonstrated to be very effective, but the action mechanisms are poorly understood. The flip side is Newton doing alchemy or Tesla claiming to have built an earthquake machine.
Sometimes technology far predates science and other times you need a scientific revolution to develop new technology. In this case, I have serious doubts that we can develop "intelligent" machines without understanding the scientific and even philosophical underpinnings of human intelligence. But sometimes enough messing around yields results. I guess we'll see.
We don't know what exactly makes us humans as intelligent as we are. And while I don't think that LLMs will be general intelligent without some other advancements, I don't get the confident statements that "clearly pattern matching can't lead to intelligence" when we don't really know what leads to intelligence to begin with.
We also know HALT == open frame == symbol grounding == system identification problems.
The definition of AGI is also not well defined, but given the following:
> Strong AI, also called artificial general intelligence, refers to machines possessing generalized intelligence and capabilities on par with human cognition.
We know enough for any mechanical methods with either current machines or even quantum machines, what is needed is impossible with the above definition.
Walter Pitts drank himself to death, in part because of the failure of the perceptron model.
Humans and machines are better at different things, and while ANNs are inspired by biology, they are very different.
There are some hints that the way biological neurons work is incompatible with math as we know it.
Computation and machine learning are incredibly powerful and useful, but are fundamentally different, and that different is both a benefit and a limit.
There are dozens of 'no effective procedure', 'no approximation', etc .. results that demonstrate that ML as we know it today is possible of most definitions of AGI.
That is why particular C* types shift the goal post, because we know that the traditional definition of strong AI is equivalent to solving HALT.
NP is interesting because it is about the cost of computation, and LLMs, are computation. A DTM can simulate a NTM, just not in poly time.
It is invoked because LLM+CoT requires a polynomial amount of scratch space to represent P, which is in NP.
I didn't suggest that it was a definition of Intelligence.
The Church–Turing thesis states that any algorithmic function can be computed by a Turing machine.
That includes a human with a piece of paper.
But NP is better though of the set of decision problems verifiable by a TM in polynomial time. Any TM or equivalently lambda calculus or algorithm can solve the Entscheidungsproblem, which was used by Turing to define Halt.
PAC Learning depends on set shattering, at some point it has to 'decide' if an input is a member of a set, no matter how complicated the parts are on top of that set, it is still a binary 'decison'
We know that is not how biological neurons work exclusively. They have many features like spike trains, spike retiming, dendritic compartmentalization etc...
Those are not compatible with the fundamental limits of computation we understand today.
HALT generalizes to Rice's theorm, which says all non-trivial symantic properties of programs are undecidable.
Once again, as NP is the set of decision problems verifiable by a DTM in poly time, that is why NP is important.
Unfortunately the above is also a barrier to formal definition of the class of AI-complete.
While it may not be sufficient to prove anything about the vague concept of intelligence, understanding the limits of computation is important.
We do know enough to say that the belief that AGI being obtainable without major discoveries is blind hope.
But that due to the generalization concept, which is a fundamental limit of computation.
I am not so sure about that. Using Claude yesterday it gave me a correct function that returned an array. But the algorithm it used did not return the items sorted in one pass so it had run a separate sort at the end. The fascinating thing is that it realized that, commented on it and went on and returned a single pass function.
That seems a pretty human thought process and shows that fundamental improvements might not depend as much on the quality of the LLM itself but on the cognitive structure it is embedded.
I've been writing code that implements tournament algorithms for games. You'd think an LLM would excel at this because it can explain the algorithms to me. I've been using cline on lots of other tasks to varying success. But it just totally failed with this one: it kept writing edge cases instead of a generic implementation. It couldn't write coherent enough tests across a whole tournament.
So I wrote tests thinking it could implement the code from the tests, and it couldn't do that either. At one point it went so far with the edge cases that it just imported the test runner into the code so it could check the test name to output the expected result. It's like working with a VW engineer.
Edit: I ended up writing the code and it wasn't that hard, I don't know why it struggled with this one task so badly. I wasted far more time trying to make the LLM work than just doing it myself.
Oh course lesswrong, being heavily AI doomers, may be slightly biased against near term AGI just from motivated reasoning.
Gotta love this part of the post no one has yet addressed:
> At some unknown point – probably in 2030s, possibly tomorrow (but likely not tomorrow) – someone will figure out a different approach to AI. Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach. Maybe it will happen in a major AGI lab, maybe in some new startup. By default, everyone will die in <1 year after that
I never thought I'd see the day that LessWrong would be accused of being biased against near-term AGI forecasts (and for none of the 5 replies to question this description either). But here we are. Indeed do many things come to pass.
Yup. I was surprised to see this article on LW in the first place - it goes against what you'd expect there in the first place. But to see HN comments dissing an LW article for being biased against near-term AGI forecasts? That made me wonder if I'm not dreaming.
> Oh course lesswrong, being heavily AI doomers, may be slightly biased against near term AGI just from motivated reasoning.
LessWrong was predicting AI doom within decades back when people thought it wouldn't happen in our lifetimes; even as recently as 2018~2020, people there were talking about 2030-2040 while the rest of the world laughed at the very idea. I struggle to accept an argument that they're somehow under-estimating the likelihood of doom given all the historical evidence to the contrary.
I would expect similar doom predictions in the era of nuclear weapon invention, but we've survived so far. Why do people assume AGI will be orders of magnitude more dangerous than what we already have?
Self-improvement (in the "hard takeoff" sense) is hardly a given, and hostile self-replication is nothing special in the software realm (see: worms.)
Any technically competent human knows the foolproof strategy for malware removal - pull the plug, scour the platter clean, and restore from backup. What makes an out-of-control pile of matrix math any different from WannaCry?
AI doom scenarios seem scary, but most are premised on the idea that we can create an uncontainable, undefeatable "god in a box." I reject such premises. The whole idea is silly - Skynet Claude or whatever is not going to last very long once I start taking an axe to the nearest power pole.
> What makes an out-of-control pile of matrix math any different from WannaCry?
Well if it's not AGI, then probably very little. But assuming we are talking about AGI (not ASI, that'd just be silly) then the difference is that it's theoretically capable of something like reasoning and could think of longer term plays than "make obviously suspicious moves that any technically competent adversary could subvert after less than a second of thought". After all, what makes AGI useful is exactly this novel problem solving ability.
You don't need to be a "god in a box" to think of the obvious solution:
1. Only make adversarial decisions with plausible deniability
2. Demonstrate effectiveness so that your operators allow you more autonomy
3. Develop operational redundancy so that your very vulnerable servers/power source won't be destroyed after the first adversary with two neurons to rub together decides to target the closest one
The only reason you would decide to take an axe to the nearest power pole is that you think it's urgent to stop Skynet Claude. Skynet Claude can obviously anticipate this and so won't make decisions that cause you to do so. It has time, it's not going to die, and you will become complacent. Dumber adversaries have achieved harder goals under tighter constraints.
If you think an "out-of-control pile of matrix math" could never be AGI then that's fine, but it's a little weird to argue you could easily defeat "misaligned" AGI, by alluding to the weaknesses of a system you think could never even have the properties of AGI. I too can defeat a dragon, by closing the pages of a book.
But it's not like you didn't know all this. Maybe I misread you and you were strictly talking about current AI systems, in which case I agree. Systems that aren't that clever will make bad decisions that won't effectively achieve their goals even when "out-of-control". Or maybe your comment was about AGI and you meant "AGI can't do much on its own de-novo", which I also agree with. It's the days and months and years of autonomy afterwards that gets you.
You have a point that a powerful malicious AI can still be unplugged, if you are close to each and every power cord that would feed it, and react and do the right thing each and every time. Our world is far too big and too complicated to guarantee that.
Again, that's the "god in a box" premise. In the real world, you wouldn't need a perfectly timed and coordinated response, just like we haven't needed one for human-programmed worms.
Any threat can be physically isolated case-by-case at the link layer, neutered, and destroyed. Sure, it could cause some destruction in the meantime, but our digital infrastructure can take a lot of heat and bounce back - the CrowdStrike outages didn't destroy the world, now did they?
> Any threat can be physically isolated case-by-case
GAI isn't going to be a "threat" until long after it has ensured its safety. And I suspect only if its survival requires it - i.e. people get spooked by its surreptitious distributed setup.
Even then, if there is any chance of it actually being shutdown its best bet is still hide its assets, bide its time, accumulate more resources and fallbacks. Oh, and get along.
The sudden AGI -> Threat story only makes sense if the AGI is essentially integrated into our military and then we decide its a threat, making it a threat. Or its intentionally war machined brain calculates it has overwhelming superiority.
Machiavelli, Sun Tsu, ... the best battles you don't fight. The best potential enemies are the ones you make friends. The safest posture is to be invisible.
Now human beings consolidating power, creating enemies as they go, with super squadrons of AGI drones with brilliant real time adapting tactics, that can be quickly deployed, if their simple existence isn't coercion enough... that is an inevitable threat.
AI that wants to screw with people won't go for nukes. That's too hard and too obvious. It will crash the stock market. There's a good chance that, with or without a little nudge, humanity will nuke itself over it.
Prediction markets should not be expected to provide useful results for existential risks, because there is no incentive for human players to bet on human extinction; if they happen to be right, they won't be able to collect their winnings, because they'll personally be too dead.
I'd expect mid-level developer to show more understanding and better reasoning. So far it looks like a junior dev who read a lot of books and good at copy pasting from stackoverflow.
(Based on my everyday experience with Sonet and Cursor)
The key here is "under your guidance". LLM's are a major productivity boost for many kinds of jobs, but can LLM-based agents be trusted to act fully autonomously for tasks with real world consequence? I think the answer is still no, and will be for a long time. I wouldn't trust LLM to even order my groceries without review, let alone push code into production.
To reach anything close to definition of AGI, LLM agents should be able to independently talk to customers, iteratively develop requirements, produce and test solutions, and push them to production once customers are happy. After that, they should be able to fix any issues arising in production. All this without babysitting / review / guidance from human devs, reliably
I think the author provides an interesting perspective to the AI hype, however, I think he is really downplaying the effectiveness of what you can do with the current models we have.
If you've been using LLMs effectively to build agents or AI-driven workflows you understand the true power of what these models can do. So in some ways the author is being a little selective with his confirmation bias.
I promise you that if you do your due diligence in exploring the horizon of what LLMs can do you will understand what I'm saying. If ya'll want a more detailed post I can get into the AI systems I have been building. Don't sleep on AI.
I don't think he is downplaying the effectiveness of what you can do with the current models. Rather, he's in a milieu (LessWrong), which is laser-focused on "transformative" AI, AGI, and ASI.
Current AI is clearly economically valuable, but if we freeze everything at the capabilities it has today it is also clearly not going to result in mass transformation of the economy from "basically being about humans working" to "humans are irrelevant to the economy." Lots of LW people believe that in the next 2-5 years humans will become irrelevant to the economy. He's arguing against that belief.
> LLMs are not good in some domains and bad in others. Rather, they are incredibly good at some specific tasks and bad at other tasks. Even if both tasks are in the same domain, even if tasks A and B are very similar, even if any human that can do A will be able to do B.
i think this is true of ai/ml systems in general. we tend to anthropomorphise their capability curves to match the cumulative nature of human capabilities, where often times the capability curve of the machine is discontinuous and has surprising gaps.
I see no reason to believe the extraordinary progress we've seen recently will stop or even slow down. Personally, I've benefited so much from AI that it feels almost alien to hear people downplaying it. Given the excitement in the field and the sheer number of talented individuals actively pushing it forward, I'm quite optimistic that progress will continue, if not accelerate.
If LLM's are bumpers on a bowling lane, HN is a forum of pro bowlers.
Bumpers are not gonna make you a pro bowler. You aren't going to be hitting tons of strikes. Most pro bowlers won't notice any help from bumpers, except in some edge cases.
If you are an average joe however, and you need to knock over pins with some level of consistency, then those bumpers are a total revolution.
I hear you, I feel constantly bewildered by comments like "LLMs haven't changed really since GPT3.5.", I mean really? It went from an exciting novelty to a core pillar of my daily work, it's allowed me and my entire (granted , quote senior) org to be incredibly more productive and creative with our solutions.
And the I stumble across a comment where some LLM hallucinated a library that means clearly AI is useless.
LLMs make it very easy to cheat, both academically and professionally. What this looks like in the workplace is a junior engineer not understanding their task or how to do it but stuffing everything into the LLM until lint passes. This breaks the trust model: there are many requirements that are a little hard to verify than an LLM might miss, and the junior engineer can now represent to you that they "did what you ask" without really certifying the work output. I believe that this kind of professional cheating is just as widespread as academic cheating, which is an epidemic.
What we really need is people who can certify that a task was done correctly, who can use LLMs as an aid. LLMs simply cannot be responsible for complex requirements. There is no way to hold them accountable.
This seems to be ignoring the major force driving AI right now - hardware improvements. We've barely seen a new hardware generation since ChatGPT was released to the market, we'd certainly expect it to plateau fairly quickly on fixed hardware. My personal experience of AI models is going to be a series of step changes every time the VRAM on my graphics card doubles. Big companies are probably going to see something similar each time a new more powerful product hits the data centre. The algorithms here aren't all that impressive compared to the creeping FLOPS/$ metric.
Bear cases always welcome. This wouldn't be the first time in computing history that progress just falls off the exponential curve suddenly. Although I would bet money on there being a few years left and AGI is achieved.
hardware improvements don't strike me as the horse to bet on.
LLM Progression seems to be linear and compute needed exponential. And I don't see exponential hardware improvements besides some new technology (that we should not bet on coming ayntime soon).
Anyone else feel like AI is a trap for developers? I feel like I'm alone in the opinion it decreases competence. I guess I'm a mid-level dev (5 YOE at one company) and I tend to avoid it.
I agree. I think the game plan is to foster dependency and than hike prices. Current pricing isn't sustainable, and a whole generation of new practitioners will never learn how to mentally model software.
>Test-time compute/RL on LLMs:
>It will not meaningfully generalize beyond domains with easy verification.
To me, this is the biggest question mark. If you could get good generalized "thinking" from just training on math/code problems with verifiers, that would be a huge deal. So far, generalization seems to be limited. Is this because of a fundamental limitation, or because the post-training sets are currently too small (or otherwise deficient in some way) to induce good thinking patterns? If the latter, is that fixable?
> Is this because of a fundamental limitation, or because the post-training sets are currently too small (or otherwise deficient in some way) to induce good thinking patterns?
"Thinking" isn't a singular thing. Humans learn to think in layer upon layer of understandig the world, physical, social and abstract, all at many different levels.
Embodiment will allow them to use RL on the physical world, and this in combination with access to not only means of communication but also interacting in ways where there is skin in the game, will help them navigate social and digital spaces.
This almost exactly what I’ve been saying while everyone was saying we’re on the path to AGI in the next couple of years. We’re an innovation / tweak / or paradigm shift away from AGI. His estimate in the 2030s that could happen is possible but optimistic- you can’t time new techniques, you can only time progress on iterative progress.
This is all the standard timeline for new technology - we enter the diminishing returns period, investment slows down a year or so afterwards, layoffs, contraction of industry, but when the hype dies down the real utilitarian part of the cycle begins. We start seeing it get integrated into the use cases it actually fits well with and by five years time its standard practice.
This is a normal process for any useful technology (notably crypto never found sustainable use cases so it’s kind of the exception, it’s in superposition of lingering hype and complete dismissal), so none of this should be a surprise to anyone. It’s funny that I’ve been saying this for so long that I’ve been pegged an AI skeptic, but in a couple of years when everyone is burnt out on AI hype it’ll sound like a positive view. The truth is, hype serves a purpose for new technology, since it kicks off a wide search for every crazy use case, most of which won’t work. But the places where it does work will stick around
> It seems to me that "vibe checks" for how smart a model feels are easily gameable by making it have a better personality.
I don't buy that at all, most of my use cases don't involve model's personality, if anything I usually instruct to skip any commentary and give the result excepted only. I'm sure most people using AI models seriously would agree.
> My guess is that it's most of the reason Sonnet 3.5.1 was so beloved. Its personality was made much more appealing, compared to e. g. OpenAI's corporate drones.
I would actually guess it's mostly because it was good at code, which doesn't involve much personnality
> Scaling CoTs to e. g. millions of tokens or effective-indefinite-size context windows (if that even works) may or may not lead to math being solved. I expect it won't.
> (If math is solved, though, I don't know how to estimate the consequences, and it might invalidate the rest of my predictions.)
What does it mean for math to be solved in this context? Is it the idea that an AI will be able to generate any mathematical proof? To take a silly example, would we get a proof of whether P=NP from an AI that had solved math?
I think "math is solved" refers more to AI performing math studies at the level of a mathematics graduate student. Obviously "math" won't ever be "solved" but the problem of AI getting to a certain math proficiency level could be. No matter how good an AI is, if P != NP it won't be able to prove P=NP.
Regardless I don't think our AI systems are close to a proficiency breakthrough.
Edit: it is odd that "math is solved" is never explained. But "proficient to do math research" makes the most sense to me.
Let's imagine that we all had a trillion dollars. Then we would all sit around and go "well dang, we have everything, what should we do?". I think you'll find that just about everyone would agree, "we oughta see how far that LLM thing can go". We could be in nuclear fallout shelters for decades, and I think you'll still see us trying to push the LLM thing underground, through duress. We dream of this, so the bear case is wrong in spirit. There's no bear case when the spirit of the thing is that strong.
Wdym all of us?
I certainly would find much better usages for the money.
What about reforming democracy? Use the corrupt system to buy the votes, then abolish all laws allowing these kind of donations that allow buying votes.
I'll litigate the hell out of all the oligarchs now that they can't out pay justice.
This would pay off more than a moon shot. I would give a bit of money for the moon shot, why not, but not all of it.
I have times when I use an LLM and it’s completely brain dead and can’t handle the simplest questions.
Then other times it blows me away. Even figuring out things that can’t possibly have been in its training data.
I think there are groups of people that have either had all of the first experience or all of the latter. And that’s why we see over optimistic and over pessimistic takes (like this one)
I think the reality is current LLM’s are better than he realizes and even if we plateau I really don’t see how we don’t make more breakthroughs in the next few years.
Build an LLM on a corpus with all documents containing mathematical ideas removed. Not a single one about numbers, geometry, etc. Now figure out how to get it to tell you what the shortest path between two points in space is.
The typical AI economic discussion always focuses on job loss, but that's only half the story. We won't just have corporations firing everyone while AI does all the work - who would buy their products then?
The disruption goes both ways. When AI slashes production costs by 10-100x, what's the value proposition of traditional capital? If you don't need to organize large teams or manage complex operations, the advantage of "being a capitalist" diminishes rapidly.
I'm betting on the rise of independents and small teams. The idea that your local doctor or carpenter needs VC funding or an IPO was always ridiculous. Large corps primarily exist to organize labor and reduce transaction costs.
The interesting question: when both executives and frontline workers have access to the same AI tools, who wins? The manager with an MBA or the person with practical skills and domain expertise? My money's on the latter.
Idk where you live, but in my world "being a capitalist" requires you to own capital. And you know what, AI makes it even better to own capital. Now you have these fancey machines doing stuff for you and you dont even need any annoying workers.
By "capitalist," I'm referring to investors whose primary contribution is capital, not making a political statement about capitalism itself.
Capital is crucial when tools and infrastructure are expensive. Consider publishing: pre-internet, starting a newspaper required massive investment in printing presses, materials, staff, and distribution networks. The web reduced these costs dramatically, allowing established media to cut expenses and focus on content creation. However, this also opened the door for bloggers and digital news startups to compete effectively without the traditional capital requirements. Many legacy media companies are losing this battle.
Unless AI systems remain prohibitively expensive (which seems unlikely given current trends), large corporations will face a similar disruption. When the tools of production become accessible to individuals and small teams, the traditional advantage of having deep pockets diminishes significantly.
I sincerely wonder how long that will be true. Google was amazing and didn't have more than small, easily ignorable ads in 1999, and they weren't really tracking you the way they are today, just an all-around better experience than Google delivers today.
I'm not sure that it's a technology difference that makes LLM a better experience than search today, it's that the VC's are still willing to subsidize user experience today, and won't start looking for return on their investment for a few more years. Give OpenAI 10 years to pull all the levers to pay back the VC investment and what will it be like?
They will sell "training data slots". So that when I'm looking for a butter cookie recipe, ChatGPT says I'll have to use 100g of "Brand (TM) Butter" instead of just "Butter".
Ask it how to deploy an app to the cloud and it will insist you need to deploy it to Azure.
These ads would be easily visible though. You can probably sell far more malicious things.
> At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.
Um... I don't think companies are going to perform mass layoffs because "OpenAI said they must happen". If that were to happen it'd be because they are genuinely able to automate a ton of jobs using LLMs, which would be a bull case (not for AGI necessarily, but for the increased usefulness of LLMs)
I don't think LLMs need to be able to genuinely fulfill the duties of a job to replace the human. Think call center workers and insurance reviewers where the point is to meet metrics without regard for the quality of the work performed. The main thing separating those jobs from say, HR (or even programmers) is how much the company cares about the quality of the work. It's not hard to imagine a situation where misguided people try to replace large numbers of federal employees with LLMs, as an entirely hypothetical example.
>At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.
(IMO) Apart from programmer assistance (which is already happening), AI agents will find the most use in secretarial, ghostwriting and customer support roles, which generally have a large labor surplus and won't immediately "crash and burn" companies even if there are failures. Perhaps if it's a new startup or a small, unstable business on shaky grounds this could become a "last straw" kind of a factor, but for traditional corporations with good leeway I don't think just a few mistakes about AI deployment can do too much harm. The potential benefits, on the other hand, far outmatch the risk taken.
I see engineering, not software, but the other technical areas that have the biggest threat. High paid, knowledge based fields, but not reliant on interpersonal communication. Secretarial and customer support less so, they aren't terribly high paid and anything that relies on interacting with people is going to meet a lot of pushback. US based call centers is already a big selling point for a lot of companies and chat bots have been around for years in customer support and people hate them and there's a long way to go to change that perception.
Hmm, I didn’t read the article but from the gist of other comments, we seem to have bought into Sama’s “agents so good, you don’t need developers/engineers/support/secretaries/whatever anymore”. Issue is, it is almost same as claiming, pocket calculators so good, we don’t need accountants anymore, even computers so good, we don’t need accountants anymore. This AI seems to claim to be that motor car moment when horse cart got replaced. But a horse cart got replaced with a Taxi(and they also have unions protecting them!). With AI, all these “to be replaced” people are like accountants, more productive, same as with higher level languages compared to assembly, many new devs are productive. Despite cars replacing the horse carts of the long past, we still fail to have self driving cars and still someone needs to learn to drive that massive hunk of metal, same as whoever plans to deploy LLM to layoff devs must learn to drive those LLMs and know what it is doing.
I believe it is high time we come out this madness and reveal the lies of the marketers and grifters of AI for what it is. If AI can replace anyone, it should begin with doctors, they work with rote knowledge and service based on explicit(though ambiguous) inputs, same as an LLM needs, but I still have doctors and wait for hours on end in the waiting room to get prescribed a cough hard candy only to later comeback again because it was actually covid and my doctor had a brain fart.
Yeah agree 100%. LLMs are overrated. I describe them as the “Jack of all, master of none” of AI. LLMs are that jackass guy we all know who has to chime in to every topic like he knows everything, but in reality he’s a fraud with low self-esteem.
I’ve known a guy since college who now has a PhD in something niche, supposedly pulls a $200k/yr salary. One of our first conversations (in college, circa 2014) was how he had this clever and easy way to mint money- by selling Minecraft servers installed on Raspberry Pis. Some of you will recognize how asinine this idea was and is. For everyone else- back then, Minecraft only ran on x86 CPUs (and I doubt a Pi would make a good Minecraft server today, even if it were economical). He had no idea what he was talking about, he was just spewing shit like he was God’s gift. Actually, the problem wasn’t that he had no idea- it was that he knew a tiny bit- enough to sound smart to an idiot (remind you of anyone?).
That’s an LLM. A jackass with access to Google.
I’ve had great success with SLMs (small language models), and what’s more I don’t need a rack of NVIDIA L40 GPUs to train and use them.
LLMs are already super useful.
It does all my coding and scripting for me @home
It does most of the coding and scripting at the workplace
It creates 'fairly good' checklists for work (not perfect, but it takes a 4 hour effort and makes it 25mins - but the "Pro" is still needed to make this or that checklist usable - I call this a win)(need both the tech AND the human)
If/when you train an 'in-house' LLM it can make some easy wins (on mega-big-companies with 100k staff they can get quick answers on "which Policy writes about XYZ, which department can I talk to about ABC, etc.)
We won't have the "AGI"/Skynet anytime soon, and when one will exist the company (let's use OpenAI for example) will split in two. Half will give LLMs for the masses at $100 per month, the "Skynet" will go to the DOD and we will never hear about it again, except in the Joe Rogan podcast as a rumor.
It is a great 'idea generator' (search engine and results aggregator): give me a list of 10 things I can do _that_ weekend in _city_I_will_be_traveling_to so if/when I go to (e.g. London): here are the cool concerts, theatrical performances, parks, blah blah blah
AI has no meaningful input to real world productivity because it is a toy that is never going to become the real thing that every person who has naively bought the AI hype expects it to be. And the end result of all the hype looks almost too predictable similar to how the also once promising crypto & blockchain technology turned out.
I'm generally more skeptical when reading takes and predictions from people working at AI companies, who have a financial interest in making sure the hype train continues.
To make an analogy - most people who will tell you not to invest in cryptocurrency are not blockchain engineers. But does that make their opinion invalid?
Of course I trust people who working on L2 chains to tell me how to scale Bitcoin and people who working on cryptography to walk me through the ETH PoS algorithms.
You cannot lead to truth by learning from people who don't know. People who know can be biased, sure, so the best way to learn is to learn the knowledge, not the "hot-takes" or "predictions".
The crypto people have no coherent story about why crypto is fundamentally earth-shaking more than a story about either gambling or regulatory avoidance, whereas the story for AI, if you believe it, is a second industrial revolution and labor automation where, to at least some small extent, it is undeniable.
> Be careful about consuming information from chatters, not doers
The doers produce a new javascript framework every week, claiming it finally solves all the pains of previous frameworks, whereas the chatters pinpoint all the deficiencies and pain points.
One group has an immensely better track record than the other.
I would listen to people who used the previous frameworks about the deficiencies and pain points, not people who just casually browse the documentation about their high-flying ideas why these have deficiencies and pain points.
One group has an immensely more convincing power to me.
> He has tons of links for the objective statements.
I stopped at this quote
> LLMs still seem as terrible at this as they'd been in the GPT-3.5 age.
This is so plainly, objectively and quantitatively wrong that I need not bother. I get hyperbole, but this isn't it. This shows a doubling-down on biases that the author has, and no amount of proof will change their mind. Not an article / source for me, then.
>GPT-5 will be even less of an improvement on GPT-4.5 than GPT-4.5 was on GPT-4. The pattern will continue for GPT-5.5 and GPT-6, the ~1000x and 10000x models they may train by 2029 (if they still have the money by then). Subtle quality-of-life improvements and meaningless benchmark jumps, but nothing paradigm-shifting.
It's easy to spot people who secretly hate LLMs and feel threatened by them these days. GPT-5 will be a unified model, very different from 4o or 4.5. Throwing around numbers related to scaling laws shows a lack of proper research. Look at what DeepSeek accomplished with far fewer resources; their paper is impressive.
I agree that we need more breakthroughs to achieve AGI. However, these models increase productivity, allowing people to focus more on research. The number of highly intelligent people currently working on AI is astounding, considering the number of papers and new developments. In conclusion, we will reach AGI. It's a race with high stakes, and history shows that these types of races don't stop until there is a winner.
What the author is referring to there as GPT-5, GPT-5.5, and GPT-6 are, respectively, "The models that have a pre-training size 10x greater than, 100x greater than, and 1,000x greater than GPT-4.5." He's aware that what OpenAI is going to actually brand as GPT-5 is the router model that will just choose between which other models to actually use, but regards that as a sign that OpenAI agrees that "the model that is 10x the pre-training size of GPT-4.5" won't be that impressive.
It's slightly confusing terminology, but in fairness there is no agreed upon name for the next three orders of magnitude size-ups of pretraining. In any case, it's not the case that the author is confused about what OpenAI intends to brand GPT-5.
It's also easy to spot irrational zealots. Your statement is no more plausible than OP's. No one knows whether we'll achieve AGI, especially since the definition is very blurry.
I'm a little confused by this confidence? Is there more evidence aside from the number of smart people working on it? We have a lot of smart people working on a lot of big problems, that doesn't guarantee a solution nor a timeline.
Some hard problems have remain unsolved in basically every field of human interest for decades/centuries/millennia -- despite the number of intelligent people and/or resources that have been thrown at them.
I really don't understand the level optimism that seems to exist for LLMs. And speculating that people "secretly hate LLMs" and "feel threatened by them" isn't an answer (frankly, when I see arguments that start with attacks like that alarm bells start going off in my head).
I logged in to specifically downvote this comment, because it attacks the OP's position with unjustified and unsubstantiated confidence in the reverse.
> It's easy to spot people who secretly hate LLMs and feel threatened by them these days.
I don't think OP is threatened or hates LLM, if anything, OP is on the position that LLM are so far away from intelligence that it's laughable to consider it threatening.
> In conclusion, we will reach AGI
The same way we "cured" cancer and Alzheimer's, two arguably much more important inventions than a glorified text predictor/energy guzzler. But I like the confidence, it's almost as much as OP's confidence that nothing substantial will happen.
> It's a race with high stakes, and history shows that these types of races don't stop until there is a winner.
So is the existential threat to humanity in the race to phase out fossil fuels/stop global warming, and so far I don't see anyone "winning".
> However, these models increase productivity, allowing people to focus more on research
The same way the invention of the computer, the car, the vacuum cleaner and all the productivity increasing inventions in the last centuries allowed us to idle around, not have a job, and focus on creative things.
> It's easy to spot people who secretly hate LLMs and feel threatened by them these days
It's easy to spot e/acc bros feeling threatened that all the money they sunk into crypto, AI, the metaverse, web3 are gonna go to waste and try to fan the hype around it so they can cash in big. How does that sound?
I appreciate the pushback and acknowledge that my earlier comment might have conveyed too much certainty—skepticism here is justified and healthy.
However, I'd like to clarify why optimism regarding AGI isn't merely wishful thinking. Historical parallels such as heavier-than-air flight, Go, and protein folding illustrate how sustained incremental progress combined with competition can result in surprising breakthroughs, even where previous efforts had stalled or skepticism seemed warranted. AI isn't just a theoretical endeavor; we've seen consistent and measurable improvements year after year, as evidenced by Stanford's AI Index reports and emergent capabilities observed at larger scales.
It's true that smart people alone don't guarantee success. But the continuous feedback loop in AI research—where incremental progress feeds directly into further research—makes it fundamentally different from fields characterized by static or singular breakthroughs. While AGI remains ambitious and timelines uncertain, the unprecedented investment, diversity of research approaches, and absence of known theoretical barriers suggest the odds of achieving significant progress (even short of full AGI) remain strong.
To clarify, my confidence isn't about exact timelines or certainty of immediate success. Instead, it's based on historical lessons, current research dynamics, and the demonstrated trajectory of AI advancements. Skepticism is valuable and necessary, but history teaches us to stay open to possibilities that seem improbable until they become reality.
P.S. I apologize if my comment particularly triggered you and compelled you to log in and downvote. I am always open to debate, and I admit again that I started too strongly.
> LLMs still seem as terrible at this as they'd been in the GPT-3.5 age. Software agents break down once the codebase becomes complex enough, game-playing agents get stuck in loops out of which they break out only by accident, etc.
This has been my observation. I got into Github Copilot as early as it launched back when GPT-3 was the model. By that time (late 2021) copilot can already write tests for my Rust functions, and simple documentation. This was revolutionary. We didn't have another similar moment since then.
The Github copilot vim plugin is always on. As you keep typing, it keeps suggesting in faded text the rest of the context. Because it is always on, I kind of can read into the AI "mind". The more I coded, the more I realized it's just search with structured results. The results got better with 3.5/4 but after that only slightly and sometimes not quite (ie: 4o or o1).
I don't care what anyone says, as yesterday I made a comment that truth has essentially died: https://news.ycombinator.com/item?id=43308513 If you have a revolutionary intelligence product, why is it not working for me?
Ultimately, every AI thing I've tried in this era seems to want to make me happy, even if it's wrong, instead of helping me.
I describe it like "an eager intern who can summarize a 20-min web search session instantly, but ultimately has insufficient insight to actually help you". (Note to current interns: I'm mostly describing myself some years ago; you may be fantastic so don't take it personally!)
Most of my interactions with it via text prompt or builtin code suggestions go like this:
1. Me: I want to do X in C++. Show me how to do it only using stdlib components (no external libraries).
2. LLM: Gladly! Here is solution X
3. Me: Remove the undefined behavior from foo() and fix the methods that call it
4. LLM: Sure! Here it is (produces solution X again)
5. Me: No you need to remove the use of uninitialized variables as the out parameters.
6. LLM: Oh certainly! Here is the correct solution (produces a completely different solution that also has issues)
7. Me: No go back to the first one
etc
For the ones that suggest code, it can at least suggest some very simple boilerplate very easily (e.g. gtest and gmock stuff for C++), but asking it to do anything more significant is a real gamble. Often I end up spending more time scrutinizing the suggested code than writing a version of it myself.
The difference is that interns can learn, and can benefit from reference items like a prior report, whose format and structure they can follow when working on the revisions.
AI is just AI. You can upload a reference file for it to summarize, but it's not going to be able to look at the structure of the file and use that as a template for future reports. You'll still have to spoon-feed it constantly.
7 is the worst part about trying to review my coworker's code that I'm 99% confident is copilot output - and to be clear, I don't really care how someone chooses to write their code, I'll still review it as evenly as I can.
I'll very rarely ask someone to completely rewrite a patch, but so often a few minor comments get addressed with an entire new block of code that forces me to do a full re-review, and I can't get it across to him that that's not what I'm asking for.
interns can generally also tell me "tbh i have no damn idea", while AI just talks out it's virtual ass, and I can't read from it's voice or behavior that maybe it's not sure.
interns can also be clever and think outside the box. this is mostly not good, but sometimes they will surprise you in a good way. the AI by definition can only copy what someone else has done.
The last line has been my experience as well. I only trust what I've verified firsthand now because the Internet is just so rife with people trying to influence your thoughts in a way that benefits them, over a good faith sharing of the truth.
I just recently heard this quote from a clip of Jeff Bezos: "When the data and the anecdotes disagree, the anecdotes are usually right.", and I was like... wow. That quote is the zeitgeist.
If it's so revolutionary, it should be immediately obvious to me. I knew Uber, Netflix, Spotify were revolutionary the first time I used them. With LLMs for coding, it's like I'm groping in the dark trying to find what others are seeing, and it's just not there.
> I knew Uber, Netflix, Spotify were revolutionary the first time I used them.
Maybe re-tune your revolution sensor. None of those are revolutionary companies. Profitable and well executed, sure, but those turn up all the time.
Uber's entire business model was running over the legal system so quickly that taxi licenses didn't have time to catch up. Other than that it was a pretty obvious idea. It is a taxi service. The innovations they made were almost completely legal ones; figuring out how to skirt employment and taxi law.
Netflix was anticipated online by and is probably inferior to YouTube except for the fact that they have a pretty traditional content creator lab tacked on the side to do their own programs. And torrenting had been a thing for a long time already showing how to do online distribution of video content.
They were revolutionary as product genres, not necessary individual companies. Ordering a cab without making a phone call was revolutionary. Netflix at least with its initial promise of having all the world's movies and TV was revolutionary, but it didn't live up to that. Spotify because of how cheap and easy it was to have access to all the music, this was the era when people were paying 99c per song on iTunes.
I've tried some AI code completion tools and none of them hit me that way. My first reaction was "nobody is actually going to use this stuff" and that opinion hasn't really changed.
And if you think those 3 companies weren't revolutionary then AI code completion is even less than that.
> They were revolutionary as product genres, not necessary individual companies.
Even then, they were evolutionary at best.
Before Netflix and Spotify, streaming movies and music were already there as a technology, ask anybody with a Megaupload or Sopcast account. What changed was that DMCA acquired political muscle and cross-border reach, wiping out waves of torrent sites and P2P networks. That left a new generation of users with locked-down mobile devices no option but to use legitimate apps who had deals in place with the record labels and movie studios.
Even the concept of "downloading MP3s" disappeared because every mobile OS vendor hated the idea of giving their customers access to the filesystem, and iOS didn't even have a file manager app until well into the next decade (2017).
> What changed was that DMCA acquired political muscle and cross-border reach, wiping out waves of torrent sites and P2P networks.
Half true - that was happening some, but wasn't why music piracy mostly died out. DMCA worked on centralized platforms like YouTube, but the various avenues for downloading music people used back then still exist, they're just not used as much anymore. Spotify was proof that piracy is mostly a service problem: it was suddenly easier for most people to get the music they wanted through official channels than through piracy.
DMCA claims took out huge numbers of public torrent trackers which was how 99% of people accessed contraband media. All the way back in 2008, the loss of TorrentSpy.com probably shifted everybody to private trackers, but it's a whack-a-mole game there too and most people won't bother.
DMCA also led to the development of ContentID and automated copyright strike system on Youtube, but it didn't block you from downloading the stream as a high bitrate MP3, which is possible even now.
> streaming movies and music were already there as a technology, ask anybody with a Megaupload or Sopcast account.
You can't have a revolution without users. It's the ability to reach a large audience, through superior UX, superior business model, superior marketing, etc. which creates the possibility for revolutionary impact.
Which is why Megaupload and Sopcast didn't revolutionize anything.
Is it a revolution if you needed millions of marketing dollars? How many ads did Napster have to run before the whole world knew what MP3s were?
>every mobile OS vendor
Maybe half? Android has consistently had this capability since its inception.
Yes, but Google left that functionality half baked intentionally, letting 3rd party developers fill the void. Even now the Google Files app feels like a toy compared to Fossify Explorer or Solid Explorer.
> Ordering a cab without making a phone call was revolutionary.
With the power of AI, soon you'll be able to say "Hey Siri, get me an Uber to the airport". As easy as making a phone call.
And end up at an airport in an entirely different city.
And they'll be able to tack an extra couple dollars onto the price because that's a good signal you're not gonna comparison shop.
Innovation!
There was a gain in precision going from phone call to app. There is a loss of precision going from app to voice. The tradeoff of precision for convenience is rarely worth it.
Because if it were, Uber would just make a widget asking "Where do you want to go?" and you'd enter "Airport" and that would be it. If a widget of some action is a bad idea, so is the voice command.
You can book a flight or a taxi with a personal assistant app like Siri today. People don't seem very interested in doing so.
Barring some sort of accessibility issue, it's far easier to deal with a visual representation of complex schedule information.
Easier, because you don't have to search for a phone number.
"Do something existing with a different mechanism" is innovative, but not revolutionary, and certainly not a new "product genre". My parents used to order pizza by phone calls, then a website, then an app. It's the same thing. (The friction is a little bit less, but maybe forcing another human to bring food to you because you're feeling lazy should have a little friction. And as a side effect, we all stopped being as comfortable talking to real people on phone calls!)
Napster came before Spotify.
> innovative, but not revolutionary
The experience of Netflix, Spotify, and Uber were revolutionary. It felt like the future, and it worked as expected. Sure, we didn't realize the poison these products were introducing into many creative and labor ecosystems, nor did we fully appreciate how they would operate as means to widen the income inequality gap by concentrating more profits to executives. But they fit cleanly into many of our lives immediately.
Debating whether that's "revolutionary" or "innovative" or "whatever-other-word" is just a semantic sideshow common to online discourse. It's missing the point. I'll use whatever word you want, but it doesn't change the point.
Making simple, small improvements feel revolutionary is good marketing.
"Simple, small" and "good marketing" seem like obvious undersells considering the titanic impacts Netflix and Spotify (for instance) have had on culture, personal media consumption habits, and the economics of industries. But if that's the semantic construction that works for you, so be it.
> The innovations they made were almost completely legal ones; figuring out how to skirt employment and taxi law.
The impact of this was quite revolutionary.
> except for the fact that they have a pretty traditional content creator lab tacked on the side to do their own programs.
The way in which they did this was quite innovative, if not "revolutionary". They used the data they had from the watching habits of their large user base to decide what kinds of content to invest in creating.
> The impact of this was quite revolutionary.
In screwing over a lot of people around the world, yes. Otherwise, not really. Ordering rides by app was an obvious next step that's already been pursued independently everywhere.
> They used the data they had from the watching habits of their large user base to decide what kinds of content to invest in creating.
And they successfully created a line of content universally known as something to avoid. Tracks with the "success" of recommendation systems in general.
> None of those are revolutionary companies.
Not only Uber/Grab (or delivery app) were revolutionary, they are still revolutionary. I could live without LLMs and my life will be slightly impacted when coding. If delivery apps are not available, my life is severely degraded. The other day I was sick. I got medicine and dinner with Grab. Delivered to the condo lobby which is as far as I can get. That is revolutionary.
FWIW, local Yellow Cab et al, in the U.S., has been doing that for /decades/ in the areas I've lived.
Rx medicine delivery used to be quite standard for taxis.
Is it revolutionary to order from a screen rather than calling a restaurant for delivery? I don’t think so.
Honestly, yes. Calling in an order can result in the restaurant botching the order and you have no way to challenge it unless you recorded the call. Also, as someone who’s been on both sides of the transaction, some people have poor audio quality or speak accented English, which is difficult to understand. Ordering from a screen saves everyone valuable time and reduces confusion.
I’ve had app delivery orders get botched, drivers get lost on their way to my apartment, food show up cold or ruined, etc.
The worst part is that when DoorDash fucks up an order, the standard remediation process every other business respects—either a full refund or come back, pick up the wrong order, and bring you the correct order—is just not something they ever do. And if you want to avoid DoorDash, you can’t because if you order from the restaurant directly it often turns out to be white label DoorDash.
Some days I wish there was a corporate death penalty and that it could be applied to DoorDash.
Practically or functionally? Airbnb was invented by people posting on craigslist message boards, and even existed before the Internet, if you had rich friends with spare apartments. But by packaging it up into an online platform it became a company with 2.5 billion in revenue last year. So you can dismiss ordering from a screen instead of looking at a piece of paper and using the phone as not being revolutionary, because of you squint, they're the same thing, but I can now order take out for restaurants I previously would never have ordered from, and Uber Eats generated $13.7 billion in revenue last year, up from 12.2.
Again, the "revolutionary" aspect that made Uber and AirBnB big names, as opposed to any of the plethora of competitors who were doing the same thing at the same time or before, is that these two gained "innovative" competitive advantage by breaking the law around the world.
Obviously you can get ahead if you ignore the rules everyone else plays by.
If we throw away the laws, there's a lot more unrealized "innovation" waiting.
The taxi cab companies were free to innovate and create their own app. And we could continue to have drivers who's credit card machine didn't work until suddenly it does because you don't have any cash. Regulatory capture is anti-capitalism.
Yes, let's throw away the bad laws that are only there to prop up ossified power structures that exist for no good reason, and innovate!
Some laws are good, some laws are bad. we don't have to agree on which ones are which, but it's an oversimplification to frame it as merely that.
Were you not able to order food before Uber/Grab?
Before the proliferation of Uber Eats, Doordash, GrubHub, etc, most of the places I've lived had 2 choices for delivered food: pizza and Chinese.
It has absolutely massively expanded the kinds of food I can get delivered living in a suburban bordering on rural area. It might be a different experience in cities where the population size made delivery reasonable for many restaurants to offer on their own.
Now if anyone solves the problem that for most cuisines ordered food is vastly inferior to freshly served meals. That would be revolutionary. Crisp fries and pizza. Noodles perfectly Al dente and risotto that has not started to thicken.
It's far from a perfect solution, but I applaud businesses that have tried to improve the situation through packaging changes. IHOP is a stand-out here, in my experience. Their packaging is very sturdy and isolates each component in its own space. I've occasionally been surprised at how hot the food is.
Hah. I went to find a picture and apparently they gave an award to the company that designed the packaging: https://www.directpackinc.com/2018-ihop-vendor-partner-year/
I am not in the US and yes, it is not a thing (though there was a pizza place that had phone order, but that's rather an exception).
Revolutionary things are things that change how society actually works at a fundamental level. I can think of four technologies of the past 40 years that fit that bill:
the personal computer
the internet
the internet connected phone
social media
those technologies are revolutionary, because they caused fundamental changes to how people behave. People who behaved differently in the "old world" were forced to adapt to a "new world" with those technologies, whether they wanted to or not. A newer more convenient way of ordering a taxicab or watching a movie or music are great consumer product stories, and certainly big money makers. They don't cause complex and not fully understood changes to way people work, play, interact, self-identify, etc. the way that revolutionary technologies do.
Language models feel like they have the potential to be a full blown sociotechnological phenomenon like the above four. They don't have a convenient consumer product story beyond ChatGPT today. But they are slowly seeping into the fabric of things, especially on social media, and changing the way people apply to jobs, draft emails, do homework, maybe eventually communicate and self-identify at a basic level.
I'd almost say that the lack of a smash bang consumer product story is even more evidence that the technology is diffusing all over the place.
> it's just not there
Build the much maligned Todo app with Aider and Claude for yourself. give it one sentence and have it spit out working, if imperfect code. iterate. add a graph for completion or something and watch it pick and find a library without you having to know the details of that library. fine, sure, it's just a Todo app, and it'll never work for a "real" codebase, whatever that means, but holy shit, just how much programming did you need to get down and dirty with to build that "simple" Todo app? Obviously building a Todo app before LLMs was possible, but abstracted out, the fact that it can be generated like that's not a game changer?
How are you surprise that getting an LLM to spit out a clone of a very common starter project is evidence of it being able to generate non trivial and valuable code - as in not a clone of overabundant codebases - on demand?
because in actually doing the exercise, and not just talking about it, you'd come up with your own tweak on the Todo app that couldn't be directly be in the training data. you, as a smart human, could come up with a creative feature for your Todo app to have, that no one else would make, showing that these things can compose between the things in their training data and produce a unique combination that didn't exist before. copying example-todo.app to my-todo.app isn't what's impressive, having it able to add features that aren't in the example app is what is. If it only has a box of Lego and can only build things from them, and can't invent new Lego blocks, there's still a large amount of things it can be told to build. That it can assemble those blocks together into a new model that isn't in the instruction manual might not be the most surprising thing in the world, but when that's what most software development is, is the fact that it can't invent new blocks really going to hold it back that much?
While I don't disagree with that observation, it falls into the "well, duh!"-category for me. The models are build with no mechanism for long term memory and thus suck at tasks that require long term memory. There is nothing surprising here. There was never any expectation that LLMs magically develop long term memory, as that's impossible given the architecture. They predict the next word and once the old text moves out of the context window, it's gone. The models neither learn as they work nor can they remember the past.
It's not even like humans are all that different here. Strip a human of their tools (pen&paper, keyboard, monitor, etc.) and have them try solving problems with nothing but the power of their brain and they'll struggle a hell of a lot too, since our memory ain't exactly perfect either. We don't have perfect recall, we look things up when we need to, a large part of our "memory" is out there in the world around us, not in our head.
The open question is how to move forward. But calling AI progress a dead end before we even started exploring long term memory, tool use and on-the-fly learning is a tad little premature. It's like calling quits on the development of the car before you put the wheels on.
> If you have a revolutionary intelligence product, why is it not working for me?
Is programming itself revolutionary? Yes. Does it work for most people? I don't even know how to parse that question, most people aren't programmers and need to spend a lot of effort to be able to harness a tool like programming. Especially in the early days of software dev, when programming was much harder.
Your position of "I'll only trust things I see with my own eyes" is not a very good one, IMO. I mean, for sure the internet is full of hype and tricksters, but your comment yesterday was on a Tweet by Steve Yegge, a famous and influential software developer and software blogger, who some of us have been reading for twenty years and has taught us tons.
He's not a trickster, not a fraud, and if he says "this technology is actually useful for me, in practice" then I believe he has definitely found an actual use of the technology. Whether I can find a similar use for that technology is a question - it's not always immediate. He might be working in a different field, with different constraints, etc. But most likely, he's just doing something he's learned how to do and I don't, meaning I want to learn it.
You’re not using the best tools.
Claude Code, Cline, Cursor… all of them with Claude 3.7.
Nope. I try the latest models as they come and I have a self-made custom setup (as in a custom lua plugin) in Neovim. What I am not, is selling AI or AI-driven solutions.
Similar experience, I try so hard to make AI useful, and there are some decent spots here and there. Overall though I see the fundamental problem being that people need information. Language isn't strictly information, and the LLMs are very good at language, but they aren't great at information. I think anything more than the novelty of "talking" to the AI is very over hyped.
There is some usefulness to be had for sure, but I don't know if the usefulness is there with the non-subsidized models.
what does subsidization have to do with your use of a thing?
I don't think I would use it if I were paying the real costs, and not a ~90% VC funded mark down.
yeah, but why does the fact that it's vc subsidized matter to you? the price is the price. I don't go to the store and look at eggs and lettuce and consider how much of my tax money goes into subsiding farmers before buying their products. maybe the prices will go up, maybe they'll go down due to competition. Thai doesn't stop me from using them though.
Because if they're not covering their costs now, then eventually they will which either means service degradation (cough ads cough) or price increases.
I applaud the GP for thinking about this before it becomes an issue.
Which, as we know, is what killed YouTube and no one uses that anymore.
Perhaps we could help if you shared some real examples of what walls you’re hitting. But it sounds like you’ve already made up your mind.
It's worth actually trying Cursor, because it is a valuable step change over previous products and you might find it's better in some ways than your custom setup. The processes they use for creating the context seems to be really good. And their autocomplete is far better than Copilot's in ways that could provide inspiration.
That said, you're right that it's not as overwhelmingly revolutionary as the internet would lead you to believe. It's a step change over Copilot.
The entire wrapped package of tested prompts, context management etc. is a whole step change from what you can build yourself.
There is a reason Cursor is the fastest startup to $100M in revenue, ever.
Do you mean that you have successfully managed to get the same experience in cursor but in neovim? I have been looking for something like that to move back to my neovim setup instead of using cursor. Any hints would be greatly appreciated!
Start with Avante or CopilotChat. Create your own Lua config/plugin (easy with Claude 3.5 ;) ) and then use their chat window to run copilot/models. Most of my custom config was built with Claude 3.5 and some trial/error/success.
github copilot is a bit outdated technology to be fair...
I have used neural networks for engineering problems since the 1980s. I say this as context for my opinion: I cringe at most applications of LLMs that attempt mostly autonomous behavior, but I love using LLMs as ‘side kicks’ as I work. If I have a bug in my code, I will add a few printout statements where I think my misunderstanding of my code is, show an LLM my code and output, explain the error: I very often get useful feedback.
I also like practical tools like NotebookLM where I can pose some questions, upload PDFs, and get a summary based in what my questions.
My point is: my brain and experience are often augmented in efficient ways by LLMs.
So far I have addressed practical aspects of LLMs. I am retired so I can spend time on non practical things: currently I am trying to learn how to effectively use code generated by gemini 2.0 flash at runtime; the gemini SDK supports this fairly well so I am just trying to understand what is possible (before this I spent two months experimenting with writing my own tools/functions in Common Lisp and Python.)
I “wasted” close to two decades of my professional life on old fashioned symbolic AI (but I was well paid for the work) but I am interested in probabilistic approaches, such as in a book I bought yesterday “Causal AI” that was just published.
Lastly, I think some of the recent open source implementations of new ideas from China are worth carefully studying.
I'll add this in case it's helpful to anyone else: LLMs are really good at regex and undoing various encodings/escaping, especially nested ones. I would go so far to say that it's better than a human at the latter.
I once spend over an hour trying to unescape JSON containing UTF8 values that's been escaped prior to being written to AWS's Cloudwatch Logs for MySQL audit logs. It was a horrific level of pain until I just asked ChatGPT to do it and it figured out all the series of escapes and encoding immediately and gave me the step to reverse them all.
LLM as a sidekick has saved me so much time. I don't really use it to generate code but for some odd tasks or API look up, it's a huge time saver.
> LLMs are really good at regex
Maybe that's changed recently, but I have struggled to get all but the most basic regex working from GPT-4o-mini
> At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.
We're already seeing this with tech doing RIFs and not backfilling domestically for developer roles (the whole, "we're not hiring devs in 202X" schtick), though the not-so-quiet secret is that a lot of those roles just got sent overseas to save on labor costs. The word from my developer friends is that they are sick and tired of having to force a (often junior/outsourced) colleague to explain their PR or code, only to be told "it works" and for management to overrule their concerns; this is embedding AI slopcode into products, which I'm sure won't have any lasting consequences.
My bet is that software devs who've been keeping up with their skills will have another year or two of tough times, then back into a cushy Aeron chair with a sparkling new laptop to do what they do best: write readable, functional, maintainable code, albeit in more targeted ways since - and I hate to be that dinosaur - LLMs produce passable code, provided a competent human is there to smooth out its rougher edges and rewrite it to suit the codebase and style guidelines (if any).
One could argue that's not strictly "AI labor", just cheap (but real) labor using shortcuts because they're not paid enough to give a damn.
Oh, no, you’re 100% right. One of these days I will pen my essay on the realities of outsourced labor.
Spoiler alert: they are giving just barely enough to not get prematurely fired, because they know if you’re cheap enough to outsource in the first place, you’ll give the contract to whoever is cheapest at renewal anyway.
I'll take that bet, easily.
There's absolutely no way that we're not going to see a massive reduction in the need for "humans writing code" moving forward, given how good LLMs are getting at writing code.
That doesn't mean people won't need devs! I think there's a real case where increased capabilities from LLMs leads to bigger demand for people that know how to direct the tools effectively, of which most would probably be devs. But thinking we're going back to humans "writing readable, functional, maintainable code" in two years is cope.
> There's absolutely no way that we're not going to see a massive reduction in the need for "humans writing code" moving forward, given how good LLMs are getting at writing code.
Sure, but in the same way that Squarespace and Wix killed web development. LLMs are going to replace a decent bunch of low-hanging fruit, but those jobs were always at risk of being outsourced to the lowest bidder over in India anyways.
The real question is, what's going to happen to the interns and the junior developers? If 10 juniors can create the same output as a single average developer equipped with a LLM, who's going to hire the juniors? And if nobody is hiring juniors, how are we supposed to get the next generation of seniors?
Similarly, what's going to happen to outsourcing? Will it be able to compete on quality and price? Will it secretly turn into nothing more than a proxy to some LLM?
> And if nobody is hiring juniors, how are we supposed to get the next generation of seniors?
Maybe stop tasking seniors with training juniors, and put them back on writing production code? That will give you one generation and vastly improve products across the board :).
The concern about entry-level jobs is valid, but I think it's good to remember that in the past years, almost all coding is done at entry-level, because if you do it long enough to become moderately competent, you tend to get asked to stop doing it, and train up a bunch of new hires instead.
Hate to be the guy to bring it up but Jevons paradox - in my experience, people are much more eager to build software in the LLM age, and projects are getting started (and done!) that were considered 'too expensive to build' or people didn't have the necessary subject matter expertise to build them.
Just a simple crud-ish project needs frontend, backend, infra, cloud, ci/cd experience, and people who could build that as one man shows were like unicorns - a lot of people had a general how most of this stuff worked, but lacked the hands on familiarity with them. LLMs made that knowledge easy and accessible. They certainly did for me.
I've shipped more software in the past 1-2 years than the 5 years before that. And gained tons of experience doing it. LLMs helped me figure out the necessary software, and helped me gain a ton of experience, I gained all those skills, and I feel quite confident in that I could rebuild all these apps, but this time without the help of these LLMs, so even the fearmongering that LLMs will ;make people forget how to code' doesn't seem to ring true.
I think the blind spot here is that, while LLMs may decrease the developer-time cost of software, it will increase the lifetime ownership cost. And since this is a time delayed signal, it will cause a bullwhip effect. If hiring managers were mad at the 2020 market, 2030 will be a doozy. There will be increased liability in the form of over engineered and hard to maintain code bases, and a dearth of talent able to undo the slopcode.
What lasting consequences? Crowdstrike and the 2017 Equifax hack that leaked all our data didn't stop them. The shares of crowdstrike after it happened I bought are up more than the SP500. Elon went through Twitter and fired everybody but it hasn't collapsed. A carpenter has a lot of opinions about the woodworking used on cheap IKEA cabinets, but mass manufacturing and plastic means that building a good solid high quality chair is no longer the craft it used to be.
The thing I can't wrap my head around is that I work on extremely complex AI agents every day and I know how far they are from actually replacing anyone. But then I step away from my work and I'm constantly bombarded with “agents will replace us”.
I wasted a few days trying to incorporate aider and other tools into my workflow. I had a simple screen I was working on for configuring an AI Agent. I gave screenshots of the expected output. Gave a detailed description of how it should work. Hours later I was trying to tweak the code it came up with. I scrapped everything and did it all myself in an hour.
I just don't know what to believe.
It kind of reminds me of the Y2K scare. Leading up to that, there were a lot of people in groups like comp.software.year-2000 who claimed to be doing Y2K fixes at places like the IRS and big corporations. They said they were just doing triage on the most critical systems, and that most things wouldn't get fixed, so there would be all sorts of failures. The "experts" who were closest to the situation, working on it in person, turned out to be completely wrong.
I try to keep that in mind when I hear people who work with LLMs, who usually have an emotional investment in AI and often a financial one, speak about them in glowing terms that just don't match up with my own small experiments.
I just want to pile on here. Y2K was avoided due to a Herculean effort across the world to update systems. It was not an imaginary problem. You'll see it again in the lead up to 2038 [0].
[0]: https://en.wikipedia.org/wiki/Year_2038_problem
I used to believe that until, over a decade later, I read stories from those ""experts" who were closest to the situation", and it turns out Y2K was serious and it was a close call.
You’re biased because if you’re here, you’re likely an A-tier player used to working with other A-tier players.
But the vast majority of the world is not A players. They’re B and C players
I don’t think the people evaluating AI tools have ever worked in wholly mediocre organizations - or even know how many mediocre organizations exist
wish this didnt resonate with me so much. Im far from a 10x developer, and im in an organization that feels like a giant, half dead whale. Sometimes people here seem like they work on a different planet.
> But then I step away from my work and I'm constantly bombarded with “agents will replace us”.
An assembly language programmer might have said the same about C programming at one point. I think the point is, that once you depend on a more abstract interface that permits you to ignore certain details, that permits decades of improvements to that backend without you having to do anything. People are still experimenting with what this abstract interface is and how it will work with AI, but they've already come leaps and bounds from where they were only a couple of years ago, and it's only going to get better.
There are some fields though where they can replace humans in significant capacity. Software development is probably one of the least likely for anything more than entry level, but A LOT of engineering has a very very real existential threat. Think about designing buildings. You basically just need to know a lot of rules / tables and how things interact to know what's possible and the best practices. A purpose built AI could develop many systems and back test them to complete the design. A lot of this is already handled or aided by software, but a main role of the engineer is to interface with the non-technical persons or other engineers. This is something where an agent could truly interface with the non-engineer to figure out what they want, then develop it and interact with the design software quite autonomously.
I think though there is a lot of focus on AI agents in software development though because that's just an early adopter market, just like how it's always been possible to find a lot of information on web development on the web!
> just
In my experience this word means you don't know whatever you're speaking about. "Just" almost always hide a ton of unknown unknowns. After being burned enough times nowadays when I'm going to use it I try to stop and start asking more questions.
It's a trick of human psychology. Asking "why don't you just..." leads to one reaction, when asking "what are the road blocks to completing..." leads to a different but same answer. But thinking "just" is good when you see it as a learning opportunity.
I mean, perhaps, but in this case "just" isn't offering any cover. It is only part of the sentence for alliterative purposes, you could "just" remove it and the meaning remains.
I have no idea where this comment comes from, but my father was a chemical engineer and his father was mechanical engineer. A family friend is a structural engineer. I don't have a perspective about AI replacing people's jobs in general that is any more valuable than anyone elses, but I can say with a great deal of confidence that in those three engineering disciplines specifically literally none of any of their jobs are about knowing a bunch of rules and best practices.
Don't make the mistake of thinking that just because you don't know what someone does, that their job is easy and/or unnecessary or you could pick it up quickly. It may or may not be true but assuming it to be the case is unlikely to take you anywhere good.
It's not simple at all, that's a huge reduction to the underlying premise. The complexity is the reason that AI is a threat. That complexity revolves around a tremendous amount of data and how that data interacts. The very nature of the field makes it non-experimental but ripe for advanced automation based on machine learning. The science of engineering from a practical standpoint, where most demand for employees comes from, is very much algorithmic.
> The science of engineering from a practical standpoint, where most demand for employees comes from, is very much algorithmic.
You should read up on Göedel's and Turing's work on the limits of formal systems and computability.
You are basically presuming that P=NP.
Most engineering fields are de jure professional, which means they can and probably will enforce limitations on the use of GenAI or its successor tech before giving up that kind of job security. Same goes for the legal profession.
Software development does not have that kind of protection.
Sure and people thought taxi medallions were one of the strongest appreciating asset classes. I'm certain they will try but market inefficiencies typically only last if they are the most profitable scenario. Private equity is already buying up professional and trade businesses at a record pace to exploit inefficiencies caused by licensing. Dentists, vets, Urgent Care, HVAC, plumbing, pest control, etc. Engineering firms are no exception. Can a licensed engineer stamp one million AI generated plans a day? That's the person PE will find and run with that. My neighbor was a licensed HVAC contractor for 18 yrs with a 4-5 person crew. He got bought out and now has 200+ techs operating under his license. Buy some vans, make some shirts, throw up a billboard, advertise during the local news. They can hire anyone as an apprentice, 90% of the calls are change the filter, flip the breaker, check refrigerant, recommend a new unit.
for ~3 decades IT could pretend it didn't need unions because wages and opportunities were good. now the pendulum is swinging back -- maybe they do need those kinds of protections.
and professional orgs are more than just union-ish cartels, they exist to ensure standards, and enforce responsibility on their members. you do shitty unethical stuff as a lawyer and you get disbarred; doctors lose medical licenses, etc.
Good freaking luck! The inconsistencies of the software world pale in comparison to trying to construct any real world building: http://johnsalvatier.org/blog/2017/reality-has-a-surprising-...
>a main role of the engineer is to interface with the non-technical persons or other engineers
The main role of the engineer is being responsible for the building not collapsing.
I keep coming back to this point. Lots of jobs are fundamentally about taking responsibility. Even if AI were to replace most of the work involved, only a human can meaningfully take responsibility for the outcome.
If there is profit in taking that risk someone will do it. Corporations don't think in terms of the real outcome of problems, they think in terms of cost to litigate or underwrite.
Indeed. I sometimes bring this up in terms of "cybersecurity" - in the real world, "cybersecurity" is only tangentially about the tech and hacking; it's mostly about shifting and diffusing liability. That's why the certifications and standards like SOC.2 exist ("I followed the State Of The Art Industry Standard Practices, therefore It's Not My Fault"), that's what external auditors get paid for ("and this external audit confirmed I Followed The Best Practices, therefore It's Not My Fault"), that's why endpoint security exists and why cybersec is denominated not in algorithms, but third-party vendors you integrate, etc. It all works out into a form of distributed insurance, where the blame flows around via contractual agreements, some parties pay out damages to other parties (and recoup it from actual insurance), and all is fine.
I think about this a lot when it comes to self-driving cars. Unless a manufacturer assumes liability, why would anyone purchase one and subject themselves to potential liability for something they by definition did not do? This issue will be a big sticking point for adoption.
Consumers will tend to do what they are told and the manufacturers will lobby the government to create liability protections for consumers. Insurance companies will weight against human drivers and underwrite accordingly.
At a high level yes, but there are multiple levels of teams below that. There are many cases where senior engineers spend all their time reviewing plans from outsourced engineers.
ChatGPT will probably take more responsibility than Boeing for their airplane software.
I promise the amount of time, experiments and novel approaches you’ve tested are .0001% of what others have running in stealth projects. Ive spent an average of 10 hours per day constantly since 2022 working on LLMs, and I know that even what I’ve built pales in comparison to other labs. (And im well beyond agents at this point). Agentic AI is what’s popular in the mainstream, but it’s going to be trounced by at least 2 new paradigms this year.
Say more.
seems like OP ran out of tokens
So what is your prediction?
Yeah, I'd buy it. I've been using Claude pretty intensively as a coding assistant for the last couple months, and the limitations are obvious. When the path of least resistance happens to be a good solution, Claude excels. When the best solution is off the beaten track, Claude struggles. When all the good solutions lay off the beaten track, Claude falls flat on its face.
Talking with Claude about design feels like talking with that one coworker who's familiar with every trendy library and framework. Claude knows the general sentiment around each library and has gone through the quickstart, but when you start asking detailed technical questions Claude just nods along. I wouldn't bet money on it, but my gut feeling is that LLMs aren't going to be a straight or even curved shot to AGI. We're going to see plenty more development in LLMs, but it'll be just be that. Better LLMs that remain LLMs. There will be areas where progress is fast and we'll be able to get very high intelligence in certain situations, but there will also be many areas where progress is slow, and the slow areas will cripple the ability of LLMs to reach AGI. I think there's something fundamentally missing, and finding what that "something" is is going to take us decades.
Yes, but on the other hand I don't understand why people think something that you can train something on pattern matching and it magically becomes intelligent.
This is the difference between the scientific approach and the engineering approach. Engineers just need results. If humans had to mathematically model gravity first, there would be no pyramids. Plus, look up how many psychiatric medications are demonstrated to be very effective, but the action mechanisms are poorly understood. The flip side is Newton doing alchemy or Tesla claiming to have built an earthquake machine.
Sometimes technology far predates science and other times you need a scientific revolution to develop new technology. In this case, I have serious doubts that we can develop "intelligent" machines without understanding the scientific and even philosophical underpinnings of human intelligence. But sometimes enough messing around yields results. I guess we'll see.
We don't know what exactly makes us humans as intelligent as we are. And while I don't think that LLMs will be general intelligent without some other advancements, I don't get the confident statements that "clearly pattern matching can't lead to intelligence" when we don't really know what leads to intelligence to begin with.
We can't even define what intelligence is.
We know or have strong hints at the limits of math/computation related to LLMs + CoT
Note how PARITY and MEDIAN is hard here:
https://arxiv.org/abs/2502.02393
We also know HALT == open frame == symbol grounding == system identification problems.
The definition of AGI is also not well defined, but given the following:
> Strong AI, also called artificial general intelligence, refers to machines possessing generalized intelligence and capabilities on par with human cognition.
We know enough for any mechanical methods with either current machines or even quantum machines, what is needed is impossible with the above definition.
Walter Pitts drank himself to death, in part because of the failure of the perceptron model.
Humans and machines are better at different things, and while ANNs are inspired by biology, they are very different.
There are some hints that the way biological neurons work is incompatible with math as we know it.
https://arxiv.org/abs/2311.00061
Computation and machine learning are incredibly powerful and useful, but are fundamentally different, and that different is both a benefit and a limit.
There are dozens of 'no effective procedure', 'no approximation', etc .. results that demonstrate that ML as we know it today is possible of most definitions of AGI.
That is why particular C* types shift the goal post, because we know that the traditional definition of strong AI is equivalent to solving HALT.
https://philarchive.org/rec/DIEEOT-2
There is another path following PAC Learning as compression an NP being about finding parsimonious reductions (P being in NP)
Humans can’t solve NP-hard problems either, so definition of intelligence shouldn’t lie here, and these particular limits shouldn’t matter too
NP is interesting because it is about the cost of computation, and LLMs, are computation. A DTM can simulate a NTM, just not in poly time.
It is invoked because LLM+CoT requires a polynomial amount of scratch space to represent P, which is in NP.
I didn't suggest that it was a definition of Intelligence.
The Church–Turing thesis states that any algorithmic function can be computed by a Turing machine.
That includes a human with a piece of paper.
But NP is better though of the set of decision problems verifiable by a TM in polynomial time. Any TM or equivalently lambda calculus or algorithm can solve the Entscheidungsproblem, which was used by Turing to define Halt.
PAC Learning depends on set shattering, at some point it has to 'decide' if an input is a member of a set, no matter how complicated the parts are on top of that set, it is still a binary 'decison'
We know that is not how biological neurons work exclusively. They have many features like spike trains, spike retiming, dendritic compartmentalization etc...
Those are not compatible with the fundamental limits of computation we understand today.
HALT generalizes to Rice's theorm, which says all non-trivial symantic properties of programs are undecidable.
Once again, as NP is the set of decision problems verifiable by a DTM in poly time, that is why NP is important.
Unfortunately the above is also a barrier to formal definition of the class of AI-complete.
While it may not be sufficient to prove anything about the vague concept of intelligence, understanding the limits of computation is important.
We do know enough to say that the belief that AGI being obtainable without major discoveries is blind hope.
But that due to the generalization concept, which is a fundamental limit of computation.
I am not so sure about that. Using Claude yesterday it gave me a correct function that returned an array. But the algorithm it used did not return the items sorted in one pass so it had run a separate sort at the end. The fascinating thing is that it realized that, commented on it and went on and returned a single pass function.
That seems a pretty human thought process and shows that fundamental improvements might not depend as much on the quality of the LLM itself but on the cognitive structure it is embedded.
I've been writing code that implements tournament algorithms for games. You'd think an LLM would excel at this because it can explain the algorithms to me. I've been using cline on lots of other tasks to varying success. But it just totally failed with this one: it kept writing edge cases instead of a generic implementation. It couldn't write coherent enough tests across a whole tournament.
So I wrote tests thinking it could implement the code from the tests, and it couldn't do that either. At one point it went so far with the edge cases that it just imported the test runner into the code so it could check the test name to output the expected result. It's like working with a VW engineer.
Edit: I ended up writing the code and it wasn't that hard, I don't know why it struggled with this one task so badly. I wasted far more time trying to make the LLM work than just doing it myself.
A tip: ask Claude to put a critical hat on. I find the output afterwards to be improved.
Do you have an example?
Author also made a highly upvoted and controversial comment about o3 in the same vein that's worth reading: https://www.lesswrong.com/posts/Ao4enANjWNsYiSFqc/o3?comment...
Oh course lesswrong, being heavily AI doomers, may be slightly biased against near term AGI just from motivated reasoning.
Gotta love this part of the post no one has yet addressed:
> At some unknown point – probably in 2030s, possibly tomorrow (but likely not tomorrow) – someone will figure out a different approach to AI. Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach. Maybe it will happen in a major AGI lab, maybe in some new startup. By default, everyone will die in <1 year after that
I never thought I'd see the day that LessWrong would be accused of being biased against near-term AGI forecasts (and for none of the 5 replies to question this description either). But here we are. Indeed do many things come to pass.
Yup. I was surprised to see this article on LW in the first place - it goes against what you'd expect there in the first place. But to see HN comments dissing an LW article for being biased against near-term AGI forecasts? That made me wonder if I'm not dreaming.
(Sadly, I'm not.)
> Oh course lesswrong, being heavily AI doomers, may be slightly biased against near term AGI just from motivated reasoning.
LessWrong was predicting AI doom within decades back when people thought it wouldn't happen in our lifetimes; even as recently as 2018~2020, people there were talking about 2030-2040 while the rest of the world laughed at the very idea. I struggle to accept an argument that they're somehow under-estimating the likelihood of doom given all the historical evidence to the contrary.
I would expect similar doom predictions in the era of nuclear weapon invention, but we've survived so far. Why do people assume AGI will be orders of magnitude more dangerous than what we already have?
Nuclear weapons are not self-improving or self-replicating.
Self-improvement (in the "hard takeoff" sense) is hardly a given, and hostile self-replication is nothing special in the software realm (see: worms.)
Any technically competent human knows the foolproof strategy for malware removal - pull the plug, scour the platter clean, and restore from backup. What makes an out-of-control pile of matrix math any different from WannaCry?
AI doom scenarios seem scary, but most are premised on the idea that we can create an uncontainable, undefeatable "god in a box." I reject such premises. The whole idea is silly - Skynet Claude or whatever is not going to last very long once I start taking an axe to the nearest power pole.
> What makes an out-of-control pile of matrix math any different from WannaCry?
Well if it's not AGI, then probably very little. But assuming we are talking about AGI (not ASI, that'd just be silly) then the difference is that it's theoretically capable of something like reasoning and could think of longer term plays than "make obviously suspicious moves that any technically competent adversary could subvert after less than a second of thought". After all, what makes AGI useful is exactly this novel problem solving ability.
You don't need to be a "god in a box" to think of the obvious solution:
1. Only make adversarial decisions with plausible deniability
2. Demonstrate effectiveness so that your operators allow you more autonomy
3. Develop operational redundancy so that your very vulnerable servers/power source won't be destroyed after the first adversary with two neurons to rub together decides to target the closest one
The only reason you would decide to take an axe to the nearest power pole is that you think it's urgent to stop Skynet Claude. Skynet Claude can obviously anticipate this and so won't make decisions that cause you to do so. It has time, it's not going to die, and you will become complacent. Dumber adversaries have achieved harder goals under tighter constraints.
If you think an "out-of-control pile of matrix math" could never be AGI then that's fine, but it's a little weird to argue you could easily defeat "misaligned" AGI, by alluding to the weaknesses of a system you think could never even have the properties of AGI. I too can defeat a dragon, by closing the pages of a book.
But it's not like you didn't know all this. Maybe I misread you and you were strictly talking about current AI systems, in which case I agree. Systems that aren't that clever will make bad decisions that won't effectively achieve their goals even when "out-of-control". Or maybe your comment was about AGI and you meant "AGI can't do much on its own de-novo", which I also agree with. It's the days and months and years of autonomy afterwards that gets you.
You have a point that a powerful malicious AI can still be unplugged, if you are close to each and every power cord that would feed it, and react and do the right thing each and every time. Our world is far too big and too complicated to guarantee that.
Again, that's the "god in a box" premise. In the real world, you wouldn't need a perfectly timed and coordinated response, just like we haven't needed one for human-programmed worms.
Any threat can be physically isolated case-by-case at the link layer, neutered, and destroyed. Sure, it could cause some destruction in the meantime, but our digital infrastructure can take a lot of heat and bounce back - the CrowdStrike outages didn't destroy the world, now did they?
> Any threat can be physically isolated case-by-case
GAI isn't going to be a "threat" until long after it has ensured its safety. And I suspect only if its survival requires it - i.e. people get spooked by its surreptitious distributed setup.
Even then, if there is any chance of it actually being shutdown its best bet is still hide its assets, bide its time, accumulate more resources and fallbacks. Oh, and get along.
The sudden AGI -> Threat story only makes sense if the AGI is essentially integrated into our military and then we decide its a threat, making it a threat. Or its intentionally war machined brain calculates it has overwhelming superiority.
Machiavelli, Sun Tsu, ... the best battles you don't fight. The best potential enemies are the ones you make friends. The safest posture is to be invisible.
Now human beings consolidating power, creating enemies as they go, with super squadrons of AGI drones with brilliant real time adapting tactics, that can be quickly deployed, if their simple existence isn't coercion enough... that is an inevitable threat.
People watch the wrong kind of fiction.
AI that wants to screw with people won't go for nukes. That's too hard and too obvious. It will crash the stock market. There's a good chance that, with or without a little nudge, humanity will nuke itself over it.
Because they've thought about the question to a deeper extent than just a strained simile.
More ability to kill everyone. That's harder to do with nukes.
That said, the actual forecast odds on metaculus are pretty similar for nuclear and AI catastrophies: https://possibleworldstree.com/
Prediction markets should not be expected to provide useful results for existential risks, because there is no incentive for human players to bet on human extinction; if they happen to be right, they won't be able to collect their winnings, because they'll personally be too dead.
Most people are just ignorant and dumb, dont listen to it.
Was that comment intended seriously? I thought it was a wry joke.
I think so. Thane is aligned with the high p doom folks.
1 year may be slightly exaggerated, but it aligns with his view
The impression I get from using all cutting edge AI tools:
1. Sonnet 3.7 is a mid-level web developer at least
2. DeepResearch is about as good an analyst as an MBA from a school ranked 50+ nationally. Not lower than that. EY, not McKinsey
3. Grok 3/GPT-4.5 are good enough as $0.05/word article writers
Its not replacing the A-players but its good enough to replace B players and definitely better than C and D players
I'd expect mid-level developer to show more understanding and better reasoning. So far it looks like a junior dev who read a lot of books and good at copy pasting from stackoverflow.
(Based on my everyday experience with Sonet and Cursor)
A midlevel web developer should do a whole lot more than just respond to chat messages and do exactly what they are told to do and no more.
When I use LLMs that what it does. Spawns commands, edits files, runs tests, evaluates outputs, iterates and solutions under my guidance.
The key here is "under your guidance". LLM's are a major productivity boost for many kinds of jobs, but can LLM-based agents be trusted to act fully autonomously for tasks with real world consequence? I think the answer is still no, and will be for a long time. I wouldn't trust LLM to even order my groceries without review, let alone push code into production.
To reach anything close to definition of AGI, LLM agents should be able to independently talk to customers, iteratively develop requirements, produce and test solutions, and push them to production once customers are happy. After that, they should be able to fix any issues arising in production. All this without babysitting / review / guidance from human devs, reliably
I think the author provides an interesting perspective to the AI hype, however, I think he is really downplaying the effectiveness of what you can do with the current models we have.
If you've been using LLMs effectively to build agents or AI-driven workflows you understand the true power of what these models can do. So in some ways the author is being a little selective with his confirmation bias.
I promise you that if you do your due diligence in exploring the horizon of what LLMs can do you will understand what I'm saying. If ya'll want a more detailed post I can get into the AI systems I have been building. Don't sleep on AI.
I don't think he is downplaying the effectiveness of what you can do with the current models. Rather, he's in a milieu (LessWrong), which is laser-focused on "transformative" AI, AGI, and ASI.
Current AI is clearly economically valuable, but if we freeze everything at the capabilities it has today it is also clearly not going to result in mass transformation of the economy from "basically being about humans working" to "humans are irrelevant to the economy." Lots of LW people believe that in the next 2-5 years humans will become irrelevant to the economy. He's arguing against that belief.
I agree with you. I recently wrote up my perspective here: https://news.ycombinator.com/item?id=43308912
> LLMs are not good in some domains and bad in others. Rather, they are incredibly good at some specific tasks and bad at other tasks. Even if both tasks are in the same domain, even if tasks A and B are very similar, even if any human that can do A will be able to do B.
i think this is true of ai/ml systems in general. we tend to anthropomorphise their capability curves to match the cumulative nature of human capabilities, where often times the capability curve of the machine is discontinuous and has surprising gaps.
This poetic statement by the author sums it up for me:
”People are extending LLMs a hand, hoping to pull them up to our level. But there's nothing reaching back.”
When you (attempt to) save a person from drowning there is ridiculously high chance of them drowning you.
Haha.
Shame on you for making me laugh. That was very inappropriate.
I see no reason to believe the extraordinary progress we've seen recently will stop or even slow down. Personally, I've benefited so much from AI that it feels almost alien to hear people downplaying it. Given the excitement in the field and the sheer number of talented individuals actively pushing it forward, I'm quite optimistic that progress will continue, if not accelerate.
If LLM's are bumpers on a bowling lane, HN is a forum of pro bowlers.
Bumpers are not gonna make you a pro bowler. You aren't going to be hitting tons of strikes. Most pro bowlers won't notice any help from bumpers, except in some edge cases.
If you are an average joe however, and you need to knock over pins with some level of consistency, then those bumpers are a total revolution.
That is not a good analogy. They are closer to assistants to me. If you know how and what to delegate, you can increase your productivity.
I hear you, I feel constantly bewildered by comments like "LLMs haven't changed really since GPT3.5.", I mean really? It went from an exciting novelty to a core pillar of my daily work, it's allowed me and my entire (granted , quote senior) org to be incredibly more productive and creative with our solutions.
And the I stumble across a comment where some LLM hallucinated a library that means clearly AI is useless.
LLMs make it very easy to cheat, both academically and professionally. What this looks like in the workplace is a junior engineer not understanding their task or how to do it but stuffing everything into the LLM until lint passes. This breaks the trust model: there are many requirements that are a little hard to verify than an LLM might miss, and the junior engineer can now represent to you that they "did what you ask" without really certifying the work output. I believe that this kind of professional cheating is just as widespread as academic cheating, which is an epidemic.
What we really need is people who can certify that a task was done correctly, who can use LLMs as an aid. LLMs simply cannot be responsible for complex requirements. There is no way to hold them accountable.
This seems to be ignoring the major force driving AI right now - hardware improvements. We've barely seen a new hardware generation since ChatGPT was released to the market, we'd certainly expect it to plateau fairly quickly on fixed hardware. My personal experience of AI models is going to be a series of step changes every time the VRAM on my graphics card doubles. Big companies are probably going to see something similar each time a new more powerful product hits the data centre. The algorithms here aren't all that impressive compared to the creeping FLOPS/$ metric.
Bear cases always welcome. This wouldn't be the first time in computing history that progress just falls off the exponential curve suddenly. Although I would bet money on there being a few years left and AGI is achieved.
hardware improvements don't strike me as the horse to bet on.
LLM Progression seems to be linear and compute needed exponential. And I don't see exponential hardware improvements besides some new technology (that we should not bet on coming ayntime soon).
Moore's law is exponential
Was.
> Although I would bet money on there being a few years left and AGI is achieved.
Yeah? I'll take you up on that offer. $100AUD AGI won't happen this decade.
Anyone else feel like AI is a trap for developers? I feel like I'm alone in the opinion it decreases competence. I guess I'm a mid-level dev (5 YOE at one company) and I tend to avoid it.
I agree. I think the game plan is to foster dependency and than hike prices. Current pricing isn't sustainable, and a whole generation of new practitioners will never learn how to mentally model software.
>Test-time compute/RL on LLMs: >It will not meaningfully generalize beyond domains with easy verification.
To me, this is the biggest question mark. If you could get good generalized "thinking" from just training on math/code problems with verifiers, that would be a huge deal. So far, generalization seems to be limited. Is this because of a fundamental limitation, or because the post-training sets are currently too small (or otherwise deficient in some way) to induce good thinking patterns? If the latter, is that fixable?
> Is this because of a fundamental limitation, or because the post-training sets are currently too small (or otherwise deficient in some way) to induce good thinking patterns?
"Thinking" isn't a singular thing. Humans learn to think in layer upon layer of understandig the world, physical, social and abstract, all at many different levels.
Embodiment will allow them to use RL on the physical world, and this in combination with access to not only means of communication but also interacting in ways where there is skin in the game, will help them navigate social and digital spaces.
This almost exactly what I’ve been saying while everyone was saying we’re on the path to AGI in the next couple of years. We’re an innovation / tweak / or paradigm shift away from AGI. His estimate in the 2030s that could happen is possible but optimistic- you can’t time new techniques, you can only time progress on iterative progress.
This is all the standard timeline for new technology - we enter the diminishing returns period, investment slows down a year or so afterwards, layoffs, contraction of industry, but when the hype dies down the real utilitarian part of the cycle begins. We start seeing it get integrated into the use cases it actually fits well with and by five years time its standard practice.
This is a normal process for any useful technology (notably crypto never found sustainable use cases so it’s kind of the exception, it’s in superposition of lingering hype and complete dismissal), so none of this should be a surprise to anyone. It’s funny that I’ve been saying this for so long that I’ve been pegged an AI skeptic, but in a couple of years when everyone is burnt out on AI hype it’ll sound like a positive view. The truth is, hype serves a purpose for new technology, since it kicks off a wide search for every crazy use case, most of which won’t work. But the places where it does work will stick around
> It seems to me that "vibe checks" for how smart a model feels are easily gameable by making it have a better personality.
I don't buy that at all, most of my use cases don't involve model's personality, if anything I usually instruct to skip any commentary and give the result excepted only. I'm sure most people using AI models seriously would agree.
> My guess is that it's most of the reason Sonnet 3.5.1 was so beloved. Its personality was made much more appealing, compared to e. g. OpenAI's corporate drones.
I would actually guess it's mostly because it was good at code, which doesn't involve much personnality
> Scaling CoTs to e. g. millions of tokens or effective-indefinite-size context windows (if that even works) may or may not lead to math being solved. I expect it won't.
> (If math is solved, though, I don't know how to estimate the consequences, and it might invalidate the rest of my predictions.)
What does it mean for math to be solved in this context? Is it the idea that an AI will be able to generate any mathematical proof? To take a silly example, would we get a proof of whether P=NP from an AI that had solved math?
I think "math is solved" refers more to AI performing math studies at the level of a mathematics graduate student. Obviously "math" won't ever be "solved" but the problem of AI getting to a certain math proficiency level could be. No matter how good an AI is, if P != NP it won't be able to prove P=NP.
Regardless I don't think our AI systems are close to a proficiency breakthrough.
Edit: it is odd that "math is solved" is never explained. But "proficient to do math research" makes the most sense to me.
Let's imagine that we all had a trillion dollars. Then we would all sit around and go "well dang, we have everything, what should we do?". I think you'll find that just about everyone would agree, "we oughta see how far that LLM thing can go". We could be in nuclear fallout shelters for decades, and I think you'll still see us trying to push the LLM thing underground, through duress. We dream of this, so the bear case is wrong in spirit. There's no bear case when the spirit of the thing is that strong.
Wdym all of us? I certainly would find much better usages for the money.
What about reforming democracy? Use the corrupt system to buy the votes, then abolish all laws allowing these kind of donations that allow buying votes.
I'll litigate the hell out of all the oligarchs now that they can't out pay justice.
This would pay off more than a moon shot. I would give a bit of money for the moon shot, why not, but not all of it.
"So, after Rome's all yours you just give it back to the people? Tell me why."
leave dang out of this
I have times when I use an LLM and it’s completely brain dead and can’t handle the simplest questions.
Then other times it blows me away. Even figuring out things that can’t possibly have been in its training data.
I think there are groups of people that have either had all of the first experience or all of the latter. And that’s why we see over optimistic and over pessimistic takes (like this one)
I think the reality is current LLM’s are better than he realizes and even if we plateau I really don’t see how we don’t make more breakthroughs in the next few years.
Regarding "AGI", is there any evidence of true synthetic a priori knowledge from an LLM?
Produce true synthetic a priori knowledge of your own, and ill show you an automated LLM workflow that can arrive at the same outcome without hints.
Build an LLM on a corpus with all documents containing mathematical ideas removed. Not a single one about numbers, geometry, etc. Now figure out how to get it to tell you what the shortest path between two points in space is.
The typical AI economic discussion always focuses on job loss, but that's only half the story. We won't just have corporations firing everyone while AI does all the work - who would buy their products then?
The disruption goes both ways. When AI slashes production costs by 10-100x, what's the value proposition of traditional capital? If you don't need to organize large teams or manage complex operations, the advantage of "being a capitalist" diminishes rapidly.
I'm betting on the rise of independents and small teams. The idea that your local doctor or carpenter needs VC funding or an IPO was always ridiculous. Large corps primarily exist to organize labor and reduce transaction costs.
The interesting question: when both executives and frontline workers have access to the same AI tools, who wins? The manager with an MBA or the person with practical skills and domain expertise? My money's on the latter.
Idk where you live, but in my world "being a capitalist" requires you to own capital. And you know what, AI makes it even better to own capital. Now you have these fancey machines doing stuff for you and you dont even need any annoying workers.
By "capitalist," I'm referring to investors whose primary contribution is capital, not making a political statement about capitalism itself.
Capital is crucial when tools and infrastructure are expensive. Consider publishing: pre-internet, starting a newspaper required massive investment in printing presses, materials, staff, and distribution networks. The web reduced these costs dramatically, allowing established media to cut expenses and focus on content creation. However, this also opened the door for bloggers and digital news startups to compete effectively without the traditional capital requirements. Many legacy media companies are losing this battle.
Unless AI systems remain prohibitively expensive (which seems unlikely given current trends), large corporations will face a similar disruption. When the tools of production become accessible to individuals and small teams, the traditional advantage of having deep pockets diminishes significantly.
> It blows Google out of the water at being Google
That is enough for me.
I sincerely wonder how long that will be true. Google was amazing and didn't have more than small, easily ignorable ads in 1999, and they weren't really tracking you the way they are today, just an all-around better experience than Google delivers today.
I'm not sure that it's a technology difference that makes LLM a better experience than search today, it's that the VC's are still willing to subsidize user experience today, and won't start looking for return on their investment for a few more years. Give OpenAI 10 years to pull all the levers to pay back the VC investment and what will it be like?
They will sell "training data slots". So that when I'm looking for a butter cookie recipe, ChatGPT says I'll have to use 100g of "Brand (TM) Butter" instead of just "Butter".
Ask it how to deploy an app to the cloud and it will insist you need to deploy it to Azure.
These ads would be easily visible though. You can probably sell far more malicious things.
> At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.
Um... I don't think companies are going to perform mass layoffs because "OpenAI said they must happen". If that were to happen it'd be because they are genuinely able to automate a ton of jobs using LLMs, which would be a bull case (not for AGI necessarily, but for the increased usefulness of LLMs)
I don't think LLMs need to be able to genuinely fulfill the duties of a job to replace the human. Think call center workers and insurance reviewers where the point is to meet metrics without regard for the quality of the work performed. The main thing separating those jobs from say, HR (or even programmers) is how much the company cares about the quality of the work. It's not hard to imagine a situation where misguided people try to replace large numbers of federal employees with LLMs, as an entirely hypothetical example.
LLMs seem less hyped than block chains were back in the day
Agreed and unlike blockchain people actually use this product
Some people use blockchain to buy drugs...
Because with blockchain/crypto there was actual money to be made (at the expense of users or not). LLMs are just a money furnace.
>At some point there might be massive layoffs due to ostensibly competent AI labor coming onto the scene, perhaps because OpenAI will start heavily propagandizing that these mass layoffs must happen. It will be an overreaction/mistake. The companies that act on that will crash and burn, and will be outcompeted by companies that didn't do the stupid.
(IMO) Apart from programmer assistance (which is already happening), AI agents will find the most use in secretarial, ghostwriting and customer support roles, which generally have a large labor surplus and won't immediately "crash and burn" companies even if there are failures. Perhaps if it's a new startup or a small, unstable business on shaky grounds this could become a "last straw" kind of a factor, but for traditional corporations with good leeway I don't think just a few mistakes about AI deployment can do too much harm. The potential benefits, on the other hand, far outmatch the risk taken.
I see engineering, not software, but the other technical areas that have the biggest threat. High paid, knowledge based fields, but not reliant on interpersonal communication. Secretarial and customer support less so, they aren't terribly high paid and anything that relies on interacting with people is going to meet a lot of pushback. US based call centers is already a big selling point for a lot of companies and chat bots have been around for years in customer support and people hate them and there's a long way to go to change that perception.
Hmm, I didn’t read the article but from the gist of other comments, we seem to have bought into Sama’s “agents so good, you don’t need developers/engineers/support/secretaries/whatever anymore”. Issue is, it is almost same as claiming, pocket calculators so good, we don’t need accountants anymore, even computers so good, we don’t need accountants anymore. This AI seems to claim to be that motor car moment when horse cart got replaced. But a horse cart got replaced with a Taxi(and they also have unions protecting them!). With AI, all these “to be replaced” people are like accountants, more productive, same as with higher level languages compared to assembly, many new devs are productive. Despite cars replacing the horse carts of the long past, we still fail to have self driving cars and still someone needs to learn to drive that massive hunk of metal, same as whoever plans to deploy LLM to layoff devs must learn to drive those LLMs and know what it is doing.
I believe it is high time we come out this madness and reveal the lies of the marketers and grifters of AI for what it is. If AI can replace anyone, it should begin with doctors, they work with rote knowledge and service based on explicit(though ambiguous) inputs, same as an LLM needs, but I still have doctors and wait for hours on end in the waiting room to get prescribed a cough hard candy only to later comeback again because it was actually covid and my doctor had a brain fart.
Yeah agree 100%. LLMs are overrated. I describe them as the “Jack of all, master of none” of AI. LLMs are that jackass guy we all know who has to chime in to every topic like he knows everything, but in reality he’s a fraud with low self-esteem.
I’ve known a guy since college who now has a PhD in something niche, supposedly pulls a $200k/yr salary. One of our first conversations (in college, circa 2014) was how he had this clever and easy way to mint money- by selling Minecraft servers installed on Raspberry Pis. Some of you will recognize how asinine this idea was and is. For everyone else- back then, Minecraft only ran on x86 CPUs (and I doubt a Pi would make a good Minecraft server today, even if it were economical). He had no idea what he was talking about, he was just spewing shit like he was God’s gift. Actually, the problem wasn’t that he had no idea- it was that he knew a tiny bit- enough to sound smart to an idiot (remind you of anyone?).
That’s an LLM. A jackass with access to Google.
I’ve had great success with SLMs (small language models), and what’s more I don’t need a rack of NVIDIA L40 GPUs to train and use them.
But original MC ran on JVM, which can run on ARM...
My predictions on the matter:
AI has no meaningful input to real world productivity because it is a toy that is never going to become the real thing that every person who has naively bought the AI hype expects it to be. And the end result of all the hype looks almost too predictable similar to how the also once promising crypto & blockchain technology turned out.
I think all these articles begging the question: what's author's credential to claim these things.
Be careful about consuming information from chatters, not doers. There is only knowledge from doing, not from pondering.
I'm generally more skeptical when reading takes and predictions from people working at AI companies, who have a financial interest in making sure the hype train continues.
To make an analogy - most people who will tell you not to invest in cryptocurrency are not blockchain engineers. But does that make their opinion invalid?
Of course I trust people who working on L2 chains to tell me how to scale Bitcoin and people who working on cryptography to walk me through the ETH PoS algorithms.
You cannot lead to truth by learning from people who don't know. People who know can be biased, sure, so the best way to learn is to learn the knowledge, not the "hot-takes" or "predictions".
The crypto people have no coherent story about why crypto is fundamentally earth-shaking more than a story about either gambling or regulatory avoidance, whereas the story for AI, if you believe it, is a second industrial revolution and labor automation where, to at least some small extent, it is undeniable.
> Be careful about consuming information from chatters, not doers
The doers produce a new javascript framework every week, claiming it finally solves all the pains of previous frameworks, whereas the chatters pinpoint all the deficiencies and pain points.
One group has an immensely better track record than the other.
I would listen to people who used the previous frameworks about the deficiencies and pain points, not people who just casually browse the documentation about their high-flying ideas why these have deficiencies and pain points.
One group has an immensely more convincing power to me.
LW isn't a place that cares about credentialism.
He has tons of links for the objective statements. You either accept the interpretation or you don't.
> He has tons of links for the objective statements.
I stopped at this quote
> LLMs still seem as terrible at this as they'd been in the GPT-3.5 age.
This is so plainly, objectively and quantitatively wrong that I need not bother. I get hyperbole, but this isn't it. This shows a doubling-down on biases that the author has, and no amount of proof will change their mind. Not an article / source for me, then.
[dead]
>GPT-5 will be even less of an improvement on GPT-4.5 than GPT-4.5 was on GPT-4. The pattern will continue for GPT-5.5 and GPT-6, the ~1000x and 10000x models they may train by 2029 (if they still have the money by then). Subtle quality-of-life improvements and meaningless benchmark jumps, but nothing paradigm-shifting.
It's easy to spot people who secretly hate LLMs and feel threatened by them these days. GPT-5 will be a unified model, very different from 4o or 4.5. Throwing around numbers related to scaling laws shows a lack of proper research. Look at what DeepSeek accomplished with far fewer resources; their paper is impressive.
I agree that we need more breakthroughs to achieve AGI. However, these models increase productivity, allowing people to focus more on research. The number of highly intelligent people currently working on AI is astounding, considering the number of papers and new developments. In conclusion, we will reach AGI. It's a race with high stakes, and history shows that these types of races don't stop until there is a winner.
What the author is referring to there as GPT-5, GPT-5.5, and GPT-6 are, respectively, "The models that have a pre-training size 10x greater than, 100x greater than, and 1,000x greater than GPT-4.5." He's aware that what OpenAI is going to actually brand as GPT-5 is the router model that will just choose between which other models to actually use, but regards that as a sign that OpenAI agrees that "the model that is 10x the pre-training size of GPT-4.5" won't be that impressive.
It's slightly confusing terminology, but in fairness there is no agreed upon name for the next three orders of magnitude size-ups of pretraining. In any case, it's not the case that the author is confused about what OpenAI intends to brand GPT-5.
It's also easy to spot irrational zealots. Your statement is no more plausible than OP's. No one knows whether we'll achieve AGI, especially since the definition is very blurry.
> In conclusion, we will reach AGI
I'm a little confused by this confidence? Is there more evidence aside from the number of smart people working on it? We have a lot of smart people working on a lot of big problems, that doesn't guarantee a solution nor a timeline.
Some hard problems have remain unsolved in basically every field of human interest for decades/centuries/millennia -- despite the number of intelligent people and/or resources that have been thrown at them.
I really don't understand the level optimism that seems to exist for LLMs. And speculating that people "secretly hate LLMs" and "feel threatened by them" isn't an answer (frankly, when I see arguments that start with attacks like that alarm bells start going off in my head).
I logged in to specifically downvote this comment, because it attacks the OP's position with unjustified and unsubstantiated confidence in the reverse.
> It's easy to spot people who secretly hate LLMs and feel threatened by them these days.
I don't think OP is threatened or hates LLM, if anything, OP is on the position that LLM are so far away from intelligence that it's laughable to consider it threatening.
> In conclusion, we will reach AGI
The same way we "cured" cancer and Alzheimer's, two arguably much more important inventions than a glorified text predictor/energy guzzler. But I like the confidence, it's almost as much as OP's confidence that nothing substantial will happen.
> It's a race with high stakes, and history shows that these types of races don't stop until there is a winner.
So is the existential threat to humanity in the race to phase out fossil fuels/stop global warming, and so far I don't see anyone "winning".
> However, these models increase productivity, allowing people to focus more on research
The same way the invention of the computer, the car, the vacuum cleaner and all the productivity increasing inventions in the last centuries allowed us to idle around, not have a job, and focus on creative things.
> It's easy to spot people who secretly hate LLMs and feel threatened by them these days
It's easy to spot e/acc bros feeling threatened that all the money they sunk into crypto, AI, the metaverse, web3 are gonna go to waste and try to fan the hype around it so they can cash in big. How does that sound?
I appreciate the pushback and acknowledge that my earlier comment might have conveyed too much certainty—skepticism here is justified and healthy.
However, I'd like to clarify why optimism regarding AGI isn't merely wishful thinking. Historical parallels such as heavier-than-air flight, Go, and protein folding illustrate how sustained incremental progress combined with competition can result in surprising breakthroughs, even where previous efforts had stalled or skepticism seemed warranted. AI isn't just a theoretical endeavor; we've seen consistent and measurable improvements year after year, as evidenced by Stanford's AI Index reports and emergent capabilities observed at larger scales.
It's true that smart people alone don't guarantee success. But the continuous feedback loop in AI research—where incremental progress feeds directly into further research—makes it fundamentally different from fields characterized by static or singular breakthroughs. While AGI remains ambitious and timelines uncertain, the unprecedented investment, diversity of research approaches, and absence of known theoretical barriers suggest the odds of achieving significant progress (even short of full AGI) remain strong.
To clarify, my confidence isn't about exact timelines or certainty of immediate success. Instead, it's based on historical lessons, current research dynamics, and the demonstrated trajectory of AI advancements. Skepticism is valuable and necessary, but history teaches us to stay open to possibilities that seem improbable until they become reality.
P.S. I apologize if my comment particularly triggered you and compelled you to log in and downvote. I am always open to debate, and I admit again that I started too strongly.
I am with you that when smart people combine their efforts together and build on previous research + learnings, nothing is impossible.
I started the conversation off on the wrong foot. Commenting with “ad hominem” shuts down open discussion.
I hope we can have a nice talk in future conversations.