Category: AI LLMs Agents
Difficulty: Easy

#123 Why This Is the Decade of Agents (Not the Year) 🧠🚀

We keep hearing that this is “the year of AI agents.” I don’t buy it. If you look a bit closer at where we actually are, it feels much more honest to say: this will be the decade of agents. They’re already useful and impressive—but they’re also obviously incomplete. In this post I’ll unpack why, what’s missing, how we got here, and what it means if you’re building with (or around) agents today. 🧠🤖


Year vs. Decade: Why the timeline matters

“Year of agents” is a vibes-based statement. It’s like calling 2007 “the year of smartphones” because the first iPhone shipped.

You could say that. But if you’d tried to run your entire business, social life, creativity, and payments on a 2007-era mobile stack, you’d be in tears.

That’s roughly where we are with agents:

  • They’re real and already useful.
  • They’re nowhere near the end state implied by “this is the year”.

Ask yourself this simple question:

“Would I hire an AI agent today the way I’d hire a junior employee and give it a real chunk of responsibility?”

For most serious workflows, the honest answer is still no.

The “decade of agents” suggests something much bigger:

  • Agents with persistent identity and memory
  • That operate across tools, UIs, and modalities
  • That learn from experience over weeks/months
  • That can be trusted with real, irreversible actions
  • That you can actually put “on the org chart” in some sense

That’s not a quarterly release away. That’s years of engineering and research.


A short history of how we got here

We didn’t start with agents. Roughly, the last ~15 years of modern AI went something like this:

1. Per-task deep learning (the “little islands” era)

  • Image classifiers
  • Machine translation systems
  • Speech recognizers
  • Each model: one job, trained from scratch

This was like inventing highly specialized little brains. Powerful, but isolated.

2. Early agents in games (the “Atari optimism” era)

Then came deep RL success stories:

  • Atari agents learning to play dozens of games
  • Go-playing systems and other game benchmarks

This created a wave of optimism:

“If we can train agents to master games, we’ll just scale that to real life!”

The problem: games are too clean. The world is not a reward-shaped Atari screen. Real tasks are messy, sparse, multi-objective, full of social and legal constraints.

3. LLMs and foundation models (the “representation first” era)

The big unlock was realizing:

You need a huge, general world model before you try to stick an “agent loop” on top.

LLMs are basically:

  • Massive pattern machines trained on internet-scale data
  • Next-token predictors that accidentally learned:

    • In-context learning
    • Rough reasoning
    • Tool-use via text APIs
    • Basic planning traces

Now we’re bolting agents onto these foundation models:

  • Tool-calling
  • Browsing
  • Code editing
  • Simple computer use

But in some sense, we tried for agents too early (games, web RL), realized we were missing representation power, and swung back to LLMs.

The 2020s are the decade where those two worlds finally merge properly.


What real “agentic” capability will require

Let’s zoom in on the big missing ingredients.

1. Continual learning (agents that don’t forget you exist)

Today’s agents are basically stateless:

  • Each session: fresh brain, same weights
  • “Memory” is usually:

    • A vector store
    • Manual notes
    • Some clunky RAG hack

You can’t treat them like a colleague who:

  • Remembers your preferences
  • Knows your long-term projects
  • Adapts to your style
  • Quietly improves over time

True agents will need:

  • Stable personal memory

    • Not just “recall this doc”
    • But “I now behave differently because of what we did last week”
  • Consolidation mechanisms

    • Like sleep in humans
    • Compress recent interactions into durable internal changes
  • Safety-aware adaptation

    • You don’t want a model fine-tuning itself into madness
    • Updates need to be constrained, audited, reversible

Until we have some version of that, agents will feel like really smart amnesiacs.
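
To make the consolidation idea concrete, here's a minimal sketch of what a "sleep" step might look like. Everything here is illustrative: `AgentMemory` is a toy class, the `summarize` argument stands in for a model call, and the audit log exists because updates should be reviewable and reversible.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentMemory:
    """Toy memory: raw episodic buffer, consolidated notes, and an audit trail."""
    episodic: list = field(default_factory=list)   # raw recent interactions
    notes: list = field(default_factory=list)      # durable, consolidated memory
    audit_log: list = field(default_factory=list)  # so updates can be reviewed or reverted

    def record(self, interaction: str) -> None:
        self.episodic.append(interaction)

    def consolidate(self, summarize) -> None:
        """The "sleep" step: compress recent interactions into one durable note.

        `summarize` stands in for a model call; any function mapping a list of
        strings to a single string works here.
        """
        if not self.episodic:
            return
        note = summarize(self.episodic)
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "added_note": note,
            "source_count": len(self.episodic),
        })
        self.notes.append(note)
        self.episodic.clear()  # the raw buffer is flushed once it has been consolidated


memory = AgentMemory()
memory.record("User prefers concise answers.")
memory.record("User's main project is the Q3 billing migration.")
memory.consolidate(lambda items: " / ".join(items))
print(memory.notes)
```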


2. Multimodal understanding that’s operational, not just “cool”

Models can now read and describe images, PDFs, charts, and screenshots. Great.

But operationally, a working agent should be able to:

  • Understand your real tools:

    • Admin dashboards
    • Email threads
    • Monitoring graphs
    • Figma screens
  • Understand what matters in them:

    • “This error rate is spiking”
    • “This row violates a constraint”
    • “This UI is in a broken state”
  • Take appropriate action based on that understanding

Right now, most agents are:

  • Over-confident when they’re wrong
  • Brittle to small UI or layout changes
  • Easily confused by cluttered, messy real-world screens

The models are getting better, but “see + act” in real tools is still early.
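
To make "operational" concrete, here's a toy sketch of just the decision layer. It assumes some perception step (a vision model, OCR, or an API) has already turned the dashboard into numbers; the thresholds, field names, and proposed actions are all made up.

```python
from dataclasses import dataclass


@dataclass
class DashboardReading:
    """Assumed output of some upstream perception step (vision model, OCR, or API)."""
    error_rate: float            # fraction of failed requests right now
    baseline_error_rate: float   # what "normal" looks like


@dataclass
class Assessment:
    severity: str      # "ok" | "warning" | "critical"
    action: str        # what the agent proposes to do next
    needs_human: bool  # escalate instead of acting when the stakes are high


def assess(reading: DashboardReading) -> Assessment:
    """Toy rule: detect a spike and decide whether to act or to escalate."""
    ratio = reading.error_rate / max(reading.baseline_error_rate, 1e-9)
    if ratio > 5:
        return Assessment("critical", "page on-call and open an incident", needs_human=True)
    if ratio > 2:
        return Assessment("warning", "draft an alert summary for review", needs_human=True)
    return Assessment("ok", "no action", needs_human=False)


print(assess(DashboardReading(error_rate=0.08, baseline_error_rate=0.01)))
```

The interesting part isn't the thresholds; it's that "understanding what matters" has to end in a structured judgement the rest of the system can act on, including an explicit "this needs a human" bit.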


3. Real computer use, not just “call this tool”

Tool-calling is a massive step forward. But it’s also very curated:

  • Neatly typed arguments
  • Clean JSON schemas
  • Highly abstracted functions

Real knowledge work is not that neat.

A serious “digital worker” agent needs to:

  • Navigate UIs like a human:

    • Sign in, handle 2FA, dismiss popups
    • Scroll, click, drag, resize
  • Deal with weirdness:

    • Half-loaded pages
    • Stuck spinners
    • Partially-saved forms
  • Combine it all with tools:

    • Mix “low-level” UI actions with “high-level” APIs

This is hard because:

  • The state space is huge
  • Rewards are sparse and delayed
  • UX changes constantly

But if you want “agents that can really work,” this is the hill to climb.
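
Here's a rough sketch of what mixing low-level UI actions with high-level API calls might look like, including retries for the usual weirdness. The two primitives at the bottom are hypothetical stand-ins; a real agent would wrap a browser driver and an HTTP client.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    describe: str
    run: Callable[[], bool]  # returns True on success
    retries: int = 2         # real UIs fail transiently: spinners, half-loaded pages


def run_plan(steps) -> bool:
    """Execute a mixed plan of low-level UI actions and high-level API calls."""
    for step in steps:
        for attempt in range(step.retries + 1):
            if step.run():
                break
            time.sleep(0.5 * (attempt + 1))  # back off, then retry the flaky step
        else:
            print(f"gave up on: {step.describe}")
            return False
    return True


# Hypothetical primitives; real implementations would drive a browser and an HTTP client.
def click_export_button() -> bool:
    print("UI: click #export")
    return True


def fetch_report_via_api() -> bool:
    print("API: GET /reports/latest")
    return True


run_plan([
    Step("click the export button", click_export_button),
    Step("fetch the generated report", fetch_report_via_api),
])
```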


4. Better learning signals than “one reward at the end”

Current RL approaches for LLMs often do something like:

  1. Roll out a whole answer / chain-of-thought / program
  2. Check how good it is (maybe via a judge model)
  3. Push all the tokens slightly up or down

This is… not great. It’s like grading a novel with:

“7/10. Do more of… whatever you did.”

Humans do something more like:

  • “This step was a good idea.”
  • “These three steps were redundant.”
  • “Here is where I went off the rails.”

We need richer learning mechanisms:

  • Process-based feedback

    • Grade intermediate reasoning steps, not just the final answer
  • Self-reflection & review

    • Models generating critiques of their own attempts
    • Then training on those critiques in a controlled way
  • Robust reward models

    • That aren’t trivial to exploit with adversarial nonsense (e.g., gibberish like “dhdhdhdh” getting full marks)

Until those improve, “agentic RL” will feel noisy, fragile, and limited.
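
A toy illustration of the gap, with made-up step names and scores: outcome-based training smears one scalar over the whole trajectory, while process-based feedback points at the specific step that went wrong.

```python
# Outcome-only credit assignment vs. per-step ("process") feedback, in toy form.
steps = [
    "restate the problem",
    "pick an approach",
    "redundant detour",
    "compute the answer",
]

# Outcome-based: one scalar at the end, smeared uniformly over every step.
outcome_score = 0.7
outcome_credit = {step: outcome_score for step in steps}

# Process-based: a judge (human or model) rates each step on its own merits.
process_credit = {
    "restate the problem": 1.0,
    "pick an approach": 0.9,
    "redundant detour": 0.1,   # now the learning signal points at the actual problem
    "compute the answer": 0.8,
}

for step in steps:
    print(f"{step:22s} outcome={outcome_credit[step]:.1f}  process={process_credit[step]:.1f}")
```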


5. Separating “cognitive core” from “encyclopedia”

Current LLMs are jammed full of:

  • World knowledge
  • Domain trivia
  • Low-level text patterns

Plus some emergent reasoning ability.

There’s a strong argument that we want to decouple those:

  • A smaller, sharper cognitive core that:

    • Knows how to reason, learn, search, decompose
    • Is good at “thinking” under uncertainty
  • Connected to external knowledge systems:

    • Databases, APIs, documents
    • Search engines, codebases, wikis

Why?

  • We don’t want the model hallucinating answers it is unsure about.
  • We want it to know that it doesn’t know and go look things up (a toy sketch of that routing follows this list).
  • We want memory to be updatable without retraining the entire brain.
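
As a toy sketch of that "know that it doesn't know" routing (the confidence score fakes a real model, and the dictionary fakes a real retrieval system):

```python
# A small "core" that answers only when confident and otherwise defers to an
# external knowledge store instead of guessing.

KNOWLEDGE_STORE = {
    "capital of australia": "Canberra",
}


def core_answer(question: str):
    """Pretend cognitive core: returns (answer, confidence)."""
    if "2 + 2" in question:
        return "4", 0.99   # the kind of reasoning the core can do on its own
    return None, 0.2       # trivia it should not try to recall from its weights


def answer(question: str) -> str:
    guess, confidence = core_answer(question)
    if guess is not None and confidence > 0.9:
        return guess
    # "I know that I don't know" -> look it up rather than hallucinate.
    return KNOWLEDGE_STORE.get(question.lower(), "I don't know; needs a better lookup.")


print(answer("2 + 2"))                  # answered by the core
print(answer("capital of australia"))   # answered by the knowledge store
```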

The decade of agents might be the decade where:

  • Big monolithic LLMs slowly break apart into:

    • Cognitive engines
    • Knowledge stores
    • Tool ecosystems
  • And agents become the orchestrators of that whole cluster.


Multi-agent systems: culture, self-play, and “AI societies”

Humans didn’t become powerful by being individually smart. We became powerful by:

  • Sharing knowledge (culture, books, the internet)
  • Specializing (division of labor)
  • Competing & cooperating (markets, research communities)
  • Building institutions (companies, universities, governments)

Most agent work today is single-player: “one agent, one user, one environment”.

The truly interesting frontier is:

Agents with culture

  • Shared memories, playbooks, and best practices
  • Agents writing tools, docs, and tutorials for other agents
  • Long-lived shared artifacts that outlast any single instance

Agents in self-play

  • Creating problems and challenges for each other
  • Adversarial training, but at the task/skill level
  • Competitive & cooperative dynamics that ratchet up capabilities

Agent organizations

  • Clusters of specialized agents:

    • Research agent
    • Ops agent
    • “Manager” / planner agent
  • Coordinating and delegating work
  • With humans in the loop as:

    • Product owners
    • Ethics & safety oversight
    • Tie-breakers and goal-setters

We’re barely scratching the surface here.
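
Here's a deliberately tiny sketch of the org-chart idea: a planner decomposes a goal, specialists handle the sub-tasks, and a human signs off before results count. All the agent behaviors are placeholder functions, not real models.

```python
def research_agent(task: str) -> str:
    return f"[research] notes on: {task}"


def ops_agent(task: str) -> str:
    return f"[ops] runbook drafted for: {task}"


SPECIALISTS = {"research": research_agent, "ops": ops_agent}


def planner(goal: str):
    """Decompose a goal into (specialist, sub-task) pairs."""
    return [
        ("research", f"gather context for: {goal}"),
        ("ops", f"prepare rollout steps for: {goal}"),
    ]


def run(goal: str, human_approves):
    results = []
    for role, sub_task in planner(goal):
        output = SPECIALISTS[role](sub_task)
        if human_approves(output):  # humans stay in the loop as overseers and tie-breakers
            results.append(output)
    return results


print(run("migrate billing to the new provider", human_approves=lambda text: True))
```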


The analogy with self-driving: demos vs. reality

Self-driving is a useful metaphor for agents:

  • Impressive demos came early.
  • Real, scaled, trustworthy deployment is slow.
  • Every extra “9” of reliability costs roughly the same large chunk of painful work.

With agents:

  • We already have nice demos (video edits, code refactors, web automation).
  • But:

    • Environments are adversarial and changing.
    • Costs of certain failures are high (security, safety, money).
    • Legal + social + UX constraints matter.

If self-driving taught us anything, it’s this:

“Wow, cool demo” is the beginning, not the end.

Expect:

  • A march of nines, where:

    • Going from 90% → 99% takes a lot of work
    • 99% → 99.9% is another full slog
    • And so on…
  • A long period where:

    • Agents work most of the time
    • Quiet human guardians clean up the remaining failures

This is very compatible with “decade of agents.”
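
The arithmetic behind the march of nines is what makes it a decade-scale grind: with per-step success probability p, a task that needs n independent steps succeeds with probability p**n.

```python
# Per-step reliability compounds badly over long workflows.
for p in (0.90, 0.99, 0.999):
    for n in (10, 50):
        print(f"p={p:.3f}, steps={n:3d} -> task success ≈ {p**n:.1%}")

# Even "99% reliable" steps finish a 50-step workflow only ~60% of the time,
# which is exactly why each extra nine is worth so much effort.
```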


What this means if you’re building now

All of this long-term talk is nice, but what do you do this year?

1. Use agents as power tools, not replacements

Treat agents like:

  • Very fast, sometimes-brilliant interns
  • That:

    • Make silly mistakes
    • Lie confidently sometimes
    • Need guardrails and review

Design around that (a small sketch follows this list):

  • Keep humans in high-leverage review positions
  • Use agents to:

    • Draft
    • Explore
    • Suggest
    • Execute repetitive stuff
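
One way to encode that, as a toy sketch (the action names are illustrative): let the agent run reversible actions freely, and gate anything irreversible behind a human.

```python
REVERSIBLE = {"draft_email", "summarize_thread", "propose_refactor"}
IRREVERSIBLE = {"send_email", "merge_pr", "issue_refund"}


def execute(action: str, agent_do, ask_human) -> str:
    """Run reversible actions directly; require human sign-off for the rest."""
    if action in REVERSIBLE:
        return agent_do(action)                    # safe to let the intern run with it
    if action in IRREVERSIBLE and ask_human(action):
        return agent_do(action)                    # only after an explicit yes
    return f"blocked: {action!r} needs human review"


def do(action: str) -> str:
    return f"done: {action}"


print(execute("draft_email", agent_do=do, ask_human=lambda a: False))   # runs
print(execute("issue_refund", agent_do=do, ask_human=lambda a: False))  # blocked
```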

2. Build infra, not just a flashy demo ✨

Longer-term value will be in:

  • Evaluation + monitoring
  • Safety and guardrails
  • Memory and identity systems
  • Good feedback loops (what went well / badly)
  • Hybrid human–agent workflows

Those things will survive the hype cycles.
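
As a sketch of the evaluation-and-monitoring piece, using nothing beyond the standard library: run the agent over a fixed set of cases, log full transcripts so failures can be replayed, and track the pass rate over time. The cases, file name, and trivial stand-in agent are all made up.

```python
import json
from datetime import datetime, timezone

CASES = [
    {"input": "refund order #123 if it is under $50", "must_contain": "refund"},
    {"input": "summarize this week's incidents", "must_contain": "incident"},
]


def evaluate(agent, cases=CASES, log_path="agent_evals.jsonl"):
    """Run `agent` (any callable) over the cases; return the pass rate."""
    passed = 0
    with open(log_path, "a") as log:
        for case in cases:
            output = agent(case["input"])
            ok = case["must_contain"] in output.lower()
            passed += ok
            # Keep a transcript so every failure can be replayed and diagnosed later.
            log.write(json.dumps({
                "time": datetime.now(timezone.utc).isoformat(),
                "input": case["input"],
                "output": output,
                "passed": ok,
            }) + "\n")
    return passed / len(cases)


print(evaluate(lambda prompt: f"Drafted a response about: {prompt}"))
```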

3. Optimize for learning, not for clairvoyant prediction

We don’t know the exact shape of the agent stack in 5 years.

So:

  • Don’t over-fit your system to one particular vendor or API shape.
  • Invest in:

    • Clear abstractions
    • Loose coupling between “brains” and “tools”
    • Observability so you can see what’s going wrong
  • Design your product so agents can be upgraded or swapped without rewriting the world (a rough interface sketch follows).
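
For instance, a thin "Brain" boundary (names illustrative, not a real SDK) keeps product logic independent of any one vendor, so swapping the model is a one-line change:

```python
from typing import Protocol


class Brain(Protocol):
    def complete(self, prompt: str) -> str: ...


class VendorABrain:
    def complete(self, prompt: str) -> str:
        return f"[vendor A] {prompt[:40]}..."     # real version: call vendor A's API


class LocalModelBrain:
    def complete(self, prompt: str) -> str:
        return f"[local model] {prompt[:40]}..."  # real version: call a local runtime


def triage_ticket(brain: Brain, ticket: str) -> str:
    # Product logic only ever sees the Brain protocol, never a vendor SDK.
    return brain.complete(f"Classify and summarize this ticket: {ticket}")


print(triage_ticket(VendorABrain(), "Payment page times out for EU users"))
print(triage_ticket(LocalModelBrain(), "Payment page times out for EU users"))
```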


The human side: agents and the future of learning 🎓

One more crucial angle: agents change what education should look like.

If agents get better at:

  • Automating routine work
  • Performing knowledge tasks
  • Acting as junior employees

Then humans need to:

  • Climb up the abstraction ladder
  • Focus on:

    • Problem selection
    • Goal setting
    • Strategy and intuition
    • Taste, values, and judgment

Education has to shift from:

  • “Here’s a pile of facts, remember them”
  • Toward:

    • Deep understanding
    • Model-building
    • Iterative practice
    • Tool-using fluency

Two key visions here:

Before AGI: upskilling as survival

In the near term:

  • Learning AI, stats, systems, etc. is economically important.
  • We need better “ramps to knowledge”:

    • Well-designed courses
    • Good capstone projects
    • AI-augmented feedback loops

A serious AI course today shouldn’t just be:

“Here’s how to prompt ChatGPT.”

It should look more like:

“Here’s how to train, evaluate, and deploy your own small model or agent end-to-end.”

That kind of hands-on literacy gives you agency in the agentic decade.

After AGI: learning as flourishing

Longer term, if a lot of economically necessary work is automated, learning becomes:

  • Less about survival
  • More about meaning, fun, and growth

Think of:

  • People going to the gym even though machines can do all the heavy lifting.
  • People learning instruments even though you can stream any song.

We could have:

  • Personal AI tutors as good as the best human tutors
  • Learning experiences that:

    • Diagnose your current mental model
    • Serve you exactly-right challenges
    • Never leave you bored or overwhelmed

If that happens, education turns into:

The cognitive equivalent of fitness culture. 🏋️‍♀️🧠 Not required—but deeply rewarding.

That’s a future worth steering toward.


So: why the decade of agents?

Summing it up:

  • We already have impressive prototypes.
  • We’re missing:

    • Continual learning
    • Robust multimodal computer use
    • Richer RL & feedback
    • Decent separation of “cognitive core” vs. “knowledge”
    • Multi-agent culture & self-play
    • A serious agent/human co-working stack

Those are decade-scale problems, not “Q4 roadmap” items.

Calling this the year of agents creates unrealistic expectations:

  • Over-promising to users & customers
  • Sloppy deployments
  • Backlash when things fail in serious ways

Calling it the decade of agents does something healthier:

  • Acknowledges the progress
  • Leaves room for the grind
  • Reminds us this is a long game

And for builders, that’s good news:

There is still a ton of open space to do meaningful, foundational work.

If we do it right, the decade of agents won’t just be about armies of bots clicking around screens. It’ll be about:

  • Better tools
  • Better learning
  • Better collaboration
  • And, hopefully, better humans on the other side of all that automation. ✨
Written on November 25, 2025