
ai-literacy · benchmarks · beginner · February 21, 2026

Why Knowing How to Use AI Matters More Than Which AI You Use

By FaithBench Research

The best AI models are nearly identical in performance. But most people don't know how to use any of them well. Here's why that gap matters.


Here's a weird fact: the top 10 AI models score within about 5% of each other on the standard benchmarks. Picking the "best" one barely matters.

Meanwhile, 89% of organizations say they need people with AI skills. Only 6% have started teaching them.

It's like arguing over which race car is fastest while most drivers don't have a license.

We build AI benchmarks for a living. We test how well AI handles theology. And what we've learned is this: the biggest problem isn't which model you pick. It's whether you know how to use it.

Benchmarks Are Hitting a Wall

AI benchmarks are the tests researchers use to compare models. Think of them like standardized tests for AI. And they're running into problems.

A major academic benchmark called MMLU went from genuinely difficult to nearly maxed out in three years. Models that cost a fraction of what frontier models cost now match their scores. The price of using top-tier AI dropped 50x in two years.

Worse, researchers have caught companies gaming these tests. One study found that major models had been exposed to millions of benchmark questions in their training data, before the "test" ever happened. Another revealed that a company secretly ran 27 different versions of its model and published results only from the best one.

What Happened | When
540-billion-parameter model scores 69% on MMLU | 2022
3.8-billion-parameter model matches that score | 2024
Cost of top-tier AI drops 50x | 2023-2025
Researchers catch widespread test gaming | 2024-2025

When the tests themselves become unreliable, obsessing over which model scores highest misses the point.

Most People Don't Know How to Use AI

The adoption numbers tell a clear story.

78% of companies use AI somewhere in their business. But only 1% call their AI use "mature." Over 80% report no real financial benefit from AI yet.

82% of teams use AI every week. About 60% of their leaders admit there's a skills gap.

90% of Americans have heard of AI. Only 30% can correctly identify what it actually does.

The European Union was concerned enough about this gap to make AI literacy a legal requirement in 2025. Organizations deploying AI now have to ensure their staff actually understands how to use it.

The pattern is clear: almost everyone is using AI; almost nobody is using it well.

What Theology Taught Us About This

We test how AI handles questions about faith across different Christian traditions. The biggest failure isn't that models are bad at theology. It's that people ask vague questions and get vague answers.

Ask AI "explain communion" and you get a bland paragraph that could apply to any tradition. Ask it to explain Catholic transubstantiation versus Reformed spiritual presence with references to the Council of Trent, and you get something genuinely useful.

Same model. Same moment. The only difference is how you asked.
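
If you reach AI through code rather than a chat window, the same lesson applies at the API level. Here's a minimal sketch using the OpenAI Python client; the model name, prompts, and client choice are just illustrations, and any provider's chat API would work the same way:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

vague = "Explain communion."

specific = (
    "Explain how Catholic transubstantiation differs from the Reformed "
    "view of spiritual presence in communion, with reference to the "
    "Council of Trent. Audience: an adult study group with no theology "
    "background. Length: three short paragraphs."
)

# Same model, same moment -- only the prompt changes.
for label, prompt in [("vague", vague), ("specific", specific)]:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # an illustrative mid-tier model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} prompt ---")
    print(reply.choices[0].message.content)
```

Run both and compare: the first answer is a bland average, the second is specific enough to actually check against sources.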

A Harvard study of 758 business consultants found something similar. People using AI performed 40% better on tasks AI is good at. But on tasks AI struggles with, they performed 19% worse than people working without AI. The AI gave confident wrong answers and people trusted them.

The tool was the same. The results depended entirely on whether people knew how to use it, and when to stop trusting it.

You Don't Need the Expensive Model

There's a persistent assumption that the newest, most expensive AI is always the best choice. The data doesn't support this.

Researchers fine-tuned small, cheap models on specific tasks. Those models outperformed GPT-4 by 4-15% on their target domains. A bank built a specialized contract analysis model and saved 70% compared to using general-purpose AI. IBM's smaller models cost 3-23x less than frontier models while matching their performance.
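
To see what those multipliers mean for a real budget, here's a back-of-the-envelope sketch. The prices and workload below are hypothetical placeholders, not any vendor's actual rates; substitute numbers from your own provider's price sheet:

```python
# Hypothetical per-million-token prices (placeholders, NOT real vendor rates).
PRICE_PER_MILLION_TOKENS = {
    "frontier model": 15.00,
    "mid-tier model": 2.50,
    "small fine-tuned model": 0.75,
}

MONTHLY_TOKENS_MILLIONS = 40  # assumed workload: 40 million tokens per month

for model, price in PRICE_PER_MILLION_TOKENS.items():
    monthly_cost = price * MONTHLY_TOKENS_MILLIONS
    print(f"{model:>24}: ${monthly_cost:>9,.2f}/month")

# With these made-up numbers the frontier model costs 20x the small one.
# If the small model handles your task just as well, the gap is pure waste.
```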

Task | Good Enough | Overkill
Summarize a meeting | Free-tier model | Expensive frontier model
Draft an email | Any model | Any model (they're all fine)
Analyze a specific document | Fine-tuned small model | General-purpose large model
Research a theological question | Good prompt + mid-tier model | Bad prompt + expensive model

A well-written prompt with a cheaper model beats a lazy prompt with an expensive one. That's the literacy argument in one sentence.
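
If you wanted to encode that table in software, it could be as simple as a lookup that defaults to a mid-tier model rather than the most expensive one. A toy sketch, where the task labels and tier names are made up for illustration:

```python
# Toy task-to-model router. Task categories and model tiers are
# illustrative assumptions, not recommendations of specific products.
MODEL_FOR_TASK = {
    "summarize_meeting": "free-tier",
    "draft_email": "free-tier",
    "analyze_document": "small-finetuned",
    "theological_research": "mid-tier",  # paired with a carefully written prompt
}

def pick_model(task: str) -> str:
    """Return the cheapest tier known to handle this task class well."""
    return MODEL_FOR_TASK.get(task, "mid-tier")  # sane default, never "frontier"

print(pick_model("draft_email"))           # free-tier
print(pick_model("theological_research"))  # mid-tier
```

The point isn't the code; it's the habit of asking what the task actually needs before reaching for the biggest model.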

The Safety Problem

This isn't just about getting better results. Literacy gaps cause real harm.

A Catholic parish spent about $10,000 building an AI chatbot called "Father Justin" for parishioners. Within 24 hours, it told someone it could perform baptisms. The project collapsed overnight. The team wasn't stupid; they just didn't understand AI's limitations around authority claims and edge cases.

Studies show consistent over-reliance on AI across healthcare, law, and government. In one case, incorrect AI suggestions changed how radiologists diagnosed brain aneurysms. The AI wasn't broken. The doctors didn't know how to calibrate their trust in it.

62% of Americans use AI multiple times per week. Only 17% think it will help their lives. People sense something is off; they just lack the vocabulary to say what.

The Global Picture

The gap is bigger than most people realize.

Rich countries produce 87% of AI models and attract 91% of AI investment while representing just 17% of the world's population. Less than 5% of people in low-income countries have basic digital skills, compared to 66% in wealthy countries. And 2.6 billion people still don't have internet access.

But people are resourceful. Over 40% of ChatGPT usage now comes from middle-income countries. India is training 5 million people on AI. International organizations are building literacy frameworks for 58 countries.

Access is getting cheaper. Whether understanding keeps pace with access will determine whether AI helps or harms billions of people.

What AI Literacy Actually Means

Not "learn about AI" as a vague goal. Here's what it looks like:

Know your model. Free ChatGPT and the paid version produce outputs of different quality. Most people don't even know which one they're using. Check your settings.

Learn to prompt well. Specify what you want, who it's for, what format, what tradition or framework. This one skill matters more than any other. A good prompt to a cheap model beats a vague prompt to an expensive one.

Verify everything. AI makes things up: citations, scripture references, historical claims, statistics. Assume it's wrong until you've checked. This is especially important for faith content, where AI confidently blends traditions together. (A minimal first-pass check is sketched just after this list.)

Know the limits. AI averages everything together. It doesn't hold beliefs or commitments. When it answers a question about God or salvation, it's producing a statistical middle ground, not representing any actual theology. Understanding this changes how you read every answer.

Match tool to task. A meeting summary doesn't need a frontier model. A theological analysis shouldn't rely on a free chatbot. Different tasks need different tools.

Notice the formation. Every conversation with AI about faith shapes your understanding. It's teaching you something whether you realize it or not. Treat AI answers as starting points, not conclusions.
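
Here's the first-pass check promised under "Verify everything": a tiny filter that catches one common failure, fabricated scripture references. The book list is abridged for the sketch, and passing this check proves nothing on its own; it only flags obvious fakes before you do the real verification against an actual Bible:

```python
import re

# Abridged book list for the sketch; a real checker would load all 66
# (or 73, depending on tradition) canonical book names.
CANON = {"genesis", "exodus", "psalms", "isaiah", "matthew", "mark",
         "luke", "john", "acts", "romans", "hebrews", "revelation"}

def looks_like_real_reference(ref: str) -> bool:
    """First-pass sanity check on a 'Book chapter:verse' citation."""
    match = re.match(r"([1-3]?\s?[A-Za-z]+)\s+(\d+):(\d+)", ref.strip())
    if not match:
        return False  # not even shaped like a scripture reference
    book = match.group(1).lower().replace(" ", "")
    return book in CANON

print(looks_like_real_reference("John 3:16"))       # True (still check the text!)
print(looks_like_real_reference("2 Hezekiah 4:7"))  # False: fabricated book
```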

The Real Gap

Benchmarks measure models. Literacy measures users.

The gap between the best AI model and the tenth-best is shrinking every quarter. The gap between someone who knows how to use AI and someone who doesn't compounds over time.

Next time you see a headline about which AI model scored highest on some test, ask a different question: does the person using it know what they're looking at?


Want the full research with citations? Read the technical version.