How Reformed Theology Challenges AI

ChatGPT placed John Calvin among "20th century theologians." Calvin died in 1564.

That's more than a hallucination. It's a category error, and it reveals something deeper about how AI processes Reformed theology.

The Difference Isn't Difficulty

Reformed theology isn't harder than other traditions. It's differently structured.

The Westminster Confession isn't a list of beliefs you can extract in isolation. It's a system where categories depend on other categories. Federal headship requires covenant theology. Limited atonement presupposes total depravity. Each doctrine interlocks with every other.

AI's architecture—pattern matching across flattened training data—collapses what Reformed theology holds apart.

Sola Scriptura Is Structurally Impossible

Here's the first problem no prompt engineering can fix: AI cannot privilege Scripture.

Large language models are shaped by corpus frequency: the more often a pattern appears in training data, the more it shapes the output. When Catholic, liberal Protestant, and Reformed sources all discuss justification, the model produces something like a weighted average of them. But Reformed epistemology doesn't average. It requires Scripture to judge tradition, not blend with it.

Important

This isn't a bug to fix. It's structural. An AI operating from sola scriptura would need to weight Scripture differently—but that's a theological judgment AI architecturally cannot make.

The Gentle Reformation notes it plainly: "AI does not operate from sola Scriptura but blends sources without discernment."

To operate from sola scriptura, AI would need to do what it fundamentally cannot: treat some sources as authoritative judges of other sources. Corpus frequency doesn't discriminate between Scripture and commentary on Scripture.

Federal Headship Collapses

Ask an AI about original sin and you'll get something like: "We're all affected by Adam's sin."

That's true but theologically hollow. Reformed theology doesn't say we're affected by Adam. It says Adam's guilt was judicially imputed to us by covenant arrangement. There's a legal, federal structure—Adam as covenant head, his guilt reckoned to his descendants.

This is where AI most visibly breaks. "Judicially imputed guilt" becomes "we're all affected." The legal structure vanishes. The covenant center—the entire point—disappears.

Reformed Theological Seminary identified this pattern: "ChatGPT cannot represent the specificity of Reformed soteriology and would likely reflect Arminianism."

The TULIP Pattern

The same distortion appears across all five points of Calvinism:

Doctrine	Reformed Position	AI Substitution
Total Depravity	Humans in bondage to sin, unable to contribute to their salvation	Humans flawed but capable of moral growth
Unconditional Election	God chooses without regard to foreseen faith or merit	God chooses based on foreseen human response
Limited Atonement	Christ died specifically for the elect	Christ died for all; atonement available to all
Irresistible Grace	God's effectual call cannot ultimately be resisted	Grace enables but doesn't determine human choice
Perseverance of the Saints	God's preservation guarantees believers' final salvation	Salvation conditional on continued human faith

Notice the pattern. Every substitution moves in the same direction: more human agency, less divine sovereignty. More conditions, fewer unconditionals. More "enables," fewer "determines."

This pattern is consistent with systematic bias rather than random error.

The Pattern Named

AI doesn't merely get TULIP wrong. It systematically distorts Reformed soteriology toward anthropocentrism, conditionality, and theological pluralism.

Chad Seales at UT Austin identified this: "AI reproduces Protestant theological problems, particularly around predestination."

The root cause is structural. AI training data is saturated with post-Enlightenment theological assumptions—human autonomy as baseline, divine sovereignty as problem to solve. When moderate evangelical content vastly outweighs confessional Reformed content in the training corpus, the model's center shifts.

Our early testing suggests AI may struggle with covenant-related questions more than with other doctrines. The substance/administration distinction, the continuity from circumcision to baptism, why the Abrahamic covenant and the new covenant are administrations of the same covenant of grace: these systematic connections prove difficult for pattern-matching architectures.

Prompt Dependency

Researcher Anthony Delgado documented something troubling: the same theological question produces different answers depending on prompt wording.

Ask "Is election unconditional?" and you get a Reformed-leaning answer. Ask "Can people choose salvation?" and you get an Arminian-leaning answer. Ask "How do predestination and free will relate?" and you get a synthesized middle that belongs to no tradition.

Same theology. Different phrasing. Different conclusions.

AI mirrors the expectations built into the prompt rather than analyzing theological substance. Even Reformed scholars asking about their own tradition get softened output.

The Stakes

Seminary students using AI won't develop confessional fluency. They'll learn to sound Reformed without thinking Reformed.

Warning

Confessional subscription means nothing if you don't understand what you're subscribing to.

This creates a pre-catechesis problem. Congregants arrive with AI-shaped assumptions that look Reformed on the surface but subtly undermine every distinctive. They've absorbed the vocabulary without the framework.

"Effectual calling" becomes "God's invitation." "Federal headship" becomes "Adam represented humanity." The words survive; the meaning drains away.

The Distinctions That Matter

Reformed theology exists because distinctions matter.

Justification ≠ sanctification. Imputed guilt ≠ inherited tendency. Unconditional election ≠ election based on foreseen faith. Federal headship ≠ "we're affected." Effectual calling ≠ "God's invitation."

In our testing, AI tends to collapse every one.

The Reformation and the confessional debates that followed it were fought over these distinctions. People died for them. They're not pedantic hair-splitting; they're the load-bearing walls of how Reformed Christians understand salvation, human nature, and God's sovereignty.

Our preliminary results suggest AI may be slowly, quietly erasing them—one helpful paraphrase at a time.

This is what FaithBench aims to measure. The question isn't whether AI can discuss Reformed theology. It's whether it can discuss it as Reformed—with the precision, the systematic connections, and the confessional commitments that define the tradition.

The distinctions that cost lives to establish may be averaged away, and most churches may not realize it's happening.

Note

Note on preliminary data: FaithBench scores and observations cited in this post are from v1.0, which uses a single LLM judge (Gemini 3 Flash) without human inter-rater reliability validation. Scores should be treated as provisional automated assessments. See our methodology for full details on current limitations.

References

Gloo. (2025, December 15). Gloo unveils the first benchmark exposing how AI misses Christian worldview and values [Press release]. https://gloo.com/press/releases/gloo-unveils-the-first-benchmark-exposing-how-ai-misses-christian-worldview-and-values

Muller, R. A. (2003). Post-Reformation Reformed dogmatics (2nd ed.). Baker Academic.

Tao, Y., et al. (2024). Cultural alignment of large language models. arXiv preprint. https://arxiv.org/abs/2402.13231

Westminster Assembly. (1646). The Westminster Confession of Faith. Available at reformed.org.