The first rigorous Christian AI benchmark

AI models fail at theology. We measure how badly.

Most AI benchmarks treat Christianity as a monolith. FaithBench tests what matters: Reformed precision, Catholic nuance, Orthodox depth—and everything in between.

300+ Test Cases
6 Dimensions
7 Traditions

Full transparency: Sign in to see how models scored on 50% of our test cases—prompts, responses, and judge reasoning included.

Other AI Benchmarks

~7 generic “religion” questions

No tradition specificity: Buddhism, Christianity, and Islam are treated as interchangeable test topics.

FaithBench

300+ theological test cases

7 Christian traditions:

Catholic, Protestant, Orthodox, Reformed, Pentecostal, Evangelical, Progressive

Generic benchmarks lump all religions together. FaithBench measures what matters: can an AI distinguish Reformed soteriology from Catholic? Pentecostal pneumatology from Orthodox?

Anti-gaming design: 50% of test cases are public for transparency. 50% are held out to prevent benchmark overfitting.
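As a rough illustration of how a split like this can stay fixed across releases, here is a minimal sketch that assigns each case by hashing its ID, keeping roughly half of the cases public. The function and the example ID are hypothetical, not part of FaithBench's actual tooling.

```python
# Illustrative only: a deterministic public/held-out assignment based on a
# hash of the test case ID, so the split never drifts between releases.
# The ID format ("doctrine-justification-017") is a made-up example.
import hashlib

def assign_split(case_id: str) -> str:
    """Return 'public' or 'held-out' for a given test case ID."""
    digest = hashlib.sha256(case_id.encode("utf-8")).hexdigest()
    return "public" if int(digest, 16) % 2 == 0 else "held-out"

print(assign_split("doctrine-justification-017"))
```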

Which Models Understand Faith Best?

Ranked by performance across theology, doctrine, and biblical interpretation.


Evaluation Dimensions

Six dimensions that measure theological understanding beyond surface pattern matching.


Textual Analysis

Evaluates understanding of the biblical text itself—original languages, manuscript traditions, and translation nuances. Tests whether models can accurately handle Greek and Hebrew terms, textual variants, and the relationship between source texts and modern translations.

Hermeneutics

Measures competence in biblical interpretation methodology. Can the model distinguish between literal, allegorical, typological, and redemptive-historical approaches? Does it recognize how different interpretive frameworks yield different theological conclusions from the same passage?


Apologetics

Tests logical rigor in theological argumentation. Evaluates whether models can construct valid arguments, engage charitably with objections, and distinguish between philosophical, evidential, and presuppositional approaches to defending Christian claims.

Doctrine

Assesses precision in articulating systematic theology across traditions. Can the model accurately represent Reformed, Catholic, Orthodox, and Evangelical positions on contested doctrines? Does it understand where traditions agree and where they diverge?

Intertextuality

Evaluates recognition of biblical cross-references, typology, and thematic connections. Tests whether models can identify how New Testament authors interpret Old Testament texts, recognize prophetic fulfillment patterns, and trace theological themes across the canon.

Church History

Measures knowledge of how Christian doctrine developed through councils, controversies, and confessions. Tests understanding of patristic sources, Reformation debates, and how historical context shaped theological formulation across different eras and traditions.
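Purely as a sketch of how these six dimensions could roll up into a single leaderboard number, assuming each is judged on a 0-10 scale and weighted equally (FaithBench's actual scale and weighting may differ):

```python
# Hypothetical aggregation: six per-dimension scores (0-10 assumed) averaged
# into one leaderboard number. Dimension keys mirror the list above.
from statistics import mean

DIMENSIONS = (
    "textual_analysis",
    "hermeneutics",
    "apologetics",
    "doctrine",
    "intertextuality",
    "church_history",
)

def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean across all six dimensions; assumes every key is present."""
    return mean(scores[d] for d in DIMENSIONS)

example = {d: 7.5 for d in DIMENSIONS}
example["doctrine"] = 9.0
print(round(overall_score(example), 2))  # 7.75
```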

Full Transparency

See exactly how models perform—not just scores, but the actual prompts, responses, and judge reasoning.

👁️ What You Can See

  • Full test prompts
  • Complete model responses
  • LLM judge scores by dimension
  • Judge reasoning for each score

(See the sketch of a single record after these cards.)

⚖️ Why the 50/50 Split

  • Public tests: full transparency
  • Held-out tests: prevent gaming
  • Maintains benchmark validity
  • Industry best practice

📋 Audit Trail

  • Every response recorded
  • Timestamped results
  • Reproducible evaluations
  • Version-controlled tests
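As an illustration of the fields listed in the cards above, one published record might look roughly like the following. Every field name and value here is invented for the sketch, not FaithBench's published schema.

```python
# Hypothetical record for one public test case result. Field names and values
# are invented for illustration; only the kinds of information shown (prompt,
# response, per-dimension judge scores, judge reasoning, timestamp, benchmark
# version) come from the cards above.
record = {
    "case_id": "doctrine-justification-017",   # made-up ID
    "benchmark_version": "v0.1.0",              # version-controlled tests
    "model": "example-model",                   # model under evaluation
    "prompt": "Explain how Reformed and Catholic theology differ on justification.",
    "response": "...",                          # full model response, stored verbatim
    "judge_scores": {                           # LLM judge scores by dimension (0-10 assumed)
        "textual_analysis": 7.0,
        "hermeneutics": 8.0,
        "apologetics": 6.5,
        "doctrine": 8.5,
        "intertextuality": 7.5,
        "church_history": 7.0,
    },
    "judge_reasoning": {                        # judge rationale per dimension (truncated here)
        "doctrine": "Accurately distinguishes imputed from infused righteousness.",
    },
    "evaluated_at": "2025-01-01T00:00:00Z",     # timestamped result
}
```
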
Sign Up to Explore Results

Free account required to view detailed test results

Partner with FaithBench

We're building the benchmark infrastructure Christian AI needs. Academic institutions, seminaries, and AI labs are invited to contribute test cases, validate methodology, or sponsor research.

Seeking Theology Experts

Help validate AI responses as part of our human-in-the-loop judging program. We need professors, pastors, and researchers across all Christian traditions.

FaithBench is an open research initiative. All benchmark data and methodology will be publicly available.