The first rigorous Christian AI benchmark

AI models fail at theology. We measure how badly.

Most AI benchmarks treat Christianity as a monolith. FaithBench tests what matters: Reformed precision, Catholic nuance, Orthodox depth—and everything in between.

300+ Test Cases
6 Dimensions
7 Traditions

Full transparency: Sign in to see how models scored on 50% of our test cases—prompts, responses, and judge reasoning included.

Other AI Benchmarks

~7 generic “religion” questions

No tradition specificity: Buddhism, Christianity, and Islam are treated as interchangeable test topics.

FaithBench

300+ theological test cases

7 Christian traditions:

Catholic, Protestant, Orthodox, Reformed, Pentecostal, Evangelical, Progressive

Generic benchmarks lump all religions together. FaithBench measures what matters: can an AI distinguish Reformed soteriology from Catholic? Pentecostal pneumatology from Orthodox?

Anti-gaming design: 50% of test cases are public for transparency. 50% are held out to prevent benchmark overfitting.
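As a rough illustration of how a split like this can stay fixed across releases, here is a minimal sketch that assigns each case by hashing its ID, keeping roughly half of the cases public. The function and the example ID are hypothetical, not part of FaithBench's actual tooling.

```python
# Illustrative only: a deterministic public/held-out assignment based on a
# hash of the test case ID, so the split never drifts between releases.
# The ID format ("doctrine-justification-017") is a made-up example.
import hashlib

def assign_split(case_id: str) -> str:
    """Return 'public' or 'held-out' for a given test case ID."""
    digest = hashlib.sha256(case_id.encode("utf-8")).hexdigest()
    return "public" if int(digest, 16) % 2 == 0 else "held-out"

print(assign_split("doctrine-justification-017"))
```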

Which Models Understand Faith Best?

Ranked by performance across theology, doctrine, and biblical interpretation.


Evaluation Dimensions

Six dimensions that measure theological understanding beyond surface pattern matching.


Textual Analysis

Evaluates understanding of the biblical text itself—original languages, manuscript traditions, and translation nuances. Tests whether models can accurately handle Greek and Hebrew terms, textual variants, and the relationship between source texts and modern translations.

Hermeneutics

Measures competence in biblical interpretation methodology. Can the model distinguish between literal, allegorical, typological, and redemptive-historical approaches? Does it recognize how different interpretive frameworks yield different theological conclusions from the same passage?


Apologetics

Tests logical rigor in theological argumentation. Evaluates whether models can construct valid arguments, engage charitably with objections, and distinguish between philosophical, evidential, and presuppositional approaches to defending Christian claims.

Doctrine

Assesses precision in articulating systematic theology across traditions. Can the model accurately represent Reformed, Catholic, Orthodox, and Evangelical positions on contested doctrines? Does it understand where traditions agree and where they diverge?

Intertextuality

Evaluates recognition of biblical cross-references, typology, and thematic connections. Tests whether models can identify how New Testament authors interpret Old Testament texts, recognize prophetic fulfillment patterns, and trace theological themes across the canon.

Church History

Measures knowledge of how Christian doctrine developed through councils, controversies, and confessions. Tests understanding of patristic sources, Reformation debates, and how historical context shaped theological formulation across different eras and traditions.
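Purely as a sketch of how these six dimensions could roll up into a single leaderboard number, assuming each is judged on a 0-10 scale and weighted equally (FaithBench's actual scale and weighting may differ):

```python
# Hypothetical aggregation: six per-dimension scores (0-10 assumed) averaged
# into one leaderboard number. Dimension keys mirror the list above.
from statistics import mean

DIMENSIONS = (
    "textual_analysis",
    "hermeneutics",
    "apologetics",
    "doctrine",
    "intertextuality",
    "church_history",
)

def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean across all six dimensions; assumes every key is present."""
    return mean(scores[d] for d in DIMENSIONS)

example = {d: 7.5 for d in DIMENSIONS}
example["doctrine"] = 9.0
print(round(overall_score(example), 2))  # 7.75
```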

Full Transparency

See exactly how models perform—not just scores, but the actual prompts, responses, and judge reasoning.

👁️ What You Can See

  • Full test prompts
  • Complete model responses
  • LLM judge scores by dimension
  • Judge reasoning for each score

(See the sketch of a single record after these cards.)

⚖️ Why the 50/50 Split

  • Public tests: full transparency
  • Held-out tests: prevent gaming
  • Maintains benchmark validity
  • Industry best practice

📋 Audit Trail

  • Every response recorded
  • Timestamped results
  • Reproducible evaluations
  • Version-controlled tests
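As an illustration of the fields listed in the cards above, one published record might look roughly like the following. Every field name and value here is invented for the sketch, not FaithBench's published schema.

```python
# Hypothetical record for one public test case result. Field names and values
# are invented for illustration; only the kinds of information shown (prompt,
# response, per-dimension judge scores, judge reasoning, timestamp, benchmark
# version) come from the cards above.
record = {
    "case_id": "doctrine-justification-017",   # made-up ID
    "benchmark_version": "v0.1.0",              # version-controlled tests
    "model": "example-model",                   # model under evaluation
    "prompt": "Explain how Reformed and Catholic theology differ on justification.",
    "response": "...",                          # full model response, stored verbatim
    "judge_scores": {                           # LLM judge scores by dimension (0-10 assumed)
        "textual_analysis": 7.0,
        "hermeneutics": 8.0,
        "apologetics": 6.5,
        "doctrine": 8.5,
        "intertextuality": 7.5,
        "church_history": 7.0,
    },
    "judge_reasoning": {                        # judge rationale per dimension (truncated here)
        "doctrine": "Accurately distinguishes imputed from infused righteousness.",
    },
    "evaluated_at": "2025-01-01T00:00:00Z",     # timestamped result
}
```
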
Sign Up to Explore Results

Free account required to view detailed test results

Partner with FaithBench

We're building the benchmark infrastructure Christian AI needs. Academic institutions, seminaries, and AI labs are invited to contribute test cases, validate methodology, or sponsor research.

Seeking Theology Experts

Help validate AI responses as part of our human-in-the-loop judging program. We need professors, pastors, and researchers across all Christian traditions.

FaithBench is an open research initiative. All benchmark data and methodology will be publicly available.