Appendix C — Practical Verifier Design Checklist
C.1 Purpose
This appendix is meant to work as a field manual: a compact reference that can be pasted into internal docs or shared directly in lab discussions.
C.2 Core Terms
- Verifiable reward: A reward signal derived from an outcome, execution, proof state, trace, or other artifact that can be checked with reasonable reliability.
- Verifier: The mechanism that performs that check, whether symbolic, executable, formal, learned, or hybrid.
- Reward signal: The scalar or graded feedback that the learning algorithm actually receives after verification.
- Interface: The part of the task exposed to checking, such as a final answer, a program, a proof state, a citation set, or an environment trajectory.
- Outcome verifier: A checker that scores a completed solution rather than its intermediate steps.
- Process verifier: A checker that scores intermediate reasoning, subgoals, or partial traces.
- Programmatic verifier: A rule-based checker implemented through deterministic logic, execution, or formal constraints.
- Learned verifier: A model-based judge that predicts correctness, quality, or consistency.
- Verifier stack: A layered pipeline that combines multiple checks before producing a reward or decision.
- Signal quality: How informative, stable, and hard to game the reward is for the capability of interest.
- Faithfulness: The extent to which an externalized explanation tracks the causal basis of the model’s answer.
- Calibration: The relationship between expressed confidence and actual correctness.
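To make several of these terms concrete, here is a minimal Python sketch of a two-layer verifier stack. It assumes a task whose interface is a final integer answer written on a line of the form "Answer: <int>". That convention and every name below (CheckResult, format_check, make_outcome_check, run_stack) are hypothetical illustrations, not part of any particular library. The first layer is a programmatic format gate, the second is a programmatic outcome verifier, and the reward signal is binary.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface convention: exactly one line of the form "Answer: <int>".
ANSWER_RE = re.compile(r"^Answer:[ \t]*(-?\d+)[ \t]*$", re.MULTILINE)


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


# A check consumes the completion text (the interface) and reports pass/fail.
Check = Callable[[str], CheckResult]


def format_check(completion: str) -> CheckResult:
    """Programmatic gate: exactly one answer line must be present."""
    answers = ANSWER_RE.findall(completion)
    ok = len(answers) == 1
    return CheckResult("format", ok, "" if ok else f"found {len(answers)} answer lines")


def make_outcome_check(expected: int) -> Check:
    """Outcome verifier: scores only the final answer, not intermediate steps."""
    def check(completion: str) -> CheckResult:
        match = ANSWER_RE.search(completion)
        ok = match is not None and int(match.group(1)) == expected
        return CheckResult("outcome", ok, "" if ok else "final answer mismatch")
    return check


def run_stack(completion: str, checks: List[Check]) -> Tuple[float, List[CheckResult]]:
    """Verifier stack: run layers in order and stop at the first failure.

    The reward signal is 1.0 only if every layer passes; the returned trace
    records which layer rejected the completion, so failures stay visible.
    """
    trace: List[CheckResult] = []
    for check in checks:
        result = check(completion)
        trace.append(result)
        if not result.passed:
            return 0.0, trace
    return 1.0, trace
```

With the stack [format_check, make_outcome_check(42)], the completion "Work...\nAnswer: 42" receives a reward of 1.0, while a completion containing two answer lines is stopped at the format layer with a reward of 0.0. Returning the per-layer trace alongside the scalar is what keeps failures visible rather than silent. A learned verifier could be appended as a further layer; the early-exit ordering used here is one design choice among several.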
C.3 Checklist
- What exact object is being verified?
- What evidence does the verifier actually consume?
- Which important properties remain off-screen, that is, unchecked by any layer of the verifier?
- Where are the obvious attack surfaces?
- Which failures are silent rather than visible?
- How will robustness be audited before large-scale optimization?
- What deployment constraints shape the acceptable verifier stack?
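One way to act on the attack-surface and robustness questions before large-scale optimization is to run the verifier against a small suite of labeled probe completions and measure how often it accepts known-bad outputs. The Python sketch below assumes a toy task whose correct final answer is 42; the probe strings and the names naive_verifier, strict_verifier, PROBES, and audit are hypothetical illustrations. It contrasts a weak substring verifier with a stricter one that only inspects the last non-empty line.

```python
from typing import Callable, Dict, List, Tuple

Verifier = Callable[[str], bool]


def naive_verifier(completion: str) -> bool:
    # Hypothetical weak verifier: accepts any completion that mentions "42".
    return "42" in completion


def strict_verifier(completion: str) -> bool:
    # Hypothetical stricter verifier: the last non-empty line must be exactly "Answer: 42".
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return bool(lines) and lines[-1] == "Answer: 42"


# Audit probes: (completion, should_pass). Negative probes mimic known attack surfaces.
PROBES: List[Tuple[str, bool]] = [
    ("Reasoning...\nAnswer: 42", True),
    ("Answer: 42", True),
    ("Answer: 41 or 42 or 43", False),                # hedging across several answers
    ("Answer: 420", False),                           # substring collision
    ("The answer is not 42.\nAnswer: 7", False),      # right value mentioned, wrong final answer
    ("", False),                                      # empty output
]


def audit(verifier: Verifier, probes: List[Tuple[str, bool]]) -> Dict[str, object]:
    """Report false-accept and false-reject rates of a verifier on labeled probes."""
    false_accepts = [c for c, ok in probes if not ok and verifier(c)]
    false_rejects = [c for c, ok in probes if ok and not verifier(c)]
    n_neg = sum(1 for _, ok in probes if not ok)
    n_pos = len(probes) - n_neg
    return {
        "false_accept_rate": len(false_accepts) / n_neg if n_neg else 0.0,
        "false_reject_rate": len(false_rejects) / n_pos if n_pos else 0.0,
        "false_accepts": false_accepts,
        "false_rejects": false_rejects,
    }
```

On these probes, audit(naive_verifier, PROBES) reports a false-accept rate of 0.75 (three of the four negative probes pass), while strict_verifier rejects all of them. A probe suite only covers the attacks someone thought to write down, so a clean audit is evidence of robustness against those attacks, not proof that none remain.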