Neural Concept Verifier
Scaling Prover-Verifier Games to high-dimensional inputs via concept encodings for verifiable, interpretable AI.
Prover-Verifier Games (PVGs) offer a path toward formally verifiable classifiers, but they have not been applied to complex, high-dimensional inputs like images. Concept-based models handle such inputs interpretably but typically rely on low-capacity linear predictors. Neural Concept Verifier (NCV) bridges both worlds.
NCV combines PVGs with minimally supervised concept discovery. A concept encoder first translates raw high-dimensional inputs into structured concept representations. A prover then selects a subset of those concepts, and a nonlinear verifier classifies using only the selected subset. This gives verifiability at the concept level, not just the pixel level, while supporting expressive predictors.
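The three stages described above (encode, select, verify) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NCV implementation: the function names (`encode_concepts`, `prover_select`, `verifier_classify`), the top-k selection rule, masking by zeroing, the random weights, and all layer sizes are hypothetical. Training of the prover and verifier is omitted entirely.

```python
import math
import random

def encode_concepts(x, enc_w):
    # Hypothetical concept encoder: one sigmoid unit per concept,
    # mapping the raw input x to concept activations in (0, 1).
    return [1 / (1 + math.exp(-sum(w * xi for w, xi in zip(row, x)))) for row in enc_w]

def prover_select(concepts, scores, k):
    # Prover (assumed top-k rule): keep the k highest-scoring concepts
    # and zero out the rest, so the verifier never sees them.
    keep = set(sorted(range(len(concepts)), key=lambda i: scores[i], reverse=True)[:k])
    mask = [1.0 if i in keep else 0.0 for i in range(len(concepts))]
    return [c * m for c, m in zip(concepts, mask)], mask

def verifier_classify(masked, w1, w2):
    # Nonlinear verifier: one tanh hidden layer, then argmax over class logits.
    hidden = [math.tanh(sum(w * c for w, c in zip(row, masked))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return max(range(len(logits)), key=lambda j: logits[j])

def rand_matrix(rows, cols, rng):
    # Random weights stand in for trained parameters.
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_inputs, n_concepts, n_hidden, n_classes, k = 8, 5, 4, 3, 2

x = [rng.uniform(-1, 1) for _ in range(n_inputs)]          # toy "raw input"
concepts = encode_concepts(x, rand_matrix(n_concepts, n_inputs, rng))
scores = [rng.random() for _ in range(n_concepts)]         # stand-in for a learned prover
masked, mask = prover_select(concepts, scores, k)
pred = verifier_classify(
    masked,
    rand_matrix(n_hidden, n_concepts, rng),
    rand_matrix(n_classes, n_hidden, rng),
)
print("selected concepts:", [i for i, m in enumerate(mask) if m == 1.0])
print("predicted class:", pred)
```

The point of the sketch is the information flow: the verifier's prediction depends only on the `k` concepts the prover passes through, so the selected subset serves as the evidence for the prediction, and the verifier is free to be nonlinear rather than a linear concept predictor.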
Experiments show that NCV outperforms classic concept-based models and pixel-based PVG baselines on high-dimensional, logically complex datasets, and that it helps mitigate shortcut learning.
Paper:
- Neural Concept Verifier: Scaling Prover-Verifier Games Via Concept Encodings — ICML 2025 Workshop on Actionable Interpretability