CVPR 2026
Although industrial inspection systems should be capable of recognizing unprecedented defects, most existing approaches operate under a closed-set assumption, which prevents them from detecting novel anomalies. While visual prompting offers a scalable alternative for industrial inspection, existing methods often suffer from prompt embedding collapse due to high intra-class variance and subtle inter-class differences. To resolve this, we propose UniSpector, which shifts the focus from naive prompt-to-region matching to the principled design of a semantically structured and transferable prompt topology. UniSpector employs the Spatial-Spectral Prompt Encoder to extract orientation-invariant, fine-grained representations; these serve as a solid basis for the Contrastive Prompt Encoder to explicitly regularize the prompt space into a semantically organized angular manifold. Additionally, Prompt-guided Query Selection generates adaptive object queries aligned with the prompt. We introduce Inspect Anything, the first benchmark for visual-prompt-based open-set defect localization, where UniSpector significantly outperforms baselines by at least 19.7% APb50 and 15.8% APm50. These results show that our method enables a scalable, retraining-free inspection paradigm for continuously evolving industrial environments, while offering critical insights into the design of generic visual prompting.
(a) While existing supervised detectors perform well in closed-set scenarios with fixed defect categories, such assumptions rarely hold in practice, as new defect types continuously emerge and the definition of “normal” may shift over time. (b) Anomaly detection methods do not support specifying a defect of interest; they only flag generic deviations from normality. (c) Visual prompting enables open-set recognition by aligning unseen defects with exemplar prompts, providing a scalable visual inspection framework.
(b) Existing visual prompting approaches treat prompt embeddings merely as implicit representations learned via prompt-to-region matching, which leads to prompt embedding collapse due to high intra-class variance and subtle inter-class differences in the industrial domain. (a) To overcome this collapse, we shift the focus from simple prompt-to-region matching to the principled design of a semantically structured and transferable prompt topology.
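To make the idea of regularizing prompt embeddings into a semantically organized angular manifold concrete, the sketch below uses a generic supervised contrastive loss on L2-normalized embeddings. This is only an illustration of the principle, not UniSpector's actual objective; the function name `supcon_loss` and the `temperature` value are our own choices here.

```python
import numpy as np

def supcon_loss(embeds, labels, temperature=0.1):
    """Supervised contrastive loss on L2-normalized embeddings.

    Pulls same-class prompt embeddings together and pushes different
    classes apart on the unit hypersphere, counteracting collapse."""
    z = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature          # cosine similarities / tau
    n = sim.shape[0]
    losses = []
    for i in range(n):
        others = np.arange(n) != i
        log_denom = np.log(np.exp(sim[i][others]).sum())
        positives = others & (labels == labels[i])
        if positives.any():
            losses.append(-(sim[i][positives] - log_denom).mean())
    return float(np.mean(losses))

# A well-separated prompt space scores lower than a collapsed one.
separated = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
collapsed = np.array([[1.0, 0.0]] * 4)
labels = np.array([0, 0, 1, 1])
print(supcon_loss(separated, labels) < supcon_loss(collapsed, labels))  # True
```

The loss is minimized when classes occupy well-separated directions on the hypersphere, which is exactly the failure mode of collapse that a structured prompt topology is meant to prevent.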
To mitigate the prompt embedding collapse that prompt-to-region matching suffers from under high intra-class variance and subtle inter-class differences in the industrial domain, UniSpector employs two critical components:
- **Spatial-Spectral Prompt Encoder**: extracts orientation-invariant, fine-grained prompt representations.
- **Contrastive Prompt Encoder**: explicitly regularizes the prompt space into a semantically organized angular manifold.
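The Spatial-Spectral Prompt Encoder is only summarized at a high level here. As a point of reference for what an orientation-invariant spectral representation can look like, a classical construction is the radially averaged Fourier amplitude spectrum: averaging the amplitude over all angles at each radius discards orientation while retaining fine-grained frequency content. The sketch below illustrates that classical idea only and is not the encoder's implementation; `radial_spectrum` and `n_bins` are names we introduce for illustration.

```python
import numpy as np

def radial_spectrum(patch, n_bins=16):
    """Radially averaged Fourier amplitude spectrum of a 2-D patch."""
    amp = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
    h, w = patch.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)  # distance to the DC bin
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    return np.array([amp[bins == b].mean() for b in range(n_bins)])

# The descriptor is unchanged under a 90-degree rotation of the patch.
rng = np.random.default_rng(0)
patch = rng.random((17, 17))
print(np.allclose(radial_spectrum(patch), radial_spectrum(np.rot90(patch))))  # True
```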
We introduce Inspect Anything (InsA), the first benchmark for visual-prompt-based open-set defect detection and segmentation under in-domain and cross-domain settings. For the visual grounding category, we evaluate models using a language description of the form “a {defect_name} defect of the {product_name}”.
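The grounding text query above can be built with a trivial helper; `build_grounding_prompt` is an illustrative name, and the defect/product strings in the example are hypothetical.

```python
def build_grounding_prompt(defect_name: str, product_name: str) -> str:
    """Format the text query used for the visual grounding baselines."""
    return f"a {defect_name} defect of the {product_name}"

print(build_grounding_prompt("scratch", "magnetic tile"))
# → a scratch defect of the magnetic tile
```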
| Methods | GC10 APb50 | GC10 APm50 | MagneticTile APb50 | MagneticTile APm50 | Real-IAD APb50 | Real-IAD APm50 | MVTec AD APb50 | MVTec AD APm50 | 3CAD APb50 | 3CAD APm50 | VISION APb50 | VISION APm50 | VisA APb50 | VisA APm50 | Avg APb50 | Avg APm50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| visual grounding | | | | | | | | | | | | | | | | |
| GroundingDINO | 9.6 | - | 26.7 | - | 0.3 | - | 1.4 | - | 0.0 | - | 0.0 | - | 0.0 | - | 5.4 | - |
| GroundingDINO† | 8.0 | - | 11.0 | - | 1.2 | - | 18.8 | - | 0.0 | - | 0.0 | - | 0.1 | - | 5.6 | - |
| YOLO-World | 7.0 | 2.1 | 30.7 | 28.6 | 0.3 | 0.2 | 2.7 | 2.4 | 0.0 | 0.0 | 0.3 | 0.3 | 1.2 | 1.2 | 6.0 | 5.0 |
| YOLO-World† | 3.4 | 1.5 | 29.8 | 27.5 | 21.7 | 16.7 | 5.4 | 4.7 | 0.0 | 0.0 | 0.0 | 0.0 | 1.7 | 1.6 | 8.9 | 7.4 |
| visual prompting | | | | | | | | | | | | | | | | |
| SEEM | - | 0.2 | - | 0.2 | - | 0.0 | - | 0.3 | - | 0.0 | - | 0.0 | - | 0.0 | - | 0.1 |
| SEEM† | - | 0.1 | - | 0.6 | - | 0.0 | - | 0.1 | - | 0.0 | - | 0.0 | - | 0.0 | - | 0.1 |
| SegGPT | - | 16.0 | - | 20.1 | - | 3.1 | - | 6.8 | - | 1.7 | - | 0.9 | - | 1.7 | - | 7.2 |
| SINE | 0.7 | 0.5 | 2.0 | 2.0 | 0.6 | 0.5 | 4.6 | 4.4 | 0.1 | 0.2 | 0.7 | 0.6 | 0.9 | 0.9 | 1.4 | 1.3 |
| SINE† | 0.9 | 0.9 | 1.6 | 1.9 | 1.2 | 0.5 | 4.5 | 4.1 | 0.1 | 0.1 | 0.6 | 0.2 | 1.0 | 0.8 | 1.4 | 1.2 |
| DINOv | 3.2 | 0.8 | 30.0 | 26.9 | 2.2 | 1.4 | 19.0 | 15.0 | 4.1 | 2.1 | 4.3 | 3.8 | 8.4 | 7.0 | 10.2 | 8.1 |
| DINOv† | 16.5 | 16.6 | 48.4 | 39.6 | 21.0 | 17.5 | 15.9 | 15.2 | 2.9 | 1.9 | 4.6 | 3.8 | 10.4 | 8.5 | 17.1 | 14.7 |
| T-Rex2† | 32.4 | 33.9 | 49.0 | 38.0 | 25.1 | 28.8 | 24.4 | 22.4 | 4.3 | 2.9 | 5.4 | 4.3 | 7.8 | 6.7 | 21.2 | 19.6 |
| YOLOE | 1.6 | 0.4 | 48.3 | 45.4 | 16.6 | 13.9 | 26.9 | 22.7 | 4.9 | 2.0 | 7.2 | 5.9 | 14.4 | 12.2 | 17.1 | 14.6 |
| YOLOE† | 10.7 | 9.5 | 43.3 | 41.8 | 17.2 | 15.5 | 25.8 | 23.9 | 3.3 | 1.4 | 3.5 | 3.0 | 17.7 | 15.3 | 17.4 | 15.8 |
| UniSpector (Ours)† | 38.2 | 36.9 | 63.3 | 57.7 | 69.1 | 56.7 | 53.5 | 46.5 | 14.1 | 10.0 | 15.3 | 12.5 | 32.8 | 27.8 | 40.9 | 35.4 |
Open-set detection and segmentation performance on InsA (in-domain: GC10, MagneticTile, Real-IAD; cross-domain: MVTec AD, 3CAD, VISION, VisA). † denotes models fine-tuned on the in-domain seen sets of InsA.
Given a user-specified region in the prompt sample, UniSpector successfully identifies corresponding unseen defect instances in the target image, with DINOv included for comparison.
If you find our work useful, please consider citing our paper.
TBU