Spectral Residual: the intuition

There is a chicken-and-egg dilemma in object detection and recognition: before an object can be IDENTIFIED, it must be DETECTED; yet to DETECT an object, we seemingly need a system that can IDENTIFY it.

To crack this problem, I prefer a two-stage architecture proposed by Rensink and implemented by Walther. In the first stage, "perceptual blobs", or "proto-objects", are detected by simple detection algorithms. Identification mechanisms then work directly on these proto-objects.

This architecture suggests two things:

1. To mimic this early-stage visual processing, we must not rely on information that becomes available only in later stages.

2. The result of the first stage may be crude; in some cases the proto-objects are not actual objects. However, if we expected otherwise, we would be asking the detection system to IDENTIFY, which is obviously unattainable given the computational constraints.

The motivation for Spectral Residual is plain and straightforward: I am building an early-stage attention model, so the model must be as simple as possible, free of training or hand-labeling, and, most importantly, free of parameter tuning.

Given these extremely tough constraints, I could not find a property common to all targets, but there is one property shared by backgrounds in the spectral domain. Once the homogeneous background is eliminated, the remaining "residual" parts are detected as proto-objects.

Local spatial cues are the primary information used in current models. What makes my spectral residual approach unique is its SPECTRAL representation. In the field of vision, the spectral representation receives much less attention than it deserves. As far as I know, the best-known use of the Fourier spectrum is Oliva's Gist of the Scene.

The neglect of the Fourier spectrum may be due to a long-held belief that the amplitude spectrum carries only trivial information (see Chapter 7 of Computer Vision by Forsyth for reference). But as demonstrated by Spectral Residual, we believe there is gold hidden in the mess of the Fourier spectrum.

 

5 lines of MATLAB code to implement Spectral Residual

clear
clc

%% Read image from file
inImg = im2double(rgb2gray(imread('yourImage.jpg')));
inImg = imresize(inImg, [64, 64], 'bilinear');

%% Spectral Residual
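% Fourier transform of the downsampled image
myFFT = fft2(inImg);
% Log amplitude spectrum and phase spectrum
myLogAmplitude = log(abs(myFFT));
myPhase = angle(myFFT);
% Spectral residual: log amplitude minus its 3x3 local average
mySpectralResidual = myLogAmplitude - imfilter(myLogAmplitude, fspecial('average', 3), 'replicate');
% Back to the spatial domain; 1i is the imaginary unit even if a variable named i exists
saliencyMap = abs(ifft2(exp(mySpectralResidual + 1i*myPhase))).^2;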

%% Post-processing: smooth with a disk filter and rescale to [0, 1]
saliencyMap = imfilter(saliencyMap, fspecial('disk', 3));
saliencyMap = mat2gray(saliencyMap);
imshow(saliencyMap);
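
For readers who prefer equations to code, the listing above can be read as the following sketch. The symbol names are chosen here for illustration and do not appear in the code: I(x) is the resized grayscale image, \mathcal{F} is the 2-D Fourier transform, h_3 is the 3x3 averaging filter, g is the disk smoothing filter, and * denotes convolution. The final rescaling to [0, 1] done by mat2gray is omitted.

```latex
\begin{align}
A(f) &= \left| \mathcal{F}[I(x)] \right| \\
P(f) &= \arg\left( \mathcal{F}[I(x)] \right) \\
L(f) &= \log A(f) \\
R(f) &= L(f) - h_3(f) * L(f) \\
S(x) &= g(x) * \left| \mathcal{F}^{-1}\left[ \exp\left( R(f) + i\,P(f) \right) \right] \right|^{2}
\end{align}
```

Here L(f), P(f), R(f), and S(x) correspond to myLogAmplitude, myPhase, mySpectralResidual, and saliencyMap, respectively. The local average h_3 * L stands in for the smooth, background-like part of the log spectrum, so R(f) keeps only what deviates from it.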

 

 

Results on natural images

http://www.its.caltech.edu/~xhou/projects/spectralResidual/multiobj/Tree1.jpg

http://www.its.caltech.edu/~xhou/projects/spectralResidual/multiobj/Tree2.jpg

http://www.its.caltech.edu/~xhou/projects/spectralResidual/multiobj/Tree3.jpg

http://www.its.caltech.edu/~xhou/projects/spectralResidual/multiobj/Tree4.jpg

http://www.its.caltech.edu/~xhou/projects/spectralResidual/multiobj/Tree5.jpg

http://www.its.caltech.edu/~xhou/projects/spectralResidual/multiobj/Tree6.jpg

As demonstrated by these examples, the spectral residual successfully detects the candidate proto-objects in images.

Note that this method is also more efficient than spatial-domain methods (about 10x to 20x faster).

 

Results on psychological patterns

First row:

curve: http://www.its.caltech.edu/~xhou/projects/spectralResidual/psycho/curve.jpg

density: http://www.its.caltech.edu/~xhou/projects/spectralResidual/psycho/density.jpg

intersection: http://www.its.caltech.edu/~xhou/projects/spectralResidual/psycho/intersection.jpg

inv-intersection: http://www.its.caltech.edu/~xhou/projects/spectralResidual/psycho/inv-intersection.jpg

orientation: http://www.its.caltech.edu/~xhou/projects/spectralResidual/psycho/orientation.jpg

Second row:

closure: http://www.its.caltech.edu/~xhou/projects/spectralResidual/psycho/closure.jpg

inv-curvature: http://www.its.caltech.edu/~xhou/projects/spectralResidual/psycho/curvature2.jpg

Note that the spectral residual fails to detect the odd items in the stimuli of the second row. However, it is well known that certain features, such as closure and inverse curvature, are too complex for an early visual system. This point has been confirmed by numerous psychological experiments (J. M. Wolfe's guided search theory, for example).

In short, the failure on closure and inverse curvature is rather encouraging: these features are believed to be imperceptible to the attention of the human visual system.

 

Misc downloads

The original paper on spectral saliency is available here: Saliency Detection: A Spectral Residual Approach (1.6MB)

The poster I presented at CVPR 2007 is available here: CVPR07Poster.pdf (1.7MB)

 

The pictures used in the experiments are available here: CVPR07Supp.rar (11.1MB)