Next: Learning Classification Functions Up: Detecting Objects in 3D Previous: Rendering Images from Scan

Feature Detection using Integral Images

There are many motivations for using features rather than pixels directly. For mobile robots, a critical one is speed: feature-based systems operate much faster than pixel-based systems [25]. The features used here have the same structure as the Haar basis functions, i.e., the step functions introduced by Alfred Haar to define wavelets [8]; they are also used in [12,15,25]. Fig. [*] (left) shows the eleven basis features, i.e., edge, line, diagonal and center-surround features. Given a base detector resolution of, e.g., $ 30 \times 30$ pixels, the set of possible features in this area is very large (642592 features; see [12] for calculation details). In contrast to the Haar basis functions, the set of rectangle features is not minimal. A single feature is computed efficiently on input images using integral images [25], also known as summed area tables [12]. An integral image $ I$ is an intermediate representation of the image that contains, at position $ (x,y)$, the sum of the gray-scale pixel values of the image $ N$ over the rectangle of width $ x$ and height $ y$, i.e.,

$\displaystyle I(x,y) = \sum_{x'=0}^{x}\sum_{y'=0}^{y} N(x',y').$      

The integral image is computed recursively via $ I(x,y) = I(x,y-1) + I(x-1,y) + N(x,y) - I(x-1,y-1)$ with $ I(-1,y) = I(x,-1) = I(-1,-1) = 0$, and therefore requires only one pass over the input data. This intermediate representation $ I(x,y)$ allows the value of a rectangle feature at $ (x,y)$ with height and width $ (h,w)$ to be computed with four references (see Fig. [*] (right)):
$\displaystyle F(x,y,h,w) = I(x,y) + I(x+w,y+h) - I(x,y+h) - I(x+w,y).$
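The one-pass recursion and the four-reference lookup can be sketched as follows. This is our own illustrative Python (the paper gives no code); the image is indexed as `N[x][y]`, and with the inclusive sums above the four references delimit the pixel block `x+1..x+w`, `y+1..y+h`:

```python
def integral_image(N):
    """Summed area table: I[x][y] = sum of N[x'][y'] for x' <= x, y' <= y.
    One pass using I(x,y) = I(x,y-1) + I(x-1,y) + N(x,y) - I(x-1,y-1),
    with out-of-range entries treated as 0."""
    w, h = len(N), len(N[0])
    I = [[0] * h for _ in range(w)]
    for x in range(w):
        for y in range(h):
            I[x][y] = (N[x][y]
                       + (I[x][y - 1] if y > 0 else 0)
                       + (I[x - 1][y] if x > 0 else 0)
                       - (I[x - 1][y - 1] if x > 0 and y > 0 else 0))
    return I

def rect_feature(I, x, y, h, w):
    """Rectangle sum with four references:
    F(x,y,h,w) = I(x,y) + I(x+w,y+h) - I(x,y+h) - I(x+w,y),
    i.e. the sum over pixels x+1..x+w, y+1..y+h."""
    return I[x][y] + I[x + w][y + h] - I[x][y + h] - I[x + w][y]
```

For example, on `N = [[1,2,3],[4,5,6],[7,8,9]]` the call `rect_feature(integral_image(N), 0, 0, 2, 2)` yields `28` (= 5+6+8+9), matching direct summation over the same pixel block.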

For computing the rotated features, Lienhart et al. introduced rotated summed area tables, which contain the sum of the pixels of a rectangle rotated by 45$ ^\circ$ with its bottom-most corner at $ (x,y)$, extending to the boundaries of the image (see Fig. [*] (bottom left)) [12]:
$\displaystyle I_r(x,y) = \sum_{x'=0}^{x}\sum_{y'=0}^{x - \vert x'-y\vert} N(x',y').$      

The rotated integral image $ I_r$ is computed recursively, i.e., $ I_r(x,y) = I_r(x-1,y-1) + I_r(x+1,y-1) - I_r(x,y-2) + N(x,y) + N(x,y-1)$, using the start values $ I_r(-1,y) = I_r(x,-1) = I_r(x,-2) = I_r(-1,-1) = I_r(-1,-2) = 0$. Four table lookups are required to compute the pixel sum of any rotated rectangle, using the formula:
$\displaystyle F_r(x,y,h,w) = I_r(x+w-h,y+w+h-1) + I_r(x,y-1) - I_r(x-h,y+h-1) - I_r(x+w,y+w-1).$
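As a sanity check for a fast one-pass implementation, the double-sum definition of $ I_r$ can also be evaluated literally by brute force. The sketch below is our own (helper name included); it clips the inner bound to the image height and treats a negative bound as an empty sum:

```python
def rotated_integral(N):
    """Evaluate I_r(x,y) = sum_{x'=0..x} sum_{y'=0..x-|x'-y|} N(x',y')
    directly from the definition (O(w*h) work per entry, so this is a
    reference implementation for testing, not the fast recursion)."""
    w, h = len(N), len(N[0])
    Ir = [[0] * h for _ in range(w)]
    for x in range(w):
        for y in range(h):
            total = 0
            for xp in range(x + 1):
                limit = x - abs(xp - y)                   # inner upper bound
                for yp in range(min(limit, h - 1) + 1):   # empty if limit < 0
                    total += N[xp][yp]
            Ir[x][y] = total
    return Ir
```

On a $ 4 \times 4$ image of ones, for instance, `rotated_integral(N)[2][1]` is `7`: the inner bound contributes 2, 3, and 2 pixels for $ x' = 0, 1, 2$ respectively.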

Since the features are composed of rectangles, a feature value is obtained with a few lookups and subtractions, weighted by the areas of the black and white rectangles.
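As an illustration of such a composition, a two-rectangle edge feature can be written as the difference of two adjacent rectangle sums, i.e., eight lookups in the integral image. This is our own sketch (the feature geometry and helper names are illustrative, not taken from the paper):

```python
from itertools import accumulate

def integral_image(N):
    # I[x][y] = sum of N[x'][y'] for x' <= x, y' <= y (image indexed N[x][y]):
    # prefix sums along y, then along x, give the summed area table
    cols = [list(accumulate(c)) for c in N]
    I = [cols[0]]
    for x in range(1, len(N)):
        I.append([I[-1][y] + cols[x][y] for y in range(len(N[0]))])
    return I

def rect_sum(I, x, y, h, w):
    # four references: sum over pixels x+1..x+w, y+1..y+h
    return I[x][y] + I[x + w][y + h] - I[x][y + h] - I[x + w][y]

def edge_feature(I, x, y, h, w):
    # two-rectangle edge feature: white rectangle minus the horizontally
    # adjacent black rectangle of the same size
    return rect_sum(I, x, y, h, w) - rect_sum(I, x + w, y, h, w)
```

On an image with a vertical intensity step the feature responds with the signed contrast across the step; both rectangles here have equal area, so no area weighting is needed.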

Figure: Top left: Edge, line, diagonal and center surround features are used for classification. Top right: Computation of feature values $ F$ in the shaded region is based on the four upper rectangles. Bottom left: Calculation of the rotated integral image $ I_r$. Bottom right: Four lookups in the rotated integral image are required to compute the feature value of a rotated feature $ F_r$.
\includegraphics[width=36mm]{features}    \includegraphics[width=43mm]{IntegralImage}





\includegraphics[width=37mm]{rsat_table}


\includegraphics[width=48.5mm]{IntegralImage_rot}

To use a feature for detection, a threshold is required. This threshold is determined automatically during a fitting process, such that a minimal number of examples is misclassified. Furthermore, the return values $ (\alpha, \beta)$ of the feature are determined so that the error on the examples is minimized. The examples are given as a set of images labeled as positive or negative samples. The same set is also used in the learning phase, which is briefly described next.
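The fitting step can be sketched as a simple exhaustive search over candidate thresholds. This is our own minimal version, with the return values restricted to $ \pm 1$ for brevity (in practice $ (\alpha, \beta)$ would typically be real-valued):

```python
def fit_stump(values, labels):
    """Fit a threshold on a single feature value so that the number of
    misclassified examples is minimal, and pick the return values
    (alpha, beta): the classifier outputs alpha for values below the
    threshold and beta otherwise. labels: +1 positive, -1 negative."""
    best = None
    for t in sorted(set(values)):
        for alpha, beta in ((1, -1), (-1, 1)):
            errors = sum(1 for v, l in zip(values, labels)
                         if (alpha if v < t else beta) != l)
            if best is None or errors < best[0]:
                best = (errors, t, alpha, beta)
    return best[1:]  # (threshold, alpha, beta)
```

For cleanly separated feature values such as `values = [0.1, 0.2, 0.8, 0.9]` with `labels = [-1, -1, 1, 1]`, the search returns `(0.8, -1, 1)`, i.e., zero training error.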


root 2005-05-03