Tracking Colors with Ikaros
Introduction
In many cases, the color of an object is its most characteristic property, which suggests that it should be possible to find an object in a visual scene based on its color. In principle, this could be done by searching for all pixels in an image with a particular range of RGB values, but in practice this does not work very well. There are a number of reasons for this, but the most important is that the RGB representation mixes intensity and color in all three channels.
For example, if the illumination increases, the values in all three channels will increase although, perceptually, the color has not changed. To allow for such changes, the range of RGB values has to be made very large, and such a range will inevitably include colors other than the target color.
The second problem is that the color of the illumination may change over the day, which affects the three RGB channels differently. This is in general a very hard problem whose full solution requires color constancy. In many cases, however, it is possible to get by with simpler heuristics.
Here, we describe a fast algorithm that can compensate for these variations and find pixels in an image with a particular color. The algorithm has been implemented as a set of Ikaros modules and the processing is done in a number of steps: First, it is useful to map the colors from RGB space to a more useful representation. Here we will map the colors onto the rg-chromaticity plane, where each color is coded as hue and saturation while the intensity is discarded.
Second, it is often necessary to compensate for variations in illumination. This can be done by approximating the location of the white point in the color space and relating all colors to this point. Here, we assume that the average color of the scene is gray and use this as a reference.
Third, we need to find the pixels in the image that have the desired color. This is done by testing whether the color of each pixel lies in the target color region.
Fourth, the different objects with the target color are found using a clustering method which groups together pixels of the target color that lie close to each other in the image.
Finally, the clusters can be sorted according to size or matched to the positions of the clusters in the previous image to track objects that are moving.
Color Transformations
The first step in the color tracking algorithm is a transformation of each pixel of the RGB image to the rg-chromaticity plane (Fig. 1). We will assume that the RGB values are in the range [0..1]. The transformation is obtained by dividing the R and G components of each pixel by the sum of the three color channels.
r = R/(R+G+B)
g = G/(R+G+B)
The two-dimensional rg-plane contains the colors of the original RGB space, but the intensity of each pixel has been discarded. The intensity of the color is instead calculated separately as
I = R+G+B.
Note that this is not the best way to estimate perceived intensity since the three colors contribute differently to the perception of intensity, but this approximation is sufficient for our present purposes.
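As an illustration, a minimal version of this transformation could look as follows in C++. This is a sketch, not the actual ColorTransform source; the convention of mapping black to the white point is the one mentioned below.

#include <array>

// Map an RGB pixel (components in [0..1]) to rg-chromaticity plus intensity.
// Black (R+G+B == 0) is undefined under the transformation; here it is
// mapped to the white point <1/3, 1/3>, one possible convention.
std::array<float, 3> rgb_to_rgI(float R, float G, float B)
{
    float I = R + G + B;                        // intensity
    if (I == 0.0f)
        return {1.0f / 3.0f, 1.0f / 3.0f, 0.0f};
    return {R / I, G / I, I};                   // <r, g, I>
}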
Figure 1. The transformation from RGB-space to the rg-chromaticity plane. A color is defined as a segment of a circle centered at the white point, with the least saturated part removed. The color region is specified by the four parameters φ₀, ω, s_min and s_max.
The above transformation maps red, green and blue onto the three corners of a triangle. Gray and white are both mapped to the same point at <1/3, 1/3>, called the white point of the color triangle. Note that the transformation of black in RGB-space, i.e. <0, 0, 0>, is not defined (but may be mapped onto the white point).
It is possible to construct a color circle around the white point where the angle describes the hue and the distance from the center depends on the saturation of the color. Because of the shape of the color space, the same distance from the white point does not correspond to the same saturation in each direction.
This is one of the transformations available in the Ikaros module ColorTransform.
Calculation of the White Point
The white point of the image can be estimated from the image data if we assume that the average color of the image is gray. The averages R', G' and B' of the three color channels are calculated and projected onto the rg-plane:
r' = R'/(R'+G'+B')
g' = G'/(R'+G'+B')
and the white point is set to
w = <r', g'>
Alternatively, if the illumination can be assumed to be equal in the red, green and blue channels, the white point can be set directly to <1/3, 1/3>.
If the illumination varies over the image, it can be advantageous to use different white points for different parts of the image, but in this case it is important that the average color of the different areas is the same. Otherwise, the same color in different regions will not be mapped to the same point.
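A minimal sketch of the gray-world estimate follows; this is not the Ikaros implementation, and the Pixel struct is introduced only for the example. Note that averaging the channels and then projecting gives the same result as projecting the channel sums, since the division by the number of pixels cancels.

#include <utility>
#include <vector>

struct Pixel { float R, G, B; };

// Estimate the white point <r', g'> under the gray-world assumption:
// the average color of the image is taken to be gray.
std::pair<float, float> estimate_white_point(const std::vector<Pixel>& image)
{
    double R = 0, G = 0, B = 0;
    for (const Pixel& p : image) { R += p.R; G += p.G; B += p.B; }
    double sum = R + G + B;
    if (sum == 0)                        // empty or all-black image: fall back
        return {1.0f / 3.0f, 1.0f / 3.0f};
    return {float(R / sum), float(G / sum)};
}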
Color coordinates
Each color in the rg-plane is mapped to two values. The first is the angle around the white point
φ = arctan((r−r')/(g−g'))
and the saturation
s = hypot(r-r', g-g'),
that is, the distance from the white point (see Fig. 1). This results in the color coordinate
p = < φ, s>.
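In code, this mapping could be sketched as follows (an illustration, not the Ikaros source); atan2 is used instead of the plain arctan formula since it handles all quadrants and the case g = g'.

#include <cmath>

// Map a point <r, g> in the rg-plane to polar color coordinates <phi, s>
// around the white point <r0, g0>.
void color_coordinates(float r, float g, float r0, float g0,
                       float& phi, float& s)
{
    phi = std::atan2(r - r0, g - g0);   // hue angle in radians
    s   = std::hypot(r - r0, g - g0);   // saturation: distance to white point
}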
Figure 2 shows the result of mapping all the pixels of three images onto the <φ, s> space. The differently colored objects are clearly seen as dots within different regions of the color space.
Figure 2. The algorithm applied to three different scenes. Top row: six robots with colored markers. Middle row: a simple scene with dots of different colors. Bottom row: an image of a face. Left: the input image. Middle: the rg-color space with the locations of the color coordinates of the pixels in the image and the target color region highlighted. Right: the detected pixels in the scene.
Classifying Colors
To find a region of the image with a particular color, we first need to classify the pixels. We define a target color region using four values: the angle φ₀ of the region, the minimum and maximum saturation s_min and s_max, and the width ω of the region. We want to find all pixels the color of which lies within this region. We can do this directly from the <φ, s> representation by testing
s_min < s < s_max
and
φ₀ − ω < φ < φ₀ + ω.
However, this method is time consuming since it involves calculating a square root and an arctan for every pixel in the image. There is also a division, which should be avoided if possible. Fortunately, the test can be made much faster by getting rid of all these operations.
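For reference, a direct implementation of the test above might look as follows. This is a minimal sketch and not the Ikaros code; it assumes angles in radians, a pixel color <r, g> and a white point <r0, g0>.

#include <cmath>

// Direct (slow) inclusion test: computes the saturation s and hue angle phi
// explicitly. phi0 and omega are the angle and width parameter of the target
// region (the region spans phi0 +/- omega), s_min and s_max the saturation
// bounds.
bool in_region_slow(float r, float g, float r0, float g0,
                    float phi0, float omega, float s_min, float s_max)
{
    const float TWO_PI = 6.28318530718f;
    float s = std::hypot(r - r0, g - g0);           // square root
    if (s <= s_min || s >= s_max)
        return false;
    float phi = std::atan2(r - r0, g - g0);         // arctan (and a division)
    float d = std::remainder(phi - phi0, TWO_PI);   // shortest angular distance
    return std::fabs(d) < omega;
}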
A Faster Inclusion Test
To make the color test faster, we map the problem onto a vector representation. Assuming that the color angle φ₀ for the region is given in degrees, the target color vector c = <c_r, c_g> for the region is given by
c_r = sin(2πφ₀/360),
c_g = cos(2πφ₀/360).
The width of the region is represented by the value w², which is calculated as
w² = cos²(2πω/360).
The above calculations only need to be done once for each target region. To check whether the color of a pixel lies within the target region, we first calculate the square of the length of its color vector <r, g>, taken relative to the white point, as
L² = r² + g²,
and the scalar product of the color and the target color,
m = c_r·r + c_g·g.
The following conditions are then tested:
s_min² < L² < s_max²,
m > 0,
and
m² > w²L².
The color of the pixel lies within the target region if all three conditions are met. Since c is a unit vector, m = L cos θ, where θ is the angle between the pixel's color vector and the target color vector; the condition m² > w²L² is therefore equivalent to cos²θ > cos²ω, and m > 0 rules out colors pointing in the opposite direction. Note that this formulation completely avoids computationally slow operations such as divisions, square roots and trigonometric functions, except for the three initial calculations of c_r, c_g and w², and these are made only once and not for every single pixel.
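A compact sketch of the whole test in C++ follows. The ColorRegion struct and the function names are ours, not the ColorClassifier API, and the pixel coordinates <r, g> are assumed to be taken relative to the white point.

#include <cmath>

struct ColorRegion {           // constants precomputed once per target region
    float cr, cg;              // unit target color vector
    float w2;                  // cos^2 of the width parameter omega
    float s_min2, s_max2;      // squared saturation bounds
};

// Precompute the region constants; phi0 and omega are given in degrees.
ColorRegion make_region(float phi0, float omega, float s_min, float s_max)
{
    const float DEG = 6.28318530718f / 360.0f;   // degrees to radians
    float c = std::cos(omega * DEG);
    return { std::sin(phi0 * DEG), std::cos(phi0 * DEG),
             c * c, s_min * s_min, s_max * s_max };
}

// Per-pixel test. No division, square root or trigonometric function is
// needed here.
bool in_region(float r, float g, const ColorRegion& t)
{
    float L2 = r * r + g * g;                 // squared saturation
    if (L2 <= t.s_min2 || L2 >= t.s_max2)
        return false;
    float m = t.cr * r + t.cg * g;            // scalar product with the target
    return m > 0.0f && m * m > t.w2 * L2;     // angle within the region width
}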
This algorithm is implemented in the Ikaros module ColorClassifier.
Color Clustering
The second step in finding potential targets is to cluster the points that fulfill the requirement above. A cluster C_i is defined by the set of coordinates of all its included pixels. The center of the cluster c_i = <c_x, c_y> is calculated as the average position of all pixels that have been included in the cluster so far. To find the different clusters in the image, we iterate over all pixels that fulfill the requirement.
Here, we use a simple algorithm that works well when the targets are relatively sparse. For each pixel <x, y> with the target color, we test whether there is a cluster center within distance d of it. If there is, the pixel is added to that cluster. Otherwise, a new cluster is created with this pixel as its center. This procedure continues until all pixels have been assigned to a cluster.
This method works well if the distance d is set to match the approximate size of the targets.
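In code, the clustering step could be sketched as follows; the Cluster struct and the function name are illustrative, not the SpatialClustering API, and the list of target-colored pixel coordinates is assumed to have been produced by the classification step.

#include <utility>
#include <vector>

struct Cluster {
    float cx = 0, cy = 0;      // center: mean position of the included pixels
    int   n  = 0;              // number of pixels in the cluster
};

// Greedy single-pass clustering: each target-colored pixel joins the first
// cluster whose center lies within distance d of it; otherwise it starts
// a new cluster. pts holds the image coordinates of the accepted pixels.
std::vector<Cluster> cluster_pixels(const std::vector<std::pair<int, int>>& pts,
                                    float d)
{
    std::vector<Cluster> clusters;
    const float d2 = d * d;
    for (const auto& [x, y] : pts) {
        Cluster* found = nullptr;
        for (Cluster& c : clusters) {
            float dx = x - c.cx, dy = y - c.cy;
            if (dx * dx + dy * dy < d2) { found = &c; break; }
        }
        if (found) {                      // incremental update of the mean
            found->n += 1;
            found->cx += (x - found->cx) / found->n;
            found->cy += (y - found->cy) / found->n;
        } else {
            clusters.push_back({float(x), float(y), 1});
        }
    }
    return clusters;
}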
This algorithm is implemented in the Ikaros module SpatialClustering.
Post-Processing
There are a number of methods that can be used after clustering to produce a more stable output. These techniques can be used together, but they may also hinder each other in some cases.
Noise Reduction After clustering, all clusters with fewer than N pixels can be discarded as noise. This is useful in many cases but may have the unwanted consequence that a cluster that temporarily becomes smaller disappears.
Size Selection If the number of targets is known, for example if we are tracking a fixed number of robots, it is often a good idea to select the k largest clusters and discard any smaller ones, as in the sketch below.
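A sketch of these two post-processing steps (min_size and k correspond to N and k in the text; the Cluster struct is the one from the clustering sketch above):

#include <algorithm>
#include <cstddef>
#include <vector>

struct Cluster { float cx, cy; int n; };   // as in the clustering sketch

// Discard clusters with fewer than min_size pixels (noise reduction), then
// keep only the k largest of the remaining clusters (size selection).
std::vector<Cluster> select_clusters(std::vector<Cluster> clusters,
                                     int min_size, std::size_t k)
{
    clusters.erase(std::remove_if(clusters.begin(), clusters.end(),
                                  [min_size](const Cluster& c)
                                  { return c.n < min_size; }),
                   clusters.end());
    std::sort(clusters.begin(), clusters.end(),
              [](const Cluster& a, const Cluster& b) { return a.n > b.n; });
    if (clusters.size() > k)
        clusters.resize(k);
    return clusters;
}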
Tracking If the target objects are known to move continuously in the scene, it is useful to assign the same index to the cluster belonging to the same target at each time step. This can be accomplished either by finding the closest cluster from the previous time step after clustering, or by using the centers of the previous clustering as the starting points for the clusters in the current time step.
We use the former method, as the latter is very sensitive to noise. When the new set of clusters has been found, they are matched to the clusters found at the previous time step. This is controlled by a parameter that decides how far a cluster can move and still be matched. Clusters that are not assigned a new location are marked as unused and may be reassigned at a later stage.
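One possible implementation of this matching step is sketched below; this is illustrative and the actual module may differ. max_dist plays the role of the parameter that limits how far a cluster can move.

#include <cstddef>
#include <limits>
#include <vector>

struct Cluster { float cx, cy; int n; };   // as in the clustering sketch

// Greedy matching of new clusters to the clusters from the previous time
// step. A match is accepted only if the cluster has moved less than
// max_dist. Returns, for each new cluster, the index of the previous
// cluster it corresponds to, or -1 if no previous cluster is close enough.
std::vector<int> match_clusters(const std::vector<Cluster>& prev,
                                const std::vector<Cluster>& now,
                                float max_dist)
{
    std::vector<int> match(now.size(), -1);
    std::vector<bool> taken(prev.size(), false);
    const float d2max = max_dist * max_dist;
    for (std::size_t i = 0; i < now.size(); i++) {
        float best = std::numeric_limits<float>::max();
        int best_j = -1;
        for (std::size_t j = 0; j < prev.size(); j++) {
            if (taken[j]) continue;
            float dx = now[i].cx - prev[j].cx;
            float dy = now[i].cy - prev[j].cy;
            float d2 = dx * dx + dy * dy;
            if (d2 < d2max && d2 < best) { best = d2; best_j = int(j); }
        }
        if (best_j >= 0) { match[i] = best_j; taken[best_j] = true; }
    }
    return match;
}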
Although this method is far from perfect and much more advanced tracking methods exist, it works well in situations where the target objects are not moving very fast in the image and the targets are well separated. In more complicated cases, it is necessary to add a more capable tracking system after this post-processing.
Implementation Details
The algorithm described above has been implemented as a set of modules in Ikaros. The module ColorTransform transforms the input image from RGB-space to rgI-space, that is, the rg-chromaticity plane together with an intensity channel. The result is sent to the ColorClassifier module, which performs the color classification and sends its result to the module SpatialClustering, which finds the clusters in the image. This sets up a processing pipeline from the image to the table of found clusters. The parameters of the modules can be set as follows:
<module
    class = "ColorTransform"
    name = "ColorTransform"
    transform = "RGB->rgI"
/>

<module
    class = "ColorClassifier"
    name = "ColorClassifier"
    color = "145 degrees"
    width = "30 degrees"
    saturation_min = "0.05"
    saturation_max = "0.35"
    compensation = "yes"
    statistics = "yes"
/>

<module
    class = "SpatialClustering"
    name = "SpatialClustering"
    cluster_radius = "30"
    min_cluster_size = "50"
    tracking_distance = "0.1"
    no_of_clusters = "6"
/>
Most of the parameters are self-explanatory. The compensation parameter decides whether the module should try to compensate for the color of the illumination using the gray-world assumption described above.
The output from the module is a table of coordinates for the found clusters. For debugging purposes, there are also two additional outputs. The first shows the locations of the input pixels in the color space. The second shows the spatial locations of the pixels that pass the color test (see Fig. 2). These outputs can be turned on or off using the parameter statistics. Full documentation of the modules together with the source code can be found at the Ikaros web site.
The performance of the algorithm is reasonably good. We tested an implementation with very little optimization except for the ones described above. On a 2.66 GHz Dual-Core Xeon computer, an image of size 704x480 pixels with the scene shown in Fig. 2 is processed in 7.3 ms on average. This is fast enough for real-time tracking in a DV stream.
The functions described above were implemented using the following Ikaros modules, which will become available in Ikaros version 1.1.
Module | Function
---|---
ColorTransform | Converts the image from RGB-space to rgI-space.
ColorClassifier | Finds the pixels with the target color in the image.
SpatialClustering | Finds the clusters in the image.
This work was supported by the EU project MindRaces, FP6-511931.