Elastic Template Matching
in Outdoor Environments

Birger Johansson

1. Introduction

During the last decade, the interest in building autonomous mobile robots has been spread world- wide and all kinds of robots from insect robots (Rus, Kotay, 1998) to humanoid (Brooks, et al. 1999) robots have emerged. Even competitions have been arranged with robots that should navigate over a large area completely autonomously. With this great interest in autonomous mobile robots, there different ways to navigate and determine the position of a mobile robot have been proposed. In many of these, the actual experiments have been carried out in a safe environment such as an indoor laboratory or a hallway (Thrun, 1998). When doing experiments indoors, a lot of possible difficulties could be avoided. For example, floors or walls with solid colour and movement in the background caused by wind.

To be able to build truly autonomous mobile robots, maps are used to represent the surrounding for the mobile robot (Thrun & Bucken, 1996). By using these maps, the robots should be able to plan its route and navigate through the environment. The problem of building maps may seem as a basic procedure, but on closer investigation, we find out that it is much more complex.

Maps can be divided into metric and topological, but the distinction between the two approaches is not always clear (Thrun, 2002). A metric map stores the geometrical properties of the environment. A topological map holds only relation between locations in the world and by using these, a fast, but not as precise map can be built.

1.1 The mapping problem

To acquire a map, robots use sensors to collect data from the surrounding environment such as cameras, range finders using sonar, laser and infrared sensors, radar, compasses, GPS or odometry (Livatino, 2003). Each type of sensor has its own range of benefits and drawbacks. For example, light and sound can not penetrate walls. Therefore, to build maps behind walls, a mobile robot must navigate there.

Map building is a complex procedure and five complicated challenges with robot mapping have been proposed (Thrun, 2002). The first one considers measurement error that could come from any sensor. This error can be difficult to manage because it accumulates over time. Consider for example a robot with a small rotation error. At first the problem does not seem to be significant, but after some navigation without taking the rotation error into account, it could be entirely out of phase. This source of error results in complex mapping algorithms, both from a mathematical and from an implementation point of view, to handle measurement errors (Gutmann, Burgard, Fox, Konolige, 1998). Many approaches to dealing with this problem have been proposed during de last decade. To solve the uncertainty of sensor data when making maps, probabilistic mathematical filters have been used. One of the most common is the Kalman filter.

The next thing to complicate map building is that even the most sophisticated sensors cannot determine all the surrounding information, and even if this was the case, the maps would be enormously large and completely intractable. It is necessary to build small maps to obtain a real-time navigation and map building procedure. This implies that maps should be a representation with a small size for fast computing, but yet as authentic to the surrounding as possible.

The third problem is the correspondence problem, also known as the data association problem. This problem consists of determining if sensor measurements taken at different points in time correspond to the same physical place in the world (Thrun, 2002).

The forth problem is that the environment may change over time. A tree grows or change shape during different seasons, for example. These changes are slow, but yet, an autonomous robot must handle this type of problem. There can also be fast changes in the environment caused by people passing by, or a chair that has been moved from it originally learned location. There are hardly any mapping algorithms today that can build maps of this kind of dynamic environments (Thrun, 2002).

The last major problem focuses on that mobile robots must be able to navigate with incomplete maps and that it is necessary to rebuild the map during navigation. This is often referred to as the SLAM problem, which stands for simultaneous localization and mapping. SLAM is a problem that was presented during the eighties and a complete solution has not yet been found. The problem that SLAM presented was “Is it possible for an autonomous vehicle to start in an unknown location in an unknown environment and then to incrementally build a map of this environment while simultaneously using this map to compute vehicle location?” (Durrant-Whyte, 2002). Different types of map building algorithms have been developed to solve the SLAM problem. Many of them uses Kalman filter approaches to handle the uncertainty in the map building. Other ways which are getting more popular is expectation maximation algorithms. These algorithms use likelyhood estimation where the position of the mobile robot is estimated and also the landmarks location (Thrun, Fox and Burgard, 1998). Other approaches uses hybrid solutions that mix the probability algorithms and expectation maximizing algorithms, with object maps that build maps with basic geometric shapes, such as line and squares to represent walls and objects.

The map building methods presented above use sensors like sonar and laser scanners, but in this thesis only a single vision sensors is used. There exist many variants of vision sensors. For example, some cameras are omnidirectiona (Paletta, Frintrop, Hertzberg, 2001) and some are directional. Omnidirectional means that the vision field is 360 degrees like an ordinary mobile antenna that can receive signal from all directional. Some systems use stereo vision (Zingaretti & Carbonaro, 1999, Davidson, Murray, 1999, Murray & Little, 2000), and some a single camera (Balkenius & Kopp, 1997, Neira, Riberio & Tardos, 1997, Trahanias, Valissaris, Garavelos 1997). By using a stereovision system, the length to a certain object can be calculated by knowledge of the distance between the cameras and the angel present at each camera, but with this property comes the cost of additional computer power.

Pictures are computationally demanding to work with. For example, a picture taken with a size of 176 X 120 has 21120 pixels and allocated as an uncompressed greyscale image takes up 21Kb of memory. If a colour picture is taken with the same sizes, the memory requirement is tripled. And if that is not enough, a stereovision system has two images that need to be processed.

A navigating system needs to be run in real-time, and to manage this, the algorithms must be fast. One way to speed up the image processing many times is to use only parts of the image that are interesting for the moment. An attention window that is much smaller than the original image can be used to use a minimum of computer time to reach the goal.

The purpose of this thesis is to test a specific navigation and positioning method, called elastic template matching, in an outdoor environment. Elastic template matching was developed by Balkenius and Kopp of Lund University Cognitive Science. They built a robot named Polucs (Balkenius, & Kopp, 1997), which used elastic template matching for indoor navigation. Experiments with Polucs have shown that this method for navigation and position estimation was fast and robust. The method uses prediction and therefore a 133 MHz Pentium was able to run the whole vision system with an image size of 64*64 (Balkenius & Kopp 1996a, b, 1997). In this thesis similar experiments as those made to Polucs are done with Lucsor III, a new robot built to manage outdoor environment.

1.2 Polucs

Polucs was constructed by Balkenius and Kopp in 1995 and is the second mobile built within the project A Robot with Autonomous Spatial Learning. Polucs was the first mobile robot in the group to use the XT-1 vision architecture. This architecture was an attempt to design a common model of a number of visual behaviours including object tracking, orienting and anticipatory saccades, place and landmark recognition, as well as visual servoing. Polucs was controlled by a stationary computer communicating via serial communication (RS232) through a long cable.

The experiments made with Polucs was made indoor but in a home where the environment is not as sterile as in a laboratory or a hallway. Two different experiments were made to Polucs and in both case the purpose was to see how good Polucs could find a learned template when moved or rotated. In the moving experiment Polucs was first learned a template at a certain location. Then measurements of how good it could find the same template then moving away from the learned localisation, Polucs managed movement of 50 cm in each direction.

In the second experiment Polucs was rotated to see how much rotation the system could handle and still found the learned template. Rotation less than 15% could be done and Polucs would still locate the learned template.

the robot PULUCS
Figure 1. The robot POLUCS. It performed visual behaviours such as object tracking, orienting and anticipatory saccades, place and landmark recognition

1.3 Ikaros

The Ikaros project started in 2001 at Lund University Cognitive Science and builds upon the previous work done at the department on simulators and control systems for robots. The goal is to develop tools that will allow full-scale cognitive architectures for future autonomous robots to be developed based on insights gained from human cognition (Balkenius, 2003).

By using Ikaros for the new mobile, it becomes very configurable, and the hope is that students will be able to test their models of different brain regions and other algorithms on the robot. Ikaros is divided in a number of modules that handles specific parts of the simulation. So far, a number of different modules have been written for Ikaros in cooperation with scientist all over the world. Free download is available both for Macintosh and Linux at the Ikaros webpage (www.ikaros-project.org).

1.4 Elastic template matching

The elastic template matching algorithm has been developed to try to solve a few of the basic problems with visual localization for autonomous mobile robots. Instead of using artificial landmark- symbols (Adorni, 1996), the algorithm uses natural landmarks. These are found in an image using edge filters. With these, the position of the mobile robot can be calculated using triangulation. Other methods can also be used for exact positioning (Suburo, Shigang, 1993), but these are normally much slower.

The elastic template matching method does not require an environment that generates clean motion vectors (Arkin, 1987) or a floor with homogenous texture (Horswill, 1993). Navigation using elastic template matching differs from the SLAM method described above in that it uses two stages. In the first stage, it learns its surrounding and makes templates, which it uses in the second stage for navigation.

The main idea behind elastic template matching is to extract only the important parts of an image. An image is divided to a number of regions which contains features that are as distinct as possible. Each of this features are stored together with its spatial relationship. This is called a template. The distinct features in a template is automatically learned when the template is created. The spatial relationship can in the simplest case be stored as x and y coordinates for each feature.

A feature is defined f = {Fxy} where F is a small region of the image, in this thesis 10x10 pixels. X and y are the coordinate where the feature has been found. A template is a number of features, T = {f0, f1,...fn} in this thesis the maximum features for each template are 16.

The coordinate stored together with each feature is very important when the vision system tries to identify a template in an image. By using the coordinates, the system attempts to locate each feature at the approximately correct position. (Figure 2) Instead of doing a full search of the image, it starts the search where the feature is supposed to be. If it finds the feature, the algorithm does not have to do a full search and saves a large amount of time. However this can only be done if one of the features already has been found. In the case of finding the first feature a complete search must be done. When the first feature is found the following will use the spatial relationship to speed up the system.

an elastic template
Figure 2. Elastic template matching uses the spatial relationship between the features in the image to code a visual template.

1.5 Overview

To test elastic template matching in an outdoor environment, a new robot has been built. This robot is called Lucsor III. In this thesis it will simply be called Lucsor. Lucsor is bigger and more robust than Polucs. For navigation and position estimation, Lucsor uses a camera mounted onboard but it also has odometry data from motor encoders that can be used for efficient navigation. The new robot is described in section 2. All calculation and estimations are made by Ikaros. Related to Lucsor, five new modules have been written which are described in section 3. Finally a number of experiments were done on visual position estimation, the results of which are presented in section 4.

2. Lucsor III

The robot Lucsor was developed to give students a chance to experiment with an outdoor robot (Figure 3). The core of Lucsor is an onboard computer running Linux. This makes it very configurable and hopefully students can develop new models for Ikaros and control the robots’ behaviour with those models.

Lucsor is bigger and heavier than his precursor. With a size of 48x55x100 cm and a weight of 40 kg it is robust and can manage outdoor environments. Two plastic wheels, 20cm in diameter, are mounted on a single axis and are independently powered and controlled, thus providing both drive and steering. This is often referred to as differential steering. Another two passive coaster wheels are used to provide support.

The motors are controlled by two Technosoft IDM240 servo boxes, connected to both the motors and to the computer onboard the robot. The two servo boxes handle acceleration and speed control (Technosoft uses the term Jogging speed). In this mode, the servo boxes adjust the power sent to motor to attempt to always obtain the desired speed. On both motors, encoders are mounted and by using the servo boxes, changes in the positions of the wheels can be sent to the computer.

The motors are two 80W brushless DC servo motors manufactured by Shinano Kenshi Corporation. Between the motors and the wheels is a gearbox with a ratio of 10:1. For power supply, two 12 volts lead batteries are used and a converter from 24/220 volt is connected between the batteries and computer and camera.

The camera, which is an Axis 3120 PZT, is placed on a pole in the front of the robot, and is directed slightly downward at the ground. The camera is connected to the computer via Ethernet. The camera is placed at a height of 100 cm above the ground to get a better picture of the surroundings.

There is also a battery charger placed onboard the robot. It is also possible to plug in a monitor, keyboard and mouse through outlets in the robots. In addition, a power supply plug for the monitor is available to power a monitor when driving the robot outside. Lucsor is able to move very fast, sometimes to fast for its own best and the author's nerves.

the robot LUCSORIII
Figure 3. LUCSOR III on its first trip outside.

3. Ikaros Modules

The following Ikaros modules have been developed for this thesis: standard input module, a motor module, a servo box module, and elastic template matching module and a navigation module. These modules are described in the following subsections.

3.1 Standard input and motor control module

In the first stage of building the robot, temporary modules were made for testing purposes. Tests like driving the robot to see how it act when running by Ikaros and check the communication between servo boxes and the computer. For this, two test modules were created. One module reads the number pads of the keyboard and sends it forward to a motor control module. This module sends the desired speed for each wheel on the robot to the servo box module. With these modules, the robot could be manually controlled.

3.2 Servo box module

For moving and steering, Lucsor uses two brushless motors and two servo boxes. The servo boxes controls acceleration and sets the speed for each of the motors. By setting different speed of each of the motors, the robot will turn in the direction of the motor with the lowest speed. And by setting an opposed direction of the speed to each motor Lucsor will turn on the spot. Together with encoders mounted on motor shafts, the servo boxes can read the position of the shaft. This is used to hold a constant speed in rough terrain, when the servo boxes adjust the power sent to the motor to maintain the desired speed. The communication between the servo boxes and the computer is made via serial communication (RS232) and by Technosoft own Linux library. The servo box module is essentially an Ikaros front-end for these modules and acts as a driver that hides the details of the communication with the servo boxes from the rest of Ikaros.

3.3 Camera module

Lucsor uses an Axis 2130 PTZ Web camera to collect information about its surrounding area. The camera can tilt and pan and are able to take 30 frames/sec. The camera module was created by Balkenius as part of the standard Ikaros modules and has been used for other experiments like target tracking and target prediction. This module uses HTTP (Hypertext Transfer Protocol) to communicate with the camera over Ethernet. By sending a request it can download the latest picture in raw format. Some modifications can be made by this module. For example get a different colour channel or set the size of the picture taken by the camera. This module was used in Lucsor to grab the images used for navigation.

3.4 Elastic template matching module

The elastic template matching module is the core of the navigation system of Lucsor. By using elastic template matching, Lucsor is able to know its position and to navigate in an efficient way. To be efficient, the system needs to take some intelligent steps to minimize the computer power needed to fulfil its goal. Instead of storing each image in its raw format, which would take a significant amount of memory and processor power, it saves the part of the image that seems most important. A more detailed description follows below, but interested are recommended reading Balkenius and Kopp (1997).

Lucsor can be in two different states when using elastic template matching. In the first state, it learns and creates templates of its surrounding environment. These templates are stored and used in the second state when it compares these templates with the image from the camera.

The first state

To learn the landscape surrounding Lucsor, the robot was manual driven around using the test modules. When driving, the elastic template matching module is set for learning and a number of templates are stored. The numbers of template are stored dynamically depending on the amount of distinct feature in the images. If there are many distinct features, more templates will be created and stored. The reason for this dynamic learning is to store more information if the surrounding is messy than if it is very simple.

In Lucsor, the image captured from the camera and the camera module, is sent forward to filter modules. The filter modules are important for efficient template building. With these filters, the navigation system is less sensitive to light condition. For example, if the templates for learning are created and stored on a sunny day and when running the navigation system with these templates another day less sunny the results of the navigation may be unsatisfactory.

processing model
Figure 4. The visual processing used by the system. The image is first processed by a Gaussian edge filter. The outputs from this filter are used both as input to the elastic template matching module and for the Harris corner detection module that selects interest points in the image.

features in the image
Figure 5. The images at the different stages of the visual processing. A. Original image. B. The image filtered with a Gaussian edge filter. C. Output from the Harris corner detection used to select interest points in the image. D. Filtered image with the found features and a cross showing the learned reference location. E. The features and the reference location found in another image.

After passing, the image from the camera, through a Gaussian edge filter, it is sent to a Harris corner detection module (Figure 4). After the edge detection filter, a search in the whole picture for a distinct feature is performed.

This algorithm checks each pixel and its neighborhood pixels in the edge filtered image. The size of the neighbourhood can be specified in the configuration file for the module. The threshold, for how distinct the feature must be, can also be specified in the configuration file. With a low threshold the system find more distinct features but this does not necessarily mean a better result. In the result section a more detailed description is given of how the configuration can be modified to influence the result. The elastic template module tries to extract as many features as possible, but this could also be controlled in configuration to the module. More features give a more reliable result with the drawback of reducing the capacity of image processing in the system. By setting a maximum number of features, the capacity of the system can be maintained high without much reduction of reliability.

When the features have been found its spatial relationship and an estimated centre point of the pattern of feature is calculated and stored with the template (Figure 5D).

The second state

When the first state is completed the robot is switched over to the second state which is the default state when navigate. In this state it uses the templates, it has learnt in the first state, to navigate. This procedure is very similar to the learning state. First it captures a 176 X 120 pixel image from the camera and camera module. This is filtered to find the edges and distinct features. These features are compared with the stored templates and Lucsor uses the best correlated template to determine the pose. By using the spatial relationship the search for a match can be minimized. If a pattern are found in the captured image the estimated position of the pattern is used to calculate in what angel the robot is standing in aspect of the correlated template.

3.5 Navigation module

This module is maybe the most interesting. By manipulating this navigation module performance can be greatly improved. Due to the limit of time available for this project, Lucsor uses a simple implementation which could be improved. Lucsor builds a chain of templates in the order it learns the templates in the first learning phase, and uses this chain to navigate. For example if the robots goal is to go to template 30 and its current position is template 10 it forwards toward 11 and if it finds 12 during this time it change its direction and start moving towards the new template. The navigation module also uses the odometry from the motors as an input. By using this, it can improve the search of templates, by disabling those that are not supposed to be found at that certain point.

full model
Figure 6. A simplified diagram of the connection between the Ikaros modules used by Lucsor for navigation.

4. Experiments

To see the difference between indoor and outdoor environment, similar indoors experiment as done with Polucs was made outdoor with Lucsor. These experiments examine how successful the elastic template matching was when the robot was moved away from the original learning place and how well it handled if the robot was rotated. The result of those experiments showed that elastic template matching was not very sensitive for rotation and change in position. Movement up until 60 cm and rotation with 15 degrees was possible without the correlation with the learned template was under 0.7 (Balkenius & Kopp, 1997). The pattern size used for the experiments is 10 pixels and 16 patterns. For a match 50 % of the features must be found on each template.

An imported thing to point out is that Polucs used a camera with 110 degrees wide angle lens while Lucsor only has 90 degrees. As mentioned earlier similar experiment was used but also additional parts were added. For example experiments to learn how the elastic template matching was effected by different weather condition.

4.1 Experiment 1: Position

In experiment 1, Lucsor was placed in different positions with the camera pointed always in the same direction. At each of those positions pictures were taken, those pictures was correlated with the centre position image (Figure 7). This test was made with Polucs indoors and the purpose was to see how well Elastic template matching can locate the learned template when moving the robot away from the learned location.

a grid
Figure 7. The different position used in the experiments. The bold dot is the position of the reference image when doing the correlation)

Experiments with Polucs show that it could handle movement within the range of 60cm from were it learned the template when using a threshold of 0.7.

By changing the matching threshold the area around the learning location could be adjusted. Experiment with Lucsor showed that with correlation of 0.8 (Figure 8A) the area, with successful matching, was approximately 1 m2 around the learning point. When lowering the threshold the area increased and with a threshold of 0.5 the area was approximately 4 meters in radius. By decrease the threshold the certainty of the matching will also be reduced. A high threshold depends on many templates to be able to know its position during the navigation because of the limited area from the learning localization it can handle. But too many templates will slow the system down and that is not a wanted feature in real-time navigation.

results plot
Figure 8. Recognition of a template with objects far away when the robot moves away from the learned location with different matching thresholds. The size of the recognized region on the ground ranges from a small region of approximately 1 m2 in (a) to a larger are with a radius of approximately 4 m in (d). Matching thresholds: (a) 0.8, (b) 0.7, (c) 0.6, (d) 0.5.

4.2 Experiment 2: Angle

In experiment 2, the same positions as in the previous experiments were used. But at each position a number of pictures were taken with different camera angles. Then, the correlation calculation was made with the straight forward picture (Figure 9).

Also this experiment was made with Polucs. Indoors elastic template matching was able to manage rotation by 10-15 degrees. In the outdoor experiments a rotation of 10 degrees was successful. It is important to notice that the cameras used with Polucs and Lucsor uses different wide angle. To compensate this more templates must be stored to always know its localisation.

results plot
Figure 9. The recognition surface when panning with the camera. As can be seen, the algorithm accurately handles a rotation of 10 degrees to the left or right, but not more.

4.3 Experiment 3: Distance

An experiment was made to get the relation between the distances to where the features are captured in the image. Two different type of experiment was made. The first was with a long distance to environment (Figure 10A). The second was made with nearer distance to environment (Figure 10B). Experiment environments which constantly change was used, people and cycles were passing by.

As shown (Fig. 11) the matching gets a higher correlation when its reference is far away. The reason is perhaps not that difficult to explain. When the reference point is far away, the angle between the robot and the reference point are changed less, during movement.

two photos
Figure 10. A. Experiment with a far distance to reference object. B. Experiment with a near distance to reference object.

results plot
Figure 11. The size of the recognition region when objects are far (A) compared to when they are closer (B).

4.4 Experiment 4: Change of Weather Condition

The same experiment as above was done in a different weather condition. The idea behind this experiment was to se how well the elastic matching template works with change of light condition in an outdoor environment. The first experiment was made at a cloudy weather condition and the second experiment was made a sunnier day a week later. Templates learn from the first day was used in the later experiment to see how successful elastic template matching could match this templates with the new condition.

different wheathers
Figure 12. The same scene different days with different weather conditions. A. Learn image. B. One of the test images.

Even if the condition was very different, during the two experiments, the system still recognises its surrounding sufficiently. This result comes of the use of filters. By using filters a more robust system can be then using the original image (Cassinis, et al. 1996).

results plot
Figure 13. Generalization of a learned template to a different day. A. The graph shows the matching of the template on the training day. B. Although the weather conditions were very different the two days, the template still matches the environment sufficiently well for the place to be recognized.

5. Discussion

Elastic template matching matched even better in an outdoor environment. If the system can use reference points far away, a higher localization certainly will be present.

Elastic template matching works better in a messy environment (Balkenius, 1998). An outdoor environment has few clean contours and many small objects which result in a very messy environment. This suit the matching method cause it could extract more features to use when matching. But to be able to extract the feature a higher resolution of the picture must be used.

The use of natural landmark is something that many other localization methods use (Zingaretti, Carbonaro, 1999, Paletta, Frintrop & Hertzberg, 2001, Neira,Riberio,Tardos, 1997). The advantage is of course that the system would be able to work in a real environment. The landmarks are where the features are found and with this method each of these features has a size of 10x10 pixels. Other methods suggesting sizes from 128x96 pixles (Zingaretti & Carbonaro, 1999) or 200x20 (Paletta, Frintrop & Hertzberg, 2001), with the omni directional camera approach, to 32x32 (Livantino, 2003) and 15x15 (Davidson & Murray, 1999). Using a larger size of features a less amount of features may be used to get a high certainty of the feature, but a large feature also need more computational power. With small features the certainty may be improved by using more features. In this thesis the features size of 10x10pixles is used. Different sizes were tested but this size seems to be good in a speed and certainty perspective.

One other factor affect the features size is the input image resolution. With Polucs an input resolution of 64x64 pixels was used and a feature size of 7x7 pixels. In Lucsors case we raised the resolution to 176x120, to handle the outdoor environment, and the feature size to 10x10.

As Elastic template matching use two phase when navigating and learning. It does not solve the SLAM problem. This is a drawback for the method in an outdoor environment because autonomous mobile robots most continually navigate and update its maps to handle new places. Elastic template matching uses a learning phase and suitable indoor with a limited surrounding. Outdoor the learning needs to be made during navigation because a complete learning of all surrounding is impossible.

An advantage is that elastic template matching and methods using topological maps do not have to know it exact position. This differs from other methods which uses metrics maps (Neira,Riberio & Tardos, 1997, Livantino, 2003, Davidson & Murray, 1999) . With this property the system does not have to use probabilistic mathematical filters to handle the measurements error from the sensors.

Elastic template matching is only capable to manage fewer changes in templates. For example when people stops and stands to close to the robot and cover its sight the matching naturally fails. Or as in one of the experiments, some dancing Norwegians celebrating it national day. A couple of hundred people with colourful costumes and Norwegian flags are moving 2 meters in front of the robot. The result from the experiment was not great.

To make the system more interesting, the learning state could be automatically set when Lucsor does not recognise its surrounding.

Much improvement could be made on the navigate module. For example the camera used on this robot is equipped with motors to pan the camera. By using this, the robot could pan the camera to find the next template in the chain or even template way ahead of the current template. Instead of only use a chain a more sophisticated mapping of the environment could be created. For example it could be able to handle many paths to the same goal and use the most suitable to get to a certain point instead of just follow the path it has learned in the first learning state. Hopefully in a near feature more work would be put in the project to get the robot using this sort of navigating.


Adorni, G., Destri, G., Mardonini, M., Zanichelli, F. (1996). Robot self-localization by means of vision. In Proceedings of the first euromicro workshop on advanced mobile robots. Los Alamitos, CA: IEEE Computer Society Press. 160- 165.

Arkin, R. C. (1987). Motor schema based navigation for a mobile robot: an approach to programming by behaviour. In Proceedings of the 1987 IEEE international conference on robotics and automation. 264-271.

Balkenius, C., and Kopp, L. (1996a). Visual tracking and target selection for mobile robots. In Jörg (ed.). In Proceedings of EUROBOT. IEEE Press.

Balkenius, C., Kopp, L. (1996b). The XT-1 Vision Architecture. In Linde, P., and Sparr, G. (eds) Symposium on image analysis. Lund University. Dept. of Mathematics.

Balkenius, C., Kopp, L. (1997). Elastic template matching as basis for visual landmark recognition and spatial navigation.

Balkenius, C. (1998). Spatial learning with perceptually grounded representations. Robotics and Autonomous Systems. 25. 165-175.

Balkenius, C. (2003). From isolated components to cognitive systems. In ERCIM News. 53.

Brooks, R., Breazeal, C., Marjanovic, M., Scassellati, B., Williamson, M. (1999). The Cog project: Building a humanoid robot, lecture notes in computer science. 1562. 52-87.

Cassinis, R., Grana, D., Rizzi, A. (1996). Using colour information in an omnidirectional perception system for autonomous robot localization. In proceedings of the first euromicro workshop on advanced mobile robots. 172-176. Los Alamitos. CA: IEEE Computer Society Press.

Davidson, A.J., Murray, D.W. (1999). Mobile localization using active vision. In proceedings of European conference on computer vision (ECCV).

Durrant-Whyte, H., Majumderr, S., Thrun, S., de Bastiam, M., Scheding, S. (2001). A Bayesian algorithm for simultaneous localization and map building. In proceedings of the 10th International Symposium of Robotics Research (ISRR '01). Lorne. Australia.

Gutmann, J.-S,, Burgard, W., Fox, D., Konolige, K. (1998). An experimental comparison of localisation methods. In Proceedings of the 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Victoria. Canada.

Horswill, I. (1993). A simple, cheap, and robust visual navigation system. In Mayer, J.-A., Roitblat, H. L., Wilson, S. W. (eds) From animals to animats 2. Cambridge. MA: MIT Press.

Livatino, S. (2003). Acquisition and recognition of natural landmarks for vision-based autonomous robot navigation, Aalborg University

Murray D., Little J. (2000). Using real-time stereo vision for mobile robot navigation, In Autonomous Robots, 8(2). 161-171.

Neira J., Ribeiro M.I., Tardos T.D. (1997). Mobile robot localization and map building using monocular vision. In Symposium on Intelligent Robotics System (SIRS). Stockholm. Sweden.

Paletta L., Frintrop S. (2001). Robust localization using context in omni-directional imaging. In IEEE International Conference on Robotics and automation (ICRA), Seoul, South Korea.

Rus, D., Kotay, K. (1998). Versatility for unknown worlds: Mobile sensors and self- reconfiguring robots. In Zeli, A., editor, Field and Service Robotics.

Suburo, T., Shigang, L. (1993). Memorizing and representing route scenes. In Mayer, J.-A., Roitblat, H.L., Wilson, S. W. (eds) From animals to animats 2. Cambridge. MA: MIT Press.

Thrun, S., Bucken, A. (1996). Integrating grid- based and topological maps for mobile robot navigation. In Proceedings of the AAAI 13th national conference on artificial Intelligence. 2. 944-950.

Thrun, S., Fox, D., Burgard, W. (1998). A probabilistic approach to concurrent mapping and localization for mobile robots, In machine Learning 31. 29-53 and Autonomous Robots 5. 253-271.

Thrun, S. (1998). Learning metric-topological maps for indoor mobile robot navigation. In Artificial Intelligence. 99(1). 21-71.

Thrun, S. (2002). Robotic mapping: A survey. Exploring Artificial Intelligence in the New Millennium

Trahanias, P.E., Valissaris, S., Garavelos T. (1997). Visual landmark extraction and recognition for autonomous robot navigation. In international conference on intelligent robots and system. (IROS). IEEE/RSJ Intl. Conf. on Intell. Robots and Systems. 1036-1042. Atria. Grenoble. France.

Zingaretti, P., Carbonaro, A. (1999). Learning to acquire and select useful landmarks for route following. In 3rd European workshop on advanced system (Eurobot). 160-168.

The author would like to thank Christian Balkenius for the support during this work and for making me realise my interest in robotics and cognitive science. I would also like to thank Lars Kopp for choosing servo boxes and motors suitable for the project, and Jan Morén for some Linux programming support.

blog comments powered by Disqus