This work was developed in conjunction with Carlos Trapiello.
During my Master’s degree I was lucky enough to work on several projects which turned out to be both interesting and fun. One of these projects was to develop a simple road lane detector via the Hough transform. When I first read about the Hough transform I was truly amazed by the simplicity and cleverness of the idea. My hope is that, after reading this post, we will both agree on that.
Although the final code is available on GitHub, it is not a piece of (warning: MATLAB) code I am proud of. This is why we will focus mainly on the algorithm rather than its implementation.
Jump to the end of the post if you want to see how the algorithm performs.
Before diving into the actual Hough transform and how it can be used to detect lanes, it is worth stopping for a second to think about the general picture and the different parts our solution will need to get the best results we can. Since we already know which method we will be using to detect lanes, we can use this information to design and implement a pre-processing step that takes a raw frame and:
- Prepares it to be fed to the Hough transform.
- Tries to get rid of any irrelevant information and unwanted noise that may negatively affect the performance of the Hough transform.
This will be the first step of our solution. After that, the pre-processed image will be fed to the Hough transform and, hopefully, lanes will be detected. This two-step process (pre-processing + Hough transform) will be repeated for every frame until the end of the video. Apart from that, we can expect some sort of relation/continuity between consecutive frames. These two facts can be used in our favor: they enable us to include, in the lane detection calculations for the current frame, information about where the lanes were located in past frames. This should improve, at least in theory, the accuracy of our algorithm. Hence, the last step of our lane detection algorithm will be a tracking system that helps decide, from all the detected lines, which are the actual road lanes.
In light of the above, we decided that our lane detector would consist of the three stages already outlined:
- Frame pre-processing
- Lane detection via Hough transform
- Lane tracking
The rest of the post explains each of these three stages. We will start with the Hough transform and see how it works, how we can use it to detect lanes, and what its expected input and output are. Once we know the expected input of the Hough transform, we will move on to the pre-processing stage and see how we can convert a raw frame into something the Hough transform can work with. Last but not least, we will look at one of many ways in which a tracking system can be implemented to make use of past lane location information.
As with many Computer Vision tasks, the main problem we face for successfully implementing our lane detection algorithm is getting the computer to make sense of the information contained in the image which is relevant to achieve the desired goal. In other words, we should devise a way of extracting from the image those features that give us the information needed to solve the task at hand.
Since road lanes are indeed lines, it makes sense to think that the information we need to extract from the image in this particular case are lines. Once the lines present in our image, as well as their position, have been detected it will be easier to decide which of those lines are the actual road lanes. Hence, the features we have to get from the video frames are lines. We need a method that detects lines in an image.
The Hough transform is based on a rather simple but genius idea. You will probably know that a straight line can be characterized by the equation:
y = m·x+n
Where m and n are the parameters of the straight line, m being the slope and n the y-intercept (the value of y at the point where the line intercepts the y-axis). Given the values of m and n, we can know all the points that lie on the line. It is as easy as inserting the specific values of x and y into the equation and checking whether the equality is satisfied. If it is, the point is part of the line; if it is not, the point is not. As simple as that. What we just said is that any line can be characterized by these two parameters, m and n.
The Hough transform uses this basic idea to detect lines. The basic procedure of the algorithm (which can be further optimised) is as follows: Given a set of interest points from an image, it takes two at a time and assumes they are part of a straight line. It computes m and n for that match and places a vote for them in the parameter space. This procedure is repeated for all possible matches. After that it checks which pairs of parameters have more than t votes, where t is a threshold value, and returns those that meet the criteria as the detected lines.
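To make the voting procedure concrete, here is a minimal sketch in Python (the original project is in MATLAB); the point list, rounding resolution, and threshold are illustrative choices, not the post's actual values:

```python
from collections import Counter
from itertools import combinations

def hough_lines_mn(points, threshold, decimals=1):
    """Naive Hough-style voting in (m, n) space.

    For every pair of points, compute the slope m and intercept n of the
    line through them, discretise by rounding, and cast a vote. Parameter
    pairs with at least `threshold` votes are returned as detected lines.
    Pairs with equal x (infinite slope) are skipped; this is exactly the
    flaw the polar parameterization fixes.
    """
    votes = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:  # vertical pair: cannot vote in (m, n) space
            continue
        m = (y2 - y1) / (x2 - x1)
        n = y1 - m * x1
        votes[(round(m, decimals), round(n, decimals))] += 1
    return {mn for mn, v in votes.items() if v >= threshold}

# Four collinear points on y = 2x + 1 plus one outlier:
pts = [(0, 1), (1, 3), (2, 5), (3, 7), (5, 0)]
print(hough_lines_mn(pts, threshold=4))  # {(2.0, 1.0)}
```

The four collinear points produce six pairs that all vote for (m, n) = (2, 1), while each pair involving the outlier votes for a different cell, so only the true line survives the threshold.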
A simplified version of the algorithm, with a small number of points and a discretised parameter space, is illustrated (poorly) in the image below.
This approach has just one flaw: vertical lines have an infinite slope (m equals infinity), so they cannot be represented in (m, n) space. To solve this, instead of using m and n, lines are parameterized in polar coordinates as ρ = x·cos(θ) + y·sin(θ), where ρ is the distance from the origin to the line and θ is the angle of its normal.
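In the polar parameterization every line, vertical ones included, maps to a finite (ρ, θ) pair, and each interest point votes for every discretised angle. A small illustrative Python sketch (resolution values are arbitrary choices, not the project's):

```python
import math
from collections import Counter

def hough_polar(points, n_thetas=180, rho_res=1.0):
    """Vote in polar parameter space: rho = x*cos(theta) + y*sin(theta).

    Each point votes once per discretised theta; collinear points agree
    on the same (rho, theta) cell. The vertical line x = c maps to the
    finite pair (rho=c, theta=0), which (m, n) voting cannot represent.
    """
    votes = Counter()
    for x, y in points:
        for i in range(n_thetas):
            theta = math.pi * i / n_thetas
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_res), i)] += 1
    return votes

# Five points on the vertical line x = 10:
votes = hough_polar([(10, y) for y in range(5)])
print(votes[(10, 0)])  # 5: all five points agree on rho = 10, theta = 0
```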
The basic idea explained above can be optimised by only checking, for each point, a subset of all its possible matches. Since we are looking for lines, which have a continuous nature, it only makes sense to check matches in a small neighbourhood.
The two images seen below are examples of the output obtained when applying the Hough Transform to two already pre-processed frames of a dash cam video. The white squares mark the pairs of parameters above the defined threshold.
In our case, the input of the Hough Transform will be a black and white image where white pixels are the interest points that will be processed by the algorithm.
As a final note, I just wanted to point out that the Hough transform is not restricted to detecting straight lines; it can be used to detect more general shapes. If you are interested, you can find more here.
Okay, now we know what the Hough Transform is and how it works. What we should cover now is how we prepare a raw frame so that it can be fed to the algorithm.
When developing a Computer Vision algorithm (machine learning approaches aside), it is normally necessary to implement a first pre-processing stage that takes a raw video frame or image as input and transforms it so that it becomes a suitable input for the core algorithm. At least in my experience, image pre-processing is not a mechanical task but rather an applied art. Although some general criteria exist, much is left to the developer when it comes to finding the optimal pre-processing procedure for the task at hand.
For our particular case, we have already seen that the Hough Transform performs at its best when its input is a binarized image that contains only the edges of the figures found in the original image. Our pre-processing step should therefore aim to produce this kind of image, with binarization and edge detection being the key steps of the process. The pre-processing routine we developed consists of the five steps listed below:
- Grayscaling: The image is transformed from RGB to grayscale space so it is easier to work with.
- Region of Interest (ROI) selection: There are some regions of the image where we can be almost 100% sure that road lanes will not be found. For this specific application, the top half of the frame will rarely contain road lanes (see Figure 1). This is why a Region of Interest (ROI) is defined and only this region is processed further.
- De-noising: A median filter is applied to remove noise and reduce sharpness. This will hopefully improve the results of the edge detector.
- Binarization: The ROI is binarized to a black and white image. The key parameter here is the threshold value that determines if a pixel will be mapped to black or white.
- Edge detection: A Sobel operator is used to detect the edges in the image. There are many other methods to perform edge detection, but the Sobel operator is efficient and yields good enough results.
This pre-processing routine applied to each frame is exemplified in the images below.
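The five steps can be sketched as follows in plain Python (the project itself is in MATLAB; the luminance weights, the bottom-half ROI, and the threshold value are illustrative assumptions):

```python
def preprocess(frame, threshold=128):
    """Sketch of the five pre-processing steps. `frame` is a list of
    rows of (r, g, b) tuples; returns a binary edge map (rows of 0/1)
    covering the bottom-half ROI, ready for the Hough transform."""
    h, w = len(frame), len(frame[0])
    # 1. Grayscaling with the standard luminance weights
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]
    # 2. ROI selection: keep only the bottom half of the frame
    roi = gray[h // 2:]
    rh, rw = len(roi), w

    # 3. De-noising with a 3x3 median filter (borders clamped)
    def at(img, y, x):
        return img[min(max(y, 0), len(img) - 1)][min(max(x, 0), rw - 1)]
    denoised = [[sorted(at(roi, y + dy, x + dx)
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))[4]
                 for x in range(rw)] for y in range(rh)]
    # 4. Binarization against a fixed threshold
    binary = [[1 if p >= threshold else 0 for p in row] for row in denoised]
    # 5. Edge detection: Sobel gradient magnitude, kept where non-zero
    edges = [[0] * rw for _ in range(rh)]
    for y in range(1, rh - 1):
        for x in range(1, rw - 1):
            gx = (binary[y-1][x+1] + 2*binary[y][x+1] + binary[y+1][x+1]
                  - binary[y-1][x-1] - 2*binary[y][x-1] - binary[y+1][x-1])
            gy = (binary[y+1][x-1] + 2*binary[y+1][x] + binary[y+1][x+1]
                  - binary[y-1][x-1] - 2*binary[y-1][x] - binary[y-1][x+1])
            edges[y][x] = 1 if gx * gx + gy * gy > 0 else 0
    return edges

# A dark 40x40 frame with a bright vertical stripe down the middle:
frame = [[(255, 255, 255) if 18 <= x < 22 else (0, 0, 0) for x in range(40)]
         for _ in range(40)]
edges = preprocess(frame)
# Edge pixels appear on the borders of the stripe, not in its interior:
print(sum(row[17] for row in edges), sum(row[19] for row in edges))  # 18 0
```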
After the edge detection has been performed, the resulting image is ready to be processed with the Hough Transform. After applying it, we will have a set of lines detected in the frame and we will have to decide, somehow, which of them are the actual road lanes. This is where a tracking system comes in handy.
As previously explained, the main idea behind the lane mark tracking system is to help decide which line should be chosen as the one that best represents the lane mark for a given frame k. (The method explained below is just one of many. There are other methods, such as Kalman filtering, which would probably yield better results.)
We implemented a really simple method, with many shortcomings: we keep a record of which line was selected as the lane mark on frame k − 1 and boldly assume that this line is the real lane mark. This information is then used to predict where the lane mark of frame k will be. Once the prediction for frame k is computed, we check which of the candidate lines of frame k best fits the prediction, the set of candidates being the lines returned by the Hough transform for that frame.
The line that best resembles the predicted lane mark is then selected as the actual lane mark of frame k and will be used to predict the lane mark position of frame k + 1. Since, as already stated, we are working under the straight lane mark assumption, we decided to predict the position of the lane mark on frame k with the very line that was selected as the lane mark on frame k − 1. Knowing the predicted lane mark of frame k, the best fit for that same frame is calculated by
The line that satisfies the above condition is then selected as the one that best represents the road lane. It has to be pointed out that, since there are two lane marks to detect (left and right) and the Hough transform returns the set of all lines detected in the frame, a previous step has to be carried out before applying the lane tracking algorithm: dividing the set of lines into two subsets, left lane candidates and right lane candidates. The filtering method we applied is, for each candidate line,
Where the notation is the same as in the expression that computes the best fit line. Note that, since there is no equal sign in either of the two expressions, the filtering method automatically discards vertical lines. It is true that this method will yield wrong results if both lanes point in the same direction. However, due to the camera perspective this is not a common phenomenon, and to keep things simple we skipped handling this special case. Maybe it would have been better to divide lines into left and right candidates based on their x, y coordinates. Another drawback of this simple method, which only looks one frame back, is that if one estimation is completely off it is hard to steer the system back to correctly estimating the road lanes.
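Since the post's exact expressions are not reproduced here, the following Python sketch shows one plausible reading of both steps: a weighted-distance metric for the best-fit selection, and strict angle inequalities for the left/right split (all of these are assumptions, including which half of the θ range maps to which lane):

```python
import math

def best_fit(candidates, predicted, w_theta=100.0):
    """Pick, from the candidate lines of frame k, the one closest to the
    lane mark selected at frame k-1. Lines are (rho, theta) pairs; the
    weighted absolute-difference metric is an illustrative stand-in for
    the post's expression."""
    rho_p, theta_p = predicted
    return min(candidates,
               key=lambda rt: abs(rt[0] - rho_p) + w_theta * abs(rt[1] - theta_p))

def split_candidates(lines):
    """Divide Hough output into left/right lane candidates. With
    rho = x*cos(theta) + y*sin(theta) and image coordinates (y pointing
    down), the sign of a line's slope is decided by which side of pi/2
    theta falls on; the strict inequalities drop vertical (theta == 0)
    and horizontal (theta == pi/2) lines, echoing the remark above."""
    left = [(r, t) for r, t in lines if 0 < t < math.pi / 2]
    right = [(r, t) for r, t in lines if math.pi / 2 < t < math.pi]
    return left, right

lines = [(100.0, 0.7), (50.0, 0.0), (140.0, 2.3), (80.0, math.pi / 2)]
left, right = split_candidates(lines)
print(left, right)                            # [(100.0, 0.7)] [(140.0, 2.3)]
print(best_fit(left + right, (105.0, 0.75)))  # (100.0, 0.7)
```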
There is one last thing to address. The tracking system works with information from past frames, but at frame 0 there is no past information. Hence, the tracking system needs some sort of initialization when it is executed for the first time: it needs a guess of where the road lanes are supposed to be to get things rolling. We did not overthink it. We assumed the lanes met at the top of the ROI, at the center of the frame, and that both lanes started at the bottom of the ROI: the left lane at 1/4 of the frame width and the right lane at 3/4.
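These initial assumptions are simple enough to sketch directly (a Python illustration; slope/intercept form is used here for clarity, and the frame and ROI dimensions are made-up examples):

```python
def initial_lanes(frame_w, roi_top, roi_bottom):
    """Initial guess for the tracker at frame 0: both lanes meet at the
    top of the ROI at the horizontal centre of the frame; at the bottom
    of the ROI the left lane sits at 1/4 of the frame width and the
    right lane at 3/4. Each lane is returned as a slope/intercept pair
    (m, n) in image coordinates (y pointing down)."""
    top = (frame_w / 2, roi_top)
    left_bottom = (frame_w / 4, roi_bottom)
    right_bottom = (3 * frame_w / 4, roi_bottom)

    def line_through(p, q):
        m = (q[1] - p[1]) / (q[0] - p[0])
        return m, p[1] - m * p[0]

    return line_through(left_bottom, top), line_through(right_bottom, top)

# For a hypothetical 640-pixel-wide frame whose ROI spans rows 240-480:
left, right = initial_lanes(frame_w=640, roi_top=240, roi_bottom=480)
print(left, right)  # (-1.5, 720.0) (1.5, -240.0)
```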
By applying the simple steps explained above we can get a simple lane detection and tracking system running. It is by no means perfect and can be improved in many ways, but it is enough to get us started in the topic. Sample results in different situations are presented in the videos below:
- Straight line:
- Turn with all detected lines:
- Tunnel exit (notice how the change in contrast makes the algorithm go crazy):
- Tunnel exit with all detected lines:
To see the performance of the algorithm in real time, I decided to test it on a driving simulator. On the left screen you can see me driving rather incautiously and, on the right screen, the performance of the algorithm in real time. I captured the game window and fed the frames to the algorithm. Enjoy!
My current work has me dealing pretty intensively with Autodesk’s Forge. I plan on writing about it in the near future. If you are also interested in the topic, let me know!