How to compile OpenCV 2.4.3 (x86 and x64) using Visual Studio 2012

  1. Download and extract OpenCV 2.4.3 for Windows. (I extracted it to E:/Program Files/opencv.)
  2. Download and install CMake for Windows.
  3. I’ll show how to build the x86 version below. The x64 version follows almost the same steps, and I’ll point out where they differ.
  4. Open CMake and set the source code directory to E:/Program Files/opencv and the binary directory to E:/Program Files/opencv/build/x86/vc11. You may need to create this directory yourself. If you look in E:/Program Files/opencv/build, you will notice x86 and x64 directories for the two architectures. They contain prebuilt vc9 and vc10 libraries, but no vc11 ones for Visual Studio 2012, so we have to build them ourselves. In the end, what we need is the same as what the vc9/vc10 directories contain: the bin and lib files.
  5. Click “Configure” and select “Visual Studio 11”. (For the x64 build, select “Visual Studio 11 Win64” instead.)
  6. Select whatever you want to build and configure, or leave the defaults, then click “Configure” again and then “Generate”. You will see “Configuring done” and “Generating done” in the log.
  7. You can close CMake now. In your build directory you will find a Visual Studio 2012 solution named OpenCV. Open it with Visual Studio 2012.
  8. Wait until it says “Ready” in the lower left corner.
  9. We will build the debug libraries first, so select “Debug”. We will build the release libraries later.
  10. Select the solution and click “Build Solution”, then wait until it’s done. There should be no failures.
  11. Now that the libraries and binaries are built, let’s install them, which means collecting them into the install directory we will use later. You can skip this step and move the files into directories of your own choosing. To install, select the “INSTALL” project and click “Build”. Again, you should see no failures.
  12. Now that the debug version of OpenCV is built and installed, switch to “Release” and repeat steps 10 and 11 for the release version.
  13. OK! We’ve now built OpenCV successfully with Visual Studio 2012. You can delete everything except the install directory, which should contain at least the bin and lib directories. Now let’s test it with a hello-world program. You need to add the binaries to the system path so that Windows can find them when running the program. In my case, I added “E:\Program Files\opencv\build\x86\vc11\install\bin” to my path.
  14. Create a new Windows console project and paste in an OpenCV hello-world program.
  15. Select the “Property Manager” tab at the bottom and open a property sheet, for example “Microsoft.Cpp.Win32.user”.
  16. Add the include directories under “C/C++” -> “All Options” -> “Additional Include Directories”. In my case, I added “E:\Program Files\opencv\build\x86\vc11\install\include”, “E:\Program Files\opencv\build\x86\vc11\install\include\opencv” and “E:\Program Files\opencv\build\x86\vc11\install\include\opencv2”.
  17. Under “Linker” -> “All Options” -> “Additional Library Directories”, add the directory that holds your libraries. In my case it’s “E:\Program Files\opencv\build\x86\vc11\install\lib”. Under “Linker” -> “All Options” -> “Additional Dependencies”, add the libraries you use. In my case, I added all the debug libraries. Libraries whose names end with “d” are debug libraries; those without the “d” are release libraries.
  18. Build and run your program!
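As a small aid for step 17, here is a sketch of how the debug and release library names could be separated before pasting them into “Additional Dependencies”. The file names are assumptions that follow an “opencv_<module>243[d].lib” pattern; check your own install/lib directory for the real list.

```python
import re

# Hypothetical library names (assumed "opencv_<module>243[d].lib" pattern);
# replace them with the contents of your own install/lib directory.
LIBS = [
    "opencv_core243.lib", "opencv_core243d.lib",
    "opencv_highgui243.lib", "opencv_highgui243d.lib",
    "opencv_imgproc243.lib", "opencv_imgproc243d.lib",
]

def split_libs(names):
    """Return (debug, release); debug names end in '<version>d.lib'."""
    debug = [n for n in names if re.search(r"\d+d\.lib$", n)]
    release = [n for n in names if n not in debug]
    return debug, release

debug, release = split_libs(LIBS)
# Visual Studio accepts a semicolon-separated list in "Additional Dependencies".
print(";".join(debug))
```

Use the debug list for the “Debug” configuration and the release list for “Release”.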

Black board rectification and text reconstruction

The goal of this project is to convert writing on a board into an electronic version that can be used on the web as open course material. What I have done is one part of this project.

The first task is to detect the text on the board. I tried different ways to extract text, such as the Canny, SUSAN, simple gradient, and Laplacian edge detectors. In the end I chose the Laplacian edge detector, because the performance of the other detectors depends on their thresholds. After applying the Laplacian detector, I simply use zero as a fixed threshold to decide whether a pixel is text or not. The Laplacian detector is very noisy, but on the other hand it preserves text better than the other detectors. I use morphological transforms to deal with the noise. I tested with white boards and the ‘green’ boards you see at school, and the results are satisfying.
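The idea of thresholding the Laplacian at zero and then cleaning up with morphology can be sketched as follows. This is a minimal NumPy illustration, not my original code: the 4-neighbour Laplacian, the 3x3 structuring element and the closing-then-opening order are all assumptions made for the example.

```python
import numpy as np

def laplacian(img):
    """4-neighbour Laplacian with replicated borders."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def _morph(mask, use_and):
    """3x3 erosion (use_and=True) or dilation (use_and=False) of a bool mask."""
    h, w = mask.shape
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones((h, w), dtype=bool) if use_and else np.zeros((h, w), dtype=bool)
    for dy in range(3):
        for dx in range(3):
            win = p[dy:dy + h, dx:dx + w]
            out = out & win if use_and else out | win
    return out

def erode(mask):
    return _morph(mask, True)

def dilate(mask):
    return _morph(mask, False)

def detect_text(img, dark_text=True):
    """Zero-threshold the Laplacian, then close and open the mask.

    dark_text=True suits marker on a white board; use False for chalk
    on a green board, where the sign of the Laplacian flips.
    """
    lap = laplacian(img)
    mask = lap > 0 if dark_text else lap < 0
    closed = erode(dilate(mask))     # closing fills the inside of strokes
    return dilate(erode(closed))     # opening removes isolated specks

# Toy board: light background, one dark stroke and one dark speck.
board = np.full((16, 16), 200.0)
board[3:13, 6:9] = 50.0   # stroke
board[13, 2] = 50.0       # noise speck
text = detect_text(board)
```

On this toy board the stroke survives and the single-pixel speck is removed.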

Another task is to determine the four corners of the board. This is not as easy as it sounds. I cannot use a corner detector, because some corners of a board may not be sharp enough, and an image contains many corners other than the board corners. Instead, I first use Canny to detect edges and then apply the Hough transform to find every possible line in the image. As expected, there are lots of lines in an image, and it is very difficult to reject the lines that come from the edges of a door or a desk.
After trying many approaches, I decided to assume that the board is in the middle of the image, and I divided the image into four quadrants. I assume that each quadrant contains exactly one corner of the board. So the problem becomes finding, in each quadrant, the corner that most probably belongs to the board. First of all, I compute every intersection of two of the lines I found. Take the top left quadrant as an example. There are many intersections in this quadrant, and if a corner belongs to the board, the gradient along the edges of the board will be very large. So I sum up the gradient (or more generally, the difference orthogonal to the direction of a line) along the two lines that intersect at this point, going to the right and toward the bottom (for the top left quadrant), and take the point with the maximum gradient sum as the board corner. I then repeat this for the other quadrants.
By doing so, I can find the corners of the board robustly and eliminate corners that do not belong to the board. As you will see in the result images, I can find the corner even when the board has rounded corners.
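The scoring step for the top left quadrant can be sketched as below. It is a simplified illustration, not my original code: lines are assumed to be in the Hough (rho, theta) form x·cos(theta) + y·sin(theta) = rho, the orthogonal difference is sampled one pixel on each side of the line, and the function names and the `steps` parameter are made up for the example.

```python
import numpy as np

def intersect(l1, l2):
    """Intersection (x, y) of two (rho, theta) lines, or None if parallel."""
    (r1, t1), (r2, t2) = l1, l2
    a = np.array([[np.cos(t1), np.sin(t1)], [np.cos(t2), np.sin(t2)]])
    if abs(np.linalg.det(a)) < 1e-9:
        return None
    x, y = np.linalg.solve(a, np.array([r1, r2]))
    return x, y

def ray_score(img, start, theta, steps=20):
    """Sum of |difference across the line| while walking right or down."""
    d = np.array([-np.sin(theta), np.cos(theta)])       # unit vector along the line
    toward = np.array([1.0, 0.0]) if abs(d[0]) >= abs(d[1]) else np.array([0.0, 1.0])
    if d @ toward < 0:
        d = -d                                           # walk right / down only
    n = np.array([np.cos(theta), np.sin(theta)])         # unit normal to the line
    h, w = img.shape
    total = 0.0
    for k in range(1, steps + 1):
        p = np.asarray(start, dtype=float) + k * d
        ax, ay = np.rint(p + n).astype(int)
        bx, by = np.rint(p - n).astype(int)
        if 0 <= ax < w and 0 <= ay < h and 0 <= bx < w and 0 <= by < h:
            total += abs(float(img[ay, ax]) - float(img[by, bx]))
    return total

def top_left_corner(img, lines, steps=20):
    """Intersection in the top left quadrant with the largest gradient sum."""
    h, w = img.shape
    best, best_score = None, -1.0
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            p = intersect(lines[i], lines[j])
            if p is None or not (0 <= p[0] < w / 2 and 0 <= p[1] < h / 2):
                continue
            score = sum(ray_score(img, p, t, steps) for _, t in (lines[i], lines[j]))
            if score > best_score:
                best, best_score = p, score
    return best

# Toy image: a bright "board" whose top left corner is at (10, 10),
# plus one distractor vertical line at x = 5 with no edge under it.
img = np.zeros((40, 40))
img[10:, 10:] = 100.0
lines = [(10.0, np.pi / 2), (10.0, 0.0), (5.0, 0.0)]
corner = top_left_corner(img, lines)
```

The distractor intersection at (5, 10) gets a lower sum because only one of its two lines follows a real edge. The other three quadrants work the same way with the walking directions mirrored.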

Text detection
Corner detection. Two lines in the same color indicate the lines along which I got the maximum gradient sum. (The gray lines are discarded by my algorithm.)
I can find the right corner even when something blocks the border.

3D reconstruction using opencv and matlab

I worked on a 3D face reconstruction project last semester, and it gave me a taste of camera calibration and 3D geometry.

There are some major steps:
1. Use a chessboard to calibrate each camera.
2. Use the calibrated cameras to get the rotation and translation between the two cameras.
3. Rectify the two images so that they look as if they were taken by two horizontally aligned, frontal-parallel cameras. (Correspondences are then aligned on horizontal epipolar lines.)
4. Find correspondences along the epipolar lines (I use the block matching (BM) method here) and get the disparity map.
5. Reproject the disparity map into 3D space.
6. Done.
I tested it using my own face. The result depends heavily on finding correspondences, which in turn depends heavily on the calibration error. So the first thing to ensure is that the calibration error is minimized, and the second is to choose proper parameters for finding correspondences.
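Step 5 can be illustrated with the standard relation for a rectified pair, Z = f·B/d, where d is the disparity in pixels. The sketch below is a simplified NumPy version, not the OpenCV call I used: the focal length `focal_px`, the baseline, and the principal point (`cx`, `cy`) are assumed to come from calibration, and the numbers in the usage lines are made up.

```python
import numpy as np

def reproject(disparity, focal_px, baseline, cx, cy):
    """Turn a disparity map into an (H, W, 3) array of X, Y, Z points.

    Pixels with disparity <= 0 have no valid match and become NaN.
    """
    d = np.asarray(disparity, dtype=float)
    v, u = np.indices(d.shape)                  # pixel row and column indices
    z = np.full(d.shape, np.nan)
    valid = d > 0
    z[valid] = focal_px * baseline / d[valid]   # Z = f * B / d
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return np.dstack([x, y, z])

# Made-up example: f = 500 px, baseline = 0.1 m, principal point at (0, 0).
disp = np.array([[25.0, 0.0],
                 [50.0, 25.0]])
points = reproject(disp, 500.0, 0.1, 0.0, 0.0)
```

Because depth is inversely proportional to disparity, small disparity errors at far distances turn into large depth errors, which is one reason the correspondences matter so much.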
Here are some results:
Calibrate using chessboard
Disparity map of chessboard
Rectified my face
Disparity map of my face
3D map of my face using color to indicate distance
3D face with original colors

Google Challenge: Video Classification by Genre

Above is a presentation about the Google challenge. It’s the final project for video processing, which I worked on with Meng Wang. We are trying to extract a character graph from a movie and then use it to classify movies.

Copy detection

I did this project last semester. I was trying to detect copies of the masterpiece ‘Mona Lisa’ among thousands of pictures. I used two different methods: one is DCT-based, and the other uses a covariance matrix. It turns out that the covariance matrix method is more robust but more time consuming. Copies that have been compressed, rotated, or resized can all be detected. The result below shows that with the covariance matrix we missed only one copy, which had been modified by a non-linear luminance change.
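To give a feel for the covariance matrix approach, here is a minimal sketch of a covariance descriptor and a distance between two descriptors. The feature set (intensity plus absolute gradients) and the generalized-eigenvalue distance are assumptions chosen for illustration; they are not necessarily what I used in the project.

```python
import numpy as np

def covariance_descriptor(img):
    """3x3 covariance of per-pixel features (intensity, |dI/dx|, |dI/dy|)."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    feats = np.stack([img.ravel(), np.abs(gx).ravel(), np.abs(gy).ravel()])
    # A small ridge keeps the matrix invertible for flat images.
    return np.cov(feats) + 1e-6 * np.eye(3)

def covariance_distance(c1, c2):
    """sqrt(sum(ln^2 lambda_i)) over the generalized eigenvalues of (c1, c2)."""
    lam = np.linalg.eigvals(np.linalg.solve(c1, c2)).real
    lam = np.maximum(lam, 1e-12)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

rng = np.random.default_rng(0)
master = rng.uniform(0, 200, size=(32, 32))
brighter = master + 30.0                      # copy with a uniform brightness shift
other = np.tile(np.arange(32.0), (32, 1))     # unrelated picture (a plain ramp)

d_copy = covariance_distance(covariance_descriptor(master),
                             covariance_descriptor(brighter))
d_other = covariance_distance(covariance_descriptor(master),
                              covariance_descriptor(other))
```

A covariance matrix ignores a constant offset in the features, so the brightness-shifted copy stays at a distance of about zero, while a non-linear luminance change alters the covariance itself.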

Jitter cancelation

I tried phase correlation to cancel the jitter of a camera. For now I use full-pixel precision, and the results turn out to be not bad.
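Full-pixel phase correlation can be sketched in a few lines of NumPy. This is an illustrative version under the assumption of a pure, circular translation between frames; the function names are made up for the example.

```python
import numpy as np

def phase_correlation_shift(ref, cur):
    """Integer (dy, dx) such that cur is approximately ref shifted by (dy, dx)."""
    f_ref = np.fft.fft2(ref)
    f_cur = np.fft.fft2(cur)
    cross = f_cur * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12        # keep only the phase
    corr = np.fft.ifft2(cross).real       # peak sits at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    if dy > h // 2:                       # map wrapped indices to negative shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def compensate(cur, shift):
    """Undo the estimated shift (circularly, which is fine for a sketch)."""
    return np.roll(cur, (-shift[0], -shift[1]), axis=(0, 1))

rng = np.random.default_rng(1)
reference = rng.random((32, 32))
jittered = np.roll(reference, (3, -5), axis=(0, 1))
shift = phase_correlation_shift(reference, jittered)
stabilized = compensate(jittered, shift)
```

Sub-pixel precision would need interpolation around the correlation peak, which this sketch leaves out.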

Original video

After compensation

Motion detection using Markov Random field

Compared with a fixed-threshold hypothesis test, a variable-threshold hypothesis test with an MRF is more effective, because the threshold varies depending on each pixel’s neighbors.

Original video
Variable-threshold with first-order MRF
Variable-threshold with second-order MRF

Although we can use opening and closing to remove protrusions and fill holes, it makes more sense to use an MRF, because it uses the neighbors to adjust the threshold.
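The neighbor-dependent threshold can be sketched as follows. This is a simplified illustration rather than my exact formulation: the test statistic, the way the threshold is scaled by the neighbor labels, and the parameters `sigma`, `theta`, `beta` and `iters` are all assumptions made for the example.

```python
import numpy as np

def neighbour_count(labels, second_order=False):
    """Number of 'moving' neighbours per pixel and the neighbourhood size."""
    h, w = labels.shape
    p = np.pad(labels.astype(int), 1)              # outside the image counts as static
    offsets = [(0, 1), (2, 1), (1, 0), (1, 2)]     # first order: 4 neighbours
    if second_order:
        offsets += [(0, 0), (0, 2), (2, 0), (2, 2)]  # add the diagonals
    moving = sum(p[dy:dy + h, dx:dx + w] for dy, dx in offsets)
    return moving, len(offsets)

def detect_motion(prev, cur, sigma=5.0, theta=2.0, beta=0.9, iters=5,
                  second_order=False):
    """Label pixels as moving with a threshold that depends on the neighbours."""
    stat = (cur.astype(float) - prev.astype(float)) ** 2 / (2.0 * sigma ** 2)
    labels = stat > theta                          # fixed-threshold start
    for _ in range(iters):
        moving, total = neighbour_count(labels, second_order)
        static = total - moving
        # More moving neighbours lower the threshold, more static ones raise it.
        labels = stat > theta * (1.0 + beta * (static - moving) / total)
    return labels

prev = np.zeros((12, 12))
cur = np.zeros((12, 12))
cur[3:9, 3:9] = 20.0     # moving block
cur[5, 5] = 8.0          # weak pixel inside the block (a hole for a fixed threshold)
cur[10, 10] = 12.0       # isolated noise just above the fixed threshold
fixed = (cur - prev) ** 2 / 50.0 > 2.0
mrf = detect_motion(prev, cur)
```

In this toy case the fixed threshold leaves a hole inside the block and keeps the noise pixel, while the neighbor-dependent threshold fills the hole and drops the noise.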

Connected object labeling

This project is about labeling connected objects and then using moments or orientation to find the objects of interest.
The first task is to find the two tumors in the picture of the lung (on the right).

The second task is to find the directions of motion of the bats:
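The labeling and orientation steps can be sketched like this. It is a minimal illustration, assuming a binary mask as input, 4-connectivity, and orientation taken from second-order central moments; the function names are made up for the example.

```python
from collections import deque

import numpy as np

def label_components(mask):
    """4-connected labelling by breadth-first flood fill; 0 is background."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or labels[sy, sx]:
                continue
            count += 1
            labels[sy, sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

def orientation(labels, k):
    """Principal-axis angle (radians) of object k from central moments.

    The angle is measured from the x axis, with the image y axis pointing down.
    """
    ys, xs = np.nonzero(labels == k)
    x, y = xs - xs.mean(), ys - ys.mean()
    mu20, mu02, mu11 = (x * x).mean(), (y * y).mean(), (x * y).mean()
    return 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

mask = np.zeros((10, 10), dtype=bool)
mask[1, 1:7] = True      # horizontal bar
mask[4:9, 8] = True      # vertical bar
labels, count = label_components(mask)
angles = [orientation(labels, k) for k in range(1, count + 1)]
```

The zeroth moment (area) can be used in the same way to keep only objects of a plausible size. Note that the principal axis gives the line of motion but not its sign.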

Feature Tracking

To detect an object, I calculate the normalized correlation coefficient (NCC) with a template at every position and pick the largest value (if it is above a threshold) as a true detection. As for motion, I calculate the difference between the coordinates of the previous true detection and the current one, and use a threshold to determine whether the motion is horizontal or vertical.


  • My method of using a static template and no pyramid is quite successful at detecting my feature. I set the threshold to 0.9, so only a strong match is claimed as a true detection, because I prefer a lower true positive rate to a higher false positive rate.
  • Then I tried a dynamic template: whenever I get a true detection, I update my template. The main weakness of this method is that the template drifts slowly. Its advantage is that it improves the true positive rate.
  • Then I used pyramids to improve my results. The pyramids are built from downsampled versions of the original template. The main advantage is that I can detect features that are smaller than the original one, which means I can still detect the feature when I move backward. (See the results below.)
  • Another weakness of my method is that I use grayscale images to calculate the NCC, so regions of a different color but with similar grayscale values become false positives. To get rid of this problem, I tried calculating the NCC separately on the Cb and Cr channels and claiming a true detection only if all values are above their thresholds. I have not completely finished this part.
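The static-template variant can be sketched as below. This is a brute-force illustration on grayscale arrays, not my original code: the 0.9 threshold comes from the text above, while the function names and the `min_move` parameter are assumptions made for the example.

```python
import numpy as np

def ncc(patch, template):
    """Normalized correlation coefficient of two equally sized arrays."""
    a = patch - patch.mean()
    b = template - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def detect(frame, template, threshold=0.9):
    """Return ((y, x), score) of the best match, or None if below threshold."""
    frame = np.asarray(frame, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    best, best_pos = -1.0, None
    for y in range(frame.shape[0] - th + 1):
        for x in range(frame.shape[1] - tw + 1):
            score = ncc(frame[y:y + th, x:x + tw], template)
            if score > best:
                best, best_pos = score, (y, x)
    return (best_pos, best) if best >= threshold else None

def motion_direction(prev_pos, cur_pos, min_move=2):
    """Classify the move between two detections as horizontal or vertical."""
    dy, dx = cur_pos[0] - prev_pos[0], cur_pos[1] - prev_pos[1]
    if max(abs(dx), abs(dy)) < min_move:
        return "none"
    return "horizontal" if abs(dx) >= abs(dy) else "vertical"

rng = np.random.default_rng(2)
frame1 = rng.random((30, 30))
template = frame1[8:14, 12:18].copy()
frame2 = np.roll(frame1, 5, axis=1)          # the scene moves 5 pixels to the right
hit1 = detect(frame1, template)
hit2 = detect(frame2, template)
direction = motion_direction(hit1[0], hit2[0])
```

A dynamic template would replace `template` with the matched patch after each true detection, and a pyramid would repeat `detect` with downsampled copies of the template.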

Face and Motion Detection

I use simple thresholding for face and motion detection, and the results are acceptable.

  • The color-based method is a simple way to detect a face. It is very easy to implement and runs very fast. But it is not very accurate, because pixels similar in color to the face are considered face no matter which color space you use. As you can see, the color of the door behind me is quite similar to the color of my face, so it is considered face in the result images.
  • For motion, I first subtract the pixel values of different frames in each RGB channel and set a threshold for each channel. If the absolute difference in any channel is not below my threshold, I consider the pixel to be in motion.
  • The results generally reach the goal of detecting face and motion. The method can detect a large portion of the face. There are some false positive and false negative detections: some parts of the door in the background are considered face, and some parts of my face are considered non-face. It is difficult to obtain both a high true detection rate and a low false detection rate with this algorithm.
  • The limitations of my algorithm are that it cannot distinguish colors that are similar to the face, and that when there is only slight motion, only the outline is considered motion, because the pixels inside the outline are similar to each other.
  • There are several ways to improve my face detection algorithm. First of all, I can use several points on my face as reference colors, so that I can narrow my threshold to increase accuracy. In addition, using a transform and looking for the best match in the transform space is a good way to get rid of color confusion. Another way is to use statistics such as a covariance matrix to capture the unique features of a face. These features are so distinctive that there is only a small chance of a false detection (either false positive or false negative). Machine learning is also a good approach to increasing detection accuracy.
  • To improve motion detection, a Markov random field is an excellent choice. It depends on neighbors and statistics rather than on single pixel values.
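Both thresholding steps can be sketched in a few lines. This is an illustration with made-up numbers: the per-channel thresholds, the reference color and the tolerance are assumptions, and the face test is shown in RGB only for simplicity.

```python
import numpy as np

def motion_mask(prev_rgb, cur_rgb, thresholds=(20, 20, 20)):
    """Motion where the absolute difference in any channel reaches its threshold."""
    diff = np.abs(cur_rgb.astype(int) - prev_rgb.astype(int))  # avoid uint8 wrap-around
    return (diff >= np.asarray(thresholds)).any(axis=2)

def face_mask(rgb, reference, tolerance=30):
    """Face where every channel is within `tolerance` of a reference color."""
    diff = np.abs(rgb.astype(int) - np.asarray(reference, dtype=int))
    return (diff <= tolerance).all(axis=2)

prev_frame = np.zeros((4, 4, 3), dtype=np.uint8)
cur_frame = prev_frame.copy()
cur_frame[1, 2] = (0, 25, 0)      # large change in one channel: motion
cur_frame[3, 3] = (10, 10, 10)    # small change in all channels: no motion
moving = motion_mask(prev_frame, cur_frame)

picture = np.zeros((2, 2, 3), dtype=np.uint8)
picture[0, 0] = (200, 150, 130)   # close to the reference color
faces = face_mask(picture, reference=(210, 160, 120))
```

Anything in the scene that falls inside the tolerance box around the reference color is marked as face, which is exactly the door problem described above.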