Computer Vision Fundamentals and OpenCV Overview

Kerem Kargın
8 min read · May 4, 2021

In this blog post, I’ll explain the working principles of Computer Vision and the OpenCV library. Throughout the article I’ll cover:

  • What is Computer Vision?
  • How does Computer Vision Work?
  • Applications of Computer Vision
  • What is OpenCV?
  • A Brief History of OpenCV
  • Quick-Start with OpenCV

Let’s go step by step. Keep reading!

Resource: https://medium.com/analytics-vidhya/introduction-to-computer-vision-with-opencv-part-1-3dc948521deb

What is Computer Vision?

Computer Vision is a field that allows us to digitally detect images and perform operations on them. It is an area of artificial intelligence in which we collect information and extract features from images in digital media. Other sources define it as follows:

Wikipedia:

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do.

IBM:

Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs — and take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand.

The main purpose of Computer Vision is to understand images and interpret them for our purposes. As humans, we can easily perceive moving objects on any street with our eyes. Computers rely on many different algorithms to achieve the same understanding, and even so they may not reach very high accuracy.

How does Computer Vision Work?

Computers use algorithms to detect images in digital media. Digital images are made up of pixels, and every pixel has both a color and a coordinate. Imagine that each pixel has its own ID card on which its coordinate and color information are written. This is how computers can detect and identify images.

A pixel’s coordinates and color are expressed numerically, with colors typically defined in the RGB format. Because this information is numeric, computers can work with it. An image can contain thousands or millions of pixels, and these pixels are stored as a matrix. So when we operate on an image, we do it through matrices.
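To make this concrete, here is a minimal sketch showing that OpenCV loads an image as a matrix of pixels whose values we can inspect directly. The filename is just a placeholder, and note that OpenCV stores the channels in BGR order.

import cv2

# Load an image as a NumPy array of pixels ("klon.jpg" is a placeholder filename).
img = cv2.imread("klon.jpg")

print(img.shape)   # e.g. (600, 800, 3): 600 rows, 800 columns, 3 color channels
print(img[0, 0])   # the color values of the top-left pixel, in BGR order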

Computer vision works in three basic steps:

1- Acquiring an image

Images, even large sets, can be acquired in real-time through video, photos or 3D technology for analysis.

2- Processing the image

Deep learning models automate much of this process, but the models are often trained by first being fed thousands of labeled or pre-identified images.

3- Understanding the image

The final step is the interpretative step, where an object is identified or classified.

Resource: https://www.weareworldquant.com/en/thought-leadership/understanding-images-computer-vision-in-flux/

Applications of Computer Vision

Examples of the most popular Computer Vision applications:

  • Cancer Detection
  • COVID-19 diagnosis
  • Mask Detection
  • Vehicle Classification
  • Traffic Flow Analysis
  • Parking Occupancy Detection
  • Automated License Plate Recognition
  • Customer Tracking
  • People Counting
  • Social Distance Monitoring
  • Ball Tracking
  • Goal-Line Technology

What is OpenCV?

OpenCV stands for Open Source Computer Vision Library; as the name suggests, it is an open-source computer vision library. It is very popular today in the field of image processing. You can work with OpenCV in Java, C++, or Python.

Using OpenCV, one can process images and videos to identify objects, faces, or even human handwriting. When integrated with libraries such as NumPy, Python can process OpenCV’s array structure for analysis. To identify image patterns and their features, we treat images as vectors and perform mathematical operations on those features.

A Brief History of OpenCV

OpenCV was started at Intel in 1999 by Gary Bradski, and the first release came out in 2000. Vadim Pisarevsky joined Gary Bradski to manage Intel’s Russian software OpenCV team. In 2005, OpenCV was used on Stanley, the vehicle that won the 2005 DARPA Grand Challenge. Later, its active development continued under the support of Willow Garage, with Gary Bradski and Vadim Pisarevsky leading the project. OpenCV now supports a multitude of algorithms related to Computer Vision and Machine Learning, and it keeps expanding day by day.

OpenCV supports a wide variety of programming languages such as C++, Python, Java, etc., and is available on different platforms including Windows, Linux, OS X, Android, and iOS. Interfaces for high-speed GPU operations based on CUDA and OpenCL are also under active development.

OpenCV-Python is the Python API for OpenCV, combining the best qualities of the OpenCV C++ API and the Python language.

Quick-Start with OpenCV

After talking so much about Computer Vision and OpenCV, I want to show you what we can do with a few simple applications. This way, you can practice and take your first steps into the learning process.

Reading an image

First, we will read an image and display it on the screen with OpenCV. If OpenCV is not installed in the Python environment you use, you must install it first.

pip install opencv-python

After installing it, you first have to import the library. The OpenCV library is imported as cv2.

You then need to store the image that you read in an object.

You can read the image with the cv2.imread() function. This function takes the path of the image file as an argument. Since my Python file is in the same folder as the image, I typed just the name of the image. The point to note here is to include the image’s file extension. Do not forget this.

When the code runs, the image opens in a window, so we give this window a name with cv2.namedWindow(). This function takes the window’s name as its first argument, which is actually enough on its own. But since I want to be able to resize the opened window, I also pass the flag cv2.WINDOW_NORMAL.

When the code runs, the function cv2.imshow() displays the image on the screen. It takes two arguments: the first is the name of the window, and the second is the object holding the image. Here I stored the image in img, so I pass img as the second argument.

Finally, I call cv2.waitKey(0) so that the opened window stays on screen until we close it. This function takes a wait time in milliseconds; passing 0 means it waits indefinitely, so we can close the window whenever we want.

In addition, we call cv2.destroyAllWindows(). It is good to make this a habit: in larger projects we may forget to close the many windows that open on the screen, and this function takes care of that.

You can find the full code below.
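Since the embedded code does not appear here, the following is a minimal sketch that puts the steps above together. The window name and the filename "klon.jpg" are assumptions for illustration.

import cv2

# Read the image; remember to include the file extension.
img = cv2.imread("klon.jpg")

# Create a named, resizable window.
cv2.namedWindow("Image", cv2.WINDOW_NORMAL)

# Show the image in that window.
cv2.imshow("Image", img)

# Wait indefinitely for a key press, then close all open windows.
cv2.waitKey(0)
cv2.destroyAllWindows()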

Reading a video from webcam

Now let’s examine how to read video from a computer camera.

First, we import the OpenCV library.

Then we store the stream we will take from the computer camera in an object, which I named capture. We use the cv2.VideoCapture(0) function to capture video from the camera. The value 0 accesses the first camera connected to your computer; if you have more cameras, you can try 1, 2, ... to access the corresponding one.

As you know, videos consist of frames. In order to see the frames we capture, we have to display them on the screen in a loop. Inside the loop, the following line reads the captured frame and returns it to us.

ret, frame = capture.read()

Then we make an adjustment. To see the captured frames as we see ourselves in a mirror, we need to flip them around the y-axis. That’s why we write frame = cv2.flip(frame, 1); passing 1 as the second argument mirrors the image around the y-axis.

Then we write the code cv2.imshow("Webcam", frame) to show the frames taken from the camera.

Then we determine how many milliseconds each captured frame remains on the screen, and we add the following code so that pressing the q key on the keyboard stops capturing.

cv2.imshow("Webcam", frame)
if cv2.waitKey(30) & 0xFF == ord("q"):
break

Here, the comparison cv2.waitKey(30) & 0xFF == ord("q") checks whether the key pressed on the keyboard is q.

Finally, when we are done working with the video, we need to release the camera. It is done as follows.

capture.release()

You can find the full code below.
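Again, the embedded code is not shown here, so the sketch below puts the pieces described above together in a loop. The window name "Webcam" follows the snippets above; the frame-grab check is an extra safeguard I added.

import cv2

# 0 selects the first camera connected to the computer.
capture = cv2.VideoCapture(0)

while True:
    # Read one frame from the camera; ret is False if no frame was grabbed.
    ret, frame = capture.read()
    if not ret:
        break

    # Flip around the y-axis so the video looks like a mirror.
    frame = cv2.flip(frame, 1)

    cv2.imshow("Webcam", frame)

    # Keep each frame on screen for 30 ms; quit when the q key is pressed.
    if cv2.waitKey(30) & 0xFF == ord("q"):
        break

# Release the camera and close the window.
capture.release()
cv2.destroyAllWindows()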

Aspect Ratio Application

Now I’m going to show you an aspect-ratio application. In some cases we may not know the dimensions of the image, and this kind of application automates resizing so we avoid manual calculations.

We define a function called resizewithAspectRatio. We give this function 4 arguments:

  • The variable holding the image
  • Width
  • Height
  • The interpolation method to use when resizing

Let’s move on to the steps to be applied.

First, we define a dimension variable that is initially empty. Then we store the first two dimensions of the original image, the height and the width, in a tuple as h and w.

If neither the width nor the height is given, we return the image unchanged.

If the width is not given, we want the following operations to be done.

r = height / float(h)
dimension = (int(w*r),height)

Let me explain what this means. We calculate the ratio of the given height to the original height, then multiply the original width by this ratio to get the new width. For example, if the original image is 800 pixels high and 1200 pixels wide and we request a height of 600, the ratio is 0.75 and the new width becomes 900. This way the image is resized without distorting its proportions.

Otherwise, if the height is not given, we apply similar procedures again.

Finally, the function returns the resized image with the following line.

return cv2.resize(img, dimension, interpolation= inter)

Then we read the original image and resize it with the function we wrote.

img = cv2.imread("klon.jpg")
img1 = resizewithAspectRatio(img,
width = None,
height = 600,
inter = cv2.INTER_AREA)

Finally, we complete the application with code we already know.

cv2.imshow("Original",img)
cv2.imshow("Resized",img1)

cv2.waitKey(0)
cv2.destroyAllWindows()

You can find the full code below.
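The embedded code is not shown here, so the sketch below reconstructs the function from the steps described above and then uses it on "klon.jpg".

import cv2

def resizewithAspectRatio(img, width=None, height=None, inter=cv2.INTER_AREA):
    dimension = None
    (h, w) = img.shape[:2]   # original height and width

    # If neither width nor height is given, return the image unchanged.
    if width is None and height is None:
        return img

    if width is None:
        # Scale the width by the same ratio as the requested height.
        r = height / float(h)
        dimension = (int(w * r), height)
    else:
        # Otherwise, scale the height by the same ratio as the requested width.
        r = width / float(w)
        dimension = (width, int(h * r))

    return cv2.resize(img, dimension, interpolation=inter)

img = cv2.imread("klon.jpg")
img1 = resizewithAspectRatio(img,
                             width=None,
                             height=600,
                             inter=cv2.INTER_AREA)

cv2.imshow("Original", img)
cv2.imshow("Resized", img1)

cv2.waitKey(0)
cv2.destroyAllWindows()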

While doing these applications, I was inspired by the course in the link below.

Computer Vision Course

Resources

  1. https://docs.opencv.org/master/d0/de3/tutorial_py_intro.html
  2. https://www.geeksforgeeks.org/opencv-python-tutorial/
  3. https://www.geeksforgeeks.org/opencv-overview/
  4. https://www.geeksforgeeks.org/introduction-to-opencv/
  5. https://www.mygreatlearning.com/blog/what-is-computer-vision-the-basics/
  6. https://www.ibm.com/topics/computer-vision
  7. https://en.wikipedia.org/wiki/Computer_vision
  8. https://www.sas.com/en_us/insights/analytics/computer-vision.html

Kerem Kargın

BSc. Industrial Eng. | BI Developer & Machine Learning Practitioner | #BusinessIntelligence #MachineLearning #DataScience