Computer vision

Computer Vision is the subfield of Computer Science that studies ways to use a computer to process visual information contained in an image or video. The goal is for computers to understand and interpret what they see. Unlike Computer Graphics (which uses computers to make new images), Computer Vision takes visual information from the real world and then makes predictions about the contents of those images or videos.

How it works

While it is easy for humans to look at an image and understand it without any effort, this is very difficult for computers. A computer image is actually just a grid of pixels, so a computer can only interpret an image as a long list of colors like "red red blue black white red...". There is no simple way for a computer to read this list of colors and determine that it contains a picture of a flower, or a dog, or a school bus. Computer Vision is the set of techniques used to make sense of this long list of colors.
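For example, the "red red blue black white red..." picture above can be written out directly as a grid of numbers. Below is a minimal sketch in Python using the NumPy library (an assumed choice); the tiny 2-by-3 picture and its colors are made up for illustration.

  import numpy as np

  # A tiny 2x3 color image: each pixel is a (red, green, blue) value from 0 to 255.
  image = np.array([
      [[255,   0,   0], [255,   0,   0], [  0,   0, 255]],   # red, red, blue
      [[  0,   0,   0], [255, 255, 255], [255,   0,   0]],   # black, white, red
  ], dtype=np.uint8)

  print(image.shape)                 # (2, 3, 3): 2 rows, 3 columns, 3 color channels
  # Flatten the grid into the long list of colors the computer actually stores.
  print(image.reshape(-1, 3).tolist())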

At first, the field of Computer Vision focused on making computers understand images so that they could describe what they see. For example, scientists wanted a computer to be able to say "rabbit" when it was given a picture of a rabbit. It was challenging for scientists to devise ways to help computers understand images, and it was especially hard to get computers to give the correct answers reliably. Scientists discovered that even when a computer could accurately identify a "normal" rabbit in an image, the computer failed to understand that an upside-down rabbit was still a rabbit.

Early methods that scientists developed are called conventional techniques. They are still useful today. Some examples are listed below, with a short code sketch after the list:

  • Edge detection: picking out the edges between light and dark areas
  • Corner detection: picking out corners of sharp shapes
  • Blob detection: picking out curvy shapes
  • Filtering: ways of adjusting an image to make it easier to pick out shapes, such as blurring it or changing the brightness
Example of Edge Detection applied to a video
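As an illustration of one conventional technique, here is a minimal sketch of edge detection in Python with the NumPy library (an assumed setup). It slides a small Sobel filter over a grayscale picture and reports where the brightness changes sharply; the tiny example picture is made up.

  import numpy as np

  def sobel_edges(gray):
      """Return an edge-strength map for a 2-D grayscale image (a simple sketch)."""
      kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # left-to-right changes
      ky = kx.T                                                         # top-to-bottom changes
      h, w = gray.shape
      edges = np.zeros((h - 2, w - 2))
      for y in range(h - 2):
          for x in range(w - 2):
              patch = gray[y:y + 3, x:x + 3]
              gx = np.sum(kx * patch)   # how much brightness changes horizontally
              gy = np.sum(ky * patch)   # how much brightness changes vertically
              edges[y, x] = np.hypot(gx, gy)
      return edges

  # A made-up picture: dark on the left, bright on the right -> one vertical edge.
  picture = np.zeros((5, 6))
  picture[:, 3:] = 255
  print(sobel_edges(picture))   # large numbers appear where the edge is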

Today, Computer Vision has improved dramatically thanks to advancements in Machine Learning. Computers can now easily identify objects in images and video, and they do it reliably without making many mistakes. Computers can even understand the context of an image, such as "two rabbits sitting under a bench in a forest at night". This is a much harder problem to solve than just being able to recognize a rabbit, and machine learning enables computers to solve this problem.

Objects detected in an image using a neural network called YOLO.

The key tool that enabled this progress is the neural network, which is a way for humans to train a computer to do Computer Vision tasks. Recent discoveries in better types of neural networks and deep learning have improved Computer Vision very quickly since 2012.

These newer methods are called modern techniques. There are many different techniques in this category, but the most important examples are listed below (a small code sketch of a Convolutional Neural Network follows the list):

  • Convolutional Neural Network: a kind of neural network specialized for finding objects in images
  • Transformer: a very large type of neural network, based on a method called attention, that is extremely good at Computer Vision tasks
  • Recurrent Neural Network: a type of neural network that works on sequences, so it is used for video (a sequence of images) rather than single images
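As a rough illustration of the first item in the list above, here is a minimal sketch of a very small Convolutional Neural Network using the PyTorch library (an assumed choice; the layer sizes and the ten example classes are made up, and real networks are much larger).

  import torch
  import torch.nn as nn

  # A very small Convolutional Neural Network that turns a 32x32 color image
  # into scores for 10 made-up object classes (for example "rabbit", "dog", ...).
  class TinyCNN(nn.Module):
      def __init__(self, num_classes: int = 10):
          super().__init__()
          self.features = nn.Sequential(
              nn.Conv2d(3, 16, kernel_size=3, padding=1),   # look for simple patterns (edges)
              nn.ReLU(),
              nn.MaxPool2d(2),                              # shrink the image 32 -> 16
              nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combine patterns into shapes
              nn.ReLU(),
              nn.MaxPool2d(2),                              # shrink 16 -> 8
          )
          self.classifier = nn.Linear(32 * 8 * 8, num_classes)

      def forward(self, x):
          x = self.features(x)
          return self.classifier(x.flatten(1))

  model = TinyCNN()
  fake_image = torch.rand(1, 3, 32, 32)   # one random 32x32 RGB image
  print(model(fake_image).shape)          # torch.Size([1, 10]): one score per class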

Applications

Computer Vision is very helpful to humans because computers can process a lot of information very quickly. Because they are so fast, computers can identify problems faster than humans can, or make measurements more accurately than humans. With Computer Vision, computers can also perform boring tasks that humans don't want to do, like monitoring surveillance video for hours at a time.

Optical Character Recognition

Google Books used scanned images of pages of books to convert those pages to a text format. This allowed users to search the text in the book without needing to read it page by page.
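A minimal sketch of this idea is shown below, using the Python libraries Pillow and pytesseract (assumed choices; Google Books uses its own system, and the file name page_scan.png is made up).

  from PIL import Image        # Pillow, for opening the scanned page
  import pytesseract           # a wrapper around the Tesseract OCR engine

  # Open a scanned page image and turn the picture of the text into real text.
  page = Image.open("page_scan.png")
  text = pytesseract.image_to_string(page)

  # The text can now be searched like any other string.
  print("rabbit" in text.lower())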

Medical Imaging

Computer systems use X-Ray and MRI images of hospital patients to help diagnose whether or not they have cancer. In some instances, the computers have outperformed the doctors who make diagnoses for the same patients.[1]

Self Driving Cars

Autonomous Cars use a combination of Computer Vision and LiDAR to detect pedestrians and Stop Signs when driving passengers.

References

  1. Walsh, Fergus (January 2, 2020). "AI 'outperforms' doctors diagnosing breast cancer". BBC News. Retrieved March 24, 2021.