Background differentiation

Original VideoFeed.

When discussing how video content analysis works, it is also important to shed light on how objects are separated from the background in a video stream. Implementations of background differentiation differ considerably in their details, especially in the methods used to filter out image noise and disturbances caused by changing lighting conditions and weather effects such as wind, rain and snow. The basic steps of a general background differentiation process are described below.


Segmentation
The system maintains a slowly varying image of what it perceives as the static background. For each new frame in the video stream, the background image is updated to adjust for changes in light levels and similar gradual changes, and pixels in the current frame that differ from the background image are segmented out. Different methods are used to reduce the effect of changing light, poor contrast and natural movements caused by weather. The types of algorithms used separate advanced technologies from simpler ones.
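As a rough illustration of the idea, the sketch below keeps a background estimate as an exponential running average and marks pixels that deviate from it. The function names, the blending factor `alpha` and the `threshold` value are assumptions made for this example; commercial systems use considerably more robust methods.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background estimate (running average).

    A small alpha makes the background adapt slowly, e.g. to changing light.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def segment(background, frame, threshold=25.0):
    """Return a binary mask: 1 where the frame differs from the background."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


# Tiny 3x4 grayscale example: a bright two-pixel object on a flat background.
background = [[100.0] * 4 for _ in range(3)]
frame = [
    [102, 101, 100, 99],   # small deviations: treated as noise
    [100, 200, 210, 100],  # large deviations: segmented out
    [100, 100, 100, 100],
]

mask = segment(background, frame)
background = update_background(background, frame)
```

The fixed threshold is the weak point of this sketch: set it too low and noise is segmented out, set it too high and low-contrast objects are missed.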



Clustering
Clusters of segmented pixels are grouped together to form objects. Advanced systems can also remove cast shadows and track objects that are partly obscured or that blend in with the background. Systems differ significantly in the accuracy of the segmentation.
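One simple way to group segmented pixels is connected-component labelling. The sketch below assumes a binary mask (1 = foreground) and 4-connectivity; the names `find_objects` and `bounding_box` are made up for this example, and shadow removal or handling of obscured objects is not attempted.

```python
from collections import deque


def find_objects(mask):
    """Group 4-connected foreground pixels; return one pixel list per object."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a pixel cluster."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return (min(ys), min(xs), max(ys), max(xs))


mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
objects = find_objects(mask)
boxes = [bounding_box(p) for p in objects]
```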



Classification
Far from all VCA systems are capable of performing object classification, but a range of techniques exists. One approach is to estimate a number of parameters for each object, including size, speed and position. The parameter values are compared with pre-defined value sets, and the object is assigned to the corresponding object class. This helps eliminate false alarms from, for example, cloud shadows and birds.
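The parameter-based approach can be sketched as a lookup against value ranges. The classes, fields and ranges below are hypothetical values chosen for illustration only; they are not taken from any real product.

```python
from dataclasses import dataclass


@dataclass
class ObservedObject:
    height_m: float
    width_m: float
    speed_m_s: float


# Hypothetical pre-defined value sets: (min, max) per parameter and class.
CLASS_RULES = {
    "person": {"height_m": (1.0, 2.2), "width_m": (0.3, 1.0), "speed_m_s": (0.0, 8.0)},
    "vehicle": {"height_m": (1.0, 4.0), "width_m": (1.5, 12.0), "speed_m_s": (0.0, 40.0)},
}


def classify(obj):
    """Return the first class whose ranges contain all parameters, else 'unknown'."""
    for name, rule in CLASS_RULES.items():
        if all(lo <= getattr(obj, field) <= hi for field, (lo, hi) in rule.items()):
            return name
    return "unknown"


walker = classify(ObservedObject(height_m=1.8, width_m=0.5, speed_m_s=1.4))
car = classify(ObservedObject(height_m=1.5, width_m=4.2, speed_m_s=12.0))
bird = classify(ObservedObject(height_m=0.2, width_m=0.4, speed_m_s=10.0))
```

An object that fits no class, such as the bird here, can simply be ignored rather than raising an alarm.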

Another approach is to use what are commonly known as template matching techniques. Common to these techniques is that they compare the object to a library of templates and calculate the likelihood that the object is of the same class as the template. This method can be further developed to extract an object from single images, making it unnecessary to perform background differentiation. However, to our knowledge these methods have only been tested in research projects and are not yet implemented in commercially available video analytics products.
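A minimal sketch of the comparison step follows. It uses normalized cross-correlation as a stand-in similarity score (not a true likelihood) and assumes that the object patch and all templates have the same size. The tiny 3x3 templates are invented for the example.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-size grayscale patches."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    da = [v - mean_a for v in fa]
    db = [v - mean_b for v in fb]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # a flat patch carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(patch, templates):
    """Return (class name, score) of the template most similar to the patch."""
    scores = {name: ncc(patch, tpl) for name, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical template library: a tall shape and a wide shape.
templates = {
    "person": [[0, 9, 0], [0, 9, 0], [0, 9, 0]],
    "vehicle": [[0, 0, 0], [9, 9, 9], [0, 0, 0]],
}
patch = [[1, 8, 0], [0, 9, 1], [0, 8, 0]]  # noisy tall shape

name, score = best_match(patch, templates)
```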

The disadvantages of the direct classification techniques are that they require more computing power and higher resolution (more pixels) across the object. This means that some VCA systems are able to detect and classify objects at long distances (200 m with 640x480 resolution) whereas other systems often stop at 50-75 m due to fundamental technological limitations.
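The link between distance and pixels across the object can be estimated with simple geometry. The sketch below assumes a pinhole camera with a 10 degree horizontal field of view and a 0.5 m wide object; both values are assumptions for illustration and do not describe any particular system.

```python
import math


def pixels_across(object_width_m, distance_m, image_width_px=640, hfov_deg=10.0):
    """Approximate number of horizontal pixels an object covers."""
    # Width of the scene covered by the image at the given distance.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return object_width_m / scene_width_m * image_width_px


near = pixels_across(0.5, 50.0)   # about 37 pixels
far = pixels_across(0.5, 200.0)   # about 9 pixels
```

Under these assumptions the object covers four times fewer pixels at 200 m than at 50 m, which is why techniques that need many pixels across the object stop working at shorter ranges.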



Tracking – Man Upright.

Tracking
The last step in the process is tracking, where the task is to assign each object a unique ID and keep track of the object for as long as it is in the camera’s field of view. Different techniques exist to track objects that are partly or fully obscured part of the time, and systems differ greatly in their ability to track an object under difficult conditions, e.g. poor image quality.
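The ID assignment can be sketched as a greedy nearest-neighbour match between object positions in consecutive frames. The class name `CentroidTracker` and the `max_distance` limit are assumptions for this example. The sketch drops a track as soon as its object is not detected, so it does not handle obscured objects at all.

```python
import math
from itertools import count


class CentroidTracker:
    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # largest allowed jump between frames
        self.tracks = {}                  # track ID -> last known (x, y)
        self._ids = count(1)

    def update(self, detections):
        """Match detections to existing tracks; return {track ID: (x, y)}."""
        unmatched = dict(self.tracks)
        updated = {}
        for det in detections:
            best_id, best_dist = None, self.max_distance
            for track_id, pos in unmatched.items():
                d = math.dist(det, pos)
                if d <= best_dist:
                    best_id, best_dist = track_id, d
            if best_id is None:
                best_id = next(self._ids)  # nothing close enough: new object
            else:
                del unmatched[best_id]     # each track is matched at most once
            updated[best_id] = det
        self.tracks = updated              # unmatched tracks are dropped
        return updated


tracker = CentroidTracker()
frame1 = tracker.update([(10, 10), (100, 100)])
frame2 = tracker.update([(104, 98), (12, 11)])  # same objects, slightly moved
frame3 = tracker.update([(300, 300)])           # a new object far away
```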

Distinction between Features and Functions

So what type of events can be detected? BSIA (British Security Industry Association) puts it this way:
In theory any “behaviour” that can both be seen and accurately defined on a video image can be automatically identified and an alert raised.

OK, so the “behaviour” you are looking for has to be seen. This means that if you can’t see whether a person is carrying a gun, you cannot detect it using video analytics. It is actually a common misconception that this is possible. One also often hears “we don’t know what the military is capable of” as a way of justifying a belief that video analytics can do magic.

A more everyday example is “left item detection”. This feature is meant to detect potentially dangerous objects, e.g. a bomb left behind at an airport or at other places where many people travel. The only problem is that a video camera cannot see what’s behind a litter bin or see objects obscured by people passing by.
This brings us to the distinction between feature and function.

By feature (or more precisely, capability feature) we mean that a system is capable of certain things. Function is a more complex matter, as it relates both to how the feature is implemented and to how well it works, that is, its usability. A commonly used example is a car’s ability to stop. This is a feature every car must have. However, the braking function is implemented as a pedal, not as a push-button in the glove compartment, so that the function is usable. Further, a sports car can stop faster than a family car. Cars with the same feature thus differ in function.

If we look at all the features that a collection of VCA vendors claim to have, we get a very confusing list:

• Asset Protector
• Loitering
• Left Item Detection
• Tracking
• Tailgating
• Intelli-Search
• Removed Item Detection
• Perimeter Defence
• Traditional Video Motion Detection (VMD)
• Camera Obstruction
• Slip & Fall Detection
• Virtual Fence
• Wrong Direction Detection
• Suspicious Directional Movement
• Unusual Crowd Formation Detection
• People Counting
• Intrusion Detection
• Crowd & Queue Management
• Tripwire Detection
• Unauthorised Activity Detection
• Running Detection

We should now ask: can the “behaviour” be seen, and can it be accurately described? Let’s take a look at “left item detection” again. We have already discussed that this “behaviour” can only be seen if the line of sight to the object is not obscured. In other words, it can be seen, but only sometimes. Can the behaviour be accurately described? How do we define a left item? How long must it be left for, and how far from a person does it need to be? If one person leaves a suitcase close to another person, is the suitcase a “left item”? Can it be accurately described? Yes, sometimes.
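To see why the definition is slippery, consider a naive rule that uses only the two questions above: how long and how far. The function name and the thresholds of 60 seconds and 3 metres are hypothetical values chosen for illustration.

```python
import math


def is_left_item(item_pos, stationary_seconds, people_positions,
                 min_seconds=60.0, min_distance_m=3.0):
    """Naive rule: the item has not moved for a while and nobody is near it."""
    if stationary_seconds < min_seconds:
        return False
    return all(math.dist(item_pos, p) >= min_distance_m for p in people_positions)


# Suitcase left next to a stranger: the rule sees a person nearby, so no alert.
next_to_stranger = is_left_item((0.0, 0.0), 120.0, [(1.0, 1.0)])

# Same suitcase with nobody within 3 m: the rule raises an alert.
alone = is_left_item((0.0, 0.0), 120.0, [(10.0, 0.0)])
```

The rule cannot tell the owner from a stranger, so the suitcase left beside another person goes undetected. Any fixed choice of thresholds fails in some situations in the same way.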

So, many VCA vendors claim that their system can detect left items, a “behaviour” that can sometimes be described and sometimes be seen. Let’s be honest, this is a feature with poor function.
What then about the other features listed above? What is “unusual crowd formation” and “suspicious directional movement”? How well do the systems live up to these features?

© 2009 AR Media International AB