Working principles of VCA

As previously discussed, an important distinguishing feature of a VCA system is its ability to generate and analyse meta-data. By meta-data we mean, in this context, data describing the contents of an image or video stream in such a way that the ongoing events can be understood. For example, if the system is to inform the operator whenever a person is running, it needs to perform at least the following steps on the video stream (the “analytic part”):
• Extract foreground objects

• Estimate size and other distinguishing properties of the foreground objects

• Estimate speed of each object (observe that in order to do this we need a framework for tracking of objects in subsequent image frames)

This extracted data is the meta-data. In the next step the VCA system analyses the meta-data, rather than the raw video data. In this case the meta-data analysis (the intelligent part) consists of:

• Classifying foreground objects into humans and non-humans

• Somehow “knowing” how to distinguish running from walking, standing still and so on.
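The two stages above can be illustrated with a minimal Python sketch. It assumes the analytic part has already produced a track of ground-plane positions (in metres) with timestamps for each object, and that a separate classifier has decided whether the object is human. The names `Observation`, `mean_speed`, `classify_motion` and `should_alert`, as well as the speed thresholds, are hypothetical choices for illustration and not part of any particular VCA product.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Observation:
    """One tracked position of an object (hypothetical meta-data record)."""
    t: float  # timestamp in seconds
    x: float  # ground-plane position in metres
    y: float


# Assumed thresholds in m/s; a real system would tune these per site.
STANDING_MAX = 0.2
WALKING_MAX = 2.5


def mean_speed(track):
    """Path length divided by elapsed time, in m/s."""
    if len(track) < 2:
        return 0.0
    dist = sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(track, track[1:]))
    elapsed = track[-1].t - track[0].t
    return dist / elapsed if elapsed > 0 else 0.0


def classify_motion(track):
    """Label a track as standing, walking or running from its mean speed."""
    v = mean_speed(track)
    if v < STANDING_MAX:
        return "standing"
    if v < WALKING_MAX:
        return "walking"
    return "running"


def should_alert(is_human, track):
    """Alert the operator only for humans that are running."""
    return is_human and classify_motion(track) == "running"


# Usage: a person covering 8 m in 2 s
track = [Observation(0.0, 0.0, 0.0), Observation(1.0, 4.0, 0.0), Observation(2.0, 8.0, 0.0)]
print(classify_motion(track), should_alert(True, track))
```

Note that the decision is taken on the meta-data (tracks and class labels) only; the raw pixels are no longer needed at this stage.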


Figure 1. Stereo sensors from Saab are used in the football tracking system from Tracab.


Business possibilities for intelligent video

There are several examples of how businesses can profit from using video analytics in their security installations. Detecting theft and other unwanted behaviour in a supermarket is the obvious use, but the same system could also be used to follow up campaigns in the store to see which products attract the most customers.
Another example is an installation of video analytics in a banking environment, where the security function would be to prevent theft and fraud, but where the system could simultaneously be used to optimise customer queuing by alerting staff to growing queues.

ROI – important

When procuring a CCTV system today the discussion is all about security, and maybe it should be. There may come a time when business analysts and statistics enter the CCTV scene and claim ROI from security installations. Whether that time is now, later or never, we do not know. What we do know is that the entire industry should learn to apply the same fundamental measure of value to any CCTV installation. Return on Investment is a critical tool for justifying the right to exist in any free economy.
So how do we measure return on investment for a security application from a video analytics perspective? As in any business case, we crunch the numbers and weigh cost against gains in efficiency and cost reductions.

Typical applications

A school implementing video analytics surveillance may be looking to reduce vandalism, decrease the cost of security personnel or simply stop interruptions to its day-to-day activities. Its ROI could be calculated by comparing the total cost of vandalism in the school with the savings that a public alarm using intelligent video would give.
In today’s modern society schools are often the target of youth vandalism, costing large sums of public funds. The alarm and security installations of today rarely stop any vandalism; they merely prompt security personnel to seek out whoever may have disturbed the peace within that public facility, who often turn out to be innocent bystanders or kids playing sports on school grounds. With intelligent video each alarm could be filtered so that security forces are alerted only to qualified alarms, and false alarms no longer take up the time or resources of public personnel.

Another typical area for intelligent video use is compliance. Today a growing number of regulatory bodies set up rules for different businesses and organisations, and many of these rules have a security focus. For the manufacturing of biological substances, the American FDA sets regulations for security on site even though the factory may lie outside the borders of the United States. The FDA regulates, for instance, how traffic within the compound area may flow and what types of traffic are allowed. Compliance with these rules is usually not optional for a medical company, as the US market is far too valuable to even consider surrendering, which leaves the manufacturer with complex solutions for measuring the speed and direction of vehicles as its only option. In this case a high quality video analytics solution could solve the entire issue in a very cost effective manner, giving ROI calculations that even management consultants would adore.

Calculating ROI

Calculating ROI for video analytics solutions is usually a simple task; increased security can often be expressed as a decreased cost of vandalism or theft. But as any good businessman would point out, time is also money, and security personnel definitely need time to investigate false alarms, even when no vandalism or theft has occurred. If intelligent video can be used to lower the false alarm rate, there is definitely a business case for making such an installation.

A power transformation station is usually an unmanned high security installation somewhere in a populated area, with lethal currents running in relatively open areas. Sending security to such an installation is usually costly but necessary, since trespassing could lead to fatal injuries. Using video analytics in such a place could lower the false alarm rate so that only real trespassers set off alarms, and in that way save time and money for security forces.
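The reasoning in this section can be written down as a small calculation. The Python sketch below assumes the simplest possible model: savings are the avoided cost of vandalism or theft plus the cost of the false alarm callouts that no longer happen, and ROI is net gain divided by investment. The function names and all figures in the usage example are made up for illustration; they are not taken from any real installation.

```python
def annual_savings(avoided_loss, false_alarms_avoided, cost_per_callout):
    """Yearly savings: avoided vandalism/theft plus avoided false alarm callouts."""
    return avoided_loss + false_alarms_avoided * cost_per_callout


def roi(total_savings, total_cost):
    """Return on investment as a fraction: (savings - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_savings - total_cost) / total_cost


# Illustrative figures only (assumptions, not measured data):
savings = annual_savings(avoided_loss=20_000,
                         false_alarms_avoided=100,
                         cost_per_callout=150)
print(savings)                              # 35000
print(roi(savings, total_cost=25_000))      # 0.4, i.e. 40 % in the first year
```

For an unmanned site such as the transformation station, the callout term will typically dominate, since every false alarm means sending someone out.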

Similar applications exist worldwide at any unmanned installation, like a mobile network base station, or a train traffic tunnel entrance.

Stereo imaging

In conventional surveillance installations usually a single camera covers a particular area, i.e. the video sensors are monocular. In contrast, humans, as well as many other animals, employ binocular vision to watch and understand their surroundings. Monocular vision delivers a 2D projection of a 3D world, i.e. a flat description. Obviously, such a projection limits the kind of information that can be extracted from the scene. For example, any information regarding distance or depth is lost in the projection, making estimation of object size impossible without a terrain model. Binocular or, equivalently, stereo vision does not suffer from this limitation.
Video sensors can, however, be configured to deliver 3D information as well. As with the eyes of humans, this is achieved by mounting two video sensors some distance apart (say 50 cm), looking in the same direction.

Saab’s stereo sensor is an example of such a compound imaging sensor which is able to provide stereo imaging in real time. The cameras are set at a small, known, distance from each other and observe the same scene. Signal processing takes the input from the two cameras and creates the depth map of the scene. See figure 1, page 10.
A stereo sensor can be based on almost any video cameras, as long as their camera parameters and the distance between them are known. For surveillance of dynamic scenes it is also important that the cameras are synchronised; the precision of the synchronisation needs to be somewhat better than the typical time constant of the moving objects.
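Why the camera parameters and the distance between the cameras must be known can be seen from the textbook relation for an idealised, rectified stereo pair: depth Z = f · B / d, where f is the focal length in pixels, B the distance between the cameras (the baseline) and d the disparity, i.e. the horizontal pixel shift of the same point between the two images. The Python sketch below applies this relation; it is a simplified illustration and not a description of how Saab's sensor is implemented. The focal length of 1000 pixels is an assumed value, and the 0.5 m baseline follows the "say 50 cm" example above.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for one pixel of a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # No measurable shift: the point is effectively at infinity (or unmatched).
        return float("inf")
    return focal_px * baseline_m / disparity_px


def depth_map(disparity_map, focal_px, baseline_m):
    """Convert a 2D list of disparities (pixels) into a 2D list of depths (metres)."""
    return [[depth_from_disparity(d, focal_px, baseline_m) for d in row]
            for row in disparity_map]


# Assumed parameters: f = 1000 px, B = 0.5 m
print(depth_from_disparity(25, focal_px=1000, baseline_m=0.5))  # 20.0
```

The relation also shows why synchronisation matters: if an object moves between the two exposures, its apparent disparity changes and the computed depth is wrong.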

The availability of the extra distance (depth) information, as compared to a normal video sensor, is highly useful in a VCA system. For example, this information can be used to accurately measure the actual physical properties of an object, as well as to easily separate partially occluding objects from each other. Furthermore, this can be achieved without the need for a terrain model. Besides this great advantage, stereo imaging substantially improves the robustness of the basic processing chain. Since stereo allows an object to be located in 3D, birds and other distractions can easily be separated out, which reduces the false alarm rate. An object can also be easily separated from its cast shadow, which is a challenge in the single camera case. Detection and clustering are also significantly less dependent on varying lighting conditions.
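As a rough illustration of how depth turns image measurements into physical ones, the Python sketch below uses the pinhole relation height = pixel height · Z / f to estimate an object's real height and to discard objects that are too small or too large to be a person. The function names, the assumed focal length and the height limits are hypothetical; a real system would use more properties than height alone.

```python
def physical_height(pixel_height, depth_m, focal_px):
    """Approximate real-world height in metres of an object seen at a known depth."""
    return pixel_height * depth_m / focal_px


def is_plausible_person(pixel_height, depth_m, focal_px, min_h=1.0, max_h=2.2):
    """Keep only objects whose physical height falls in an assumed human range."""
    return min_h <= physical_height(pixel_height, depth_m, focal_px) <= max_h


# Assumed focal length of 1000 px:
print(is_plausible_person(90, depth_m=20.0, focal_px=1000))  # True: about 1.8 m tall
print(is_plausible_person(30, depth_m=5.0, focal_px=1000))   # False: about 0.15 m, e.g. a bird
```

With a single camera the two objects above could not be told apart by size without a terrain model, since a small object close to the camera and a large one far away can cover similar numbers of pixels.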

© 2009 AR Media International AB