Working principles of VCA
As previously discussed, an important distinguishing feature of a VCA system is its ability to generate and analyse meta-data. By meta-data we mean, in this context, data describing the contents of an image or video stream in such a way that the ongoing events can be understood. For example, if the system is to inform the operator whenever a person is running, it needs to perform at least the following steps on the video stream (the “analytic part”):
• Extract foreground objects
• Estimate size and other distinguishing properties of the foreground objects
• Estimate the speed of each object (note that this requires a framework for tracking objects across successive image frames)
This extracted data is the meta-data. In the next step the VCA system analyses the meta-data, rather than the raw video data. In this case the meta-data analysis (the intelligent part) consists of:
• Classifying foreground objects into humans and non-humans
• Somehow “knowing” how to distinguish running from walking, standing still and so on.
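The two stages above can be illustrated with a minimal sketch in Python. It assumes the analytic part has already produced, for each tracked object, an estimated height and a list of timestamped ground positions in metres. All names and thresholds (TrackPoint, is_human, the 0.2 and 2.5 m/s speed limits, the 1.0 to 2.2 m height range) are hypothetical values chosen for illustration, not parameters of any real VCA product.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class TrackPoint:
    t: float  # timestamp in seconds
    x: float  # ground position in metres
    y: float  # ground position in metres


def estimate_speed(track):
    """Average speed in m/s along a track of at least two points."""
    if len(track) < 2:
        return 0.0
    # Sum the distances between consecutive positions of the same object.
    distance = sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(track, track[1:]))
    elapsed = track[-1].t - track[0].t
    return distance / elapsed if elapsed > 0 else 0.0


def is_human(height_m):
    """Hypothetical size-based rule: accept objects of roughly human height."""
    return 1.0 <= height_m <= 2.2


def classify_motion(speed_mps):
    """Hypothetical speed thresholds separating standing, walking and running."""
    if speed_mps < 0.2:
        return "standing"
    if speed_mps < 2.5:
        return "walking"
    return "running"


def should_alert(track, height_m):
    """Meta-data analysis: alert only for human-sized objects that are running."""
    return is_human(height_m) and classify_motion(estimate_speed(track)) == "running"


# Example meta-data for one object: 8 m covered in 2 s, estimated height 1.8 m.
runner = [TrackPoint(0.0, 0.0, 0.0), TrackPoint(1.0, 4.0, 0.0), TrackPoint(2.0, 8.0, 0.0)]
print(should_alert(runner, 1.8))
```

Note that the decision is taken on the meta-data alone (height and track), never on the raw pixels. A real system would likely replace the fixed thresholds with a trained classifier.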

Figure 1. Stereo sensors from Saab are used in the football tracking system from Tracab.
Business possibilities for intelligent video
There are several examples of how businesses can profit from using video analytics in their security installations. Detecting theft and other unwanted behaviour in a supermarket is the obvious use, but the same system could also be used to follow up in-store campaigns and see which products attract the most customers.
Another example is a banking environment, where the security function of video analytics would be to prevent theft and fraud, while the same installation could be used to optimise customer queuing by alerting staff when queues grow.
ROI – important
When procuring a CCTV system today the discussion is all about security, and maybe it should be. There may come a time when business analysts and statisticians enter the CCTV scene and demand ROI from security installations. Whether that time is now, later or never, we do not know. What we do know is that the entire industry should learn to apply the same ideas and the same fundamental measure of value to any CCTV installation. Return on investment is a critical tool for justifying the right to exist in any free economy.
So how do we measure return on investment for video analytics in a security application? As in any business case, we crunch the numbers: we weigh the cost against the gains in efficiency and the cost reductions.
Typical applications
A school implementing video analytics surveillance may be looking to reduce vandalism, decrease the cost of security personnel or simply stop interruptions to its day-to-day activities. Its ROI could be calculated by comparing the total cost of vandalism at the school with the savings an alarm system based on intelligent video would give.
Schools today are often the target of youth vandalism, which costs large sums of public funds. Current alarm and security installations rarely stop any vandalism. They merely alert security personnel, who are sent out to find whoever may have disturbed the peace at the facility, often innocent bystanders or kids playing sports on the school grounds. With intelligent video, each alarm could be filtered so that security forces are alerted only to qualified alarms, making sure no false alarms take up the time or resources of public personnel.
Calculating ROI
Calculating ROI for video analytics solutions is usually a simple task; increased security can often be expressed as a decreased cost of vandalism or theft. But as any good businessman would point out, time is also money, and security personnel need time to investigate false alarms, even though no vandalism or theft occurred. If intelligent video can be used to lower the false alarm rate, there is a business case for making such an installation.
A power transformer station is usually an unmanned high-security installation, located somewhere in a populated area, with lethal currents running in relatively open areas. Sending security staff to such an installation is usually costly but necessary, since trespassing could lead to fatal injuries. Using video analytics in such a place could lower the false alarm rate, making sure that only real trespassers set off alarms and in this way saving time and money for security forces.
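The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are our own, and every figure in the example (200 false alarms per year, 150 per call-out, an 80 % reduction in false alarms, an investment of 30,000 and a running cost of 4,000 per year) is hypothetical and not taken from any real installation.

```python
def annual_savings(false_alarms_per_year, cost_per_callout,
                   false_alarm_reduction, avoided_damage=0.0):
    """Yearly savings: avoided call-outs plus avoided vandalism or theft cost."""
    avoided_callouts = false_alarms_per_year * false_alarm_reduction
    return avoided_callouts * cost_per_callout + avoided_damage


def roi(total_savings, total_cost):
    """Return on investment as a fraction: net gain divided by total cost."""
    return (total_savings - total_cost) / total_cost


def payback_years(investment, savings_per_year, running_cost_per_year):
    """Years until the investment is recovered by the net yearly savings."""
    net = savings_per_year - running_cost_per_year
    return investment / net if net > 0 else float("inf")


# Hypothetical figures for an unmanned site, evaluated over three years.
savings = annual_savings(200, 150.0, 0.8)
years = 3
total_cost = 30_000.0 + years * 4_000.0
print(f"Savings per year: {savings:.0f}")
print(f"ROI over {years} years: {roi(years * savings, total_cost):.0%}")
print(f"Payback time: {payback_years(30_000.0, savings, 4_000.0):.1f} years")
```

With these assumed figures the avoided call-outs alone recover the investment in about a year and a half. Any reduction in actual vandalism or theft can be added through the avoided_damage argument.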
Similar applications exist worldwide at any unmanned installation, like a mobile network base station, or a train traffic tunnel entrance.
Stereo imaging
In conventional surveillance installations usually a single camera covers a particular area, i.e. the video sensors are monocular. In contrast, humans, as well as many other animals, employ binocular vision to watch and understand their surroundings. Monocular vision delivers a 2D projection of a 3D world, i.e. a flat description. Obviously, such a projection limits the kind of information that can be extracted from the scene. For example, any information regarding distance or depth is lost in the projection, making estimation of object size impossible without a terrain model. Binocular or, equivalently, stereo vision does not suffer from this limitation.
Video sensors can be configured to deliver 3D information as well. As with the eyes of a human, this can be achieved by mounting two video sensors a fixed distance apart (say 50 cm), looking in the same direction.
Saab’s stereo sensor is an example of such a compound imaging sensor, able to provide stereo imaging in real time. The cameras are set at a small, known distance from each other and observe the same scene. Signal processing takes the input from the two cameras and creates a depth map of the scene. See figure 1, page 10.
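How depth follows from two views can be illustrated with the standard pinhole relation for a rectified, parallel camera pair: depth = focal length × baseline / disparity, where disparity is the horizontal pixel offset of the same point between the two images. The sketch below is a textbook illustration and not a description of the signal processing in any particular sensor. The focal length of 1000 pixels and the 25 pixel disparity are assumed values; the 0.5 m baseline follows the 50 cm example above.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by a rectified, parallel stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def object_height_m(pixel_height, depth_m, focal_px):
    """Metric height of an object from its height in the image and its depth."""
    return pixel_height * depth_m / focal_px


# Assumed values: focal length 1000 px, baseline 0.5 m, measured disparity 25 px.
depth = depth_from_disparity(1000.0, 0.5, 25.0)
height = object_height_m(90.0, depth, 1000.0)
print(f"depth: {depth:.1f} m, object height: {height:.2f} m")
```

With these numbers the point lies 20 m from the sensor, and an object that is 90 pixels tall in the image is about 1.8 m tall. This is the size estimate that a single camera cannot deliver without a terrain model.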
A stereo sensor can be based on almost any video cameras, as long as their camera parameters and the distance between them are known. For surveillance of dynamic scenes it is also important that the cameras are synchronised: the precision of the synchronisation needs to be somewhat better than the typical time constant of the moving objects.
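Why synchronisation matters can be made concrete with a rough estimate. If the two exposures are offset in time, an object moving sideways shifts between them, and that shift is indistinguishable from disparity. The sketch below assumes the same idealised rectified, parallel camera pair as the pinhole relation depth = focal length × baseline / disparity. All values are hypothetical: a focal length of 1000 pixels, a 0.5 m baseline, and an object at 20 m moving sideways at 8 m/s.

```python
def disparity_error_px(focal_px, lateral_speed_mps, sync_offset_s, depth_m):
    """Spurious disparity in pixels caused by motion between the two exposures."""
    shift_m = lateral_speed_mps * sync_offset_s  # sideways movement during the offset
    return focal_px * shift_m / depth_m          # projected into the image


def apparent_depth_m(focal_px, baseline_m, true_depth_m, error_px):
    """Depth the sensor would report if the disparity is off by error_px."""
    true_disparity = focal_px * baseline_m / true_depth_m
    measured = true_disparity + error_px
    if measured <= 0:
        raise ValueError("measured disparity must be positive")
    return focal_px * baseline_m / measured


# Compare a 20 ms offset (half a frame at 25 frames per second) with a 1 ms offset.
for offset_s in (0.020, 0.001):
    err = disparity_error_px(1000.0, 8.0, offset_s, 20.0)
    depth = apparent_depth_m(1000.0, 0.5, 20.0, err)
    print(f"offset {offset_s * 1000:.0f} ms: error {err:.1f} px, apparent depth {depth:.1f} m")
```

Under these assumptions a 20 ms offset adds 8 pixels to a true disparity of 25 pixels, so the object appears to be at about 15.2 m instead of 20 m. With a 1 ms offset the error drops to 0.4 pixels and the apparent depth is about 19.7 m. The sign of the error depends on the direction of motion.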


