In this, the first of several educational articles, the Detektor Security Academy, sponsored by Axis Communications, will explain how to get an optimal image quality and a cost-effective video surveillance solution for different video surveillance applications.
Intelligent Video Analysis
In this Detektor Security Academy article, Detektor, together with OPAX and Saab, takes a closer look at intelligent video (or Video Content Analysis) in order to raise the security industry’s awareness and knowledge of this relatively new and potentially powerful technology. What can you expect from intelligent video? How does it work? What are the ROI aspects of a VCA installation? These and many other questions are answered in this article.
By Matts Lilja, Frode Berg Olsen (OPAX), Leif Haglund, Amritpal Singh (Saab).
In the security industry, the subject under discussion goes by several names. Two of the most frequently used are IVA (Intelligent Video Analysis) and VCA (Video Content Analysis). There are many variations on these, but VCA is the most accurate term for this evolving security field, as it is difficult to label a technical system as intelligent in the true sense of the word. Common to all the different definitions of VCA is that they relate to a technology used to analyse video for specific data, behaviour, objects or attitude.
Huge amounts of video data exist today in consumer and enterprise applications, but interacting with video stored in huge data banks requires better tools to describe, organise and manage that data. For this reason, private companies and research institutes have joined forces in research projects exploring the possibilities of automatically describing and categorising the content of multimedia, and video in particular. Manually describing the content of video is of course very time-consuming, so automated methods are needed. The goal is to create video abstracts automatically, as structured media is more suitable for search and retrieval.
Imagine if all the video on YouTube could be automatically described by a software application. Searching the video archive would then no longer depend on the subjective tags each person manually adds to describe a video when it is uploaded. However, applications of this type are still some way from being commercially available.
Large amounts of video are of course also found in security installations involving cameras. Interest in VCA within security has increased in recent years, and this is the focus of this article. Within security, VCA is mainly used to analyse real-time video. However, it can also be used to scan recorded video by setting certain parameters for the software to look for, potentially a very time-saving function that adds further value to a VCA installation.
To shed light on what makes up a typical video installation and why it cannot be classified as intelligent or analytic, we use the example of a shopping mall. Basically, a number of cameras are more or less directly connected to a number of displays, which are sometimes watched by operators, and the video is perhaps also recorded. Usually, the video processing performed, if any, extends to mere image enhancement for display purposes. Any analysis and comprehension of what is going on in the monitored areas is left to the operators. Fortunately, most of the time nothing special or threatening happens. Unfortunately, research shows that even trained operators lose up to 90 % of their attention within just 22 minutes. In the case of an illegal event of some sort, the video surveillance system itself provides no more support than making the recorded footage available. Given the operator attention statistics mentioned above, the probability that a critical event is stopped or handled thanks to the cameras is, for the most part, less than satisfactory. The investigation of what happened before, during and after the event must be made by manually scanning the recorded video, a potentially cumbersome and time-consuming task. Since the system offers no real support, either in real time or offline, it cannot be classified as analytic or intelligent.
Before any analysis takes place, meta-data has to be extracted from the video stream. The meta-data contains information such as the speed, size and position of objects.
What is VCA?
In contrast to the shopping mall example, video content analysis is expected to provide a number of automatic functions that simplify the operator’s work and, at least to some extent, overcome the operator’s limitations, both in real time and in offline investigations.
Before going into depth about the technology of video content analysis, it is worth noting that a more basic type of video analytics has been available for more than 20 years, namely video motion detection (VMD). Motion detection is today an integrated feature of many digital video cameras and video management systems. VMD looks at changes in pixel values (motion) in an image and triggers an event if the change is above a user-defined threshold. More advanced implementations allow the user to trigger events only if the motion appears within a given zone of the image or if the size of the moving area is above a certain limit.
Because VMD technology is quite simplistic, its use is limited by a large number of false alarms, that is, events triggered even though no real object is moving in the scene. The most important benefit of VMD is that it is often used to control when video footage is recorded: since a static picture is seldom of interest, recording only when there is motion saves storage space.
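The motion detection logic described above can be illustrated with a minimal sketch. It is not taken from any particular camera or video management system: the function names, the thresholds and the tiny 8 x 8 test frames are all assumptions made for illustration, and frames are represented as plain lists of grayscale pixel values.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]            # grayscale image: rows of 0-255 pixel values
Zone = Tuple[int, int, int, int]   # (top, left, bottom, right), bottom/right exclusive


def count_changed_pixels(prev: Frame, curr: Frame, pixel_threshold: int,
                         zone: Optional[Zone] = None) -> int:
    """Count pixels whose value changed by more than pixel_threshold."""
    top, left, bottom, right = zone or (0, 0, len(curr), len(curr[0]))
    changed = 0
    for y in range(top, bottom):
        for x in range(left, right):
            if abs(curr[y][x] - prev[y][x]) > pixel_threshold:
                changed += 1
    return changed


def motion_detected(prev: Frame, curr: Frame, pixel_threshold: int = 25,
                    min_changed_pixels: int = 4,
                    zone: Optional[Zone] = None) -> bool:
    """Trigger an event if enough pixels changed (optionally only inside a zone)."""
    return count_changed_pixels(prev, curr, pixel_threshold, zone) >= min_changed_pixels


# Hypothetical test frames: a uniform background and a 3 x 3 bright "object".
background = [[10] * 8 for _ in range(8)]
with_object = [row[:] for row in background]
for y in range(2, 5):
    for x in range(2, 5):
        with_object[y][x] = 200

# A uniform lighting change with no object at all in the scene.
brighter = [[60] * 8 for _ in range(8)]

print(motion_detected(background, with_object))                     # object in view
print(motion_detected(background, with_object, zone=(5, 5, 8, 8)))  # zone without object
print(motion_detected(background, brighter))                        # lighting change only
```

The last call shows why such a simple approach produces false alarms: a change in lighting alters every pixel and triggers an event even though nothing has moved.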
So, VMD doesn’t appear very intelligent. What is it then that makes a video content analysis system intelligent, and does it make sense at all to call a video analytics system intelligent?
If we return to the shopping mall example and now equip it with a video content analysis system, we can see a number of improvements:
• Since the shopping mall consists of a number of shops, entrances/exits, open areas and hallways, it is a challenge in the conventional system to get a good overview of the entire mall. A VCA system, however, does “know” how the physical network of entrances/exits, hallways etc. relates to the network of cameras deployed in the mall. Furthermore, the VCA system can track people and objects through this network in a consistent way. The operators thereby get a significantly improved overview of the mall. For example, it becomes possible to ask the VCA system where a culprit or adversary has come from and, perhaps more importantly, where he can run and where he can be intercepted.
• From time to time it may be necessary to pay special attention to certain areas of the mall. A conventional setup requires an operator to watch this area carefully, which either reduces the attention he can give to other areas or requires an additional operator. In a VCA solution, the operator can define a virtual security area and associated rules to trigger alerts. He can attend to his regular duties while the VCA system monitors the defined areas and alerts only at specified events.
• In the event of a rule violation, the operator is alerted, and a pan/tilt/zoom-capable HDTV camera can also be directed automatically towards the location of the event to acquire high-quality pictures of it.
• A VCA-system might also contain modules for biometric identification, for example for the identification of the operators themselves as they enter sensitive areas.
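The idea of a VCA system that "knows" how hallways, shops and exits connect can be sketched as a simple graph search. The mall layout and all names below are invented for illustration, and a breadth-first search is only one plausible way to answer the question of where a person can run from a given location. A real system would also link each area to the cameras covering it.

```python
from collections import deque
from typing import Dict, List

# Hypothetical mall layout: each area lists the areas directly reachable from it.
MALL_GRAPH: Dict[str, List[str]] = {
    "shop_a": ["hallway_1"],
    "hallway_1": ["shop_a", "atrium", "exit_north"],
    "atrium": ["hallway_1", "hallway_2"],
    "hallway_2": ["atrium", "exit_south"],
    "exit_north": ["hallway_1"],
    "exit_south": ["hallway_2"],
}


def shortest_route(graph: Dict[str, List[str]], start: str, goal: str) -> List[str]:
    """Breadth-first search: the route with the fewest areas, or [] if none exists."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return []


# Which way out is closest for someone last seen in shop_a?
for exit_name in ("exit_north", "exit_south"):
    print(exit_name, shortest_route(MALL_GRAPH, "shop_a", exit_name))
```

The areas along each route are the candidate places where the person could be intercepted.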
How does VCA work?
VCA is made possible through the creation and analysis of meta-data. Typical examples of meta-data are:
• Object size and position in an image
• Object speed
• License plate number
In other words, meta-data is data about data, or data describing the contents of an image. Since these data are then passed on for further analysis, the nature and the quality of the data are clearly important.
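As an illustration of what one such meta-data record might look like, here is a minimal sketch. The field names, units and sample values are assumptions chosen for this example and do not follow any specific product or standard.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class ObjectMetadata:
    """One hypothetical meta-data record describing an object in one frame."""
    frame_time_s: float                  # time of the frame, in seconds
    object_id: int                       # identifier kept while the object is tracked
    position_px: Tuple[int, int]         # (x, y) position in the image, in pixels
    size_px: Tuple[int, int]             # (width, height) in the image, in pixels
    speed_px_per_s: float                # apparent speed in the image
    license_plate: Optional[str] = None  # only filled in when a plate has been read


# Made-up sample record for a person walking through the scene.
record = ObjectMetadata(
    frame_time_s=12.4,
    object_id=7,
    position_px=(320, 180),
    size_px=(40, 95),
    speed_px_per_s=35.0,
)

# The record, not the image itself, is what the later analysis steps work on.
print(asdict(record))
```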
First, it is important that only real objects are described, so that vegetation moving in the wind or a shadow on the ground is not falsely reported as a real object.
Secondly, it is important that the meta-data are sufficiently accurate. The obvious example is Automatic Number Plate Recognition (ANPR). If the accuracy of the license plate ID is poor, the ANPR system is rendered useless. Also, if the size of an object is inaccurately estimated, there is a risk that the analysis result is wrong. Let’s say that a system is configured to raise an event if an object above a given size enters the scene. If the object size is underestimated, a person may be able to enter a forbidden area without triggering an alarm. The more advanced the analysis the meta-data is subjected to, the more important accuracy becomes.
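A small worked example shows how an inaccurate size estimate can defeat such a rule. The heights, the threshold and the error levels below are invented for illustration only.

```python
def size_rule_triggers(estimated_height_m: float, min_height_m: float) -> bool:
    """Raise an event only for objects at least min_height_m tall."""
    return estimated_height_m >= min_height_m


# Hypothetical scenario: a 1.80 m person, and a rule tuned to ignore
# anything smaller than 1.50 m (for example animals).
true_height_m = 1.80
min_height_m = 1.50

for relative_error in (0.0, -0.10, -0.20):
    estimate = true_height_m * (1 + relative_error)
    triggered = size_rule_triggers(estimate, min_height_m)
    print(f"error {relative_error:+.0%}: estimate {estimate:.2f} m, alarm: {triggered}")
```

With these numbers, a 10 % underestimate still triggers the alarm, while a 20 % underestimate puts the person at 1.44 m, below the threshold, and the intrusion goes unreported.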
Third, the nature of the data is important. Knowing the position of an object in the picture is sufficient to implement a sterile zone or trip wire alarm. However, imagine the possibilities that open up if the objects’ positions are given in geographical (map) coordinates and the objects are classified as human, car, animal etc.
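To show that image positions alone are enough for these two functions, here is a minimal sketch. The function names, the rectangular zone and the coordinates are assumptions for illustration. The trip wire test uses a standard segment intersection check on an object's position in two consecutive frames.

```python
from typing import Tuple

Point = Tuple[float, float]               # (x, y) position in the image
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def _side(a: Point, b: Point, c: Point) -> float:
    """Cross product: positive and negative values mean c lies on opposite sides of a->b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses_trip_wire(prev_pos: Point, curr_pos: Point,
                      wire_start: Point, wire_end: Point) -> bool:
    """True if the move prev_pos -> curr_pos strictly crosses the wire segment.

    Merely touching the wire or one of its end points is not counted.
    """
    d1 = _side(wire_start, wire_end, prev_pos)
    d2 = _side(wire_start, wire_end, curr_pos)
    d3 = _side(prev_pos, curr_pos, wire_start)
    d4 = _side(prev_pos, curr_pos, wire_end)
    return d1 * d2 < 0 and d3 * d4 < 0


def in_sterile_zone(pos: Point, zone: Rect) -> bool:
    """True if the position lies inside the rectangular zone."""
    left, top, right, bottom = zone
    return left <= pos[0] <= right and top <= pos[1] <= bottom


# Hypothetical horizontal wire and two object movements between frames.
wire = ((0.0, 5.0), (10.0, 5.0))
print(crosses_trip_wire((5.0, 3.0), (5.0, 7.0), *wire))  # moves across the wire
print(crosses_trip_wire((5.0, 1.0), (5.0, 3.0), *wire))  # stays on one side
print(in_sterile_zone((4.0, 4.0), (2.0, 2.0, 6.0, 6.0)))
```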
As for end-user value, it is the meta-data analysis that is of interest, since the result of this process is what is presented to the user as events.
The analysis of a person’s position over time can result in a perimeter breach alert or a loitering alert, depending on the movement of the person and the type of analysis that is performed. If the meta-data contains information about the objects’ geographical position, a meta-data analysis can decide if a car is speeding or a person is running. Information about each car’s speed can in turn be used to calculate the average speed on a road or to raise an alert in the event of congestion.
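The analyses mentioned above can be sketched in a few lines once positions are available in map coordinates. The track format, the thresholds and the sample tracks are all assumptions for illustration, and the loitering test shown here (staying within a radius of the first observed position) is just one simple way such a rule could be defined.

```python
import math
from typing import List, Tuple

Track = List[Tuple[float, float, float]]  # (time_s, x_m, y_m) in map coordinates


def average_speed_m_per_s(track: Track) -> float:
    """Path length divided by elapsed time."""
    distance = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(track, track[1:])
    )
    elapsed = track[-1][0] - track[0][0]
    return distance / elapsed if elapsed > 0 else 0.0


def is_loitering(track: Track, radius_m: float, min_duration_s: float) -> bool:
    """True if the object stayed within radius_m of its first position long enough."""
    t0, x0, y0 = track[0]
    if any(math.hypot(x - x0, y - y0) > radius_m for _, x, y in track):
        return False
    return track[-1][0] - t0 >= min_duration_s


def congestion_alert(car_speeds_m_per_s: List[float], min_average_m_per_s: float) -> bool:
    """True if the average speed of the observed cars drops below the limit."""
    return sum(car_speeds_m_per_s) / len(car_speeds_m_per_s) < min_average_m_per_s


# Made-up tracks: a car covering 40 m in 2 s and a person barely moving for 60 s.
car_track = [(0.0, 0.0, 0.0), (1.0, 20.0, 0.0), (2.0, 40.0, 0.0)]
person_track = [(0.0, 5.0, 5.0), (30.0, 6.0, 5.0), (60.0, 5.0, 6.0)]

speed_limit_m_per_s = 50 / 3.6  # assumed 50 km/h limit
car_speed = average_speed_m_per_s(car_track)
print(f"car: {car_speed * 3.6:.0f} km/h, speeding: {car_speed > speed_limit_m_per_s}")
print("person loitering:", is_loitering(person_track, radius_m=3.0, min_duration_s=60.0))
print("congestion:", congestion_alert([2.0, 1.5, 3.0], min_average_m_per_s=5.0))
```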


