Automatic Security and Surveillance System for Video Sequences:
The Islamic Republic of Pakistan is the world's sixth most populous country and is ranked fifth among the world's strongest military powers. Pakistan also has considerable geographical importance in Asia because of its strategic location. First, it is rich in natural resources and has every kind of natural feature: sea, desert, mountains, and rivers. Second, the region has four seasons, which results in a variety of natural and synthetic products. Pakistan shares its borders with very prominent countries and provides sea transportation facilities to many landlocked countries. Despite these blessings, Pakistan is still a developing nation that has not been able to realize its potential. It is no wonder that it is beset by a number of social and economic problems such as poverty, lawlessness, inequality, and illiteracy, but terrorism is proving to be the biggest threat to Pakistan’s progress. This problem became severe after 9/11 and is now a menace to our motherland. It has a number of dimensions that require thorough treatment across the political, economic, social, justice, security, and technology setups of the country. From a technological perspective, one tested technology that can provide some protection against terrorism is video surveillance. Surveillance cameras are quite common these days, which has led to a sharp rise in surveillance data over the last few years. This rate of increase has been gathering speed in underdeveloped and developed countries alike. In the United Kingdom alone, there are 4.2 million closed-circuit TV cameras installed, one for every 14 people.
This practice is becoming common in Pakistan as well. For example, the CPO office in Lahore has more than 100 surveillance cameras, while the Punjab Institute of Cardiology has 80 cameras to monitor the activities of patients and their relatives. These cameras collect a huge amount of data every day, but the question remains how to make meaningful use of it. It seems futile to store a whole day of footage, especially if it was a serene day, or even if suspicious activity was observed in only some part of the day. The storage mechanism should be intelligent enough to log only the most relevant coverage of the video stream. Even then, the problem remains of storing a huge amount of data, which is expensive in terms of the storage media required.
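As a rough illustration of logging only the relevant coverage, the sketch below marks the frame ranges in which consecutive frames differ noticeably and would be worth storing. It is a minimal sketch under stated assumptions, not the method used in this project: simple frame differencing stands in for the actual relevance test, frames are plain lists of grayscale intensities, and the names `mean_abs_diff`, `segments_to_log` and the `threshold` value are hypothetical.

```python
from typing import List, Tuple

# Assumption: a frame is a grid of grayscale intensities in the range 0-255.
Frame = List[List[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def segments_to_log(frames: List[Frame], threshold: float = 10.0) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame index ranges where the scene changed."""
    segments = []
    start = None
    for i in range(1, len(frames)):
        # A frame counts as "active" if it differs enough from the previous one.
        active = mean_abs_diff(frames[i - 1], frames[i]) > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(frames) - 1))
    return segments


# Toy example: a static scene, a brief change, then a static scene again.
quiet = [[0, 0], [0, 0]]
busy = [[200, 200], [200, 200]]
frames = [quiet, quiet, busy, quiet, quiet]
print(segments_to_log(frames))
```

Only the returned ranges would be written to storage; a serene day with no change between frames yields an empty list.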
This project addresses the problem by using image processing methods for automatic understanding of video streams while keeping the storage needs of streaming data in check. The system will use surveillance cameras installed in security-sensitive areas to capture and store videos of the daily activities happening in those areas. Automatic understanding of these videos will be performed using image processing methods, which can lead to the development of several useful applications. In addition, a textual description of each video sequence will be generated using natural language processing, which will greatly reduce storage requirements. A web-based utility for video searching and summarization will be provided to find the most relevant videos in accordance with security needs. When completed, this project will provide great support to law enforcement agencies in combating acts of terrorism and capturing culprits.
Technically, this project mainly encompasses automatic understanding of visual scenes based on the individual contents, or High Level Features (HLFs), present in the videos. By extracting high-level features such as human faces, age, gender, emotions, actions and objects, human-human interaction, human-object interaction, and scene settings, we can readily obtain an interpretation of a video stream. The peaceful purposes of this project include monitoring crowds and restricted access points and counting occurrences of specific persons. The project provides a substitute for human monitoring of surveillance cameras. Human monitoring can be compromised by drowsiness or overwork; an automated solution is therefore highly desirable, since it would run day and night, on virtually any number of cameras, without human intervention.
Currently, the image processing community is focused on identifying individual objects, their properties, and events in a video stream [1, 4, 11]. This project extends that work to the next level. It goes a step beyond keyword-based tagging because it captures the relations between the keywords associated with videos, thus clarifying the context between them. First, HLFs are identified in a video sequence. Second, these HLFs are combined using natural language processing techniques to generate fluent descriptions of the video sequence. Finally, this automatically generated language description is summarized to present the most important contents of the video sequence.
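The step from HLFs to a language description can be illustrated with a simple template. This is only a sketch under stated assumptions: the `Observation` record, its field names, and the fixed sentence template are hypothetical stand-ins for the natural language processing techniques the project would actually use, and dropping repeated sentences is a crude placeholder for real summarization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    """Hypothetical HLF record for one person detected in one video segment."""
    gender: str
    age_group: str
    action: str
    emotion: Optional[str] = None
    obj: Optional[str] = None
    scene: Optional[str] = None


def article(phrase: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if phrase[:1].lower() in {"a", "e", "i", "o", "u"} else "a"


def describe(obs: Observation) -> str:
    """Combine the HLFs of one observation into a single sentence."""
    subject = " ".join(w for w in (obs.emotion, obs.age_group, obs.gender) if w)
    sentence = f"{article(subject).capitalize()} {subject} is {obs.action}"
    if obs.obj:
        sentence += f" {article(obs.obj)} {obs.obj}"
    if obs.scene:
        sentence += f" in {obs.scene}"
    return sentence + "."


def describe_video(observations: List[Observation]) -> str:
    """Join the sentences for a sequence, keeping each distinct sentence once."""
    seen = set()
    sentences = []
    for obs in observations:
        sentence = describe(obs)
        if sentence not in seen:
            seen.add(sentence)
            sentences.append(sentence)
    return " ".join(sentences)


carrier = Observation(gender="man", age_group="young", action="carrying",
                      obj="bag", scene="the corridor")
sitter = Observation(gender="woman", age_group="elderly", action="sitting",
                     emotion="anxious")
print(describe_video([carrier, carrier, sitter]))
```

Unlike a bag of keywords ("man", "bag", "corridor"), the generated sentence records who is doing what, with which object, and where.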
Successful implementation of this project can lead to useful applications in video searching, retrieval, mining, and warehousing. Given a textual description or a keyword, a user of the system can search for a video. In this way, users save time by retrieving only the desired events instead of watching the whole video. In addition, the proposed research opens new horizons for statistical description of video sequences, e.g. the number of persons in a video sequence, unusual events occurring over a complete day, counts of humans by age, gender, emotion, and action, and lists of common objects occurring in most of the videos. These statistics will help in taking precautionary measures against the terrorist activities faced by our nation.
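Keyword search and simple statistics over the generated descriptions could look like the following. This is a minimal sketch, assuming descriptions are stored as plain text keyed by a segment identifier; the sample descriptions, the segment IDs, and the function names `build_index`, `search`, and `term_counts` are all hypothetical, and a basic inverted index stands in for whatever retrieval method the final system adopts.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Hypothetical machine-generated descriptions, keyed by video segment ID.
descriptions = {
    "cam1_0900": "A young man is carrying a bag in the corridor.",
    "cam1_0915": "An elderly woman is sitting in the waiting room.",
    "cam2_0900": "A young man is running in the corridor.",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of segments whose description contains it."""
    index = defaultdict(set)
    for segment_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(segment_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the segments whose description contains every query word."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


def term_counts(docs: Dict[str, str], vocabulary: Set[str]) -> Counter:
    """Count mentions of the given words across all descriptions."""
    counts = Counter()
    for text in docs.values():
        for token in tokenize(text):
            if token in vocabulary:
                counts[token] += 1
    return counts


index = build_index(descriptions)
print(search(index, "man corridor"))
print(term_counts(descriptions, {"man", "woman"}))
```

The counts here are mentions in descriptions, not distinct individuals; counting specific persons would additionally require identity information from the HLF extraction stage.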
Finally, a web-based searching and summarization tool will be developed for the end users of this research. Providing a secured web interface for geographically distant users is an efficient way of ensuring security and mitigating risks. It will also save time for high-level authorities who would otherwise have to visit different areas for security inspections; e.g. headquarters operators may inspect any particular event by entering a query or a keyword in the web interface. Summaries of events will also be available to end users, giving a glimpse of the important contents present in a video sequence.