Talk Title: Natural Language Descriptions For Video Sequences
Date and Time: Tuesday, 3rd Nov 2015 14:00 hrs - 15:00 hrs
Venue: Seminar Hall, KICS, UET, GT Road, Lahore
(Focal Person: Dr. Usman Ghani, 031-3619-2000)
(Focal Person: Mr. Kashif Bashir, 032-3448-0000)
Speaker: Dr. Usman Ghani
Co Event: Prize Distribution Ceremony By More Magazine (1500-1530 Hrs)
This work is concerned with the automatic generation of natural language descriptions that can be used for video indexing, retrieval and summarization applications. It is a step beyond keyword-based tagging, as it captures the relations between keywords associated with a video and thus clarifies the context between them. Initially, we prepare hand annotations consisting of descriptions for video segments drawn from a TREC Video dataset. Analysis of this data offers insights into what humans find of interest in video content. For machine-generated descriptions, conventional image processing techniques are applied to extract high-level features (HLFs) from individual video frames. A natural language description is then produced from these HLFs. Although the feature extraction processes are error-prone at various levels, approaches are explored for combining their outputs into coherent descriptions. To address scalability, application of the framework to several different video genres is also discussed. For complete video sequences, a scheme is presented for generating coherent and compact descriptions of video streams; it makes use of spatial relations between HLFs and temporal relations between individual frames. The overlap between machine-generated and human-annotated descriptions indicates that the machine-generated descriptions capture context information and are in accordance with human viewing capabilities. Further, a task-based evaluation shows an improvement in a video identification task compared with keywords alone. Finally, the application of the generated natural language descriptions to video scene classification is discussed.
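As a rough illustration of the overlap calculation mentioned in the abstract, the sketch below scores a machine-generated description against a human annotation using unigram precision, recall and F1. The abstract does not say which overlap measure the speaker uses, so the metric, the function names `tokenize` and `unigram_overlap`, and the two example sentences are all assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap(machine, human):
    """Return (precision, recall, f1) of word overlap between two descriptions.

    Assumed measure for illustration: clipped unigram counts, so a word
    repeated in one description is only matched as often as it occurs
    in the other.
    """
    machine_counts = Counter(tokenize(machine))
    human_counts = Counter(tokenize(human))
    # Counter intersection keeps the minimum count of each shared word.
    shared = sum((machine_counts & human_counts).values())
    precision = shared / sum(machine_counts.values()) if machine_counts else 0.0
    recall = shared / sum(human_counts.values()) if human_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    return precision, recall, f1


# Made-up example sentences, not taken from the TREC Video annotations.
machine_description = "A man is sitting at a table"
human_description = "A man sits at a table with a woman"
precision, recall, f1 = unigram_overlap(machine_description, human_description)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here five of the seven machine-generated words also appear in the nine-word human annotation, so precision is 5/7 and recall is 5/9. A word-level score of this kind only measures shared vocabulary; it does not by itself show that the relations between keywords were captured.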
Muhammad Usman Ghani is an associate professor in the Department of Computer Science, University of Engineering and Technology, Lahore. His PhD (Sheffield University, UK) was concerned with statistical modeling for machine vision signals, specifically language descriptions of video streams. He has also worked on spoken language processing using statistical approaches, with applications such as information extraction from speech and speech summarization. His recent work is concerned with multimedia, incorporating text, audio and visual processing into one framework.
Speaker's Profile: http://www.kics.edu.pk/staff/Dr.M.UsmanGhaniKhan