Discovering Video Semantics
We propose to automatically generate richer video semantics, where semantics are defined as the theme of a video derived from its objects and their activities. Such semantics have great potential to aid video organization, indexing, and search focused on user needs. The task is performed by aligning audio and visual information so that the two modalities complement each other: missing information in one modality can be inferred from the other, repetitions are removed, and video processing gains flexibility and efficiency. Semantic generation is realized through topic discovery and the construction of hierarchies containing information about objects, their locations, and their actions in both the audio and visual domains. This semantic learning task resembles how humans understand videos: they extract relevant information from one or multiple modalities while ignoring what is irrelevant.
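The alignment-and-hierarchy idea can be illustrated with a minimal sketch. All names, tag sets, and vocabularies below are hypothetical placeholders (the paper does not specify this interface); the sketch assumes per-modality tags have already been extracted upstream, e.g. by an object detector and a speech transcriber.

```python
def align(audio_tags, visual_tags):
    """Complement the two modalities: shared tags are kept once
    (removing repetition), and tags present in only one modality
    are inferred into the combined view from the other."""
    return audio_tags | visual_tags

def build_hierarchy(tags, objects, locations, actions):
    """Group the aligned tags into a simple object/location/action
    hierarchy under a single topic node."""
    return {
        "objects": sorted(tags & objects),
        "locations": sorted(tags & locations),
        "actions": sorted(tags & actions),
    }

# Toy per-modality tags and vocabularies (assumptions, not from the paper).
audio_tags = {"dog", "park", "barking"}
visual_tags = {"dog", "park", "running", "ball"}
OBJECTS = {"dog", "ball"}
LOCATIONS = {"park"}
ACTIONS = {"barking", "running"}

tags = align(audio_tags, visual_tags)
hierarchy = build_hierarchy(tags, OBJECTS, LOCATIONS, ACTIONS)
print(hierarchy)
# → {'objects': ['ball', 'dog'], 'locations': ['park'], 'actions': ['barking', 'running']}
```

In a real system the tag sets would come from learned detectors and the grouping from discovered topics rather than fixed vocabularies, but the structure of the output hierarchy is the same.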