Smartphone-Based Natural Language Description of Visual Scenes
This proposal concerns the implementation of a visual scene understanding system on a mobile phone platform. It builds on our recent work on a bottom-up approach to describing video content in natural language. This is the first year of a three-year project, and the following themes will be investigated:
(1) implementation of a compact, functional system on the Android OS, where memory usage is a likely constraint;
(2) a feasibility study of real-time processing and of the extent of natural language description that can be generated; these are two sides of the same coin: video processing involves a nontrivial amount of computation, so limiting that computation requires a careful selection of the visual features to extract, and the variety of extracted features in turn defines the extent of the natural language description that can be produced;
(3) a study of a theoretical framework for accommodating image processing errors and for incorporating spatial and temporal relations among multiple objects.
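To make the third theme concrete, one simple way spatial relations among detected objects can feed natural language description is to map bounding-box geometry to qualitative relations. The sketch below is purely illustrative and is not part of the proposal; the object names, box format `(x, y, width, height)`, and relation vocabulary are all assumptions for the example.

```python
# Illustrative sketch (not the proposed system): deriving a qualitative
# spatial relation between two detected objects from their bounding
# boxes, then phrasing it as a sentence fragment.

def horizontal_relation(a, b):
    """Return a qualitative relation of box a to box b.

    Boxes are (x, y, width, height) in image coordinates.
    """
    ax, _, aw, _ = a
    bx, _, bw, _ = b
    if ax + aw <= bx:
        return "to the left of"
    if bx + bw <= ax:
        return "to the right of"
    return "overlapping with"

def describe(name_a, box_a, name_b, box_b):
    """Compose a sentence describing the spatial relation of two objects."""
    rel = horizontal_relation(box_a, box_b)
    return f"The {name_a} is {rel} the {name_b}."

# Example: a person detected at x=10 and a car at x=200 in one frame.
print(describe("person", (10, 50, 60, 120), "car", (200, 40, 180, 90)))
```

A real system would need a richer relation vocabulary (above/below, near/far, temporal relations across frames) and would have to tolerate detector errors, which is precisely what the proposed theoretical framework is meant to address.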