Deep Learning Models for Scene Understanding

Date:

Seminar / Webinar (due to Covid-19 lockdown) held at ESA / ESRIN Earth Observation Programme Department.

Abstract: Scene understanding aims to answer the question: how can we build a model of a real-world region in order to act and interact with it? This requires extracting the semantics and the geometry of the available data: images, 3D point clouds, etc. To this end, several machine learning approaches are presented; they differ in how much they rely on prior assumptions versus learning throughout the algorithms. Three aspects of the problem are considered. The first works address the semantic content of images, through classification, object detection and semantic segmentation. Next, several learning approaches are proposed for Earth observation and remote sensing, notably for interactive learning, multimodal semantic classification and semantic change detection. Finally, the focus turns to 3D vision, with depth estimation from a single image and classification of 3D point clouds by neural networks.

These various approaches rely on common underlying mechanisms that are becoming increasingly important. They perform a multimodal analysis in order to benefit from available, complementary data, obtained from different sensors but also from heterogeneous sources and metadata. Symmetrically, the joint optimization of multiple objectives helps to regularize the learning of efficient models. Moreover, they increasingly rely on multiple viewpoints of the scene to link, in both learning and inference, spatial invariances that support both local analysis and global semantic reconstruction. This is made possible by a growing integration of appearance and 3D structure, and leads to a better semantic understanding of the scene.