IEEE MIPR 2026

IEEE 9th International Conference on Multimedia Information Processing and Retrieval (MIPR 2026)

August 9–11, 2026 | Chatrium Grand Bangkok Hotel, Thailand
Mahidol University
IEEE Computer Society
IEEE

Design and Optimization for Interactive Spatial-Temporal Multi-Modal Gaussian Splatting

Abstract

With the advances in 3D representations, Gaussian Splatting has quickly evolved into the most promising volumetric video representation. A great deal of research effort has been devoted, from many perspectives, to better spatial-temporal multi-modal scene reconstruction based on Gaussian Splatting. In this tutorial, we will first introduce the fundamental representation of Gaussian Splatting. Then, we will give an overview of recent developments in Gaussian Splatting along the end-to-end ecosystem, from content capture and content creation to content delivery and content consumption. For the content capture stage, we will address critical issues and solutions that support building good-quality models. For the content creation stage, we will discuss methods to enrich the multi-modal experience by embedding multi-modal information into the Gaussian Splatting primitives. For the content delivery stage, we will discuss methods to reach high photorealistic quality at a streamable bit rate. For the content consumption stage, we will cover language- and semantics-guided navigation and interaction with multi-surface-part articulable 3D Gaussian Splatting. At the end of this tutorial, we will present the latest international standardization efforts and highlight future research trends.

Motivation and relevance to MIPR and the multimedia community

In the past, MIPR conferences have been a great platform for exchanging knowledge across different domains of multimedia. MIPR has enabled rich discussions of promising and diverse future research. In addition, MIPR always invites excellent keynote speakers and industry innovation forum experts to present state-of-the-art trends in both academia and industry. To continue the growth of MIPR, an in-depth tutorial introducing the basics and advances of 3D/4D scene representation and reconstruction should let MIPR attendees learn these new tools and blend them into their own research in the multimedia community, particularly research on immersive and interactive applications.

Detailed outline of the tutorial content, including time allocation

Topic 1: Fundamentals of Gaussian Splatting

Time: 30 minutes

  • Introduction and Motivations
  • Gaussian Splatting Representations
  • Gaussian Splatting Optimizations

Topic 2: Gaussian Splatting Content Capture

Time: 30 minutes

  • Introduction to Multi-View Camera Capture Systems
  • Temporal Synchronization of Cameras
  • Spatial Coverage of Cameras

Topic 3: Gaussian Splatting Content Creation

Time: 30 minutes

  • Gaussian Splatting Multi-Modal Attributes
  • Gaussian Splatting Editing

Topic 4: Gaussian Splatting Content Delivery

Time: 30 minutes

  • Structured Representation of Gaussian Splatting
  • Compression of Gaussian Splatting

Topic 5: Gaussian Splatting Content Consumption

Time: 30 minutes

  • Explicit Geometry Information for Gaussian Splatting
  • Interaction between Humans and Gaussian Splatting

Learning objectives and target audience

The learning objective of this tutorial is to provide MIPR attendees with sufficient background knowledge on Gaussian Splatting and to introduce the current state-of-the-art research directions. We hope to stimulate MIPR attendees to contribute more future research on 3D/4D scene representation and reconstruction. The target audience of this tutorial is graduate students, researchers, and industry practitioners who are interested in this emerging area. We will introduce sufficient background knowledge for attendees who are new to this area so that they can understand the fundamentals of Gaussian Splatting. Then, we will dive into the details of each component and topic along the entire end-to-end ecosystem.

Short biographies of the presenters

Guan-Ming Su

Guan-Ming Su received the Ph.D. degree from the University of Maryland, College Park. He is currently the Director of Research with Dolby Laboratories, Sunnyvale, CA, USA. He is the inventor of more than 230 U.S./international patents and pending applications. He is one of the recipients of the 2020 (72nd) Technology and Engineering Emmy Award and the 2021 (73rd) Engineering Emmy Award Philo T. Farnsworth Award for contributions to high dynamic range (HDR) and wide color gamut (WCG) video in the Dolby Vision format. He received the 2025 University of Maryland ECE Distinguished Alumni Award and the 2025 APSIPA Industrial Distinguished Leader Award. A paper he co-authored won the best industry paper award at IEEE ICIP 2025. He has served in multiple IEEE international conferences, including as TPC Co-Chair of ICME 2021, Industry Innovation Forum Chair of ICIP 2023 and 2025, and General Co-Chair of MIPR 2024 and 2025. He served as a VP for industrial relations and development in APSIPA from 2018 to 2019. He has served as the Vice Chair for Conference in the IEEE Technical Committee on Multimedia Computing (TCMC) since 2021. He has served as an Associate Editor for APSIPA Transactions on Signal and Information Processing and IEEE MultiMedia Magazine, and now serves as an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology.

Personal website: https://sites.google.com/site/wwwgmsu/

He led a tutorial titled "Advances and Challenges in Real-World Video Restoration" at ICVGIP 2025.