List of Tutorials

✏️ Synthetic Data and Multimedia

Time: December 13, 1:00 pm - 3:00 pm


Image and video processing has seen rapid growth in the last decade, with remarkable improvements made possible by ever-increasing computing power and by deep learning-based frameworks that now achieve human-level performance, and beyond, in many applications, including detection, classification, and segmentation, to name a few. However, the development of novel algorithms and solutions is strictly bound to the availability of a sufficient amount of data, which must be representative of the task to be addressed. In this respect, the literature has shown a rapid proliferation of datasets, tackling a multitude of problems, from the simplest to the most complex. Some of them are widely adopted and are currently recognized as the reference benchmarks against which all newly proposed methods must compete. Still, there is an ever-growing demand for data, to which researchers respond with larger and larger datasets, at a huge cost in terms of acquisition, storage, and annotation of images and clips, and often with inconsistent annotations. The use of synthetically generated data can overcome such limitations, as the generation engine can be designed to fulfill an arbitrary number of requirements, all at the same time. The tutorial will take a holistic view of the ongoing research, the relevant issues, and the potential applications of synthetic data for multimedia data processing, as a standalone resource or in combination with real data. In particular, attention will be focused on the domain of images and videos, where the lack of representative data for specific problem categories has opened up the possibility of relying on machine-generated content.


Nicola Conci, University of Trento, Italy

Nicola Conci is an Associate Professor at the Department of Information Engineering and Computer Science, University of Trento, where he teaches Computer Vision and Signal Processing. He received his Ph.D. in 2007 from the same university. In 2007 he was a visiting student at the Image Processing Lab at the University of California, Santa Barbara. In 2008 and 2009 he was a postdoctoral researcher in the Multimedia and Vision research group at Queen Mary University of London. Prof. Conci has authored and co-authored more than 130 papers in peer-reviewed journals and conferences. His current research interests are related to video analysis and computer vision applications for behavioral understanding and monitoring. At the University of Trento he coordinates the M.Sc. Degree in Information and Communications Engineering, and he is the department's delegate for research activities related to the Winter Olympic Games Milano-Cortina 2026. He has served as Co-Chair of the 1st and 2nd International Workshop on Computer Vision for Winter Sports, hosted at IEEE WACV 2022 and 2023, General Co-Chair of the International Conference on Distributed Smart Cameras 2019, General Co-Chair of the Symposium on Signal Processing for Understanding Crowd Dynamics, held at IEEE AVSS 2017, and Technical Program Co-Chair of the Symposium on Signal Processing for Understanding Crowd Dynamics, held at IEEE GlobalSIP 2016.

Niccolò Bisagno, University of Trento, Italy

Niccolò Bisagno received his Ph.D. in 2020 from the ICT International Doctoral School of the University of Trento, Italy, with the thesis “On simulating and predicting pedestrian trajectories in a crowd”. In 2019, he was a visiting Ph.D. student at the University of Central Florida, Orlando, USA. In 2018, he was a visiting Ph.D. student at the Alpen-Adria-Universität, Klagenfurt, Austria. His research focuses on crowd analysis, in particular pedestrian trajectory prediction and crowd simulation in virtual environments. He is also interested in machine learning and computer vision, with a special focus on biologically-inspired deep learning architectures and sports analysis applications.

✏️ Human-centric Visual Understanding

Time: December 13, 3:00 pm - 5:00 pm


Human-centric visual understanding is one of the fundamental problems of computer vision and multimedia understanding. With the development of deep learning and multi-modal analysis techniques, researchers have strived to push the limits of human-centric visual understanding in a wide variety of applications, such as intelligent surveillance, retailing, fashion design, and services. This tutorial will present recent advances under the umbrella of human-centric visual understanding, ranging from the fundamental problems of gait recognition, monocular real-time 3D human recovery, human action understanding, and motion prediction, through human action analysis in surveillance videos, to multimedia event analysis and understanding in complex scenarios and industrial applications. We will discuss the key problems, common formulations, existing methodologies, real industrial applications, and future directions. The views presented in the tutorial come not only from the research field but also reflect real-world requirements and experience from the industrial community. The tutorial therefore aims to inspire audiences from both the research and industrial communities, and to facilitate research in computer vision and multimedia for human behavior analysis and human-centric analysis modeling. We first held a tutorial on this topic at ACM Multimedia Asia 2019, so this tutorial will mainly introduce our work from 2019 to 2022.


Kun Liu, JD Logistics, Beijing, China

Kun Liu is currently an algorithm researcher at JD Logistics, Beijing, China. He received a Ph.D. degree in computer science from the Beijing University of Posts and Telecommunications in 2021. His current research interests include multimedia content analysis and computer vision, especially human behavior analysis in surveillance videos. He has authored or co-authored more than 10 papers in journals and conferences. Dr. Liu has won 1st Place in the Step Ordering Track of the CVPR 2020 YouMakeUp VQA Challenge, 1st Place in the General Anomaly Detection Track of the ACM MM 2020 CitySCENE Anomaly Detection Challenge, 2nd Place in the Part-level Action Parsing Track of the ICCV 2021 DeeperAction Challenge, and 2nd Prize in the No Interaction Track of the SAPIEN Open-Source Manipulation Skill Challenge.

Hongsong Wang, Southeast University, Nanjing, China

Hongsong Wang is an Associate Professor with the Department of Computer Science and Engineering, Southeast University, Nanjing, China. He received a Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, University of Chinese Academy of Sciences in 2018. He was a postdoctoral fellow at the National University of Singapore in 2019 and a research associate at the Inception Institute of Artificial Intelligence, Abu Dhabi, UAE in 2020. His research interests include action recognition, human motion modeling, object detection, and multi-object tracking.

Xinchen Liu, JD Explore Academy, Beijing, China

Xinchen Liu is a Senior Researcher at JD Explore Academy. His research interests include human-centric computer vision and its application in retail. He received a Ph.D. degree in computer science from Beijing University of Posts and Telecommunications in 2018. He received the IEEE TMM 2019 Prize Paper Award, the IEEE ICME 2016 Best Student Paper Award, and the Outstanding Doctoral Dissertation Award of CSIG in 2019. Dr. Liu has won 1st Place in the AI+Person Re-identification Track of the National AI Challenge 2020, 2nd Place in the Multi-Person Human Parsing Track of the CVPR 2019 Look-into-Person Challenge, and 2nd Place in the Single-Person Human Parsing Track of the CVPR 2018 Look-into-Person Challenge. He has served as a PC member of ACM MM, AAAI, ICME, etc.

Qian Bao, JD Explore Academy, Beijing, China

Qian Bao is currently an algorithm researcher at JD Explore Academy, Beijing, China. She received a B.E. degree from the Harbin Institute of Technology, Harbin, China, in 2012, and a Ph.D. degree in signal and information processing from the University of Chinese Academy of Sciences, Beijing, China, in 2017. Her current research interests include human behavior analysis and 3D human recovery. She has authored or co-authored more than 30 papers in journals and conferences.

Cheng Zhang, Carnegie Mellon University, Pittsburgh, USA

Cheng Zhang is a Postdoctoral Fellow at the Robotics Institute, Carnegie Mellon University. Prior to that, he obtained a Ph.D. in Computer Science and Engineering (CSE) from The Ohio State University (OSU). His research interests are in machine learning with applications to computer vision and multimodal sensing. His work has been published at ICCV/CVPR/ECCV/BMVC, NeurIPS, EMNLP, INFOCOM, etc. He has also spent time building real-world perception systems at NVIDIA AV, Volvo Cars, FX Palo Alto Laboratory, and Alibaba. He received the Graduate Research Award from the OSU CSE department in 2022 and was an outstanding reviewer for ICML 2022 and BMVC 2021.

Wu Liu, JD Explore Academy, Beijing, China

Wu Liu is a Senior Researcher at JD Explore Academy, China. His research topic is Multimedia Analysis and Search. He has defined the Progressive Search Paradigm, developed many applications thereof, and participated in building the National Open AI Platform for Intelligent Supply Chain. He has published more than 90 papers and received the IEEE Trans. on Multimedia 2019 Prize Paper Award, the IEEE Multimedia 2018 Best Paper Award, and the IEEE ICME 2016 Best Student Paper Award. He also received the 2021 Tianjin Science and Technology Progress Special Award, the ACM China Rising Star Award, the Chinese Academy of Sciences Outstanding Ph.D. Thesis Award, and the Dean’s Special Award of the Chinese Academy of Sciences. He has served as Technical Program Chair of IEEE ICME 2022 and ACM MM Asia 2021, Associate Editor of IEEE Trans. on Multimedia, Founding Committee Member of the ACM Future Computing Academy, and Area Chair of ACM MM, AAAI, CIKM, and ACL.