List of workshops:



Multimedia Understanding with Pre-trained Models

http://staff.ustc.edu.cn/~zhwg/MMAsia_2022_workshop/index.html

Overview

Multi-modal understanding plays a crucial role in enabling machines to perceive the physical world through multiple sensor cues, as humans do. Recently, large-scale pre-trained models (PTMs) have become a research hotspot in the field of artificial intelligence. Existing techniques following the self-supervised learning paradigm have achieved great success in uni-modal settings such as computer vision (CV) and natural language processing (NLP). These advances inspire researchers to explore more and deeper pre-training techniques for the multi-modal understanding problem. In this workshop, we aim to bring together researchers from the field of multimedia to discuss recent research and future directions on pre-trained models with self-supervised learning for multimedia understanding.

In recent years, we have witnessed the great success of pre-trained models in natural language processing, such as GPT-3, BERT, RoBERTa, and DeBERTa. This success motivates researchers in the multimedia community to leverage the idea of PTMs to address multi-modal tasks. The scope of this workshop is pre-trained models with self-supervised learning for multimedia understanding. Potential topics include architecture design for multi-modal PTMs, pretext task design for self-supervised learning, multi-modal data modeling, efficiency improvements for PTMs, and interpretability of PTMs.

Call for Papers

Topics of interest include, but are not limited to:

  • Unified PTM strategies for multi-modal understanding
  • PTM for cross-modal matching and retrieval
  • PTM for audio-visual understanding
  • PTM for video captioning
  • PTM for sign language translation
  • Leveraging off-the-shelf PTM for multi-modal understanding
  • Interpretability in self-supervised PTM

Paper Submission Guideline

To be announced.



The 7th International Workshop on Affective Social Multimedia Computing

http://asmmc22.ubtrobot.com/#

Call for Papers

Affective social multimedia computing is an emerging research topic for both the affective computing and multimedia research communities. Social multimedia is fundamentally changing how we communicate, interact, and collaborate with other people in our daily lives. Compared with well-organized broadcast news and professionally produced videos such as commercials, TV shows, and movies, social multimedia poses great challenges to research communities. Social multimedia contains rich affective information, and effectively extracting it can greatly help social multimedia computing (e.g., processing, indexing, retrieval, and understanding).

Although much progress has been made in traditional multimedia research on content analysis, indexing, and retrieval based on subjective concepts such as emotion, aesthetics, and preference, affective social multimedia computing is a new research area that aims to process affective information from social multimedia. For massive and heterogeneous social media data, this research requires a multidisciplinary understanding of content and perceptual cues from social multimedia. From the multimedia perspective, it relies on theoretical and technological findings in affective computing, machine learning, pattern recognition, signal/multimedia processing, computer vision, speech processing, and behavioral and social psychology.

Affective analysis of social multimedia is attracting growing attention from the industries and businesses that provide social networking sites and content-sharing services and that distribute and host the media. This workshop focuses on the analysis of affective signals in interaction (multimodal analyses enabling artificial agents in human-machine interaction, social interaction with artificial agents) and in social multimedia (e.g., Twitter, WeChat, Weibo, YouTube, Facebook).

The first six ASMMC workshops were successfully held in Xi'an, China (September 21, 2015); Seattle, USA (July 15, 2016); Stockholm, Sweden (August 25, 2017); Seoul, Korea (October 26, 2018); Cambridge, UK (July 2, 2019); and as a virtual conference (Montreal, Canada, October 11, 2021). With the 7th ASMMC at ACM Multimedia Asia 2022, the workshop returns to Affective Computing & Intelligent Interaction, investigating how affective computing technology can become available and accessible for education, health, transport, cities, the home, and entertainment.

Workshop Scope

The workshop will address, but is not limited to, the following topics:

  • Affective human-machine interaction or human-human interaction
  • Affective/Emotional content analysis of images, videos, music, metadata (text, symbols, etc.)
  • Affective indexing, ranking, and retrieval on big social media data
  • Affective computing in social multimedia by multimodal integration (facial expression, gesture, posture, speech, text/language)
  • Emotional implicit tagging and interactive systems
  • User interests and behavior modeling in social multimedia
  • Video and image summarization based on affect
  • Affective analysis of social media and harvesting the affective responses of crowds
  • Affective generation in social multimedia, expressive text-to-speech and expressive language translation
  • Zero/One/Few-shot learning for emotion recognition
  • Multimodal analyses enabling artificial agents in human-machine interaction
  • Social interaction with artificial agents
  • Applications of affective social multimedia computing

Paper Submission Guideline

To be announced.