Program at a glance
December 13
09:15 Registration
09:45-11:45 Workshop: Multimedia Understanding with Pre-trained Models
13:00-15:00 Tutorial 1: Synthetic Data and Multimedia
15:00-17:00 Tutorial 2: Human-centric Visual Understanding
December 14
09:45-10:00 Opening
10:00-11:40 Oral 1-1 Award Session
13:00-14:00 Keynote 1: Machine Learning for Creative Workflow
14:00-14:30 Demo Spotlight
14:30-15:45 Poster+Demo
16:00-17:00 Oral 1-2 Text, Speech, and Vision
Program
December 13
Registration
December 14
Opening
Oral 1-1 Award Session
89 TFM a Dataset for Detection and Recognition of Masked Faces in the Wild
93 Deep Image and Kernel Prior Learning for Blind Super-Resolution
42 Asymmetric Label Propagation for Video Object Segmentation
39 Informative Sample-Aware Proxy for Deep Metric Learning
83 Federated Knowledge Transfer for Heterogeneous Visual Models
Demo Spotlight
100 A Music Loop Sequencer with User-adaptive Music Loop Selection
105 Action Detection System based on Pose Information
106 DeepHair: a DeepFake-based Hairstyle Preview System
107 Emotional Talking Faces: Making Videos More Expressive and Realistic
109 FoodLog Athl: Multimedia Food Recording Platform for Dietary Guidance and Food Monitoring
110 Rubber material retrieval system using electron microscope images for rubber material development
104 JamSketch Deep α: A CNN-based Improvisation System in Accordance with User's Melodic Outline Drawing
108 GSTH266enc: A GStreamer plugin for VVC encoder
101 Intelligent Video Surveillance Platform Based On FFmpeg And Yolov5
Poster+Demo
100 A Music Loop Sequencer with User-adaptive Music Loop Selection
105 Action Detection System based on Pose Information
106 DeepHair: a DeepFake-based Hairstyle Preview System
107 Emotional Talking Faces: Making Videos More Expressive and Realistic
109 FoodLog Athl: Multimedia Food Recording Platform for Dietary Guidance and Food Monitoring
110 Rubber material retrieval system using electron microscope images for rubber material development
104 JamSketch Deep α: A CNN-based Improvisation System in Accordance with User's Melodic Outline Drawing
108 GSTH266enc: A GStreamer plugin for VVC encoder
101 Intelligent Video Surveillance Platform Based On FFmpeg And Yolov5
89 TFM a Dataset for Detection and Recognition of Masked Faces in the Wild
93 Deep Image and Kernel Prior Learning for Blind Super-Resolution
42 Asymmetric Label Propagation for Video Object Segmentation
39 Informative Sample-Aware Proxy for Deep Metric Learning
83 Federated Knowledge Transfer for Heterogeneous Visual Models
1 An End-to-End Scene Text Detector with Dynamic Attention
50 Self-Attentive CLIP Hashing for Unsupervised Cross-Modal Retrieval
66 Affective Embedding Framework with Semantic Representations from Tweets for Zero-shot Visual Sentiment Prediction
16 SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision
6 Robust Learning with Adversarial Perturbations and Label Noise: A Two-Pronged Defense Approach
70 Enhancing the Robustness of Deep Learning Based Fingerprinting to Improve Deepfake Attribution
65 Disentangled Image Attribute Editing in Latent Space via Mask-based Retention Loss
36 ObjectMix: Data Augmentation by Copy-Pasting Objects in Videos for Action Recognition
20 CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection
Oral 1-2 Text, Speech, and Vision
66 Affective Embedding Framework with Semantic Representations from Tweets for Zero-shot Visual Sentiment Prediction
16 SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision
50 Self-Attentive CLIP Hashing for Unsupervised Cross-Modal Retrieval
1 An End-to-End Scene Text Detector with Dynamic Attention
December 15
Oral 2-1 Video Compression, Broadcasting, and Analysis
11 Human-Avatar Interaction in Metaverse: Framework for Full-body Interaction
49 Parallel Queries for Human-Object Interaction Detection
95 Sequential Frame-Interpolation and DCT-based Video Compression Framework
25 360BroadView: Viewer Management for Viewport Prediction in 360-Degree Video Live Broadcast
72 Two-Layer Learning-based P-Frame Coding with Super-Resolution and Content-Adaptive Conditional ANF
74 Learned Bi-Directional Motion Prediction for Video Compression
Short Spotlight
9 A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition
34 SLGAN: Style- and Latent-guided Generative Adversarial Network for Desirable Makeup Transfer and Removal
64 Popularity-aware Graph Social Recommendation for Fully Non-Interaction Users
28 Multimodal Fusion with Cross-Modal Attention for Action Recognition in Still Images
45 Zero-shot Font Style Transfer with a Differentiable Renderer
98 Wearable Camera Based Food Logging System
90 Graph Neural Network Based Living Comfort Prediction Using Real Estate Floor Plan Images
48 Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices
24 A Reality Check of Positioning in Multiuser Mobile Augmented Reality: Measurement and Analysis
59 Towards High Performance One-Stage Human Pose Estimation
85 Singing Voice Detection via Similarity-based Semi-supervised Learning Method
Poster+Demo
100 A Music Loop Sequencer with User-adaptive Music Loop Selection
105 Action Detection System based on Pose Information
106 DeepHair: a DeepFake-based Hairstyle Preview System
107 Emotional Talking Faces: Making Videos More Expressive and Realistic
109 FoodLog Athl: Multimedia Food Recording Platform for Dietary Guidance and Food Monitoring
110 Rubber material retrieval system using electron microscope images for rubber material development
104 JamSketch Deep α: A CNN-based Improvisation System in Accordance with User's Melodic Outline Drawing
108 GSTH266enc: A GStreamer plugin for VVC encoder
101 Intelligent Video Surveillance Platform Based On FFmpeg And Yolov5
9 A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition
34 SLGAN: Style- and Latent-guided Generative Adversarial Network for Desirable Makeup Transfer and Removal
64 Popularity-aware Graph Social Recommendation for Fully Non-Interaction Users
28 Multimodal Fusion with Cross-Modal Attention for Action Recognition in Still Images
45 Zero-shot Font Style Transfer with a Differentiable Renderer
98 Wearable Camera Based Food Logging System
90 Graph Neural Network Based Living Comfort Prediction Using Real Estate Floor Plan Images
48 Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices
24 A Reality Check of Positioning in Multiuser Mobile Augmented Reality: Measurement and Analysis
59 Towards High Performance One-Stage Human Pose Estimation
85 Singing Voice Detection via Similarity-based Semi-supervised Learning Method
11 Human-Avatar Interaction in Metaverse: Framework for Full-body Interaction
49 Parallel Queries for Human-Object Interaction Detection
95 Sequential Frame-Interpolation and DCT-based Video Compression Framework
25 360BroadView: Viewer Management for Viewport Prediction in 360-Degree Video Live Broadcast
72 Two-Layer Learning-based P-Frame Coding with Super-Resolution and Content-Adaptive Conditional ANF
74 Learned Bi-Directional Motion Prediction for Video Compression
55 Deep Enhancement-Object Features Fusion for Low-light Object Detection
7 Image Compression for Machines Using Boundary-Enhanced Saliency
30 Deep Weighted Guided Upsampling Network for Depth of Field Image Upsampling
57 Multispectral Image Denoising Via Structural Tensor Sparsity Promoting Model
53 Multi-scale Channel Transformer Network for Single Image Deraining
67 Remote sensing image colorization based on Joint Stream Deep Convolutional Generative Adversarial Networks
86 On the Robustness of 3D Object Detectors
December 16
Oral 3-1 Low-level Vision and Image Processing
55 Deep Enhancement-Object Features Fusion for Low-light Object Detection
7 Image Compression for Machines Using Boundary-Enhanced Saliency
30 Deep Weighted Guided Upsampling Network for Depth of Field Image Upsampling
57 Multispectral Image Denoising Via Structural Tensor Sparsity Promoting Model
53 Multi-scale Channel Transformer Network for Single Image Deraining
67 Remote sensing image colorization based on Joint Stream Deep Convolutional Generative Adversarial Networks
Oral 3-2 Robustness, Data Augmentation and Disentangling
86 On the Robustness of 3D Object Detectors
6 Robust Learning with Adversarial Perturbations and Label Noise: A Two-Pronged Defense Approach
70 Enhancing the Robustness of Deep Learning Based Fingerprinting to Improve Deepfake Attribution
65 Disentangled Image Attribute Editing in Latent Space via Mask-based Retention Loss
36 ObjectMix: Data Augmentation by Copy-Pasting Objects in Videos for Action Recognition
20 CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection
Closing
