4-8 DECEMBER 2022
Macau SAR, China

MAIN CONFERENCE PAPER LIST


Paper ID

Paper Title

4

Skin tone Diagnosis in the Wild: Towards More Robust and Inclusive User Experience Using Oriented Aleatoric Uncertainty

6

Spatial Temporal Network for Image and Skeleton Based Group Activity Recognition

9

Calibrated Face Image Forgery Detection with Contrastive Representation Distillation

22

Causes of Catastrophic Forgetting in Class-Incremental Semantic Segmentation

27

RepF-Net: Distortion-aware Re-projection Fusion Network for Object Detection in Panorama Image

28

Spatio-channel Attention Blocks for Cross-modal Crowd Counting

29

Revisiting Image Pyramid Structure for High Resolution Salient Object Detection

31

CLUE: Consolidating Learned and Undergoing Experience in Domain-Incremental Classification

35

D^3: Duplicate Detection Decontaminator for Multiple Object Tracking on Sports Video

38

DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation

39

RA Loss: Relation-Aware Loss for Robust Person Re-identification

40

From Sparse to Dense: Semantic Graph Evolutionary Hashing for Unsupervised Cross-Modal Retrieval

50

A Differentiable Distance Approximation for Fairer Image Classification

52

HiCo: Hierarchical Contrastive Learning for Ultrasound Video Model Pretraining

56

Class Specialized Knowledge Distillation

60

Modular Degradation Simulation and Restoration for Under-Display Camera

61

APAUNet: Axis Projection Attention UNet for Small Target in 3D Medical Segmentation

65

UHD Underwater Image Enhancement via Frequency-Spatial Domain Aware Network

66

EAI-Stereo: Error Aware Iterative Network for Stereo Matching

68

Boosting Dense Long-Tailed Object Detection from Data-Centric View

77

Class Concentration with Twin Variational Autoencoders for Unsupervised Cross-modal Hashing

79

Towards Real-time High-Definition Image Snow Removal: Efficient Pyramid Network with Asymmetrical Encoder-decoder Architecture

88

Focal and Global Spatial-Temporal Transformer for Skeleton-based Action Recognition

90

Temporal-aware Siamese Tracker: Integrate Temporal Context for 3D Object Tracking

92

Neural Plenoptic Sampling: Learning Light-field from Thousands of Imaginary Eyes

97

3D-Yoga: A 3D Yoga Dataset for Visual-based Hierarchical Sports Action Analysis

98

Continuous Self-Study: Scene Graph Generation with Self-Knowledge Distillation and Spatial Augmentation

101

Uncertainty-Based Thin Cloud Removal Network via Conditional Variational Autoencoders

111

NEO-3DF: Novel Editing-Oriented 3D Face Creation and Reconstruction

115

Learning Internal Semantics with Expanded Categories for Generative Zero-Shot Learning

120

Group guided data association for multiple object tracking

125

GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework

126

LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing

128

Meta-Prototype Decoupled Training for Long-tailed Learning

135

Point Cloud Upsampling via Cascaded Refinement Network

136

Exposing Face Forgery Clues via Retinex-based Image Enhancement

137

Foreground-Specialized Model Imitation for Instance Segmentation

138

TriMix: A General Framework for Medical Image Segmentation from Limited Supervision

139

CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization

144

GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification

146

Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in CNN

147

Learning Video-independent Eye Contact Segmentation from In-the-Wild Videos

148

Efficient Hardware-aware Neural Architecture Search for Image Super-resolution on Mobile Devices

150

PEDTrans: A fine-grained visual classification model for self-attention patch enhancement and dropout

153

Causal-SETR: a SEgmentation TRansformer Variant based on Causal Intervention

157

Learning Using Privileged Information for Zero-Shot Action Recognition

160

SEIC: Semantic Embedding with Intermediate Classes for Zero-Shot Domain Generalization

161

Domain Generalized RPPG Network: Disentangled Feature Learning with Domain Permutation and Domain Augmentation

164

Gestalt-Guided Image Understanding for Few-Shot Learning

166

Fine-Grained Image Style Transfer with Visual Transformers

167

Diffusion Models for Counterfactual Explanations

169

PhyLoNet: Physically-Constrained Long Term Video Prediction

182

Blind Image Super-Resolution with Degradation-Aware Adaptation

186

Depth Estimation via Sparse Radar Prior and Driving Scene Semantics

189

CrossUFormer: A Cross Attention U-Shape Transformer for Low Light Image Enhancement

194

Dynamic Feature Aggregation for Efficient Video Object Detection

205

Soft Label Mining and Average Expression Anchoring for Facial Expression Recognition

206

Image Retrieval with Well-Separated Semantic Hash Centers

210

Multi-View Coupled Self-Attention Network for Pulmonary Nodules Classification

214

ADEL: Adaptive Distribution Effective-matching Method for Guiding Generators of GANs

221

Three-stage Training Pipeline with Patch Random Drop for Few-shot Object Detection

223

Vectorizing Building Blueprints

224

Complex Handwriting Trajectory Recovery: Evaluation Metrics and Algorithm

233

Occluded Facial Expression Recognition using Self-supervised Learning

246

Object Detection in Foggy Scenes by Embedding Depth and Reconstruction into Domain Adaptation

247

Feature Decoupled Knowledge Distillation via Spatial Pyramid Pooling

248

Unsupervised 3D Shape Representation Learning using Normalizing Flow

250

Cluster Contrast for Unsupervised Person Re-Identification

253

EPSANet: An Efficient Pyramid Squeeze Attention Block on Convolutional Neural Network

255

MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning

264

SST-VLM: Sparse Sampling-Twice Inspired Video-Language Model

268

Enhancing Fairness of Visual Attribute Predictors

269

Adaptive FSP: Adaptive Architecture Search with Filter Shape Pruning

273

Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation

275

Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-based Action Recognition

284

Learning Inter-Superpoint Affinity for Weakly Supervised 3D Instance Segmentation

288

Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer

292

3D Pose Based Feedback For Physical Exercises

295

Multi-Branch Network with Ensemble Learning for Text Removal in the Wild

296

Few-shot Adaptive Object Detection with Cross-Domain CutMix

297

Lightweight Alpha Matting Network Using Distillation-Based Channel Pruning

298

Cross-View Self-Fusion for Self-Supervised 3D Human Pose Estimation in the Wild

300

AutoEnhancer: Transformer on U-Net Architecture search for Underwater Image Enhancement

301

PromptLearner-CLIP: Contrastive Multi-Modal Action Representation Learning with Context Optimization

302

3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction

304

Heterogeneous Avatar Synthesis Based on Disentanglement of Topology and Rendering

307

A Joint Framework Towards Class-aware and Class-agnostic Alignment for Few-shot Segmentation

310

Flare Transformer: Solar Flare Prediction using Magnetograms and Sunspot Physical Features

311

Revisiting Unsupervised Domain Adaptation Models: a Smoothness Perspective

317

A Diffusion-ReFinement Model for Sketch-to-Point Modeling

320

TCVM: Temporal Contrasting Video Montage Framework for Self-supervised Video Representation Learning

323

Causal Property based Anti-Conflict Modeling with Hybrid Data Augmentation for Unbiased Scene Graph Generation

325

Multi-granularity Transformer for Image Super-resolution

326

MGTR: End-to-End Mutual Gaze Detection with Transformer

328

AONet: Attentional Occlusion-aware Network for Occluded Person Re-identification

329

FFD Augmentor: Towards Few-Shot Oracle Character Recognition from Scratch

330

Active Domain Adaptation with Multi-level Contrastive Units for Semantic Segmentation

333

Consistent Semantic Attacks on Optical Flow

336

DecisioNet - A Binary-Tree Structured Neural Network

339

Fully Transformer Network for Change Detection of Remote Sensing Images

342

Generating Multiple Hypotheses for 3D Human Mesh and Pose using Conditional Generative Adversarial Nets

344

SymmNeRF: Learning to Explore Symmetry Prior for Single-View View Synthesis

348

'Labelling the Gaps': A Weakly Supervised Automatic Eye Gaze Estimation

352

Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection

356

ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection

357

Physical Passive Patch Adversarial Attacks on Visual Odometry Systems

358

Staged Adaptive Blind Watermarking Scheme

363

Learnable Orthogonal Transformed Projection for Semi-supervised Image Classification

366

An RNN-Based Framework for the MILP Problem in Robustness Verification of Neural Networks

367

Multi-Scale Wavelet Transformer for Face Forgery Detection

368

Few-Shot Metric Learning: Online Adaptation of Embedding for Retrieval

376

Adaptive Range guided Multi-view Depth Estimation with Normal Ranking Loss

378

Slice-mask based 3D Cardiac Shape Reconstruction from CT volume

379

Self-Supervised Augmented Patches Segmentation for Anomaly Detection

381

TeCM-CLIP: Text-based Controllable Multi-attribute Face Image Manipulation

382

gScoreCAM: What objects is CLIP looking at?

384

Is an Object-Centric Video Representation Beneficial for Transfer?

385

Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data

386

Training-free NAS for 3D Point Cloud Processing

389

Gated cross word-visual attention-driven generative adversarial networks for text-to-image synthesis

393

Structure Representation Network and Uncertainty Feedback Learning for Dense Non-Uniform Fog Removal

395

Rethinking Low-level Features for Interest Point Detection and Description

396

DILane: Dynamic Instance-Aware Network for Lane Detection

398

BOREx: Bayesian-Optimization-Based Refinement of Saliency Map for Image- and Video-Classification Models

400

CIRL: A Category-Instance Representation Learning Framework for Tropical Cyclone Intensity Estimation

406

Explaining Deep Neural Networks for Point Clouds using Gradient-based Visualisations

417

DualBLN: Dual Branch LUT-aware Network for Real-time Image Retouching

419

SCOAD: Signal-frame Click Supervision for Online Action Detection

420

AFF-CAM: Adaptive Frequency Filtering based Channel Attention Module

422

CSIE: Coded Strip-patterns Image Enhancement Embedded in Structured Light-based Methods

423

What Role Does Data Augmentation Play in Knowledge Distillation?

424

Inverting Adversarially Robust Networks for Image Synthesis

426

Multispectral-Based Imaging and Machine Learning for Noninvasive Blood Loss Estimation

440

Improving the Quality of Sparse-view Cone-Beam Computed Tomography via Reconstruction-Friendly Interpolation Network

441

Co-Attention Aligned Mutual Cross-Attention for Cloth-Changing Person Re-Identification

442

Rethinking Online Knowledge Distillation with Multi-Exits

447

Style Image Harmonization via Global-Local Style Mutual Guided

448

Image Denoising using Convolutional Sparse Coding Network with Dry Friction

454

Layout-guided Indoor Panorama Inpainting with Plane-aware Normalization

459

End-to-end Surface Reconstruction For Touching Trajectories

461

3D Shape Temporal Aggregation for Video-Based Clothing-Change Person Re-identification

463

Robustizing Object Detection Networks Using Augmented Feature Pooling

466

A General Divergence Modeling Strategy for Salient Object Detection

469

Re-parameterization Making GC-Net-style 3DConvNets More Efficient

471

AirBirds: A Large-scale Challenging Dataset for Bird Strike Prevention in Real-world Airports

476

Teacher-Guided Learning for Blind Image Quality Assessment

478

PU-Transformer: Point Cloud Upsampling Transformer

482

Multi-scale Residual Interaction for RGB-D Salient Object Detection

484

Unreliability-aware Disentangling for Cross-Domain Semi-supervised Pedestrian Detection

488

Patch Embedding as Local Features: Unifying Deep Local and Global Features Via Vision Transformer for Image Retrieval

489

MSAF-Net: Multi-modal Semantic Adaptive Fusion network for 3D object detection

491

DCVQE: A Hierarchical Transformer for Video Quality Assessment

498

ElDet: An Anchor-free General Ellipse Object Detector

499

BorderNet: An Efficient Border-Attention Text Detector

508

Not End-to-End: Explore Multi-Stage Architecture for Online Surgical Phase Recognition

513

Reading Arbitrary-Shaped Scene Text from Images Through Spline Regression and Rectification

514

Truly Unsupervised Image-to-Image Translation with Contrastive Representation Learning

518

Feature Selective Transformer for Semantic Image Segmentation

522

QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation

524

Improving Surveillance Object Detection with Adaptive Omni-Attention over both Inter-Frame and Intra-Frame Context

526

Semi-supervised Breast Lesion Segmentation using Local Cross Triplet Loss for Ultrafast Dynamic Contrast-Enhanced MRI

530

Video Object Segmentation via Structural Feature Reconfiguration

535

MatchFormer: Interleaving Attention in Transformers for Feature Matching

537

SG-Net: Semantic Guided Network for Image Dehazing

538

DIG: Draping Implicit Garment over the Human Body

543

Synchronous Bi-Directional Pedestrian Trajectory Prediction with Error Compensation

544

DENet: Detection-driven Enhancement Network for Object Detection under Adverse Weather Conditions

545

Neural Puppeteer: Keypoint-Based Neural Rendering of Dynamic Shapes

548

Decanus to Legatus: Synthetic training for 2D-3D human pose lifting

556

Coil-Agnostic Attention-Based Network for Parallel MRI Reconstruction

557

Lightweight Image Matting via Efficient Non-Local Guidance

559

IoU-Enhanced Attention for End-to-End Task Specific Object Detection

560

CSS-Net: Classification and Substitution for Segmentation of Rotator Cuff Tear

567

Pyramidal Signed Distance Learning for Spatio-Temporal Human Shape Completion

569

MSF²DB: Multi Scale Feature Fusion Dehazing Network with Dense connection

576

Rove-Tree-8: The not-so-Wild Rover, A hierarchically structured image dataset for deep metric learning research

577

Robust Human Matting via Segmentation Guidance

580

Layered-Garment Net: Generating Multiple Implicit Garment Layers from a Single Image

584

Self-Supervised Dehazing Network Using Physical Priors

588

SWPT: Spherical Window-based Point Cloud Transformer

591

RGB Road Scene Material Segmentation

592

Self-Distilled Vision Transformer for Domain Generalization

594

Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis

595

CMT-Co: Contrastive Learning with Character Movement Task for Handwritten Text Recognition

597

Exemplar Free Class Agnostic Counting

599

Looking from a Higher-level Perspective: Attention and Recognition Enhanced Multi-scale Scene Text Segmentation

603

LHDR: HDR Reconstruct for Legacy Content using a Lightweight DNN

604

RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

626

Conditional GAN for Point Cloud Generation

628

Progressive Attentional Manifold Alignment for Arbitrary Style Transfer

629

From Within to Between: Knowledge Distillation for Cross Modality Retrieval

633

DreamNet: A Deep Riemannian Manifold Network for SPD Matrix Learning

636

Domain Generalized Person Re-identification by Locating and Eliminating Domain-Sensitive Features

639

ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning

649

PointFormer: A Dual Perception Attention-based Network for Point Cloud Classification

651

FunnyNet: Audiovisual Learning of Funny Moments in Videos

652

UTB180: A High-quality Benchmark for Underwater Tracking

653

Affinity-Aware Relation Network for Oriented Object Detection in Aerial Images

654

HAZE-Net: High-Frequency Attentive Super-Resolved Gaze Estimation in Low-Resolution Face Images

655

LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation

656

Cross-Architecture Knowledge Distillation

660

Cross-Domain Local Characteristic Enhanced Deepfake Video Detection

665

Emphasizing Closeness and Diversity Simultaneously for Deep Face Representation

671

Neural Residual Flow Fields for Efficient Video Representations

673

CV4Code: Sourcecode Understanding via Visual Code Representations

676

ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval

677

A simple strategy to make neural networks provably invariant

681

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward

682

Visual Explanation Generation Based on Lambda Attention Branch Networks

686

SCFNet: A Spatial-Channel Features Network based on Heterocentric Sample Loss for Visible-Infrared Person Re-Identification

690

NoiseTransfer: Image Noise Generation with Contrastive Embeddings

694

The Eyecandies Dataset for Unsupervised Multimodal Anomaly Detection and Localization

695

PathTR: Context-Aware Memory Transformer for Tumor Localization in Gigapixel Pathology Images

699

Scale Adaptive Fusion Network for RGB-D Salient Object Detection

700

Decoupling identity and visual quality for image and video anonymization

704

Thinking Hallucination for Video Captioning

706

Three-Stage Bidirectional Interaction Network for Efficient RGB-D Salient Object Detection

712

MVFI-Net: Motion-aware Video Frame Interpolation Network

714

Region-of-interest Attentive Heteromodal Variational Encoder-Decoder for Segmentation with Missing Modalities

739

CCLSL: Combination of Contrastive Learning and Supervised Learning for Handwritten Mathematical Expression Recognition

744

PatchFlow: A Two-Stage Patch-Based Approach for Lightweight Optical Flow Estimation

745

Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis

746

Two Video Data Sets for Tracking and Retrieval of Out of Distribution Objects

747

Learning Texture Enhancement Prior with Deep Unfolding Network for Snapshot Compressive Imaging

751

Exp-GAN: 3D-Aware Facial Image Generation with Expression Control

753

PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search

762

Light Attenuation and Color Fluctuation for Underwater Image Restoration

765

Weighted Contrastive Hashing

767

Neural Network Panning: Screening the Optimal Sparse Network Before Training

768

COLLIDER: A Robust Training Framework for Backdoor Data

769

Unsupervised Online Hashing with Multi-Bit Quantization

774

Action Representing by Constrained Conditional Mutual Information

784

Boundary-aware Temporal Sentence Grounding with Adaptive Proposal Refinement

791

Unified Learning of Multipurpose Energy Based Generative Hashing Network via Dual-Buffer MCMC Teaching for Supervised Image Hashing

793

DHG-GAN: Diverse Image Outpainting via Decoupled High Frequency Semantics

796

Learning and Transforming General Representations to Break Down Stability-Plasticity Dilemma

802

A Compressive Prior Guided Mask Predictive Coding Approach for Video Analysis

804

BaSSL: Boundary-aware Self-Supervised Learning for Video Scene Segmentation

806

Network Pruning via Feature Shift Minimization

808

PPR-Net: Patch-based multi-scale pyramid registration network for defect detection of printed labels

814

A^2: Adaptive Augmentation for Effectively Mitigating Dataset Bias

818

Exploring Adversarially Robust Training for Unsupervised Domain Adaptation

824

RDRN: Recursively Defined Residual Network for Image Super-Resolution

826

MUSH: Multi-Scale Hierarchical Feature Extraction for Semantic Image Synthesis

832

Two-stage Multimodality Fusion for High-performance Text-based Visual Question Answering

848

Shape Prior is Not All You Need: Discovering Balance between Texture and Shape bias in CNN

860

Temporal-Viewpoint Transportation Plan for Skeletal Few-shot Action Recognition

862

A Prototype-Oriented Contrastive Adaption Network For Cross-domain Facial Expression Recognition

865

Content-Aware Hierarchical Representation Selection for Cross-View Geo-Localization

867

Learning to Predict Decomposed Dynamic Filters for Single Image Motion Deblurring

880

HaViT: Hybrid-attention based Vision Transformer for Video Classification

882

Learning Common and Specific Visual Prompts for Domain Generalization

884

Training Dynamics Aware Neural Network Optimization with Stabilization

891

Filter Pruning via Automatic Pruning Rate Search

894

Spotlights: Probing Shapes from Spherical Viewpoints

895

SAC-GAN: Face Image Inpainting with Spatial-aware Attribute Controllable GAN

897

Vision Transformer Compression and Architecture Exploration with Efficient Embedding Space Search

914

DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving

916

PBCStereo: A Compressed Stereo Network with Pure Binary Convolutional Operations

918

A Lightweight Local-Global Attention Network for Single Image Super-Resolution

926

MGRLN-Net: Mask-guided Residual Learning Network for Joint Single-Image Shadow Detection and Removal

929

Social Aware Multi-Modal Pedestrian Crossing Behavior Prediction

935

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition

938

Structure Guided Proposal Completion for 3D Object Detection

942

KinStyle: A Strong Baseline Photorealistic Kinship Face Synthesis with An Optimized StyleGAN Encoder

946

Comparing Complexities of Decision Boundaries for Robust Training: A Universal Approach

953

Tracking Small and Fast Moving Objects: A Benchmark

956

Deep Active Ensemble Sampling For Image Classification

957

Super-attention for exemplar-based image colorization

962

Multi-stream Fusion for Class Incremental Learning in Pill Image Classification

963

Compressed Vision for Efficient Video Understanding

980

Multi-modal Characteristic Guided Depth Completion Network

994

Bright as the Sun: In-depth Analysis of Imagination-driven Image Captioning

1000

Semi-Supervised Semantic Segmentation with Uncertainty-guided Self Cross Supervision

1011

A Cylindrical Convolution Network for Dense Top-View Semantic Segmentation with LiDAR Point Clouds

1016

Heterogeneous Interactive Learning Network for Unsupervised Cross-modal Retrieval

1041

Decision-Based Black-box Attack Specific to Large-Size Images

1044

Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks

1052

Energy-Efficient Image Processing Using Binary Neural Networks with Hadamard Transforms