MAIN CONFERENCE PAPER LIST

Paper ID	Paper Title
4	Skin tone Diagnosis in the Wild: Towards More Robust and Inclusive User Experience Using Oriented Aleatoric Uncertainty
6	Spatial Temporal Network for Image and Skeleton Based Group Activity Recognition
9	Calibrated Face Image Forgery Detection with Contrastive Representation Distillation
22	Causes of Catastrophic Forgetting in Class-Incremental Semantic Segmentation
27	RepF-Net: Distortion-aware Re-projection Fusion Network for Object Detection in Panorama Image
28	Spatio-channel Attention Blocks for Cross-modal Crowd Counting
29	Revisiting Image Pyramid Structure for High Resolution Salient Object Detection
31	CLUE: Consolidating Learned and Undergoing Experience in Domain-Incremental Classification
35	D^3: Duplicate Detection Decontaminator for Multiple Object Tracking on Sports Video
38	DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation
39	RA Loss: Relation-Aware Loss for Robust Person Re-identification
40	From Sparse to Dense: Semantic Graph Evolutionary Hashing for Unsupervised Cross-Modal Retrieval
50	A Differentiable Distance Approximation for Fairer Image Classification
52	HiCo: Hierarchical Contrastive Learning for Ultrasound Video Model Pretraining
56	Class Specialized Knowledge Distillation
60	Modular Degradation Simulation and Restoration for Under-Display Camera
61	APAUNet: Axis Projection Attention UNet for Small Target in 3D Medical Segmentation
65	UHD Underwater Image Enhancement via Frequency-Spatial Domain Aware Network
66	EAI-Stereo: Error Aware Iterative Network for Stereo Matching
68	Boosting Dense Long-Tailed Object Detection from Data-Centric View
77	Class Concentration with Twin Variational Autoencoders for Unsupervised Cross-modal Hashing
79	Towards Real-time High-Definition Image Snow Removal: Efficient Pyramid Network with Asymmetrical Encoder-decoder Architecture
88	Focal and Global Spatial-Temporal Transformer for Skeleton-based Action Recognition
90	Temporal-aware Siamese Tracker: Integrate Temporal Context for 3D Object Tracking
92	Neural Plenoptic Sampling: Learning Light-field from Thousands of Imaginary Eyes
97	3D-Yoga: A 3D Yoga Dataset for Visual-based Hierarchical Sports Action Analysis
98	Continuous Self-Study: Scene Graph Generation with Self-Knowledge Distillation and Spatial Augmentation
101	Uncertainty-Based Thin Cloud Removal Network via Conditional Variational Autoencoders
111	NEO-3DF: Novel Editing-Oriented 3D Face Creation and Reconstruction
115	Learning Internal Semantics with Expanded Categories for Generative Zero-Shot Learning
120	Group guided data association for multiple object tracking
125	GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework
126	LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing
128	Meta-Prototype Decoupled Training for Long-tailed Learning
135	Point Cloud Upsampling via Cascaded Refinement Network
136	Exposing Face Forgery Clues via Retinex-based Image Enhancement
137	Foreground-Specialized Model Imitation for Instance Segmentation
138	TriMix: A General Framework for Medical Image Segmentation from Limited Supervision
139	CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization
144	GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification
146	Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in CNN
147	Learning Video-independent Eye Contact Segmentation from In-the-Wild Videos
148	Efficient Hardware-aware Neural Architecture Search for Image Super-resolution on Mobile Devices
150	PEDTrans: A fine-grained visual classification model for self-attention patch enhancement and dropout
153	Causal-SETR: a SEgmentation TRansformer Variant based on Causal Intervention
157	Learning Using Privileged Information for Zero-Shot Action Recognition
160	SEIC: Semantic Embedding with Intermediate Classes for Zero-Shot Domain Generalization
161	Domain Generalized RPPG Network: Disentangled Feature Learning with Domain Permutation and Domain Augmentation
164	Gestalt-Guided Image Understanding for Few-Shot Learning
166	Fine-Grained Image Style Transfer with Visual Transformers
167	Diffusion Models for Counterfactual Explanations
169	PhyLoNet: Physically-Constrained Long Term Video Prediction
182	Blind Image Super-Resolution with Degradation-Aware Adaptation
186	Depth Estimation via Sparse Radar Prior and Driving Scene Semantics
189	CrossUFormer: A Cross Attention U-Shape Transformer for Low Light Image Enhancement
194	Dynamic Feature Aggregation for Efficient Video Object Detection
205	Soft Label Mining and Average Expression Anchoring for Facial Expression Recognition
206	Image Retrieval with Well-Separated Semantic Hash Centers
210	Multi-View Coupled Self-Attention Network for Pulmonary Nodules Classification
214	ADEL: Adaptive Distribution Effective-matching Method for Guiding Generators of GANs
221	Three-stage Training Pipeline with Patch Random Drop for Few-shot Object Detection
223	Vectorizing Building Blueprints
224	Complex Handwriting Trajectory Recovery: Evaluation Metrics and Algorithm
233	Occluded Facial Expression Recognition using Self-supervised Learning
246	Object Detection in Foggy Scenes by Embedding Depth and Reconstruction into Domain Adaptation
247	Feature Decoupled Knowledge Distillation via Spatial Pyramid Pooling
248	Unsupervised 3D Shape Representation Learning using Normalizing Flow
250	Cluster Contrast for Unsupervised Person Re-Identification
253	EPSANet: An Efficient Pyramid Squeeze Attention Block on Convolutional Neural Network
255	MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning
264	SST-VLM: Sparse Sampling-Twice Inspired Video-Language Model
268	Enhancing Fairness of Visual Attribute Predictors
269	Adaptive FSP: Adaptive Architecture Search with Filter Shape Pruning
273	Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation
275	Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-based Action Recognition
284	Learning Inter-Superpoint Affinity for Weakly Supervised 3D Instance Segmentation
288	Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer
292	3D Pose Based Feedback For Physical Exercises
295	Multi-Branch Network with Ensemble Learning for Text Removal in the Wild
296	Few-shot Adaptive Object Detection with Cross-Domain CutMix
297	Lightweight Alpha Matting Network Using Distillation-Based Channel Pruning
298	Cross-View Self-Fusion for Self-Supervised 3D Human Pose Estimation in the Wild
300	AutoEnhancer: Transformer on U-Net Architecture search for Underwater Image Enhancement
301	PromptLearner-CLIP: Contrastive Multi-Modal Action Representation Learning with Context Optimization
302	3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
304	Heterogeneous Avatar Synthesis Based on Disentanglement of Topology and Rendering
307	A Joint Framework Towards Class-aware and Class-agnostic Alignment for Few-shot Segmentation
310	Flare Transformer: Solar Flare Prediction using Magnetograms and Sunspot Physical Features
311	Revisiting Unsupervised Domain Adaptation Models: a Smoothness Perspective
317	A Diffusion-ReFinement Model for Sketch-to-Point Modeling
320	TCVM: Temporal Contrasting Video Montage Framework for Self-supervised Video Representation Learning
323	Causal Property based Anti-Conflict Modeling with Hybrid Data Augmentation for Unbiased Scene Graph Generation
325	Multi-granularity Transformer for Image Super-resolution
326	MGTR: End-to-End Mutual Gaze Detection with Transformer
328	AONet: Attentional Occlusion-aware Network for Occluded Person Re-identification
329	FFD Augmentor: Towards Few-Shot Oracle Character Recognition from Scratch
330	Active Domain Adaptation with Multi-level Contrastive Units for Semantic Segmentation
333	Consistent Semantic Attacks on Optical Flow
336	DecisioNet - A Binary-Tree Structured Neural Network
339	Fully Transformer Network for Change Detection of Remote Sensing Images
342	Generating Multiple Hypotheses for 3D Human Mesh and Pose using Conditional Generative Adversarial Nets
344	SymmNeRF: Learning to Explore Symmetry Prior for Single-View View Synthesis
348	'Labelling the Gaps': A Weakly Supervised Automatic Eye Gaze Estimation
352	Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection
356	ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection
357	Physical Passive Patch Adversarial Attacks on Visual Odometry Systems
358	Staged Adaptive Blind Watermarking Scheme
363	Learnable Orthogonal Transformed Projection for Semi-supervised Image Classification
366	An RNN-Based Framework for the MILP Problem in Robustness Verification of Neural Networks
367	Multi-Scale Wavelet Transformer for Face Forgery Detection
368	Few-Shot Metric Learning: Online Adaptation of Embedding for Retrieval
376	Adaptive Range guided Multi-view Depth Estimation with Normal Ranking Loss
378	Slice-mask based 3D Cardiac Shape Reconstruction from CT volume
379	Self-Supervised Augmented Patches Segmentation for Anomaly Detection
381	TeCM-CLIP: Text-based Controllable Multi-attribute Face Image Manipulation
382	gScoreCAM: What objects is CLIP looking at?
384	Is an Object-Centric Video Representation Beneficial for Transfer?
385	Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data
386	Training-free NAS for 3D Point Cloud Processing
389	Gated cross word-visual attention-driven generative adversarial networks for text-to-image synthesis
393	Structure Representation Network and Uncertainty Feedback Learning for Dense Non-Uniform Fog Removal
395	Rethinking Low-level Features for Interest Point Detection and Description
396	DILane: Dynamic Instance-Aware Network for Lane Detection
398	BOREx: Bayesian-Optimization-Based Refinement of Saliency Map for Image- and Video-Classification Models
400	CIRL: A Category-Instance Representation Learning Framework for Tropical Cyclone Intensity Estimation
406	Explaining Deep Neural Networks for Point Clouds using Gradient-based Visualisations
417	DualBLN: Dual Branch LUT-aware Network for Real-time Image Retouching
419	SCOAD: Signal-frame Click Supervision for Online Action Detection
420	AFF-CAM: Adaptive Frequency Filtering based Channel Attention Module
422	CSIE: Coded Strip-patterns Image Enhancement Embedded in Structured Light-based Methods
423	What Role Does Data Augmentation Play in Knowledge Distillation?
424	Inverting Adversarially Robust Networks for Image Synthesis
426	Multispectral-Based Imaging and Machine Learning for Noninvasive Blood Loss Estimation
440	Improving the Quality of Sparse-view Cone-Beam Computed Tomography via Reconstruction-Friendly Interpolation Network
441	Co-Attention Aligned Mutual Cross-Attention for Cloth-Changing Person Re-Identification
442	Rethinking Online Knowledge Distillation with Multi-Exits
447	Style Image Harmonization via Global-Local Style Mutual Guided
448	Image Denoising using Convolutional Sparse Coding Network with Dry Friction
454	Layout-guided Indoor Panorama Inpainting with Plane-aware Normalization
459	End-to-end Surface Reconstruction For Touching Trajectories
461	3D Shape Temporal Aggregation for Video-Based Clothing-Change Person Re-identification
463	Robustizing Object Detection Networks Using Augmented Feature Pooling
466	A General Divergence Modeling Strategy for Salient Object Detection
469	Re-parameterization Making GC-Net-style 3DConvNets More Efficient
471	AirBirds: A Large-scale Challenging Dataset for Bird Strike Prevention in Real-world Airports
476	Teacher-Guided Learning for Blind Image Quality Assessment
478	PU-Transformer: Point Cloud Upsampling Transformer
482	Multi-scale Residual Interaction for RGB-D Salient Object Detection
484	Unreliability-aware Disentangling for Cross-Domain Semi-supervised Pedestrian Detection
488	Patch Embedding as Local Features: Unifying Deep Local and Global Features Via Vision Transformer for Image Retrieval
489	MSAF-Net: Multi-modal Semantic Adaptive Fusion network for 3D object detection
491	DCVQE: A Hierarchical Transformer for Video Quality Assessment
498	ElDet: An Anchor-free General Ellipse Object Detector
499	BorderNet: An Efficient Border-Attention Text Detector
508	Not End-to-End: Explore Multi-Stage Architecture for Online Surgical Phase Recognition
513	Reading Arbitrary-Shaped Scene Text from Images Through Spline Regression and Rectification
514	Truly Unsupervised Image-to-Image Translation with Contrastive Representation Learning
518	Feature Selective Transformer for Semantic Image Segmentation
522	QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation
524	Improving Surveillance Object Detection with Adaptive Omni-Attention over both Inter-Frame and Intra-Frame Context
526	Semi-supervised Breast Lesion Segmentation using Local Cross Triplet Loss for Ultrafast Dynamic Contrast-Enhanced MRI
530	Video Object Segmentation via Structural Feature Reconfiguration
535	MatchFormer: Interleaving Attention in Transformers for Feature Matching
537	SG-Net: Semantic Guided Network for Image Dehazing
538	DIG: Draping Implicit Garment over the Human Body
543	Synchronous Bi-Directional Pedestrian Trajectory Prediction with Error Compensation
544	DENet: Detection-driven Enhancement Network for Object Detection under Adverse Weather Conditions
545	Neural Puppeteer: Keypoint-Based Neural Rendering of Dynamic Shapes
548	Decanus to Legatus: Synthetic training for 2D-3D human pose lifting
556	Coil-Agnostic Attention-Based Network for Parallel MRI Reconstruction
557	Lightweight Image Matting via Efficient Non-Local Guidance
559	IoU-Enhanced Attention for End-to-End Task Specific Object Detection
560	CSS-Net: Classification and Substitution for Segmentation of Rotator Cuff Tear
567	Pyramidal Signed Distance Learning for Spatio-Temporal Human Shape Completion
569	MSF^2DB: Multi Scale Feature Fusion Dehazing Network with Dense connection
576	Rove-Tree-8: The not-so-Wild Rover, A hierarchically structured image dataset for deep metric learning research
577	Robust Human Matting via Segmentation Guidance
580	Layered-Garment Net: Generating Multiple Implicit Garment Layers from a Single Image
584	Self-Supervised Dehazing Network Using Physical Priors
588	SWPT: Spherical Window-based Point Cloud Transformer
591	RGB Road Scene Material Segmentation
592	Self-Distilled Vision Transformer for Domain Generalization
594	Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis
595	CMT-Co: Contrastive Learning with Character Movement Task for Handwritten Text Recognition
597	Exemplar Free Class Agnostic Counting
599	Looking from a Higher-level Perspective: Attention and Recognition Enhanced Multi-scale Scene Text Segmentation
603	LHDR: HDR Reconstruction for Legacy Content using a Lightweight DNN
604	RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
626	Conditional GAN for Point Cloud Generation
628	Progressive Attentional Manifold Alignment for Arbitrary Style Transfer
629	From Within to Between: Knowledge Distillation for Cross Modality Retrieval
633	DreamNet: A Deep Riemannian Manifold Network for SPD Matrix Learning
636	Domain Generalized Person Re-identification by Locating and Eliminating Domain-Sensitive Features
639	ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning
649	PointFormer: A Dual Perception Attention-based Network for Point Cloud Classification
651	FunnyNet: Audiovisual Learning of Funny Moments in Videos
652	UTB180: A High-quality Benchmark for Underwater Tracking
653	Affinity-Aware Relation Network for Oriented Object Detection in Aerial Images
654	HAZE-Net: High-Frequency Attentive Super-Resolved Gaze Estimation in Low-Resolution Face Images
655	LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation
656	Cross-Architecture Knowledge Distillation
660	Cross-Domain Local Characteristic Enhanced Deepfake Video Detection
665	Emphasizing Closeness and Diversity Simultaneously for Deep Face Representation
671	Neural Residual Flow Fields for Efficient Video Representations
673	CV4Code: Sourcecode Understanding via Visual Code Representations
676	ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval
677	A simple strategy to make neural networks provably invariant
681	Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward
682	Visual Explanation Generation Based on Lambda Attention Branch Networks
686	SCFNet: A Spatial-Channel Features Network based on Heterocentric Sample Loss for Visible-Infrared Person Re-Identification
690	NoiseTransfer: Image Noise Generation with Contrastive Embeddings
694	The Eyecandies Dataset for Unsupervised Multimodal Anomaly Detection and Localization
695	PathTR: Context-Aware Memory Transformer for Tumor Localization in Gigapixel Pathology Images
699	Scale Adaptive Fusion Network for RGB-D Salient Object Detection
700	Decoupling identity and visual quality for image and video anonymization
704	Thinking Hallucination for Video Captioning
706	Three-Stage Bidirectional Interaction Network for Efficient RGB-D Salient Object Detection
712	MVFI-Net: Motion-aware Video Frame Interpolation Network
714	Region-of-interest Attentive Heteromodal Variational Encoder-Decoder for Segmentation with Missing Modalities
739	CCLSL: Combination of Contrastive Learning and Supervised Learning for Handwritten Mathematical Expression Recognition
744	PatchFlow: A Two-Stage Patch-Based Approach for Lightweight Optical Flow Estimation
745	Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis
746	Two Video Data Sets for Tracking and Retrieval of Out of Distribution Objects
747	Learning Texture Enhancement Prior with Deep Unfolding Network for Snapshot Compressive Imaging
751	Exp-GAN: 3D-Aware Facial Image Generation with Expression Control
753	PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search
762	Light Attenuation and Color Fluctuation for Underwater Image Restoration
765	Weighted Contrastive Hashing
767	Neural Network Panning: Screening the Optimal Sparse Network Before Training
768	COLLIDER: A Robust Training Framework for Backdoor Data
769	Unsupervised Online Hashing with Multi-Bit Quantization
774	Action Representing by Constrained Conditional Mutual Information
784	Boundary-aware Temporal Sentence Grounding with Adaptive Proposal Refinement
791	Unified Learning of Multipurpose Energy Based Generative Hashing Network via Dual-Buffer MCMC Teaching for Supervised Image Hashing
793	DHG-GAN: Diverse Image Outpainting via Decoupled High Frequency Semantics
796	Learning and Transforming General Representations to Break Down Stability-Plasticity Dilemma
802	A Compressive Prior Guided Mask Predictive Coding Approach for Video Analysis
804	BaSSL: Boundary-aware Self-Supervised Learning for Video Scene Segmentation
806	Network Pruning via Feature Shift Minimization
808	PPR-Net: Patch-based multi-scale pyramid registration network for defect detection of printed labels
814	A^2: Adaptive Augmentation for Effectively Mitigating Dataset Bias
818	Exploring Adversarially Robust Training for Unsupervised Domain Adaptation
824	RDRN: Recursively Defined Residual Network for Image Super-Resolution
826	MUSH: Multi-Scale Hierarchical Feature Extraction for Semantic Image Synthesis
832	Two-stage Multimodality Fusion for High-performance Text-based Visual Question Answering
848	Shape Prior is Not All You Need: Discovering Balance between Texture and Shape bias in CNN
860	Temporal-Viewpoint Transportation Plan for Skeletal Few-shot Action Recognition
862	A Prototype-Oriented Contrastive Adaption Network For Cross-domain Facial Expression Recognition
865	Content-Aware Hierarchical Representation Selection for Cross-View Geo-Localization
867	Learning to Predict Decomposed Dynamic Filters for Single Image Motion Deblurring
880	HaViT: Hybrid-attention based Vision Transformer for Video Classification
882	Learning Common and Specific Visual Prompts for Domain Generalization
884	Training Dynamics Aware Neural Network Optimization with Stabilization
891	Filter Pruning via Automatic Pruning Rate Search
894	Spotlights: Probing Shapes from Spherical Viewpoints
895	SAC-GAN: Face Image Inpainting with Spatial-aware Attribute Controllable GAN
897	Vision Transformer Compression and Architecture Exploration with Efficient Embedding Space Search
914	DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving
916	PBCStereo: A Compressed Stereo Network with Pure Binary Convolutional Operations
918	A Lightweight Local-Global Attention Network for Single Image Super-Resolution
926	MGRLN-Net: Mask-guided Residual Learning Network for Joint Single-Image Shadow Detection and Removal
929	Social Aware Multi-Modal Pedestrian Crossing Behavior Prediction
935	OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition
938	Structure Guided Proposal Completion for 3D Object Detection
942	KinStyle: A Strong Baseline Photorealistic Kinship Face Synthesis with An Optimized StyleGAN Encoder
946	Comparing Complexities of Decision Boundaries for Robust Training: A Universal Approach
953	Tracking Small and Fast Moving Objects: A Benchmark
956	Deep Active Ensemble Sampling For Image Classification
957	Super-attention for exemplar-based image colorization
962	Multi-stream Fusion for Class Incremental Learning in Pill Image Classification
963	Compressed Vision for Efficient Video Understanding
980	Multi-modal Characteristic Guided Depth Completion Network
994	Bright as the Sun: In-depth Analysis of Imagination-driven Image Captioning
1000	Semi-Supervised Semantic Segmentation with Uncertainty-guided Self Cross Supervision
1011	A Cylindrical Convolution Network for Dense Top-View Semantic Segmentation with LiDAR Point Clouds
1016	Heterogeneous Interactive Learning Network for Unsupervised Cross-modal Retrieval
1041	Decision-Based Black-box Attack Specific to Large-Size Images
1044	Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks
1052	Energy-Efficient Image Processing Using Binary Neural Networks with Hadamard Transforms