| Paper ID | Paper Title |
| ------- | ------------- |
| 4 | Skin tone Diagnosis in the Wild: Towards More Robust and Inclusive User Experience Using Oriented Aleatoric Uncertainty |
| 6 | Spatial Temporal Network for Image and Skeleton Based Group Activity Recognition |
| 9 | Calibrated Face Image Forgery Detection with Contrastive Representation Distillation |
| 22 | Causes of Catastrophic Forgetting in Class-Incremental Semantic Segmentation |
| 27 | RepF-Net: Distortion-aware Re-projection Fusion Network for Object Detection in Panorama Image |
| 28 | Spatio-channel Attention Blocks for Cross-modal Crowd Counting |
| 29 | Revisiting Image Pyramid Structure for High Resolution Salient Object Detection |
| 31 | CLUE: Consolidating Learned and Undergoing Experience in Domain-Incremental Classification |
| 35 | D^3: Duplicate Detection Decontaminator for Multiple Object Tracking on Sports Video |
| 38 | DAC-GAN: Dual Auxiliary Consistency Generative Adversarial Network for Text-to-Image Generation |
| 39 | RA Loss: Relation-Aware Loss for Robust Person Re-identification |
| 40 | From Sparse to Dense: Semantic Graph Evolutionary Hashing for Unsupervised Cross-Modal Retrieval |
| 50 | A Differentiable Distance Approximation for Fairer Image Classification |
| 52 | HiCo: Hierarchical Contrastive Learning for Ultrasound Video Model Pretraining |
| 56 | Class Specialized Knowledge Distillation |
| 60 | Modular Degradation Simulation and Restoration for Under-Display Camera |
| 61 | APAUNet: Axis Projection Attention UNet for Small Target in 3D Medical Segmentation |
| 65 | UHD Underwater Image Enhancement via Frequency-Spatial Domain Aware Network |
| 66 | EAI-Stereo: Error Aware Iterative Network for Stereo Matching |
| 68 | Boosting Dense Long-Tailed Object Detection from Data-Centric View |
| 77 | Class Concentration with Twin Variational Autoencoders for Unsupervised Cross-modal Hashing |
| 79 | Towards Real-time High-Definition Image Snow Removal: Efficient Pyramid Network with Asymmetrical Encoder-decoder Architecture |
| 88 | Focal and Global Spatial-Temporal Transformer for Skeleton-based Action Recognition |
| 90 | Temporal-aware Siamese Tracker: Integrate Temporal Context for 3D Object Tracking |
| 92 | Neural Plenoptic Sampling: Learning Light-field from Thousands of Imaginary Eyes |
| 97 | 3D-Yoga: A 3D Yoga Dataset for Visual-based Hierarchical Sports Action Analysis |
| 98 | Continuous Self-Study: Scene Graph Generation with Self-Knowledge Distillation and Spatial Augmentation |
| 101 | Uncertainty-Based Thin Cloud Removal Network via Conditional Variational Autoencoders |
| 111 | NEO-3DF: Novel Editing-Oriented 3D Face Creation and Reconstruction |
| 115 | Learning Internal Semantics with Expanded Categories for Generative Zero-Shot Learning |
| 120 | Group guided data association for multiple object tracking |
| 125 | GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework |
| 126 | LSMD-Net: LiDAR-Stereo Fusion with Mixture Density Network for Depth Sensing |
| 128 | Meta-Prototype Decoupled Training for Long-tailed Learning |
| 135 | Point Cloud Upsampling via Cascaded Refinement Network |
| 136 | Exposing Face Forgery Clues via Retinex-based Image Enhancement |
| 137 | Foreground-Specialized Model Imitation for Instance Segmentation |
| 138 | TriMix: A General Framework for Medical Image Segmentation from Limited Supervision |
| 139 | CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization |
| 144 | GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification |
| 146 | Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in CNN |
| 147 | Learning Video-independent Eye Contact Segmentation from In-the-Wild Videos |
| 148 | Efficient Hardware-aware Neural Architecture Search for Image Super-resolution on Mobile Devices |
| 150 | PEDTrans: A fine-grained visual classification model for self-attention patch enhancement and dropout |
| 153 | Causal-SETR: a SEgmentation TRansformer Variant based on Causal Intervention |
| 157 | Learning Using Privileged Information for Zero-Shot Action Recognition |
| 160 | SEIC: Semantic Embedding with Intermediate Classes for Zero-Shot Domain Generalization |
| 161 | Domain Generalized RPPG Network: Disentangled Feature Learning with Domain Permutation and Domain Augmentation |
| 164 | Gestalt-Guided Image Understanding for Few-Shot Learning |
| 166 | Fine-Grained Image Style Transfer with Visual Transformers |
| 167 | Diffusion Models for Counterfactual Explanations |
| 169 | PhyLoNet: Physically-Constrained Long Term Video Prediction |
| 182 | Blind Image Super-Resolution with Degradation-Aware Adaptation |
| 186 | Depth Estimation via Sparse Radar Prior and Driving Scene Semantics |
| 194 | Dynamic Feature Aggregation for Efficient Video Object Detection |
| 205 | Soft Label Mining and Average Expression Anchoring for Facial Expression Recognition |
| 206 | Image Retrieval with Well-Separated Semantic Hash Centers |
| 210 | Multi-View Coupled Self-Attention Network for Pulmonary Nodules Classification |
| 214 | ADEL: Adaptive Distribution Effective-matching Method for Guiding Generators of GANs |
| 221 | Three-stage Training Pipeline with Patch Random Drop for Few-shot Object Detection |
| 223 | Vectorizing Building Blueprints |
| 224 | Complex Handwriting Trajectory Recovery: Evaluation Metrics and Algorithm |
| 233 | Occluded Facial Expression Recognition using Self-supervised Learning |
| 246 | Object Detection in Foggy Scenes by Embedding Depth and Reconstruction into Domain Adaptation |
| 247 | Feature Decoupled Knowledge Distillation via Spatial Pyramid Pooling |
| 248 | Unsupervised 3D Shape Representation Learning using Normalizing Flow |
| 250 | Cluster Contrast for Unsupervised Person Re-Identification |
| 253 | EPSANet: An Efficient Pyramid Squeeze Attention Block on Convolutional Neural Network |
| 255 | MaxGNR: A Dynamic Weight Strategy via Maximizing Gradient-to-Noise Ratio for Multi-Task Learning |
| 264 | SST-VLM: Sparse Sampling-Twice Inspired Video-Language Model |
| 268 | Enhancing Fairness of Visual Attribute Predictors |
| 269 | Adaptive FSP: Adaptive Architecture Search with Filter Shape Pruning |
| 273 | Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation |
| 275 | Spatial-Temporal Adaptive Graph Convolutional Network for Skeleton-based Action Recognition |
| 284 | Learning Inter-Superpoint Affinity for Weakly Supervised 3D Instance Segmentation |
| 288 | Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer |
| 292 | 3D Pose Based Feedback For Physical Exercises |
| 295 | Multi-Branch Network with Ensemble Learning for Text Removal in the Wild |
| 296 | Few-shot Adaptive Object Detection with Cross-Domain CutMix |
| 297 | Lightweight Alpha Matting Network Using Distillation-Based Channel Pruning |
| 298 | Cross-View Self-Fusion for Self-Supervised 3D Human Pose Estimation in the Wild |
| 300 | AutoEnhancer: Transformer on U-Net Architecture search for Underwater Image Enhancement |
| 301 | PromptLearner-CLIP: Contrastive Multi-Modal Action Representation Learning with Context Optimization |
| 302 | 3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction |
| 304 | Heterogeneous Avatar Synthesis Based on Disentanglement of Topology and Rendering |
| 307 | A Joint Framework Towards Class-aware and Class-agnostic Alignment for Few-shot Segmentation |
| 310 | Flare Transformer: Solar Flare Prediction using Magnetograms and Sunspot Physical Features |
| 311 | Revisiting Unsupervised Domain Adaptation Models: a Smoothness Perspective |
| 317 | A Diffusion-ReFinement Model for Sketch-to-Point Modeling |
| 320 | TCVM: Temporal Contrasting Video Montage Framework for Self-supervised Video Representation Learning |
| 323 | Causal Property based Anti-Conflict Modeling with Hybrid Data Augmentation for Unbiased Scene Graph Generation |
| 325 | Multi-granularity Transformer for Image Super-resolution |
| 326 | MGTR: End-to-End Mutual Gaze Detection with Transformer |
| 328 | AONet: Attentional Occlusion-aware Network for Occluded Person Re-identification |
| 329 | FFD Augmentor: Towards Few-Shot Oracle Character Recognition from Scratch |
| 330 | Active Domain Adaptation with Multi-level Contrastive Units for Semantic Segmentation |
| 333 | Consistent Semantic Attacks on Optical Flow |
| 336 | DecisioNet - A Binary-Tree Structured Neural Network |
| 339 | Fully Transformer Network for Change Detection of Remote Sensing Images |
| 342 | Generating Multiple Hypotheses for 3D Human Mesh and Pose using Conditional Generative Adversarial Nets |
| 344 | SymmNeRF: Learning to Explore Symmetry Prior for Single-View View Synthesis |
| 348 | 'Labelling the Gaps': A Weakly Supervised Automatic Eye Gaze Estimation |
| 352 | Meta-Det3D: Learn to Learn Few-Shot 3D Object Detection |
| 356 | ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection |
| 357 | Physical Passive Patch Adversarial Attacks on Visual Odometry Systems |
| 358 | Staged Adaptive Blind Watermarking Scheme |
| 363 | Learnable Orthogonal Transformed Projection for Semi-supervised Image Classification |
| 366 | An RNN-Based Framework for the MILP Problem in Robustness Verification of Neural Networks |
| 367 | Multi-Scale Wavelet Transformer for Face Forgery Detection |
| 368 | Few-Shot Metric Learning: Online Adaptation of Embedding for Retrieval |
| 376 | Adaptive Range guided Multi-view Depth Estimation with Normal Ranking Loss |
| 378 | Slice-mask based 3D Cardiac Shape Reconstruction from CT volume |
| 379 | Self-Supervised Augmented Patches Segmentation for Anomaly Detection |
| 381 | TeCM-CLIP: Text-based Controllable Multi-attribute Face Image Manipulation |
| 382 | gScoreCAM: What objects is CLIP looking at? |
| 384 | Is an Object-Centric Video Representation Beneficial for Transfer? |
| 385 | Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data |
| 386 | Training-free NAS for 3D Point Cloud Processing |
| 389 | Gated cross word-visual attention-driven generative adversarial networks for text-to-image synthesis |
| 393 | Structure Representation Network and Uncertainty Feedback Learning for Dense Non-Uniform Fog Removal |
| 395 | Rethinking Low-level Features for Interest Point Detection and Description |
| 396 | DILane: Dynamic Instance-Aware Network for Lane Detection |
| 398 | BOREx: Bayesian-Optimization-Based Refinement of Saliency Map for Image- and Video-Classification Models |
| 400 | CIRL: A Category-Instance Representation Learning Framework for Tropical Cyclone Intensity Estimation |
| 406 | Explaining Deep Neural Networks for Point Clouds using Gradient-based Visualisations |
| 417 | DualBLN: Dual Branch LUT-aware Network for Real-time Image Retouching |
| 419 | SCOAD: Single-frame Click Supervision for Online Action Detection |
| 420 | AFF-CAM: Adaptive Frequency Filtering based Channel Attention Module |
| 422 | CSIE: Coded Strip-patterns Image Enhancement Embedded in Structured Light-based Methods |
| 423 | What Role Does Data Augmentation Play in Knowledge Distillation? |
| 424 | Inverting Adversarially Robust Networks for Image Synthesis |
| 426 | Multispectral-Based Imaging and Machine Learning for Noninvasive Blood Loss Estimation |
| 440 | Improving the Quality of Sparse-view Cone-Beam Computed Tomography via Reconstruction-Friendly Interpolation Network |
| 441 | Co-Attention Aligned Mutual Cross-Attention for Cloth-Changing Person Re-Identification |
| 442 | Rethinking Online Knowledge Distillation with Multi-Exits |
| 447 | Style Image Harmonization via Global-Local Style Mutual Guided |
| 448 | Image Denoising using Convolutional Sparse Coding Network with Dry Friction |
| 454 | Layout-guided Indoor Panorama Inpainting with Plane-aware Normalization |
| 459 | End-to-end Surface Reconstruction For Touching Trajectories |
| 461 | 3D Shape Temporal Aggregation for Video-Based Clothing-Change Person Re-identification |
| 463 | Robustizing Object Detection Networks Using Augmented Feature Pooling |
| 466 | A General Divergence Modeling Strategy for Salient Object Detection |
| 469 | Re-parameterization Making GC-Net-style 3DConvNets More Efficient |
| 471 | AirBirds: A Large-scale Challenging Dataset for Bird Strike Prevention in Real-world Airports |
| 476 | Teacher-Guided Learning for Blind Image Quality Assessment |
| 478 | PU-Transformer: Point Cloud Upsampling Transformer |
| 482 | Multi-scale Residual Interaction for RGB-D Salient Object Detection |
| 484 | Unreliability-aware Disentangling for Cross-Domain Semi-supervised Pedestrian Detection |
| 488 | Patch Embedding as Local Features: Unifying Deep Local and Global Features Via Vision Transformer for Image Retrieval |
| 491 | DCVQE: A Hierarchical Transformer for Video Quality Assessment |
| 498 | ElDet: An Anchor-free General Ellipse Object Detector |
| 499 | BorderNet: An Efficient Border-Attention Text Detector |
| 508 | Not End-to-End: Explore Multi-Stage Architecture for Online Surgical Phase Recognition |
| 513 | Reading Arbitrary-Shaped Scene Text from Images Through Spline Regression and Rectification |
| 514 | Truly Unsupervised Image-to-Image Translation with Contrastive Representation Learning |
| 518 | Feature Selective Transformer for Semantic Image Segmentation |
| 522 | QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation |
| 524 | Improving Surveillance Object Detection with Adaptive Omni-Attention over both Inter-Frame and Intra-Frame Context |
| 526 | Semi-supervised Breast Lesion Segmentation using Local Cross Triplet Loss for Ultrafast Dynamic Contrast-Enhanced MRI |
| 530 | Video Object Segmentation via Structural Feature Reconfiguration |
| 535 | MatchFormer: Interleaving Attention in Transformers for Feature Matching |
| 537 | SG-Net: Semantic Guided Network for Image Dehazing |
| 538 | DIG: Draping Implicit Garment over the Human Body |
| 543 | Synchronous Bi-Directional Pedestrian Trajectory Prediction with Error Compensation |
| 544 | DENet: Detection-driven Enhancement Network for Object Detection under Adverse Weather Conditions |
| 545 | Neural Puppeteer: Keypoint-Based Neural Rendering of Dynamic Shapes |
| 548 | Decanus to Legatus: Synthetic training for 2D-3D human pose lifting |
| 556 | Coil-Agnostic Attention-Based Network for Parallel MRI Reconstruction |
| 557 | Lightweight Image Matting via Efficient Non-Local Guidance |
| 559 | IoU-Enhanced Attention for End-to-End Task Specific Object Detection |
| 560 | CSS-Net: Classification and Substitution for Segmentation of Rotator Cuff Tear |
| 567 | Pyramidal Signed Distance Learning for Spatio-Temporal Human Shape Completion |
| 569 | MSF$^2$DB: Multi Scale Feature Fusion Dehazing Network with Dense connection |
| 576 | Rove-Tree-8: The not-so-Wild Rover, A hierarchically structured image dataset for deep metric learning research |
| 577 | Robust Human Matting via Segmentation Guidance |
| 580 | Layered-Garment Net: Generating Multiple Implicit Garment Layers from a Single Image |
| 584 | Self-Supervised Dehazing Network Using Physical Priors |
| 588 | SWPT: Spherical Window-based Point Cloud Transformer |
| 591 | RGB Road Scene Material Segmentation |
| 592 | Self-Distilled Vision Transformer for Domain Generalization |
| 594 | Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis |
| 595 | CMT-Co: Contrastive Learning with Character Movement Task for Handwritten Text Recognition |
| 597 | Exemplar Free Class Agnostic Counting |
| 599 | Looking from a Higher-level Perspective: Attention and Recognition Enhanced Multi-scale Scene Text Segmentation |
| 603 | LHDR: HDR Reconstruction for Legacy Content using a Lightweight DNN |
| 604 | RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality? |
| 626 | Conditional GAN for Point Cloud Generation |
| 628 | Progressive Attentional Manifold Alignment for Arbitrary Style Transfer |
| 629 | From Within to Between: Knowledge Distillation for Cross Modality Retrieval |
| 633 | DreamNet: A Deep Riemannian Manifold Network for SPD Matrix Learning |
| 636 | Domain Generalized Person Re-identification by Locating and Eliminating Domain-Sensitive Features |
| 639 | ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning |
| 649 | PointFormer: A Dual Perception Attention-based Network for Point Cloud Classification |
| 651 | FunnyNet: Audiovisual Learning of Funny Moments in Videos |
| 652 | UTB180: A High-quality Benchmark for Underwater Tracking |
| 653 | Affinity-Aware Relation Network for Oriented Object Detection in Aerial Images |
| 654 | HAZE-Net: High-Frequency Attentive Super-Resolved Gaze Estimation in Low-Resolution Face Images |
| 655 | LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation |
| 656 | Cross-Architecture Knowledge Distillation |
| 660 | Cross-Domain Local Characteristic Enhanced Deepfake Video Detection |
| 665 | Emphasizing Closeness and Diversity Simultaneously for Deep Face Representation |
| 671 | Neural Residual Flow Fields for Efficient Video Representations |
| 673 | CV4Code: Sourcecode Understanding via Visual Code Representations |
| 676 | ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval |
| 677 | A simple strategy to make neural networks provably invariant |
| 681 | Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward |
| 682 | Visual Explanation Generation Based on Lambda Attention Branch Networks |
| 686 | SCFNet: A Spatial-Channel Features Network based on Heterocentric Sample Loss for Visible-Infrared Person Re-Identification |
| 690 | NoiseTransfer: Image Noise Generation with Contrastive Embeddings |
| 694 | The Eyecandies Dataset for Unsupervised Multimodal Anomaly Detection and Localization |
| 695 | PathTR: Context-Aware Memory Transformer for Tumor Localization in Gigapixel Pathology Images |
| 699 | Scale Adaptive Fusion Network for RGB-D Salient Object Detection |
| 700 | Decoupling identity and visual quality for image and video anonymization |
| 704 | Thinking Hallucination for Video Captioning |
| 706 | Three-Stage Bidirectional Interaction Network for Efficient RGB-D Salient Object Detection |
| 712 | MVFI-Net: Motion-aware Video Frame Interpolation Network |
| 714 | Region-of-interest Attentive Heteromodal Variational Encoder-Decoder for Segmentation with Missing Modalities |
| 739 | CCLSL: Combination of Contrastive Learning and Supervised Learning for Handwritten Mathematical Expression Recognition |
| 744 | PatchFlow: A Two-Stage Patch-Based Approach for Lightweight Optical Flow Estimation |
| 745 | Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis |
| 746 | Two Video Data Sets for Tracking and Retrieval of Out of Distribution Objects |
| 747 | Learning Texture Enhancement Prior with Deep Unfolding Network for Snapshot Compressive Imaging |
| 751 | Exp-GAN: 3D-Aware Facial Image Generation with Expression Control |
| 753 | PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person Search |
| 762 | Light Attenuation and Color Fluctuation for Underwater Image Restoration |
| 765 | Weighted Contrastive Hashing |
| 767 | Neural Network Panning: Screening the Optimal Sparse Network Before Training |
| 768 | COLLIDER: A Robust Training Framework for Backdoor Data |
| 769 | Unsupervised Online Hashing with Multi-Bit Quantization |
| 774 | Action Representing by Constrained Conditional Mutual Information |
| 784 | Boundary-aware Temporal Sentence Grounding with Adaptive Proposal Refinement |
| 791 | Unified Learning of Multipurpose Energy Based Generative Hashing Network via Dual-Buffer MCMC Teaching for Supervised Image Hashing |
| 793 | DHG-GAN: Diverse Image Outpainting via Decoupled High Frequency Semantics |
| 796 | Learning and Transforming General Representations to Break Down Stability-Plasticity Dilemma |
| 802 | A Compressive Prior Guided Mask Predictive Coding Approach for Video Analysis |
| 804 | BaSSL: Boundary-aware Self-Supervised Learning for Video Scene Segmentation |
| 806 | Network Pruning via Feature Shift Minimization |
| 808 | PPR-Net: Patch-based multi-scale pyramid registration network for defect detection of printed labels |
| 814 | A^2: Adaptive Augmentation for Effectively Mitigating Dataset Bias |
| 818 | Exploring Adversarially Robust Training for Unsupervised Domain Adaptation |
| 824 | RDRN: Recursively Defined Residual Network for Image Super-Resolution |
| 826 | MUSH: Multi-Scale Hierarchical Feature Extraction for Semantic Image Synthesis |
| 832 | Two-stage Multimodality Fusion for High-performance Text-based Visual Question Answering |
| 848 | Shape Prior is Not All You Need: Discovering Balance between Texture and Shape bias in CNN |
| 860 | Temporal-Viewpoint Transportation Plan for Skeletal Few-shot Action Recognition |
| 862 | A Prototype-Oriented Contrastive Adaption Network For Cross-domain Facial Expression Recognition |
| 865 | Content-Aware Hierarchical Representation Selection for Cross-View Geo-Localization |
| 867 | Learning to Predict Decomposed Dynamic Filters for Single Image Motion Deblurring |
| 880 | HaViT: Hybrid-attention based Vision Transformer for Video Classification |
| 882 | Learning Common and Specific Visual Prompts for Domain Generalization |
| 884 | Training Dynamics Aware Neural Network Optimization with Stabilization |
| 891 | Filter Pruning via Automatic Pruning Rate Search |
| 894 | Spotlights: Probing Shapes from Spherical Viewpoints |
| 895 | SAC-GAN: Face Image Inpainting with Spatial-aware Attribute Controllable GAN |
| 897 | Vision Transformer Compression and Architecture Exploration with Efficient Embedding Space Search |
| 914 | DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving |
| 916 | PBCStereo: A Compressed Stereo Network with Pure Binary Convolutional Operations |
| 918 | A Lightweight Local-Global Attention Network for Single Image Super-Resolution |
| 926 | MGRLN-Net: Mask-guided Residual Learning Network for Joint Single-Image Shadow Detection and Removal |
| 929 | Social Aware Multi-Modal Pedestrian Crossing Behavior Prediction |
| 935 | OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition |
| 938 | Structure Guided Proposal Completion for 3D Object Detection |
| 942 | KinStyle: A Strong Baseline Photorealistic Kinship Face Synthesis with An Optimized StyleGAN Encoder |
| 946 | Comparing Complexities of Decision Boundaries for Robust Training: A Universal Approach |
| 953 | Tracking Small and Fast Moving Objects: A Benchmark |
| 956 | Deep Active Ensemble Sampling For Image Classification |
| 957 | Super-attention for exemplar-based image colorization |
| 962 | Multi-stream Fusion for Class Incremental Learning in Pill Image Classification |
| 963 | Compressed Vision for Efficient Video Understanding |
| 980 | Multi-modal Characteristic Guided Depth Completion Network |
| 994 | Bright as the Sun: In-depth Analysis of Imagination-driven Image Captioning |
| 1000 | Semi-Supervised Semantic Segmentation with Uncertainty-guided Self Cross Supervision |
| 1011 | A Cylindrical Convolution Network for Dense Top-View Semantic Segmentation with LiDAR Point Clouds |
| 1016 | Heterogeneous Interactive Learning Network for Unsupervised Cross-modal Retrieval |
| 1041 | Decision-Based Black-box Attack Specific to Large-Size Images |
| 1044 | Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks |
| 1052 | Energy-Efficient Image Processing Using Binary Neural Networks with Hadamard Transforms |