Topic: Speech and Audio Processing

This cluster of papers covers advances in speech enhancement, including audio-visual speech recognition, deep learning methods, noise reduction, source separation, reverberation handling, objective quality measures, beamforming, and lipreading. The papers address ways to improve the quality and intelligibility of speech signals in challenging acoustic environments.
Latest Publications
Deep, data-driven modeling of room acoustics: literature review and research perspectives

article

Audio-Visual Feature Synchronization for Robust Speech Enhancement in Hearing Aids

article

Visual-Informed Speech Enhancement Using Attention-Based Beamforming

article

Convolutional Neural Network Classifier for Unmanned Aerial Vehicles Detection and Identification Using Mel‐Frequency Spectrograms

article

SCESS-Net: Semantic consistency enhancement and segment selection network for audio–visual event localization

article

AuralNet: Hierarchical Attention-based 3D Binaural Localization of Overlapping Speakers

article

Towards a More Natural Urdu: A Comprehensive Approach to Text-to-Speech and Voice Cloning

article

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

article

TF-Mamba: A Time-Frequency Network for Sound Source Localization

article

Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses

article

Highly Cited Publications from the Past 5 Years
Kaldi Speech Recognition Toolkit

article, 4893 citations, FWCI 0

Enhancements in Immediate Speech Emotion Detection: Harnessing Prosodic and Spectral Characteristics

article, 1659 citations, FWCI 1800.35051355

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

article, 1362 citations, FWCI 264.32840739

Robust Speech Recognition via Large-Scale Weak Supervision

preprint, 1125 citations, FWCI 0

A high-performance neuroprosthesis for speech decoding and avatar control

article, 388 citations, FWCI 103.88434296

Recent Advances in End-to-End Automatic Speech Recognition

article, 347 citations, FWCI 67.550593

LiT: Zero-Shot Transfer with Locked-image text Tuning

article, 314 citations, FWCI 43.90264029

AudioLM: A Language Modeling Approach to Audio Generation

article, 310 citations, FWCI 78.42096651

CLAP Learning Audio Concepts from Natural Language Supervision

article, 299 citations, FWCI 79.45675844

Self-Supervised Speech Representation Learning: A Review

review, 297 citations, FWCI 57.36905435