Topic: Speech Recognition and Synthesis

This cluster of papers focuses on advances in speech recognition technology. Topics include acoustic modeling with deep neural networks, speaker verification, convolutional neural networks for speech recognition, end-to-end speech recognition systems, hidden Markov models, sequence-to-sequence models, automatic speech recognition, speaker diarization, and statistical language modeling.
Latest Publications
RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction With Spatio-Temporal Aggregation

article

Emotion Detection from Speech Using CNN-BiLSTM with Feature Rich Audio Inputs

article

Prompt-Guided Dual Latent Steering for Inversion Problems

article

The Development of Northern Thai Dialect Speech Recognition System

article

From spoken language data to TEI-based ISO standard

book-chapter

UniSpeaker: A Unified Approach for Multimodality-driven Speaker Generation

article

An ML-SwinT-LSTM Method for SAR Compound Jamming Sequence Recognition

article

FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing

article

Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries

article

Effective Context in Neural Speech Models

article

Highly Cited Publications (Past 5 Years)
Kaldi Speech Recognition Toolkit

article, 4893 citations, FWCI 0

LLaMA: Open and Efficient Foundation Language Models

preprint, 3821 citations, FWCI 0

Enhancements in Immediate Speech Emotion Detection: Harnessing Prosodic and Spectral Characteristics

article, 1661 citations, FWCI 1802.54338872

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

article, 1384 citations, FWCI 266.2863956

Robust Speech Recognition via Large-Scale Weak Supervision

preprint, 1128 citations, FWCI 0

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

paratext, 925 citations, FWCI 0

Spoken Language Processing

book-chapter, 802 citations, FWCI 2.51996461

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

article, 580 citations, FWCI 112.19272403

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

article, 446 citations, FWCI 51.95932758

A high-performance neuroprosthesis for speech decoding and avatar control

article, 400 citations, FWCI 107.10556289