SpeechPrune: Context-Aware Token Pruning for Speech Information Retrieval

Publication, Conference
Lin, Y; Fu, Y; Zhang, J; Liu, Y; Sun, J; Li, HH; Chen, Y
Published in: Proceedings IEEE International Conference on Multimedia and Expo
January 1, 2025

While current Speech Large Language Models (Speech LLMs) excel at short-form tasks, they struggle with the computational and representational demands of longer audio clips. To advance these models' capabilities on long-form speech, we introduce Speech Information Retrieval (SIR), a long-context task for Speech LLMs, and present SPIRAL, a 1,012-sample benchmark that tests a model's ability to extract critical details from long spoken inputs. To overcome the challenges of processing long speech sequences, we propose SpeechPrune, a training-free token pruning strategy that uses speech-text similarity and approximated attention scores to efficiently discard irrelevant tokens. On SPIRAL, at a pruning rate of 20%, SpeechPrune improves accuracy by 29% over the original model and by up to 47% over random pruning. SpeechPrune maintains network performance even at a pruning rate of 80%. This highlights the potential of token-level pruning for efficient and scalable long-form speech understanding.
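
As a rough illustration of the similarity-based part of the idea described above, the sketch below scores each speech token by its best cosine match against the text tokens and drops the lowest-scoring fraction. It is not the authors' implementation: the function names `cosine` and `prune_tokens`, the max-over-text-tokens scoring rule, and the toy 2-D embeddings are all assumptions made for illustration, and the approximated attention score component mentioned in the abstract is omitted entirely.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prune_tokens(speech_tokens, text_tokens, prune_rate):
    """Return the indices of the speech tokens that are kept, in original order.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    a speech token's relevance is its maximum cosine similarity to any
    text token. The lowest-scoring `prune_rate` fraction is discarded.
    """
    if not 0.0 <= prune_rate < 1.0:
        raise ValueError("prune_rate must be in [0, 1)")
    if not text_tokens:
        raise ValueError("at least one text token is required")

    n = len(speech_tokens)
    n_keep = n - round(n * prune_rate)

    # Score every speech token against the text query.
    scores = [max(cosine(s, t) for t in text_tokens) for s in speech_tokens]

    # Rank by descending relevance, keep the top n_keep, restore temporal order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy 2-D embeddings: five speech tokens and one text token.
speech = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.1, 0.9]]
text = [[1.0, 0.0]]

kept_20 = prune_tokens(speech, text, 0.2)  # drops 1 of 5 tokens
kept_80 = prune_tokens(speech, text, 0.8)  # drops 4 of 5 tokens
```

With these toy inputs, a 20% pruning rate removes only the token pointing away from the text query, while an 80% rate keeps just the single best-aligned token. Returning indices in temporal order is a design choice here so that the surviving tokens can be passed on as a shorter but still ordered sequence.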

Published In

Proceedings IEEE International Conference on Multimedia and Expo

DOI

10.1109/ICME59968.2025.11209113

EISSN

1945-788X

ISSN

1945-7871

Publication Date

January 1, 2025
 

Citation

APA

Lin, Y., Fu, Y., Zhang, J., Liu, Y., Sun, J., Li, H. H., & Chen, Y. (2025). SpeechPrune: Context-Aware Token Pruning for Speech Information Retrieval. In Proceedings IEEE International Conference on Multimedia and Expo. https://doi.org/10.1109/ICME59968.2025.11209113

Chicago

Lin, Y., Y. Fu, J. Zhang, Y. Liu, J. Sun, H. H. Li, and Y. Chen. “SpeechPrune: Context-Aware Token Pruning for Speech Information Retrieval.” In Proceedings IEEE International Conference on Multimedia and Expo, 2025. https://doi.org/10.1109/ICME59968.2025.11209113.

ICMJE

Lin Y, Fu Y, Zhang J, Liu Y, Sun J, Li HH, et al. SpeechPrune: Context-Aware Token Pruning for Speech Information Retrieval. In: Proceedings IEEE International Conference on Multimedia and Expo. 2025.

MLA

Lin, Y., et al. “SpeechPrune: Context-Aware Token Pruning for Speech Information Retrieval.” Proceedings IEEE International Conference on Multimedia and Expo, 2025. Scopus, doi:10.1109/ICME59968.2025.11209113.

NLM

Lin Y, Fu Y, Zhang J, Liu Y, Sun J, Li HH, Chen Y. SpeechPrune: Context-Aware Token Pruning for Speech Information Retrieval. Proceedings IEEE International Conference on Multimedia and Expo. 2025.
