Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing

Publication: Journal Article
Cao, M; Zheng, S; Wang, L; Chen, Z; Brady, D; Yuan, X
Published in: IEEE Transactions on Multimedia
January 1, 2025

Digital cameras consume ~0.1 microjoule per pixel to capture and encode video, resulting in a power draw of ~20 W for a 4K sensor operating at 30 fps. For gigapixel cameras operating at 100-1000 fps, this processing model is unsustainable. To address this, physical-layer compressive measurement has been proposed to reduce power consumption per pixel by 10-100×. Video Snapshot Compressive Imaging (SCI) introduces high-frequency modulation in the optical sensor layer to increase the effective frame rate. A commonly used sampling strategy in video SCI is Random Sampling (RS), where each mask element is randomly set to 0 or 1. Relatedly, image inpainting (I2P) has demonstrated that images can be recovered from a fraction of their pixels. Inspired by I2P, we propose an Ultra-Sparse Sampling (USS) regime in which, at each spatial location, exactly one sub-frame mask value is set to 1 and all others are set to 0. We then build a Digital Micro-mirror Device (DMD) encoding system to verify the effectiveness of the USS strategy. Ideally, the USS measurement can be decomposed into sub-measurements from which I2P algorithms can recover the high-speed frames. However, due to the mismatch between the DMD and the CCD, the USS measurement cannot be perfectly decomposed. To this end, we propose BSTFormer, a sparse Transformer that utilizes local Block attention, global Sparse attention, and global Temporal attention to exploit the sparsity of the USS measurement. Extensive results on both simulated and real-world data show that our method significantly outperforms previous state-of-the-art algorithms. An additional advantage of the USS strategy is its higher dynamic range compared with the RS strategy. Finally, from an application perspective, the USS strategy's fixed exposure time makes it a good choice for implementing a complete video SCI system on chip.
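The sampling model in the abstract is concrete enough to sketch. Below is a minimal NumPy illustration (not code from the paper; the function names, array shapes, and the mismatch-free assumption are ours) of how USS masks differ from RS masks, how a snapshot measurement is formed by summing the mask-modulated sub-frames, and why a USS measurement decomposes into per-frame sub-measurements that reduce to an inpainting problem.

```python
import numpy as np

def uss_masks(T, H, W, seed=None):
    """Ultra-Sparse Sampling (USS): at each spatial location (i, j),
    exactly one of the T sub-frame masks is 1 and all others are 0."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, T, size=(H, W))   # which sub-frame samples each pixel
    masks = np.zeros((T, H, W), dtype=np.uint8)
    masks[idx, np.arange(H)[:, None], np.arange(W)[None, :]] = 1
    return masks

def rs_masks(T, H, W, seed=None):
    """Random Sampling (RS) baseline: each mask element is i.i.d. 0 or 1."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(T, H, W), dtype=np.uint8)

# Snapshot measurement: the sensor sums T modulated sub-frames into one image.
T, H, W = 8, 64, 64
x = np.random.rand(T, H, W)            # stand-in for T high-speed frames
M = uss_masks(T, H, W, seed=0)
y = (M * x).sum(axis=0)                # single compressed snapshot, shape (H, W)

# Ideal decomposition: because USS masks are one-hot along time, masking the
# snapshot with M_t isolates exactly the pixels sampled from sub-frame t,
# leaving a per-frame image-inpainting (I2P) problem. With RS masks this
# separation does not hold, since each summed pixel mixes several sub-frames.
sub = y[None] * M                      # sub-measurements, shape (T, H, W)
assert np.allclose(sub, M * x)         # true only in the ideal, mismatch-free case
```

In the real system the abstract describes, DMD-CCD misalignment breaks the clean one-hot assumption, which is why the decomposition is imperfect and a learned reconstruction (BSTFormer) is needed.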

Published In

IEEE Transactions on Multimedia

DOI

10.1109/TMM.2025.3639993
EISSN

1941-0077

ISSN

1520-9210

Publication Date

January 1, 2025

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
  • 40 Engineering
  • 09 Engineering
  • 08 Information and Computing Sciences
 

Citation

APA
Cao, M., Zheng, S., Wang, L., Chen, Z., Brady, D., & Yuan, X. (2025). Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing. IEEE Transactions on Multimedia. https://doi.org/10.1109/TMM.2025.3639993

Chicago
Cao, M., S. Zheng, L. Wang, Z. Chen, D. Brady, and X. Yuan. “Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing.” IEEE Transactions on Multimedia, January 1, 2025. https://doi.org/10.1109/TMM.2025.3639993.

ICMJE
Cao M, Zheng S, Wang L, Chen Z, Brady D, Yuan X. Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing. IEEE Transactions on Multimedia. 2025 Jan 1;

MLA
Cao, M., et al. “Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing.” IEEE Transactions on Multimedia, Jan. 2025. Scopus, doi:10.1109/TMM.2025.3639993.

NLM
Cao M, Zheng S, Wang L, Chen Z, Brady D, Yuan X. Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing. IEEE Transactions on Multimedia. 2025 Jan 1;
