AI Clipper for Long Videos & Talk Shows
WhisperVideo: AI turns long multi-speaker videos into labeled clips with speaker panels and synced subtitles
WhisperVideo is a compact demo system for long-form, multi-speaker videos. It links each stretch of speech to the speaker visible on screen and keeps that speaker's identity consistent across the whole video. It is built for real conversations, such as talk shows, rather than short clips.
It is an end-to-end video understanding demo that combines SAM3 segmentation, WhisperX automatic speech recognition, speaker diarization, and an active speaker memory panel. Its main components are:
SAM3 video segmentation for robust face masking
Active speaker detection using TalkNet (audio-visual fusion)
Identity memory based on visual embeddings and trajectory clustering
Aligned subtitles with speaker ID and panel overlay
Panel visualization for compact review and presentation videos
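To make the "aligned subtitles with speaker ID" step more concrete, here is a minimal sketch of one common way to combine ASR output with diarization output: give each transcript segment the speaker whose diarization turns overlap it the longest, then write the result as SRT subtitles. This is not WhisperVideo's actual code. The names `Segment`, `Turn`, `assign_speakers`, and `to_srt` are hypothetical, and the sketch assumes both inputs are already available as plain start/end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk of speech (hypothetical ASR output shape)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """One diarization turn (hypothetical diarizer output shape)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker that overlaps it the longest."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            dur = overlap(seg.start, seg.end, turn.start, turn.end)
            if dur > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + dur
        # Fall back to a placeholder label when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg))
    return labeled


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(labeled):
    """Render (speaker, segment) pairs as SRT text with a speaker prefix."""
    blocks = []
    for index, (speaker, seg) in enumerate(labeled, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n[{speaker}] {seg.text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    segments = [
        Segment(0.0, 2.5, "Welcome to the show."),
        Segment(2.5, 5.0, "Thanks for having me."),
    ]
    turns = [
        Turn(0.0, 2.8, "SPEAKER_00"),
        Turn(2.8, 6.0, "SPEAKER_01"),
    ]
    print(to_srt(assign_speakers(segments, turns)))
```

Summing overlap per speaker, rather than taking the first overlapping turn, keeps a label stable when a segment briefly crosses a turn boundary. A real pipeline would likely apply the same idea at word level and then map the diarization labels onto the visual identities tracked in the panel.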



