Local Whisper Transcription
B-Roll Me can transcribe video audio locally using OpenAI's Whisper model when YouTube captions aren't available. This runs entirely on your device — no cloud API calls required for transcription.
When Whisper Is Used
During the search phase, B-Roll Me fetches YouTube captions for each video. Some videos don't have captions available. When this happens:
- If Auto-transcribe is enabled in Settings, Whisper will automatically transcribe the audio.
- The transcription runs locally on your machine, so there are no additional API costs.
- The resulting transcript is used for keyword matching, just like YouTube captions.
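The fallback described above can be sketched as follows. This is a minimal illustration, not B-Roll Me's actual code: the function names `get_transcript` and `matches_keywords`, the injected `fetch_captions`/`transcribe_locally` callables, and the simple case-insensitive substring matching are all hypothetical assumptions.

```python
from typing import Callable, List, Optional


def get_transcript(
    video_id: str,
    fetch_captions: Callable[[str], Optional[str]],  # hypothetical: returns None if no captions
    transcribe_locally: Callable[[str], str],        # hypothetical: local Whisper run
    auto_transcribe: bool,
) -> Optional[str]:
    """Prefer YouTube captions; fall back to local Whisper if allowed."""
    captions = fetch_captions(video_id)
    if captions is not None:
        return captions
    if auto_transcribe:
        # Runs on-device, so no API cost is incurred.
        return transcribe_locally(video_id)
    # No captions and Auto-transcribe is off: nothing to match against.
    return None


def matches_keywords(transcript: str, keywords: List[str]) -> List[str]:
    """Hypothetical matcher: case-insensitive substring search."""
    text = transcript.lower()
    return [kw for kw in keywords if kw.lower() in text]


# Usage: a video with no captions, transcribed locally instead.
transcript = get_transcript(
    "abc123",
    fetch_captions=lambda vid: None,  # pretend YouTube has no captions
    transcribe_locally=lambda vid: "Drone shot over the city skyline at dusk",
    auto_transcribe=True,
)
print(matches_keywords(transcript, ["skyline", "ocean"]))  # ['skyline']
```

Because both transcript sources return plain text, the keyword matcher does not need to know where the transcript came from.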
Available Models
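Choose a model based on the speed/accuracy tradeoff you need. In general, smaller models are faster but less accurate; the quantized large-v3-turbo-q5_0 is the exception, offering the best accuracy at a smaller download than medium.en: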
| Model | Size | Speed vs Accuracy |
|---|---|---|
| tiny.en | ~75 MB | Fastest, lowest accuracy |
| base.en | ~142 MB | Fast, decent accuracy |
| small.en | ~466 MB | Good balance |
| medium.en | ~1.5 GB | High accuracy, slower |
| large-v3-turbo-q5_0 | ~1.1 GB | Best accuracy, quantized for efficiency |
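Because accuracy does not rise strictly with size, picking "the biggest model that fits" is not the same as picking the most accurate one. The sketch below encodes the table in accuracy order and returns the most accurate model within a size budget (for example, free disk space). The `MODELS` list and `most_accurate_within` helper are hypothetical illustrations; the sizes are the approximate figures from the table.

```python
from typing import List, Optional, Tuple

# Ordered from least to most accurate, following the table's descriptions.
# Sizes are the table's approximate figures, in MB.
MODELS: List[Tuple[str, int]] = [
    ("tiny.en", 75),
    ("base.en", 142),
    ("small.en", 466),
    ("medium.en", 1500),
    ("large-v3-turbo-q5_0", 1100),
]


def most_accurate_within(budget_mb: int) -> Optional[str]:
    """Return the most accurate model whose size fits the budget, or None."""
    best = None
    for name, size_mb in MODELS:
        if size_mb <= budget_mb:
            best = name  # later entries are more accurate, so keep overwriting
    return best


print(most_accurate_within(1200))  # large-v3-turbo-q5_0 (medium.en does not fit)
print(most_accurate_within(500))   # small.en
```

Note that a 1.2 GB budget selects large-v3-turbo-q5_0 even though medium.en, the next model down in accuracy, is too large to fit.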
Downloading Models
Models must be downloaded before use. Go to Settings > Transcription, select a model, and click Download. The model is saved locally and can be deleted later to free space.
Apple Silicon Acceleration
On Apple Silicon Macs (M1/M2/M3/M4), Whisper runs with Metal acceleration for significantly faster transcription. On Intel Macs and Windows, it runs on the CPU, which is slower but still functional.
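If you want to check which case applies to your machine, the sketch below uses Python's standard `platform` module. The `acceleration_backend` function and its `"metal"`/`"cpu"` return values are hypothetical labels for illustration; B-Roll Me's own detection logic is not documented here.

```python
import platform


def acceleration_backend() -> str:
    """Return "metal" on Apple Silicon Macs, otherwise "cpu" (hypothetical labels)."""
    # macOS reports platform.system() == "Darwin". A native arm64 Python on
    # Apple Silicon reports platform.machine() == "arm64"; an x86_64 Python
    # running under Rosetta 2 reports "x86_64", so this returns "cpu" there.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "metal"
    return "cpu"


print(acceleration_backend())
```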
Tips
- Start with base.en for a good speed/accuracy balance on most machines.
- If you have an Apple Silicon Mac with 16+ GB RAM, large-v3-turbo-q5_0 provides excellent accuracy with Metal acceleration.
- Most YouTube videos have captions. Whisper is mainly needed for less popular content, unlisted videos, or non-English content. Note that the .en models are English-only, so for non-English content choose large-v3-turbo-q5_0, the only multilingual model in the list.
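The first two tips can be expressed as a simple rule. This is a sketch of our reading of them, not a feature of the app: the `recommended_model` name is hypothetical, and we assume "16+ GB" means 16 GB or more.

```python
def recommended_model(is_apple_silicon: bool, ram_gb: float) -> str:
    """Hypothetical helper encoding the tips: large model only on Apple Silicon with 16+ GB."""
    if is_apple_silicon and ram_gb >= 16:
        return "large-v3-turbo-q5_0"  # Metal acceleration keeps this practical
    return "base.en"  # sensible default on most machines


print(recommended_model(True, 16))   # large-v3-turbo-q5_0
print(recommended_model(True, 8))    # base.en
print(recommended_model(False, 32))  # base.en
```

An Intel or Windows machine falls back to base.en regardless of RAM, since without Metal acceleration the large model would run on the CPU.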