Video recordings are large files, sometimes several hours long.
Previously, they were submitted to the Whisper API as-is, without
chunking, resulting in very large requests that could take a long time
to process. Video files are also much larger than audio-only files,
which could cause performance issues during upload.

Introduce an extra step that extracts the audio track from MP4 files,
which should produce a lighter audio-only file (size reduction to be
confirmed). No re-encoding is done, just a minimal FFmpeg extraction
based on community guidance, since I’m not an FFmpeg expert.
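
For illustration, a minimal sketch of what such a stream-copy extraction
can look like. This is not necessarily the exact command used in this
change: the function names, the `.m4a` output extension (which assumes
the MP4 carries AAC audio) and the `-y` overwrite flag are assumptions.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build an FFmpeg command that copies the audio stream without re-encoding."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input MP4 recording
        "-vn",             # drop the video stream
        "-c:a", "copy",    # copy the audio stream as-is (no re-encoding)
        audio_path,
    ]


def extract_audio(video_path: str) -> Path:
    """Write the audio next to the input file (hypothetical helper).

    Assumes the audio codec fits an .m4a container, e.g. AAC.
    """
    audio_path = Path(video_path).with_suffix(".m4a")
    subprocess.run(build_extract_command(video_path, str(audio_path)), check=True)
    return audio_path
```

Because the audio stream is copied rather than re-encoded, quality and
sampling rate should be unchanged, but that still needs to be verified
on real recordings.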

This feature is experimental and may introduce regressions, especially
if audio quality or sampling rate is affected, which could reduce
Whisper’s accuracy. Early tests with the ASR model worked, but it has
not been tested on long recordings (e.g., 3-hour meetings), which some
users have.

Remove the default unprivileged Docker user, which was incompatible
with hot reloading in the Tilt stack. Update the Tilt config to resolve
path issues. CI builds still use the unprivileged user, so this change
is safe while enabling a proper development workflow with hot
reloading.