Based on a request from our European partners, introduce new languages for the
transcription feature. Dutch and German are now supported, which is a great
addition.
Closes #837.
WhisperX is expected to support both languages.
Fix MinIO client configuration: I was incorrectly passing a full URL instead of
an endpoint, which caused errors in staging. Local development values did not
reflect the staging setup and were also out of sync with the backend.
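A minimal sketch of the distinction, assuming the `minio` Python SDK, whose client expects a bare `host:port` endpoint plus a `secure` flag rather than a full URL (the helper name is illustrative):

```python
from urllib.parse import urlparse

def to_minio_endpoint(url_or_endpoint: str) -> tuple[str, bool]:
    """Normalize a value into the (endpoint, secure) pair the MinIO
    client expects: 'https://minio.example.com:9000' becomes
    ('minio.example.com:9000', True)."""
    parsed = urlparse(url_or_endpoint)
    if parsed.scheme in ("http", "https"):
        return parsed.netloc, parsed.scheme == "https"
    # Already a bare endpoint; assume TLS unless configured otherwise.
    return url_or_endpoint, True
```

Normalizing once at settings load time keeps local, staging, and backend values in sync regardless of which form is configured.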
Link the transcription document to its related recording by adding a short
header explaining that users can download the audio file via a dedicated link.
This was a highly requested feature, as many users need to keep their audio
files.
As part of a small refactor, remove the argument length check in the metadata
analytics class. The hardcoded argument count made the code harder to evolve and
was easy to forget to update. Argument unwrapping remains fragile and should be
redesigned later to be more robust.
The backend is responsible for generating the download link to ensure
consistency and reliability.
I tried adding a divider, but the Markdown-to-Yjs conversion is very lossy and
almost never handles it correctly. Only about one out of ten conversions works
as expected.
Video files are heavy recording files, sometimes several hours long.
Previously, recordings were naively submitted to the Whisper API without
chunking, resulting in very large requests that could take a long time
to process. Video files are much larger than audio-only files, which
could cause performance issues during upload.
Introduce an extra step to extract the audio component from MP4 files,
producing a lighter audio-only file (to be confirmed). No re-encoding
is done, just a minimal FFmpeg extraction based on community guidance,
since I’m not an FFmpeg expert.
This feature is experimental and may introduce regressions, especially
if audio quality or sampling is impacted, which could reduce Whisper’s
accuracy. Early tests with the ASR model worked, but it has not been
tested on long recordings (e.g., 3-hour meetings),
which some users have.
Screen recordings are MP4 files containing video.
The current approach is suboptimal: the microservice will later be updated to
extract the audio track from videos itself, since full video files are heavy to
send to the Whisper service.
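The extraction step can be sketched as follows; the helper name is hypothetical, but the flags match the community guidance mentioned above: `-vn` drops the video stream and `-acodec copy` copies the audio track without re-encoding:

```python
def build_audio_extract_cmd(src: str, dst: str) -> list[str]:
    # -vn drops the video stream; -acodec copy remuxes the existing
    # audio track without re-encoding, keeping the operation fast and
    # lossless (so Whisper's accuracy should not be affected).
    return ["ffmpeg", "-i", src, "-vn", "-acodec", "copy", dst]
```

Note the destination container must be compatible with the source's audio codec (e.g. `.m4a` for the AAC audio typically found in MP4s).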
This implementation is straightforward, but the notification service is now
handling many responsibilities through conditional logic. A refactor with a
more configurable approach (mapping attributes to processing steps via
settings) would be cleaner and easier to maintain.
For now, this works; further improvements can come later.
I followed the KISS principle and tried to implement this new feature with the
least possible impact on the codebase. This isn't perfect.
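The configurable approach could look roughly like this (attribute and step names are hypothetical):

```python
# Hypothetical settings-driven mapping from recording attributes to
# processing steps, replacing the chain of conditionals in the
# notification service.
ATTRIBUTE_TO_STEP = {
    "is_video": "extract_audio",
    "summary_enabled": "generate_summary",
}

def steps_for(recording: dict) -> list[str]:
    """Return the processing steps triggered by a recording's attributes."""
    return [
        step
        for attribute, step in ATTRIBUTE_TO_STEP.items()
        if recording.get(attribute)
    ]
```

New behaviors would then be added by extending the mapping in settings rather than editing the service's branching logic.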
The previous code lacked proper encapsulation, resulting in an overly complex
worker. While the initial naive approach was great for bootstrapping the
feature, the refactor introduces more maturity with dedicated service classes
that have clear, single responsibilities.
During the extraction to services, several minor issues were fixed:
1) Properly closing the MinIO response.
2) Enhanced validation of object filenames and extensions to ensure
correct file handling.
3) Introduced a context manager to automatically clean up temporary
local files, removing the reliance on developers remembering to delete them.
4) Slightly improved logging and naming for clarity.
5) Dynamic temporary file extension handling; the extension was previously
hardcoded to .ogg even when the file was not an OGG.
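Point 3's context manager could be sketched like this (a minimal version, not the project's actual code), which also covers point 5's dynamic extension:

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_local_file(extension: str):
    """Yield a temp file path and delete it afterwards, even on errors,
    so developers no longer have to remember the cleanup themselves."""
    # The extension is passed in dynamically rather than hardcoded to .ogg.
    fd, path = tempfile.mkstemp(suffix=extension)
    os.close(fd)
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```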
Pass the recording options' language to the summary service, allowing users to
specify the recording's language.
This is important because automatic language detection often fails, causing
empty transcriptions or 5xx errors from the Whisper API. Users then do not
receive their transcriptions, which leads to frustration. For most of our
userbase, meetings are in French, and automatic detection is unreliable.
Support for language parameterization in the Whisper API has existed for some
time; only the frontend and backend integration were missing.
I did not force French as the default, since a minority of users hold English or
other European meetings. A proper settings tab to configure this value will be
introduced later.
Implement Langfuse tracing integration for LLM service calls to capture
prompts, responses, latency, token usage, and errors, enabling
comprehensive monitoring and debugging of AI model interactions
for performance analysis and cost optimization.
Replace plain string fields with Pydantic SecretStr class for all
sensitive configuration values in FastAPI settings to prevent accidental
exposure in logs, error messages, or debugging output, following
security best practices for credential handling.
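A minimal illustration of the behavior (the token value is made up): `SecretStr` masks the value in `str()`/`repr()`, and the real value must be requested explicitly:

```python
from pydantic import SecretStr

# Masks credentials in logs, error messages, and debugging output;
# the underlying value is only reachable via get_secret_value().
api_key = SecretStr("super-secret-token")

masked = str(api_key)              # '**********'
real = api_key.get_secret_value()  # 'super-secret-token'
```

Since FastAPI settings classes build on Pydantic, switching a field's type from `str` to `SecretStr` is usually enough; only the call sites that actually need the value add `get_secret_value()`.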
Install Langfuse observability client in summary service
to enable LLM tracing, monitoring, and debugging capabilities
for AI-powered summarization workflows,
improving visibility into model performance and behavior.
Consolidate scattered transcript formatting functions into single
cohesive class encapsulating all transcript processing logic
for better maintainability and clearer separation of concerns.
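As a rough sketch of the consolidation (class and method names are illustrative, not the project's):

```python
class TranscriptFormatter:
    """Single home for the transcript processing logic that was
    previously scattered across standalone functions."""

    def __init__(self, segments: list[dict]):
        self.segments = segments

    def as_text(self) -> str:
        return "\n".join(s["text"].strip() for s in self.segments)

    def with_speakers(self) -> str:
        return "\n".join(
            f"{s.get('speaker', 'Unknown')}: {s['text'].strip()}"
            for s in self.segments
        )
```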
Add transcript cleaning step to remove spurious recognition artifacts
like randomly predicted "Vap'n'Roll Thierry" phrases that appear
without corresponding audio, improving transcript quality
by filtering model hallucinations.
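The cleaning step can be sketched as a simple pattern filter (in practice the pattern list would be configurable; this minimal version hardcodes the one phrase mentioned above):

```python
import re

# Known spurious phrases the model sometimes predicts with no
# corresponding audio; matched case-insensitively and removed.
SPURIOUS_PATTERNS = [
    re.compile(r"Vap'n'Roll Thierry", re.IGNORECASE),
]

def clean_transcript(text: str) -> str:
    for pattern in SPURIOUS_PATTERNS:
        text = pattern.sub("", text)
    return " ".join(text.split())  # collapse whitespace left behind
```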
- add admin action to retry a recording notification to external services
- log more Celery tasks' parameters
- add multilingual support for real-time subtitles
- update backend dependencies
Add detailed logging for owner ID, recording metadata, and
processing context in transcription tasks to improve debugging
capabilities.
It was especially important to log the created document id,
so that when running into trouble with the docs API, I could share
the impacted newly created documents with that team.
Restore the correct task_args ordering in the metadata manager after commit
f0939b6f added a sender argument to Celery signals for transcription task
scoping, unexpectedly shifting positional arguments and breaking metadata
creation. The issue went undetected because analytics is not deployed on
staging: production observability on the microservice was silently lost without
blocking transcription job execution, highlighting the need to activate
analytics on staging.
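One way to harden the handler against this class of regression, hypothetically, is to bind the Celery signal payload by keyword instead of by position, so a new argument added upstream cannot shift the others:

```python
def on_task_prerun(sender=None, task_id=None, args=None, **extra):
    # Keyword binding: a newly added signal argument (like the sender
    # introduced in f0939b6f) lands in `extra` instead of silently
    # shifting task_id/args into the wrong parameters.
    return {"sender": sender, "task_id": task_id, "args": args}
```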
Add the ability to use response_format in the call function in order to
get better results with the albert-large model.
Use response_format for next-steps and plan generation.
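A sketch of passing the parameter through, assuming an OpenAI-compatible chat completions API (the helper name is made up):

```python
def build_completion_kwargs(messages, response_format=None):
    # response_format is forwarded as-is; {"type": "json_object"}
    # asks the model for structured JSON output, which is useful for
    # next-steps and plan generation with albert-large.
    kwargs = {"model": "albert-large", "messages": messages}
    if response_format is not None:
        kwargs["response_format"] = response_format
    return kwargs
```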
Restrict metadata manager signal triggers to transcription-specific Celery
tasks to prevent exceptions when new summary worker executes tasks
not designed for metadata operations, reducing false-positive Sentry errors.
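A minimal version of the guard (the task-name prefix is hypothetical):

```python
TRANSCRIPTION_TASK_PREFIX = "transcription."  # hypothetical naming scheme

def should_handle_metadata(sender_name: str) -> bool:
    # Only transcription tasks carry the payload the metadata manager
    # expects; summary-worker tasks are skipped to avoid false-positive
    # Sentry errors.
    return sender_name.startswith(TRANSCRIPTION_TASK_PREFIX)
```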
Make WhisperX language detection configurable through FastAPI settings
to handle empty audio start scenarios where automatic detection fails and
incorrectly defaults to English despite 99% French usage.
Quick fix acknowledging long-term solution should allow dynamic
per-recording language selection configured by users through web
interface rather than global server settings.
Sadly, we used the user's db id as the PostHog distinct id
for identified users, not the sub.
Before this commit, we were only passing the sub to the
summary microservice.
Add the owner's id. Please note we introduce a different
naming behavior by prefixing the id with "owner"; we didn't
for the sub and the email.
We cannot align sub and email with this new naming approach,
because external contributors have already started building
their own microservice.
Ensure transcribe jobs are properly assigned to their specific queue
instead of using default queue. This prevents job routing issues and
ensures proper task distribution across workers.
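In Celery this is typically a `task_routes` entry; the queue and task names below are illustrative:

```python
# Route transcription tasks to their own queue instead of the default
# one, so workers can be scaled and isolated per task type.
task_routes = {
    "transcription.*": {"queue": "transcription"},
}
```

This dict would be assigned to `app.conf.task_routes` on the Celery app, and the matching workers started with `--queues transcription`.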
Introduce FastAPI settings configuration option to completely disable
the summary feature. This improves developer experience by allowing
developers to skip summary-related setup when not needed for their
workflow.
Implement summarization functionality that processes completed meeting
transcripts to generate concise summaries.
First draft based on a simple recursive agentic scenario.
Observability and evaluation will be added in the next PRs.
Name the Celery queue used by transcription worker to prepare for
dedicated summarization queue separation, enabling faster transcript
delivery while isolating new agentic logic in separate worker processes.
Rename incorrectly named OpenAI configuration settings since
they're used to instantiate WhisperX client which is not OpenAI
compatible, preventing confusion about actual service dependencies.
Consolidate summary service into main development stack to centralize
development environment management and simplify service orchestration
with shared infrastructure like MinIO storage.
Sync ruff's target Python version to match Docker image version
used for summary component to ensure runtime consistency.
Prevents syntax/feature mismatches, catches version-specific issues
before deployment, and ensures linting targets the actual runtime
environment for better deployment safety.