Refactor the screen recording side panel to align with the transcription UX,
ensuring a more consistent and homogeneous user experience.
This commit also introduces a checkbox allowing users to request transcription
of the screen recording, which is one of the most requested features.
The side panel will be enriched with more information soon, especially once
Fichier is integrated for storing recordings, so the destination can be made
explicit.
More recording settings (layout, quality, etc.) will be introduced in upcoming
commits.
A lot of duplication existed, so I started factorizing components
now that a proper user experience is clearer.
Without over-abstracting, the first step introduces a reusable
“no access” view with configurable message and image.
This is just the beginning: props passing is still not ideal, but
it’s sufficient to merge and significantly reduce duplication.
Add a key feature allowing users to choose the language
of their transcription via a setting.
The default value is set to French, the most commonly used
language across our user base.
Users can still select English or “Automatic,” which re-enables automatic
language detection if no default is configured on the microservice.
Not all self-hosted instances will configure this setting, so a default text is
shown when the destination is unknown.
This is important to let users quickly click the link and understand which
platform is used to handle the transcription documents.
Video files are heavy recording files, sometimes several hours long.
Previously, recordings were naively submitted to the Whisper API without
chunking, resulting in very large requests that could take a long time
to process. Video files are much larger than audio-only files, which
could cause performance issues during upload.
Introduce an extra step to extract the audio component from MP4 files,
producing a lighter audio-only file (to be confirmed). No re-encoding
is done, just a minimal FFmpeg extraction based on community guidance,
since I’m not an FFmpeg expert.
This feature is experimental and may introduce regressions, especially
if audio quality or sampling is impacted, which could reduce Whisper’s
accuracy. Early tests with the ASR model worked, but it has not been
tested on long recordings (e.g., 3-hour meetings),
which some users have.
Screen recording are MP4 files containing video)
The current approach is suboptimal: the microservice will later be updated to
extract audio paths from video, which can be heavy to send to the Whisper
service.
This implementation is straightforward, but the notification service is now
handling many responsibilities through conditional logic. A refactor with a
more configurable approach (mapping attributes to processing steps via
settings) would be cleaner and easier to maintain.
For now, this works; further improvements can come later.
I follow the KISS principle, and try to make this new feature implemented
with the lesser impact on the codebase. This isn’t perfect.
The previous code lacked proper encapsulation, resulting in an overly complex
worker. While the initial naive approach was great for bootstrapping the
feature, the refactor introduces more maturity with dedicated service classes
that have clear, single responsibilities.
During the extraction to services, several minor issues were fixed:
1) Properly closing the MinIO response.
2) Enhanced validation of object filenames and extensions to ensure
correct file handling.
3) Introduced a context manager to automatically clean up temporary
local files, removing reliance on developers.
4) Slightly improved logging and naming for clarity.
5) Dynamic temporary file extension handling when it was previously
always an hardcoded .ogg file, even when it was not the case.
Pass recording options’ language to the summary service, allowing users to
personalize the recording language.
This is important because automatic language detection often fails, causing
empty transcriptions or 5xx errors from the Whisper API. Users then do not
receive their transcriptions, which leads to frustration. For most of our
userbase, meetings are in French, and automatic detection is unreliable.
Support for language parameterization in the Whisper API has existed for some
time; only the frontend and backend integration were missing.
I did not force French as the default, since a minority of users hold English or
other European meetings. A proper settings tab to configure this value will be
introduced later.
Major user feature request: allow starting recording and transcription
simultaneously. Inspired by Google Meet UX, add a subtle checkbox letting users
start a recording alongside transcription.
The backend support for this feature is not yet implemented and will come in
upcoming commits, I can only pass the options to the API. The update of the
notification service will be handled later.
We’re half way with a functional feature.
This is not enabled by default because screen recording is resource-intensive. I
prefer users opt in rather than making it their default choice until feature
usage and performance stabilize.
Using a JSON field allows iterating on recording data without running a new
migration each time additional options or metadata need to be tracked.
This comes with trade-offs, notably weaker data validation and less clarity on
which data can be stored alongside a recording.
In the long run, this JSON field can be refactored into dedicated columns once
the feature and data model have stabilized.
Inspired by proprietary solutions, add clearer details on how transcription
works and what users can expect from the feature. This new presentation is much
simpler to read, parse, and understand than the previous large block of text
that users were not reading at all.
Using icons helps users quickly understand where the transcription is sent, how
they are notified, and which meeting language is used.
Some information is currently hardcoded and will be parameterized in upcoming
commits. This work is ongoing.
Explicitly explain that transcription is reserved for public servants. Remove
the temporary beta form: the feature is now available to all public servants,
with restrictions based on domain. Make white-labeling rules explicit and
clarify who to contact for access.
The beta form created frustration, with users registering and never hearing
back from the team.
Improve guidance when a user may be the meeting host but is not logged in, and
therefore cannot activate recording. Add a clear hint and a quick action to log
in. This decision is based on frequent support requests where users could not
understand why recording was unavailable while they were simply not logged in.
Initially, I thought presenting the recording feature as a beta would clearly
signal that it was still under construction and being improved. In practice, it
sent a negative signal to users, reduced trust, and still generated many
questions for the support team.
Without clearly explaining why the feature was in beta or what was coming next,
the label only added confusion. I chose to simplify the interface and remove the
beta indication altogether.
Follow Robin’s suggestion on the meeting tool layout presentation. The result
does not yet exactly match the Figma design, and I took some freedom to stay
closer to a Google Meet–like layout.
In the initial approach, it was hard to understand that the full option was
clickable. Adding a light background improves discoverability and usability.
Robin chose to adopt Material Design icons, inspired by NVasse’s commit on
Fichier. This sets up the required CSS to easily use Material Icons throughout
the application.
Eventually, all icons in the app will be replaced with Material ones. For now,
the setup is only used in the recording UI refactor.
Simplify wording and presentation of the recording feature heading,
using a more concise and familiar product-style language inspired by
well-known proprietary solutions.
Many public servants use PCs with unusual screen resolutions. The screen
height is often quite small, which caused responsiveness issues on the
vertical axis.
When opening the side panel, they could not see the button to start the
recording. I improved the vertical responsiveness to address this issue and
reduce support requests such as “I cannot see the button”.
Users typically do not think about scrolling inside the side panel, so the
layout now better fits constrained screen heights.
Eliminate the perception of being 'under development,'
which can undermine trust with potential users.
Focus on creating a more confident and reassuring experience.
The previous attempt to make the Deepgram configuration extensible
introduced unnecessary complexity for a very limited use case and
made it harder to add new STT backends.
Refactor to a deliberately simple and explicit design with minimal
cognitive overhead. Configuration is now fully driven by environment
variables and provides enough flexibility for ops to select and
parameterize the STT backend.
Until a Pull Request is merged with our changes on livekit-agent
to support Kyutai API, we will use a custom and hacky python
library made from Arnaud's researches and published on an
unofficial pypi project page.
Everything is quite "draft" but it allows us to deploy and test
in real situation the work from Arnaud.
Some system dependencies were unexpectedly missing, causing the
LiveKit agent framework to fail at runtime.
Install the required dependencies based on runtime error logs.
This fixes Docker image failures in the remote (staging) environment.
Implement Langfuse tracing integration for LLM service calls to capture
prompts, responses, latency, token usage, and errors, enabling
comprehensive monitoring and debugging of AI model interactions
for performance analysis and cost optimization.
Replace plain string fields with Pydantic SecretStr class for all
sensitive configuration values in FastAPI settings to prevent accidental
exposure in logs, error messages, or debugging output, following
security best practices for credential handling.
Install Langfuse observability client in summary service
to enable LLM tracing, monitoring, and debugging capabilities
for AI-powered summarization workflows,
improving visibility into model performance and behavior.
Add OIDC_USER_SUB_FIELD_IMMUTABLE setting to our config and enforce
it in the user viewset. Previously relied on implicit Django
LaSuite defaults.
Makes the sub mutability constraint explicit and ensures it's enforced
at the application level, critical for provisional users where sub is
assigned on first login.
Update the sub field documentation to explicitly reflect its optional nature.
Originally intended to be mandatory, sub became optional due to a code issue.
This change acknowledges and formalizes that behavior as intentional.
The optional sub enables external API integrations to provision users with
only an email address. Full identity (sub) is assigned on first login,
allowing third-party platforms to create users before they authenticate.
Allow external platforms using the public API to create provisional users
with email-only identification when the user doesn't yet exist in our
system. This removes a key friction point blocking third-party integrations
from fully provisioning access on behalf of new users.
Provisional users are created with email as the primary identifier. Full
identity reconciliation (sub assignment) occurs on first login, ensuring
reliable user identification is eventually established.
While email-only user creation is not ideal from an identity perspective,
it provides a pragmatic path to unlock integrations and accelerate adoption
through external platforms that are increasingly driving our videoconference
tool's growth.
Reorder CHANGELOG section headings to follow standard Keep a Changelog format
(Added, Changed, Deprecated, Removed, Fixed, Security) for consistent structure
that users expect when reviewing release notes.