Pin egress to the production version, which uses a more recent release than the
default chart value (1.9.0).
Using the default could have led to issues; hopefully this change avoids them.
Instead of relying on the egress_started event—which fires when egress is
starting, not actually started—I now rely on egress_updated for more accurate
status updates. This is especially important for the active status, which
triggers after egress has truly joined the room. Using this avoids prematurely
stopping client-side listening to room.isRecording updates. A further
refactoring may remove reliance on room updates entirely.
The goal is to minimize handling metadata in the mediator class. egress_starting
is still used for simplicity, but egress_started could be considered in the
future.
Note: if the API to start egress hasn’t responded yet, the webhook may fail to
find the recording because it currently matches by worker ID. This is unstable.
A better approach would be to pass the database ID in the egress metadata and
recover the recording from it in the webhook.
Worked on a large PR (#827) and chose to consolidate all new features and
refactorings in the changelog at the end of the work instead of updating it per
commit. Not ideal—acknowledge this is bad practice.
Link recording statuses to the `isRecording` attribute from the room on the
client side.
After the refactor, the frontend relied only on recording statuses computed by
the backend. However, when egress is started and the backend is notified, the
recording is not actually active yet. It takes some time for the egress to join
the room and begin recording.
Enrich the frontend by combining backend statuses with the room recording state
to more accurately reflect when recording is truly active. This avoids missing
the first few seconds of audio at the beginning of a recording.
Only display transcription settings to room admins or owners. Showing these
controls to users without the required privileges would be misleading, since
they cannot actually configure or apply the settings.
Link the transcription document to its related recording by adding a short
header explaining that users can download the audio file via a dedicated link.
This was a highly requested feature, as many users need to keep their audio
files.
As part of a small refactor, remove the argument length check in the metadata
analytics class. The hardcoded argument count made code evolution harder and was
easy to forget updating. Argument unwrapping remains fragile and should be
redesigned later to be more robust.
The backend is responsible for generating the download link to ensure
consistency and reliability.
I tried adding a divider, but the Markdown-to-Yjs conversion is very lossy and
almost never handles it correctly. Only about one out of ten conversions works
as expected.
Specify distinct icons in the recording state toast for each mode to provide
clearer visual feedback on what is actually happening. Remove the pulse CSS
animation, as it did not improve visual clarity and accessibility.
Inspired by @ericboucher’s proposal, allow non-admin or non-owner participants
to request the start of a transcription or a recording.
All participants are notified of the request, but only the admin can actually
open the menu and start the recording.
This is a first simple and naive implementation and will be improved later.
Prefer opening the relevant recording menu for admins instead of offering a
direct quick action to start recording. With more options now tied to recording,
keeping the responsibility for starting it encapsulated within the side panel
felt cleaner.
This comes with some UX trade-offs, but it’s worth trying.
I also simplified the notification mechanism by disabling the action button for
the same duration as the notification, preventing duplicate triggers. This is
not perfect, since hovering the notification pauses its display, but it avoids
most accidental re-triggers.
Rework the visual hierarchy of the “no access” view to align it with other
presentation modes and ensure the title order is clear and understandable for
users.
Refactor literals in the recording status hook and introduce a new status.
Align the login prompt style with the newly introduced warning message, and
guide users by clearly indicating that the two modes are mutually exclusive.
Users are prompted to stop the other mode before starting a new one.
This situation should happen less often now that checkboxes allow users to start
transcription and recording together. Hopefully, the UX is clear enough.
The growing number of props passed to the controls buttons may become an issue
and will likely require refactoring later.
Mutualize and factorize the recording API error modal in a single place, and
extract all recording mutations into a dedicated hook exposing both start and
stop actions.
This hook is responsible for interacting with the API error dialog when needed.
Previously, this logic was duplicated across each side panel; centralizing it
clarifies responsibilities and reduces duplication.
This component is now extensible and way easier to understand.
Previously, the recording state toast was implicitly acting as a provider,
making its core responsibility unclear for developers. Its role is not to
inject all recording-related elements into the videoconference DOM, but to
expose a clean recording state toast reflecting the current recording status.
This commit also fixes the limit-reached modal that was no longer appearing
after the refactor, ensures the modal is always rendered,
and removes unused React ARIA labels.
In the original code, the limit reached dialog was wrongly rendered
only when the recording state toast was null.
It was a bug in the implementation. Fix it.
Following the previous commit, refactor the frontend to rely on room metadata to
track which recording is running and update the interface accordingly. This
implementation is not fully functional yet.
The limit-reached dialog triggering mechanism is currently broken and will be
fixed in upcoming commits. I also simplified the interface lifecycle, but some
edge cases are not yet handled—for example, transcription controls should be
disabled when a screen recording is started. This will be improved soon.
Controls were extracted into a reusable component using early returns. This
makes the logic easier to read, but slightly increases the overall complexity of
the recording side panel component.
Relying on literals to manage recording statuses is quite poor, feel free to
enhance this part.
Previously, this was handled manually by the client, sending notifications to
other participants and keeping the recording state only in memory. There was no
shared or persisted state, so leaving and rejoining a meeting lost this
information. Delegating this responsibility solely to the client was a poor
choice.
The backend now owns this responsibility and relies on LiveKit webhooks to keep
room metadata in sync with the egress lifecycle.
This also reveals that the room.isRecording attribute does not update as fast
as the egress stop event, which is unexpected and should be investigated
further.
This will make state management working when several room’s owner will be in
the same meeting, which is expected to arrive any time soon.
Now that screen recording and transcription share the same UI presentation,
extract the row logic into a reusable component to avoid code duplication and
improve code maintainability.
Centralize the logic to compute, internationalize, and present the maximum
recording duration in a human-readable way, reducing duplication across the
codebase.
Refactor the screen recording side panel to align with the transcription UX,
ensuring a more consistent and homogeneous user experience.
This commit also introduces a checkbox allowing users to request transcription
of the screen recording, which is one of the most requested features.
The side panel will be enriched with more information soon, especially once
Fichier is integrated for storing recordings, so the destination can be made
explicit.
More recording settings (layout, quality, etc.) will be introduced in upcoming
commits.
A lot of duplication existed, so I started factorizing components
now that a proper user experience is clearer.
Without over-abstracting, the first step introduces a reusable
“no access” view with configurable message and image.
This is just the beginning: props passing is still not ideal, but
it’s sufficient to merge and significantly reduce duplication.
Add a key feature allowing users to choose the language
of their transcription via a setting.
The default value is set to French, the most commonly used
language across our user base.
Users can still select English or “Automatic,” which re-enables automatic
language detection if no default is configured on the microservice.
Not all self-hosted instances will configure this setting, so a default text is
shown when the destination is unknown.
This is important to let users quickly click the link and understand which
platform is used to handle the transcription documents.
Video files are heavy recording files, sometimes several hours long.
Previously, recordings were naively submitted to the Whisper API without
chunking, resulting in very large requests that could take a long time
to process. Video files are much larger than audio-only files, which
could cause performance issues during upload.
Introduce an extra step to extract the audio component from MP4 files,
producing a lighter audio-only file (to be confirmed). No re-encoding
is done, just a minimal FFmpeg extraction based on community guidance,
since I’m not an FFmpeg expert.
This feature is experimental and may introduce regressions, especially
if audio quality or sampling is impacted, which could reduce Whisper’s
accuracy. Early tests with the ASR model worked, but it has not been
tested on long recordings (e.g., 3-hour meetings),
which some users have.
Screen recording are MP4 files containing video)
The current approach is suboptimal: the microservice will later be updated to
extract audio paths from video, which can be heavy to send to the Whisper
service.
This implementation is straightforward, but the notification service is now
handling many responsibilities through conditional logic. A refactor with a
more configurable approach (mapping attributes to processing steps via
settings) would be cleaner and easier to maintain.
For now, this works; further improvements can come later.
I follow the KISS principle, and try to make this new feature implemented
with the lesser impact on the codebase. This isn’t perfect.
The previous code lacked proper encapsulation, resulting in an overly complex
worker. While the initial naive approach was great for bootstrapping the
feature, the refactor introduces more maturity with dedicated service classes
that have clear, single responsibilities.
During the extraction to services, several minor issues were fixed:
1) Properly closing the MinIO response.
2) Enhanced validation of object filenames and extensions to ensure
correct file handling.
3) Introduced a context manager to automatically clean up temporary
local files, removing reliance on developers.
4) Slightly improved logging and naming for clarity.
5) Dynamic temporary file extension handling when it was previously
always an hardcoded .ogg file, even when it was not the case.
Pass recording options’ language to the summary service, allowing users to
personalize the recording language.
This is important because automatic language detection often fails, causing
empty transcriptions or 5xx errors from the Whisper API. Users then do not
receive their transcriptions, which leads to frustration. For most of our
userbase, meetings are in French, and automatic detection is unreliable.
Support for language parameterization in the Whisper API has existed for some
time; only the frontend and backend integration were missing.
I did not force French as the default, since a minority of users hold English or
other European meetings. A proper settings tab to configure this value will be
introduced later.
Major user feature request: allow starting recording and transcription
simultaneously. Inspired by Google Meet UX, add a subtle checkbox letting users
start a recording alongside transcription.
The backend support for this feature is not yet implemented and will come in
upcoming commits, I can only pass the options to the API. The update of the
notification service will be handled later.
We’re half way with a functional feature.
This is not enabled by default because screen recording is resource-intensive. I
prefer users opt in rather than making it their default choice until feature
usage and performance stabilize.
Using a JSON field allows iterating on recording data without running a new
migration each time additional options or metadata need to be tracked.
This comes with trade-offs, notably weaker data validation and less clarity on
which data can be stored alongside a recording.
In the long run, this JSON field can be refactored into dedicated columns once
the feature and data model have stabilized.
Inspired by proprietary solutions, add clearer details on how transcription
works and what users can expect from the feature. This new presentation is much
simpler to read, parse, and understand than the previous large block of text
that users were not reading at all.
Using icons helps users quickly understand where the transcription is sent, how
they are notified, and which meeting language is used.
Some information is currently hardcoded and will be parameterized in upcoming
commits. This work is ongoing.
Explicitly explain that transcription is reserved for public servants. Remove
the temporary beta form: the feature is now available to all public servants,
with restrictions based on domain. Make white-labeling rules explicit and
clarify who to contact for access.
The beta form created frustration, with users registering and never hearing
back from the team.
Improve guidance when a user may be the meeting host but is not logged in, and
therefore cannot activate recording. Add a clear hint and a quick action to log
in. This decision is based on frequent support requests where users could not
understand why recording was unavailable while they were simply not logged in.
Initially, I thought presenting the recording feature as a beta would clearly
signal that it was still under construction and being improved. In practice, it
sent a negative signal to users, reduced trust, and still generated many
questions for the support team.
Without clearly explaining why the feature was in beta or what was coming next,
the label only added confusion. I chose to simplify the interface and remove the
beta indication altogether.
Follow Robin’s suggestion on the meeting tool layout presentation. The result
does not yet exactly match the Figma design, and I took some freedom to stay
closer to a Google Meet–like layout.
In the initial approach, it was hard to understand that the full option was
clickable. Adding a light background improves discoverability and usability.
Robin chose to adopt Material Design icons, inspired by NVasse’s commit on
Fichier. This sets up the required CSS to easily use Material Icons throughout
the application.
Eventually, all icons in the app will be replaced with Material ones. For now,
the setup is only used in the recording UI refactor.
Simplify wording and presentation of the recording feature heading,
using a more concise and familiar product-style language inspired by
well-known proprietary solutions.
Many public servants use PCs with unusual screen resolutions. The screen
height is often quite small, which caused responsiveness issues on the
vertical axis.
When opening the side panel, they could not see the button to start the
recording. I improved the vertical responsiveness to address this issue and
reduce support requests such as “I cannot see the button”.
Users typically do not think about scrolling inside the side panel, so the
layout now better fits constrained screen heights.
Eliminate the perception of being 'under development,'
which can undermine trust with potential users.
Focus on creating a more confident and reassuring experience.
The previous attempt to make the Deepgram configuration extensible
introduced unnecessary complexity for a very limited use case and
made it harder to add new STT backends.
Refactor to a deliberately simple and explicit design with minimal
cognitive overhead. Configuration is now fully driven by environment
variables and provides enough flexibility for ops to select and
parameterize the STT backend.
Until a Pull Request is merged with our changes on livekit-agent
to support Kyutai API, we will use a custom and hacky python
library made from Arnaud's researches and published on an
unofficial pypi project page.
Everything is quite "draft" but it allows us to deploy and test
in real situation the work from Arnaud.
Some system dependencies were unexpectedly missing, causing the
LiveKit agent framework to fail at runtime.
Install the required dependencies based on runtime error logs.
This fixes Docker image failures in the remote (staging) environment.
Implement Langfuse tracing integration for LLM service calls to capture
prompts, responses, latency, token usage, and errors, enabling
comprehensive monitoring and debugging of AI model interactions
for performance analysis and cost optimization.