Implemented a service that automatically creates a SIP dispatch rule when
the first WebRTC participant joins a room and removes it when the room
becomes empty.
Why? I don’t want a SIP participant to join an empty room.
The PIN code could be easily leaked, and there is currently no lobby
mechanism available for SIP participants.
A WebRTC participant is still required to create a room.
This behavior is inspired by a proprietary tool. The service uses LiveKit’s
webhook notification system to react to room lifecycle events. This is
a naive implementation that currently supports only a single SIP trunk and
will require refactoring to support multiple trunks. When no trunk is
specified, rules are created by default on a fallback trunk.
@rouja wrote a minimal Helm chart for LiveKit SIP with Asterisk, which
couldn’t be versioned yet due to embedded credentials. I deployed it
locally and successfully tested the integration with a remote
OVH SIP trunk.
One point to note: LiveKit lacks advanced filtering capabilities when
listing dispatch rules. Their recommendation is to fetch all rules and
filter them within your backend logic. I’ve opened a feature request asking
for at least the ability to filter dispatch rules by room, since filtering
by trunk is already supported, room-based filtering feels like a natural
addition.
Until there's an update, I prefer to keep the implementation simple.
It works well at our current scale, and can be refactored when higher load
or multi-trunk support becomes necessary.
While caching dispatch rule IDs could be a performance optimization,
I feel it would be premature and potentially error-prone due to the complexity
of invalidation. If performance becomes an issue, I’ll consider introducing
caching at that point. To handle the edge case where multiple dispatch rules
with different PIN codes are present, the service performs an extensive
cleanup during room creation to ensure SIP routing remains clean and
predictable. This edge case should not happen.
In the 'delete_dispatch_rule' if deleting one rule fails, method would exit
without deleting the other rules. It's okay IMO for a first iteration.
If multiple dispatch rules are often found for room, I would enhance this part.
Remove assertion statement that was placed after code expected to raise an
exception. The assertion was never evaluated due to the exception flow,
making the test ineffective.
Rework regex pattern to exclude empty string matches since
url_encoded_folder_path is optional.
Add additional test cases covering edge cases and failure
scenarios to improve validation coverage
and prevent false positives.
Handle unhandled exceptions to prevent UX impact. Marketing email operations
are optional and should not disrupt core functionality.
My first implementation was imperfect, raising error in sentry.
Fix test cases for room PIN code generation that were not updated when
max retry limit was increased during code review. Aligns test expectations
with actual implementation to prevent false failures.
Enable users to join rooms via SIP telephony by:
- Dialing the SIP trunk number
- Entering the room's PIN followed by '#'
The PIN code needs to be generated before the LiveKit room is created,
allowing the owner to send invites to participants in advance.
With 10-digit PINs (10^10 combinations) and a large number of rooms
(e.g., 1M), collisions become statistically inevitable. A retry mechanism
helps reduce the chance of repeated collisions but doesn't eliminate
the overall risk.
With 100K generated PINs, the probability of at least one collision exceeds
39%, due to the birthday paradox.
To scale safely, we’ll later propose using multiple trunks. Each trunk
will handle a separate PIN namespace, and the combination of trunk_id and PIN
will ensure uniqueness. Room assignment will be evenly distributed across
trunks to balance load and minimize collisions.
Following XP principles, we’ll ship the simplest working version of this
feature. The goal is to deliver value quickly without over-engineering.
We’re not solving scaling challenges we don’t currently face.
Our production load is around 10,000 rooms — well within safe limits for
the initial implementation.
Discussion points:
- The `while` loop should be reviewed. Should we add rate limiting
for failed attempts?
- A systematic existence check before `INSERT` is more costly for a rare
event and doesn't prevent race conditions, whereas retrying on integrity
errors is more efficient overall.
- Should we add logging or monitoring to track and analyze collisions?
I tried to balance performance and simplicity while ensuring the
robustness of the PIN generation process.
Restrict access to room user permissions data by excluding this information
from room serializer response for non-admin/owner users. Previously all
members could see complete access lists. Change enforces stricter information
access control based on user role.
Spotted in #YWH-PGM14336-5.
Fix container networking issue where app-dev container couldn't resolve
localhost address when calling LiveKit API. Update configuration to use
proper container network addressing for backchannel communication between
services.
Create dedicated utility function for livekit API client initialization.
Centralizes configuration logic including custom session handling for SSL
verification. Improves code reuse across backend components that interact
with LiveKit.
Refactor BaseEgress class to leverage latest livekit-api client's custom
session support. Simplifies code by using built-in capability to disable SSL
verification in development environments instead of previous workaround.
Remove BaseEgress tests that were overly complicated and had excessive
mocking, making them unrealistic and difficult to maintain. Will replace with
more straightforward tests in future commits that better reflect actual code
behavior.
Update livekit-api dependency to most recent release, enabling custom session
configuration. New version allows disabling SSL verification in local
development environment through session parameter support.
Add expiration system for recordings.
Include option for users to set recordings as permanent (no expiration)
which is the default behavior.
System only calculates expiration dates and tracks status - actual deletion
is handled by Minio bucket lifecycle policies, not by application code.
Customize email notifications for recording availability based on each user's
language and timezone settings. Improves user experience through localized
communications.
Prioritize simple, maintainable implementation over complex code that would
form subgroups based on user preferences. Note: Changes individual email
sending instead of batch processing, which may impact performance for large
groups but is acceptable for typical recording access patterns.
Fix inconsistent test naming resulting from copy-pasted examples. Rename
tests to properly reflect their actual testing purpose and improve code
maintainability.
Add user language and timezone to serialized user data to enable frontend
customization. Allows backend email notifications to respect user's
localization preferences for improved communication relevance.
Modify media auth endpoint to properly handle recordings with "Notification
succeeded" status alongside "Saved" status. Previous code incorrectly
expected only "Saved" status, causing access issues after email notifications
were sent and status was updated.
Add recording key to serialized API response to enable frontend to generate
proper download links without additional backend calls. Simplifies media
access workflow across the application.
Implement new endpoint allowing admin/owner to invite participants via email.
Provides explicit way to search users and send meeting invitations with
direct links.
In upcoming commits, frontend will call ResourceAccess endpoint to add
invited people as members if they exist in visio, bypassing waiting room
for a smoother experience.
Fix code that accidentally exposed personal email addresses in logs during
email sending failures. Modify logging to remove identifying information
to protect user privacy while still providing useful debugging context.
Original code was inspired by Docs.
Implement secure recording file access through authentication instead of
exposing S3 bucket or using temporary signed links with loose permissions.
Inspired by docs and @spaccoud's implementation, with comprehensive
viewset checks to prevent unauthorized recording downloads.
The ingress reserved to media intercept the original request, and thanks to
Nginx annotations, check with the backend if the user is allowed to donwload
this recording file. This might introduce a dependency to Nginx in the project
by the way.
Note: Tests are integration-based rather than unit tests, requiring minio in
the compose stack and CI environment. Implementation includes known botocore
deprecation warnings that per GitHub issues won't be resolved for months.
Add Django built-in mixins to recording viewset to support individual record
retrieval. Enables frontend to access single recording details needed for
the upcoming download page implementation.
Introduce new property that verifies if a recording file has a saved
status. While the implementation is straightforward, it improves code
readability and provides a clear, semantic way to check file status.
Move logic for calculating recording keys and file extensions into proper
properties on recording objects. Simplifies access to Minio storage keys
and clearly documents expected behavior when saving recordings across the
application.
Include recording mode in serialized data to enable conditional UI elements
in frontend. Allows download controls to be dynamically enabled or disabled
based on the specific recording type being used.
Screen recording will be downloadable when transcript won't.
Implement backend method to send email notifications when screen recordings
are ready for download. Enables users to be alerted when their recordings are
available. Frontend implementation to follow in upcoming commits.
This service is triggered by the storage hook from Minio.
Add minimal unit test coverage for notification service, addressing previous
lack of tests in this area. The notification service was responsible for
calling the unstable summary service feature, which was developped way too
quickly.
The email template has been reviewed by a LLM, to make it user-friendly and
crystal clear.
Necessary in cross browser context situation, where we need to
pass data of a room newly created between two different windows.
This happens in Calendar integration.
Improves sendData reliability by preventing execution when the room
doesn’t exist.
This change addresses errors in staging and production where waiting
participants arrive before the room owner creates the room.
In remote environments, the LiveKit Python SDK doesn’t return a clean
Twirp error when the room is missing; instead of a proper "server unknown"
response, it raises a ContentTypeError, as if the LiveKit server weren’t
responding with a JSON payload, even though the code specifies otherwise.
While the issue cannot be reproduced locally,
this should help mitigate production errors.
Part of a broader effort to enhance data transmission reliability.
Importantly, a participant requesting entry to a room before the owner
arrives should not be considered an exception.
Add new endpoint to access the event-handler matching service. Route is
protected by LiveKit authentication, handle at the service level.
Enables webhook event processing through standardized API.
Create new service that matches received events with their appropriate
handlers. Provides centralized system for event routing and processing
across the application.
If an event has no handler, it would be ignored.
Implement new lobby service method to clear all participant entries from cache.
Lays foundation for upcoming feature where participant permissions reset when
meetings end. Currently introduces only the cache clearing functionality;
event handling for meeting conclusion will be implemented in future commits
Introduce new intermediate access level between public and restricted that
allows authenticated users to join rooms without admin approval. Not making
this the default level yet as current 12hr sessions would create painful
user experience for accessing rooms. Will reconsider default settings after
improving session management.
This access level definition may evolve to become stricter in the future,
potentially limiting access to authenticated users who share the same
organization as the room admin.
Replace excessive mocking with more realistic test scenarios to better
reflect actual code execution. Improves debuggability while maintaining
thorough test coverage.
Deactivate inherited user listing capability that allows authenticated users
to retrieve all application users in JSON format. This potentially unsecure
endpoint exposes user database to scraping and isn't currently used in the
application.
Implement security flag to disable access until properly refactored for
upcoming invitation feature. Will revisit and adapt endpoint behavior when
developing user invitation functionality.
Implement lobby service using cache as LiveKit doesn't natively support
secure lobby functionality. Their teams recommended to create our own
system in our app's backend.
The lobby system is totally independant of the DRF session IDs,
making the request_entry endpoint authentication agnostic.
This decoupling prevents future DRF changes from breaking lobby functionality
and makes participant tracking more explicit.
Security audit is needed as current LiveKit tokens have excessive privileges
for unprivileged users. I'll offer more option ASAP for the admin to control
participant privileges.
Race condition handling also requires improvements, but should not be critical
at this point.
A great enhancement, would be to add a webhook, notifying the backend when the
room is closed, to reset cache.
This commit makes redis a prerequesite to run the suite of tests. The readme
and CI will be updated in dedicated commits.
Replace unused is_public boolean field with access_level to allow for more
granular control. Initially maintains public/restricted functionality while
enabling future addition of "trusted" access level.
Submitting new users to the marketing service is currently handled
during signup and is performed only once.
This is a pragmatic first implementation, yet imperfect.
In the future, this should be improved by delegating the call to a Celery
worker or an async task.
Introduced a MarketingService protocol for typed marketing operations,
allowing easy integration of alternative services.
Implemented a Brevo wrapper following the protocol to decouple
the codebase from the sdk. These implementations are simple and pragmatic.
Feel free to refactor them.
Found this solution googling on Stack Overflow.
Without a default ordering on a model, Django raises a warning, that
pagination may yield inconsistent results.
Request the necessary scopes from ProConnect service.
Update configurations in every environments.
Note: ask given_name and usual_name scopes to get users' info.
(these scopes should be granted by default by ProConnect when
requesting a client id client secret)