Fix test cases for room PIN code generation that were not updated when
max retry limit was increased during code review. Aligns test expectations
with actual implementation to prevent false failures.
Enable users to join rooms via SIP telephony by:
- Dialing the SIP trunk number
- Entering the room's PIN followed by '#'
The PIN code needs to be generated before the LiveKit room is created,
allowing the owner to send invites to participants in advance.
With 10-digit PINs (10^10 combinations) and a large number of rooms
(e.g., 1M), collisions become statistically inevitable. A retry mechanism
helps reduce the chance of repeated collisions but doesn't eliminate
the overall risk.
With 100K generated PINs, the probability of at least one collision exceeds
39%, due to the birthday paradox.
To scale safely, we’ll later propose using multiple trunks. Each trunk
will handle a separate PIN namespace, and the combination of trunk_id and PIN
will ensure uniqueness. Room assignment will be evenly distributed across
trunks to balance load and minimize collisions.
Following XP principles, we’ll ship the simplest working version of this
feature. The goal is to deliver value quickly without over-engineering.
We’re not solving scaling challenges we don’t currently face.
Our production load is around 10,000 rooms — well within safe limits for
the initial implementation.
Discussion points:
- The `while` loop should be reviewed. Should we add rate limiting
for failed attempts?
- A systematic existence check before `INSERT` is more costly for a rare
event and doesn't prevent race conditions, whereas retrying on integrity
errors is more efficient overall.
- Should we add logging or monitoring to track and analyze collisions?
I tried to balance performance and simplicity while ensuring the
robustness of the PIN generation process.
Restrict access to room user permissions data by excluding this information
from room serializer response for non-admin/owner users. Previously all
members could see complete access lists. Change enforces stricter information
access control based on user role.
Spotted in #YWH-PGM14336-5.
Fix container networking issue where app-dev container couldn't resolve
localhost address when calling LiveKit API. Update configuration to use
proper container network addressing for backchannel communication between
services.
Create dedicated utility function for livekit API client initialization.
Centralizes configuration logic including custom session handling for SSL
verification. Improves code reuse across backend components that interact
with LiveKit.
Refactor BaseEgress class to leverage latest livekit-api client's custom
session support. Simplifies code by using built-in capability to disable SSL
verification in development environments instead of previous workaround.
Remove BaseEgress tests that were overly complicated and had excessive
mocking, making them unrealistic and difficult to maintain. Will replace with
more straightforward tests in future commits that better reflect actual code
behavior.
Update livekit-api dependency to most recent release, enabling custom session
configuration. New version allows disabling SSL verification in local
development environment through session parameter support.
Add expiration system for recordings.
Include option for users to set recordings as permanent (no expiration)
which is the default behavior.
System only calculates expiration dates and tracks status - actual deletion
is handled by Minio bucket lifecycle policies, not by application code.
Customize email notifications for recording availability based on each user's
language and timezone settings. Improves user experience through localized
communications.
Prioritize simple, maintainable implementation over complex code that would
form subgroups based on user preferences. Note: Changes individual email
sending instead of batch processing, which may impact performance for large
groups but is acceptable for typical recording access patterns.
Fix inconsistent test naming resulting from copy-pasted examples. Rename
tests to properly reflect their actual testing purpose and improve code
maintainability.
Add user language and timezone to serialized user data to enable frontend
customization. Allows backend email notifications to respect user's
localization preferences for improved communication relevance.
Modify media auth endpoint to properly handle recordings with "Notification
succeeded" status alongside "Saved" status. Previous code incorrectly
expected only "Saved" status, causing access issues after email notifications
were sent and status was updated.
Add recording key to serialized API response to enable frontend to generate
proper download links without additional backend calls. Simplifies media
access workflow across the application.
Implement new endpoint allowing admin/owner to invite participants via email.
Provides explicit way to search users and send meeting invitations with
direct links.
In upcoming commits, frontend will call ResourceAccess endpoint to add
invited people as members if they exist in visio, bypassing waiting room
for a smoother experience.
Fix code that accidentally exposed personal email addresses in logs during
email sending failures. Modify logging to remove identifying information
to protect user privacy while still providing useful debugging context.
Original code was inspired by Docs.
Implement secure recording file access through authentication instead of
exposing S3 bucket or using temporary signed links with loose permissions.
Inspired by docs and @spaccoud's implementation, with comprehensive
viewset checks to prevent unauthorized recording downloads.
The ingress reserved to media intercept the original request, and thanks to
Nginx annotations, check with the backend if the user is allowed to donwload
this recording file. This might introduce a dependency to Nginx in the project
by the way.
Note: Tests are integration-based rather than unit tests, requiring minio in
the compose stack and CI environment. Implementation includes known botocore
deprecation warnings that per GitHub issues won't be resolved for months.
Add Django built-in mixins to recording viewset to support individual record
retrieval. Enables frontend to access single recording details needed for
the upcoming download page implementation.
Introduce new property that verifies if a recording file has a saved
status. While the implementation is straightforward, it improves code
readability and provides a clear, semantic way to check file status.
Move logic for calculating recording keys and file extensions into proper
properties on recording objects. Simplifies access to Minio storage keys
and clearly documents expected behavior when saving recordings across the
application.
Include recording mode in serialized data to enable conditional UI elements
in frontend. Allows download controls to be dynamically enabled or disabled
based on the specific recording type being used.
Screen recording will be downloadable when transcript won't.
Implement backend method to send email notifications when screen recordings
are ready for download. Enables users to be alerted when their recordings are
available. Frontend implementation to follow in upcoming commits.
This service is triggered by the storage hook from Minio.
Add minimal unit test coverage for notification service, addressing previous
lack of tests in this area. The notification service was responsible for
calling the unstable summary service feature, which was developped way too
quickly.
The email template has been reviewed by a LLM, to make it user-friendly and
crystal clear.
Necessary in cross browser context situation, where we need to
pass data of a room newly created between two different windows.
This happens in Calendar integration.
Improves sendData reliability by preventing execution when the room
doesn’t exist.
This change addresses errors in staging and production where waiting
participants arrive before the room owner creates the room.
In remote environments, the LiveKit Python SDK doesn’t return a clean
Twirp error when the room is missing; instead of a proper "server unknown"
response, it raises a ContentTypeError, as if the LiveKit server weren’t
responding with a JSON payload, even though the code specifies otherwise.
While the issue cannot be reproduced locally,
this should help mitigate production errors.
Part of a broader effort to enhance data transmission reliability.
Importantly, a participant requesting entry to a room before the owner
arrives should not be considered an exception.
Add new endpoint to access the event-handler matching service. Route is
protected by LiveKit authentication, handle at the service level.
Enables webhook event processing through standardized API.
Create new service that matches received events with their appropriate
handlers. Provides centralized system for event routing and processing
across the application.
If an event has no handler, it would be ignored.
Implement new lobby service method to clear all participant entries from cache.
Lays foundation for upcoming feature where participant permissions reset when
meetings end. Currently introduces only the cache clearing functionality;
event handling for meeting conclusion will be implemented in future commits
Introduce new intermediate access level between public and restricted that
allows authenticated users to join rooms without admin approval. Not making
this the default level yet as current 12hr sessions would create painful
user experience for accessing rooms. Will reconsider default settings after
improving session management.
This access level definition may evolve to become stricter in the future,
potentially limiting access to authenticated users who share the same
organization as the room admin.
Replace excessive mocking with more realistic test scenarios to better
reflect actual code execution. Improves debuggability while maintaining
thorough test coverage.
Deactivate inherited user listing capability that allows authenticated users
to retrieve all application users in JSON format. This potentially unsecure
endpoint exposes user database to scraping and isn't currently used in the
application.
Implement security flag to disable access until properly refactored for
upcoming invitation feature. Will revisit and adapt endpoint behavior when
developing user invitation functionality.
Implement lobby service using cache as LiveKit doesn't natively support
secure lobby functionality. Their teams recommended to create our own
system in our app's backend.
The lobby system is totally independant of the DRF session IDs,
making the request_entry endpoint authentication agnostic.
This decoupling prevents future DRF changes from breaking lobby functionality
and makes participant tracking more explicit.
Security audit is needed as current LiveKit tokens have excessive privileges
for unprivileged users. I'll offer more option ASAP for the admin to control
participant privileges.
Race condition handling also requires improvements, but should not be critical
at this point.
A great enhancement, would be to add a webhook, notifying the backend when the
room is closed, to reset cache.
This commit makes redis a prerequesite to run the suite of tests. The readme
and CI will be updated in dedicated commits.
Replace unused is_public boolean field with access_level to allow for more
granular control. Initially maintains public/restricted functionality while
enabling future addition of "trusted" access level.
Submitting new users to the marketing service is currently handled
during signup and is performed only once.
This is a pragmatic first implementation, yet imperfect.
In the future, this should be improved by delegating the call to a Celery
worker or an async task.
Introduced a MarketingService protocol for typed marketing operations,
allowing easy integration of alternative services.
Implemented a Brevo wrapper following the protocol to decouple
the codebase from the sdk. These implementations are simple and pragmatic.
Feel free to refactor them.
Found this solution googling on Stack Overflow.
Without a default ordering on a model, Django raises a warning, that
pagination may yield inconsistent results.
Request the necessary scopes from ProConnect service.
Update configurations in every environments.
Note: ask given_name and usual_name scopes to get users' info.
(these scopes should be granted by default by ProConnect when
requesting a client id client secret)
Inspired by @sampaccoud's eee2003 commit on impress, adapt the code to be more
Pythonic. Add basic test coverage for user name synchronization on login. User
name fields now update automatically at each login when new data is available.
Note: current logic doesn't handle the case where a user with existing names
logs in with missing first/last names - should we clear the names then?
Removing a field that was present in the initial form is not a valid update
operation.
I've protected this endpoint with a feature flag, and an authentication
class, as it will be exposed on the public internet.
I've tried to keep the viewset logic as minimal as possible, I've
to ship smth and will continue iterating on this piece of code.
At some point, abstracting webhook endpoint and authentication class
might be beneficial for the project. YAGNI as of today.
A recording is savable only if it's active or stopped. In other
status (error, already saved, etc.) it won't be possible. I might
iterate on this piece of code. Let's ship it.
When a new file is uploaded to a Minio Bucket, a webhook can be
configured to notify third parties about the event. Basically,
it's a POST call with a payload providing informations on the
event that just happened.
When a recording worker will stop, it will upload its data to a Minio
bucket, which will trigger the webhook.
Try to introduce the minimalest code to parse these events, discard
them whener it's relevant, and extract the recording ID, thus we
know which recording was successfully saved to the Minio bucket.
In the longer runner, it will trigger a callback.
Avoided unnecessary manipulation of the room name to prevent issues when
starting an egress worker. Previously, hyphens were stripped from the room
name, likely inherited from the legacy setup with Jitsi in Magnify, though
the purpose of this change is unclear and might be an undesired legacy
feature.
To ensure accurate room matching during egress worker requests, this update
removes any manipulation of the room name. This approach minimizes the risk
of errors when initiating recordings and maintains the integrity of the
original room name throughout the process.
The LiveKit egress worker interactions are proxied through the backend for
security reasons. Allowing clients to directly use tokens with sufficient
grants to start recordings could lead to misuse, enabling users to spam the
egress worker API and potentially initiate a DDOS attack on the egress
service. To prevent this, only users with room-specific privileges can
initiate recordings.
We make sure only one recording at the time can be made on a room.
The requested recording mode is stored so it can be referenced later when
the recording is saved, triggering a callback action as needed.
A feature flag was also introduced for this capability; while this is a simple
approach, a more robust system for managing feature flags could be valuable
long-term. For now, KISS (Keep It Simple, Stupid) applies.
The viewset endpoints were designed to be as straightforward as possible—
let me know if anything can be improved.
Introducing a new worker service architecture. Sorry for the long commit.
This design adheres to several key principles, primarily the Single
Responsibility Principle. Dependency Injection and composition are
prioritized over inheritance, enhancing modularity and maintainability.
Interactions between the backend and external workers are encapsulated in
classes implementing a common `WorkerService` interface. I chose Protocol
over an abstract class for agility, aligning closely with static typing
without requiring inheritance. Each `WorkerService` implementation can
independently manage recordings according to its specific requirements.
This flexibility ensures that adding a new worker service, such as for
LiveKit, can be done without any change to existing components.
Configuration management is centralized in a single `WorkerServiceConfig`
class, which loads and provides all settings for different worker
implementations, keeping configurations organized and extensible. The worker
service class itself handles accessing relevant configurations as needed,
simplifying the configuration process.
A basic dictionary in Django settings acts as a factory, responsible for
instantiating the correct worker service based on the client's request mode.
This approach aligns with Django development conventions, emphasizing
simplicity. While a full factory class with a builder pattern could provide
future flexibility, YAGNI (You Aren't Gonna Need It) suggests deferring
such complexity until it’s necessary.
At the core of this design is the worker mediator, which decouples worker
service implementations from the Django ORM and manages database state
according to worker state. The mediator is purposefully limited in
responsibility, handling only what’s essential. It doesn’t instantiate
worker services directly; instead, services are injected via composition,
allowing the mediator to manage any object conforming to the `WorkerService`
interface. This setup preserves flexibility and maintains a clear
separation of responsibilities. The factory create worker services,
the mediator runs it.
(sorry for this long commit)
Add test cases for email-based user matching fallback logic:
- String comparison edge cases
- Multiple users with matching email addresses
- Invalid email format handling
Fix will follow in subsequent commit.