Remove the call that generates demo data in the Tilt stack, as it was never
useful to developers and only complicated the migration job unnecessarily.
The migration job should focus solely on applying database migrations
rather than seeding mock data, improving clarity and reducing
complexity.
Replace the mock Django secret key with a longer version to resolve security
warnings in the development stack.
Still not production-suitable, as the key remains versioned in the repository,
but this eliminates security warnings during the development workflow.
Remove dependencies on Bitnami Helm charts, since recent changes in the
Bitnami organization mean the charts are no longer maintained or
published.
Enhance the Tilt dependencies to avoid bootstrap or refresh
errors while developing with the Tilt stack.
Making components depend on each other slightly increases
the time required to spin up the stack the first time.
Replace default "visio" with "LaSuite Meet" and allow env variable
customization. Default Docker image uses "LaSuite Meet", but deployments
can override with custom values by setting env vars and rebuilding.
Our Helm chart wasn't suitable for use with Helm alone because jobs
remained after deployment. We chose to configure ttlSecondsAfterFinished
to clean up jobs after a period of time.
Enable users to join rooms via SIP telephony by:
- Dialing the SIP trunk number
- Entering the room's PIN followed by '#'
The PIN code needs to be generated before the LiveKit room is created,
allowing the owner to send invites to participants in advance.
With 10-digit PINs (10^10 combinations) and a large number of rooms
(e.g., 1M), collisions become statistically inevitable. A retry mechanism
helps reduce the chance of repeated collisions but doesn't eliminate
the overall risk.
With 100K generated PINs, the probability of at least one collision exceeds
39%, due to the birthday paradox.
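For reference, a quick sanity check of that figure using the standard
birthday-paradox approximation:

```python
import math

# Birthday-paradox approximation: with k PINs drawn uniformly at random
# from a space of N values, P(collision) ~= 1 - exp(-k * (k - 1) / (2 * N)).
def collision_probability(k: int, n: int) -> float:
    return 1 - math.exp(-k * (k - 1) / (2 * n))

print(collision_probability(100_000, 10**10))  # ~0.393
```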
To scale safely, we’ll later propose using multiple trunks. Each trunk
will handle a separate PIN namespace, and the combination of trunk_id and PIN
will ensure uniqueness. Room assignment will be evenly distributed across
trunks to balance load and minimize collisions.
Following XP principles, we’ll ship the simplest working version of this
feature. The goal is to deliver value quickly without over-engineering.
We’re not solving scaling challenges we don’t currently face.
Our production load is around 10,000 rooms — well within safe limits for
the initial implementation.
Discussion points:
- The `while` loop should be reviewed. Should we add rate limiting
for failed attempts?
- A systematic existence check before `INSERT` is more costly for a rare
event and doesn't prevent race conditions, whereas retrying on integrity
errors is more efficient overall.
- Should we add logging or monitoring to track and analyze collisions?
I tried to balance performance and simplicity while ensuring the
robustness of the PIN generation process.
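For illustration, here is a minimal sketch of the retry approach discussed
above (the `Room` model, its `pin_code` field and the attempt limit are
assumptions, not the actual implementation):

```python
import secrets

from django.db import IntegrityError

# Hypothetical model with a unique `pin_code` column; the real model and
# field names may differ.
from core.models import Room

MAX_ATTEMPTS = 5


def create_room_with_pin(**room_fields):
    """Generate a 10-digit PIN, retrying on the rare unique-constraint collision."""
    for _ in range(MAX_ATTEMPTS):
        pin = f"{secrets.randbelow(10**10):010d}"
        try:
            return Room.objects.create(pin_code=pin, **room_fields)
        except IntegrityError:
            continue  # collision: draw a new PIN and retry
    raise RuntimeError("could not generate a unique PIN code")
```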
Refactor the BaseEgress class to leverage the latest livekit-api client's custom
session support. This simplifies the code by using the built-in capability to
disable SSL verification in development environments instead of the previous workaround.
Add new application base URL configuration setting. While somewhat redundant
with existing domain setting, these serve different purposes in the
application. Base URL will be used for constructing complete URLs in
notifications and external references.
Implement secure recording file access through authentication instead of
exposing S3 bucket or using temporary signed links with loose permissions.
Inspired by docs and @spaccoud's implementation, with comprehensive
viewset checks to prevent unauthorized recording downloads.
The ingress reserved for media intercepts the original request and, thanks to
Nginx annotations, checks with the backend whether the user is allowed to download
the recording file. Note that this introduces a dependency on Nginx in the
project.
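Roughly, the media ingress delegates authorization to the backend via the
ingress-nginx `auth-url` annotation. A hedged sketch of such an endpoint
(the model, access-check helper and URL layout are illustrative, not the
exact implementation):

```python
from django.http import HttpResponse
from rest_framework.decorators import api_view

# Hypothetical model exposing the stored object key and an access-check helper.
from core.models import Recording


@api_view(["GET"])
def recording_media_auth(request):
    """Auth subrequest target: a 2xx lets Nginx serve the file, a 403 blocks it."""
    # ingress-nginx forwards the URL of the intercepted media request here.
    original_url = request.headers.get("X-Original-URL", "")
    recording = Recording.objects.filter(key=original_url.split("/media/")[-1]).first()
    if recording is not None and recording.user_can_download(request.user):
        return HttpResponse(status=200)
    return HttpResponse(status=403)
```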
Note: Tests are integration-based rather than unit tests, requiring MinIO in
the compose stack and CI environment. The implementation includes known botocore
deprecation warnings that, per GitHub issues, won't be resolved for months.
Implement backend method to send email notifications when screen recordings
are ready for download. Enables users to be alerted when their recordings are
available. Frontend implementation to follow in upcoming commits.
This service is triggered by the MinIO storage hook.
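As a rough sketch of what that method looks like (the setting, template and
field names are assumptions, not the actual code):

```python
from django.conf import settings
from django.core.mail import send_mail
from django.template.loader import render_to_string


def notify_recording_ready(recording):
    """Email the recording owner once the file has landed in the bucket."""
    # Hypothetical setting and model fields; the base URL setting added earlier
    # is used to build a complete download link.
    download_url = f"{settings.APP_BASE_URL}/recordings/{recording.id}/"
    send_mail(
        subject="Your recording is ready",
        message=render_to_string(
            "emails/recording_ready.txt", {"download_url": download_url}
        ),
        from_email=settings.DEFAULT_FROM_EMAIL,
        recipient_list=[recording.owner.email],
    )
```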
Add minimal unit test coverage for the notification service, addressing the
previous lack of tests in this area. The notification service was responsible for
calling the unstable summary service feature, which was developed far too
quickly.
The email template has been reviewed by an LLM to make it user-friendly and
crystal clear.
Override LiveKit Docker image to include nip.io Certificate Authority for
development environment. Addresses issue where LiveKit webhook calls fail in
dev mode due to unknown CA. Custom image places certificate in appropriate
location since LiveKit chart lacks volume mounting options for CA certs or
webhook SSL disabling capabilities.
Discussed with @rouja.
Enable LiveKit webhook feature to notify backend when events occur in rooms.
Configure LiveKit to call our endpoint whenever events are triggered,
providing real-time updates on room activities. Refer to LiveKit
documentation or LiveKitWebhookEventType enum for complete list of available
events.
This commit is not functional: LiveKit fails to verify our backend's
certificate. It will be fixed in the upcoming commits.
LiveKit uses aiohttp, which relies on the ssl module under the hood.
Set the certificate file using an env variable, similar to @rouja's fix
for the requests module.
This tweak applies only in the dev environment.
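For context, Python's ssl module (which aiohttp uses to build its TLS context)
honors the standard `SSL_CERT_FILE` environment variable, so the dev container
can simply export it; e.g. (the path is illustrative):

```python
import os
import ssl

# Dev only: point the ssl module at a CA bundle that includes the nip.io CA
# (the path here is illustrative). aiohttp builds its TLS context through the
# ssl module, which honors SSL_CERT_FILE when loading default certificates.
os.environ.setdefault("SSL_CERT_FILE", "/usr/local/share/ca-certificates/nip-io-ca.crt")

context = ssl.create_default_context()  # picks up SSL_CERT_FILE
```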
Avoid disabling SSL verification in the development environment;
instead, mount an extra volume in the right folder that provides
the certificate authority needed to validate nip.io domains.
Refactored ClusterSecretStore and ExternalSecret deployment to support
VaultWarden custom fields beyond login/password, including multi-line
values via file input. Also made the secret template name configurable
for added flexibility.
ClusterSecretStores are cluster-wide objects, so there is no need to
specify a namespace.
Offer a standalone dev environment or a DINUM-specific dev
environment with ProConnect authentication.
This required refactoring the way secrets are managed in the project,
and re-organizing the Helm chart to make it fully standalone.
Particularly useful for external contributors wanting to run the project.
Work done by @rouja.
Configure the dev and staging environments to use our self-deployed
models (Whisper and LLM). Secrets need to be updated as well.
Because of an Outscale load balancer bug, which times out after 60s, we need to
connect directly to the service.
Draft a piece of code to try the feature in staging. I'll consolidate this
implementation as soon as we have a first functional version.
What's missing?
- handling rooms with multiple owners
- retry when the backend cannot reach the summary service
- factorize the key oneliner, duplicated from the egress service
- optimize SQL query
- unit tests
Update values for the dev and staging environments to enable
recording-related endpoints. A new secret needs to be created.
Production values will be added in an upcoming commit.
Request the necessary scopes from the ProConnect service.
Update configurations in every environment.
Note: request the given_name and usual_name scopes to get users' info
(these scopes should be granted by default by ProConnect when
requesting a client id / client secret).
Egress is already deployed in staging. But while working locally on
features relying on Egress, staging is not suitable for testing or
iterating on your development.
In particular, I'll need to test the connection between Egress
and the MinIO bucket in my next PR.
We faced quite a few issues while starting the whole stack.
Egress didn't want to start: its connection to the LiveKit server
failed while the egress participant was joining the room.
The TURN part of the LiveKit server Helm chart was activated; we needed
to update a few values in the Helm configuration to enable it.
Updated CoreDNS to expose Egress pod. Egress tries connecting to MinIO at
127.0.0.1, where no instance exists. Using minio.127.0.0.1.nip.io resolves
to 127.0.0.1, causing Egress to connect to itself for uploads. The CoreDNS
rewrite directs this to the Ingress IP, correctly routing to MinIO.
Inspired by Impress and @sampaccoud's work.
We use Indie Hoster’s Kubernetes objects in staging and production.
In the "dev" environment, we install the `bitnami/minio` chart to mimic
Indie Hoster’s MinIO setup.
To access the MinIO admin interface in dev, use port forwarding;
the interface runs on port 9001.
Based on feedback from @rouja, I've updated the Helm configuration for PostHog
to use separate ingress resources for each service. Although the documentation
suggests sharing the same ingress, the services have different externalName
values, which conflicts with the use of a vhost in the ingress annotations.
This change ensures proper service redirection by aligning each service with
its own ingress.
Declare the support and analytics env variables expected by the frontend
for each environment, to avoid consuming too many credits from our
PostHog free tier plan.
Updated Django's ALLOWED_HOSTS setting from '*' to the specific host of the
server. Setting ALLOWED_HOSTS to '*' is a security risk as it allows any host
to access the application, potentially exposing it to malicious attacks.
Restricting ALLOWED_HOSTS to the server's host ensures only legitimate
requests are processed.
In a Kubernetes environment, we also needed to whitelist the pod's IP address
to allow health checks to pass. This ensures that Kubernetes liveness and
readiness probes can access the application to verify its health.
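A hedged sketch of what that looks like in the Django settings (the env
variable name and lookup are illustrative):

```python
import os
import socket

# settings.py (sketch): restrict ALLOWED_HOSTS to the configured host plus the
# pod's own IP so kubelet liveness/readiness probes still pass.
ALLOWED_HOSTS = [h for h in os.environ.get("DJANGO_ALLOWED_HOSTS", "").split(",") if h]

try:
    ALLOWED_HOSTS.append(socket.gethostbyname(socket.gethostname()))
except socket.gaierror:
    pass
```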
Silenced certain Django security warnings because the application is served
behind a reverse proxy. These warnings are not applicable in our deployment
context, where the reverse proxy handles these security concerns.
This change ensures relevant security measures are appropriately managed
while avoiding unnecessary warnings. For any questions, ask @rouja.
/!\ Actually, this commit is not working and should be fixed.
Require users to create a room in the database
before requesting a LiveKit token.
If a user requests an access token for a room that doesn't
exist in our DB, the request ends in a 404 error.
Ensure that rooms must be registered by a user before they can be accessed.
By default, all created rooms remain public, allowing anonymous users to join
any room created by a logged-in user.
However, anonymous users cannot create rooms themselves.
I have set up distinct namespaces for each environment. You can now push
events to the development namespace without affecting production data.
Please note that these keys are not 'secret'. They will also be configured
in the browser SDK, which is inherently insecure. The documentation does not
specify a secure storage method for these keys.
It seems appropriate that the backend owns the responsibility of knowing any
information/configuration about the LiveKit server, and then shares it
with the frontend.
Please see my previous commit to understand why environment variables are
not appropriate for deployment in several remote environments.
As of today, the LiveKit server URL is the only configuration exposed
dynamically to the frontend. Thus, it doesn't justify adding a new route
to the API, responsible for exposing configurations (e.g. /configuration).
As the frontend needs to call the backend when it wants to initiate a new
webconference room, let's pass the server URL when retrieving the room's token.
It makes sense to get both the room location and the keys to open the room in
the same call.
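Roughly the shape of the idea (field and setting names are illustrative, not
the exact API):

```python
from django.conf import settings
from rest_framework.response import Response


def room_token_payload(room, access_token):
    """Return the LiveKit credentials and location in a single response."""
    return Response(
        {
            "room": room.name,
            "token": access_token,
            # Hypothetical setting name: the backend owns the LiveKit
            # configuration and shares it alongside the token.
            "livekit_url": settings.LIVEKIT_SERVER_URL,
        }
    )
```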
I preferred to be pragmatic; if the need arises soon, I'll refactor
these parts.
Discussed IRL with @manuhabitela. In development, we build the
Docker image locally. Thus, we can pass values to the frontend before the npm build
command is called.
Environment variables are great for configuration, and work perfectly in dev
mode, where the Docker image is built on the fly.
However, in other environments (e.g. staging, pre-prod, prod) we pull a common
Docker image published in a remote registry. All these environments should use
the same Docker image to make tests and deployments reproducible between envs.
As the Docker image is not rebuilt on the fly, we cannot easily configure
customized environment variables for each environment.
The API base URL would have a different value for each environment, and would
require a different environment variable.
Inspired by Impress's approach, if no environment variable is passed for the API URL,
the window origin is used and the API path is appended.
Frontend and backend are always deployed on the same URL: usually the frontend
is at the '/' route and the backend at the '/api/vXX/' route.
If any configuration is required for a given remote environment, it will be
retrieved from the API at runtime.
Voila! Don't hesitate to challenge this commit.
Done:
- Rename all occurrences of "impress" to "meet".
- Update Agent Connect secrets credentials for the dev environment.
- Add new development secrets for LiveKit.
- Remove Minio from the dev stack (no cold storage required).
- Add LiveKit chart to the stack.
- Remove templates and values related to the WebSocket server.
The integration of LiveKit was inspired by an example from the "numerique-gouve/infrastructure" repo.
However, a notable issue persists with LiveKit's default chart: we are unable to override
the namespace, resulting in all LiveKit components running in the default namespace.
Thanks to @rouja for his help.
I have created two new repositories on DockerHub, one for the currently
existing backend image, and one for the future frontend image.
I search-and-replaced all occurrences of "lasuite/impress-frontend" and "lasuite/impress-backend".
One image, "impress-y-webrtc-signaling", won't exist anymore; I have
removed the steps that build and push it to the DockerHub account.
This commit introduces a boilerplate inspired by https://github.com/numerique-gouv/impress.
The code has been cleaned to remove unnecessary Impress logic and dependencies.
Changes made:
- Removed MinIO, WebRTC, and the create-bucket step from the stack.
- Removed the Next.js frontend (it will be replaced by Vite).
- Cleaned up impress-specific backend logic.
The whole stack remains functional:
- All tests pass.
- Linter checks pass.
- Agent Connect sources are already set up.
Why clear out the code?
To adhere to the KISS principle, we aim to maintain a minimalist codebase. Cloning Impress
allowed us to quickly inherit its code quality tools and deployment configurations for staging,
pre-production, and production environments.
What’s broken?
- The tsclient is not functional anymore.
- Some make commands need to be fixed.
- Helm sources are outdated.
- Naming across the project sources is inconsistent (impress, visio, etc.)
- CI is not configured properly.
This list might be incomplete. Let's grind it.