From c07b8f920fb00f1badd7a1d94a85545c290c3146 Mon Sep 17 00:00:00 2001 From: Martin Guitteny <“martin.guitteny@centralesupelec.fr”> Date: Mon, 29 Sep 2025 15:52:40 +0200 Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D(docs)=20add=20summarization=20docu?= =?UTF-8?q?mentation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add documentation for transcription et summarization Include sequence diagrams --- docs/features/summarization.md | 4 ++ docs/features/transcription.md | 92 ++++++++++++++++++++++++++++++++++ src/summary/README.md | 8 ++- 3 files changed, 103 insertions(+), 1 deletion(-) create mode 100644 docs/features/summarization.md create mode 100644 docs/features/transcription.md diff --git a/docs/features/summarization.md b/docs/features/summarization.md new file mode 100644 index 00000000..be14b2f8 --- /dev/null +++ b/docs/features/summarization.md @@ -0,0 +1,4 @@ + +# Meeting summarization (WIP) + +This feature is currently under development and not yet ready for production use. Documentation and detailed instructions will be provided once the feature is stable and officially released. diff --git a/docs/features/transcription.md b/docs/features/transcription.md new file mode 100644 index 00000000..e5533795 --- /dev/null +++ b/docs/features/transcription.md @@ -0,0 +1,92 @@ +# Transcription + +La Suite Meet provides a room transcription capability, currently available in beta. This feature is under active development, with ongoing enhancements planned. + +The transcription feature enables users to record room sessions. Upon completion of a recording, the room owner receives a notification containing a link to LaSuite Docs, where the transcribed meeting content can be accessed. + +> [!NOTE] +> Audio recordings are automatically deleted after the configured `RECORDING_EXPIRATION_DAYS` period. + +For configuration and setup details of the recording functionality, refer to the [Recording feature documentation](https://github.com/suitenumerique/meet/blob/main/docs/features/recording.md). +This page only describes the supplementary tools required for audio processing. + +Example of a transcript : + +``` +**SPEAKER_00**: Hello everyone! +**SPEAKER_01**: Yes, it works. +``` + + +### Current Limitations + +* Participant identification is not yet implemented; participants are labeled generically (e.g., `PARTICIPANT_1`). +* Transcription backend relies on [WhisperX](https://github.com/m-bain/whisperX), which does not provide an OpenAI-compatible API. + +> [!NOTE] +> Questions? Open an issue on [GitHub](https://github.com/suitenumerique/meet/issues/new?assignees=&labels=bug&template=Bug_report.md) or join our [Matrix community](https://matrix.to/#/#meet-official:matrix.org). + +## Special requirements + +To enable the transcription feature, the following components must be in place: + +* Recording feature components: All dependencies and configurations required for the [recording feature](https://github.com/suitenumerique/meet/blob/main/docs/features/recording.md). +* LaSuite Docs instance: A running [LaSuite Docs](https://github.com/suitenumerique/docs) capable of handling requests to the `/create-for-owner` endpoint. +* WhisperX API: A running WhisperX service. An open-source implementation combining WhisperX and FastAPI is available [here](https://github.com/suitenumerique/meet-whisperx). +* Deployment of the [summary service](https://hub.docker.com/r/lasuite/meet-summary), a Celery worker, and a Redis instance. + +## How It Works + +```mermaid +sequenceDiagram + participant Backend as Backend API + participant Summary as Summary Service + participant Celery as Celery Workers (transcribe-queue) + participant MinIO as MinIO (Object Storage) + participant STT as WhisperX API + participant Docs as LaSuite Docs + + Backend->>Summary: POST /api/v1/tasks/ (bearer token, payload) + Note right of Backend: Payload contains 7 params: owner_id, filename, email, sub, room, recording_date, recording_time + + Summary->>Celery: Register task (transcribe-queue) + Celery->>MinIO: Fetch audio file + Celery->>STT: Transcribe audio (WhisperX) + STT-->>Celery: Segmented transcript + + Celery->>Celery: Format transcript (text) + + Celery->>Docs: POST /create-for-owner (title, content, email, sub, api token) + Docs-->>Celery: Acknowledgement +``` + +## Configuration Options + +| Option | Type | Default | Description | +| ------------------------ | --------- |-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| app_name | String | `"app"` | Name of the application/service. | +| app_api_v1_str | String | `"/api/v1"` | Base path for the API endpoints. | +| app_api_token | Secret | — | API token for authenticating requests. | +| recording_max_duration | Integer | `None` | Maximum duration of audio recordings in milliseconds. Set to `None` for unlimited. Audio recordings longer than the configured limit will be ignored and not processed. | +| celery_broker_url | String | `"redis://redis/0"` | Celery broker URL. | +| celery_result_backend | String | `"redis://redis/0"` | Celery result backend URL. | +| celery_max_retries | Integer | `1` | Maximum number of retries for Celery tasks. | +| transcribe_queue | String | `"transcribe-queue"` | Name of the Celery queue for transcription tasks. | +| aws_storage_bucket_name | String | — | Name of the S3/MinIO bucket used for storing recordings. | +| aws_s3_endpoint_url | String | — | Endpoint URL of the S3/MinIO storage. | +| aws_s3_access_key_id | String | — | Access key for S3/MinIO. | +| aws_s3_secret_access_key | Secret | — | Secret key for S3/MinIO. | +| aws_s3_secure_access | Boolean | `True` | Use HTTPS for S3/MinIO requests. | +| whisperx_api_key | Secret | — | API key for accessing WhisperX. | +| whisperx_base_url | String | `"https://api.whisperx.com/v1"` | Base URL for the WhisperX API. | +| whisperx_asr_model | String | `"whisper-1"` | ASR model used for transcription. | +| whisperx_max_retries | Integer | `0` | Maximum number of retries for WhisperX API requests. | +| webhook_max_retries | Integer | `2` | Maximum retries for webhook requests. | +| webhook_status_forcelist | List[Int] | `[502, 503, 504]` | HTTP status codes triggering webhook retry. | +| webhook_backoff_factor | Float | `0.1` | Exponential backoff factor for webhook retries. | +| webhook_api_token | Secret | — | Token to authenticate incoming webhook requests. | +| webhook_url | String | — | URL to which webhook events are sent. | +| document_default_title | String | `"Transcription"` | Default title for generated documents. | +| document_title_template | String | `'Réunion "{room}" du {room_recording_date} à {room_recording_time}'` | Template for document title. | +| sentry_is_enabled | Boolean | `False` | Enable or disable Sentry error tracking. | +| sentry_dsn | String | `None` | DSN for Sentry integration. | diff --git a/src/summary/README.md b/src/summary/README.md index 3cea22ce..34246255 100644 --- a/src/summary/README.md +++ b/src/summary/README.md @@ -2,7 +2,13 @@ This is an experimental part of the stack. It currently lacks proper observability, unit tests, and other production-grade features. This serves as the base for AI features in Visio. -## Usage +## How it works + +Please refer to the [Recording feature documentation](https://github.com/suitenumerique/meet/blob/main/docs/features/recording.md) and the [Transcription feature documentation](https://github.com/suitenumerique/meet/blob/main/docs/features/transcription.md). + +## How to develop + +(To develop locally follow the instructions on [developing La Suite Meet locally](https://github.com/suitenumerique/meet/blob/main/docs/developping_locally.md)) From the root of the project: