Sienna Meridian Satterwhite 58237d9e44 Initial commit — Drive, an S3 file browser with WOPI editing
Lightweight replacement for the upstream La Suite Numérique drive
(Django/Celery/Next.js) built as a single Deno binary.

Server (Deno + Hono):
- S3 file operations via AWS SigV4 (no SDK) with pre-signed URLs
- WOPI host for Collabora Online (CheckFileInfo, GetFile, PutFile, locks)
- Ory Kratos session auth + CSRF protection
- Ory Keto permission model (OPL namespaces, not yet wired to routes)
- PostgreSQL metadata with recursive folder sizes
- S3 backfill API for registering files uploaded outside the UI
- OpenTelemetry tracing + metrics (opt-in via OTEL_ENABLED)

Frontend (React 19 + Cunningham v4 + react-aria):
- File browser with GridList, keyboard nav, multi-select
- Collabora editor iframe (full-screen, form POST, postMessage)
- Profile menu, waffle menu, drag-drop upload, asset type badges
- La Suite integration service theming (runtime CSS)

Testing (549 tests):
- 235 server unit tests (Deno) — 90%+ coverage
- 278 UI unit tests (Vitest) — 90%+ coverage
- 11 E2E tests (Playwright)
- 12 integration service tests (Playwright)
- 13 WOPI integration tests (Playwright + Docker Compose + Collabora)

MIT licensed.
2026-03-25 18:28:37 +00:00


S3 Layout

How files are stored in SeaweedFS, and why you can browse the bucket and actually understand what you're looking at.


Key Convention

S3 keys are human-readable paths. This is intentional, and we'd do it again.

Personal files

{identity-id}/my-files/{path}/{filename}

Examples:

a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/quarterly-report.docx
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/game-prototype/level-01.fbx
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Documents/meeting-notes.odt

The identity ID is the Kratos identity UUID. my-files is a fixed segment that separates the identity prefix from user content.

Shared files

shared/{path}/{filename}

Examples:

shared/team-assets/brand-guide.pdf
shared/templates/invoice-template.xlsx

Shared files use "shared" as the owner ID.

Folders

Folder keys end with a trailing slash:

a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/game-prototype/
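Put together, the convention reduces to a small key builder. This is a hypothetical sketch for illustration (`buildS3Key` is not a name from the codebase; the real server code may shape this differently):

```typescript
// Hypothetical sketch of the key convention described above.
// The real helpers in the server may differ; names here are illustrative.

function buildS3Key(ownerId: string, path: string[], isFolder: boolean): string {
  // Shared files live under a literal "shared/" prefix; personal files
  // get "{identity-id}/my-files/" as the fixed prefix.
  const prefix = ownerId === "shared" ? "shared" : `${ownerId}/my-files`;
  const key = [prefix, ...path].join("/");
  // Folder keys end with a trailing slash.
  return isFolder ? `${key}/` : key;
}
```

So `buildS3Key("shared", ["team-assets", "brand-guide.pdf"], false)` yields `shared/team-assets/brand-guide.pdf`, and a personal folder gets its trailing slash.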

Why Human-Readable?

Most S3-backed file systems use UUIDs as keys (files/550e8400-e29b-41d4.bin). We don't. You should be able to s3cmd ls the bucket and immediately see who owns what, what the folder structure looks like, and what the files are.

This pays for itself when debugging, doing backfills, migrating data, or when someone asks "where did my file go?" — you can answer by looking at S3 directly, no database cross-referencing required.

The tradeoff: renames and moves require S3 copy + delete (S3 has no rename operation). The updateFile handler in server/files.ts handles this:

if (newS3Key !== file.s3_key && !file.is_folder && Number(file.size) > 0) {
  await copyObject(file.s3_key, newS3Key);
  await deleteObject(file.s3_key);
}

The PostgreSQL Metadata Layer

S3 stores the bytes. PostgreSQL tracks everything else.

The files Table

CREATE TABLE files (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  s3_key      TEXT NOT NULL UNIQUE,
  filename    TEXT NOT NULL,
  mimetype    TEXT NOT NULL DEFAULT 'application/octet-stream',
  size        BIGINT NOT NULL DEFAULT 0,
  owner_id    TEXT NOT NULL,
  parent_id   UUID REFERENCES files(id) ON DELETE CASCADE,
  is_folder   BOOLEAN NOT NULL DEFAULT false,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  deleted_at  TIMESTAMPTZ
);

Design decisions worth noting:

  • UUID primary key — API routes use UUIDs, not paths. Decouples the URL space from the S3 key space.
  • s3_key is unique — the bridge between metadata and storage. UUID → s3_key and s3_key → metadata, both directions work.
  • parent_id is self-referencing — folders and files share a table. A folder is a row with is_folder = true. The hierarchy is a tree of parent_id pointers.
  • Soft delete — deleted_at gets set, the row stays. Trash view queries deleted_at IS NOT NULL. Restore clears it.
  • owner_id is a text field — not a FK to a users table, because there is no users table. It's the Kratos identity UUID. We don't duplicate identity data locally.

Indexes:

CREATE INDEX idx_files_parent ON files(parent_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_files_owner  ON files(owner_id)  WHERE deleted_at IS NULL;
CREATE INDEX idx_files_s3key  ON files(s3_key);

Partial indexes on parent_id and owner_id exclude soft-deleted rows — most queries filter on deleted_at IS NULL, so the indexes stay lean. Deleted files don't bloat the hot path.

The user_file_state Table

CREATE TABLE user_file_state (
  user_id     TEXT NOT NULL,
  file_id     UUID NOT NULL REFERENCES files(id) ON DELETE CASCADE,
  favorited   BOOLEAN NOT NULL DEFAULT false,
  last_opened TIMESTAMPTZ,
  PRIMARY KEY (user_id, file_id)
);

Per-user state that doesn't belong on the file itself. Favorites and recent files are powered by this table.
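In production the "recent files" view is a SQL ORDER BY over this table; for orientation, the same ordering is sketched in-memory below (the interface and function names are assumptions, not the actual query path):

```typescript
// Sketch only: illustrates the ordering behind "recent files".
// Types and names are illustrative, not the real schema bindings.

interface UserFileState {
  fileId: string;
  favorited: boolean;
  lastOpened: string | null; // ISO timestamp, null if never opened
}

function recentFiles(rows: UserFileState[], limit: number): string[] {
  return rows
    .filter((r) => r.lastOpened !== null)          // never-opened files don't appear
    .sort((a, b) => (b.lastOpened! < a.lastOpened! ? -1 : 1)) // newest first
    .slice(0, limit)
    .map((r) => r.fileId);
}
```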


Folder Sizes

Folders show their total size (all descendants, recursively). Two PostgreSQL functions handle this.

recompute_folder_size(folder_id)

Recursive CTE that walks all descendants:

CREATE OR REPLACE FUNCTION recompute_folder_size(folder_id UUID)
RETURNS BIGINT LANGUAGE SQL AS $$
  WITH RECURSIVE descendants AS (
    SELECT id, size, is_folder
    FROM files
    WHERE parent_id = folder_id AND deleted_at IS NULL
    UNION ALL
    SELECT f.id, f.size, f.is_folder
    FROM files f
    JOIN descendants d ON f.parent_id = d.id
    WHERE f.deleted_at IS NULL
  )
  SELECT COALESCE(SUM(size) FILTER (WHERE NOT is_folder), 0)
  FROM descendants;
$$;

Sums file sizes only (not folders) to avoid double-counting. Excludes soft-deleted items.
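The same logic can be shown in-memory to make the double-counting concern concrete. This is an illustrative sketch (the `Row` shape and names are assumptions), not the production code path:

```typescript
// In-memory version of recompute_folder_size for illustration: sum only
// non-folder descendants, because folder rows cache a size of their own
// and would be double-counted. Excludes soft-deleted rows.

interface Row {
  id: string;
  parentId: string | null;
  size: number;
  isFolder: boolean;
  deletedAt: string | null;
}

function recomputeFolderSize(rows: Row[], folderId: string): number {
  const children = rows.filter((r) => r.parentId === folderId && r.deletedAt === null);
  let total = 0;
  for (const c of children) {
    // Recurse into folders; only leaf file sizes contribute.
    total += c.isFolder ? recomputeFolderSize(rows, c.id) : c.size;
  }
  return total;
}
```

A folder holding a 10-byte file plus a subfolder with a 5-byte file computes to 15, regardless of whatever stale size the subfolder row carries.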

propagate_folder_sizes(start_parent_id)

Walks up the ancestor chain, recomputing each folder's size:

CREATE OR REPLACE FUNCTION propagate_folder_sizes(start_parent_id UUID)
RETURNS VOID LANGUAGE plpgsql AS $$
DECLARE
  current_id UUID := start_parent_id;
  computed  BIGINT;
BEGIN
  WHILE current_id IS NOT NULL LOOP
    computed := recompute_folder_size(current_id);
    UPDATE files SET size = computed WHERE id = current_id AND is_folder = true;
    SELECT parent_id INTO current_id FROM files WHERE id = current_id;
  END LOOP;
END;
$$;

Called after every file mutation: create, delete, restore, move, upload completion. The handler calls it like:

await sql`SELECT propagate_folder_sizes(${parentId}::uuid)`;

When a file moves between folders, both the old and new parent chains get recomputed:

await propagateFolderSizes(newParentId);
if (oldParentId && oldParentId !== newParentId) {
  await propagateFolderSizes(oldParentId);
}

The Backfill API

S3 and the database can get out of sync — someone uploaded directly to SeaweedFS, a migration didn't finish, a backup restore happened. The backfill API reconciles them.

POST /api/admin/backfill
Content-Type: application/json

{
  "prefix": "",
  "dry_run": true
}

Both fields are optional. prefix filters by S3 key prefix. dry_run shows what would happen without writing anything.

Response:

{
  "scanned": 847,
  "already_registered": 812,
  "folders_created": 15,
  "files_created": 20,
  "errors": [],
  "dry_run": false
}

What it does

  1. Lists all objects in the bucket (paginated, 1000 at a time)
  2. Loads existing s3_key values from PostgreSQL into a Set
  3. For each S3 object not in the database:
    • Parses the key → owner ID, path, filename
    • Infers mimetype from extension (extensive map in server/backfill.ts — documents, images, video, audio, 3D formats, code, archives)
    • HEAD on the object for real content-type and size
    • Creates parent folder rows recursively if missing
    • Inserts the file row
  4. Recomputes folder sizes for every folder
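Steps 2–3 boil down to a set difference plus extension-based mimetype inference. The sketch below uses assumed names and a tiny excerpt-style map (server/backfill.ts has a much larger one):

```typescript
// Illustrative sketch of the reconciliation core. The map and function
// names are assumptions, not the actual server/backfill.ts API.

const MIME_BY_EXT: Record<string, string> = {
  pdf: "application/pdf",
  odt: "application/vnd.oasis.opendocument.text",
  fbx: "application/octet-stream", // 3D formats often fall back to binary
};

function inferMimetype(filename: string): string {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  return MIME_BY_EXT[ext] ?? "application/octet-stream";
}

// Everything S3 has that the database doesn't know about.
function findUnregistered(listedKeys: string[], registered: Set<string>): string[] {
  return listedKeys.filter((key) => !registered.has(key));
}
```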

The key parsing handles both conventions:

// {identity-id}/my-files/{path} -> owner = identity-id
// shared/{path}                 -> owner = "shared"

When to reach for it

  • After bulk-uploading files directly to SeaweedFS
  • After migrating from another storage system
  • After restoring from a backup
  • Any time S3 has files the database doesn't know about

The endpoint requires an authenticated session but isn't exposed via ingress — admin-only, on purpose.


S3 Client

The S3 client (server/s3.ts) does AWS Signature V4 with Web Crypto. No AWS SDK — it's a lot of dependency for six API calls. Implements:

  • listObjects — ListObjectsV2 with pagination
  • headObject — HEAD for content-type and size
  • getObject — GET (streaming response)
  • putObject — PUT with content hash
  • deleteObject — DELETE (404 is not an error)
  • copyObject — PUT with x-amz-copy-source header

Pre-signed URLs (server/s3-presign.ts) support:

  • presignGetUrl — download directly from S3
  • presignPutUrl — upload directly to S3
  • createMultipartUpload + presignUploadPart + completeMultipartUpload — large file uploads

Default pre-signed URL expiry is 1 hour.