# S3 Layout

How files are stored in SeaweedFS, and why you can browse the bucket and actually understand what you're looking at.

---
## Key Convention

S3 keys are human-readable paths. This is intentional, and we'd do it again.
### Personal files
```
{identity-id}/my-files/{path}
```

Examples:

```
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/quarterly-report.docx
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/game-prototype/level-01.fbx
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Documents/meeting-notes.odt
```

The identity ID is the Kratos identity UUID. `my-files` is a fixed segment that separates the identity prefix from user content. `{path}` is the file's path within that space, including the filename.
### Shared files
```
shared/{path}
```

Examples:

```
shared/team-assets/brand-guide.pdf
shared/templates/invoice-template.xlsx
```

Shared files use `"shared"` as the owner ID.
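Building keys for both conventions is a couple of string templates. A sketch with hypothetical helper names, not the actual functions in the codebase:

```typescript
// Hypothetical helpers illustrating the key convention; the real code
// may build keys differently.
const SHARED_OWNER = "shared";

// {identity-id}/my-files/{path}
function buildPersonalKey(identityId: string, path: string): string {
  return `${identityId}/my-files/${path}`;
}

// shared/{path}
function buildSharedKey(path: string): string {
  return `${SHARED_OWNER}/${path}`;
}
```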
### Folders
Folder keys end with a trailing slash:

```
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/game-prototype/
```
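Since the trailing slash is the only thing distinguishing a folder key, a normalization helper is enough to keep the invariant (hypothetical name, not the actual code):

```typescript
// Hypothetical helper: folder keys are distinguished only by the
// trailing slash, so normalize before writing.
function toFolderKey(key: string): string {
  return key.endsWith("/") ? key : `${key}/`;
}
```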
### Why Human-Readable?
Most S3-backed file systems use UUIDs as keys (`files/550e8400-e29b-41d4.bin`). We don't. You should be able to `s3cmd ls` the bucket and immediately see who owns what, what the folder structure looks like, and what the files are.

This pays for itself when debugging, doing backfills, migrating data, or when someone asks "where did my file go?" — you can answer by looking at S3 directly, no database cross-referencing required.

The tradeoff: renames and moves require S3 copy + delete (S3 has no rename operation). The `updateFile` handler in `server/files.ts` handles this:

```typescript
if (newS3Key !== file.s3_key && !file.is_folder && Number(file.size) > 0) {
  await copyObject(file.s3_key, newS3Key);
  await deleteObject(file.s3_key);
}
```

---
## The PostgreSQL Metadata Layer
S3 stores the bytes. PostgreSQL tracks everything else.

### The `files` Table
```sql
CREATE TABLE files (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  s3_key TEXT NOT NULL UNIQUE,
  filename TEXT NOT NULL,
  mimetype TEXT NOT NULL DEFAULT 'application/octet-stream',
  size BIGINT NOT NULL DEFAULT 0,
  owner_id TEXT NOT NULL,
  parent_id UUID REFERENCES files(id) ON DELETE CASCADE,
  is_folder BOOLEAN NOT NULL DEFAULT false,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  deleted_at TIMESTAMPTZ
);
```
Design decisions worth noting:
- **UUID primary key** — API routes use UUIDs, not paths. Decouples the URL space from the S3 key space.
- **`s3_key` is unique** — the bridge between metadata and storage. UUID → s3_key and s3_key → metadata, both directions work.
- **`parent_id` is self-referencing** — folders and files share a table. A folder is a row with `is_folder = true`. The hierarchy is a tree of `parent_id` pointers.
- **Soft delete** — `deleted_at` gets set, the row stays. Trash view queries `deleted_at IS NOT NULL`. Restore clears it.
- **`owner_id` is a text field** — not a FK to a users table, because there is no users table. It's the Kratos identity UUID. We don't duplicate identity data locally.
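The soft-delete bullet can be illustrated in a few lines (hypothetical row shape; the real queries run in SQL):

```typescript
// Hypothetical minimal row shape; only the column relevant here.
interface SoftDeletable {
  id: string;
  deleted_at: string | null; // TIMESTAMPTZ; null means live
}

// Active view ~ WHERE deleted_at IS NULL; trash ~ WHERE deleted_at IS NOT NULL.
// Restore is just setting deleted_at back to null.
function partitionByDeleted<T extends SoftDeletable>(rows: T[]): { active: T[]; trash: T[] } {
  return {
    active: rows.filter((r) => r.deleted_at === null),
    trash: rows.filter((r) => r.deleted_at !== null),
  };
}
```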
Indexes:
```sql
CREATE INDEX idx_files_parent ON files(parent_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_files_owner ON files(owner_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_files_s3key ON files(s3_key);
```

Partial indexes on `parent_id` and `owner_id` exclude soft-deleted rows — most queries filter on `deleted_at IS NULL`, so the indexes stay lean. Deleted files don't bloat the hot path.
### The `user_file_state` Table
```sql
CREATE TABLE user_file_state (
  user_id TEXT NOT NULL,
  file_id UUID NOT NULL REFERENCES files(id) ON DELETE CASCADE,
  favorited BOOLEAN NOT NULL DEFAULT false,
  last_opened TIMESTAMPTZ,
  PRIMARY KEY (user_id, file_id)
);
```

Per-user state that doesn't belong on the file itself. Favorites and recent files are powered by this table.

---
## Folder Sizes
Folders show their total size (all descendants, recursively). Two PostgreSQL functions handle this.

### `recompute_folder_size(folder_id)`

Recursive CTE that walks all descendants:
```sql
CREATE OR REPLACE FUNCTION recompute_folder_size(folder_id UUID)
RETURNS BIGINT LANGUAGE SQL AS $$
  WITH RECURSIVE descendants AS (
    SELECT id, size, is_folder
    FROM files
    WHERE parent_id = folder_id AND deleted_at IS NULL
    UNION ALL
    SELECT f.id, f.size, f.is_folder
    FROM files f
    JOIN descendants d ON f.parent_id = d.id
    WHERE f.deleted_at IS NULL
  )
  SELECT COALESCE(SUM(size) FILTER (WHERE NOT is_folder), 0)
  FROM descendants;
$$;
```
Sums file sizes only (not folders) to avoid double-counting. Excludes soft-deleted items.
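The same computation can be sketched in-memory in TypeScript, for intuition (illustrative only; the production path runs in PostgreSQL):

```typescript
interface FsNode {
  id: string;
  parentId: string | null;
  size: number;
  isFolder: boolean;
  deletedAt: string | null;
}

// In-memory equivalent of recompute_folder_size: walk all live
// descendants of a folder and sum file sizes only, so nested folders
// never double-count.
function recomputeFolderSize(nodes: FsNode[], folderId: string): number {
  const childrenOf = (pid: string) =>
    nodes.filter((n) => n.parentId === pid && n.deletedAt === null);
  let total = 0;
  const stack = childrenOf(folderId);
  while (stack.length > 0) {
    const n = stack.pop()!;
    if (n.isFolder) stack.push(...childrenOf(n.id));
    else total += n.size;
  }
  return total;
}
```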
### `propagate_folder_sizes(start_parent_id)`

Walks up the ancestor chain, recomputing each folder's size:
```sql
CREATE OR REPLACE FUNCTION propagate_folder_sizes(start_parent_id UUID)
RETURNS VOID LANGUAGE plpgsql AS $$
DECLARE
  current_id UUID := start_parent_id;
  computed BIGINT;
BEGIN
  WHILE current_id IS NOT NULL LOOP
    computed := recompute_folder_size(current_id);
    UPDATE files SET size = computed WHERE id = current_id AND is_folder = true;
    SELECT parent_id INTO current_id FROM files WHERE id = current_id;
  END LOOP;
END;
$$;
```
Called after every file mutation: create, delete, restore, move, upload completion. The handler calls it like:
```typescript
await sql`SELECT propagate_folder_sizes(${parentId}::uuid)`;
```

When a file moves between folders, both the old and new parent chains get recomputed:
```typescript
await propagateFolderSizes(newParentId);
if (oldParentId && oldParentId !== newParentId) {
  await propagateFolderSizes(oldParentId);
}
```
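The net effect of a move can be sketched in-memory (names and data shapes here are illustrative, not the server code):

```typescript
interface Item {
  id: string;
  parentId: string | null;
  size: number;
  isFolder: boolean;
}

// Recursively sum file sizes under a folder (soft deletes omitted in
// this sketch for brevity).
function sumFiles(items: Map<string, Item>, folderId: string): number {
  let total = 0;
  for (const it of items.values()) {
    if (it.parentId === folderId) {
      total += it.isFolder ? sumFiles(items, it.id) : it.size;
    }
  }
  return total;
}

// Mirror of propagate_folder_sizes: recompute the starting folder, then
// climb parentId links until the root.
function propagateFolderSizes(items: Map<string, Item>, startParentId: string | null): void {
  let currentId = startParentId;
  while (currentId !== null) {
    const folder = items.get(currentId);
    if (!folder) break;
    if (folder.isFolder) folder.size = sumFiles(items, currentId);
    currentId = folder.parentId;
  }
}
```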
---
## The Backfill API
S3 and the database can get out of sync — someone uploaded directly to SeaweedFS, a migration didn't finish, a backup restore happened. The backfill API reconciles them.
```
POST /api/admin/backfill
Content-Type: application/json

{
  "prefix": "",
  "dry_run": true
}
```

Both fields are optional. `prefix` filters by S3 key prefix. `dry_run` shows what would happen without writing anything.

Response:
```json
{
  "scanned": 847,
  "already_registered": 812,
  "folders_created": 15,
  "files_created": 20,
  "errors": [],
  "dry_run": false
}
```
### What it does
1. Lists all objects in the bucket (paginated, 1000 at a time)
2. Loads existing `s3_key` values from PostgreSQL into a Set
3. For each S3 object not in the database:
   - Parses the key → owner ID, path, filename
   - Infers mimetype from extension (extensive map in `server/backfill.ts` — documents, images, video, audio, 3D formats, code, archives)
   - `HEAD` on the object for real content-type and size
   - Creates parent folder rows recursively if missing
   - Inserts the file row
4. Recomputes folder sizes for every folder
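Step 3's extension lookup can be sketched like this (a tiny hypothetical subset; the real map in `server/backfill.ts` is far larger and its entries may differ):

```typescript
// Hypothetical subset of an extension → mimetype table.
const MIME_BY_EXT: Record<string, string> = {
  pdf: "application/pdf",
  png: "image/png",
  mp4: "video/mp4",
};

// Fall back to the generic binary type when the extension is unknown.
function inferMimetype(filename: string): string {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  return MIME_BY_EXT[ext] ?? "application/octet-stream";
}
```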
The key parsing handles both conventions:
```typescript
// {identity-id}/my-files/{path} -> owner = identity-id
// shared/{path}                 -> owner = "shared"
```
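A sketch of that parsing (the function name and return shape are illustrative, not the actual `server/backfill.ts` API):

```typescript
// Illustrative parser for the two key conventions described above.
interface ParsedKey {
  ownerId: string;
  path: string; // path within the owner's space, including the filename
}

function parseS3Key(key: string): ParsedKey | null {
  const parts = key.split("/");
  if (parts[0] === "shared") {
    return { ownerId: "shared", path: parts.slice(1).join("/") };
  }
  if (parts[1] === "my-files") {
    return { ownerId: parts[0], path: parts.slice(2).join("/") };
  }
  return null; // key matches neither convention
}
```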
### When to reach for it
- After bulk-uploading files directly to SeaweedFS
- After migrating from another storage system
- After restoring from a backup
- Any time S3 has files the database doesn't know about

The endpoint requires an authenticated session but isn't exposed via ingress — admin-only, on purpose.

---
## S3 Client
The S3 client (`server/s3.ts`) does AWS Signature V4 with Web Crypto. No AWS SDK — it's a lot of dependency for six API calls. Implements:

- `listObjects` — ListObjectsV2 with pagination
- `headObject` — HEAD for content-type and size
- `getObject` — GET (streaming response)
- `putObject` — PUT with content hash
- `deleteObject` — DELETE (404 is not an error)
- `copyObject` — PUT with `x-amz-copy-source` header
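The "404 is not an error" behavior amounts to treating delete as idempotent. A sketch with an injected transport (the names and shapes here are assumptions; the real client is async and signs its requests):

```typescript
// Hypothetical sketch: the transport is a plain injected function so the
// idempotent-delete decision stands alone, without networking or SigV4.
type SendFn = (method: string, key: string) => { status: number };

function deleteObjectTolerant(send: SendFn, key: string): void {
  const res = send("DELETE", key);
  // A 404 means the object is already gone, which is the goal state of a
  // delete, so it counts as success rather than an error.
  if (res.status !== 200 && res.status !== 204 && res.status !== 404) {
    throw new Error(`delete ${key} failed with status ${res.status}`);
  }
}
```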
Pre-signed URLs (`server/s3-presign.ts`) support:

- `presignGetUrl` — download directly from S3
- `presignPutUrl` — upload directly to S3
- `createMultipartUpload` + `presignUploadPart` + `completeMultipartUpload` — large file uploads

Default pre-signed URL expiry is 1 hour.
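The multipart path has to decide how to cut a file into parts. A sketch of that arithmetic (`splitIntoParts` and the 8 MiB part size are assumptions, not the actual uploader code):

```typescript
// Part-splitting arithmetic for multipart uploads. The 8 MiB default is
// an illustrative choice; S3 itself requires parts of at least 5 MiB
// (except the last) and at most 10,000 parts per upload.
const DEFAULT_PART_SIZE = 8 * 1024 * 1024;

interface Part {
  partNumber: number; // 1-based, as S3 expects
  start: number;      // inclusive byte offset
  end: number;        // exclusive byte offset
}

function splitIntoParts(totalSize: number, partSize: number = DEFAULT_PART_SIZE): Part[] {
  const parts: Part[] = [];
  for (let start = 0; start < totalSize; start += partSize) {
    parts.push({
      partNumber: parts.length + 1,
      start,
      end: Math.min(start + partSize, totalSize),
    });
  }
  return parts;
}
```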