# S3 Layout
How files are stored in SeaweedFS, and why you can browse the bucket and actually understand what you're looking at.
---
## Key Convention
S3 keys are human-readable paths. This is intentional, and we'd do it again.
### Personal files
```
{identity-id}/my-files/{path}/{filename}
```
Examples:
```
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/quarterly-report.docx
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/game-prototype/level-01.fbx
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Documents/meeting-notes.odt
```
The identity ID is the Kratos identity UUID. `my-files` is a fixed segment that separates the identity prefix from user content.
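Concretely, building a key is just joining segments; a minimal sketch of the convention (the helper name is illustrative, not taken from the actual codebase):

```typescript
// Hypothetical sketch of the personal-key convention. Builds
// "{identity-id}/my-files/{path}/{filename}", omitting the path segment
// when the file sits at the root of "my-files".
function buildPersonalKey(identityId: string, path: string, filename: string): string {
  const segments = [identityId, "my-files"];
  if (path !== "") segments.push(path);
  segments.push(filename);
  return segments.join("/");
}
```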
### Shared files
```
shared/{path}/{filename}
```
Examples:
```
shared/team-assets/brand-guide.pdf
shared/templates/invoice-template.xlsx
```
Shared files use `"shared"` as the owner ID.
### Folders
Folder keys end with a trailing slash:
```
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/
a1b2c3d4-e5f6-7890-abcd-ef1234567890/my-files/Projects/game-prototype/
```
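Because folders are distinguished purely by the trailing slash, classifying or normalizing a key is a one-line check. A sketch, assuming hypothetical helper names:

```typescript
// Sketch: under this convention, folder keys are exactly those ending in "/".
function isFolderKey(key: string): boolean {
  return key.endsWith("/");
}

// Normalize any key into a folder key by appending the slash if absent.
function toFolderKey(key: string): string {
  return isFolderKey(key) ? key : key + "/";
}
```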
### Why Human-Readable?
Most S3-backed file systems use UUIDs as keys (`files/550e8400-e29b-41d4.bin`). We don't. You should be able to `s3cmd ls` the bucket and immediately see who owns what, what the folder structure looks like, and what the files are.
This pays for itself when debugging, doing backfills, migrating data, or when someone asks "where did my file go?" — you can answer by looking at S3 directly, no database cross-referencing required.
The tradeoff: renames and moves require S3 copy + delete (S3 has no rename operation). The `updateFile` handler in `server/files.ts` handles this:
```typescript
if (newS3Key !== file.s3_key && !file.is_folder && Number(file.size) > 0) {
  await copyObject(file.s3_key, newS3Key);
  await deleteObject(file.s3_key);
}
```
---
## The PostgreSQL Metadata Layer
S3 stores the bytes. PostgreSQL tracks everything else.
### The `files` Table
```sql
CREATE TABLE files (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  s3_key TEXT NOT NULL UNIQUE,
  filename TEXT NOT NULL,
  mimetype TEXT NOT NULL DEFAULT 'application/octet-stream',
  size BIGINT NOT NULL DEFAULT 0,
  owner_id TEXT NOT NULL,
  parent_id UUID REFERENCES files(id) ON DELETE CASCADE,
  is_folder BOOLEAN NOT NULL DEFAULT false,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  deleted_at TIMESTAMPTZ
);
```
Design decisions worth noting:
- **UUID primary key** — API routes use UUIDs, not paths. Decouples the URL space from the S3 key space.
- **`s3_key` is unique** — the bridge between metadata and storage. UUID → s3_key and s3_key → metadata, both directions work.
- **`parent_id` is self-referencing** — folders and files share a table. A folder is a row with `is_folder = true`. The hierarchy is a tree of `parent_id` pointers.
- **Soft delete** — `deleted_at` gets set, the row stays. Trash view queries `deleted_at IS NOT NULL`. Restore clears it.
- **`owner_id` is a text field** — not a FK to a users table, because there is no users table. It's the Kratos identity UUID. We don't duplicate identity data locally.
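To make the soft-delete lifecycle concrete, here is a minimal in-memory model of the `deleted_at` semantics (illustrative only; the real logic is plain SQL updates and filters):

```typescript
// In-memory model of soft delete: deleting sets deleted_at, restoring
// clears it, and the trash view is just a filter on the same rows.
interface FileRow {
  id: string;
  deletedAt: Date | null;
}

function softDelete(row: FileRow): void {
  row.deletedAt = new Date();
}

function restore(row: FileRow): void {
  row.deletedAt = null;
}

// Equivalent of `WHERE deleted_at IS NOT NULL`.
function trashView(rows: FileRow[]): FileRow[] {
  return rows.filter((r) => r.deletedAt !== null);
}
```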
Indexes:
```sql
CREATE INDEX idx_files_parent ON files(parent_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_files_owner ON files(owner_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_files_s3key ON files(s3_key);
```
Partial indexes on `parent_id` and `owner_id` exclude soft-deleted rows — most queries filter on `deleted_at IS NULL`, so the indexes stay lean. Deleted files don't bloat the hot path.
### The `user_file_state` Table
```sql
CREATE TABLE user_file_state (
  user_id TEXT NOT NULL,
  file_id UUID NOT NULL REFERENCES files(id) ON DELETE CASCADE,
  favorited BOOLEAN NOT NULL DEFAULT false,
  last_opened TIMESTAMPTZ,
  PRIMARY KEY (user_id, file_id)
);
```
Per-user state that doesn't belong on the file itself. Favorites and recent files are powered by this table.
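The composite `(user_id, file_id)` primary key guarantees at most one state row per user per file; toggling a favorite is an upsert against that key. An in-memory analogue (hypothetical, for illustration):

```typescript
// In-memory analogue of user_file_state, keyed by (user_id, file_id).
const state = new Map<string, { favorited: boolean; lastOpened: Date | null }>();

function stateKey(userId: string, fileId: string): string {
  return `${userId}\u0000${fileId}`; // NUL separator avoids key collisions
}

// Upsert semantics, like INSERT ... ON CONFLICT (user_id, file_id) DO UPDATE.
function toggleFavorite(userId: string, fileId: string): boolean {
  const key = stateKey(userId, fileId);
  const row = state.get(key) ?? { favorited: false, lastOpened: null };
  row.favorited = !row.favorited;
  state.set(key, row);
  return row.favorited;
}
```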
---
## Folder Sizes
Folders show their total size (all descendants, recursively). Two PostgreSQL functions handle this.
### `recompute_folder_size(folder_id)`
Recursive CTE that walks all descendants:
```sql
CREATE OR REPLACE FUNCTION recompute_folder_size(folder_id UUID)
RETURNS BIGINT LANGUAGE SQL AS $$
  WITH RECURSIVE descendants AS (
    SELECT id, size, is_folder
    FROM files
    WHERE parent_id = folder_id AND deleted_at IS NULL
    UNION ALL
    SELECT f.id, f.size, f.is_folder
    FROM files f
    JOIN descendants d ON f.parent_id = d.id
    WHERE f.deleted_at IS NULL
  )
  SELECT COALESCE(SUM(size) FILTER (WHERE NOT is_folder), 0)
  FROM descendants;
$$;
```
Sums file sizes only (not folders) to avoid double-counting. Excludes soft-deleted items.
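The same computation, sketched over an in-memory tree: a recursive walk that sums leaf sizes only and skips deleted rows, mirroring the CTE (illustrative; the real work happens inside PostgreSQL):

```typescript
// In-memory mirror of recompute_folder_size: sum the sizes of non-folder,
// non-deleted descendants. Folder rows contribute structure, not size,
// which avoids double-counting.
interface FsRow {
  id: string;
  parentId: string | null;
  size: number;
  isFolder: boolean;
  deleted: boolean;
}

function recomputeFolderSize(folderId: string, rows: FsRow[]): number {
  let total = 0;
  for (const row of rows) {
    if (row.parentId !== folderId || row.deleted) continue;
    total += row.isFolder ? recomputeFolderSize(row.id, rows) : row.size;
  }
  return total;
}
```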
### `propagate_folder_sizes(start_parent_id)`
Walks up the ancestor chain, recomputing each folder's size:
```sql
CREATE OR REPLACE FUNCTION propagate_folder_sizes(start_parent_id UUID)
RETURNS VOID LANGUAGE plpgsql AS $$
DECLARE
  current_id UUID := start_parent_id;
  computed BIGINT;
BEGIN
  WHILE current_id IS NOT NULL LOOP
    computed := recompute_folder_size(current_id);
    UPDATE files SET size = computed WHERE id = current_id AND is_folder = true;
    SELECT parent_id INTO current_id FROM files WHERE id = current_id;
  END LOOP;
END;
$$;
```
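The ancestor walk, in the same in-memory terms (again a sketch, not the production code; the parent lookup and recompute callback stand in for the table):

```typescript
// In-memory mirror of propagate_folder_sizes: walk parent pointers to the
// root, recomputing each folder's size along the way. `parentOf` and
// `sizes` stand in for the files table; `recompute` is supplied by the caller.
function propagateFolderSizes(
  startParentId: string | null,
  parentOf: Map<string, string | null>,
  recompute: (folderId: string) => number,
  sizes: Map<string, number>,
): void {
  let currentId = startParentId;
  while (currentId !== null) {
    sizes.set(currentId, recompute(currentId));
    currentId = parentOf.get(currentId) ?? null;
  }
}
```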
Called after every file mutation: create, delete, restore, move, upload completion. The handler calls it like:
```typescript
await sql`SELECT propagate_folder_sizes(${parentId}::uuid)`;
```
When a file moves between folders, both the old and new parent chains get recomputed:
```typescript
await propagateFolderSizes(newParentId);
if (oldParentId && oldParentId !== newParentId) {
  await propagateFolderSizes(oldParentId);
}
```
---
## The Backfill API
S3 and the database can get out of sync — someone uploaded directly to SeaweedFS, a migration didn't finish, a backup restore happened. The backfill API reconciles them.
```
POST /api/admin/backfill
Content-Type: application/json
{
  "prefix": "",
  "dry_run": true
}
```
Both fields are optional. `prefix` filters by S3 key prefix. `dry_run` shows what would happen without writing anything.
Response:
```json
{
  "scanned": 847,
  "already_registered": 812,
  "folders_created": 15,
  "files_created": 20,
  "errors": [],
  "dry_run": false
}
```
### What it does
1. Lists all objects in the bucket (paginated, 1000 at a time)
2. Loads existing `s3_key` values from PostgreSQL into a Set
3. For each S3 object not in the database:
- Parses the key → owner ID, path, filename
- Infers mimetype from extension (extensive map in `server/backfill.ts` — documents, images, video, audio, 3D formats, code, archives)
- `HEAD` on the object for real content-type and size
- Creates parent folder rows recursively if missing
- Inserts the file row
4. Recomputes folder sizes for every folder
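At its core, steps 2–3 are a set difference between S3 keys and registered keys; a sketch of that core (the function name is hypothetical):

```typescript
// Sketch of the backfill core: objects present in S3 but absent from the
// database are exactly the ones that need metadata rows created.
function findUnregistered(s3Keys: string[], registered: Set<string>): string[] {
  return s3Keys.filter((key) => !registered.has(key));
}
```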
The key parsing handles both conventions:
```typescript
// {identity-id}/my-files/{path} -> owner = identity-id
// shared/{path} -> owner = "shared"
```
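A sketch of that parsing (helper name and return shape are illustrative, not the actual `server/backfill.ts` signature):

```typescript
// Parses both key conventions:
//   {identity-id}/my-files/{path}/{filename} -> owner = identity-id
//   shared/{path}/{filename}                 -> owner = "shared"
function parseKey(key: string): { owner: string; path: string; filename: string } {
  const parts = key.split("/");
  const owner = parts[0];
  // Personal keys carry the fixed "my-files" segment after the owner;
  // shared keys have no such segment.
  const rest = owner === "shared" ? parts.slice(1) : parts.slice(2);
  return {
    owner,
    path: rest.slice(0, -1).join("/"),
    filename: rest[rest.length - 1],
  };
}
```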
### When to reach for it
- After bulk-uploading files directly to SeaweedFS
- After migrating from another storage system
- After restoring from a backup
- Any time S3 has files the database doesn't know about
The endpoint requires an authenticated session but isn't exposed via ingress — admin-only, on purpose.
---
## S3 Client
The S3 client (`server/s3.ts`) does AWS Signature V4 with Web Crypto. No AWS SDK — it's a lot of dependency for six API calls. Implements:
- `listObjects` — ListObjectsV2 with pagination
- `headObject` — HEAD for content-type and size
- `getObject` — GET (streaming response)
- `putObject` — PUT with content hash
- `deleteObject` — DELETE (404 is not an error)
- `copyObject` — PUT with `x-amz-copy-source` header
Pre-signed URLs (`server/s3-presign.ts`) support:
- `presignGetUrl` — download directly from S3
- `presignPutUrl` — upload directly to S3
- `createMultipartUpload` + `presignUploadPart` + `completeMultipartUpload` — large file uploads
Default pre-signed URL expiry is 1 hour.