# RFC 0002: Persistence Strategy for Battery-Efficient State Management

**Status:** Draft
**Authors:** Sienna
**Created:** 2025-11-15
**Related:** RFC 0001 (CRDT Sync Protocol)

## Abstract

This RFC defines a persistence strategy that balances data durability with battery efficiency for mobile platforms (iPad). The core challenge: Bevy runs at 60fps and generates continuous state changes, but we can't write to SQLite on every frame without destroying battery life and flash storage.

## The Problem

**Naive approach (bad)**:

```rust
fn sync_to_db_system(query: Query<&NetworkedEntity, Changed<Transform>>) {
    for entity in query.iter() {
        db.execute("UPDATE components SET data = ? WHERE entity_id = ?", ...)?;
        // This runs 60 times per second!
        // iPad battery: 💀
    }
}
```

**Why this is terrible**:
- SQLite writes trigger `fsync()` syscalls (flush to physical storage)
- Each `fsync()` on iOS can take 5-20ms and drains battery significantly
- At 60fps with multiple entities, we'd be doing hundreds of disk writes per second
- Flash wear: mobile devices have limited write cycles
- User moves object around → hundreds of unnecessary writes of intermediate positions

## Requirements

1. **Survive crashes**: If the app crashes, the user shouldn't lose more than a few seconds of work
2. **Battery efficient**: Minimize disk I/O, especially `fsync()` calls
3. **Flash-friendly**: Reduce write amplification on mobile storage
4. **Low latency**: Persistence shouldn't block rendering or input
5. **Recoverable**: On startup, we should be able to reconstruct recent state

## Categorizing Data by Persistence Needs

Not all data is equal. We need to categorize by how critical immediate persistence is:

### Tier 1: Critical State (Persist Immediately)

**What**: State that's hard or impossible to reconstruct if lost
- User-created entities (the fact that they exist)
- Operation log entries (for CRDT sync)
- Vector clock state (for causality tracking)
- Document metadata (name, creation time, etc.)

**Why**: These are the "source of truth" - if we lose them, data is gone

**Strategy**: Write to database within ~1 second of creation, but still batched

### Tier 2: Derived State (Defer and Batch)

**What**: State that can be reconstructed or is constantly changing
- Entity positions during drag operations
- Transform components (position, rotation, scale)
- UI state (selected items, viewport position)
- Temporary drawing strokes in progress

**Why**: These change rapidly and the intermediate states aren't valuable

**Strategy**: Batch writes, flush every 5-10 seconds or on specific events

### Tier 3: Ephemeral State (Never Persist)

**What**: State that only matters during the current session
- Remote peer cursors
- Presence indicators (who's online)
- Network connection status
- Frame-rate metrics

**Why**: These are meaningless after restart

**Strategy**: Keep in-memory only (Bevy resources, not components)

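One way to make the tier policy explicit in code - a hypothetical sketch, not part of the current implementation - is an enum that maps each tier to its maximum flush delay:

```rust
use std::time::Duration;

/// Persistence tiers from this RFC (names are illustrative, not final API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PersistenceTier {
    Critical,  // Tier 1: oplog, vector clocks, entity existence
    Derived,   // Tier 2: transforms, UI state
    Ephemeral, // Tier 3: cursors, presence
}

impl PersistenceTier {
    /// Maximum time a change in this tier may sit unflushed.
    /// `None` means the data is never persisted.
    fn max_flush_delay(self) -> Option<Duration> {
        match self {
            PersistenceTier::Critical => Some(Duration::from_secs(1)),
            PersistenceTier::Derived => Some(Duration::from_secs(10)),
            PersistenceTier::Ephemeral => None,
        }
    }
}
```

Tagging each `PersistenceOp` with a tier like this would let the write buffer compute a single earliest deadline instead of special-casing "critical" writes.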
## Write Strategy: The Three-Buffer System

We use a three-tier approach to minimize disk writes while maintaining durability:

### Layer 1: In-Memory Dirty Tracking (0ms latency)

Bevy change detection marks components as dirty, but we don't write immediately. Instead, we maintain a dirty set:

```rust
#[derive(Resource)]
struct DirtyEntities {
    // Entities with changes not yet in write buffer
    entities: HashSet<Uuid>,
    components: HashMap<Uuid, HashSet<String>>, // entity → dirty component types
    last_modified: HashMap<Uuid, Instant>,      // when was it last changed
}
```

**Update frequency**: Every frame (cheap - just memory operations)

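The per-frame bookkeeping is pure memory writes. A dependency-free sketch of how a system might record a change (using `u64` in place of `Uuid` so the example stands alone):

```rust
use std::collections::{HashMap, HashSet};
use std::time::Instant;

type EntityId = u64; // stand-in for Uuid in this sketch

#[derive(Default)]
struct DirtyEntities {
    entities: HashSet<EntityId>,
    components: HashMap<EntityId, HashSet<String>>,
    last_modified: HashMap<EntityId, Instant>,
}

impl DirtyEntities {
    /// Record that `component` on `entity` changed this frame.
    /// Pure in-memory bookkeeping: no I/O, safe to call at 60fps.
    fn mark(&mut self, entity: EntityId, component: &str) {
        self.entities.insert(entity);
        self.components
            .entry(entity)
            .or_default()
            .insert(component.to_string());
        self.last_modified.insert(entity, Instant::now());
    }
}
```

Marking the same entity+component twice in one flush window is idempotent - the set absorbs it, which is what makes this layer cheap.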
### Layer 2: Write Buffer (100ms-1s batching)

Periodically (every 100ms-1s), we collect dirty entities and prepare a write batch:

```rust
#[derive(Resource)]
struct WriteBuffer {
    // Pending writes not yet committed to SQLite
    pending_operations: Vec<PersistenceOp>,
    last_flush: Instant,
}

enum PersistenceOp {
    UpsertEntity { id: Uuid, data: EntityData },
    UpsertComponent { entity_id: Uuid, component_type: String, data: Vec<u8> },
    LogOperation { node_id: NodeId, seq: u64, op: Vec<u8> },
    UpdateVectorClock { node_id: NodeId, counter: u64 },
}
```

**Update frequency**: Every 100ms-1s (configurable based on battery level)

**Strategy**: Accumulate operations in memory, then batch-write them

### Layer 3: SQLite with WAL Mode (5-10s commit interval)

The write buffer is flushed to SQLite, but we don't call `fsync()` immediately. Instead, we use WAL mode and control checkpoint timing:

```sql
-- Enable Write-Ahead Logging
PRAGMA journal_mode = WAL;

-- Don't auto-checkpoint on every transaction
PRAGMA wal_autocheckpoint = 0;

-- Synchronous = NORMAL (fsync WAL on commit, but not every write)
PRAGMA synchronous = NORMAL;
```

**Update frequency**: Manual checkpoints every 5-10 seconds (or on specific events)

## Flush Events: When to Force Persistence

Certain events require immediate persistence (within 1 second):

### 1. Entity Creation

When the user creates a new entity, we need to persist its existence quickly:
- Add to write buffer immediately
- Trigger flush within 1 second

### 2. Major User Actions

Actions that represent "savepoints" in the user's mental model:
- Finishing a drawing stroke (stroke start → immediate, intermediate points → batched, stroke end → flush)
- Deleting entities
- Changing document metadata
- Undo/redo operations

### 3. Application State Transitions

State changes that might precede app termination:
- App going to background (iOS `applicationWillResignActive`)
- Low memory warning
- User explicitly saving (if we have a save button)
- Switching documents/workspaces

### 4. Network Events

Sync protocol events that need persistence:
- Receiving operation log entries from peers
- Vector clock updates (every 5 operations or 5 seconds, whichever comes first)

### 5. Periodic Background Flush

Even if no major events happen:
- Flush every 10 seconds during active use
- Flush every 30 seconds when idle (no user input for >1 minute)

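The idle/active rule above can be sketched as a pure function (names are illustrative; the real system would combine this with battery state):

```rust
use std::time::Duration;

/// Periodic background flush interval: 10s during active use,
/// 30s once the user has been idle for more than a minute.
fn background_flush_interval(time_since_last_input: Duration) -> Duration {
    if time_since_last_input > Duration::from_secs(60) {
        Duration::from_secs(30) // idle
    } else {
        Duration::from_secs(10) // active
    }
}
```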
## Battery-Adaptive Flushing

Different flush strategies based on battery level:

```rust
fn get_flush_interval(battery_level: f32, is_charging: bool) -> Duration {
    if is_charging {
        Duration::from_secs(5) // Aggressive - power available
    } else if battery_level > 0.5 {
        Duration::from_secs(10) // Normal
    } else if battery_level > 0.2 {
        Duration::from_secs(30) // Conservative
    } else {
        Duration::from_secs(60) // Very conservative - low battery
    }
}
```

**On iOS**: Use `UIDevice.current.batteryLevel` and `UIDevice.current.batteryState`

## SQLite Optimizations for Mobile

### Transaction Batching

Group multiple writes into a single transaction:

```rust
async fn flush_write_buffer(buffer: &WriteBuffer, db: &Connection) -> Result<()> {
    let tx = db.transaction()?;

    // All writes in one transaction
    for op in &buffer.pending_operations {
        match op {
            PersistenceOp::UpsertEntity { id, data } => {
                tx.execute("INSERT OR REPLACE INTO entities (...) VALUES (...)", ...)?;
            }
            PersistenceOp::UpsertComponent { entity_id, component_type, data } => {
                tx.execute("INSERT OR REPLACE INTO components (...) VALUES (...)", ...)?;
            }
            // ...
        }
    }

    tx.commit()?; // Single fsync for entire batch
    Ok(())
}
```

**Impact**: 100 individual writes = 100 fsyncs. 1 transaction with 100 writes = 1 fsync.

### WAL Mode Checkpoint Control

```rust
async fn checkpoint_wal(db: &Connection) -> Result<()> {
    // Manually checkpoint WAL to database file
    db.execute("PRAGMA wal_checkpoint(PASSIVE)", [])?;
    Ok(())
}
```

**PASSIVE checkpoint**: Doesn't block readers, syncs when possible

**When to checkpoint**: Every 10 seconds, or when WAL exceeds 1MB

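The checkpoint rule can be captured in a small predicate (sketch; function and parameter names are assumptions):

```rust
use std::time::Duration;

/// Decide whether to run a PASSIVE checkpoint, per the rule above:
/// every 10 seconds, or as soon as the WAL exceeds 1 MB.
fn should_checkpoint(since_last: Duration, wal_size_bytes: u64) -> bool {
    since_last >= Duration::from_secs(10) || wal_size_bytes > 1_000_000
}
```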
### Index Strategy

Be selective about indexes - they increase write cost:

```sql
-- Only index what we actually query frequently
CREATE INDEX idx_components_entity ON components(entity_id);
CREATE INDEX idx_oplog_node_seq ON operation_log(node_id, sequence_number);

-- DON'T index everything just because we can
-- Every index = extra writes on every INSERT/UPDATE
```

### Page Size Optimization

```sql
-- Larger page size = fewer I/O operations for sequential writes
-- Default is 4KB, but 8KB or 16KB can be better for mobile
PRAGMA page_size = 8192;
```

**Caveat**: Must be set before the database is created (or VACUUM to rebuild)

## Recovery Strategy

What happens if the app crashes before flush?

### What We Lose

**Worst case**: Up to 10 seconds of component updates (positions, transforms)

**What we DON'T lose**:
- Entity existence (flushed within 1 second of creation)
- Operation log entries (flushed with vector clock updates)
- Any data from before the last checkpoint

### Recovery on Startup

```mermaid
graph TB
    A[App Starts] --> B[Open SQLite]
    B --> C{Check WAL file}
    C -->|WAL exists| D[Recover from WAL]
    C -->|No WAL| E[Load from main DB]
    D --> F[Load entities from DB]
    E --> F
    F --> G[Load operation log]
    G --> H[Rebuild vector clock]
    H --> I[Connect to gossip]
    I --> J[Request sync from peers]
    J --> K[Fill any gaps via anti-entropy]
    K --> L[Fully recovered]
```

**Key insight**: Even if we lose local state, gossip sync repairs it. Peers send us missing operations.

### Crash Detection

On startup, detect if the previous session crashed:

```sql
CREATE TABLE session_state (
    key TEXT PRIMARY KEY,
    value TEXT
);

-- On startup, check if previous session closed cleanly
SELECT value FROM session_state WHERE key = 'clean_shutdown';

-- If not found or 'false', we crashed
-- Trigger recovery procedures

-- Then mark the new session as not-yet-clean;
-- flipped to 'true' when we shut down or background cleanly
INSERT OR REPLACE INTO session_state VALUES ('clean_shutdown', 'false');
```

## Platform-Specific Concerns

### iOS / iPadOS

**Background app suspension**: iOS aggressively suspends apps. We have ~5 seconds when moving to background:

```rust
// When app moves to background: force a flush, then mark clean shutdown
async fn handle_background_event(db: &Connection) -> Result<()> {
    // Force immediate flush
    flush_write_buffer().await?;
    checkpoint_wal(db).await?;

    // Mark clean shutdown
    db.execute("INSERT OR REPLACE INTO session_state VALUES ('clean_shutdown', 'true')", [])?;
    Ok(())
}
```

**Low Power Mode**: Detect and reduce flush frequency:

```swift
// iOS-specific detection
if ProcessInfo.processInfo.isLowPowerModeEnabled {
    set_flush_interval(Duration::from_secs(60));
}
```

### Desktop (macOS/Linux/Windows)

More relaxed constraints:
- Battery life less critical on plugged-in desktops
- Can use more aggressive flush intervals (every 5 seconds)
- Larger WAL sizes acceptable (up to 10MB before checkpoint)

## Monitoring & Metrics

Track these metrics to tune persistence:

```rust
struct PersistenceMetrics {
    // Write volume
    total_writes: u64,
    bytes_written: u64,

    // Timing
    flush_count: u64,
    avg_flush_duration: Duration,
    checkpoint_count: u64,
    avg_checkpoint_duration: Duration,

    // WAL health
    wal_size_bytes: u64,
    max_wal_size_bytes: u64,

    // Recovery
    crash_recovery_count: u64,
    clean_shutdown_count: u64,
}
```

**Alerts**:
- Flush duration >50ms (disk might be slow or overloaded)
- WAL size >5MB (checkpoint more frequently)
- Crash recovery rate >10% (need more aggressive flushing)

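The alert thresholds can be checked mechanically. A sketch, assuming metrics are sampled periodically (names illustrative):

```rust
use std::time::Duration;

/// Evaluate the alert thresholds above against collected metrics.
fn persistence_alerts(
    avg_flush_duration: Duration,
    wal_size_bytes: u64,
    crash_recovery_count: u64,
    clean_shutdown_count: u64,
) -> Vec<&'static str> {
    let mut alerts = Vec::new();
    if avg_flush_duration > Duration::from_millis(50) {
        alerts.push("flush slow: disk might be slow or overloaded");
    }
    if wal_size_bytes > 5 * 1024 * 1024 {
        alerts.push("WAL too large: checkpoint more frequently");
    }
    let total_sessions = crash_recovery_count + clean_shutdown_count;
    if total_sessions > 0 && crash_recovery_count * 10 > total_sessions {
        alerts.push("crash recovery rate >10%: flush more aggressively");
    }
    alerts
}
```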
## Write Coalescing: Deduplication

When the same entity is modified multiple times before flush, we only keep the latest:

```rust
fn add_to_write_buffer(op: PersistenceOp, buffer: &mut WriteBuffer) {
    if let PersistenceOp::UpsertComponent { entity_id, component_type, .. } = &op {
        // Remove any existing pending write for this entity+component
        buffer.pending_operations.retain(|existing_op| {
            !matches!(existing_op,
                PersistenceOp::UpsertComponent {
                    entity_id: e_id,
                    component_type: c_type,
                    ..
                } if e_id == entity_id && c_type == component_type
            )
        });
    }

    // Add the new one (latest state); other op kinds are appended as-is
    buffer.pending_operations.push(op);
}
```

**Impact**: User drags object for 5 seconds @ 60fps = 300 transform updates → coalesced to 1 write

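A stripped-down demonstration of the coalescing effect (types pared down from `PersistenceOp` so the example stands alone):

```rust
/// Simplified pending write: repeated updates to the same
/// (entity, component) pair collapse to a single entry.
struct Upsert {
    entity_id: u64,
    component_type: &'static str,
    data: Vec<u8>,
}

fn coalesce(pending: &mut Vec<Upsert>, op: Upsert) {
    // Drop any stale write for the same entity+component, keep the newest.
    pending.retain(|p| !(p.entity_id == op.entity_id && p.component_type == op.component_type));
    pending.push(op);
}
```

Feeding 300 per-frame transform updates through `coalesce` leaves exactly one pending write holding the final value.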
## Persistence vs Sync: Division of Responsibility

Important distinction:

**Persistence layer** (this RFC):
- Writes to local SQLite
- Optimized for durability and battery life
- Only cares about local state survival

**Sync layer** (RFC 0001):
- Broadcasts operations via gossip
- Maintains operation log for anti-entropy
- Ensures eventual consistency across peers

**Key insight**: These operate independently. An operation can be:
1. Logged to operation log (for sync) - happens immediately
2. Applied to ECS (for rendering) - happens immediately
3. Persisted to SQLite (for durability) - happens on flush schedule

If local state is lost due to delayed flush, the sync layer repairs it from peers.

## Configuration Schema

Expose configuration for tuning:

```toml
[persistence]
# Base flush interval (may be adjusted by battery level)
flush_interval_secs = 10

# Max time to defer critical writes (entity creation, etc.)
critical_flush_delay_ms = 1000

# WAL checkpoint interval
checkpoint_interval_secs = 30

# Max WAL size before forced checkpoint
max_wal_size_mb = 5

# Adaptive flushing based on battery
battery_adaptive = true

# Flush intervals per battery tier
[persistence.battery_tiers]
charging = 5
high = 10     # >50%
medium = 30   # 20-50%
low = 60      # <20%

# Platform overrides
[persistence.ios]
background_flush_timeout_secs = 5
low_power_mode_interval_secs = 60
```

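On the Rust side, this configuration might mirror a struct like the following (hypothetical sketch; a real implementation would likely derive `serde::Deserialize` from the TOML):

```rust
use std::time::Duration;

/// Rust-side mirror of the `[persistence]` table above (names illustrative).
struct PersistenceConfig {
    flush_interval_secs: u64,
    battery_adaptive: bool,
    tier_charging_secs: u64,
    tier_high_secs: u64,   // >50%
    tier_medium_secs: u64, // 20-50%
    tier_low_secs: u64,    // <20%
}

impl PersistenceConfig {
    /// Effective flush interval for the current battery state,
    /// matching the tier boundaries documented in the TOML.
    fn effective_flush_interval(&self, battery_level: f32, is_charging: bool) -> Duration {
        if !self.battery_adaptive {
            return Duration::from_secs(self.flush_interval_secs);
        }
        let secs = if is_charging {
            self.tier_charging_secs
        } else if battery_level > 0.5 {
            self.tier_high_secs
        } else if battery_level > 0.2 {
            self.tier_medium_secs
        } else {
            self.tier_low_secs
        };
        Duration::from_secs(secs)
    }
}
```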
## Example System Implementation

```rust
fn persistence_system(
    mut dirty: ResMut<DirtyEntities>,
    mut write_buffer: ResMut<WriteBuffer>,
    db: Res<DatabaseConnection>,
    battery: Res<BatteryStatus>,
    query: Query<(Entity, &NetworkedEntity, &Transform /* , other components */)>,
) {
    // Step 1: Check if it's time to collect dirty entities
    let flush_interval = get_flush_interval(battery.level, battery.is_charging);

    if write_buffer.last_flush.elapsed() < flush_interval {
        return; // Not time yet
    }

    // Step 2: Collect dirty entities into write buffer
    for entity_uuid in &dirty.entities {
        if let Some((_entity, _net_entity, transform)) =
            query.iter().find(|(_, ne, _)| ne.network_id == *entity_uuid)
        {
            // Serialize component (skip on failure; systems can't use `?`)
            let Ok(transform_data) = bincode::serialize(transform) else { continue; };

            // Add to write buffer (coalescing happens here)
            write_buffer.add(PersistenceOp::UpsertComponent {
                entity_id: *entity_uuid,
                component_type: "Transform".to_string(),
                data: transform_data,
            });
        }
    }

    // Step 3: Flush write buffer to SQLite (async, non-blocking)
    if !write_buffer.pending_operations.is_empty() {
        let ops = std::mem::take(&mut write_buffer.pending_operations);
        let db = db.clone(); // assumes the connection handle is cheaply cloneable

        // Spawn blocking task to write to SQLite off the main thread
        spawn_blocking(move || flush_to_sqlite(&ops, &db));

        write_buffer.last_flush = Instant::now();
    }

    // Step 4: Clear dirty tracking (changes are now in write buffer/SQLite)
    dirty.entities.clear();
}
```

## Trade-offs and Decisions

### Why WAL Mode?

**Alternatives**:
- DELETE mode (traditional journaling)
- MEMORY mode (no durability)

**Decision**: WAL mode because:
- Better write concurrency (readers don't block writers)
- Fewer `fsync()` calls (only on checkpoint)
- Better crash recovery (WAL can be replayed)

### Why Not Use a Dirty Flag on Components?

We could mark components with a `#[derive(Dirty)]` flag, but:
- Bevy's `Changed<T>` already gives us change detection for free
- A separate dirty flag adds memory overhead
- We'd need to manually clear flags after persistence

**Decision**: Use Bevy's change detection + our own dirty tracking resource

### Why Not Use a Separate Persistence Thread?

We could run SQLite writes on a dedicated thread:

**Pros**: Never blocks main thread

**Cons**: More complex synchronization, harder to guarantee flush order

**Decision**: Use `spawn_blocking` from the async runtime (Tokio). Simpler, good enough.

## Open Questions

1. **Write ordering**: Do we need to guarantee operation log entries are persisted before entity state? Or can they be out of order?
2. **Compression**: Should we compress component data before writing to SQLite? Trade-off: CPU vs I/O
3. **Memory limits**: On an iPad with 2GB RAM, how large can the write buffer grow before we force a flush?

## Success Criteria

We'll know this is working when:
- [ ] App can run for 30 minutes with <5% battery drain attributed to persistence
- [ ] Crash recovery loses <10 seconds of work
- [ ] No perceptible frame drops during flush operations
- [ ] SQLite file size grows linearly with user data, not explosively
- [ ] WAL checkpoints complete in <100ms

## Implementation Phases

1. **Phase 1**: Basic in-memory dirty tracking + batched writes
2. **Phase 2**: WAL mode + manual checkpoint control
3. **Phase 3**: Battery-adaptive flushing
4. **Phase 4**: iOS background handling
5. **Phase 5**: Monitoring and tuning based on metrics

## References

- [SQLite WAL Mode](https://www.sqlite.org/wal.html)
- [iOS Background Execution](https://developer.apple.com/documentation/uikit/app_and_environment/scenes/preparing_your_ui_to_run_in_the_background)
- [Bevy Change Detection](https://docs.rs/bevy/latest/bevy/ecs/change_detection/)

# RFCs

Request for Comments (RFCs) for major design decisions in the Lonni project.

## Active RFCs

- [RFC 0001: CRDT Synchronization Protocol over iroh-gossip](./0001-crdt-gossip-sync.md) - Draft
- [RFC 0002: Persistence Strategy for Battery-Efficient State Management](./0002-persistence-strategy.md) - Draft

## RFC Process

1. **Draft**: Initial proposal, open for discussion
2. **Review**: Team reviews and provides feedback
3. **Accepted**: Approved for implementation
4. **Implemented**: Design has been built
5. **Superseded**: Replaced by a newer RFC

RFCs are living documents - they can be updated as we learn during implementation.

## When to Write an RFC

Write an RFC when:
- Making architectural decisions that affect multiple parts of the system
- Choosing between significantly different approaches
- Introducing new protocols or APIs
- Making breaking changes

Don't write an RFC for:
- Small bug fixes
- Minor refactors
- Isolated feature additions
- Experimental prototypes

## RFC Format

- **Narrative first**: Tell the story of why and how
- **Explain trade-offs**: What alternatives were considered?
- **API examples**: Show how it would be used (not full implementations)
- **Open questions**: What's still unclear?
- **Success criteria**: How do we know it works?