# RFC 0002: Persistence Strategy for Battery-Efficient State Management

**Status:** Draft

**Authors:** Sienna

**Created:** 2025-11-15

**Related:** RFC 0001 (CRDT Sync Protocol)

## Abstract

This RFC defines a persistence strategy that balances data durability with battery efficiency for mobile platforms (iPad). The core challenge: Bevy runs at 60fps and generates continuous state changes, but we can't write to SQLite on every frame without destroying battery life and flash storage.

## The Problem

**Naive approach (bad)**:

```rust
fn sync_to_db_system(
    db: Res<DatabaseConnection>,
    query: Query<&NetworkedEntity, Changed<Transform>>,
) {
    for entity in query.iter() {
        // This runs up to 60 times per second, per changed entity!
        // iPad battery: 💀
        db.execute("UPDATE components SET data = ? WHERE entity_id = ?", ...);
    }
}
```

**Why this is terrible**:

- SQLite writes trigger `fsync()` syscalls (flush to physical storage)
- Each `fsync()` on iOS can take 5-20ms and drains battery significantly
- At 60fps with multiple entities, we'd be doing hundreds of disk writes per second
- Flash wear: mobile devices have limited write cycles
- User moves object around → hundreds of unnecessary writes of intermediate positions

## Requirements

1. **Survive crashes**: If the app crashes, the user shouldn't lose more than a few seconds of work
2. **Battery efficient**: Minimize disk I/O, especially `fsync()` calls
3. **Flash-friendly**: Reduce write amplification on mobile storage
4. **Low latency**: Persistence shouldn't block rendering or input
5. **Recoverable**: On startup, we should be able to reconstruct recent state

## Categorizing Data by Persistence Needs

Not all data is equal. We need to categorize by how critical immediate persistence is:

### Tier 1: Critical State (Persist Immediately)

**What**: State that's hard or impossible to reconstruct if lost

- User-created entities (the fact that they exist)
- Operation log entries (for CRDT sync)
- Vector clock state (for causality tracking)
- Document metadata (name, creation time, etc.)

**Why**: These are the "source of truth" - if we lose them, data is gone

**Strategy**: Write to database within ~1 second of creation, but still batched

### Tier 2: Derived State (Defer and Batch)

**What**: State that can be reconstructed or is constantly changing

- Entity positions during drag operations
- Transform components (position, rotation, scale)
- UI state (selected items, viewport position)
- Temporary drawing strokes in progress

**Why**: These change rapidly and the intermediate states aren't valuable

**Strategy**: Batch writes, flush every 5-10 seconds or on specific events

### Tier 3: Ephemeral State (Never Persist)

**What**: State that only matters during the current session

- Remote peer cursors
- Presence indicators (who's online)
- Network connection status
- Frame-rate metrics

**Why**: These are meaningless after restart

**Strategy**: Keep in-memory only (Bevy resources, not components)
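
The tiering above can be made concrete as a small classifier that the write path branches on. This is an illustrative, std-only sketch; the component-type names below are stand-in assumptions, not our actual schema:

```rust
#[derive(Debug, PartialEq)]
enum PersistenceTier {
    Critical,  // Tier 1: flush within ~1 second
    Derived,   // Tier 2: batch, flush every 5-10 seconds
    Ephemeral, // Tier 3: never persisted
}

// Hypothetical mapping from component-type names to tiers.
fn tier_for(component_type: &str) -> PersistenceTier {
    match component_type {
        "EntityCreated" | "OperationLog" | "VectorClock" | "DocumentMetadata" => {
            PersistenceTier::Critical
        }
        "Transform" | "UiState" | "StrokeInProgress" => PersistenceTier::Derived,
        // Cursors, presence, connection status, metrics, ...
        _ => PersistenceTier::Ephemeral,
    }
}
```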

## Write Strategy: The Three-Buffer System

We use a three-tier approach to minimize disk writes while maintaining durability:

### Layer 1: In-Memory Dirty Tracking (0ms latency)

Bevy change detection marks components as dirty, but we don't write immediately. Instead, we maintain a dirty set:

```rust
#[derive(Resource)]
struct DirtyEntities {
    // Entities with changes not yet in write buffer
    entities: HashSet<Uuid>,
    components: HashMap<Uuid, HashSet<String>>, // entity → dirty component types
    last_modified: HashMap<Uuid, Instant>,      // when was it last changed
}
```

**Update frequency**: Every frame (cheap - just memory operations)
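
The per-frame bookkeeping amounts to a few hash-map writes per changed component. A std-only sketch of the `mark` operation (the struct name is illustrative, and `Uuid` is stood in by `u128` so the example is self-contained):

```rust
use std::collections::{HashMap, HashSet};
use std::time::Instant;

#[derive(Default)]
struct DirtyTracker {
    entities: HashSet<u128>,
    components: HashMap<u128, HashSet<String>>,
    last_modified: HashMap<u128, Instant>,
}

impl DirtyTracker {
    // Called once per changed component per frame: pure memory writes, no I/O.
    fn mark(&mut self, entity: u128, component: &str) {
        self.entities.insert(entity);
        self.components
            .entry(entity)
            .or_default()
            .insert(component.to_owned());
        self.last_modified.insert(entity, Instant::now());
    }
}
```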

### Layer 2: Write Buffer (100ms-1s batching)

Periodically (every 100ms-1s), we collect dirty entities and prepare a write batch:

```rust
#[derive(Resource)]
struct WriteBuffer {
    // Pending writes not yet committed to SQLite
    pending_operations: Vec<PersistenceOp>,
    last_flush: Instant,
}

enum PersistenceOp {
    UpsertEntity { id: Uuid, data: EntityData },
    UpsertComponent { entity_id: Uuid, component_type: String, data: Vec<u8> },
    LogOperation { node_id: NodeId, seq: u64, op: Vec<u8> },
    UpdateVectorClock { node_id: NodeId, counter: u64 },
}
```

**Update frequency**: Every 100ms-1s (configurable based on battery level)

**Strategy**: Accumulate operations in memory, then batch-write them

### Layer 3: SQLite with WAL Mode (5-10s commit interval)

The write buffer is flushed to SQLite, but we don't call `fsync()` immediately. Instead, we use WAL mode and control checkpoint timing:

```sql
-- Enable Write-Ahead Logging
PRAGMA journal_mode = WAL;

-- Don't auto-checkpoint on every transaction
PRAGMA wal_autocheckpoint = 0;

-- NORMAL in WAL mode: the WAL is synced at checkpoints, not on every commit
PRAGMA synchronous = NORMAL;
```

**Update frequency**: Manual checkpoints every 5-10 seconds (or on specific events)

## Flush Events: When to Force Persistence

Certain events require immediate persistence (within 1 second):

### 1. Entity Creation

When the user creates a new entity, we need to persist its existence quickly:

- Add to write buffer immediately
- Trigger flush within 1 second

### 2. Major User Actions

Actions that represent "savepoints" in the user's mental model:

- Finishing a drawing stroke (stroke start → immediate, intermediate points → batched, stroke end → flush)
- Deleting entities
- Changing document metadata
- Undo/redo operations

### 3. Application State Transitions

State changes that might precede app termination:

- App going to background (iOS `applicationWillResignActive`)
- Low memory warning
- User explicitly saving (if we have a save button)
- Switching documents/workspaces

### 4. Network Events

Sync protocol events that need persistence:

- Receiving operation log entries from peers
- Vector clock updates (every 5 operations or 5 seconds, whichever comes first)

### 5. Periodic Background Flush

Even if no major events happen:

- Flush every 10 seconds during active use
- Flush every 30 seconds when idle (no user input for >1 minute)
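
Taken together, the triggers above reduce to a single predicate evaluated each tick. A sketch under assumed names (`FlushState` and its fields are illustrative, not our actual types):

```rust
use std::time::Duration;

struct FlushState {
    entering_background: bool, // applicationWillResignActive, low memory, etc.
    has_critical_pending: bool, // entity creation, deletes, metadata, undo/redo
    time_since_flush: Duration,
    idle: bool, // no user input for >1 minute
}

fn should_flush(s: &FlushState) -> bool {
    if s.entering_background {
        return true; // the app may be suspended at any moment
    }
    if s.has_critical_pending && s.time_since_flush >= Duration::from_secs(1) {
        return true; // Tier 1 data must reach disk within ~1 second
    }
    // Periodic background flush: 10s active, 30s idle
    let interval = if s.idle {
        Duration::from_secs(30)
    } else {
        Duration::from_secs(10)
    };
    s.time_since_flush >= interval
}
```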

## Battery-Adaptive Flushing

Different flush strategies based on battery level:

```rust
fn get_flush_interval(battery_level: f32, is_charging: bool) -> Duration {
    if is_charging {
        Duration::from_secs(5) // Aggressive - power available
    } else if battery_level > 0.5 {
        Duration::from_secs(10) // Normal
    } else if battery_level > 0.2 {
        Duration::from_secs(30) // Conservative
    } else {
        Duration::from_secs(60) // Very conservative - low battery
    }
}
```

**On iOS**: Use `UIDevice.current.batteryLevel` and `UIDevice.current.batteryState`

## SQLite Optimizations for Mobile

### Transaction Batching

Group multiple writes into a single transaction:

```rust
fn flush_write_buffer(buffer: &WriteBuffer, db: &mut Connection) -> Result<()> {
    let tx = db.transaction()?;

    // All writes in one transaction
    for op in &buffer.pending_operations {
        match op {
            PersistenceOp::UpsertEntity { id, data } => {
                tx.execute("INSERT OR REPLACE INTO entities (...) VALUES (...)", ...)?;
            }
            PersistenceOp::UpsertComponent { entity_id, component_type, data } => {
                tx.execute("INSERT OR REPLACE INTO components (...) VALUES (...)", ...)?;
            }
            // ...
        }
    }

    tx.commit()?; // Single fsync for the entire batch
    Ok(())
}
```

**Impact**: 100 individual writes = 100 fsyncs. 1 transaction with 100 writes = 1 fsync.

### WAL Mode Checkpoint Control

```rust
fn checkpoint_wal(db: &Connection) -> Result<()> {
    // Manually checkpoint the WAL into the main database file
    db.execute("PRAGMA wal_checkpoint(PASSIVE)", [])?;
    Ok(())
}
```

**PASSIVE checkpoint**: Doesn't block readers or writers; syncs as much as it can without waiting

**When to checkpoint**: Every 10 seconds, or when the WAL exceeds 1MB
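
That policy is a one-line predicate once the WAL size is known; in practice the size can be observed via the `-wal` sidecar file's metadata (e.g. `std::fs::metadata`). A sketch using the thresholds above:

```rust
use std::time::Duration;

// Checkpoint when the WAL grows past 1 MB or 10 seconds have elapsed,
// whichever comes first. Thresholds match the policy stated above.
fn should_checkpoint(wal_size_bytes: u64, since_last_checkpoint: Duration) -> bool {
    const MAX_WAL_BYTES: u64 = 1024 * 1024; // 1 MB
    const CHECKPOINT_INTERVAL: Duration = Duration::from_secs(10);
    wal_size_bytes >= MAX_WAL_BYTES || since_last_checkpoint >= CHECKPOINT_INTERVAL
}
```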

### Index Strategy

Be selective about indexes - they increase write cost:

```sql
-- Only index what we actually query frequently
CREATE INDEX idx_components_entity ON components(entity_id);
CREATE INDEX idx_oplog_node_seq ON operation_log(node_id, sequence_number);

-- DON'T index everything just because we can
-- Every index = extra writes on every INSERT/UPDATE
```

### Page Size Optimization

```sql
-- Larger page size = fewer I/O operations for sequential writes
-- Default is 4KB, but 8KB or 16KB can be better for mobile
PRAGMA page_size = 8192;
```

**Caveat**: Must be set before the database is created (or VACUUM to rebuild)

## Recovery Strategy

What happens if the app crashes before a flush?

### What We Lose

**Worst case**: Up to 10 seconds of component updates (positions, transforms)

**What we DON'T lose**:

- Entity existence (flushed within 1 second of creation)
- Operation log entries (flushed with vector clock updates)
- Any data from before the last checkpoint

### Recovery on Startup

```mermaid
graph TB
    A[App Starts] --> B[Open SQLite]
    B --> C{Check WAL file}
    C -->|WAL exists| D[Recover from WAL]
    C -->|No WAL| E[Load from main DB]
    D --> F[Load entities from DB]
    E --> F
    F --> G[Load operation log]
    G --> H[Rebuild vector clock]
    H --> I[Connect to gossip]
    I --> J[Request sync from peers]
    J --> K[Fill any gaps via anti-entropy]
    K --> L[Fully recovered]
```

**Key insight**: Even if we lose local state, gossip sync repairs it. Peers send us missing operations.

### Crash Detection

On startup, detect if the previous session crashed:

```sql
CREATE TABLE session_state (
    key TEXT PRIMARY KEY,
    value TEXT
);

-- On startup, check if previous session closed cleanly
SELECT value FROM session_state WHERE key = 'clean_shutdown';

-- If not found or 'false', we crashed
-- Trigger recovery procedures
```
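
On the Rust side, interpreting that query result should be deliberately pessimistic - anything other than an explicit `'true'` counts as a crash. A sketch (the helper name is illustrative):

```rust
// A missing row, NULL, or any value other than "true" means the previous
// session did not shut down cleanly.
fn previous_session_crashed(clean_shutdown: Option<&str>) -> bool {
    !matches!(clean_shutdown, Some("true"))
}
```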

## Platform-Specific Concerns

### iOS / iPadOS

**Background app suspension**: iOS aggressively suspends apps. We have ~5 seconds when moving to the background:

```rust
// When the app moves to the background:
async fn handle_background_event() -> Result<()> {
    // Force an immediate flush
    flush_write_buffer().await?;
    checkpoint_wal().await?;

    // Mark clean shutdown
    db.execute(
        "INSERT OR REPLACE INTO session_state VALUES ('clean_shutdown', 'true')",
        [],
    )?;
    Ok(())
}
```

**Low Power Mode**: Detect it and reduce flush frequency:

```swift
// iOS-specific detection; hands the new interval to the Rust side
if ProcessInfo.processInfo.isLowPowerModeEnabled {
    set_flush_interval(Duration::from_secs(60));
}
```

### Desktop (macOS/Linux/Windows)

More relaxed constraints:

- Battery life less critical on plugged-in desktops
- Can use more aggressive flush intervals (every 5 seconds)
- Larger WAL sizes acceptable (up to 10MB before checkpoint)

## Monitoring & Metrics

Track these metrics to tune persistence:

```rust
struct PersistenceMetrics {
    // Write volume
    total_writes: u64,
    bytes_written: u64,

    // Timing
    flush_count: u64,
    avg_flush_duration: Duration,
    checkpoint_count: u64,
    avg_checkpoint_duration: Duration,

    // WAL health
    wal_size_bytes: u64,
    max_wal_size_bytes: u64,

    // Recovery
    crash_recovery_count: u64,
    clean_shutdown_count: u64,
}
```

**Alerts**:

- Flush duration >50ms (disk might be slow or overloaded)
- WAL size >5MB (checkpoint more frequently)
- Crash recovery rate >10% (need more aggressive flushing)
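
The alert thresholds above are checkable in one pure function over the metrics fields - an illustrative sketch (the function name and messages are assumptions):

```rust
use std::time::Duration;

fn persistence_alerts(
    avg_flush: Duration,
    wal_size_bytes: u64,
    crash_recoveries: u64,
    total_sessions: u64,
) -> Vec<&'static str> {
    let mut alerts = Vec::new();
    if avg_flush > Duration::from_millis(50) {
        alerts.push("flush slow: disk may be slow or overloaded");
    }
    if wal_size_bytes > 5 * 1024 * 1024 {
        alerts.push("WAL oversized: checkpoint more frequently");
    }
    // crash_recoveries / total_sessions > 10%, without floating point
    if total_sessions > 0 && crash_recoveries * 10 > total_sessions {
        alerts.push("crash rate >10%: flush more aggressively");
    }
    alerts
}
```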

## Write Coalescing: Deduplication

When the same entity is modified multiple times before a flush, we keep only the latest write:

```rust
fn add_to_write_buffer(op: PersistenceOp, buffer: &mut WriteBuffer) {
    // Borrow the fields rather than destructuring by value, so `op` can
    // still be pushed whole afterwards.
    if let PersistenceOp::UpsertComponent { entity_id, component_type, .. } = &op {
        // Remove any existing pending write for this entity + component
        buffer.pending_operations.retain(|existing| {
            !matches!(existing,
                PersistenceOp::UpsertComponent {
                    entity_id: e_id,
                    component_type: c_type,
                    ..
                } if e_id == entity_id && c_type == component_type
            )
        });
    }
    // (Similar coalescing could apply to other op kinds.)

    // Add the new one (latest state)
    buffer.pending_operations.push(op);
}
```

**Impact**: User drags an object for 5 seconds @ 60fps = 300 transform updates → coalesced to 1 write

## Persistence vs Sync: Division of Responsibility

Important distinction:

**Persistence layer** (this RFC):

- Writes to local SQLite
- Optimized for durability and battery life
- Only cares about local state survival

**Sync layer** (RFC 0001):

- Broadcasts operations via gossip
- Maintains operation log for anti-entropy
- Ensures eventual consistency across peers

**Key insight**: These operate independently. An operation can be:

1. Logged to operation log (for sync) - happens immediately
2. Applied to ECS (for rendering) - happens immediately
3. Persisted to SQLite (for durability) - happens on flush schedule

If local state is lost due to delayed flush, the sync layer repairs it from peers.
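
The three independent paths can be sketched end to end. The types below are deliberately simplified stand-ins (plain `Vec`s and a counter) for the op log, the ECS apply, and the write buffer:

```rust
#[derive(Default)]
struct AppState {
    op_log: Vec<Vec<u8>>,       // sync layer: appended immediately
    applied_ops: usize,         // stand-in for applying to the ECS
    write_buffer: Vec<Vec<u8>>, // persistence: drained on the flush schedule
}

fn handle_local_op(state: &mut AppState, op: Vec<u8>) {
    state.op_log.push(op.clone()); // 1. log for gossip / anti-entropy
    state.applied_ops += 1;        // 2. apply to ECS for rendering
    state.write_buffer.push(op);   // 3. queue for SQLite; fsync happens later
}
```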

## Configuration Schema

Expose configuration for tuning:

```toml
[persistence]
# Base flush interval (may be adjusted by battery level)
flush_interval_secs = 10

# Max time to defer critical writes (entity creation, etc.)
critical_flush_delay_ms = 1000

# WAL checkpoint interval
checkpoint_interval_secs = 30

# Max WAL size before forced checkpoint
max_wal_size_mb = 5

# Adaptive flushing based on battery
battery_adaptive = true

# Flush intervals (seconds) per battery tier
[persistence.battery_tiers]
charging = 5
high = 10   # >50%
medium = 30 # 20-50%
low = 60    # <20%

# Platform overrides
[persistence.ios]
background_flush_timeout_secs = 5
low_power_mode_interval_secs = 60
```

## Example System Implementation

```rust
fn persistence_system(
    mut dirty: ResMut<DirtyEntities>,
    mut write_buffer: ResMut<WriteBuffer>,
    db: Res<DatabaseConnection>,
    battery: Res<BatteryStatus>,
    query: Query<(Entity, &NetworkedEntity, &Transform /* , other components */)>,
) {
    // Step 1: Check if it's time to collect dirty entities
    let flush_interval = get_flush_interval(battery.level, battery.is_charging);

    if write_buffer.last_flush.elapsed() < flush_interval {
        return; // Not time yet
    }

    // Step 2: Collect dirty entities into the write buffer
    for entity_uuid in &dirty.entities {
        if let Some((entity, net_entity, transform)) =
            query.iter().find(|(_, ne, ..)| ne.network_id == *entity_uuid)
        {
            // Serialize the component (skip it on failure rather than panic)
            let Ok(transform_data) = bincode::serialize(transform) else {
                continue;
            };

            // Add to the write buffer (coalescing happens here)
            write_buffer.add(PersistenceOp::UpsertComponent {
                entity_id: *entity_uuid,
                component_type: "Transform".to_string(),
                data: transform_data,
            });
        }
    }

    // Step 3: Flush the write buffer to SQLite (async, non-blocking)
    if !write_buffer.pending_operations.is_empty() {
        let ops = std::mem::take(&mut write_buffer.pending_operations);
        // Assumes the connection handle is cheaply cloneable (e.g. a pool)
        let db = db.clone();

        // Spawn a blocking task so SQLite I/O never stalls the frame
        spawn_blocking(move || flush_to_sqlite(&ops, &db));

        write_buffer.last_flush = Instant::now();
    }

    // Step 4: Clear dirty tracking (entries are now in the write buffer)
    dirty.entities.clear();
}
```

## Trade-offs and Decisions

### Why WAL Mode?

**Alternatives**:

- DELETE mode (traditional journaling)
- MEMORY mode (no durability)

**Decision**: WAL mode because:

- Better write concurrency (readers don't block writers)
- Fewer `fsync()` calls (only on checkpoint)
- Better crash recovery (WAL can be replayed)

### Why Not Use a Dirty Flag on Components?

We could mark components with a `#[derive(Dirty)]` flag, but:

- Bevy's `Changed<T>` already gives us change detection for free
- A separate dirty flag adds memory overhead
- We'd need to manually clear flags after persistence

**Decision**: Use Bevy's change detection + our own dirty-tracking resource

### Why Not Use a Separate Persistence Thread?

We could run SQLite writes on a dedicated thread:

**Pros**: Never blocks the main thread

**Cons**: More complex synchronization, harder to guarantee flush order

**Decision**: Use `spawn_blocking` from the async runtime (Tokio). Simpler, and good enough.

## Open Questions

1. **Write ordering**: Do we need to guarantee operation log entries are persisted before entity state? Or can they be out of order?
2. **Compression**: Should we compress component data before writing to SQLite? Trade-off: CPU vs I/O
3. **Memory limits**: On an iPad with 2GB RAM, how large can the write buffer grow before we force a flush?

## Success Criteria

We'll know this is working when:

- [ ] App can run for 30 minutes with <5% battery drain attributed to persistence
- [ ] Crash recovery loses <10 seconds of work
- [ ] No perceptible frame drops during flush operations
- [ ] SQLite file size grows linearly with user data, not explosively
- [ ] WAL checkpoints complete in <100ms

## Implementation Phases

1. **Phase 1**: Basic in-memory dirty tracking + batched writes
2. **Phase 2**: WAL mode + manual checkpoint control
3. **Phase 3**: Battery-adaptive flushing
4. **Phase 4**: iOS background handling
5. **Phase 5**: Monitoring and tuning based on metrics

## References

- [SQLite WAL Mode](https://www.sqlite.org/wal.html)
- [iOS Background Execution](https://developer.apple.com/documentation/uikit/app_and_environment/scenes/preparing_your_ui_to_run_in_the_background)
- [Bevy Change Detection](https://docs.rs/bevy/latest/bevy/ecs/change_detection/)