marathon/docs/rfcs/0002-persistence-strategy.md

RFC 0002: Persistence Strategy for Battery-Efficient State Management

Status: Implemented
Authors: Sienna
Created: 2025-11-15
Related: RFC 0001 (CRDT Sync Protocol)

Abstract

This RFC defines a persistence strategy that balances data durability with battery efficiency for mobile platforms (iPad). The core challenge: Bevy runs at 60fps and generates continuous state changes, but we can't write to SQLite on every frame without destroying battery life and flash storage.

The Problem

Naive approach (bad):

fn sync_to_db_system(db: Res<DatabaseConnection>, query: Query<&NetworkedEntity, Changed<Transform>>) {
    for _entity in query.iter() {
        db.execute("UPDATE components SET data = ? WHERE entity_id = ?", ...).unwrap();
        // This runs 60 times per second!
        // iPad battery: 💀
    }
}

Why this is terrible:

  • SQLite writes trigger fsync() syscalls (flush to physical storage)
  • Each fsync() on iOS can take 5-20ms and drains battery significantly
  • At 60fps with multiple entities, we'd be doing hundreds of disk writes per second
  • Flash wear: mobile devices have limited write cycles
  • User moves object around → hundreds of unnecessary writes of intermediate positions

Requirements

  1. Survive crashes: If the app crashes, user shouldn't lose more than a few seconds of work
  2. Battery efficient: Minimize disk I/O, especially fsync() calls
  3. Flash-friendly: Reduce write amplification on mobile storage
  4. Low latency: Persistence shouldn't block rendering or input
  5. Recoverable: On startup, we should be able to reconstruct recent state

Categorizing Data by Persistence Needs

Not all data is equal. We need to categorize by how critical immediate persistence is:

Tier 1: Critical State (Persist Immediately)

What: State that's hard or impossible to reconstruct if lost

  • User-created entities (the fact that they exist)
  • Operation log entries (for CRDT sync)
  • Vector clock state (for causality tracking)
  • Document metadata (name, creation time, etc.)

Why: These are the "source of truth" - if we lose them, data is gone

Strategy: Write to database within ~1 second of creation, but still batched

Tier 2: Derived State (Defer and Batch)

What: State that can be reconstructed or is constantly changing

  • Entity positions during drag operations
  • Transform components (position, rotation, scale)
  • UI state (selected items, viewport position)
  • Temporary drawing strokes in progress

Why: These change rapidly and the intermediate states aren't valuable

Strategy: Batch writes, flush every 5-10 seconds or on specific events

Tier 3: Ephemeral State (Never Persist)

What: State that only matters during current session

  • Remote peer cursors
  • Presence indicators (who's online)
  • Network connection status
  • Frame-rate metrics

Why: These are meaningless after restart

Strategy: Keep in-memory only (Bevy resources, not components)
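The three tiers above can be made explicit in code. A minimal sketch, where the enum and the string kinds are hypothetical names (not part of the actual schema):

```rust
// Hypothetical tier classification for the categories described above.
#[derive(Debug, PartialEq)]
enum PersistenceTier {
    Critical,  // Tier 1: persist within ~1 second
    Deferred,  // Tier 2: batch, flush every 5-10 seconds
    Ephemeral, // Tier 3: never persist
}

fn tier_for(kind: &str) -> PersistenceTier {
    match kind {
        // Source-of-truth data: existence, op log, causality, metadata
        "EntityCreated" | "OperationLog" | "VectorClock" | "DocumentMeta" => {
            PersistenceTier::Critical
        }
        // Rapidly-changing, reconstructible state
        "Transform" | "UiState" | "StrokeInProgress" => PersistenceTier::Deferred,
        // Everything session-local (cursors, presence, metrics) is ephemeral
        _ => PersistenceTier::Ephemeral,
    }
}
```

Having the tier as a function of component type keeps the flush scheduler free of per-component special cases.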

Write Strategy: The Three-Buffer System

We use a three-tier approach to minimize disk writes while maintaining durability:

Layer 1: In-Memory Dirty Tracking (0ms latency)

Bevy change detection marks components as dirty, but we don't write immediately. Instead, we maintain a dirty set:

#[derive(Resource)]
struct DirtyEntities {
    // Entities with changes not yet in write buffer
    entities: HashSet<Uuid>,
    components: HashMap<Uuid, HashSet<String>>,  // entity → dirty component types
    last_modified: HashMap<Uuid, Instant>,       // when was it last changed
}

Update frequency: Every frame (cheap - just memory operations)
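The per-frame update is pure hash-map bookkeeping. A stdlib-only sketch (with `u64` standing in for `Uuid`, and the caller standing in for a Bevy `Changed<T>` query):

```rust
use std::collections::{HashMap, HashSet};
use std::time::Instant;

// Stdlib stand-in for the DirtyEntities resource above.
struct DirtySet {
    entities: HashSet<u64>,
    components: HashMap<u64, HashSet<String>>,
    last_modified: HashMap<u64, Instant>,
}

// Called once per changed component per frame; no I/O, just memory writes.
fn mark_dirty(dirty: &mut DirtySet, entity: u64, component: &str) {
    dirty.entities.insert(entity);
    dirty
        .components
        .entry(entity)
        .or_default()
        .insert(component.to_string());
    dirty.last_modified.insert(entity, Instant::now());
}
```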

Layer 2: Write Buffer (100ms-1s batching)

Periodically (every 100ms-1s), we collect dirty entities and prepare a write batch:

#[derive(Resource)]
struct WriteBuffer {
    // Pending writes not yet committed to SQLite
    pending_operations: Vec<PersistenceOp>,
    last_flush: Instant,
}

enum PersistenceOp {
    UpsertEntity { id: Uuid, data: EntityData },
    UpsertComponent { entity_id: Uuid, component_type: String, data: Vec<u8> },
    LogOperation { node_id: NodeId, seq: u64, op: Vec<u8> },
    UpdateVectorClock { node_id: NodeId, counter: u64 },
}

Update frequency: Every 100ms-1s (configurable based on battery level)

Strategy: Accumulate operations in memory, then batch-write them

Layer 3: SQLite with WAL Mode (5-10s commit interval)

Write buffer is flushed to SQLite, but we don't call fsync() immediately. Instead, we use WAL mode and control checkpoint timing:

-- Enable Write-Ahead Logging
PRAGMA journal_mode = WAL;

-- Don't auto-checkpoint on every transaction
PRAGMA wal_autocheckpoint = 0;

-- Synchronous = NORMAL (fsync WAL on commit, but not every write)
PRAGMA synchronous = NORMAL;

Update frequency: Manual checkpoints every 5-10 seconds (or on specific events)

Flush Events: When to Force Persistence

Certain events require immediate persistence (within 1 second):

1. Entity Creation

When user creates a new entity, we need to persist its existence quickly:

  • Add to write buffer immediately
  • Trigger flush within 1 second

2. Major User Actions

Actions that represent "savepoints" in user's mental model:

  • Finishing a drawing stroke (stroke start → immediate, intermediate points → batched, stroke end → flush)
  • Deleting entities
  • Changing document metadata
  • Undo/redo operations

3. Application State Transitions

State changes that might precede app termination:

  • App going to background (iOS applicationWillResignActive)
  • Low memory warning
  • User explicitly saving (if we have a save button)
  • Switching documents/workspaces

4. Network Events

Sync protocol events that need persistence:

  • Receiving operation log entries from peers
  • Vector clock updates (every 5 operations or 5 seconds, whichever comes first)
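The "every 5 operations or 5 seconds" rule reduces to a small predicate (sketch; the function name is illustrative):

```rust
use std::time::Duration;

// Flush the vector clock when either threshold from this section is hit.
fn should_flush_vector_clock(ops_since_flush: u32, since_flush: Duration) -> bool {
    ops_since_flush >= 5 || since_flush >= Duration::from_secs(5)
}
```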

5. Periodic Background Flush

Even if no major events happen:

  • Flush every 10 seconds during active use
  • Flush every 30 seconds when idle (no user input for >1 minute)
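The active/idle split above can be sketched as (thresholds taken from this section; name is illustrative):

```rust
use std::time::Duration;

// Periodic flush interval: relaxed when the user has been idle for >1 minute.
fn periodic_flush_interval(idle_for: Duration) -> Duration {
    if idle_for > Duration::from_secs(60) {
        Duration::from_secs(30) // idle
    } else {
        Duration::from_secs(10) // active use
    }
}
```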

Battery-Adaptive Flushing

Different flush strategies based on battery level:

fn get_flush_interval(battery_level: f32, is_charging: bool) -> Duration {
    if is_charging {
        Duration::from_secs(5)  // Aggressive - power available
    } else if battery_level > 0.5 {
        Duration::from_secs(10)  // Normal
    } else if battery_level > 0.2 {
        Duration::from_secs(30)  // Conservative
    } else {
        Duration::from_secs(60)  // Very conservative - low battery
    }
}

On iOS: Use UIDevice.current.batteryLevel and UIDevice.current.batteryState

SQLite Optimizations for Mobile

Transaction Batching

Group multiple writes into a single transaction:

async fn flush_write_buffer(buffer: &WriteBuffer, db: &mut Connection) -> Result<()> {
    let tx = db.transaction()?;

    // All writes in one transaction
    for op in &buffer.pending_operations {
        match op {
            PersistenceOp::UpsertEntity { id, data } => {
                tx.execute("INSERT OR REPLACE INTO entities (...) VALUES (...)", ...)?;
            }
            PersistenceOp::UpsertComponent { entity_id, component_type, data } => {
                tx.execute("INSERT OR REPLACE INTO components (...) VALUES (...)", ...)?;
            }
            // ...
        }
    }

    tx.commit()?;  // Single fsync for entire batch
    Ok(())
}

Impact: 100 individual writes = 100 fsyncs. 1 transaction with 100 writes = 1 fsync.

WAL Mode Checkpoint Control

async fn checkpoint_wal(db: &Connection) -> Result<()> {
    // Manually checkpoint WAL to database file
    db.execute("PRAGMA wal_checkpoint(PASSIVE)", [])?;
    Ok(())
}

PASSIVE checkpoint: Doesn't block readers, syncs when possible.

When to checkpoint: Every 10 seconds, or when WAL exceeds 1MB.
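That checkpoint policy is a two-condition predicate (sketch; 1MB taken as 1,048,576 bytes):

```rust
use std::time::Duration;

// Checkpoint when 10 seconds have elapsed or the WAL has grown past 1 MiB.
fn should_checkpoint(since_last: Duration, wal_bytes: u64) -> bool {
    since_last >= Duration::from_secs(10) || wal_bytes > 1_048_576
}
```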

Index Strategy

Be selective about indexes - they increase write cost:

-- Only index what we actually query frequently
CREATE INDEX idx_components_entity ON components(entity_id);
CREATE INDEX idx_oplog_node_seq ON operation_log(node_id, sequence_number);

-- DON'T index everything just because we can
-- Every index = extra writes on every INSERT/UPDATE

Page Size Optimization

-- Larger page size = fewer I/O operations for sequential writes
-- Default is 4KB, but 8KB or 16KB can be better for mobile
PRAGMA page_size = 8192;

Caveat: Must be set before database is created (or VACUUM to rebuild)

Recovery Strategy

What happens if app crashes before flush?

What We Lose

Worst case: Up to 10 seconds of component updates (positions, transforms)

What we DON'T lose:

  • Entity existence (flushed within 1 second of creation)
  • Operation log entries (flushed with vector clock updates)
  • Any data from before the last checkpoint

Recovery on Startup

graph TB
    A[App Starts] --> B[Open SQLite]
    B --> C{Check WAL file}
    C -->|WAL exists| D[Recover from WAL]
    C -->|No WAL| E[Load from main DB]
    D --> F[Load entities from DB]
    E --> F
    F --> G[Load operation log]
    G --> H[Rebuild vector clock]
    H --> I[Connect to gossip]
    I --> J[Request sync from peers]
    J --> K[Fill any gaps via anti-entropy]
    K --> L[Fully recovered]

Key insight: Even if we lose local state, gossip sync repairs it. Peers send us missing operations.

Crash Detection

On startup, detect if previous session crashed:

CREATE TABLE session_state (
    key TEXT PRIMARY KEY,
    value TEXT
);

-- On startup, check if previous session closed cleanly
SELECT value FROM session_state WHERE key = 'clean_shutdown';

-- If not found or 'false', we crashed
-- Trigger recovery procedures
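The startup check reduces to: absent flag or any value other than 'true' means the previous session crashed. A sketch:

```rust
// `flag` is the value of the `clean_shutdown` row, if present.
fn session_crashed(flag: Option<&str>) -> bool {
    // Missing row, or anything other than 'true', means we did not shut down cleanly.
    !matches!(flag, Some("true"))
}
```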

Platform-Specific Concerns

iOS / iPadOS

Background app suspension: iOS aggressively suspends apps. We have ~5 seconds when moving to background:

// When app moves to background:
async fn handle_background_event(db: &Connection) -> Result<()> {
    // Force immediate flush
    flush_write_buffer().await?;
    checkpoint_wal().await?;

    // Mark clean shutdown
    db.execute("INSERT OR REPLACE INTO session_state VALUES ('clean_shutdown', 'true')", [])?;
    Ok(())
}

Low Power Mode: Detect and reduce flush frequency:

// iOS-specific detection (Swift API, bridged to the Rust side)
if ProcessInfo.processInfo.isLowPowerModeEnabled {
    set_flush_interval(Duration::from_secs(60));
}

Desktop (macOS/Linux/Windows)

More relaxed constraints:

  • Battery life less critical on plugged-in desktops
  • Can use more aggressive flush intervals (every 5 seconds)
  • Larger WAL sizes acceptable (up to 10MB before checkpoint)

Monitoring & Metrics

Track these metrics to tune persistence:

struct PersistenceMetrics {
    // Write volume
    total_writes: u64,
    bytes_written: u64,

    // Timing
    flush_count: u64,
    avg_flush_duration: Duration,
    checkpoint_count: u64,
    avg_checkpoint_duration: Duration,

    // WAL health
    wal_size_bytes: u64,
    max_wal_size_bytes: u64,

    // Recovery
    crash_recovery_count: u64,
    clean_shutdown_count: u64,
}

Alerts:

  • Flush duration >50ms (disk might be slow or overloaded)
  • WAL size >5MB (checkpoint more frequently)
  • Crash recovery rate >10% (need more aggressive flushing)
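The alert thresholds above can be checked in one place (sketch; the function and alert strings are illustrative, not part of the metrics API):

```rust
// Evaluate the three alert conditions from this section.
fn persistence_alerts(avg_flush_ms: u64, wal_bytes: u64, crash_rate: f64) -> Vec<&'static str> {
    let mut alerts = Vec::new();
    if avg_flush_ms > 50 {
        alerts.push("flush duration >50ms: disk slow or overloaded");
    }
    if wal_bytes > 5 * 1024 * 1024 {
        alerts.push("WAL >5MB: checkpoint more frequently");
    }
    if crash_rate > 0.10 {
        alerts.push("crash recovery rate >10%: flush more aggressively");
    }
    alerts
}
```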

Write Coalescing: Deduplication

When the same entity is modified multiple times before flush, we only keep the latest:

fn add_to_write_buffer(op: PersistenceOp, buffer: &mut WriteBuffer) {
    match &op {
        PersistenceOp::UpsertComponent { entity_id, component_type, .. } => {
            // Remove any existing pending write for this entity+component
            buffer.pending_operations.retain(|existing_op| {
                !matches!(existing_op,
                    PersistenceOp::UpsertComponent {
                        entity_id: e_id,
                        component_type: c_type,
                        ..
                    } if e_id == entity_id && c_type == component_type
                )
            });

            // Add the new one (latest state)
            buffer.pending_operations.push(op);
        }
        // ...
    }
}

Impact: User drags object for 5 seconds @ 60fps = 300 transform updates → coalesced to 1 write

Persistence vs Sync: Division of Responsibility

Important distinction:

Persistence layer (this RFC):

  • Writes to local SQLite
  • Optimized for durability and battery life
  • Only cares about local state survival

Sync layer (RFC 0001):

  • Broadcasts operations via gossip
  • Maintains operation log for anti-entropy
  • Ensures eventual consistency across peers

Key insight: These operate independently. An operation can be:

  1. Logged to operation log (for sync) - happens immediately
  2. Applied to ECS (for rendering) - happens immediately
  3. Persisted to SQLite (for durability) - happens on flush schedule

If local state is lost due to delayed flush, sync layer repairs it from peers.
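The three-step fan-out above can be sketched as one operation feeding three independent sinks (stdlib stand-ins; `Sinks` and `handle_operation` are illustrative names):

```rust
// One user operation fans out to three independent consumers.
struct Sinks {
    op_log: Vec<String>,       // sync layer (RFC 0001): appended immediately
    ecs_applied: Vec<String>,  // rendering: applied immediately
    write_buffer: Vec<String>, // persistence: drained on the flush schedule
}

fn handle_operation(op: &str, s: &mut Sinks) {
    s.op_log.push(op.to_string());       // 1. logged for gossip
    s.ecs_applied.push(op.to_string());  // 2. applied to ECS
    s.write_buffer.push(op.to_string()); // 3. buffered; fsync happens later
}
```

Because step 3 is the only one that touches disk, delaying it never delays sync or rendering.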

Configuration Schema

Expose configuration for tuning:

[persistence]
# Base flush interval (may be adjusted by battery level)
flush_interval_secs = 10

# Max time to defer critical writes (entity creation, etc.)
critical_flush_delay_ms = 1000

# WAL checkpoint interval
checkpoint_interval_secs = 30

# Max WAL size before forced checkpoint
max_wal_size_mb = 5

# Adaptive flushing based on battery
battery_adaptive = true

# Flush intervals per battery tier
[persistence.battery_tiers]
charging = 5
high = 10      # >50%
medium = 30    # 20-50%
low = 60       # <20%

# Platform overrides
[persistence.ios]
background_flush_timeout_secs = 5
low_power_mode_interval_secs = 60
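On the Rust side, the `[persistence]` table might deserialize into a struct whose defaults mirror the values above (sketch; field names match the TOML keys, the struct itself is hypothetical):

```rust
// Hypothetical mirror of the [persistence] TOML table.
struct PersistenceConfig {
    flush_interval_secs: u64,
    critical_flush_delay_ms: u64,
    checkpoint_interval_secs: u64,
    max_wal_size_mb: u64,
    battery_adaptive: bool,
}

impl Default for PersistenceConfig {
    fn default() -> Self {
        Self {
            flush_interval_secs: 10,
            critical_flush_delay_ms: 1000,
            checkpoint_interval_secs: 30,
            max_wal_size_mb: 5,
            battery_adaptive: true,
        }
    }
}
```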

Example System Implementation

fn persistence_system(
    mut dirty: ResMut<DirtyEntities>,
    mut write_buffer: ResMut<WriteBuffer>,
    db: Res<DatabaseConnection>,
    battery: Res<BatteryStatus>,
    query: Query<(&NetworkedEntity, &Transform)>,
) {
    // Step 1: Check if it's time to collect dirty entities
    let flush_interval = get_flush_interval(battery.level, battery.is_charging);

    if write_buffer.last_flush.elapsed() < flush_interval {
        return;  // Not time yet
    }

    // Step 2: Collect dirty entities into write buffer
    for entity_uuid in &dirty.entities {
        if let Some((_net_entity, transform)) =
            query.iter().find(|(ne, _)| ne.network_id == *entity_uuid)
        {
            // Serialize component; skip this entity on failure rather than panic
            let Ok(transform_data) = bincode::serialize(transform) else { continue };

            // Add to write buffer (coalescing happens here)
            write_buffer.add(PersistenceOp::UpsertComponent {
                entity_id: *entity_uuid,
                component_type: "Transform".to_string(),
                data: transform_data,
            });
        }
    }

    // Step 3: Flush write buffer to SQLite (async, non-blocking)
    if !write_buffer.pending_operations.is_empty() {
        let ops = std::mem::take(&mut write_buffer.pending_operations);
        let db = db.clone();  // assumes DatabaseConnection is a cheap Arc-backed clone

        // Spawn blocking task so SQLite I/O never stalls the frame
        spawn_blocking(move || {
            flush_to_sqlite(&ops, &db)
        });

        write_buffer.last_flush = Instant::now();
    }

    // Step 4: Clear dirty tracking (changes are now in the write buffer)
    dirty.entities.clear();
}

Trade-offs and Decisions

Why WAL Mode?

Alternatives:

  • DELETE mode (traditional journaling)
  • MEMORY mode (no durability)

Decision: WAL mode because:

  • Better write concurrency (readers don't block writers)
  • Fewer fsync() calls (only on checkpoint)
  • Better crash recovery (WAL can be replayed)

Why Not Use a Dirty Flag on Components?

We could mark components with a #[derive(Dirty)] flag, but:

  • Bevy's Changed<T> already gives us change detection for free
  • A separate dirty flag adds memory overhead
  • We'd need to manually clear flags after persistence

Decision: Use Bevy's change detection + our own dirty tracking resource

Why Not Use a Separate Persistence Thread?

We could run SQLite writes on a dedicated thread:

Pros: Never blocks main thread.

Cons: More complex synchronization, harder to guarantee flush order.

Decision: Use spawn_blocking from async runtime (Tokio). Simpler, good enough.

Open Questions

  1. Write ordering: Do we need to guarantee operation log entries are persisted before entity state? Or can they be out of order?
  2. Compression: Should we compress component data before writing to SQLite? Trade-off: CPU vs I/O
  3. Memory limits: On iPad with 2GB RAM, how large can the write buffer grow before we force a flush?

Success Criteria

We'll know this is working when:

  • App can run for 30 minutes with <5% battery drain attributed to persistence
  • Crash recovery loses <10 seconds of work
  • No perceptible frame drops during flush operations
  • SQLite file size grows linearly with user data, not explosively
  • WAL checkpoints complete in <100ms

Implementation Phases

  1. Phase 1: Basic in-memory dirty tracking + batched writes
  2. Phase 2: WAL mode + manual checkpoint control
  3. Phase 3: Battery-adaptive flushing
  4. Phase 4: iOS background handling
  5. Phase 5: Monitoring and tuning based on metrics
