initial commit for session and lock features

Signed-off-by: Sienna Meridian Satterwhite <sienna@r3t.io>
2025-12-12 20:18:41 +00:00
parent e4754eef3d
commit 9d4e603db3
28 changed files with 3178 additions and 655 deletions


@@ -68,45 +68,32 @@ The session data model defines how collaborative sessions are identified, tracke

### Session Identification

Each collaborative session needs a unique identifier that users can easily share and enter. The design prioritizes human usability while maintaining technical uniqueness and security.

**User-Facing Session Codes**:

Sessions are identified by short, memorable codes in the format `abc-def-123` (9 alphanumeric characters in three groups). This format is:
- **Easy to communicate verbally**: "Join session abc-def-one-two-three"
- **Simple to type**: No confusing characters (0 vs O, 1 vs l)
- **Shareable**: Can be sent via chat, email, or written down

When a user creates a session, they see a code like `xyz-789-mno` that they can share with collaborators. When joining, they simply type this code into a dialog.

**Technical Implementation**:

Behind the scenes, each session code maps to a UUID (Universally Unique Identifier) that provides true global uniqueness. The `SessionId` type handles bidirectional conversion:
- User codes → UUIDs via deterministic hashing
- UUIDs → display codes via formatting

```rust
/// Unique identifier for a collaborative session
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Hash)]
pub struct SessionId(Uuid);

impl SessionId {
    /// Create a new random session ID
    pub fn new() -> Self {
        Self(Uuid::new_v4())
    }

    /// Create from a human-readable code (e.g., "abc-def-ghi")
    pub fn from_code(code: &str) -> Result<Self> {
        // Parse format: xxx-yyy-zzz (3 groups of 3 lowercase alphanumeric)
        // Maps to a deterministic UUID using a hash
        let uuid = generate_uuid_from_code(code)?;
        Ok(Self(uuid))
    }

    /// Get human-readable code (first 9 characters of UUID, formatted)
    pub fn to_code(&self) -> String {
        format_uuid_as_code(&self.0)
    }

    /// Derive ALPN protocol identifier from session ID
    pub fn to_alpn(&self) -> [u8; 32] {
        let mut hasher = blake3::Hasher::new();
        hasher.update(b"lonni-session-v1");
        hasher.update(self.0.as_bytes());
        *hasher.finalize().as_bytes()
    }
}
```

**Network Isolation**:

Each session ID also derives a unique ALPN (Application-Layer Protocol Negotiation) identifier using BLAKE3 hashing. This provides cryptographic isolation at the transport layer - peers in different sessions literally cannot discover or communicate with each other, even if they're on the same local network.

**Key Operations**:
- `SessionId::new()`: Generates a random UUID v4 for new sessions
- `SessionId::from_code(code)`: Parses human-readable codes (format: `xxx-yyy-zzz`) into UUIDs
- `to_code()`: Converts a UUID to a 9-character alphanumeric code for display
- `to_alpn()`: Derives a 32-byte ALPN identifier for network isolation
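The helpers `generate_uuid_from_code` and `format_uuid_as_code` are referenced but not shown. As a rough sketch of the validate-then-hash shape that `from_code` implies - with `std`'s hasher standing in for a real cryptographic hash, and a raw `[u8; 16]` standing in for the `uuid` crate's type (both assumptions of this example, not the actual implementation) - the mapping might look like:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for `generate_uuid_from_code`: validate the
/// `xxx-yyy-zzz` format, then derive 16 deterministic bytes from the code.
/// (A real implementation would use a cryptographic hash such as BLAKE3.)
pub fn code_to_uuid_bytes(code: &str) -> Result<[u8; 16], String> {
    let groups: Vec<&str> = code.split('-').collect();
    let valid = groups.len() == 3
        && groups.iter().all(|g| {
            g.len() == 3 && g.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        });
    if !valid {
        return Err(format!("invalid session code: {}", code));
    }
    // Derive two 8-byte halves with domain-separated hashes of the code.
    let mut bytes = [0u8; 16];
    for (i, half) in bytes.chunks_mut(8).enumerate() {
        let mut hasher = DefaultHasher::new();
        (i as u8).hash(&mut hasher); // domain separation between the halves
        code.hash(&mut hasher);
        half.copy_from_slice(&hasher.finish().to_be_bytes());
    }
    Ok(bytes)
}
```

The important property is determinism: every peer maps the same code to the same identifier without coordination, while malformed codes are rejected before hashing.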
### Session Metadata
@@ -120,58 +107,23 @@ This metadata serves several purposes:

The `CurrentSession` resource represents the active session within the Bevy ECS world. It includes both the session metadata and the vector clock state at the time of joining, which is essential for the hybrid sync protocol.

**Session Structure**:

The `Session` struct contains:
- **id**: Unique `SessionId` identifying this session
- **name**: Optional human-readable label (e.g., "Monday Design Review")
- **created_at**: Timestamp of session creation
- **last_active**: When this node was last active in the session (for auto-rejoin)
- **entity_count**: Cached count of entities (for UI display)
- **state**: Current lifecycle state (see state machine below)
- **secret**: Optional shared secret for session access control

**Session States**:

Five states track the session lifecycle (see the "Session State Transitions" section below for the detailed state machine):
- `Created`, `Joining`, `Active`, `Disconnected`, `Left`

```rust
/// Metadata about a collaborative session
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Session {
    /// Unique identifier for this session
    pub id: SessionId,
    /// Human-readable name (optional)
    pub name: Option<String>,
    /// When this session was created
    pub created_at: DateTime<Utc>,
    /// Last time this node was active in this session
    pub last_active: DateTime<Utc>,
    /// How many entities are in this session (cached)
    pub entity_count: usize,
    /// Session state
    pub state: SessionState,
    /// Optional session secret for authentication
    pub secret: Option<Vec<u8>>,
}

/// Current state of a session
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum SessionState {
    /// Session created but not yet joined to the network
    Created,
    /// Currently joining (waiting for FullState or deltas)
    Joining,
    /// Fully synchronized and active
    Active,
    /// Temporarily disconnected (will attempt rejoin)
    Disconnected,
    /// Cleanly left (archived, can rejoin later)
    Left,
}

/// Bevy resource tracking the current session
#[derive(Resource)]
pub struct CurrentSession {
    pub session: Session,
    pub vector_clock_at_join: VectorClock,
}
```

The `CurrentSession` Bevy resource wraps the session metadata along with the vector clock captured at join time. This clock snapshot enables the hybrid sync protocol to determine which deltas are needed when rejoining.
### Database Schema
@@ -272,15 +224,9 @@ Instead of using a single global gossip topic, each session gets its own ALPN (A

Each session derives a unique ALPN identifier using BLAKE3 cryptographic hashing. The derivation is deterministic - the same session ID always produces the same ALPN - which allows all peers to independently compute the correct ALPN for a session they want to join.

**Derivation Process**:

The ALPN is computed by hashing the session UUID with BLAKE3, using a domain separation prefix (`/app/v1/session-id/`) followed by the session ID bytes. This produces a deterministic 32-byte identifier that all peers independently compute from the same session code.

```rust
/// Derive a unique ALPN protocol identifier from a session ID
pub fn derive_alpn_from_session(session_id: &SessionId) -> [u8; 32] {
    let mut hasher = blake3::Hasher::new();
    hasher.update(b"/app/v1/session-id/"); // Domain separation prefix
    hasher.update(session_id.0.as_bytes());
    *hasher.finalize().as_bytes()
}
```

The design provides several security and isolation guarantees:
@@ -301,59 +247,29 @@ The gossip network initialization is modified to use session-specific ALPNs inst

The combination ensures both local and remote sessions work seamlessly.

**Initialization Flow**:

The initialization process has three temporal phases:

**One-time Endpoint Setup** (occurs once per application launch):
1. **Endpoint Creation**: Build an iroh `Endpoint` with both mDNS and Pkarr discovery mechanisms enabled
2. **Gossip Protocol**: Spawn the gossip protocol handler using `Gossip::builder().spawn(endpoint)`

**Per-session Connection** (occurs when joining each session):
3. **ALPN Derivation**: Call `session.id.to_alpn()` to compute the 32-byte session-specific ALPN identifier
4. **Router Configuration**: Create a router that only accepts connections on the session's ALPN
   - Critical: Use the derived ALPN, not the default `iroh_gossip::ALPN`
   - This enforces network isolation at the transport layer
5. **Topic Subscription**: Subscribe to a gossip topic derived from the ALPN (can reuse the same bytes)
6. **Join Wait**: Wait up to 2 seconds for the join confirmation
   - Timeout is expected for the first node in a session (no peers yet)
   - Errors are logged but don't prevent continuing

**Background Operation** (runs continuously while the session is active):
7. **Bridge Creation**: Create a `GossipBridge` that wraps the gossip channels and provides session context
8. **Task Spawning**: Launch background tasks to forward messages between gossip and the application

```rust
/// Initialize gossip with session-specific ALPN
async fn init_gossip_for_session(session: &Session) -> Result<GossipBridge> {
    info!("Creating endpoint with discovery...");
    let endpoint = Endpoint::builder()
        .discovery(MdnsDiscovery::builder())  // Local network
        .discovery(PkarrDiscovery::builder()) // Internet-wide via pkarr DNS
        .bind()
        .await?;

    let endpoint_id = endpoint.addr().id;
    let node_id = endpoint_id_to_uuid(&endpoint_id);

    info!("Node ID: {}", node_id);
    info!("Session ID: {}", session.id.to_code());

    // Derive session-specific ALPN
    let alpn = session.id.to_alpn();
    info!("Using session ALPN: {}", hex::encode(&alpn[..8]));

    info!("Spawning gossip protocol...");
    let gossip = Gossip::builder().spawn(endpoint.clone());

    // Router accepts connections with this specific ALPN only
    info!("Setting up router with session ALPN...");
    let router = Router::builder(endpoint.clone())
        .accept(alpn, gossip.clone()) // NOTE: Use session ALPN, not iroh_gossip::ALPN
        .spawn();

    // Subscribe to topic (can use session ID directly as topic)
    let topic_id = TopicId::from_bytes(alpn);
    info!("Subscribing to session topic...");
    let subscribe_handle = gossip.subscribe(topic_id, vec![]).await?;
    let (sender, mut receiver) = subscribe_handle.split();

    // Wait for join (with timeout)
    info!("Waiting for gossip join...");
    match tokio::time::timeout(Duration::from_secs(2), receiver.joined()).await {
        Ok(Ok(())) => info!("Joined session gossip swarm"),
        Ok(Err(e)) => warn!("Join error: {} (proceeding anyway)", e),
        Err(_) => info!("Join timeout (first node in session)"),
    }

    // Create bridge with session context
    let bridge = GossipBridge::new_with_session(node_id, session.id.clone());

    // Spawn forwarding tasks
    spawn_bridge_tasks(sender, receiver, bridge.clone(), endpoint, router, gossip);

    Ok(bridge)
}
```

The key architectural decision is using the same ALPN bytes for both transport-layer connection acceptance and application-layer topic identification. This ensures consistent isolation across both layers.
### Session Discovery
@@ -405,42 +321,17 @@ The key fields are:

Existing peers use this information to decide: "Can I send just deltas, or do I need to send the full state?"

**JoinRequest Message Structure**:

| Field | Type | Purpose |
|-------|------|---------|
| `node_id` | `NodeId` | Identifier of the joining node |
| `session_id` | `SessionId` | Target session UUID |
| `session_secret` | `Option<Vec<u8>>` | Authentication credential if the session is password-protected |
| `last_known_clock` | `Option<VectorClock>` | Vector clock from previous participation; `None` indicates a fresh join requiring full state |
| `join_type` | `JoinType` | Enum: `Fresh` or `Rejoin { last_active, entity_count }` |

```rust
/// Request to join a session
JoinRequest {
    /// ID of the node requesting to join
    node_id: NodeId,
    /// Session ID we're trying to join
    session_id: SessionId,
    /// Optional session secret for authentication
    session_secret: Option<Vec<u8>>,
    /// Our last known vector clock for this session (if rejoining)
    /// None = fresh join (need full state)
    /// Some = rejoin (only need deltas since this clock)
    last_known_clock: Option<VectorClock>,
    /// Are we rejoining or joining fresh?
    join_type: JoinType,
},

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum JoinType {
    /// First time joining this session
    Fresh,
    /// Rejoining after disconnect/restart
    Rejoin {
        /// When we last left
        last_active: DateTime<Utc>,
        /// How many entities we had
        entity_count: usize,
    },
}
```

The `last_known_clock` field is the key discriminator: its presence signals that the node has persistent state and only needs deltas, while its absence triggers full state transfer.
### Join Flow: Fresh Join
@@ -532,103 +423,40 @@ sequenceDiagram

The join handler is a Bevy system that runs on existing peers and responds to incoming `JoinRequest` messages. Its responsibility is to decide whether to send full state or deltas based on the joining node's vector clock.

**Message Processing Loop**:

The system polls the gossip bridge for incoming messages and filters for `JoinRequest` messages. For each request, it performs a multi-stage validation and response pipeline:

**Stage 1: Security Validation**

1. **Session ID Check**: Verify the request is for the current session
   - Mismatched session IDs are logged and rejected
   - Prevents cross-session message pollution
2. **Secret Validation**: If the session has a secret, validate the provided credential
   - Uses constant-time comparison via `validate_session_secret()`
   - Rejects requests with invalid or missing secrets

**Stage 2: Delta Feasibility Analysis**

The handler examines the `join_type` and `last_known_clock` fields:

- **Fresh Join** (`join_type: Fresh` or `last_known_clock: None`):
  - Always send `FullState` - no other option is available
  - Call `build_full_state_for_session()` to serialize all entities and components
- **Rejoin** (`join_type: Rejoin` with `last_known_clock: Some(clock)`):
  - Query the operation log: `get_all_operations_newer_than(their_clock)`
  - Count the missing operations
  - If count ≤ 1000: Send a `MissingDeltas` message with the operation list
  - If count > 1000: Send `FullState` instead (more efficient than 1000+ small messages)

**Stage 3: Response Transmission**

Send the constructed response message (`FullState` or `MissingDeltas`) over the gossip bridge. Log errors if transmission fails.

```rust
pub fn handle_join_request_system(
    world: &World,
    bridge: Res<GossipBridge>,
    current_session: Res<CurrentSession>,
    operation_log: Res<OperationLog>,
    networked_entities: Query<(Entity, &NetworkedEntity)>,
    type_registry: Res<AppTypeRegistry>,
    node_clock: Res<NodeVectorClock>,
) {
    while let Some(message) = bridge.try_recv() {
        match message.message {
            SyncMessage::JoinRequest {
                node_id,
                session_id,
                session_secret,
                last_known_clock,
                join_type,
            } => {
                // Validate session ID matches
                if session_id != current_session.session.id {
                    warn!("JoinRequest for wrong session: expected {}, got {}",
                        current_session.session.id.to_code(),
                        session_id.to_code());
                    continue;
                }

                // Validate session secret if configured
                if let Some(expected_secret) = &current_session.session.secret {
                    match &session_secret {
                        Some(provided) if validate_session_secret(provided, expected_secret).is_ok() => {
                            info!("Session secret validated for node {}", node_id);
                        }
                        _ => {
                            error!("JoinRequest from {} rejected: invalid secret", node_id);
                            continue;
                        }
                    }
                }

                info!("Handling JoinRequest from {} ({:?})", node_id, join_type);

                // Decide: send deltas or full state?
                let response = match (join_type, last_known_clock) {
                    (JoinType::Rejoin { .. }, Some(their_clock)) => {
                        // Check if we can send deltas
                        let missing_deltas = operation_log.get_all_operations_newer_than(&their_clock);

                        const MAX_DELTA_OPS: usize = 1000;
                        if missing_deltas.len() <= MAX_DELTA_OPS {
                            info!("Sending {} deltas to rejoining node {}", missing_deltas.len(), node_id);
                            VersionedMessage::new(SyncMessage::MissingDeltas {
                                deltas: missing_deltas,
                            })
                        } else {
                            info!("Too many deltas ({}), sending FullState instead", missing_deltas.len());
                            build_full_state_for_session(
                                world,
                                &networked_entities,
                                &type_registry.read(),
                                &node_clock,
                                &session_id,
                            )
                        }
                    }
                    _ => {
                        // Fresh join - send full state
                        info!("Sending FullState to node {}", node_id);
                        build_full_state_for_session(
                            world,
                            &networked_entities,
                            &type_registry.read(),
                            &node_clock,
                            &session_id,
                        )
                    }
                };

                // Send response
                if let Err(e) = bridge.send(response) {
                    error!("Failed to send join response: {}", e);
                }
            }
            _ => {}
        }
    }
}
```

**Design Rationale**:

The 1000-operation threshold is a heuristic based on message overhead: below this, individual delta messages are smaller than a full world snapshot. Above it, the cost of serializing and transmitting 1000+ small messages exceeds the cost of sending one large snapshot. The threshold can be tuned based on profiling.
## Temporary Lock-based Ownership
@@ -643,18 +471,20 @@ To prevent CRDT conflicts on complex operations (e.g., multi-step drawing, entit
**Design Principles:**
- **Initiator-driven**: Locks are requested immediately when user interaction begins (e.g., clicking an object), not after waiting for server approval
- **Optimistic**: The local client assumes the lock will succeed and allows immediate interaction; conflicts are resolved asynchronously
- **Temporary**: Locks auto-expire after 5 seconds to prevent orphaned locks from crashed nodes
- **Advisory**: Locks are checked before delta generation, but the underlying CRDT still handles conflicts if locks fail
- **Deterministic conflict resolution**: When two nodes request the same lock simultaneously, the higher node ID wins
- **Auto-release**: Disconnected nodes automatically lose all their locks
**Note**: The 5-second lock timeout is fixed in the initial implementation. Future versions may make this configurable per-entity-type or per-session based on UX requirements.
### Lock State Model
The lock system is implemented as a simple in-memory registry that tracks which entities are currently locked and by whom. Each lock contains:
- **Entity ID**: Which entity is locked
- **Holder**: Which node owns the lock
- **Acquisition timestamp**: When the lock was acquired
- **Timeout duration**: How long until auto-expiry (5 seconds)
The `EntityLockRegistry` resource maintains a HashMap of entity ID to lock state, plus an acquisition history queue for rate limiting.
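A minimal, self-contained sketch of such a registry follows, with `u64` stand-ins for the entity and node ID types and the higher-node-ID tie-break applied at acquisition time (both simplifications of this example, not the actual implementation):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Simplified sketch of the in-memory lock registry (u64 IDs are stand-ins).
pub struct EntityLockRegistry {
    locks: HashMap<u64, LockState>, // entity ID -> current lock
    timeout: Duration,              // auto-expiry duration (5 s in the design)
}

pub struct LockState {
    pub holder: u64,          // node that owns the lock
    pub acquired_at: Instant, // acquisition timestamp
}

impl EntityLockRegistry {
    pub fn new(timeout: Duration) -> Self {
        Self { locks: HashMap::new(), timeout }
    }

    /// Try to acquire a lock on `entity` for `node` at time `now`.
    pub fn try_acquire(&mut self, entity: u64, node: u64, now: Instant) -> bool {
        // A lock past its timeout is treated as free (auto-expiry).
        let live_holder = self.locks.get(&entity).and_then(|lock| {
            (now.duration_since(lock.acquired_at) < self.timeout).then(|| lock.holder)
        });
        match live_holder {
            // Another node holds a live lock and has the higher ID: we lose.
            Some(holder) if holder != node && node < holder => false,
            // Free, expired, already ours, or won via the higher-node-ID rule.
            _ => {
                self.locks.insert(entity, LockState { holder: node, acquired_at: now });
                true
            }
        }
    }

    /// Drop every lock held by a node (auto-release on disconnect).
    pub fn release_all(&mut self, node: u64) {
        self.locks.retain(|_, lock| lock.holder != node);
    }
}
```

Because the locks are advisory, a failed `try_acquire` only suppresses delta generation on the losing node; the CRDT layer still resolves any conflicting edits that slip through.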
@@ -841,187 +671,85 @@ The integration is implemented through Bevy systems that run at specific lifecyc

The session lifecycle is managed through two primary Bevy systems: one for initialization on startup, and one for persisting state on shutdown.

**Startup: `initialize_session_system`**

On application startup, this system queries the database for the most recent active session:

1. **Session Discovery**: Query the `sessions` table ordered by `last_active DESC` to find the most recent session
2. **Decision Point**:
   - If a session exists: Resume it (enables automatic rejoin after crashes)
   - If no session exists: Create a new session with a random UUID and default state
3. **Vector Clock Loading**: Load the session's vector clock from the database, or initialize an empty clock for new sessions
4. **Resource Initialization**: Insert the `CurrentSession` resource containing session metadata and the saved vector clock

This enables crash recovery - if the app crashes and restarts, it automatically resumes the previous session and rejoins the network.

**Shutdown: `save_session_state_system`**

On clean shutdown, this system persists the current session state:

1. **Update Metadata**: Set the `last_active` timestamp, count current entities, mark the state as `Left`
2. **Save Session**: Write session metadata to the `sessions` table using `INSERT OR REPLACE`
3. **Save Vector Clock**: Transaction-based save that clears old clock entries and inserts the current state for all known nodes

```rust
/// Load or create session on startup
pub fn initialize_session_system(
    mut commands: Commands,
    db: Res<PersistenceDb>,
) {
    let conn = db.lock().unwrap();

    // Check for previous session
    let session = match get_last_active_session(&conn) {
        Ok(Some(session)) => {
            info!("Resuming previous session: {}", session.id.to_code());
            session
        }
        Ok(None) | Err(_) => {
            info!("No previous session, creating new one");
            let session = Session {
                id: SessionId::new(),
                name: None,
                created_at: Utc::now(),
                last_active: Utc::now(),
                entity_count: 0,
                state: SessionState::Created,
                secret: None,
            };

            // Persist to database
            if let Err(e) = save_session(&conn, &session) {
                error!("Failed to save new session: {}", e);
            }

            session
        }
    };

    // Load vector clock for this session
    let vector_clock = load_session_vector_clock(&conn, &session.id)
        .unwrap_or_else(|_| VectorClock::new());

    // Insert as resource
    commands.insert_resource(CurrentSession {
        session: session.clone(),
        vector_clock_at_join: vector_clock.clone(),
    });

    info!("Session initialized: {}", session.id.to_code());
}

/// Save session state on shutdown
pub fn save_session_state_system(
    current_session: Res<CurrentSession>,
    node_clock: Res<NodeVectorClock>,
    networked_entities: Query<&NetworkedEntity>,
    db: Res<PersistenceDb>,
) {
    let mut conn = db.lock().unwrap();

    // Update session metadata
    let mut session = current_session.session.clone();
    session.last_active = Utc::now();
    session.entity_count = networked_entities.iter().count();
    session.state = SessionState::Left;

    if let Err(e) = save_session(&conn, &session) {
        error!("Failed to save session state: {}", e);
    }

    // Save vector clock (needs &mut: save_session_vector_clock opens a transaction)
    if let Err(e) = save_session_vector_clock(&mut conn, &session.id, &node_clock.clock) {
        error!("Failed to save vector clock: {}", e);
    }

    info!("Session state saved for {}", session.id.to_code());
}
```

The vector clock save uses a transaction to ensure atomic updates - either all clock entries are saved, or none are. This prevents partial clock states that could cause sync issues on rejoin.
### Database Operations

The persistence layer provides several key database operations:

**Session Queries**:
- `get_last_active_session()`: Queries the most recent session by `last_active DESC`, returns `Option<Session>`
- `save_session()`: Upserts session metadata using `INSERT OR REPLACE`, persisting all session fields

**Vector Clock Persistence**:
- `load_session_vector_clock()`: Queries all `node_id`/`counter` pairs for a session, rebuilding the HashMap
- `save_session_vector_clock()`: Transactional save that deletes old entries, then inserts the current clock state

```rust
/// Load the most recent active session
pub fn get_last_active_session(conn: &Connection) -> Result<Option<Session>> {
    let row = conn.query_row(
        "SELECT id, name, created_at, last_active, entity_count, state, secret
         FROM sessions
         ORDER BY last_active DESC
         LIMIT 1",
        [],
        |row| {
            Ok(Session {
                id: SessionId(Uuid::from_slice(&row.get::<_, Vec<u8>>(0)?)?),
                name: row.get(1)?,
                created_at: timestamp_to_datetime(row.get(2)?),
                last_active: timestamp_to_datetime(row.get(3)?),
                entity_count: row.get(4)?,
                state: parse_session_state(&row.get::<_, String>(5)?),
                secret: row.get(6)?,
            })
        },
    ).optional()?;

    Ok(row)
}

/// Save session metadata
pub fn save_session(conn: &Connection, session: &Session) -> Result<()> {
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, name, created_at, last_active, entity_count, state, secret)
         VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)",
        rusqlite::params![
            session.id.0.as_bytes(),
            session.name,
            session.created_at.timestamp(),
            session.last_active.timestamp(),
            session.entity_count,
            format!("{:?}", session.state).to_lowercase(),
            session.secret,
        ],
    )?;

    Ok(())
}

/// Load vector clock for a session
pub fn load_session_vector_clock(conn: &Connection, session_id: &SessionId) -> Result<VectorClock> {
    let mut clock = VectorClock::new();

    let mut stmt = conn.prepare(
        "SELECT node_id, counter
         FROM vector_clock
         WHERE session_id = ?1"
    )?;

    let rows = stmt.query_map([session_id.0.as_bytes()], |row| {
        let node_str: String = row.get(0)?;
        let counter: u64 = row.get(1)?;
        Ok((Uuid::parse_str(&node_str).unwrap(), counter))
    })?;

    for row in rows {
        let (node_id, counter) = row?;
        clock.clocks.insert(node_id, counter);
    }

    Ok(clock)
}

/// Save vector clock for a session
pub fn save_session_vector_clock(
    conn: &mut Connection,
    session_id: &SessionId,
    clock: &VectorClock,
) -> Result<()> {
    let tx = conn.transaction()?;

    // Clear old clocks for this session
    tx.execute(
        "DELETE FROM vector_clock WHERE session_id = ?1",
        [session_id.0.as_bytes()],
    )?;

    // Insert current clocks
    for (node_id, counter) in &clock.clocks {
        tx.execute(
            "INSERT INTO vector_clock (session_id, node_id, counter, updated_at)
             VALUES (?1, ?2, ?3, ?4)",
            rusqlite::params![
                session_id.0.as_bytes(),
                node_id.to_string(),
                counter,
                Utc::now().timestamp(),
            ],
        )?;
    }

    tx.commit()?;
    Ok(())
}
```

All operations use parameterized queries to prevent SQL injection and handle optional fields (like `name` and `secret`) correctly. Session IDs are stored as 16-byte BLOBs for efficiency.
## Implementation Roadmap
### Documentation Standards
All public APIs should follow Rust documentation conventions with comprehensive docstrings. Expected format:
```rust
/// Derives a session-specific ALPN identifier for network isolation.
///
/// This function computes a deterministic 32-byte BLAKE3 hash from the session ID,
/// using a domain separation prefix to prevent collisions with other protocol uses.
/// All peers independently compute the same ALPN from a given session code, enabling
/// decentralized coordination without a central authority.
///
/// # Arguments
/// * `session_id` - The unique session identifier
///
/// # Returns
/// A 32-byte BLAKE3 hash suitable for use as an ALPN protocol identifier
///
/// # Example
/// ```
/// let session = SessionId::new();
/// let alpn = derive_alpn_from_session(&session);
/// assert_eq!(alpn.len(), 32);
/// ```
///
/// # Security
/// The domain separation prefix (`/app/v1/session-id/`) ensures ALPNs cannot
/// collide with other protocol uses of the same hash space.
pub fn derive_alpn_from_session(session_id: &SessionId) -> [u8; 32]
```
Key documentation elements:
- **Summary**: One-line description of purpose
- **Detailed explanation**: How it works and why
- **Arguments**: All parameters with types and descriptions
- **Returns**: What the function produces
- **Examples**: Working code demonstrating usage
- **Panics/Errors**: Document failure conditions
- **Security/Safety**: Highlight security-critical behavior
### Phase 1: Session Data Model & Persistence
- Create `SessionId`, `Session`, `SessionState` types
- Add database schema migration
@@ -1121,6 +849,24 @@ pub fn save_session_vector_clock(
- Vector clock + LWW determines final state
- Locks are advisory (CRDTs provide safety net)
**Convergence Behavior**:
When the partition heals, the gossip protocol reconnects and nodes exchange their full state. The CRDT merge process happens automatically:
1. Vector clocks from both partitions are compared
2. Operations with concurrent clocks are merged using Last-Write-Wins (LWW) based on timestamps
3. The final state reflects the operation with the highest timestamp
4. Convergence typically completes within 1-2 seconds after reconnection
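Steps 1-3 can be illustrated with a minimal, self-contained sketch. The types here are simplifications of this example (u64 node IDs and string values in place of the real UUID/component types), and the equal-timestamp case arbitrarily favors the first argument, where a real implementation needs a deterministic tie-break such as node ID:

```rust
use std::collections::{HashMap, HashSet};

type NodeId = u64; // stand-in for the document's UUID node IDs

/// Minimal vector clock, just enough to illustrate the merge rules above.
#[derive(Default, Clone)]
pub struct VectorClock {
    pub clocks: HashMap<NodeId, u64>,
}

#[derive(Debug, PartialEq)]
pub enum ClockOrdering {
    Equal,
    Before,     // self happened-before other
    After,      // other happened-before self
    Concurrent, // neither dominates: fall back to LWW
}

impl VectorClock {
    /// Compare two clocks across the union of known nodes.
    pub fn compare(&self, other: &VectorClock) -> ClockOrdering {
        let nodes: HashSet<NodeId> =
            self.clocks.keys().chain(other.clocks.keys()).copied().collect();
        let (mut less, mut greater) = (false, false);
        for n in nodes {
            let a = self.clocks.get(&n).copied().unwrap_or(0);
            let b = other.clocks.get(&n).copied().unwrap_or(0);
            less |= a < b;
            greater |= a > b;
        }
        match (less, greater) {
            (false, false) => ClockOrdering::Equal,
            (true, false) => ClockOrdering::Before,
            (false, true) => ClockOrdering::After,
            (true, true) => ClockOrdering::Concurrent,
        }
    }
}

/// LWW tie-break for concurrent operations: the higher timestamp wins.
pub fn lww_winner<'a>(ts_a: i64, val_a: &'a str, ts_b: i64, val_b: &'a str) -> &'a str {
    if ts_a >= ts_b { val_a } else { val_b }
}
```

When the clocks are `Concurrent`, neither partition's history subsumes the other's, which is exactly the case where the LWW timestamp comparison decides the surviving value.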
**UX Implications**:
Users in the "losing" partition may see their changes overridden. To minimize surprise:
- Visual indicator shows when the app is disconnected (yellow/orange connection status)
- On reconnection, entities that changed display a brief animation/highlight
- A notification shows "Reconnected - syncing changes" with entity count
- Changes made during disconnection that were overridden could be logged for potential manual recovery (future enhancement)
The system prioritizes consistency over preserving every edit during splits, which is acceptable for collaborative creative work where real-time coordination is expected.
## Security Considerations
Security in a peer-to-peer collaborative environment requires careful balance between usability and protection. This RFC addresses two primary security concerns: session access control and protocol integrity.
@@ -1134,34 +880,22 @@ The secret is hashed using BLAKE3 before comparison, ensuring that:
- Timing analysis cannot reveal secret length or content
- Fast validation (BLAKE3 is extremely performant)
```rust
/// Validate session secret using constant-time comparison
pub fn validate_session_secret(provided: &[u8], expected: &[u8]) -> Result<(), AuthError> {
    use subtle::ConstantTimeEq;

    let provided_hash = blake3::hash(provided);
    let expected_hash = blake3::hash(expected);

    if provided_hash.as_bytes().ct_eq(expected_hash.as_bytes()).into() {
        Ok(())
    } else {
        Err(AuthError::InvalidSecret)
    }
}
```
The validation function uses the `subtle` crate's `ConstantTimeEq` trait to perform constant-time comparison of the hashed secrets, preventing timing-based attacks that could leak information about the secret.
### Rate Limiting

To prevent abuse and buggy clients from monopolizing resources, the lock system implements two rate limits:

1. **Total Locks per Node**: Maximum 100 concurrent locks per node
   - Prevents a single node from locking every entity in the session
   - Ensures entities remain available for other participants
2. **Acquisition Rate**: Maximum 10 lock requests per second per node
   - Prevents rapid lock-spamming attacks
   - Tracked via a rolling 60-second acquisition history queue
   - Old entries are pruned to prevent memory growth

```rust
// In EntityLockRegistry
const MAX_LOCKS_PER_NODE: usize = 100;
const MAX_LOCK_REQUESTS_PER_SEC: usize = 10;

// Prevent lock spamming
if self.locks_held_by(node_id) >= MAX_LOCKS_PER_NODE {
    return Err(LockError::TooManyLocks);
}
```

When a rate limit is exceeded, the lock request returns a `LockError::RateLimited` error. The requesting node's UI should display appropriate feedback to the user.
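The rolling-window acquisition check might be sketched as follows - a simplified stand-in for the history queue described above, not the actual `EntityLockRegistry` code, though the window and limit values follow the text:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sketch of the per-node acquisition-rate check (names approximate).
pub struct AcquisitionHistory {
    events: VecDeque<Instant>, // rolling history of acquisition attempts
    max_per_sec: usize,        // 10 in the design
}

impl AcquisitionHistory {
    pub fn new(max_per_sec: usize) -> Self {
        Self { events: VecDeque::new(), max_per_sec }
    }

    /// Record an attempt; returns false if the node exceeded the rate limit.
    pub fn try_record(&mut self, now: Instant) -> bool {
        // Prune entries older than the 60-second window to bound memory.
        while let Some(&front) = self.events.front() {
            if now.duration_since(front) > Duration::from_secs(60) {
                self.events.pop_front();
            } else {
                break;
            }
        }
        // Count requests in the last second against the limit.
        let recent = self.events.iter()
            .filter(|t| now.duration_since(**t) < Duration::from_secs(1))
            .count();
        if recent >= self.max_per_sec {
            return false; // would map to LockError::RateLimited
        }
        self.events.push_back(now);
        true
    }
}
```

Pruning on every call keeps the queue bounded even for long-running sessions, since entries older than the 60-second window never accumulate.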
## Performance Considerations
@@ -1210,6 +944,14 @@ This streaming approach provides several UX benefits:
The 100x bandwidth improvement for delta-based rejoins remains the primary optimization, but streaming ensures fresh joins also have good UX.
**Implementation Notes**:
Several implementation details are deferred to Phase 3 (Hybrid Join Protocol):
- **Dependency Ordering**: Entities with parent-child relationships or component references will be sorted before streaming to ensure dependencies arrive before dependents
- **Message Size Limits**: Batch size (50-100 entities) will be dynamically adjusted based on average entity serialization size to stay under QUIC message size limits (~1 MB)
- **Retry Mechanism**: Missing entity ranges are tracked and can be requested via `RequestMissingEntities { start_index, end_index }` if gaps are detected
- **Cancellation**: If the user abandons the join before completion, in-flight batches are discarded and the partial state is cleared
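The dynamic batch-sizing note above can be illustrated with a greedy packing sketch. The helper is hypothetical (the real Phase 3 code is deferred) and operates on serialized entity sizes only:

```rust
/// Greedily pack serialized entities into batches that stay under a
/// per-message byte budget, with a batch-count ceiling. Sizes and limits
/// are illustrative; the design targets ~1 MB messages of 50-100 entities.
pub fn batch_entities(sizes: &[usize], max_bytes: usize, max_count: usize) -> Vec<Vec<usize>> {
    let mut batches: Vec<Vec<usize>> = Vec::new();
    let (mut current, mut current_bytes): (Vec<usize>, usize) = (Vec::new(), 0);
    for (idx, &size) in sizes.iter().enumerate() {
        // Start a new batch when adding this entity would bust either limit.
        let full = !current.is_empty()
            && (current_bytes + size > max_bytes || current.len() >= max_count);
        if full {
            batches.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current.push(idx); // store entity indices; a real impl carries payloads
        current_bytes += size;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Tracking indices per batch also gives the retry mechanism its vocabulary: a missing batch maps directly to a contiguous entity-index range that can be re-requested.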
## Testing Strategy
### Unit Tests