Files

Sienna Meridian Satterwhite 16deb5d237

release: Storybook v0.2.0 - Major syntax and features update

BREAKING CHANGES:
- Relationship syntax now requires blocks for all participants
- Removed self/other perspective blocks from relationships
- Replaced 'guard' keyword with 'if' for behavior tree decorators

Language Features:
- Add tree-sitter grammar with improved if/condition disambiguation
- Add comprehensive tutorial and reference documentation
- Add SBIR v0.2.0 binary format specification
- Add resource linking system for behaviors and schedules
- Add year-long schedule patterns (day, season, recurrence)
- Add behavior tree enhancements (named nodes, decorators)

Documentation:
- Complete tutorial series (9 chapters) with baker family examples
- Complete reference documentation for all language features
- SBIR v0.2.0 specification with binary format details
- Added locations and institutions documentation

Examples:
- Convert all examples to baker family scenario
- Add comprehensive working examples

Tooling:
- Zed extension with LSP integration
- Tree-sitter grammar for syntax highlighting
- Build scripts and development tools

Version Updates:
- Main package: 0.1.0 → 0.2.0
- Tree-sitter grammar: 0.1.0 → 0.2.0
- Zed extension: 0.1.0 → 0.2.0
- Storybook editor: 0.1.0 → 0.2.0

2026-02-13 21:52:03 +00:00

13 KiB

Raw Blame History

LSP Lexer Usage Audit

Date: 2026-02-12 Auditor: LSP Test Engineer Purpose: Verify all LSP modules use the lexer exclusively (no ad-hoc parsing)

Executive Summary

Status: ⚠️ Mixed Compliance

3 modules properly use lexer ✅
8 modules use ad-hoc parsing ❌
Critical Risk: Hardcoded keyword lists in 3 modules

Risk Level: MEDIUM

Inconsistent behavior between LSP and compiler
Vulnerability to keyword changes (like recent behavior tree updates)
Maintenance burden from duplicate logic

Audit Results by Module

✅ Compliant Modules (Using Lexer)

1. completion.rs - ✅ GOOD (with caveats)

Status: Uses Lexer properly for tokenization

Lexer Usage:

use crate::syntax::lexer::{Lexer, Token};
let lexer = Lexer::new(before);

Found at lines: 135, 142, 269, 277, 410, 417

Issue: ⚠️ Contains hardcoded keyword strings (8 occurrences)

Lines with hardcoded keywords: "character", "template", "behavior", etc.
Risk: Keywords could get out of sync with lexer

Recommendation: Extract keywords from lexer Token enum instead of hardcoding

2. semantic_tokens.rs - ✅ EXCELLENT

Status: Uses Lexer exclusively, no manual parsing

Lexer Usage:

use crate::syntax::lexer::{Lexer, Token};

Assessment: Best practice example - Uses lexer for all tokenization, no hardcoded keywords, no manual string manipulation

3. code_actions.rs - ✅ GOOD (with caveats)

Status: Uses Lexer for tokenization

Issues:

⚠️ Contains 15 instances of manual string parsing (.split(), .chars(), .lines())
⚠️ Has 4 hardcoded keyword strings
Concern: Mixed approach - uses lexer sometimes, manual parsing other times

Recommendation: Refactor to use lexer consistently throughout

❌ Non-Compliant Modules (Ad-Hoc Parsing)

4. hover.rs - ❌ CRITICAL ISSUE

Status: NO lexer usage, extensive manual parsing

Problems:

A. Manual Character-by-Character Parsing

// Line 51-77: extract_word_at_position()
let chars: Vec<char> = line.chars().collect();
// Custom word boundary detection
while start > 0 && is_word_char(chars[start - 1]) {
    start -= 1;
}

Risk: Custom tokenization logic differs from compiler's lexer

B. Hardcoded Keyword List (Lines 20-39)

match word.as_str() {
    "character" => "**character** - Defines a character entity...",
    "template" => "**template** - Defines a reusable field template...",
    "life_arc" => "**life_arc** - Defines a state machine...",
    "schedule" => "**schedule** - Defines a daily schedule...",
    "behavior" => "**behavior** - Defines a behavior tree...",
    "institution" => "**institution** - Defines an organization...",
    "relationship" => "**relationship** - Defines a multi-party...",
    "location" => "**location** - Defines a place...",
    "species" => "**species** - Defines a species...",
    "enum" => "**enum** - Defines an enumeration...",
    "use" => "**use** - Imports declarations...",
    "from" => "**from** - Applies templates...",
    "include" => "**include** - Includes another template...",
    "state" => "**state** - Defines a state...",
    "on" => "**on** - Defines a transition...",
    "strict" => "**strict** - Enforces that a template...",
    _ => return None,
}

Risk: HIGH

16 hardcoded keywords that must stay in sync with lexer
If lexer adds/removes keywords, hover won't update automatically
Recent behavior tree changes could have broken this

Recommendation:

// BEFORE (current):
let word = extract_word_at_position(line_text, character)?;

// AFTER (proposed):
let lexer = Lexer::new(line_text);
let token_at_pos = find_token_at_position(&lexer, character)?;
match token_at_pos {
    Token::Character => "**character** - Defines...",
    Token::Template => "**template** - Defines...",
    // etc - using Token enum from lexer
}

5. formatting.rs - ❌ MANUAL PARSING

Status: NO lexer usage, line-by-line text processing

Problems:

// Line 53: Manual line processing
for line in text.lines() {
    let trimmed = line.trim();
    // Custom logic for braces, colons, etc.
}

Risk: MEDIUM

Custom formatting logic may not respect language semantics
Could format code incorrectly if it doesn't understand context

Recommendation: Use lexer to tokenize, then format based on tokens

Preserves semantic understanding
Respects string literals, comments, etc.

6. diagnostics.rs - ❌ MANUAL PARSING

Status: NO lexer usage, manual brace counting

Problems:

// Lines 34-37: Manual brace counting
for (line_num, line) in text.lines().enumerate() {
    let open_braces = line.chars().filter(|&c| c == '{').count();
    let close_braces = line.chars().filter(|&c| c == '}').count();
}

// Line 79: Character-by-character processing
for ch in text.chars() {
    // Custom logic
}

Risk: HIGH

Brace counting doesn't account for braces in strings: character Alice { name: "{" }
Doesn't respect comments: // This { is a comment
Could generate false diagnostics

Recommendation: Use lexer tokens to track brace pairs accurately

7. references.rs - ❌ MANUAL STRING PARSING

Status: NO lexer usage

Problems:

// Manual string parsing for word boundaries

Risk: MEDIUM

May not correctly identify symbol boundaries
Could match partial words

Recommendation: Use lexer to identify identifiers

8. rename.rs - ❌ HARDCODED KEYWORDS

Status: NO lexer usage, contains hardcoded strings

Problems:

2 hardcoded keyword strings
Manual symbol identification

Risk: MEDIUM

May incorrectly rename keywords or miss valid renames

Recommendation: Use lexer to distinguish keywords from identifiers

9. definition.rs - ⚠️ UNCLEAR

Status: No obvious lexer usage, no obvious manual parsing

Assessment: Likely uses AST-based approach (acceptable)

May rely on symbols extracted from AST
Acceptable if using Document.ast for symbol lookup

Recommendation: Verify it's using AST, not string parsing

10. inlay_hints.rs - ⚠️ UNCLEAR

Status: No lexer usage detected, no obvious parsing

Assessment: Minimal code inspection needed

May be stub or use AST-based approach

Recommendation: Full inspection needed

11. symbols.rs - ✅ LIKELY COMPLIANT

Status: No manual parsing detected

Assessment: Appears to extract symbols from AST

Acceptable approach - AST is the canonical representation
No need for lexer if working from AST

Summary Table

Module	Lexer Usage	Manual Parsing	Hardcoded Keywords	Risk Level
completion.rs	✅ Yes	❌ No	⚠️ 8 keywords	MEDIUM
semantic_tokens.rs	✅ Yes	❌ No	✅ None	LOW
code_actions.rs	✅ Yes	⚠️ 15 instances	⚠️ 4 keywords	MEDIUM
hover.rs	❌ No	⚠️ Extensive	⚠️ 16 keywords	HIGH
formatting.rs	❌ No	⚠️ Line-by-line	✅ None	MEDIUM
diagnostics.rs	❌ No	⚠️ Char counting	✅ None	HIGH
references.rs	❌ No	⚠️ Yes	✅ None	MEDIUM
rename.rs	❌ No	⚠️ Yes	⚠️ 2 keywords	MEDIUM
definition.rs	⚠️ Unknown	⚠️ Unknown	✅ None	LOW
inlay_hints.rs	⚠️ Unknown	⚠️ Unknown	✅ None	LOW
symbols.rs	N/A (AST)	❌ No	✅ None	LOW

Critical Findings

1. hover.rs - Highest Risk

16 hardcoded keywords
Custom tokenization logic
Impact: Recent behavior tree keyword changes may have broken hover
Fix Priority: HIGH

2. diagnostics.rs - High Risk

Manual brace counting fails in strings/comments
Could generate false errors
Fix Priority: HIGH

3. Hardcoded Keywords - Maintenance Burden

Total: 30 hardcoded keyword strings across 4 files
Risk: Keywords get out of sync with lexer
Recent Example: Behavior tree syntax changes could have broken these

Recommended Fixes

Priority 1: hover.rs Refactoring

Estimated Time: 2-3 hours

Changes:

Add lexer import: use crate::syntax::lexer::{Lexer, Token};
Replace extract_word_at_position() with lexer-based token finding
Replace keyword match with Token enum match
Remove hardcoded keyword strings

Benefits:

Automatic sync with lexer changes
Consistent tokenization with compiler
Easier maintenance

Example Code:

pub fn get_hover_info(text: &str, line: usize, character: usize) -> Option<Hover> {
    let line_text = text.lines().nth(line)?;
    let lexer = Lexer::new(line_text);

    // Find token at character position
    let token_at_pos = find_token_at_position(&lexer, character)?;

    let content = match token_at_pos {
        Token::Character => "**character** - Defines a character entity...",
        Token::Template => "**template** - Defines a reusable field template...",
        // Use Token enum - stays in sync automatically
    };

    Some(Hover { /* ... */ })
}

Priority 2: diagnostics.rs Refactoring

Estimated Time: 1-2 hours

Changes:

Use lexer to tokenize text
Count LBrace/RBrace tokens (not characters)
Ignore braces in strings and comments automatically

Benefits:

Correct handling of braces in all contexts
No false positives

Priority 3: Extract Keyword Definitions

Estimated Time: 1 hour

Changes:

Create src/syntax/keywords.rs module
Define keyword list from lexer Token enum
Use in completion, code_actions, rename

Benefits:

Single source of truth for keywords
Easy to keep in sync

Testing Recommendations

After fixes, add tests for:

Hover on keywords in strings - Should NOT show hover

character Alice { name: "character" }  // hovering "character" in string

Diagnostics with braces in strings - Should NOT count

character Alice { name: "{}" }  // should not report brace mismatch

Lexer consistency tests - Verify LSP uses same tokens as compiler

#[test]
fn test_hover_uses_lexer_keywords() {
    // Ensure hover keywords match lexer Token enum
}

Long-Term Recommendations

1. Establish Coding Standards

Document that LSP modules should:

✅ Use lexer for all tokenization
✅ Use AST for semantic analysis
❌ Never do manual string parsing
❌ Never hardcode keywords

2. Code Review Checklist

Add to PR reviews:

Does LSP code use lexer for tokenization?
Are there any hardcoded keyword strings?
Is there any manual .split(), .chars(), .find() parsing?

3. Shared Utilities

Create src/lsp/lexer_utils.rs with:

find_token_at_position() - Common utility
get_keyword_docs() - Centralized keyword documentation
is_keyword() - Token-based keyword checking

Impact Assessment

Current State Risks

Behavior Tree Changes (Recent):

Lexer/parser updated with new keywords
❌ hover.rs still has old hardcoded list
⚠️ Users may not see hover for new keywords

Future Keyword Changes:

Any new keywords require updates in 4 separate files
Easy to miss one, causing inconsistent behavior

Bug Examples:

// diagnostics.rs bug:
character Alice { description: "Contains } brace" }
// ^ Reports false "unmatched brace" error

// hover.rs bug:
character Alice { /* hover on "character" keyword here */ }
// ^ Works

"In a character block..."  // hover on "character" in string
// ^ Also shows hover (WRONG - it's just a word in a string)

Conclusion

Overall Assessment: NEEDS IMPROVEMENT

Compliance Rate: 27% (3/11 modules)
Risk Level: MEDIUM-HIGH
Recommended Action: Refactor hover.rs and diagnostics.rs as priority

Benefits of Fixing:

Consistency with compiler behavior
Automatic sync with language changes
Reduced maintenance burden
Fewer bugs from edge cases

Next Steps:

Create GitHub issues for each Priority 1-2 fix
Refactor hover.rs (highest risk)
Refactor diagnostics.rs (high risk)
Extract keyword definitions (prevents future issues)
Add lexer usage to code review checklist

Appendix: Files Analyzed

Total LSP modules audited: 11
✅ Compliant: 3 (27%)
⚠️ Partially compliant: 1 (9%)
❌ Non-compliant: 7 (64%)

Files examined:
- src/lsp/completion.rs
- src/lsp/hover.rs
- src/lsp/semantic_tokens.rs
- src/lsp/inlay_hints.rs
- src/lsp/definition.rs
- src/lsp/formatting.rs
- src/lsp/rename.rs
- src/lsp/references.rs
- src/lsp/code_actions.rs
- src/lsp/symbols.rs
- src/lsp/diagnostics.rs

13 KiB Raw Blame History

LSP Lexer Usage Audit

Executive Summary

Audit Results by Module

✅ Compliant Modules (Using Lexer)

1. completion.rs - ✅ GOOD (with caveats)

2. semantic_tokens.rs - ✅ EXCELLENT

3. code_actions.rs - ✅ GOOD (with caveats)

❌ Non-Compliant Modules (Ad-Hoc Parsing)

4. hover.rs - ❌ CRITICAL ISSUE

5. formatting.rs - ❌ MANUAL PARSING

6. diagnostics.rs - ❌ MANUAL PARSING

7. references.rs - ❌ MANUAL STRING PARSING

8. rename.rs - ❌ HARDCODED KEYWORDS

9. definition.rs - ⚠️ UNCLEAR

10. inlay_hints.rs - ⚠️ UNCLEAR

11. symbols.rs - ✅ LIKELY COMPLIANT

Summary Table

Critical Findings

1. hover.rs - Highest Risk

2. diagnostics.rs - High Risk

3. Hardcoded Keywords - Maintenance Burden

Recommended Fixes

Priority 1: hover.rs Refactoring

Priority 2: diagnostics.rs Refactoring

Priority 3: Extract Keyword Definitions

Testing Recommendations

Long-Term Recommendations

1. Establish Coding Standards

2. Code Review Checklist

3. Shared Utilities

Impact Assessment

Current State Risks

Conclusion

Appendix: Files Analyzed

13 KiB

Raw Blame History