BREAKING CHANGES: - Relationship syntax now requires blocks for all participants - Removed self/other perspective blocks from relationships - Replaced 'guard' keyword with 'if' for behavior tree decorators Language Features: - Add tree-sitter grammar with improved if/condition disambiguation - Add comprehensive tutorial and reference documentation - Add SBIR v0.2.0 binary format specification - Add resource linking system for behaviors and schedules - Add year-long schedule patterns (day, season, recurrence) - Add behavior tree enhancements (named nodes, decorators) Documentation: - Complete tutorial series (9 chapters) with baker family examples - Complete reference documentation for all language features - SBIR v0.2.0 specification with binary format details - Added locations and institutions documentation Examples: - Convert all examples to baker family scenario - Add comprehensive working examples Tooling: - Zed extension with LSP integration - Tree-sitter grammar for syntax highlighting - Build scripts and development tools Version Updates: - Main package: 0.1.0 → 0.2.0 - Tree-sitter grammar: 0.1.0 → 0.2.0 - Zed extension: 0.1.0 → 0.2.0 - Storybook editor: 0.1.0 → 0.2.0
13 KiB
LSP Lexer Usage Audit
Date: 2026-02-12 Auditor: LSP Test Engineer Purpose: Verify all LSP modules use the lexer exclusively (no ad-hoc parsing)
Executive Summary
Status: ⚠️ Mixed Compliance
- 3 modules properly use lexer ✅
- 8 modules use ad-hoc parsing ❌
- Critical Risk: Hardcoded keyword lists in 3 modules
Risk Level: MEDIUM
- Inconsistent behavior between LSP and compiler
- Vulnerability to keyword changes (like recent behavior tree updates)
- Maintenance burden from duplicate logic
Audit Results by Module
✅ Compliant Modules (Using Lexer)
1. completion.rs - ✅ GOOD (with caveats)
Status: Uses Lexer properly for tokenization
Lexer Usage:
use crate::syntax::lexer::{Lexer, Token};
let lexer = Lexer::new(before);
Found at lines: 135, 142, 269, 277, 410, 417
Issue: ⚠️ Contains hardcoded keyword strings (8 occurrences)
- Lines with hardcoded keywords: "character", "template", "behavior", etc.
- Risk: Keywords could get out of sync with lexer
Recommendation: Extract keywords from lexer Token enum instead of hardcoding
2. semantic_tokens.rs - ✅ EXCELLENT
Status: Uses Lexer exclusively, no manual parsing
Lexer Usage:
use crate::syntax::lexer::{Lexer, Token};
Assessment: Best practice example - Uses lexer for all tokenization, no hardcoded keywords, no manual string manipulation
3. code_actions.rs - ✅ GOOD (with caveats)
Status: Uses Lexer for tokenization
Issues:
- ⚠️ Contains 15 instances of manual string parsing (
.split(),.chars(),.lines()) - ⚠️ Has 4 hardcoded keyword strings
- Concern: Mixed approach - uses lexer sometimes, manual parsing other times
Recommendation: Refactor to use lexer consistently throughout
❌ Non-Compliant Modules (Ad-Hoc Parsing)
4. hover.rs - ❌ CRITICAL ISSUE
Status: NO lexer usage, extensive manual parsing
Problems:
A. Manual Character-by-Character Parsing
// Line 51-77: extract_word_at_position()
let chars: Vec<char> = line.chars().collect();
// Custom word boundary detection
while start > 0 && is_word_char(chars[start - 1]) {
start -= 1;
}
Risk: Custom tokenization logic differs from compiler's lexer
B. Hardcoded Keyword List (Lines 20-39)
match word.as_str() {
"character" => "**character** - Defines a character entity...",
"template" => "**template** - Defines a reusable field template...",
"life_arc" => "**life_arc** - Defines a state machine...",
"schedule" => "**schedule** - Defines a daily schedule...",
"behavior" => "**behavior** - Defines a behavior tree...",
"institution" => "**institution** - Defines an organization...",
"relationship" => "**relationship** - Defines a multi-party...",
"location" => "**location** - Defines a place...",
"species" => "**species** - Defines a species...",
"enum" => "**enum** - Defines an enumeration...",
"use" => "**use** - Imports declarations...",
"from" => "**from** - Applies templates...",
"include" => "**include** - Includes another template...",
"state" => "**state** - Defines a state...",
"on" => "**on** - Defines a transition...",
"strict" => "**strict** - Enforces that a template...",
_ => return None,
}
Risk: HIGH
- 16 hardcoded keywords that must stay in sync with lexer
- If lexer adds/removes keywords, hover won't update automatically
- Recent behavior tree changes could have broken this
Recommendation:
// BEFORE (current):
let word = extract_word_at_position(line_text, character)?;
// AFTER (proposed):
let lexer = Lexer::new(line_text);
let token_at_pos = find_token_at_position(&lexer, character)?;
match token_at_pos {
Token::Character => "**character** - Defines...",
Token::Template => "**template** - Defines...",
// etc - using Token enum from lexer
}
5. formatting.rs - ❌ MANUAL PARSING
Status: NO lexer usage, line-by-line text processing
Problems:
// Line 53: Manual line processing
for line in text.lines() {
let trimmed = line.trim();
// Custom logic for braces, colons, etc.
}
Risk: MEDIUM
- Custom formatting logic may not respect language semantics
- Could format code incorrectly if it doesn't understand context
Recommendation: Use lexer to tokenize, then format based on tokens
- Preserves semantic understanding
- Respects string literals, comments, etc.
6. diagnostics.rs - ❌ MANUAL PARSING
Status: NO lexer usage, manual brace counting
Problems:
// Lines 34-37: Manual brace counting
for (line_num, line) in text.lines().enumerate() {
let open_braces = line.chars().filter(|&c| c == '{').count();
let close_braces = line.chars().filter(|&c| c == '}').count();
}
// Line 79: Character-by-character processing
for ch in text.chars() {
// Custom logic
}
Risk: HIGH
- Brace counting doesn't account for braces in strings:
character Alice { name: "{" } - Doesn't respect comments:
// This { is a comment - Could generate false diagnostics
Recommendation: Use lexer tokens to track brace pairs accurately
7. references.rs - ❌ MANUAL STRING PARSING
Status: NO lexer usage
Problems:
// Manual string parsing for word boundaries
Risk: MEDIUM
- May not correctly identify symbol boundaries
- Could match partial words
Recommendation: Use lexer to identify identifiers
8. rename.rs - ❌ HARDCODED KEYWORDS
Status: NO lexer usage, contains hardcoded strings
Problems:
- 2 hardcoded keyword strings
- Manual symbol identification
Risk: MEDIUM
- May incorrectly rename keywords or miss valid renames
Recommendation: Use lexer to distinguish keywords from identifiers
9. definition.rs - ⚠️ UNCLEAR
Status: No obvious lexer usage, no obvious manual parsing
Assessment: Likely uses AST-based approach (acceptable)
- May rely on symbols extracted from AST
- Acceptable if using
Document.astfor symbol lookup
Recommendation: Verify it's using AST, not string parsing
10. inlay_hints.rs - ⚠️ UNCLEAR
Status: No lexer usage detected, no obvious parsing
Assessment: Minimal code inspection needed
- May be stub or use AST-based approach
Recommendation: Full inspection needed
11. symbols.rs - ✅ LIKELY COMPLIANT
Status: No manual parsing detected
Assessment: Appears to extract symbols from AST
- Acceptable approach - AST is the canonical representation
- No need for lexer if working from AST
Summary Table
| Module | Lexer Usage | Manual Parsing | Hardcoded Keywords | Risk Level |
|---|---|---|---|---|
| completion.rs | ✅ Yes | ❌ No | ⚠️ 8 keywords | MEDIUM |
| semantic_tokens.rs | ✅ Yes | ❌ No | ✅ None | LOW |
| code_actions.rs | ✅ Yes | ⚠️ 15 instances | ⚠️ 4 keywords | MEDIUM |
| hover.rs | ❌ No | ⚠️ Extensive | ⚠️ 16 keywords | HIGH |
| formatting.rs | ❌ No | ⚠️ Line-by-line | ✅ None | MEDIUM |
| diagnostics.rs | ❌ No | ⚠️ Char counting | ✅ None | HIGH |
| references.rs | ❌ No | ⚠️ Yes | ✅ None | MEDIUM |
| rename.rs | ❌ No | ⚠️ Yes | ⚠️ 2 keywords | MEDIUM |
| definition.rs | ⚠️ Unknown | ⚠️ Unknown | ✅ None | LOW |
| inlay_hints.rs | ⚠️ Unknown | ⚠️ Unknown | ✅ None | LOW |
| symbols.rs | N/A (AST) | ❌ No | ✅ None | LOW |
Critical Findings
1. hover.rs - Highest Risk
- 16 hardcoded keywords
- Custom tokenization logic
- Impact: Recent behavior tree keyword changes may have broken hover
- Fix Priority: HIGH
2. diagnostics.rs - High Risk
- Manual brace counting fails in strings/comments
- Could generate false errors
- Fix Priority: HIGH
3. Hardcoded Keywords - Maintenance Burden
- Total: 30 hardcoded keyword strings across 4 files
- Risk: Keywords get out of sync with lexer
- Recent Example: Behavior tree syntax changes could have broken these
Recommended Fixes
Priority 1: hover.rs Refactoring
Estimated Time: 2-3 hours
Changes:
- Add lexer import:
use crate::syntax::lexer::{Lexer, Token}; - Replace
extract_word_at_position()with lexer-based token finding - Replace keyword match with Token enum match
- Remove hardcoded keyword strings
Benefits:
- Automatic sync with lexer changes
- Consistent tokenization with compiler
- Easier maintenance
Example Code:
pub fn get_hover_info(text: &str, line: usize, character: usize) -> Option<Hover> {
let line_text = text.lines().nth(line)?;
let lexer = Lexer::new(line_text);
// Find token at character position
let token_at_pos = find_token_at_position(&lexer, character)?;
let content = match token_at_pos {
Token::Character => "**character** - Defines a character entity...",
Token::Template => "**template** - Defines a reusable field template...",
// Use Token enum - stays in sync automatically
};
Some(Hover { /* ... */ })
}
Priority 2: diagnostics.rs Refactoring
Estimated Time: 1-2 hours
Changes:
- Use lexer to tokenize text
- Count LBrace/RBrace tokens (not characters)
- Ignore braces in strings and comments automatically
Benefits:
- Correct handling of braces in all contexts
- No false positives
Priority 3: Extract Keyword Definitions
Estimated Time: 1 hour
Changes:
- Create
src/syntax/keywords.rsmodule - Define keyword list from lexer Token enum
- Use in completion, code_actions, rename
Benefits:
- Single source of truth for keywords
- Easy to keep in sync
Testing Recommendations
After fixes, add tests for:
-
Hover on keywords in strings - Should NOT show hover
character Alice { name: "character" } // hovering "character" in string -
Diagnostics with braces in strings - Should NOT count
character Alice { name: "{}" } // should not report brace mismatch -
Lexer consistency tests - Verify LSP uses same tokens as compiler
#[test] fn test_hover_uses_lexer_keywords() { // Ensure hover keywords match lexer Token enum }
Long-Term Recommendations
1. Establish Coding Standards
Document that LSP modules should:
- ✅ Use lexer for all tokenization
- ✅ Use AST for semantic analysis
- ❌ Never do manual string parsing
- ❌ Never hardcode keywords
2. Code Review Checklist
Add to PR reviews:
- Does LSP code use lexer for tokenization?
- Are there any hardcoded keyword strings?
- Is there any manual
.split(),.chars(),.find()parsing?
3. Shared Utilities
Create src/lsp/lexer_utils.rs with:
find_token_at_position()- Common utilityget_keyword_docs()- Centralized keyword documentationis_keyword()- Token-based keyword checking
Impact Assessment
Current State Risks
Behavior Tree Changes (Recent):
- Lexer/parser updated with new keywords
- ❌
hover.rsstill has old hardcoded list - ⚠️ Users may not see hover for new keywords
Future Keyword Changes:
- Any new keywords require updates in 4 separate files
- Easy to miss one, causing inconsistent behavior
Bug Examples:
// diagnostics.rs bug:
character Alice { description: "Contains } brace" }
// ^ Reports false "unmatched brace" error
// hover.rs bug:
character Alice { /* hover on "character" keyword here */ }
// ^ Works
"In a character block..." // hover on "character" in string
// ^ Also shows hover (WRONG - it's just a word in a string)
Conclusion
Overall Assessment: NEEDS IMPROVEMENT
- Compliance Rate: 27% (3/11 modules)
- Risk Level: MEDIUM-HIGH
- Recommended Action: Refactor hover.rs and diagnostics.rs as priority
Benefits of Fixing:
- Consistency with compiler behavior
- Automatic sync with language changes
- Reduced maintenance burden
- Fewer bugs from edge cases
Next Steps:
- Create GitHub issues for each Priority 1-2 fix
- Refactor hover.rs (highest risk)
- Refactor diagnostics.rs (high risk)
- Extract keyword definitions (prevents future issues)
- Add lexer usage to code review checklist
Appendix: Files Analyzed
Total LSP modules audited: 11
✅ Compliant: 3 (27%)
⚠️ Partially compliant: 1 (9%)
❌ Non-compliant: 7 (64%)
Files examined:
- src/lsp/completion.rs
- src/lsp/hover.rs
- src/lsp/semantic_tokens.rs
- src/lsp/inlay_hints.rs
- src/lsp/definition.rs
- src/lsp/formatting.rs
- src/lsp/rename.rs
- src/lsp/references.rs
- src/lsp/code_actions.rs
- src/lsp/symbols.rs
- src/lsp/diagnostics.rs