LSP_LEXER_AUDIT.md

# LSP Lexer Usage Audit
**Date:** 2026-02-12
**Auditor:** LSP Test Engineer
**Purpose:** Verify all LSP modules use the lexer exclusively (no ad-hoc parsing)

---

## Executive Summary

**Status:** ⚠️ **Mixed Compliance**

- **3 modules** properly use lexer ✅
- **8 modules** use ad-hoc parsing ❌
- **Critical Risk:** Hardcoded keyword lists in 3 modules

**Risk Level:** MEDIUM
- Inconsistent behavior between LSP and compiler
- Vulnerability to keyword changes (like recent behavior tree updates)
- Maintenance burden from duplicate logic

---

## Audit Results by Module

### ✅ Compliant Modules (Using Lexer)

#### 1. **completion.rs** - ✅ GOOD (with caveats)
**Status:** Uses Lexer properly for tokenization

**Lexer Usage:**
```rust
use crate::syntax::lexer::{Lexer, Token};
let lexer = Lexer::new(before);
```

**Found at lines:** 135, 142, 269, 277, 410, 417

**Issue:** ⚠️ Contains hardcoded keyword strings (8 occurrences)
- Lines with hardcoded keywords: "character", "template", "behavior", etc.
- **Risk:** Keywords could get out of sync with lexer

**Recommendation:** Extract keywords from lexer Token enum instead of hardcoding

---

#### 2. **semantic_tokens.rs** - ✅ EXCELLENT
**Status:** Uses Lexer exclusively, no manual parsing

**Lexer Usage:**
```rust
use crate::syntax::lexer::{Lexer, Token};
```

**Assessment:** **Best practice example** - Uses lexer for all tokenization, no hardcoded keywords, no manual string manipulation

---

#### 3. **code_actions.rs** - ✅ GOOD (with caveats)
**Status:** Uses Lexer for tokenization

**Issues:**
- ⚠️ Contains 15 instances of manual string parsing (`.split()`, `.chars()`, `.lines()`)
- ⚠️ Has 4 hardcoded keyword strings
- **Concern:** Mixed approach - uses lexer sometimes, manual parsing other times

**Recommendation:** Refactor to use lexer consistently throughout

---

### ❌ Non-Compliant Modules (Ad-Hoc Parsing)

#### 4. **hover.rs** - ❌ CRITICAL ISSUE
**Status:** NO lexer usage, extensive manual parsing

**Problems:**

**A. Manual Character-by-Character Parsing**
```rust
// Line 51-77: extract_word_at_position()
let chars: Vec<char> = line.chars().collect();
// Custom word boundary detection
while start > 0 && is_word_char(chars[start - 1]) {
    start -= 1;
}
```

**Risk:** Custom tokenization logic differs from compiler's lexer

**B. Hardcoded Keyword List** (Lines 20-39)
```rust
match word.as_str() {
    "character" => "**character** - Defines a character entity...",
    "template" => "**template** - Defines a reusable field template...",
    "life_arc" => "**life_arc** - Defines a state machine...",
    "schedule" => "**schedule** - Defines a daily schedule...",
    "behavior" => "**behavior** - Defines a behavior tree...",
    "institution" => "**institution** - Defines an organization...",
    "relationship" => "**relationship** - Defines a multi-party...",
    "location" => "**location** - Defines a place...",
    "species" => "**species** - Defines a species...",
    "enum" => "**enum** - Defines an enumeration...",
    "use" => "**use** - Imports declarations...",
    "from" => "**from** - Applies templates...",
    "include" => "**include** - Includes another template...",
    "state" => "**state** - Defines a state...",
    "on" => "**on** - Defines a transition...",
    "strict" => "**strict** - Enforces that a template...",
    _ => return None,
}
```

**Risk:** HIGH
- 16 hardcoded keywords that must stay in sync with lexer
- If lexer adds/removes keywords, hover won't update automatically
- Recent behavior tree changes could have broken this

**Recommendation:**
```rust
// BEFORE (current):
let word = extract_word_at_position(line_text, character)?;

// AFTER (proposed):
let lexer = Lexer::new(line_text);
let token_at_pos = find_token_at_position(&lexer, character)?;
match token_at_pos {
    Token::Character => "**character** - Defines...",
    Token::Template => "**template** - Defines...",
    // etc - using Token enum from lexer
}
```

---

#### 5. **formatting.rs** - ❌ MANUAL PARSING
**Status:** NO lexer usage, line-by-line text processing

**Problems:**
```rust
// Line 53: Manual line processing
for line in text.lines() {
    let trimmed = line.trim();
    // Custom logic for braces, colons, etc.
}
```

**Risk:** MEDIUM
- Custom formatting logic may not respect language semantics
- Could format code incorrectly if it doesn't understand context

**Recommendation:** Use lexer to tokenize, then format based on tokens
- Preserves semantic understanding
- Respects string literals, comments, etc.

---

#### 6. **diagnostics.rs** - ❌ MANUAL PARSING
**Status:** NO lexer usage, manual brace counting

**Problems:**
```rust
// Lines 34-37: Manual brace counting
for (line_num, line) in text.lines().enumerate() {
    let open_braces = line.chars().filter(|&c| c == '{').count();
    let close_braces = line.chars().filter(|&c| c == '}').count();
}

// Line 79: Character-by-character processing
for ch in text.chars() {
    // Custom logic
}
```

**Risk:** HIGH
- Brace counting doesn't account for braces in strings: `character Alice { name: "{" }`
- Doesn't respect comments: `// This { is a comment`
- Could generate false diagnostics

**Recommendation:** Use lexer tokens to track brace pairs accurately

---

#### 7. **references.rs** - ❌ MANUAL STRING PARSING
**Status:** NO lexer usage

**Problems:**
```rust
// Manual string parsing for word boundaries
```

**Risk:** MEDIUM
- May not correctly identify symbol boundaries
- Could match partial words

**Recommendation:** Use lexer to identify identifiers

---

#### 8. **rename.rs** - ❌ HARDCODED KEYWORDS
**Status:** NO lexer usage, contains hardcoded strings

**Problems:**
- 2 hardcoded keyword strings
- Manual symbol identification

**Risk:** MEDIUM
- May incorrectly rename keywords or miss valid renames

**Recommendation:** Use lexer to distinguish keywords from identifiers

---

#### 9. **definition.rs** - ⚠️ UNCLEAR
**Status:** No obvious lexer usage, no obvious manual parsing

**Assessment:** Likely uses AST-based approach (acceptable)
- May rely on symbols extracted from AST
- **Acceptable** if using `Document.ast` for symbol lookup

**Recommendation:** Verify it's using AST, not string parsing

---

#### 10. **inlay_hints.rs** - ⚠️ UNCLEAR
**Status:** No lexer usage detected, no obvious parsing

**Assessment:** Minimal code inspection needed
- May be stub or use AST-based approach

**Recommendation:** Full inspection needed

---

#### 11. **symbols.rs** - ✅ LIKELY COMPLIANT
**Status:** No manual parsing detected

**Assessment:** Appears to extract symbols from AST
- **Acceptable approach** - AST is the canonical representation
- No need for lexer if working from AST

---

## Summary Table

| Module | Lexer Usage | Manual Parsing | Hardcoded Keywords | Risk Level |
|--------|-------------|----------------|-------------------|------------|
| **completion.rs** | ✅ Yes | ❌ No | ⚠️ 8 keywords | MEDIUM |
| **semantic_tokens.rs** | ✅ Yes | ❌ No | ✅ None | LOW |
| **code_actions.rs** | ✅ Yes | ⚠️ 15 instances | ⚠️ 4 keywords | MEDIUM |
| **hover.rs** | ❌ No | ⚠️ Extensive | ⚠️ 16 keywords | **HIGH** |
| **formatting.rs** | ❌ No | ⚠️ Line-by-line | ✅ None | MEDIUM |
| **diagnostics.rs** | ❌ No | ⚠️ Char counting | ✅ None | **HIGH** |
| **references.rs** | ❌ No | ⚠️ Yes | ✅ None | MEDIUM |
| **rename.rs** | ❌ No | ⚠️ Yes | ⚠️ 2 keywords | MEDIUM |
| **definition.rs** | ⚠️ Unknown | ⚠️ Unknown | ✅ None | LOW |
| **inlay_hints.rs** | ⚠️ Unknown | ⚠️ Unknown | ✅ None | LOW |
| **symbols.rs** | N/A (AST) | ❌ No | ✅ None | LOW |

---

## Critical Findings

### 1. **hover.rs** - Highest Risk
- 16 hardcoded keywords
- Custom tokenization logic
- **Impact:** Recent behavior tree keyword changes may have broken hover
- **Fix Priority:** HIGH

### 2. **diagnostics.rs** - High Risk
- Manual brace counting fails in strings/comments
- Could generate false errors
- **Fix Priority:** HIGH

### 3. **Hardcoded Keywords** - Maintenance Burden
- **Total:** 30 hardcoded keyword strings across 4 files
- **Risk:** Keywords get out of sync with lexer
- **Recent Example:** Behavior tree syntax changes could have broken these

---

## Recommended Fixes

### Priority 1: hover.rs Refactoring
**Estimated Time:** 2-3 hours

**Changes:**
1. Add lexer import: `use crate::syntax::lexer::{Lexer, Token};`
2. Replace `extract_word_at_position()` with lexer-based token finding
3. Replace keyword match with Token enum match
4. Remove hardcoded keyword strings

**Benefits:**
- Automatic sync with lexer changes
- Consistent tokenization with compiler
- Easier maintenance

**Example Code:**
```rust
pub fn get_hover_info(text: &str, line: usize, character: usize) -> Option<Hover> {
    let line_text = text.lines().nth(line)?;
    let lexer = Lexer::new(line_text);

    // Find token at character position
    let token_at_pos = find_token_at_position(&lexer, character)?;

    let content = match token_at_pos {
        Token::Character => "**character** - Defines a character entity...",
        Token::Template => "**template** - Defines a reusable field template...",
        // Use Token enum - stays in sync automatically
    };

    Some(Hover { /* ... */ })
}
```

---

### Priority 2: diagnostics.rs Refactoring
**Estimated Time:** 1-2 hours

**Changes:**
1. Use lexer to tokenize text
2. Count LBrace/RBrace tokens (not characters)
3. Ignore braces in strings and comments automatically

**Benefits:**
- Correct handling of braces in all contexts
- No false positives

---

### Priority 3: Extract Keyword Definitions
**Estimated Time:** 1 hour

**Changes:**
1. Create `src/syntax/keywords.rs` module
2. Define keyword list from lexer Token enum
3. Use in completion, code_actions, rename

**Benefits:**
- Single source of truth for keywords
- Easy to keep in sync

---

## Testing Recommendations

After fixes, add tests for:

1. **Hover on keywords in strings** - Should NOT show hover
   ```rust
   character Alice { name: "character" }  // hovering "character" in string
   ```

2. **Diagnostics with braces in strings** - Should NOT count
   ```rust
   character Alice { name: "{}" }  // should not report brace mismatch
   ```

3. **Lexer consistency tests** - Verify LSP uses same tokens as compiler
   ```rust
   #[test]
   fn test_hover_uses_lexer_keywords() {
       // Ensure hover keywords match lexer Token enum
   }
   ```

---

## Long-Term Recommendations

### 1. Establish Coding Standards
Document that LSP modules should:
- ✅ Use lexer for all tokenization
- ✅ Use AST for semantic analysis
- ❌ Never do manual string parsing
- ❌ Never hardcode keywords

### 2. Code Review Checklist
Add to PR reviews:
- [ ] Does LSP code use lexer for tokenization?
- [ ] Are there any hardcoded keyword strings?
- [ ] Is there any manual `.split()`, `.chars()`, `.find()` parsing?

### 3. Shared Utilities
Create `src/lsp/lexer_utils.rs` with:
- `find_token_at_position()` - Common utility
- `get_keyword_docs()` - Centralized keyword documentation
- `is_keyword()` - Token-based keyword checking

---

## Impact Assessment

### Current State Risks

**Behavior Tree Changes (Recent):**
- Lexer/parser updated with new keywords
- ❌ `hover.rs` still has old hardcoded list
- ⚠️ Users may not see hover for new keywords

**Future Keyword Changes:**
- Any new keywords require updates in 4 separate files
- Easy to miss one, causing inconsistent behavior

**Bug Examples:**
```rust
// diagnostics.rs bug:
character Alice { description: "Contains } brace" }
// ^ Reports false "unmatched brace" error

// hover.rs bug:
character Alice { /* hover on "character" keyword here */ }
// ^ Works

"In a character block..."  // hover on "character" in string
// ^ Also shows hover (WRONG - it's just a word in a string)
```

---

## Conclusion

**Overall Assessment:** NEEDS IMPROVEMENT

- **Compliance Rate:** 27% (3/11 modules)
- **Risk Level:** MEDIUM-HIGH
- **Recommended Action:** Refactor hover.rs and diagnostics.rs as priority

**Benefits of Fixing:**
1. Consistency with compiler behavior
2. Automatic sync with language changes
3. Reduced maintenance burden
4. Fewer bugs from edge cases

**Next Steps:**
1. Create GitHub issues for each Priority 1-2 fix
2. Refactor hover.rs (highest risk)
3. Refactor diagnostics.rs (high risk)
4. Extract keyword definitions (prevents future issues)
5. Add lexer usage to code review checklist

---

## Appendix: Files Analyzed

```
Total LSP modules audited: 11
✅ Compliant: 3 (27%)
⚠️ Partially compliant: 1 (9%)
❌ Non-compliant: 7 (64%)

Files examined:
- src/lsp/completion.rs
- src/lsp/hover.rs
- src/lsp/semantic_tokens.rs
- src/lsp/inlay_hints.rs
- src/lsp/definition.rs
- src/lsp/formatting.rs
- src/lsp/rename.rs
- src/lsp/references.rs
- src/lsp/code_actions.rs
- src/lsp/symbols.rs
- src/lsp/diagnostics.rs
```
release: Storybook v0.2.0 - Major syntax and features update BREAKING CHANGES: - Relationship syntax now requires blocks for all participants - Removed self/other perspective blocks from relationships - Replaced 'guard' keyword with 'if' for behavior tree decorators Language Features: - Add tree-sitter grammar with improved if/condition disambiguation - Add comprehensive tutorial and reference documentation - Add SBIR v0.2.0 binary format specification - Add resource linking system for behaviors and schedules - Add year-long schedule patterns (day, season, recurrence) - Add behavior tree enhancements (named nodes, decorators) Documentation: - Complete tutorial series (9 chapters) with baker family examples - Complete reference documentation for all language features - SBIR v0.2.0 specification with binary format details - Added locations and institutions documentation Examples: - Convert all examples to baker family scenario - Add comprehensive working examples Tooling: - Zed extension with LSP integration - Tree-sitter grammar for syntax highlighting - Build scripts and development tools Version Updates: - Main package: 0.1.0 → 0.2.0 - Tree-sitter grammar: 0.1.0 → 0.2.0 - Zed extension: 0.1.0 → 0.2.0 - Storybook editor: 0.1.0 → 0.2.0 2026-02-13 21:52:03 +00:00			`# LSP Lexer Usage Audit`
			`Date: 2026-02-12`
			`Auditor: LSP Test Engineer`
			`Purpose: Verify all LSP modules use the lexer exclusively (no ad-hoc parsing)`

			`---`

			`## Executive Summary`

			`Status: ⚠️ Mixed Compliance`

			`- 3 modules properly use lexer ✅`
			`- 8 modules use ad-hoc parsing ❌`
			`- Critical Risk: Hardcoded keyword lists in 3 modules`

			`Risk Level: MEDIUM`
			`- Inconsistent behavior between LSP and compiler`
			`- Vulnerability to keyword changes (like recent behavior tree updates)`
			`- Maintenance burden from duplicate logic`

			`---`

			`## Audit Results by Module`

			`### ✅ Compliant Modules (Using Lexer)`

			`#### 1. completion.rs - ✅ GOOD (with caveats)`
			`Status: Uses Lexer properly for tokenization`

			`Lexer Usage:`
			```rust
			`use crate::syntax::lexer::{Lexer, Token};`
			`let lexer = Lexer::new(before);`
			```

			`Found at lines: 135, 142, 269, 277, 410, 417`

			`Issue: ⚠️ Contains hardcoded keyword strings (8 occurrences)`
			`- Lines with hardcoded keywords: "character", "template", "behavior", etc.`
			`- Risk: Keywords could get out of sync with lexer`

			`Recommendation: Extract keywords from lexer Token enum instead of hardcoding`

			`---`

			`#### 2. semantic_tokens.rs - ✅ EXCELLENT`
			`Status: Uses Lexer exclusively, no manual parsing`

			`Lexer Usage:`
			```rust
			`use crate::syntax::lexer::{Lexer, Token};`
			```

			`Assessment: Best practice example - Uses lexer for all tokenization, no hardcoded keywords, no manual string manipulation`

			`---`

			`#### 3. code_actions.rs - ✅ GOOD (with caveats)`
			`Status: Uses Lexer for tokenization`

			`Issues:`
			- ⚠️ Contains 15 instances of manual string parsing (`.split()`, `.chars()`, `.lines()`)
			`- ⚠️ Has 4 hardcoded keyword strings`
			`- Concern: Mixed approach - uses lexer sometimes, manual parsing other times`

			`Recommendation: Refactor to use lexer consistently throughout`

			`---`

			`### ❌ Non-Compliant Modules (Ad-Hoc Parsing)`

			`#### 4. hover.rs - ❌ CRITICAL ISSUE`
			`Status: NO lexer usage, extensive manual parsing`

			`Problems:`

			`A. Manual Character-by-Character Parsing`
			```rust
			`// Line 51-77: extract_word_at_position()`
			`let chars: Vec<char> = line.chars().collect();`
			`// Custom word boundary detection`
			`while start > 0 && is_word_char(chars[start - 1]) {`
			`start -= 1;`
			`}`
			```

			`Risk: Custom tokenization logic differs from compiler's lexer`

			`B. Hardcoded Keyword List (Lines 20-39)`
			```rust
			`match word.as_str() {`
			`"character" => "character - Defines a character entity...",`
			`"template" => "template - Defines a reusable field template...",`
			`"life_arc" => "life_arc - Defines a state machine...",`
			`"schedule" => "schedule - Defines a daily schedule...",`
			`"behavior" => "behavior - Defines a behavior tree...",`
			`"institution" => "institution - Defines an organization...",`
			`"relationship" => "relationship - Defines a multi-party...",`
			`"location" => "location - Defines a place...",`
			`"species" => "species - Defines a species...",`
			`"enum" => "enum - Defines an enumeration...",`
			`"use" => "use - Imports declarations...",`
			`"from" => "from - Applies templates...",`
			`"include" => "include - Includes another template...",`
			`"state" => "state - Defines a state...",`
			`"on" => "on - Defines a transition...",`
			`"strict" => "strict - Enforces that a template...",`
			`_ => return None,`
			`}`
			```

			`Risk: HIGH`
			`- 16 hardcoded keywords that must stay in sync with lexer`
			`- If lexer adds/removes keywords, hover won't update automatically`
			`- Recent behavior tree changes could have broken this`

			`Recommendation:`
			```rust
			`// BEFORE (current):`
			`let word = extract_word_at_position(line_text, character)?;`

			`// AFTER (proposed):`
			`let lexer = Lexer::new(line_text);`
			`let token_at_pos = find_token_at_position(&lexer, character)?;`
			`match token_at_pos {`
			`Token::Character => "character - Defines...",`
			`Token::Template => "template - Defines...",`
			`// etc - using Token enum from lexer`
			`}`
			```

			`---`

			`#### 5. formatting.rs - ❌ MANUAL PARSING`
			`Status: NO lexer usage, line-by-line text processing`

			`Problems:`
			```rust
			`// Line 53: Manual line processing`
			`for line in text.lines() {`
			`let trimmed = line.trim();`
			`// Custom logic for braces, colons, etc.`
			`}`
			```

			`Risk: MEDIUM`
			`- Custom formatting logic may not respect language semantics`
			`- Could format code incorrectly if it doesn't understand context`

			`Recommendation: Use lexer to tokenize, then format based on tokens`
			`- Preserves semantic understanding`
			`- Respects string literals, comments, etc.`

			`---`

			`#### 6. diagnostics.rs - ❌ MANUAL PARSING`
			`Status: NO lexer usage, manual brace counting`

			`Problems:`
			```rust
			`// Lines 34-37: Manual brace counting`
			`for (line_num, line) in text.lines().enumerate() {`
			`let open_braces = line.chars().filter(\|&c\| c == '{').count();`
			`let close_braces = line.chars().filter(\|&c\| c == '}').count();`
			`}`

			`// Line 79: Character-by-character processing`
			`for ch in text.chars() {`
			`// Custom logic`
			`}`
			```

			`Risk: HIGH`
			- Brace counting doesn't account for braces in strings: `character Alice { name: "{" }`
			- Doesn't respect comments: `// This { is a comment`
			`- Could generate false diagnostics`

			`Recommendation: Use lexer tokens to track brace pairs accurately`

			`---`

			`#### 7. references.rs - ❌ MANUAL STRING PARSING`
			`Status: NO lexer usage`

			`Problems:`
			```rust
			`// Manual string parsing for word boundaries`
			```

			`Risk: MEDIUM`
			`- May not correctly identify symbol boundaries`
			`- Could match partial words`

			`Recommendation: Use lexer to identify identifiers`

			`---`

			`#### 8. rename.rs - ❌ HARDCODED KEYWORDS`
			`Status: NO lexer usage, contains hardcoded strings`

			`Problems:`
			`- 2 hardcoded keyword strings`
			`- Manual symbol identification`

			`Risk: MEDIUM`
			`- May incorrectly rename keywords or miss valid renames`

			`Recommendation: Use lexer to distinguish keywords from identifiers`

			`---`

			`#### 9. definition.rs - ⚠️ UNCLEAR`
			`Status: No obvious lexer usage, no obvious manual parsing`

			`Assessment: Likely uses AST-based approach (acceptable)`
			`- May rely on symbols extracted from AST`
			- Acceptable if using `Document.ast` for symbol lookup

			`Recommendation: Verify it's using AST, not string parsing`

			`---`

			`#### 10. inlay_hints.rs - ⚠️ UNCLEAR`
			`Status: No lexer usage detected, no obvious parsing`

			`Assessment: Minimal code inspection needed`
			`- May be stub or use AST-based approach`

			`Recommendation: Full inspection needed`

			`---`

			`#### 11. symbols.rs - ✅ LIKELY COMPLIANT`
			`Status: No manual parsing detected`

			`Assessment: Appears to extract symbols from AST`
			`- Acceptable approach - AST is the canonical representation`
			`- No need for lexer if working from AST`

			`---`

			`## Summary Table`

			`\| Module \| Lexer Usage \| Manual Parsing \| Hardcoded Keywords \| Risk Level \|`
			`\|--------\|-------------\|----------------\|-------------------\|------------\|`
			`\| completion.rs \| ✅ Yes \| ❌ No \| ⚠️ 8 keywords \| MEDIUM \|`
			`\| semantic_tokens.rs \| ✅ Yes \| ❌ No \| ✅ None \| LOW \|`
			`\| code_actions.rs \| ✅ Yes \| ⚠️ 15 instances \| ⚠️ 4 keywords \| MEDIUM \|`
			`\| hover.rs \| ❌ No \| ⚠️ Extensive \| ⚠️ 16 keywords \| HIGH \|`
			`\| formatting.rs \| ❌ No \| ⚠️ Line-by-line \| ✅ None \| MEDIUM \|`
			`\| diagnostics.rs \| ❌ No \| ⚠️ Char counting \| ✅ None \| HIGH \|`
			`\| references.rs \| ❌ No \| ⚠️ Yes \| ✅ None \| MEDIUM \|`
			`\| rename.rs \| ❌ No \| ⚠️ Yes \| ⚠️ 2 keywords \| MEDIUM \|`
			`\| definition.rs \| ⚠️ Unknown \| ⚠️ Unknown \| ✅ None \| LOW \|`
			`\| inlay_hints.rs \| ⚠️ Unknown \| ⚠️ Unknown \| ✅ None \| LOW \|`
			`\| symbols.rs \| N/A (AST) \| ❌ No \| ✅ None \| LOW \|`

			`---`

			`## Critical Findings`

			`### 1. hover.rs - Highest Risk`
			`- 16 hardcoded keywords`
			`- Custom tokenization logic`
			`- Impact: Recent behavior tree keyword changes may have broken hover`
			`- Fix Priority: HIGH`

			`### 2. diagnostics.rs - High Risk`
			`- Manual brace counting fails in strings/comments`
			`- Could generate false errors`
			`- Fix Priority: HIGH`

			`### 3. Hardcoded Keywords - Maintenance Burden`
			`- Total: 30 hardcoded keyword strings across 4 files`
			`- Risk: Keywords get out of sync with lexer`
			`- Recent Example: Behavior tree syntax changes could have broken these`

			`---`

			`## Recommended Fixes`

			`### Priority 1: hover.rs Refactoring`
			`Estimated Time: 2-3 hours`

			`Changes:`
			1. Add lexer import: `use crate::syntax::lexer::{Lexer, Token};`
			2. Replace `extract_word_at_position()` with lexer-based token finding
			`3. Replace keyword match with Token enum match`
			`4. Remove hardcoded keyword strings`

			`Benefits:`
			`- Automatic sync with lexer changes`
			`- Consistent tokenization with compiler`
			`- Easier maintenance`

			`Example Code:`
			```rust
			`pub fn get_hover_info(text: &str, line: usize, character: usize) -> Option<Hover> {`
			`let line_text = text.lines().nth(line)?;`
			`let lexer = Lexer::new(line_text);`

			`// Find token at character position`
			`let token_at_pos = find_token_at_position(&lexer, character)?;`

			`let content = match token_at_pos {`
			`Token::Character => "character - Defines a character entity...",`
			`Token::Template => "template - Defines a reusable field template...",`
			`// Use Token enum - stays in sync automatically`
			`};`

			`Some(Hover { /* ... */ })`
			`}`
			```

			`---`

			`### Priority 2: diagnostics.rs Refactoring`
			`Estimated Time: 1-2 hours`

			`Changes:`
			`1. Use lexer to tokenize text`
			`2. Count LBrace/RBrace tokens (not characters)`
			`3. Ignore braces in strings and comments automatically`

			`Benefits:`
			`- Correct handling of braces in all contexts`
			`- No false positives`

			`---`

			`### Priority 3: Extract Keyword Definitions`
			`Estimated Time: 1 hour`

			`Changes:`
			1. Create `src/syntax/keywords.rs` module
			`2. Define keyword list from lexer Token enum`
			`3. Use in completion, code_actions, rename`

			`Benefits:`
			`- Single source of truth for keywords`
			`- Easy to keep in sync`

			`---`

			`## Testing Recommendations`

			`After fixes, add tests for:`

			`1. Hover on keywords in strings - Should NOT show hover`
			```rust
			`character Alice { name: "character" } // hovering "character" in string`
			```

			`2. Diagnostics with braces in strings - Should NOT count`
			```rust
			`character Alice { name: "{}" } // should not report brace mismatch`
			```

			`3. Lexer consistency tests - Verify LSP uses same tokens as compiler`
			```rust
			`#[test]`
			`fn test_hover_uses_lexer_keywords() {`
			`// Ensure hover keywords match lexer Token enum`
			`}`
			```

			`---`

			`## Long-Term Recommendations`

			`### 1. Establish Coding Standards`
			`Document that LSP modules should:`
			`- ✅ Use lexer for all tokenization`
			`- ✅ Use AST for semantic analysis`
			`- ❌ Never do manual string parsing`
			`- ❌ Never hardcode keywords`

			`### 2. Code Review Checklist`
			`Add to PR reviews:`
			`- [ ] Does LSP code use lexer for tokenization?`
			`- [ ] Are there any hardcoded keyword strings?`
			- [ ] Is there any manual `.split()`, `.chars()`, `.find()` parsing?

			`### 3. Shared Utilities`
			Create `src/lsp/lexer_utils.rs` with:
			- `find_token_at_position()` - Common utility
			- `get_keyword_docs()` - Centralized keyword documentation
			- `is_keyword()` - Token-based keyword checking

			`---`

			`## Impact Assessment`

			`### Current State Risks`

			`Behavior Tree Changes (Recent):`
			`- Lexer/parser updated with new keywords`
			- ❌ `hover.rs` still has old hardcoded list
			`- ⚠️ Users may not see hover for new keywords`

			`Future Keyword Changes:`
			`- Any new keywords require updates in 4 separate files`
			`- Easy to miss one, causing inconsistent behavior`

			`Bug Examples:`
			```rust
			`// diagnostics.rs bug:`
			`character Alice { description: "Contains } brace" }`
			`// ^ Reports false "unmatched brace" error`

			`// hover.rs bug:`
			`character Alice { /* hover on "character" keyword here */ }`
			`// ^ Works`

			`"In a character block..." // hover on "character" in string`
			`// ^ Also shows hover (WRONG - it's just a word in a string)`
			```

			`---`

			`## Conclusion`

			`Overall Assessment: NEEDS IMPROVEMENT`

			`- Compliance Rate: 27% (3/11 modules)`
			`- Risk Level: MEDIUM-HIGH`
			`- Recommended Action: Refactor hover.rs and diagnostics.rs as priority`

			`Benefits of Fixing:`
			`1. Consistency with compiler behavior`
			`2. Automatic sync with language changes`
			`3. Reduced maintenance burden`
			`4. Fewer bugs from edge cases`

			`Next Steps:`
			`1. Create GitHub issues for each Priority 1-2 fix`
			`2. Refactor hover.rs (highest risk)`
			`3. Refactor diagnostics.rs (high risk)`
			`4. Extract keyword definitions (prevents future issues)`
			`5. Add lexer usage to code review checklist`

			`---`

			`## Appendix: Files Analyzed`

			```
			`Total LSP modules audited: 11`
			`✅ Compliant: 3 (27%)`
			`⚠️ Partially compliant: 1 (9%)`
			`❌ Non-compliant: 7 (64%)`

			`Files examined:`
			`- src/lsp/completion.rs`
			`- src/lsp/hover.rs`
			`- src/lsp/semantic_tokens.rs`
			`- src/lsp/inlay_hints.rs`
			`- src/lsp/definition.rs`
			`- src/lsp/formatting.rs`
			`- src/lsp/rename.rs`
			`- src/lsp/references.rs`
			`- src/lsp/code_actions.rs`
			`- src/lsp/symbols.rs`
			`- src/lsp/diagnostics.rs`
			```