Commit Graph

13 Commits

Author SHA1 Message Date
2eda81ef2e feat: gitea branch lifecycle tests
- gitea: create_branch, delete_branch, list verification on studio/sol
2026-03-24 16:05:10 +00:00
ef040aae38 feat: conversation registry compaction, reset, context hint tests
- needs_compaction: verify threshold triggers after token accumulation
- set_agent_id / get_agent_id: round-trip agent ID storage
- reset_all: verify all rooms cleared, SQLite and memory
- context_hint: verify conversation receives recent history context
  when creating new conversation with hint parameter
2026-03-24 15:43:50 +00:00
f338444087 feat: matrix room, room_info, research execute, bootstrap improvements
- bootstrap: create integration test room in Tuwunel, send bootstrap
  message, print room ID in summary
- room_info: list_rooms and get_room_members against live Tuwunel
- research: execute with empty tasks against real Matrix room + Mistral
- identity: fix flaky list_users_tool test (use search instead of
  unbounded list to avoid pagination)
2026-03-24 15:38:12 +00:00
b5c83b7c34 feat: gitea SDK extended tests + research type coverage
- gitea: list_org_repos, get_issue, list_notifications, list_orgs,
  get_org — exercises authed_get paths for more API endpoints
- research: empty tasks parse, multiple tasks parse, depth boundary
  edge cases
2026-03-24 15:15:56 +00:00
f1009ddda4 feat: deeper script sandbox and research type tests
- script tool: async operations, sol.fs.list, console.log/error/warn/
  info, return value capture, additional sandbox coverage
- research tool: tool_definition schema validation, depth boundary
  exhaustive testing, ResearchTask/ResearchResult roundtrips,
  output format verification
- matrix_utils: extract_image returns None for text messages
2026-03-24 15:04:39 +00:00
e59b55e6a9 feat: matrix, script, evaluator, and devtools integration tests
- matrix_utils: construct ruma events in tests, verify extract_body
  (text/notice/emote/unsupported), extract_reply_to, extract_thread_id,
  extract_edit, extract_image, make_reply_content, make_thread_reply
- script tool: full run_script against live Tuwunel + OpenSearch —
  basic math, TypeScript transpilation, filesystem sandbox read/write,
  error capture, output truncation, invalid args
- evaluator: DM/mention/silence short-circuits, LLM evaluation path
  with Mistral API, reply-to-human suppression
- agent registry: list/get_id, prompt reuse, prompt-change recreation
- devtools: tool dispatch for list_repos, get_repo, list_issues,
  get_file, list_branches, list_comments, list_orgs
- conversations: token tracking, multi-turn context recall, room
  isolation
2026-03-24 14:48:13 +00:00
5dc739b800 feat: integration test suite — 416 tests, 61% coverage
Add OpenBao and Kratos to docker-compose dev stack with bootstrap
seeding. Full integration tests hitting real services:

- Vault SDK: KV read/write/delete, re-auth on bad token, new_with_token
  constructor for dev mode
- Kratos SDK: list/get/create/disable/enable users, session listing
- Token store: PAT lifecycle with OpenBao backing, expiry handling
- Identity tools: full tool dispatch through Kratos admin API
- Gitea SDK: resolve_username, ensure_token (PAT auto-provisioning),
  list/get repos, issues, comments, branches, file content
- Devtools: tool dispatch for all gitea_* tools against live Gitea
- Archive indexer: batch flush, periodic flush task, edit/redact/reaction
  updates against OpenSearch
- Memory store: set/query/get_recent with user scoping in OpenSearch
- Room history: context retrieval by timestamp and event_id, access
  control enforcement
- Search archive: keyword search with room/sender filters, room scoping
- Code search: language filter, repo filter, branch scoping
- Breadcrumbs: symbol retrieval, empty index handling, token budget
- Bridge: full event lifecycle mapping, request ID filtering
- Evaluator: DM/mention/silence short-circuits, LLM evaluation path,
  reply-to-human suppression
- Agent registry: list/get_id, prompt reuse, prompt-change recreation
- Conversations: token tracking, multi-turn context recall, room
  isolation

Bug fixes caught by tests:
- AgentRegistry in-memory cache skipped hash comparison on prompt change
- KratosClient::set_state sent bare PUT without traits (400 error)
- find_code_session returns None on NULL conversation_id
2026-03-24 14:34:03 +00:00
4528739a5f feat: deterministic Gitea integration tests + mutation lifecycle
Bootstrap:
- Creates test issue + comment on studio/sol for deterministic test data
- Mirrors 6 real repos from src.sunbeam.pt

Devtools tests (13, all deterministic):
- Read: list_repos, get_repo, get_file, list_branches, list_issues,
  list_pulls, list_comments, list_notifications, list_org_repos,
  get_org, unknown_tool
- Mutation lifecycle: create_repo → create_issue → create_comment →
  create_branch → create_pull → get_pull → edit_issue →
  delete_branch → cleanup (all arg names verified against tool impls)

Additional tests:
- Script sandbox: basic math, string manipulation, JSON output
- Archive search: arg parsing, OpenSearch query
- Persistence: agent CRUD, service user CRUD
- gRPC bridge: event filtering, tool mapping
2026-03-24 12:45:01 +00:00
0efd3e32c3 feat: devtools + tool dispatch tests, search_code tool definition fix
Devtools integration tests (6 new, all via live Gitea):
- gitea_list_repos, get_repo, get_file, list_branches, list_issues, list_orgs
- Tests exercise the full Gitea SDK → tool handler → JSON response path

Tool dispatch tests (8 new unit tests):
- tool_definitions: base, gitea, kratos, all-enabled variants
- agent_tool_definitions conversion
- minimal registry creation
- unknown tool error handling
- search_code without OpenSearch error

search_code: added to tool_definitions() (was only in execute dispatch)
2026-03-24 12:06:39 +00:00
495c465a01 refactor: remove legacy responder + agent_ux, add Gitea integration tests
Legacy removal:
- DELETE src/brain/responder.rs (900 lines) — replaced by orchestrator
- DELETE src/agent_ux.rs (184 lines) — UX moved to transport bridges
- EXTRACT chat_blocking() to src/brain/chat.rs (standalone utility)
- sync.rs: uses ConversationRegistry directly (no responder)
- main.rs: holds ToolRegistry + Personality directly (no Responder wrapper)
- research.rs: progress updates via tracing (no AgentProgress)

Gitea integration testing:
- docker-compose: added Gitea service with healthcheck
- bootstrap-gitea.sh: creates admin, org, mirrors 6 real repos from
  src.sunbeam.pt (sol, cli, proxy, storybook, admin-ui, mistralai-client-rs)
- PAT provisioning for SDK testing without Vault
- code_index/gitea.rs: fixed directory listing (direct API calls instead
  of SDK's single-object parser), proper base64 file decoding

New integration tests:
- Gitea: list_repos, get_repo, get_file, directory listing, code indexing
- Web search: SearXNG query with result verification
- Conversation registry: lifecycle + send_message round-trip
- Evaluator: rule matching (DM, own message)
- gRPC bridge: event filtering, tool call mapping, thinking→status
2026-03-24 11:45:43 +00:00
ec55984fd8 feat: Phase 5 polish — conditional LSP tools, capabilities, sidecar hooks
- ToolSide enum: documented Sidecar future variant
- StartSession.capabilities: client reports LSP availability
- Client detects LSP binaries on PATH, sends ["lsp_rust", "lsp_typescript"]
- build_tool_definitions() conditionally registers LSP tools only when
  client has LSP capability — model won't hallucinate unavailable tools
- CodeSession stores capabilities, has_lsp(), has_capability() accessors
- git_branch() reads from git for breadcrumb scoping
- ToolRegistry.gitea_client() accessor for reindex endpoint
2026-03-24 09:54:14 +00:00
c213d74620 feat: code search tool + breadcrumb context injection + integration tests
search_code tool:
- Server-side tool querying sol_code OpenSearch index
- BM25 search across symbol_name, signature, docstring, content
- Branch-aware with boost for current branch, mainline fallback
- Registered in ToolRegistry execute dispatch

Breadcrumb injection:
- build_context_header() now async, injects adaptive breadcrumbs
- Hybrid search: _analyze → wildcard symbol matching → BM25
- Token budget enforcement (default outline + relevant expansion)
- Graceful degradation when OpenSearch unavailable

GrpcState:
- Added Option<OpenSearch> for breadcrumb retrieval
- code_index_name() accessor

Integration tests (6 new, 226 total):
- Index + search: bulk index symbols, verify BM25 retrieval
- Breadcrumb outline: aggregation query returns project structure
- Breadcrumb expansion: substantive query triggers relevant symbols
- Token budget: respects character limit
- Branch scoping: feat/code symbols preferred over mainline
- Branch deletion: cleanup removes branch symbols, mainline survives
2026-03-24 00:19:17 +00:00
40a6772f99 feat: 13 e2e integration tests against real Mistral API
Orchestrator tests:
- Simple chat roundtrip with token usage verification
- Event ordering (Started → Thinking → Done)
- Metadata pass-through (opaque bag appears in Started event)
- Token usage accuracy (longer prompts → more tokens)
- Conversation continuity (multi-turn recall)
- Client-side tool dispatch + mock result submission
- Failed tool result handling (is_error: true)
- Server-side tool execution (search_web via conversation)

gRPC tests:
- Full roundtrip (StartSession → UserInput → Status → TextDone)
- Client tool relay (ToolCall → ToolResult through gRPC stream)
- Token counts in TextDone (non-zero verification)
- Session resume (same room_id, resumed flag)
- Clean disconnect (EndSession → SessionEnd)

Infrastructure:
- ToolRegistry::new_minimal() — no OpenSearch/Matrix needed
- ToolRegistry fields now Option for testability
- GrpcState.matrix now Option
- grpc_bridge moved to src/grpc/bridge.rs
- TestHarness loads API key from .env
2026-03-23 20:54:28 +00:00