LibreChat/api/server/services/Endpoints/agents/initialize.spec.js
Danny Avila d83cb84f59 🪆 feat: Subagent configuration in Agent Builder (#12725)
* 🪆 feat: Subagents configuration (isolated-context child agents)

Surfaces the new @librechat/agents `SubagentConfig` primitive in the Agent
Builder. Subagents let a supervisor delegate a focused subtask to a child
graph running in an isolated context window: verbose tool output stays in
the child, only a filtered summary returns to the parent.

Data model: new `subagents: { enabled, allowSelf, agent_ids }` on Agent,
wired through the Zod, Mongoose, and form schemas plus a new
`AgentCapabilities.subagents` capability (enabled by default).

Backend: `initialize.js` loads explicit subagent configs alongside handoff
agents, and drops subagent-only references from the parallel/handoff maps
so they don't leak into the supervisor's graph. `run.ts` emits
`SubagentConfig[]` on the primary `AgentInputs` — a self-spawn entry when
`allowSelf` is enabled plus one entry per configured agent.

UI: an "Advanced" panel section with an enable toggle, a self-spawn
toggle, and an agent picker (capped at 10). Enabling without adding
agents still yields self-spawn; disabling self-spawn with no agents shows
a warning. A capability flag gates the whole section.

* 🪆 feat: Stream subagent progress to UI (dialog + inline ticker)

Pairs with the @librechat/agents SDK change that forwards child-graph
events through the parent's handler registry (danny-avila/agents#107):

- Self-spawn and explicit subagents can now use event-driven tools,
  because child `ON_TOOL_EXECUTE` dispatches reach our ToolService via
  the parent's registered handler.
- The same forwarding path wraps the child's run_step / run_step_delta
  / run_step_completed / message_delta / reasoning_delta dispatches in
  a new `ON_SUBAGENT_UPDATE` envelope, with start/stop/error bookends.

Backend: `callbacks.js` registers an `ON_SUBAGENT_UPDATE` handler that
forwards each envelope straight to the SSE stream.

Frontend:
- `useStepHandler` consumes `ON_SUBAGENT_UPDATE` events and merges them
  into a per-tool_call Recoil atom (`subagentProgressByToolCallId`).
  First-seen `subagentRunId` claims the most-recent unclaimed `subagent`
  tool call in the active response message — a temporal mapping, no SDK
  wire-format change needed to correlate child runs with parent tool
  calls.
- New `SubagentCall` part component replaces the default `ToolCall`
  rendering when `toolCall.name === Constants.SUBAGENT`: compact status
  ticker showing the 3 most recent update labels, clickable to open a
  dialog with the full activity log + final markdown-rendered result.
- Adds `Constants.SUBAGENT`, `StepEvents.ON_SUBAGENT_UPDATE`, and
  `SubagentUpdateEvent` type in data-provider.

Tests:
- `packages/api npx jest run-summarization` — 23 pass
- `api npx jest initialize` — 16 pass
- `npm run build` — clean

Dependency note: bumps `@librechat/agents` to `^3.1.67-dev.1` — requires
the SDK PR (danny-avila/agents#107) to be merged to dev and published
before this PR merges. `ON_SUBAGENT_UPDATE` is absent from dev.0, so the
handler registration would be a no-op with the older SDK but would not
crash.

* 🪆 fix: address Codex review and review audit on subagents

Stacks on top of the SDK change in danny-avila/agents#107 (bumped to
`^3.1.67-dev.2`).

- **P1 (`initialize.js`)**: subagent-only agents were being deleted from
  both `agentConfigs` AND `agentToolContexts`. The tool-execute handler
  resolves execution context (agent, tool_resources, skill ACLs) from
  `agentToolContexts`, so explicit subagents would run without their
  configured resources and skip action tools. Now only `agentConfigs`
  is pruned — tool context stays intact.
- **P2 (`AgentSubagents.tsx`)**: toggling subagents off set the form
  field to `undefined`; `removeNullishValues` stripped it from the
  PATCH, leaving the server copy enabled. Now it persists an explicit
  `{ enabled: false, ... }` so the update actually clears state.

- **Finding 1 (MAJOR)** — `agent_ids` Zod schema gains `.max()` via a
  new `MAX_SUBAGENTS` export from `data-provider` (shared with the UI
  cap). Crafted payloads can't trigger hundreds of `processAgent`
  calls.
- **Finding 2 (MAJOR)** — `subagentProgressByToolCallId` atomFamily
  atoms are now tracked in a ref and reset from `clearStepMaps` via a
  `useRecoilCallback({ reset })`. No monotonic growth across a session.
- **Finding 3 (MAJOR)** — early-arriving `ON_SUBAGENT_UPDATE` events
  whose parent `tool_call_id` is not yet mapped are now buffered in
  `pendingSubagentBuffer` (keyed by `subagentRunId`) and replayed in
  arrival order once correlation completes. Mirrors the existing
  `pendingDeltaBuffer` pattern.
- **Finding 4 (MAJOR)** — switched to deterministic correlation via
  the new `parentToolCallId` that SDK `3.1.67-dev.2` threads through
  from `ToolRunnableConfig.toolCall.id`. Temporal fallback now iterates
  oldest-unclaimed-first (forward), matching tool-call creation order,
  so concurrent spawns map correctly.
- **Finding 6 (MINOR)** — `agent_ids` are deduped on the backend via
  `new Set(...)` before the load loop. Duplicates no longer produce
  duplicate `SubagentConfig` entries visible to the LLM.
- **Finding 7 (MINOR)** — events array inside each Recoil atom is
  capped at 200 entries. Long-running subagents no longer replay O(n)
  spreads on every update; the dialog log still shows the cap window.
- **Finding 8 (MINOR)** — documented: subagents are loaded only for
  the primary agent this release (handoff children get self-spawn but
  not explicit sub-subagents). In-code comment added so the next
  maintainer doesn't wonder.
- **Finding 9 (NIT)** — removed `{!isSubmitting && null}` dead code
  and the misleading announce-polite comment in `SubagentCall`.

- New `validation.spec.ts` — 9 tests covering the cap on
  `agent_ids.length` at the subagent schema, agent-create, and
  agent-update layers.
- `run-summarization` — 23 pass, `initialize` — 16 pass, total backend
  package: 103 pass across touched areas.

Findings 5 (component tests) and 10 (micro-allocation) are tracked
but deferred; the former needs a Recoil-RenderHook harness that isn't
in this PR's scope, and the latter has negligible impact (one `Array.from`
per subagent run).

* 🧪 test: integration coverage for subagent correlation + backend loading

Addresses the follow-up audit on #12725 with real-code tests (no mock
handlers, only the existing setMessages/getMessages spies and the
standard mongodb-memory-server harness).

Six new tests under a dedicated `describe('subagent loading')`:
- loads a configured subagent, populates `subagentAgentConfigs`, keeps
  it out of `agentConfigs`
- **P1 regression guard**: drives the real `toolExecuteOptions.loadTools`
  closure with the subagent id and asserts `loadToolsForExecution` is
  called with `agent: <subagent>`, `tool_resources`, `actionsEnabled`.
  If anyone deletes `agentToolContexts` again, this fails.
- dedup: three copies of the same id load the agent once
- overlap: agent referenced both as handoff target and subagent stays in
  `agentConfigs`
- capability gate: admin disabling `subagents` suppresses loading even
  when the agent has a config
- per-agent disable: `subagents.enabled: false` skips loading entirely

Five new tests under `describe('on_subagent_update event')` using a
real `RecoilRoot` and a companion `useRecoilCallback` reader so writes
from the hook are observable:
- deterministic correlation via `parentToolCallId` (happy path with
  SDK dev.2+)
- fallback: oldest-unclaimed tool call wins for concurrent spawns
  without `parentToolCallId`
- early-arrival buffer: updates with no mapping get buffered and
  replayed once the tool call appears
- event cap: 205 updates collapse to 200 retained, oldest dropped
- `clearStepMaps` resets tracked atoms back to their null default

- F2 — added explicit `// TODO` marker for handoff-subagent-loading
  extension (matches the comment that referenced it).
- F3 — dropped the unnecessary `MAX_SUBAGENTS as MAX_SUBAGENTS_CAP`
  alias; just import the constant directly.
- Bumped `@librechat/agents` to `^3.1.67-dev.3` to pick up the SDK's
  paired test additions.

- `api/server/services/Endpoints/agents/initialize.spec.js` — 22 pass
  (6 new + 16 existing)
- `packages/api/src/agents/validation.spec.ts` +
  `run-summarization.test.ts` — 103 pass
- `client/src/hooks/SSE/__tests__/useStepHandler.spec.ts` — 48 pass
  (5 new + 43 existing)

* 🪆 fix: strip parent run summary + discovered tools from subagent inputs

Codex P1 on #12725: `buildSubagentConfigs` reused the shared
`buildAgentInput` factory for each explicit child, and that factory
always stamps the parent run's `initialSummary` (cross-run conversation
summary) and `discoveredTools` (tool names the parent's LLM searched
earlier) onto every `AgentInputs` it returns. When subagents were
enabled on a conversation that had already been summarized, every
child inherited that summary — silently defeating the isolated-context
contract and burning extra tokens on unrelated prior chat.

Fix in `run.ts`: after `buildAgentInput(child)`, explicitly clear
`childInputs.initialSummary` and `childInputs.discoveredTools` before
attaching to the `SubagentConfig`. The parent keeps both — that's how
the supervisor receives cross-turn context — but the child starts
fresh.

Paired with danny-avila/agents#107 (bumped to `^3.1.67-dev.4`), which
adds the equivalent strip inside `buildChildInputs` to cover the
self-spawn path where the SDK clones parent `_sourceInputs` directly
and LibreChat never sees the intermediate shape. Belt and suspenders.

Regression test (new):
- `does NOT leak the parent run initialSummary into an explicit child
  (Codex P1 regression)` — sets `initialSummary` on the run, enables
  subagents with an explicit child, asserts the parent still has the
  summary but `childConfig.agentInputs.initialSummary` is `undefined`.
  Same for `discoveredTools`. 24 pass.

* 🪆 fix: capability gate applies to handoff agents + parallel subagent test

### Codex P2 — handoff agents kept `subagents` after capability disabled
The endpoint-level `AgentCapabilities.subagents` gate only cleared
`subagents` on `primaryConfig`. Handoff agents loaded into
`agentConfigs` retained their persisted `subagents.enabled: true`,
and because `run.ts` calls `buildSubagentConfigs` for every agent
input, self-spawn would still fire on a handoff target even when the
admin had disabled the capability globally.

Fix in `initialize.js`: after the subagent loading block, when the
capability is off, iterate `agentConfigs.values()` and clear
`subagents` + `subagentAgentConfigs` on every loaded config.

Regression test: `clears subagents on handoff agents too when
capability is disabled (Codex P2 regression)` — seeds a handoff target
with its own `subagents.enabled: true`, disables the capability at
the endpoint, asserts both primary AND handoff have `subagents`
undefined in the client args. 23 init tests pass.

### Parallel subagent correlation — user-requested verification
Added `keeps parallel subagent streams independent when events
interleave` to `useStepHandler.spec.ts`. Two `subagent` tool calls
seeded side by side, 6 interleaved `ON_SUBAGENT_UPDATE` envelopes
dispatched (a-start, b-start, a-step, b-step, a-stop, b-step), each
carrying its own `parentToolCallId`. Asserts each `tool_call_id`'s
Recoil bucket accumulates only its own run's events, statuses reflect
each run independently (`call_a` → stop, `call_b` → run_step), no
cross-contamination. 49 step-handler tests pass.

* 🪆 fix: SubagentCall detects cancelled / errored states (Codex P2)

Codex P2 on #12725: the old `running` check only consulted
`initialProgress` and the subagent's phase. A user stop, dropped
stream, or backend crash before a terminal `stop`/`error` envelope
arrived would leave the ticker permanently stuck on "working…". Other
*Call components (ToolCall.tsx) already model this via
`!isSubmitting && !finished` → cancelled.

Mirror that pattern. Re-introduce `isSubmitting` on `SubagentCallProps`
(the prop was dropped earlier as 'unused' — that was a bug) and resolve
status as a tri-state:

- `finished`  — initialProgress >= 1, or subagent `stop`/`error`
- `cancelled` — `!isSubmitting && !finished`
- `running`   — neither

New locale keys `com_ui_subagent_cancelled` + `com_ui_subagent_errored`
swap in the right header text per state.

Tests: new `SubagentCall.test.tsx` covers all four states with a real
`RecoilRoot` and a `useRecoilCallback` seeder — no mocked store — 5/5
pass. Includes an explicit P2 regression test that simulates the
`isSubmitting=false, progress.status='run_step', initialProgress<1`
scenario and asserts the cancelled label renders.

* 🪆 feat: semantic ticker + aggregated content-part dialog for subagents

Two rounds of feedback on #12725:

### Ticker — user-readable lines, not raw event names

The old ticker showed \`on_run_step\`, \`on_message_delta\`, etc. — not
meaningful to users. Replaced with \`buildSubagentTickerLines\`, a pure
helper that walks the \`SubagentUpdateEvent\` stream and emits:

- message/reasoning deltas → a single live "Writing: <last 60 chars>"
  (or "Reasoning: …") line that updates in place as chunks arrive
- run_step with tool_calls → "Using calculator(expression=42*58)" for
  a single call, "Using tool: a, b" for parallel (args dropped when
  multiple so the line stays short)
- run_step_completed → "calculator → 42*58 = 2436" (output truncated
  to 48 chars; falls back to "Tool X complete" when output is empty)
- error → "Error: <message>"
- start / stop / run_step_delta → suppressed (too granular / lifecycle-only)

Args and output pass through \`summarizeArgs\` / \`summarizeOutput\`
which flatten JSON to \`key=value\` pairs and head-truncate long
strings so a 200-line tool output never bloats the ticker.

### Dialog — aggregated content parts via leaf renderers

\`aggregateSubagentContent\` folds the raw event stream into
\`TMessageContentParts[]\` — text/reasoning delta streaks collapse into
single \`TEXT\` / \`THINK\` parts, tool calls become \`TOOL_CALL\` parts,
and \`run_step\` boundaries correctly break text runs around tool
calls. The dialog iterates those parts through a \`SubagentDialogPart\`
renderer that delegates to the existing \`Text\`, \`Reasoning\`, and
\`ToolCall\` leaf components — the same sub-components \`<Part />\` uses
— wrapped in a minimal \`MessageContext\` so reasoning expand state and
cursor animation work.

Leaf components are used directly rather than importing \`<Part />\`
itself to avoid a module cycle (Part → Parts/index → SubagentCall →
Part) and to sidestep a hypothetical nested-subagent rendering.

### Tests

- \`subagentContent.test.ts\` — 19 pure-function tests covering the
  aggregator (text concat, reasoning concat, tool call lifecycle,
  interleaving, phase suppression, late-arriving completions) and the
  ticker builder (live preview truncation, args/output snippets,
  parallel-call handling, output truncation, i18n formatter override).
- \`SubagentCall.test.tsx\` — 9 component tests: 5 status-resolution
  (existing) + 2 ticker (semantic text, delta collapse) + 2 dialog
  (aggregated parts routed to leaf renderers, raw-output fallback).

### Locale keys

New: \`com_ui_subagent_ticker_writing\`, \`…_reasoning\`, \`…_error\`,
\`…_using\`, \`…_using_with_args\`, \`…_tool_complete\`,
\`…_tool_output\`. Preserves i18n at the display layer while the
helper stays pure.

* chore: drop unused com_ui_subagent_activity_log locale key

The dialog no longer renders an "Activity log" section — the new
content-parts renderer replaced it. Also tweaks the dialog description
copy to match.

* 🪆 fix: subagent dialog order, persistence, auto-scroll, width

Follow-up pass addressing the four issues observed in real runs
against a live subagent-using parent.

### Aggregator ordering (reasoning appearing after text it preceded)

Reproducible pattern: LLM emits reasoning → text → tool call in that
order, but the dialog rendered text BEFORE reasoning in the content
array. Root cause: `aggregateSubagentContent` maintained `currentText`
and `currentThink` buffers in parallel and only flushed them at a
`run_step` boundary in a fixed (text, think) order, losing the actual
arrival order.

Fix: when a text chunk arrives, close any open think buffer first
(pushes it into the content array right then); symmetric for think →
text. Two new regression tests cover the exact reasoning → text →
tool_call sequence from the screenshot and the repeated
reasoning ↔ text flow across a turn.

### Content persists after completion (markdown not rendering when done)

`clearStepMaps` was calling `resetSubagentAtoms()` at stream end,
which wiped every `subagentProgressByToolCallId` entry. Once reset,
`contentParts.length === 0` and the dialog fell back to rendering the
raw `output` string with plain text — hence the literal `##`/`**` in
the completed-state screenshot. Stopped resetting; the atoms are
bounded per-call (200-event cap) and per-conversation (one per
subagent spawn) so growth matches the rest of the conversation state.
`resetSubagentAtoms` is kept for a future conversation-switch caller.

Also: routed the raw-`output` fallback (older subagent runs recorded
before the event forwarder existed) through the same
`SubagentDialogPart` → `Text` leaf that content parts use, so its
markdown renders the same way.

### Auto-scroll to bottom while running

Added a `scrollRef` on the dialog body and a `useEffect` that pins
`scrollTop = scrollHeight` while the dialog is open AND the subagent
is running. Triggers on `contentParts.length` (new tool calls / part
boundaries) and `events.length` (intra-part deltas) so the cursor
tracks text streaming. Disabled post-completion so re-opening a
finished run doesn't yank to the bottom.

### Wider dialog

Went from `max-w-2xl` (42rem / 672px — too cramped on maximized
laptop windows) to `w-[min(95vw,64rem)] max-w-[min(95vw,64rem)]`.
Narrow on phones, scales up to 64rem on desktop, always leaves a bit
of margin from the viewport edge. Bumped `max-h-[65vh]` on the scroll
area to give the extra width room to breathe vertically too.

### Tests

- `subagentContent.test.ts` — 21 pass (2 new ordering regressions).
- `useStepHandler.spec.ts` — 49 pass (1 updated to assert atoms are
  *preserved* on clearStepMaps).
- `SubagentCall.test.tsx` — 9 pass (unchanged; aggregator-level tests
  cover the ordering).

* 🪆 feat: persist subagent_content via SDK createContentAggregator

Per-request map of createContentAggregator instances keyed by the
parent's tool_call_id. ON_SUBAGENT_UPDATE handler feeds each event
into the matching aggregator (phase → GraphEvent mapping); AgentClient
harvests contentParts onto the subagent tool_call at message save so
the child's reasoning / tool calls / final text survive a page refresh.

Reusing the SDK's battle-tested aggregator instead of a bespoke one
keeps the persisted shape identical to the parent graph's output and
drops ~100 lines of custom aggregation code.

* 🪆 fix: incremental subagent aggregation + dialog render parity

**Disappearing tool_calls**: the Recoil atom trimmed events to a 200-long
rolling window, so verbose subagents could shed the `run_step` that
originally created a tool_call part — rebuilding content from the trimmed
window then produced only the surviving text/reasoning. Fix: fold each
envelope into `contentParts` incrementally in the atom as it arrives
(new `foldSubagentEvent` + cursor state). Event trim window now affects
only the ticker, never the dialog.

**Render parity**: dialog now applies `groupSequentialToolCalls` and
renders single parts through `Container` + grouped batches through
`ToolCallGroup` — same spacing and "Used N tools" collapsing the main
message view uses.

**Width**: `min(96vw, 80rem)` — wider on big screens, still responsive.

**Labels**: "Subagent: X" is jargon. Named subagents render as
`Running "{name}" agent` / `Ran "{name}" agent` (past tense on
completion); self-spawns use `Running subtask` / `Ran subtask` since
`Running "self" agent` reads badly.

* 🪆 polish: subagent dialog parity + agent avatar in header

**Labels**: drop "subtask" framing. Self-spawn shows `Running agent` /
`Ran agent` (past tense on completion); named subagents stay
`Running "X" agent` / `Ran "X" agent`.

**Dialog render parity**: stop wrapping every part in `Container`.
TEXT keeps its `Container` (gap-3 + `mt-5` sibling margin), THINK and
TOOL_CALL render bare so their own wrappers set the full-column width
the regular message view gives them — matches the main `<Part>` dispatch.
Outer scroll region now uses `px-4 py-3` padding and a
`max-w-full flex-grow flex-col gap-0` inner wrapper, mirroring the
`MessageParts` container the main conversation uses.

**Avatar**: header icon now renders the subagent's configured avatar
via `MessageIcon` when `useAgentsMapContext()` has the child agent,
falling back to the `Users` SVG (which keeps its running-state pulse).
Same icon-left-of-label pattern the tool UI uses.

* 🪆 polish: subagent group label, ticker throttle + tail-ellipsis, scroll button

**Grouped label**: ToolCallGroup now detects all-subagent batches and
labels them "Running N agents" / "Ran N agents" instead of "Used N
tools". Mixed batches keep the existing label. The tool-name summary
is suppressed for all-subagent groups (every entry dedupes to
"subagent", which adds nothing).

**Ticker width + tail-ellipsis**: raise the preview cap to 300 chars so
wide containers aren't half-empty, and flip the ticker `<li>` to
`dir="rtl"` so `text-overflow: ellipsis` clips the *oldest* characters
(visually the left edge) — the newest tokens stay pinned to the right
regardless of container width. Bidi lays out the Latin text LTR
internally, the rtl only affects which side gets the ellipsis.

**Throttle**: `useThrottledValue` hook (trailing-edge, 1.2s) smooths
the live `Writing: …` preview so tokens no longer strobe past the
eye faster than they can be read. Ref-based internals (not `useState`)
avoid infinite-update loops when the upstream value is a new-reference
each render; `NEGATIVE_INFINITY` sentinel ensures the very first value
passes through synchronously so tests and first paint aren't delayed.

**Scroll-to-bottom**: dialog tracks `isAtBottom` with a 120px
threshold; auto-scroll only engages when the user is already following
along, and a persistent jump-to-latest button appears whenever they
scroll up — no more fighting the auto-scroll to read back.

* 🪆 polish: snappier ticker, prefix-safe labels, agents icon, readable lines

**Ticker lines are now incrementally aggregated in the atom** — same
pattern as contentParts. The raw-events rolling window is gone; event
volume no longer caps what the ticker can display. Verbose subagents
that used to drop early tool_call lines out of the window now keep the
full 3-line history (using_tool, tool_complete, writing).

**Discriminated-union ticker lines** split a constant prefix (e.g.
"Writing:") from a tail-truncatable body. The prefix lives in a
`shrink-0` span so it never gets clipped when the body overflows; the
body uses `dir="rtl"` only on itself — scoped so non-streaming lines
(e.g. "Waiting for first update…") can't get their trailing ellipsis
flipped by bidi.

**Content-aware throttle**: 800ms interval (down from 1200ms), skipped
entirely while the live buffer is below 120 chars. Early tokens now
appear immediately — no more "Reasoning: I" sitting blank for a full
second before the next heartbeat. Once the preview is long enough to
fill the container, throttling kicks in at the tighter interval.

**Header label** is now a constant verb + optional muted sub-label.
Base reads "Running agent" / "Ran agent" / "Cancelled agent" / "Agent
errored" for every subagent; named subagents get the configured agent
name rendered to the right in secondary text (self-spawns and
unresolved names omit it — "Running self agent" is nonsense).

**ToolCallGroup** now detects `allSubagents` and swaps `StackedToolIcons`
for a single `Users` glyph — otherwise the group header shows a wrench
("tool") icon next to "Ran 5 agents", which reads wrong.

* 🪆 feat: delimiter-aware tool labels in ticker + full-width tool lines

New shared `parseToolName` helper in `client/src/utils/toolLabels.ts`
— single source of truth for splitting `<tool>_mcp_<server>` ids and
mapping native tool names (web_search, execute_code, …) to their
friendly translation keys. `ToolCallGroup` drops its inline copy and
pulls from this helper.

Ticker tool lines now use the shared parser + a new `ToolIdentifier`
sub-renderer so the live log reads like the main tool UI:

  - MCP tool  → `<server> · <code-badge:tool>` (e.g. "github · `search_code`")
  - Native   → friendly name from `TOOL_FRIENDLY_NAME_KEYS`
  - Unknown  → bare `<code>` badge of the raw id

The `using_tool` / `tool_complete` rows now render with a
`flex w-full items-baseline gap-1 overflow-hidden` layout matching
the writing/reasoning rows — they take the full container width
instead of collapsing to content size. Output snippets on
`tool_complete` get the same tail-side `dir="rtl"` ellipsis so the
newest characters stay flush-right when the container is narrow.

Dropped the now-unused template i18n keys
(`com_ui_subagent_ticker_using_with_args`,
`com_ui_subagent_ticker_tool_complete`,
`com_ui_subagent_ticker_tool_output`) in favor of tokens the JSX
composes structurally. Only English is touched per the project rule;
other locales follow externally.

* 🪆 fix: dialog scroll button + auto-scroll during streaming deltas

Two race/trigger bugs in the dialog's scroll behavior:

**Button never showed**: `addEventListener('scroll', …)` in a
`useEffect` ran before Radix's portal had actually committed the
scroll container, so `scrollRef.current` was still null — the listener
never attached, `isAtBottom` stayed stuck at its initial `true`, and
the jump-to-latest button was never rendered. Swap to React's
`onScroll` prop on the element itself so the handler wires up as part
of DOM commit, not a post-commit effect.

**Auto-scroll stalled during text streaming**: the pin-to-bottom
effect only re-fired on `contentParts.length` changes. Message/reasoning
deltas extend the last TEXT/THINK part's `.text` without changing the
array length — so the view would drift up as tokens piled in and
never catch back up. Replace the length-dep effect with a
`ResizeObserver` on the inner content div; every height change (new
part or in-place growth) triggers a scroll-pin when the user is still
at the bottom.

* 🪆 fix: drop leading ellipsis from ticker body

truncatePreview was prepending ... to the tail when the buffer
exceeded 300 chars. The component's CSS already produces a left-side
ellipsis for overflow via dir=rtl + text-overflow: ellipsis — stacking
a data-level ellipsis on top renders a stray dot character right after
the Writing: / Reasoning: label (Writing: .Sure!), which looks like a
typo to the reader.

Data now returns just the last 300 chars when truncating; CSS handles
the visual cue whenever the body actually overflows its container.

* 🪆 fix: Codex review — subagent isolation + concurrent-safe throttle

Three findings from the @codex review pass, all valid:

**P1 — buildAgentInput leaks parent discovered-tool state into subagent
children.** `buildAgentInput` mutates `agent.toolRegistry`
(`overrideDeferLoadingForDiscoveredTools` flips `defer_loading:true→false`
on tools the parent previously searched for) and appends those tools'
definitions to the returned `toolDefinitions` before the function
returns. `buildSubagentConfigs` was clearing the reported
`initialSummary` / `discoveredTools` fields on the returned
AgentInputs, but that happened post-return — the registry writes and
extra tool definitions persisted on the child, silently defeating
context isolation and inflating the child's prompt.

Fix: `buildAgentInput` now takes an `isSubagent` flag that gates the
registry-mutation block and omits `initialSummary` /
`discoveredTools` at the source. `buildSubagentConfigs` passes
`{ isSubagent: true }` for every explicit child; no post-hoc cleanup
needed.

**P2 — ToolCallGroup labels a finished subagent group as still
running when the child returned no output.** `getToolMeta` computed
`hasOutput` as `!!tc.output`, which is `false` for a completed
subagent that returned empty text (the UI already has an "empty
result" fallback for that case). `allCompleted` would stay `false`
and the group header stuck on "Running N agents" forever.

Fix: treat `tc.progress === 1` as completion too — progress is the
authoritative lifecycle signal, output is just content.

**P2 — useThrottledValue schedules `setTimeout` during render.**
Discarded renders under Strict Mode / Concurrent rendering would
leave orphan timers firing against stale trees.

Fix: move `setTimeout` into a `useEffect` keyed on `[value, intervalMs,
enabled]`. Render-time still mutates refs (idempotent), but timer
scheduling lives post-commit. Cleanup on unmount and on passthrough
transitions is preserved.

* 🪆 fix: Codex P2 — wipe subagent atoms on conversation switch

`clearStepMaps()` intentionally doesn't reset `subagentProgressByToolCallId`
so a user can reopen a completed subagent's dialog mid-conversation,
but `resetSubagentAtoms` was defined and never exposed / called —
so each completed run's aggregated `contentParts` + `tickerState`
stayed resident in the `atomFamily` for the whole app session.
Unbounded growth across multi-conversation sessions.

Expose `resetSubagentAtoms` from `useStepHandler` and fire it from
`useEventHandlers` whenever the URL's `conversationId` changes.
That's the correct cleanup boundary: historical subagent dialogs
rehydrate from persisted `subagent_content` on each `tool_call` at
message-save time, so wiping live atoms on switch doesn't lose any
viewable history — it just releases per-tool-call state that the
old conversation's components no longer subscribe to.

* 🪆 fix: Codex round 3 — subagent registry isolation + post-run label

Two more valid findings.

**P1 — parent-order registry mutation leaks into subagent inputs.**
`overrideDeferLoadingForDiscoveredTools` mutates `agent.toolRegistry`
in place (the Map *and* the LCTool objects inside it). When an agent
appears both as a handoff target (normal graph node) AND an explicit
subagent child, a subagent build that ran before the parent's build
captures a reference to the same registry — the parent's later mutation
leaks through to the child.

Fix: for subagent children (`isSubagent`), clone the `toolRegistry`
Map and shallow-clone each LCTool inside before returning the inputs.
`defer_loading` flips on parent-graph registry mutations can't
propagate across the clone boundary. `toolDefinitions` also gets a
shallow-copy pass so the same isolation holds for definitions the
child carries directly.

**P2 — "Running N agents" label stuck after cancel/error.**
ToolCallGroup's all-subagent label was gated only on `allCompleted`,
which requires every child to have `hasOutput || progress === 1`.
A subagent that gets cancelled (stream ends, no `stop` phase, no
output) never satisfies that — so even after `isSubmitting` flips
false, the header stays on "Running N agents" while each individual
card correctly shows "Cancelled agent".

Fix: derive a `subagentsDone` flag as `allCompleted || !isSubmitting`
and gate the past-tense label on that. Matches the tri-state each
SubagentCall card already resolves (finished / cancelled / running).

* 🪆 fix: Codex P2 — ACL-check subagents.agent_ids on create/update

Codex flagged that `subagents.agent_ids` was accepted as arbitrary
strings on the create/update routes while `edges` got a
`validateEdgeAgentAccess` pass — so users could save subagent
references to agents they can't VIEW. At runtime `initializeClient`'s
`processAgent` ACL gate silently drops those, so the persisted
configuration and the actual behavior diverged in a way that is
difficult to diagnose.

Refactor: extract the id-set → unauthorized-ids check into a shared
`collectUnauthorizedAgentIds`, wrap it with a dedicated
`validateSubagentAccess`, and plumb the same 403-on-failure response
the edge path already returns. Applied on both POST /agents and
PATCH /agents/:id.

* 🪆 fix: Codex round 5 — ACL-disable escape hatch + ticker order

Two valid findings.

**P1 — can't disable subagents after losing access to a child.**
The subagent ACL check ran on every create/update that echoed back
the `agent_ids` list, even when the user was explicitly disabling
the feature. The UI keeps the list intact when toggling `enabled:
false`, so a user who subsequently lost VIEW on any child would be
locked in a 403 loop — every edit (including the one that turns
subagents off) bounces.

Fix: gate the ACL check on `subagents.enabled !== false` at both
the POST /agents and PATCH /agents/:id handlers. Empty list stays
a no-op. Disabling the feature is always permitted.

**P2 — ticker fold merges out-of-order previews across delta
switches.** `foldSubagentEventIntoTicker` carried `textLineIdx` /
`thinkLineIdx` across a reasoning → text → reasoning transition,
so the second reasoning chunk appended to the original reasoning
line instead of starting a new chronological one.

Fix: close the opposite buffer + cursor when a delta-type switch
is detected (same rule the content-parts reducer already applies).
Added a regression test.

* 🪆 fix: Codex round 6 — preserve mid-stream atoms + honor sequential suppression

Two valid findings.

**P2 — atom reset fires on initial chat URL assignment.**
`useEventHandlers` initialized `lastConversationIdRef` from the URL's
current `paramId`, then reset subagent atoms whenever the ref and
`paramId` disagreed. For a brand-new conversation the URL stamp goes
from `undefined → "abc123"` while the first response is still
streaming, which used to drop subagent ticker/content state mid-run
and leave dialogs missing earlier updates.

Fix: only reset when *both* the old and new IDs are non-null and
differ — i.e. a user-initiated switch between two established
conversations. The initial assignment passes through without
clearing.

**P2 — ON_SUBAGENT_UPDATE bypassed `hide_sequential_outputs`.**
Every other streaming handler in `callbacks.js`
(`ON_RUN_STEP`, `ON_MESSAGE_DELTA`, etc.) gates emission on
`checkIfLastAgent` + `metadata?.hide_sequential_outputs`, but the
subagent forwarder did an unconditional `emitEvent` — so intermediate
agents in a sequential chain were leaking their children's activity
to the client even when the chain was configured to suppress
intermediates.

Fix: accept `metadata` and apply the same `isLastAgent ||
!hide_sequential_outputs` gate. Aggregation still runs regardless of
visibility (persistence + dialog depend on it); only the SSE forward
is suppressed.

* 🪆 fix: Codex P2 — gate subagent ACL check on endpoint capability

`validateSubagentAccess` ran on every create/update where
`subagents.enabled !== false`, regardless of the endpoint-level
`subagents` capability. When the capability is off at the appConfig
level, `initializeClient` already strips the `subagents` block at
runtime — so persisted `agent_ids` are inert — but the validation
could still 403 on a legacy record whose referenced child is no
longer viewable, blocking unrelated edits.

Fix: add `isSubagentsCapabilityEnabled(req)` that reads the agents
endpoint's capabilities from `req.config` and gate both the create
and update ACL checks on it. Capability-off environments can update
agents with stale `subagents` data freely; capability-on keeps the
full ACL protection.

* 🪆 fix: Codex P2 — reset subagent atoms on id→null navigation too

Previous guard (both-established) skipped the reset whenever
`paramId` became null/undefined, so navigating from an existing
chat to a "new chat" route left stale subagent progress resident
in the `atomFamily` until the user picked a specific different
chat.

Swap the both-established check for a one-time flag: skip only the
very first `undefined → id` transition (the brand-new-chat URL
stamp that happens mid-stream), then reset on any subsequent
change — id→id, id→null, null→id-after-reset. If the user started
on an established chat the flag is true at mount, so the guard is
a no-op and every navigation resets normally.

* 🪆 fix: Codex round 9 — subagent persistence gate + handoff children

Two valid findings.

**P1 — hide_sequential_outputs also gates persistence.**
The previous fix gated the SSE forward on
`isLastAgent || !hide_sequential_outputs` but still ran the
per-tool-call `createContentAggregator` aggregation unconditionally.
`finalizeSubagentContent` would then attach the hidden intermediate
agent's child reasoning / tool output to the saved message, so a
page refresh could reveal activity that was intentionally suppressed
live. Move the visibility gate to the top of the handler — hidden
agents now skip both aggregation and emission, so
"hide_sequential_outputs" is a consistent "don't record" rule for
subagent traces.

**P2 — handoff agents' explicit subagents were silently dropped.**
`initializeClient` only resolved `subagentAgentConfigs` for the
primary config, so an agent used via handoff that had its own
`subagents.agent_ids` saved in the builder would get self-spawn
only; every explicit child was quietly ignored, creating a
saved-config / runtime mismatch the user couldn't diagnose.

Extract the resolution into a shared `loadSubagentsFor(config)`
helper and invoke it for the primary and every handoff agent in
`agentConfigs`. The `edgeAgentIds` precomputation stays outside
the helper (it's loop-invariant). Capability-off shortcuts return
empty early so the existing strip-on-capability-off path still
holds.

* 🪆 fix: Codex P2 — recursive subagent build for multi-level delegation

Previously only the outer `agents[]` loop attached `subagentConfigs`
to its inputs, so a child used as a subagent (invoked via the
`subagent` tool) lost every explicit spawn target of its own. A
user-valid configuration like A → B → C would only run the top
layer; B could never actually delegate to C from inside A's run.

Recursively build `subagentConfigs` for each child inside
`buildSubagentConfigs`, passing the child's freshly-constructed
`childInputs` down so its own `subagents.enabled` children get
resolved too. Added cycle protection via an `ancestors` Set — a
configuration like A → B → A is safely cut off at the second
encounter of A rather than recursing forever (the existing
`child.id === agent.id` guard already prevents the direct self-loop).

* 🪆 fix: Codex P2 — reset subagent atoms on useEventHandlers unmount

The effect that resets subagent atoms only fired on `paramId` change,
so unmounting the chat container (route change away from /c) never
flushed the atoms. `knownSubagentAtomKeys` lives in a ref inside
`useStepHandler` — once the hook unmounts the ref is gone, so a
subsequent remount can't clean atoms it never registered.

Added a second `useEffect` that only runs cleanup on unmount (empty
deps aside from the stable `resetSubagentAtoms` callback). Keeps
`atomFamily` bounded across full route teardowns too.

* 🪆 fix: Codex round 13 — cyclic subagent guard + prefer persisted

Two valid findings.

**P1 — cyclic subagent ref reloads the primary.** A configuration
like `A ↔ B` (B lists A as its own subagent) would send
`loadSubagentsFor` down a path that couldn't find A in
`agentConfigs` (the primary isn't stored there), so it called
`processAgent(A)` a second time. That inserts a fresh config for
the primary id, which downstream duplicates in
`[primary, ...agentConfigs.values()]` and can replace the primary's
tool context with the reloaded copy.

Fix: short-circuit when a subagent ref points back at
`primaryConfig.id` — reuse the already-loaded primary config.
Primary is always an edge id so no pruning bookkeeping needed.

**P2 — live atom preferred over canonical persisted trace.** The
dialog picked `progress.contentParts` ahead of
`persistedContent`, but the Recoil bucket is best-effort — after
a disconnect/reconnect it can be stale or partial. The server's
`subagent_content` on the `tool_call` is the canonical record
refreshed on sync. Preferring live could hide completed
tool/reasoning history that was actually persisted.

Fix: flip the preference order. Persisted wins when it's
non-empty; live covers the mid-stream window (before the parent
message saves, persisted is empty) and the older-runs fallback.
Updated the test that enforced the old order to lock the new
semantics in (separate mid-stream live-fallback assertion kept).

* 🪆 fix: Codex P2 — subagent atom reset rule simplified to 'leaving established id'

The `hasEstablishedConversationRef` + check for initial undefined→id
covered the first navigation but missed the equivalent mid-stream
URL stamp when a user goes from an existing chat to a new chat and
sends a message there (`id → null → newId`). The null → newId
transition was still hitting the reset branch and wiping the
in-flight subagent ticker/content for that first turn.

Simpler rule: only reset when the PREVIOUS paramId is an established
id. Every transition AWAY from an established chat clears (id→id2,
id→null, id→undefined); every transition FROM null/undefined passes
through (initial mount, new-chat URL stamp mid-stream). Drop the
`hasEstablishedConversationRef` machinery in favor of that single
condition.

* 🪆 fix: Codex P2 — match runtime's strict subagent enable check in ACL

Runtime (`initializeClient` + `run.ts`) treats `subagents?.enabled`
as a truthy predicate — `undefined`, `null`, missing, and `false`
all short-circuit. The ACL gate was using `!== false` which
accepted `undefined` / missing as "enabled" and could 403 a payload
whose subagent tool would be inert at runtime.

Swap both create and update to `enabled === true`. Only a strictly-
enabled payload triggers the ACL check; the disable path (`false`)
still passes through so a user who lost VIEW on a child can still
save the disable edit.

* 🪆 fix: Codex P2 — reject missing subagent references with 400

`validateSubagentAccess` collapsed through `collectUnauthorizedAgentIds`,
which returns an empty list for ids with no DB record — so typos and
references to deleted agents passed validation silently, and
`initializeClient` later dropped them at runtime. Saved config would
then list spawn targets that the backend never honored, a hard-to-
diagnose drift.

Refactor the helper into `classifyAgentReferences(ids, …)` which
returns `{ missing, unauthorized }` separately. `validateEdgeAgentAccess`
keeps its old semantics (missing is intentional — a self-referential
`from` names the agent being created). `validateSubagentReferences`
surfaces both buckets so the create/update handlers can 400 on
missing and 403 on unauthorized with distinct error messages and
`agent_ids` lists.

* 🪆 polish: tighten subagent dialog grid gap to gap-2

OGDialogContent's grid default is `gap-4`, which renders the title,
description, and scroll area as three visually separated panels.
Drop to `gap-2` so they read as one block.

* 🪆 polish: swap Subagents above Handoffs in Advanced panel

Subagents is the more common knob users reach for, so show it first.
Handoffs keep the same Controller wiring, just move below.
2026-04-25 04:02:01 -04:00

590 lines
18 KiB
JavaScript

const mongoose = require('mongoose');
const {
ResourceType,
PermissionBits,
PrincipalType,
PrincipalModel,
} = require('librechat-data-provider');
const { MongoMemoryServer } = require('mongodb-memory-server');
const mockInitializeAgent = jest.fn();
const mockValidateAgentModel = jest.fn();
jest.mock('@librechat/agents', () => ({
...jest.requireActual('@librechat/agents'),
createContentAggregator: jest.fn(() => ({
contentParts: [],
aggregateContent: jest.fn(),
})),
}));
jest.mock('@librechat/api', () => ({
...jest.requireActual('@librechat/api'),
initializeAgent: (...args) => mockInitializeAgent(...args),
validateAgentModel: (...args) => mockValidateAgentModel(...args),
GenerationJobManager: { setCollectedUsage: jest.fn() },
getCustomEndpointConfig: jest.fn(),
createSequentialChainEdges: jest.fn(),
}));
/** Captured by the `getDefaultHandlers` mock so tests can drive the
* `ON_TOOL_EXECUTE` pipeline with a real subagent id and observe whether
* the tool context (agent, tool_resources, skill ACLs) was preserved. */
let capturedToolExecuteOptions;
jest.mock('~/server/controllers/agents/callbacks', () => ({
createToolEndCallback: jest.fn(() => jest.fn()),
getDefaultHandlers: jest.fn((opts) => {
capturedToolExecuteOptions = opts?.toolExecuteOptions;
return {};
}),
}));
const mockLoadToolsForExecution = jest.fn();
jest.mock('~/server/services/ToolService', () => ({
loadAgentTools: jest.fn(),
loadToolsForExecution: (...args) => mockLoadToolsForExecution(...args),
}));
jest.mock('~/server/controllers/ModelController', () => ({
getModelsConfig: jest.fn().mockResolvedValue({}),
}));
let agentClientArgs;
jest.mock('~/server/controllers/agents/client', () => {
return jest.fn().mockImplementation((args) => {
agentClientArgs = args;
return {};
});
});
jest.mock('./addedConvo', () => ({
processAddedConvo: jest.fn().mockResolvedValue({ userMCPAuthMap: undefined }),
}));
jest.mock('~/cache', () => ({
logViolation: jest.fn(),
}));
const { initializeClient } = require('./initialize');
const { User, AclEntry } = require('~/db/models');
const { createAgent } = require('~/models');
const PRIMARY_ID = 'agent_primary';
const TARGET_ID = 'agent_target';
const AUTHORIZED_ID = 'agent_authorized';
describe('initializeClient — processAgent ACL gate', () => {
let mongoServer;
let testUser;
beforeAll(async () => {
mongoServer = await MongoMemoryServer.create();
await mongoose.connect(mongoServer.getUri());
});
afterAll(async () => {
await mongoose.disconnect();
await mongoServer.stop();
});
beforeEach(async () => {
await mongoose.connection.dropDatabase();
jest.clearAllMocks();
agentClientArgs = undefined;
testUser = await User.create({
email: 'test@example.com',
name: 'Test User',
username: 'testuser',
role: 'USER',
});
mockValidateAgentModel.mockResolvedValue({ isValid: true });
});
const makeReq = () => ({
user: { id: testUser._id.toString(), role: 'USER' },
body: { conversationId: 'conv_1', files: [] },
config: { endpoints: {} },
_resumableStreamId: null,
});
const makeEndpointOption = () => ({
agent: Promise.resolve({
id: PRIMARY_ID,
name: 'Primary',
provider: 'openai',
model: 'gpt-4',
tools: [],
}),
model_parameters: { model: 'gpt-4' },
endpoint: 'agents',
});
const makePrimaryConfig = (edges) => ({
id: PRIMARY_ID,
endpoint: 'agents',
edges,
toolDefinitions: [],
toolRegistry: new Map(),
userMCPAuthMap: null,
tool_resources: {},
resendFiles: true,
maxContextTokens: 4096,
});
it('should skip handoff agent and filter its edge when user lacks VIEW access', async () => {
await createAgent({
id: TARGET_ID,
name: 'Target Agent',
provider: 'openai',
model: 'gpt-4',
author: new mongoose.Types.ObjectId(),
tools: [],
});
const edges = [{ from: PRIMARY_ID, to: TARGET_ID, edgeType: 'handoff' }];
mockInitializeAgent.mockResolvedValue(makePrimaryConfig(edges));
await initializeClient({
req: makeReq(),
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
expect(mockInitializeAgent).toHaveBeenCalledTimes(1);
expect(agentClientArgs.agent.edges).toEqual([]);
});
it('should initialize handoff agent and keep its edge when user has VIEW access', async () => {
const authorizedAgent = await createAgent({
id: AUTHORIZED_ID,
name: 'Authorized Agent',
provider: 'openai',
model: 'gpt-4',
author: new mongoose.Types.ObjectId(),
tools: [],
});
await AclEntry.create({
principalType: PrincipalType.USER,
principalId: testUser._id,
principalModel: PrincipalModel.USER,
resourceType: ResourceType.AGENT,
resourceId: authorizedAgent._id,
permBits: PermissionBits.VIEW,
grantedBy: testUser._id,
});
const edges = [{ from: PRIMARY_ID, to: AUTHORIZED_ID, edgeType: 'handoff' }];
const handoffConfig = {
id: AUTHORIZED_ID,
edges: [],
toolDefinitions: [],
toolRegistry: new Map(),
userMCPAuthMap: null,
tool_resources: {},
};
let callCount = 0;
mockInitializeAgent.mockImplementation(() => {
callCount++;
return callCount === 1
? Promise.resolve(makePrimaryConfig(edges))
: Promise.resolve(handoffConfig);
});
await initializeClient({
req: makeReq(),
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
expect(mockInitializeAgent).toHaveBeenCalledTimes(2);
expect(agentClientArgs.agent.edges).toHaveLength(1);
expect(agentClientArgs.agent.edges[0].to).toBe(AUTHORIZED_ID);
});
});
describe('initializeClient — subagent loading', () => {
const SUBAGENT_ID = 'agent_subagent_1';
const DUPLICATE_SUBAGENT_ID = 'agent_subagent_dup';
const HANDOFF_AND_SUB_ID = 'agent_handoff_and_sub';
let mongoServer;
let testUser;
beforeAll(async () => {
mongoServer = await MongoMemoryServer.create();
await mongoose.connect(mongoServer.getUri());
});
afterAll(async () => {
await mongoose.disconnect();
await mongoServer.stop();
});
beforeEach(async () => {
await mongoose.connection.dropDatabase();
jest.clearAllMocks();
agentClientArgs = undefined;
capturedToolExecuteOptions = undefined;
mockLoadToolsForExecution.mockReset();
mockLoadToolsForExecution.mockResolvedValue({ loadedTools: [] });
testUser = await User.create({
email: 'subagent@example.com',
name: 'Subagent User',
username: 'subuser',
role: 'USER',
});
mockValidateAgentModel.mockResolvedValue({ isValid: true });
});
/** Grant the test user VIEW on an agent so processAgent loads it. */
const grantView = async (agentDoc) => {
await AclEntry.create({
principalType: PrincipalType.USER,
principalId: testUser._id,
principalModel: PrincipalModel.USER,
resourceType: ResourceType.AGENT,
resourceId: agentDoc._id,
permBits: PermissionBits.VIEW,
grantedBy: testUser._id,
});
};
/** Build a request with the `subagents` capability enabled. */
const makeSubagentReq = () => ({
user: { id: testUser._id.toString(), role: 'USER' },
body: { conversationId: 'conv_sub', files: [] },
config: {
endpoints: {
agents: {
capabilities: ['subagents'],
},
},
},
_resumableStreamId: null,
});
const makeEndpointOption = () => ({
agent: Promise.resolve({
id: PRIMARY_ID,
name: 'Primary',
provider: 'openai',
model: 'gpt-4',
tools: [],
}),
model_parameters: { model: 'gpt-4' },
endpoint: 'agents',
});
const makePrimaryConfig = ({ edges = [], subagents, agent_ids }) => ({
id: PRIMARY_ID,
endpoint: 'agents',
edges,
toolDefinitions: [],
toolRegistry: new Map(),
userMCPAuthMap: null,
tool_resources: {},
resendFiles: true,
maxContextTokens: 4096,
subagents,
agent_ids,
});
const makeSubagentConfig = (id) => ({
id,
endpoint: 'agents',
edges: [],
toolDefinitions: [{ name: 'web', description: 'web', parameters: {} }],
toolRegistry: new Map([['web', { name: 'web' }]]),
userMCPAuthMap: null,
tool_resources: { file_search: { file_ids: ['file_1'] } },
accessibleSkillIds: ['skill_1'],
actionsEnabled: true,
resendFiles: false,
maxContextTokens: 4096,
});
it('loads a configured subagent, populates `subagentAgentConfigs`, and keeps it out of `agentConfigs`', async () => {
const subAgent = await createAgent({
id: SUBAGENT_ID,
name: 'Explicit Subagent',
provider: 'openai',
model: 'gpt-4',
author: new mongoose.Types.ObjectId(),
tools: ['web'],
});
await grantView(subAgent);
const primaryConfig = makePrimaryConfig({
subagents: { enabled: true, allowSelf: true, agent_ids: [SUBAGENT_ID] },
});
const subagentConfig = makeSubagentConfig(SUBAGENT_ID);
let call = 0;
mockInitializeAgent.mockImplementation(() =>
Promise.resolve(++call === 1 ? primaryConfig : subagentConfig),
);
await initializeClient({
req: makeSubagentReq(),
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
expect(mockInitializeAgent).toHaveBeenCalledTimes(2);
/** The subagent's AgentConfig is attached to the primary for run.ts to
* turn into `SubagentConfig[]` on the parent's `AgentInputs`. */
expect(agentClientArgs.agent.subagentAgentConfigs).toHaveLength(1);
expect(agentClientArgs.agent.subagentAgentConfigs[0].id).toBe(SUBAGENT_ID);
/** Subagent-only agents must NOT appear in `agentConfigs` — otherwise the
* graph would treat them as a parallel/handoff node. */
expect(agentClientArgs.agentConfigs).toBeDefined();
expect(agentClientArgs.agentConfigs.has(SUBAGENT_ID)).toBe(false);
});
it('preserves subagent tool context for ON_TOOL_EXECUTE (Codex P1 regression guard)', async () => {
/** Verifies the Codex P1 fix: `agentToolContexts.delete(subagentId)` is
* NOT called for subagent-only agents, so when the child dispatches
* `ON_TOOL_EXECUTE` the parent can still resolve its tool context
* (agent, tool_resources, skill ACLs, actionsEnabled) to run tools
* with the right scope. We drive the real `loadTools` closure that
* `initializeClient` wires into `toolExecuteOptions`. */
const subAgent = await createAgent({
id: SUBAGENT_ID,
name: 'Explicit Subagent',
provider: 'openai',
model: 'gpt-4',
author: new mongoose.Types.ObjectId(),
tools: ['web'],
});
await grantView(subAgent);
const primaryConfig = makePrimaryConfig({
subagents: { enabled: true, allowSelf: false, agent_ids: [SUBAGENT_ID] },
});
const subagentConfig = makeSubagentConfig(SUBAGENT_ID);
let call = 0;
mockInitializeAgent.mockImplementation(() =>
Promise.resolve(++call === 1 ? primaryConfig : subagentConfig),
);
await initializeClient({
req: makeSubagentReq(),
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
expect(capturedToolExecuteOptions?.loadTools).toBeInstanceOf(Function);
/** Invoke the real closure with the subagent's id. If `agentToolContexts`
* had been deleted (as in the pre-fix code), this would call
* `loadToolsForExecution` with `agent: undefined` — actions/resource-
* scoped tools would silently drop. */
await capturedToolExecuteOptions.loadTools(['web'], SUBAGENT_ID);
expect(mockLoadToolsForExecution).toHaveBeenCalledTimes(1);
const arg = mockLoadToolsForExecution.mock.calls[0][0];
expect(arg.agent).toBeDefined();
expect(arg.agent.id).toBe(SUBAGENT_ID);
expect(arg.toolRegistry).toBeInstanceOf(Map);
expect(arg.tool_resources).toEqual({ file_search: { file_ids: ['file_1'] } });
expect(arg.actionsEnabled).toBe(true);
});
it('deduplicates repeated ids in subagents.agent_ids', async () => {
const subAgent = await createAgent({
id: DUPLICATE_SUBAGENT_ID,
name: 'Dup Subagent',
provider: 'openai',
model: 'gpt-4',
author: new mongoose.Types.ObjectId(),
tools: [],
});
await grantView(subAgent);
const primaryConfig = makePrimaryConfig({
subagents: {
enabled: true,
allowSelf: false,
/** Same id three times — the backend must not load the agent
* repeatedly and must not emit three SubagentConfig entries. */
agent_ids: [DUPLICATE_SUBAGENT_ID, DUPLICATE_SUBAGENT_ID, DUPLICATE_SUBAGENT_ID],
},
});
const subagentConfig = makeSubagentConfig(DUPLICATE_SUBAGENT_ID);
let initCalls = 0;
mockInitializeAgent.mockImplementation(() => {
initCalls += 1;
return Promise.resolve(initCalls === 1 ? primaryConfig : subagentConfig);
});
await initializeClient({
req: makeSubagentReq(),
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
/** One call for primary, one for the subagent — not four. */
expect(mockInitializeAgent).toHaveBeenCalledTimes(2);
expect(agentClientArgs.agent.subagentAgentConfigs).toHaveLength(1);
});
it('keeps an agent in `agentConfigs` when it is BOTH a handoff target and a subagent', async () => {
/** Overlap case: the same child is used both via handoff edges (needs to
* be in agentConfigs) and as a subagent (needs to be in
* subagentAgentConfigs, and its tool context preserved). The pipeline
* shouldn't silently drop it from the handoff map. */
const shared = await createAgent({
id: HANDOFF_AND_SUB_ID,
name: 'Shared Agent',
provider: 'openai',
model: 'gpt-4',
author: new mongoose.Types.ObjectId(),
tools: [],
});
await grantView(shared);
const edges = [{ from: PRIMARY_ID, to: HANDOFF_AND_SUB_ID, edgeType: 'handoff' }];
const primaryConfig = makePrimaryConfig({
edges,
subagents: {
enabled: true,
allowSelf: false,
agent_ids: [HANDOFF_AND_SUB_ID],
},
});
const sharedConfig = makeSubagentConfig(HANDOFF_AND_SUB_ID);
let call = 0;
mockInitializeAgent.mockImplementation(() =>
Promise.resolve(++call === 1 ? primaryConfig : sharedConfig),
);
await initializeClient({
req: makeSubagentReq(),
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
expect(agentClientArgs.agent.subagentAgentConfigs).toHaveLength(1);
/** Shared agent must stay in agentConfigs — it's still the handoff target. */
expect(agentClientArgs.agentConfigs.has(HANDOFF_AND_SUB_ID)).toBe(true);
});
it('clears subagents config on primary when the capability is disabled', async () => {
/** Admin can turn subagents off at the endpoint level even if an agent was
* configured for them. The primary's `subagents` field should be
* suppressed so run.ts never builds a SubagentConfig. */
const primaryConfig = makePrimaryConfig({
subagents: { enabled: true, allowSelf: true, agent_ids: [] },
});
mockInitializeAgent.mockResolvedValue(primaryConfig);
const req = makeSubagentReq();
/** Remove the capability from the admin config. */
req.config.endpoints.agents.capabilities = [];
await initializeClient({
req,
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
expect(agentClientArgs.agent.subagents).toBeUndefined();
expect(agentClientArgs.agent.subagentAgentConfigs).toEqual([]);
});
it('clears subagents on handoff agents too when capability is disabled (Codex P2 regression)', async () => {
/** Codex P2: the capability gate must suppress `subagents` on EVERY
* loaded config, not just the primary. `run.ts` iterates all agents
* and calls `buildSubagentConfigs` per agent, so a handoff target
* with `subagents.enabled: true` persisted on its document would
* otherwise still expose self-spawn even when the admin has disabled
* the capability globally. */
const authorized = await createAgent({
id: AUTHORIZED_ID,
name: 'Handoff Agent',
provider: 'openai',
model: 'gpt-4',
author: new mongoose.Types.ObjectId(),
tools: [],
});
await grantView(authorized);
const edges = [{ from: PRIMARY_ID, to: AUTHORIZED_ID, edgeType: 'handoff' }];
const primaryConfig = makePrimaryConfig({
edges,
subagents: { enabled: true, allowSelf: true, agent_ids: [] },
});
const handoffConfig = {
id: AUTHORIZED_ID,
endpoint: 'agents',
edges: [],
toolDefinitions: [],
toolRegistry: new Map(),
userMCPAuthMap: null,
tool_resources: {},
/** Handoff agent document has subagents enabled of its own accord. */
subagents: { enabled: true, allowSelf: true, agent_ids: [] },
};
let callCount = 0;
mockInitializeAgent.mockImplementation(() =>
Promise.resolve(++callCount === 1 ? primaryConfig : handoffConfig),
);
const req = makeSubagentReq();
/** Capability OFF at endpoint level. */
req.config.endpoints.agents.capabilities = [];
await initializeClient({
req,
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
/** Primary cleared. */
expect(agentClientArgs.agent.subagents).toBeUndefined();
/** Handoff target must ALSO be cleared — otherwise its self-spawn
* would still fire when run.ts iterates agentConfigs. */
const handoffLoaded = agentClientArgs.agentConfigs.get(AUTHORIZED_ID);
expect(handoffLoaded).toBeDefined();
expect(handoffLoaded.subagents).toBeUndefined();
expect(handoffLoaded.subagentAgentConfigs).toBeUndefined();
});
it('skips subagent loading entirely when the feature is disabled on the agent', async () => {
const primaryConfig = makePrimaryConfig({
subagents: { enabled: false, allowSelf: true, agent_ids: [SUBAGENT_ID] },
});
mockInitializeAgent.mockResolvedValue(primaryConfig);
await initializeClient({
req: makeSubagentReq(),
res: {},
signal: new AbortController().signal,
endpointOption: makeEndpointOption(),
});
/** Only one initializeAgent call — for the primary. No subagent loaded. */
expect(mockInitializeAgent).toHaveBeenCalledTimes(1);
expect(agentClientArgs.agent.subagentAgentConfigs).toEqual([]);
});
});