LibreChat/api
Danny Avila 93c4ef4ba8
🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960)
* 🧱 refactor: typed CodeEnvRef + kind discriminator + tenant-aware sandbox cache

Final cutover for the LibreChat ↔ codeapi sandbox file identity. Replaces
the magic string `${session_id}/${file_id}?entity_id=...` with a typed,
discriminated `CodeEnvRef`. Pre-release lockstep deploy with codeapi
#1455 and agents #148; no legacy aliases retained.

## Final shape

```ts
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user';  id: string; storage_session_id: string; file_id: string };
```

`kind` drives codeapi's sessionKey: `<tenant>:<kind>:<id>[✌️<version>]`
for shared kinds, `<tenant>:user:<userId>` for user-private (auth context
provides `userId`). `version` is statically required for `kind: 'skill'`
and forbidden otherwise via discriminated union — constraint holds at
compile time on every consumer, not just codeapi's runtime validator.

`id` is sessionKey-meaningful for `'skill'` / `'agent'`; informational
only for `'user'` (codeapi resolves user identity from auth context).

## What changed

- `packages/data-provider/src/codeEnvRef.ts` — discriminated union +
  `CODE_ENV_KINDS` const-tuple keeps the runtime list and TS union
  locked together.
- Schemas: `metadata.codeEnvRef` and `SkillFile.codeEnvRef` enums
  tightened to `['skill', 'agent', 'user']`.
- `primeSkillFiles` writes `kind: 'skill'`, `id: skill._id`,
  `version: skill.version`. Cache-hit path reads `codeEnvRef`
  directly. Bumping `skill.version` on edit naturally invalidates
  the prior cache entry under the new sessionKey.
- `processCodeOutput` writes `kind: 'user'`, `id: req.user.id`. Output
  bucket is always user-scoped, regardless of which skill the
  execution invoked. New regression test pins the asymmetry.
- `primeFiles` reupload preserves `kind`/`id`/`version?` from the
  existing ref so a skill-cache-miss reupload doesn't silently demote
  to user bucket.
- `crud.js` upload functions (`uploadCodeEnvFile` /
  `batchUploadCodeEnvFiles`) thread `kind`/`id`/`version?` to the
  multipart form (codeapi #1455 option α). Without these on the wire,
  codeapi falls back to user bucketing and skill-cache invalidation
  never fires. Client-side validation mirrors codeapi's validator.
- `Files/process.js` — chat attachments use `kind: 'user'`; agent
  setup files use `kind: 'agent'`.
- Drops `entity_id` everywhere (struct, schema sub-docs, write paths,
  upload form fields). Drops `'system'` from the kind enum (no emitter
  ever existed).

## Test plan

- [x] `cd packages/data-provider && npx jest src/codeEnvRef.spec` — 4 / 4
- [x] `cd packages/data-schemas && npx jest` — 1447 / 1447
- [x] `cd packages/api && npx jest src/agents` — 81 / 81 in skillFiles +
  handlers + resources
- [x] `cd api && npx jest server/services/Files server/controllers/agents` —
  436 / 436
- [x] `cd api && npx jest server/services/Files/Code` — 98 / 98 (incl.
  new "outputs are user-scoped regardless of which skill the execution
  invoked" regression and "reupload forwards kind/id/version from
  existing ref")
- [x] `npx tsc --noEmit -p packages/data-{provider,schemas}/tsconfig.json
  && npx tsc --noEmit -p packages/api/tsconfig.json` — clean (only
  pre-existing unrelated dev errors in storage/balance, untouched here)

## Deploy notes

- **24h cache-miss burst** on first deploy. Inputs (skill caches re-prime
  under new sessionKey shape) and outputs (any pre-Phase C skill-output
  cached files become unreadable). Bounded by codeapi's 24h TTL.
- **Lockstep with codeapi #1455 and agents #148.** Either repo can land
  first since no aliases to drain, but the three deploys must overlap
  within the same maintenance window.
- **`@librechat/agents` bump to `3.1.79-dev.0`** required after agents
  #148 lands and is published.

## What this enables

Auth bridge work (JWT-based tenant/user identity between LC and codeapi)
— codeapi now derives sessionKey purely from `req.codeApiAuthContext.{
tenantId, userId}`, so the next chapter is replacing the header-asserted
user identity with a verified-claim path.

* 🩹 fix: persist execute_code uploads under codeEnvRef metadata key

Codex review P1 (chatgpt-codex-connector). `Files/process.js` was
storing the upload result under `metadata.fileIdentifier` even though:
- `uploadCodeEnvFile` now returns `{ storage_session_id, file_id }`,
  not the legacy magic string.
- The post-cutover schema (`File.metadata.codeEnvRef`) only declares
  `codeEnvRef` — mongoose strict mode silently strips unknown keys.
- All readers (`primeFiles`, `getCodeFilesByIds`,
  `categorizeFileForToolResources`, controller filtering) check
  `metadata.codeEnvRef`.

Net effect of the bug: chat-attached and agent-setup execute_code files
would lose their sandbox reference on save, and primeFiles would skip
them on subsequent code-execution turns — the file blob would still be
available locally but never re-mounted in the sandbox.

Fix: construct the full `CodeEnvRef` (`{ kind, id, storage_session_id,
file_id }`) at the write site and persist under `metadata.codeEnvRef`.
`BaseClient`'s "is this a code-env file" presence check accepts the new
shape alongside the legacy `fileIdentifier` for back-compat with any
pre-cutover records still in the database. Mirrors the same change in
`processAttachments.spec.ts` (which re-implements the BaseClient logic
for testability).

New regression tests in `process.spec.js` cover three cases:
- chat attachments (`messageAttachment=true`) → `kind: 'user'`
- agent setup (`messageAttachment=false`) → `kind: 'agent'`
- legacy `fileIdentifier` key is NOT persisted (would be schema-stripped)

* 🩹 fix: read storage_session_id on primed file refs (Codex P1)

Codex review (chatgpt-codex-connector). After Phase B's per-file
`session_id` → `storage_session_id` rename, `primeFiles` emits the
new field — but `seedCodeFilesIntoSessions` was still reading
`files[0].session_id` for the representative session and `f.session_id`
for the dedupe key. In runs with only primed attachments (no skill
seed), `representativeSessionId` was `undefined`, the function
returned the unchanged map, and `seedCodeFilesIntoSessions` silently
dropped the entire batch. The first `execute_code` call then started
without `_injected_files` and the agent couldn't see prior-turn
artifacts.

Fix:
- `codeFilesSession.ts`: read `f.storage_session_id` for both the
  dedupe key and the representative session id. JSDoc updated to
  match the new field name.
- `callbacks.js`: the two output-file persistence paths read
  `file.session_id` to pass to `processCodeOutput` — switch to
  `file.storage_session_id`. The original comment explicitly says
  this should be the STORAGE session, which is exactly the field
  Phase B renamed.
- `codeFilesSession.spec.ts`: fixture builder uses `storage_session_id`
  and `kind: 'user'` to match the post-cutover `CodeEnvFile` shape.

Lockstep coordination: this matches the post-bump shape of
`@librechat/agents` 3.1.79+. CI tsc errors against the currently-pinned
3.1.78 are expected and resolve when the dep bumps in this PR before
merge.

* 📦 chore: Bump `@librechat/agents` to version 3.1.80-dev.0 in package-lock and package.json files

* 🪪 fix: thread kind/id/version through codeapi /download URLs (Phase C α)

Symmetric fix for the upload-side wire change in 537725a. Codeapi's
`sessionAuth` middleware now requires `kind`/`id`/`version?` on every
download/freshness URL — without them it 400s with "kind must be one
of: skill, agent, user" before serving the file.

Three sites construct codeapi-side URLs that go through `sessionAuth`:

- `processCodeOutput` (`Files/Code/process.js`): `/download/<sess>/<id>`
  for freshly-generated sandbox outputs. Always `kind: 'user'` +
  `id: req.user.id` — code-output files are always user-private,
  regardless of which skill the run invoked.
- `getSessionInfo` (`Files/Code/process.js`): `/sessions/<sess>/objects/<id>`
  for the 23h freshness check. Pulls kind/id/version straight off the
  `codeEnvRef` already in scope — skill files stay skill-bucketed,
  user files stay user-bucketed.
- `/code/download/:session_id/:fileId` LC route (`routes/files/files.js`):
  proxies to codeapi for manual downloads. Code-output files only on
  this route, so `kind: 'user'` + `id: req.user.id`.

The `getCodeOutputDownloadStream` helper in `crud.js` now takes an
`identity` param, validated by a `buildCodeEnvDownloadQuery` helper
that mirrors `appendCodeEnvFileIdentity`'s shape rules: kind required
from the closed `{skill, agent, user}` set, version required for
'skill' and forbidden otherwise. Bad callers fail fast on the client
instead of round-tripping a 400.

Also cleans up two log-noise sources reported alongside the 400:

- `logAxiosError` in `packages/api/src/utils/axios.ts` was dumping
  `error.response.data` raw. With `responseType: 'arraybuffer'` that's
  a `Buffer` (~4 chars per byte after JSON-serialization); with
  `responseType: 'stream'` it's a `Readable` whose internal state
  serializes the entire ring buffer + socket. New `renderResponseData`
  decodes small buffers as UTF-8 (truncated past 2KB) and stubs streams
  as `'[stream]'`. Diagnostics stay useful, log lines stop being
  megabytes.
- `/code/download` route's catch was bare `logger.error('...', error)`,
  bypassing the redactor. Switched to `logAxiosError` so it benefits
  from the same buffer/stream handling.

Tests updated to match the new contract:
- crud.spec: `getCodeOutputDownloadStream` fixtures pass `userIdentity`;
  new cases cover skill identity (with version), bad kind rejection,
  skill-without-version rejection.
- process.spec: `getSessionInfo` test passes a full `codeEnvRef` object.

* ♻️ refactor: extract codeEnv identity helpers into packages/api

Per the project convention that new backend code lives in TypeScript
under `packages/api`, moves `appendCodeEnvFileIdentity` and
`buildCodeEnvDownloadQuery` from `api/server/services/Files/Code/crud.js`
into a new `packages/api/src/files/code/identity.ts` module.

Both helpers are pure validators that mirror codeapi's
`parseUploadSessionKeyInput` server-side rules (closed kind set,
`version` required for `'skill'` and forbidden otherwise) — they
deserve TS support and a dedicated spec rather than living as
JSDoc-typed helpers in the legacy `/api` workspace. The new module:

- Exports a `CodeEnvIdentity` interface using the
  `librechat-data-provider` `CodeEnvKind` discriminated union.
- Adds 13 unit tests in `identity.spec.ts` covering the validation
  matrix (skill+version, agent, user, and every rejection path) plus
  URL encoding for the download query.
- Re-exported from `packages/api/src/files/code/index.ts` alongside
  `classify`, `extract`, and `form`.

Consumer updates:
- `api/server/services/Files/Code/crud.js`: drops the local helpers
  and imports them from `@librechat/api`. Net -64 lines.
- `api/server/services/Files/Code/process.js`: same.
- Test mocks for `@librechat/api` in three spec files now stub the
  helpers' validation behavior locally rather than pulling them
  through `requireActual` (which would drag in provider-config
  init-time side effects). The package's `exports` field only
  surfaces the root barrel, so leaf imports aren't reachable from
  legacy `/api` test setup.

No runtime behavior change. Identity validation rules and emitted
form/query shapes are byte-for-byte identical pre/post.

* 🪪 fix: emit resource_id alongside id on _injected_files (skill 403 fix)

Companion to codeapi #1455 fix and agents 3.1.80-dev.1 — the wire
shape for shared-kind files now requires `resource_id` distinct from
the storage `id`. Without this LC change, codeapi's sessionKey
re-derivation on every shared-kind /exec rejects with 403
session_key_mismatch:

    cached:  legacy:skill:69dcf561...✌️59  (signed at upload, skill _id)
    derived: legacy:skill:ysPwEURuPk-...✌️59  (storage nanoid)

Emit sites updated:

- `primeInvokedSkills` cache-hit path: `resource_id: ref.id` (the
  persisted skill `_id` from `codeEnvRef.id`); `id: ref.file_id`
  unchanged (storage uuid).
- `primeInvokedSkills` fresh-upload path: `resource_id: skill._id.toString()`
  on every primed file (the `allPrimedFiles` builder type now carries
  the field).
- `processCodeOutput`'s `pushFile` (Code/process.js): `resource_id: ref.id`
  — for `kind: 'user'` this is informational (codeapi derives
  sessionKey from auth context) but emitted for shape uniformity
  with shared kinds.

Bumps `@librechat/agents` to `^3.1.80-dev.1` (the version that
ships the matching `CodeEnvFile.resource_id` field).

## Test plan

- [x] `cd packages/api && npx jest src/agents` — 67 / 67 pass
  (skillFiles fixtures updated to assert `resource_id` on the
  emitted CodeSessionContext.files).
- [x] `cd api && npx jest server/services/Files server/controllers/agents` —
  445 / 445 pass (process.spec fixtures updated for the reupload
  + cache-hit emission).
- [x] `npx tsc --noEmit -p packages/api/tsconfig.json` — clean.

* fix(skill-tool-call): carry resource_id through primeSkillFiles → artifact

Codeapi was 400ing every /exec following a `handle_skill` tool call
with `resource_id is invalid` (`type: 'undefined'`). Both code paths
in `primeSkillFiles` (cache-hit + fresh-upload) returned files
without `resource_id`/`kind`/`version`, and the artifact in
`handlers.ts` forwarded the stripped shape into
`tc.codeSessionContext.files` → `_injected_files`.

`primeInvokedSkills` (the NL-detected loader) had already been fixed
end-to-end; this commit aligns the tool-invoked path with the same
contract: `resource_id` = `skill._id.toString()`, `kind: 'skill'`,
`version` = the skill's monotonic counter.

Tests added to `skillFiles.spec.ts` lock the contract on
`primeSkillFiles` directly so future refactors can't silently drop
the resource identity again.

* fix(handlers.spec): align session_id → storage_session_id rename + kind discriminator

Pre-existing TS errors against the post-rename `CodeEnvFile` shape:
the test file still used `session_id` on per-file objects (renamed to
`storage_session_id` in agents Phase B/C) and was missing the `kind`
discriminator the discriminated union requires. Both inputs and the
matching `expect.toEqual(...)` mirrors updated together so the
runtime equality check still holds.

Lines 723-732 stay as-is — they sit behind `as unknown as
ToolCallRequest` and TS already skipped them.

* chore: fix `@librechat/agents`, correct version to 3.1.80-dev.0 in package.json files

* chore: bump `@librechat/agents` to version 3.1.80-dev.1 in package.json and package-lock.json

* chore: bump `@librechat/agents` to version 3.1.80-dev.2

* feat(observability): trace file priming chain from primeCodeFiles to _injected_files

Diagnosing the user-upload "files=[] on first /exec" bug requires
seeing where in the LC chain a file ref disappears. Prior to this
patch the chain (primeCodeFiles → primedCodeFiles → initialSessions
→ CodeSessionContext → _injected_files) was opaque end-to-end:
  - primeCodeFiles silently dropped files without `metadata.codeEnvRef`
  - reuploadFile catches all errors and continues with no signal
  - the handlers.ts handoff to codeapi never logged what it was sending

After this patch, a single grep on `[primeCodeFiles]` plus
`[code-env:inject]` shows the full per-file path:

  [primeCodeFiles] in: file_ids=N resourceFiles=M
  [primeCodeFiles] file=<id> path=skip reason=no-codeenvref filename=...
  [primeCodeFiles] file=<id> path=cache-hit-by-session storage_session_id=...
  [primeCodeFiles] file=<id> path=reupload reason=no-uploadtime ...
  [primeCodeFiles] file=<id> path=reupload reason=stale ...
  [primeCodeFiles] file=<id> path=reupload-success oldSession=... newSession=... newFileId=...
  [primeCodeFiles] file=<id> path=reupload-failed session=...
  [primeCodeFiles] file=<id> path=fresh-active storage_session_id=...
  [primeCodeFiles] out: returned=N skippedNoRef=M reuploadFailures=K

  [code-env:inject] tool=<name> files=N missingResourceId=K     (debug)
  [code-env:inject] M/N files missing resource_id ...           (warn)
  [code-env:inject] tool=<name> _injected_files=0 ...           (warn)

The boundary log warns when LC sends zero injected files on a
code-execution tool call — that's the user's actual symptom showing
up at the LC side instead of having to correlate against codeapi's
`Request received { files: [] }`.

Tag chosen as `[code-env:inject]` rather than `[handoff:exec]` to
avoid collision with the app-level "handoff" semantic (subagent
handoff workflow).

Structural cleanup in primeFiles: replaced the `if (ref) { ... }`
nesting with an early `if (!ref) continue` so the per-path
instrumentation hooks land at top-level scope instead of indented
inside a conditional. Behavior unchanged; pushFile / reuploadFile
identical.

Spec fixtures (handlers.spec.ts, codeFilesSession.spec.ts) updated
to include `resource_id` on `CodeEnvFile` literals — required by
the post-3.1.80-dev.2 type now installed.

## Test plan

- [x] `cd packages/api && npx jest src/agents/handlers.spec.ts src/agents/codeFilesSession.spec.ts src/agents/skillFiles.spec.ts` — 69/69 pass
- [x] `cd api && npx jest server/services/Files/Code/process.spec.js` — 84/84 pass
- [x] `npx tsc --noEmit -p packages/api` — clean
- [x] `npx eslint` on all four touched files — clean

* chore: add CONSOLE_JSON_STRING_LENGTH to .env.example for JSON log string length configuration

* fix(files): align codeapi upload filename with LC's sanitized DB filename

User-attached files for code execution were uploading to codeapi
under `file.originalname` (raw upload filename, may contain spaces /
special chars) while LC's DB record stored the sanitized form
(`sanitizeFilename(file.originalname)`, underscores). Codeapi
preserves whatever filename the upload sent, so the sandbox saw
`/mnt/data/<originalname>` while LC's `primeFiles` toolContext text
+ `_injected_files.name` referenced `file.filename` (sanitized).

Visible failure: agent gets system prompt saying

    /mnt/data/librechat_code_api_-_active_customer_-_2025-11-05.xlsx

…tries that path, hits `FileNotFoundError`, then notices the
sandbox's actual `Available files` line says

    /mnt/data/librechat code api - active customer - 2025-11-05.xlsx

…retries with spaces, succeeds. Wastes a tool call per upload and
leaks raw filenames into model context.

Fix: sanitize once and use the sanitized form in both the codeapi
upload AND the LC DB record. Sandbox path = LC toolContext text =
in-memory ref name. No drift.

Reupload path (`Code/process.js` line 867 `filename: file.filename`)
already uses the sanitized DB name, so it stays consistent with the
fresh-upload path after this change.

## Test plan

- [x] `cd api && npx jest server/services/Files/process` — 32/32 pass
- [x] `npx eslint` on the touched file — clean

* chore: bump `@librechat/agents` to version 3.1.80-dev.3 in package.json and package-lock.json
2026-05-08 12:29:43 -04:00
..
app 🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960) 2026-05-08 12:29:43 -04:00
cache 🚦 fix: ERR_ERL_INVALID_IP_ADDRESS and IPv6 Key Collisions in IP Rate Limiters (#12319) 2026-03-19 21:48:03 -04:00
config 🔊 fix: Preserve Log Metadata on Console for Warn/Error Levels (#12737) 2026-04-19 21:49:41 -07:00
db 🐛 fix: Resolve MeiliSearch Startup Sync Failure from Model Loading Order (#12397) 2026-03-25 14:02:44 -04:00
models 🗑️ chore: Remove Action Test Suite and Update Mock Implementations (#12268) 2026-03-21 14:28:55 -04:00
server 🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960) 2026-05-08 12:29:43 -04:00
strategies 🔐 feat: Add Signed CloudFront File Downloads (#12970) 2026-05-06 19:48:30 -04:00
test 🌱 fix: Inject Code-Tool Files Into Graph Sessions on First Call (+ read_file Sandbox Fallback) (#12831) 2026-04-27 08:56:39 +09:00
utils 🧮 feat: Add GPT-5.5 Token Definitions (#12973) 2026-05-06 10:50:16 -04:00
jest.config.js 📏 refactor: Add File Size Limits to Conversation Imports (#12221) 2026-03-14 03:06:29 -04:00
jsconfig.json
package.json 🧱 refactor: typed CodeEnvRef + kind discriminator + principal-aware sandbox cache (#12960) 2026-05-08 12:29:43 -04:00
typedefs.js 🧬 chore: Align LibreChat With Agents LangChain Upgrade (#12922) 2026-05-03 12:46:01 -04:00