mirror of https://github.com/danny-avila/LibreChat.git — synced 2026-05-13 16:07:30 +00:00
* 📂 fix: Preserve Nested Folder Paths for Code-Execution Artifacts

  When codeapi reports a generated file at a nested path (`a/b/file.txt`), `processCodeOutput` was running it through `sanitizeFilename` — which calls `path.basename()` and then collapses `/` to `_`. The DB row ended up with `filename: "file.txt"`, `primeFiles` shipped that flat name back to the next sandbox session, and `cat /mnt/data/a/b/file.txt` 404'd.

  Fix: split the sanitizer into two helpers in `packages/api/src/utils/files.ts`:

  - `sanitizeArtifactPath` — segment-wise sanitize while preserving `/`. Falls back to the basename on `..` traversal, absolute paths, and other malformed inputs. The DB record uses this so the next prime() can recreate the nested path in the sandbox.
  - `flattenArtifactPath` — encode `/` as `__` for the local `saveBuffer` strategies, which key by single-component filename and would otherwise create unintended subdirectories under uploads/.

  `process.js` is updated to use both: the DB filename keeps the path; the storage key flattens it. `claimCodeFile` is also keyed on `safeName` so the (filename, conversationId) compound key stays consistent with the record `createFile` writes.

  Tests: +13 unit tests in `files.spec.ts` (sanitizeArtifactPath table, flattenArtifactPath round-trip); +1 integration test in `process.spec.js` asserting the DB-row vs. storage-key split for a nested path. Updated `process-traversal.spec.js` to mock the new helpers. 64 pass / 0 fail across `Files/Code/`; 36 pass / 0 fail in `packages/api/src/utils/files.spec.ts`.

  Companion: ClickHouse/ai#1327 — the codeapi-side counterpart that stops phantom file IDs from reaching this code path in the first place. The two are independent, but the matplotlib bug is most cleanly resolved when both ship.

* 🛡️ fix: Re-add 255-char per-segment cap in sanitizeArtifactPath (codex review P2)

  `sanitizeArtifactPath` dropped the 255-char basename cap that `sanitizeFilename` enforces. Long artifact names then flowed unbounded into `processCodeOutput`'s storage key (`${file_id}__${flatName}`) and tripped `ENAMETOOLONG` on filesystems that enforce `NAME_MAX` — saveBuffer fails, and the file falls back to a download URL instead of persisting / priming. This was a regression specifically for flat filenames that the original `sanitizeFilename` would have truncated safely.

  Re-add the cap as a per-path-component limit so it applies cleanly to both flat and nested paths:

  - Leaf segment: extension-preserving truncation, matching `sanitizeFilename`'s shape (`<truncated-stem>-<6 hex>.<ext>`).
  - Non-leaf (directory) segments: plain truncate-and-disambiguate (`<truncated-name>-<6 hex>`); directory names don't carry semantic extensions worth preserving.
  - Defensive fallback when `path.extname` returns a pathologically long "extension" (e.g. `_.aaaa…aaa` after the dotfile underscore-prefix rewrite turns a long hidden file into a non-dotfile with a 300-char "extension"): collapse to whole-segment truncation rather than leave the cap unmet.

  +6 unit tests covering: long leaf (the regression case), long leaf under a preserved directory, long non-leaf segment, deeply nested mixed-length, exact-255 boundary (no truncation), and the dotfile + truncation interaction.

* 🛡️ fix: Cap flattened storage key against NAME_MAX in processCodeOutput (codex review P1)

  Per-segment caps on the path-preserving form aren't enough. Once segments are joined with `__` for the storage key, deeply nested or moderately long paths can still produce a flat form that overflows once `${file_id}__` is prepended — `${file_id}__a__b__c.csv` for a 3-level, 100-char-each path is ~344 chars, well past filesystem NAME_MAX (255). saveBuffer then trips ENAMETOOLONG and falls back to a download URL, and the artifact never persists / primes.

  `flattenArtifactPath` gets an optional `maxLength` parameter. When set, the function truncates the flat form to fit, preserving the leaf extension with the same disambiguating-hex-suffix shape `sanitizeFilename` uses. The default (`undefined`) keeps existing call sites uncapped — the cap is opt-in for callers that are actually building a filesystem key. Pathologically long "extensions" from `path.extname` (e.g. `.aaaa…aaa`) fall back to whole-key truncation rather than leaving the cap unmet.

  processCodeOutput composes the storage key after `file_id` is known and passes `255 - file_id.length - 2` as the budget so the full `${file_id}__${flatName}` string fits in one filesystem path component.

  +7 unit tests in files.spec.ts:

  - Pass-through when no maxLength is supplied (the cap is opt-in).
  - Pass-through when the flat form fits within maxLength.
  - Truncation with the leaf extension preserved (the regression case).
  - Leaf-only overflow with extension preservation.
  - Pathological long-extension fallback (whole-key truncation).
  - No-extension stem truncation.
  - Boundary equality (off-by-one guard).

  +1 integration test in process.spec.js: processCodeOutput passes the file_id-aware budget (`255 - file_id.length - 2`) to flattenArtifactPath. 114/114 across files.spec.ts + Files/Code (49 + 65).

* 🛡️ fix: Determinize + clamp artifact-path truncation (codex review P2 ×2)

  Two follow-ups to Codex review on the path/flat-key cap:

  1. **Deterministic truncation suffixes.** The previous helpers used `crypto.randomBytes(3)` for the disambiguator, mirroring `sanitizeFilename`'s shape. That made the truncated form non-deterministic: a re-upload of the same long filename would compute a *different* storage key, orphaning the previous on-disk file under the reused `file_id` returned by `claimCodeFile`. A new `deterministicHexSuffix(input)` helper hashes the input with SHA-256 and takes the first 6 hex chars. Same input → same suffix (storage key stable across re-uploads); different inputs sharing a truncation prefix still get different suffixes (collision avoidance). 24 bits ≈ 16M values is collision-safe at our scale (single-digit artifacts per turn per (filename, conversationId) bucket). Applied to `truncateLeafSegment`, `truncateDirSegment`, and `flattenArtifactPath` — every truncation site in the new helpers. `sanitizeFilename` (pre-existing) is intentionally left alone; its tests rely on the random-bytes mock and it's outside this PR's scope.

  2. **Final clamp on the flattenArtifactPath result.** The old `Math.max(1, maxLength - ext.length - 7)` floor could let the result slip past `maxLength` when the extension was nearly as large as the budget (e.g. `maxLength=5`, `ext=".txt"`: budget computed as 0, but the result was `-<6 hex>.txt` = 11 chars). Drop the `Math.max(1, …)` floor and add a final `truncated.slice(0, maxLength)` so the contract holds for any input. Also short-circuit `maxLength <= 0` to `''` for pathological budgets.

  Tests updated to compute the expected hash inline (the existing `randomBytes` mock doesn't apply to the new code path), plus 4 new regression tests:

  - sanitizeArtifactPath: same input → same output, different inputs → different outputs (determinism + collision avoidance).
  - flattenArtifactPath: same input → same output, different inputs sharing a truncation prefix → different outputs.
  - flattenArtifactPath: the clamp holds when ext.length > maxLength - 7.
  - flattenArtifactPath: returns '' for maxLength <= 0.

  53 unit tests pass. 65 integration tests pass.

* 🛡️ fix: Total-path cap + basename for classifier (codex P2 + comprehensive review)

  Four follow-ups from the latest reviews on this PR:

  1. **Codex P2: total-path cap in sanitizeArtifactPath.** Per-segment caps weren't enough — a deeply nested path (3+ at-cap segments) can still produce a joined form past Mongo's 1024-byte indexed-key limit (4.0 and earlier reject; later versions are configurable). Added `ARTIFACT_PATH_TOTAL_MAX = 512` and a leaf-only fallback when the joined form exceeds it. Same shape as the absolute-path / `..`-traversal fallbacks above; the leaf is already segment-capped to ≤255, so the final result stays within bounds.

  2. **Codex P2: pass the basename to the classifier/extractor in process.js.** With the path-preserving sanitizer, `safeName` can now be a nested string like `reports.v1/Makefile`. The classifier's `extensionOf` reads that as `v1/Makefile` (the slice after the dot in the directory name), and the bare-name branch rejects because it sees a `.` anywhere. Result: extensionless artifacts under dotted folders (Makefile, Dockerfile, etc.) get misclassified as `other` and skip text extraction. Pass `path.basename(safeName)` to both `classifyCodeArtifact` and `extractCodeArtifactText` so classification matches what the old flat-name flow produced.

  3. **Review nit: drop the dead `sanitizeFilename` mock in process.spec.js.** process.js no longer imports `sanitizeFilename`; the mock was misleading dead code.

  4. **Review nit: rename the misleading `'embedded parent traversal'` test.** `path.posix.normalize('a/../escape.txt')` resolves to `escape.txt`, which goes through the normal segment-split path, not the `sanitizeFilename` fallback. The test name now says "resolves embedded parent traversal via path normalization" to match the actual code path.

  +3 regression tests:

  - sanitizeArtifactPath falls back to leaf-only when the joined form exceeds 512.
  - sanitizeArtifactPath keeps a nested path within the 512 budget.
  - process.spec: passes the basename (`Makefile` from `reports.v1/Makefile`) to classifyCodeArtifact + extractCodeArtifactText.

  The existing "caps every segment in a deeply-nested path" test now uses 2 segments (not 3) so the joined form stays under the new total cap; the 3-segment scenario is covered by the new fallback test instead. 55 unit + 66 integration = 121/121 pass.

* 📝 docs: Correct sanitizeArtifactPath JSDoc to match actual schema index

  Two doc-only fixes from the latest comprehensive review (both NIT):

  1. **The index field list was wrong.** The JSDoc claimed the compound unique index was `{ file_id, filename, conversationId, context }`. The actual index in `packages/data-schemas/src/schema/file.ts:92-95` is `{ filename, conversationId, context, tenantId }` with a partial filter for `context: FileContext.execute_code`. The cap rationale (Mongo 4.0 indexed-key limit) is correct and unchanged; only the field list was wrong. Added the schema file path so future readers can find the source of truth.

  2. **Trade-off acknowledgement.** The reviewer noted that the leaf-only fallback loses directory structure, which means the model's `cat /mnt/data/<deep>/<path>/file.txt` would 404 in the pathological-depth case — partially re-introducing the original flat-name bug for >512-char paths. This is intentional (a DB write failure is strictly worse than losing structure), but the trade-off wasn't called out explicitly in the JSDoc. Added a paragraph acknowledging it and noting that the cap is monotonically better than the pre-PR behavior, where ALL artifacts were treated this way regardless of depth.

  No code or test changes — a pure JSDoc correction. Tests still 55/0.

* 🛡️ fix: Disambiguate sanitized artifact names to keep claimCodeFile keys unique (codex P2)

  `sanitizeArtifactPath` is not injective — multiple raw inputs can collapse onto the same regex-and-normalize output. Codex's example: `reports 2026/out.csv` and `reports_2026/out.csv` both sanitize to `reports_2026/out.csv`. `claimCodeFile` is keyed on the schema's compound unique `(filename, conversationId, context, tenantId)` index, so the later upload silently matches the earlier record and overwrites the first artifact's bytes via the reused `file_id` — a single conversation can drop files when both names are valid in the sandbox. This collision space isn't strictly new — the pre-PR `sanitizeFilename` (basename-only) had the same property — but the path-preserving form gives us enough information to fix it for the first time.

  **Fix.** When character-level sanitization changed something (regex replacement, path normalization, dotfile prefix, empty-segment collapse), embed a deterministic SHA-256 prefix of the **raw input** in the leaf segment via the new `embedDisambiguatorInLeaf` helper. Same raw input → same safe form (idempotent for re-uploads); different raw inputs that would have collided → different safe forms.

  **Why "character-level" specifically:**

  - The disambiguator fires when `preCapJoined !== inputName` (post-regex + dotfile + empty-segment, BUT pre-truncation).
  - Truncation alone is already disambiguated by `truncateLeafSegment`'s own seg-hash; firing the input-hash branch on truncation would just stack a second hash for no collision-avoidance benefit and clutter human-readable filenames.

  **Three known collision shapes covered:**

  1. `out 1.csv` vs `out_1.csv` (and `out@1.csv` vs `out#1.csv`, etc.)
  2. `dir//file.txt` vs `dir/file.txt` (empty-segment collapse)
  3. `.x` vs `_.x` (dotfile-prefix step)

  **Disambiguator + truncation interaction:** for very long mutated leaves, `truncateLeafSegment` caps at 255 first, then `embedDisambiguatorInLeaf` re-trims to insert the input hash. The seg-hash from the first pass is replaced by the input-hash from the second pass — that's intentional (the input-hash is the load-bearing collision-avoidance suffix; the seg-hash was only ever decorative once the input-hash exists). A final clamp ensures the result never exceeds `ARTIFACT_PATH_SEGMENT_MAX` regardless of input.

  **Disambiguator + total-cap fallback:** when the joined form exceeds 512, we fall back to the leaf-only form. The leaf has already had the disambiguator embedded, so collision avoidance survives the pathological-depth case.

  **`embedDisambiguatorInLeaf`** uses `dot <= 1` to detect "no real extension" (covering both extensionless names AND dotfile-prefixed leaves like `_.hidden` — without this, `_.hidden` would split as stem `_` + ext `.hidden` and produce the awkward `_-<hash>.hidden`).

  **Updated 5 existing tests** that asserted the old collision-prone outputs — they now verify the disambiguator-included form. The character-level-only firing rule was load-bearing here: tests for "clean inputs (no mutation)" and "long inputs (truncation only)" still pass without any disambiguator clutter.

  **+7 regression tests** in a new `collision avoidance (Codex review P2)` describe block:

  1. Different raw inputs sanitizing to the same form get distinct safe names
  2. Whitespace-vs-underscore in a directory segment
  3. Dotfile-prefix collision
  4. Idempotency: same raw → same safe across calls
  5. Clean inputs skip the disambiguator (cosmetic guarantee)
  6. Disambiguator survives leaf truncation (long mutated leaf)
  7. Disambiguator survives total-cap fallback (pathological depth)

  62 unit + 66 integration = 128/128 pass.
1102 lines
42 KiB
JavaScript
// Configurable file size limit for tests - use a getter so it can be changed per test
const fileSizeLimitConfig = { value: 20 * 1024 * 1024 }; // Default 20MB

// Mock librechat-data-provider with configurable file size limit
jest.mock('librechat-data-provider', () => {
  const actual = jest.requireActual('librechat-data-provider');
  return {
    ...actual,
    mergeFileConfig: jest.fn((config) => {
      const merged = actual.mergeFileConfig(config);
      // Override the serverFileSizeLimit with our test value
      return {
        ...merged,
        get serverFileSizeLimit() {
          return fileSizeLimitConfig.value;
        },
      };
    }),
    getEndpointFileConfig: jest.fn((options) => {
      const config = actual.getEndpointFileConfig(options);
      // Override fileSizeLimit with our test value
      return {
        ...config,
        get fileSizeLimit() {
          return fileSizeLimitConfig.value;
        },
      };
    }),
  };
});

const { FileContext } = require('librechat-data-provider');

// Mock uuid
jest.mock('uuid', () => ({
  v4: jest.fn(() => 'mock-uuid-1234'),
}));

// Mock axios — process.js now uses createAxiosInstance() from @librechat/api
const mockAxios = jest.fn();
mockAxios.post = jest.fn();
mockAxios.isAxiosError = jest.fn(() => false);

const mockClassifyCodeArtifact = jest.fn(() => 'other');
const mockExtractCodeArtifactText = jest.fn(async () => null);
jest.mock('@librechat/api', () => {
  const http = require('http');
  const https = require('https');
  return {
    logAxiosError: jest.fn(),
    getBasePath: jest.fn(() => ''),
    sanitizeArtifactPath: jest.fn((name) => name),
    flattenArtifactPath: jest.fn((name) => name.replace(/\//g, '__')),
    createAxiosInstance: jest.fn(() => mockAxios),
    /**
     * Arrow-function indirection (vs. a direct `jest.fn()` reference) so
     * tests can per-case `mockReturnValueOnce` / `mockImplementationOnce`
     * on `mockClassifyCodeArtifact` / `mockExtractCodeArtifactText`.
     * `jest.mock(...)` is hoisted above the outer `const` declarations
     * at parse time, so a direct reference here would capture
     * `undefined`; the arrow defers the binding to call time. The
     * direct-`jest.fn()` mocks below stay constant per file.
     */
    classifyCodeArtifact: (...args) => mockClassifyCodeArtifact(...args),
    extractCodeArtifactText: (...args) => mockExtractCodeArtifactText(...args),
    codeServerHttpAgent: new http.Agent({ keepAlive: false }),
    codeServerHttpsAgent: new https.Agent({ keepAlive: false }),
  };
});

jest.mock('@librechat/data-schemas', () => ({
  logger: {
    warn: jest.fn(),
    debug: jest.fn(),
    error: jest.fn(),
  },
}));

jest.mock('@librechat/agents', () => ({
  getCodeBaseURL: jest.fn(() => 'https://code-api.example.com'),
}));

// Mock models
const mockClaimCodeFile = jest.fn();
jest.mock('~/models', () => ({
  createFile: jest.fn().mockResolvedValue({}),
  getFiles: jest.fn(),
  updateFile: jest.fn(),
  claimCodeFile: (...args) => mockClaimCodeFile(...args),
}));

// Mock permissions (must be before process.js import)
jest.mock('~/server/services/Files/permissions', () => ({
  filterFilesByAgentAccess: jest.fn((options) => Promise.resolve(options.files)),
}));

// Mock strategy functions
jest.mock('~/server/services/Files/strategies', () => ({
  getStrategyFunctions: jest.fn(),
}));

// Mock convertImage
jest.mock('~/server/services/Files/images/convert', () => ({
  convertImage: jest.fn(),
}));

// Mock determineFileType
jest.mock('~/server/utils', () => ({
  determineFileType: jest.fn(),
}));

const http = require('http');
const https = require('https');
const { createFile, getFiles } = require('~/models');
const { getStrategyFunctions } = require('~/server/services/Files/strategies');
const { convertImage } = require('~/server/services/Files/images/convert');
const { determineFileType } = require('~/server/utils');
const { logger } = require('@librechat/data-schemas');
const { codeServerHttpAgent, codeServerHttpsAgent } = require('@librechat/api');

const { processCodeOutput, getSessionInfo, readSandboxFile, primeFiles } = require('./process');

describe('Code Process', () => {
  const mockReq = {
    user: { id: 'user-123' },
    config: {
      fileConfig: {},
      fileStrategy: 'local',
      imageOutputType: 'webp',
    },
  };

  const baseParams = {
    req: mockReq,
    id: 'file-id-123',
    name: 'test-file.txt',
    apiKey: 'test-api-key',
    toolCallId: 'tool-call-123',
    conversationId: 'conv-123',
    messageId: 'msg-123',
    session_id: 'session-123',
  };

  beforeEach(() => {
    jest.clearAllMocks();
    // Default mock: atomic claim returns a new file record (no existing file)
    mockClaimCodeFile.mockResolvedValue({
      file_id: 'mock-uuid-1234',
      user: 'user-123',
    });
    getFiles.mockResolvedValue(null);
    createFile.mockResolvedValue({});
    getStrategyFunctions.mockReturnValue({
      saveBuffer: jest.fn().mockResolvedValue('/uploads/mock-file-path.txt'),
    });
    determineFileType.mockResolvedValue({ mime: 'text/plain' });
  });

  describe('atomic file claim (via processCodeOutput)', () => {
    it('should reuse file_id from existing record via atomic claim', async () => {
      mockClaimCodeFile.mockResolvedValue({
        file_id: 'existing-file-id',
        filename: 'test-file.txt',
        usage: 2,
        createdAt: '2024-01-01T00:00:00.000Z',
      });

      const smallBuffer = Buffer.alloc(100);
      mockAxios.mockResolvedValue({ data: smallBuffer });

      const result = await processCodeOutput(baseParams);

      expect(mockClaimCodeFile).toHaveBeenCalledWith({
        filename: 'test-file.txt',
        conversationId: 'conv-123',
        file_id: 'mock-uuid-1234',
        user: 'user-123',
      });

      expect(result.file_id).toBe('existing-file-id');
      expect(result.usage).toBe(3);
      expect(result.createdAt).toBe('2024-01-01T00:00:00.000Z');
    });

    it('should create new file when no existing file found', async () => {
      mockClaimCodeFile.mockResolvedValue({
        file_id: 'mock-uuid-1234',
        user: 'user-123',
      });

      const smallBuffer = Buffer.alloc(100);
      mockAxios.mockResolvedValue({ data: smallBuffer });

      const result = await processCodeOutput(baseParams);

      expect(result.file_id).toBe('mock-uuid-1234');
      expect(result.usage).toBe(1);
    });
  });

  describe('processCodeOutput', () => {
    describe('image file processing', () => {
      it('should process image files using convertImage', async () => {
        const imageParams = { ...baseParams, name: 'chart.png' };
        const imageBuffer = Buffer.alloc(500);
        mockAxios.mockResolvedValue({ data: imageBuffer });

        const convertedFile = {
          filepath: '/uploads/converted-image.webp',
          bytes: 400,
        };
        convertImage.mockResolvedValue(convertedFile);

        const result = await processCodeOutput(imageParams);

        expect(convertImage).toHaveBeenCalledWith(
          mockReq,
          imageBuffer,
          'high',
          'mock-uuid-1234.png',
        );
        expect(result.type).toBe('image/webp');
        expect(result.context).toBe(FileContext.execute_code);
        expect(result.filename).toBe('chart.png');
      });

      it('should update existing image file with cache-busted filepath', async () => {
        const imageParams = { ...baseParams, name: 'chart.png' };
        mockClaimCodeFile.mockResolvedValue({
          file_id: 'existing-img-id',
          usage: 1,
          createdAt: '2024-01-01T00:00:00.000Z',
        });

        const imageBuffer = Buffer.alloc(500);
        mockAxios.mockResolvedValue({ data: imageBuffer });
        convertImage.mockResolvedValue({ filepath: '/images/user-123/existing-img-id.webp' });

        const result = await processCodeOutput(imageParams);

        expect(convertImage).toHaveBeenCalledWith(
          mockReq,
          imageBuffer,
          'high',
          'existing-img-id.png',
        );
        expect(result.file_id).toBe('existing-img-id');
        expect(result.usage).toBe(2);
        expect(result.filepath).toMatch(/^\/images\/user-123\/existing-img-id\.webp\?v=\d+$/);
        expect(logger.debug).toHaveBeenCalledWith(
          expect.stringContaining('Updating existing file'),
        );
      });
    });

    describe('non-image file processing', () => {
      it('should process non-image files using saveBuffer', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        const mockSaveBuffer = jest.fn().mockResolvedValue('/uploads/saved-file.txt');
        getStrategyFunctions.mockReturnValue({ saveBuffer: mockSaveBuffer });
        determineFileType.mockResolvedValue({ mime: 'text/plain' });

        const result = await processCodeOutput(baseParams);

        expect(mockSaveBuffer).toHaveBeenCalledWith({
          userId: 'user-123',
          buffer: smallBuffer,
          fileName: 'mock-uuid-1234__test-file.txt',
          basePath: 'uploads',
        });
        expect(result.type).toBe('text/plain');
        expect(result.filepath).toBe('/uploads/saved-file.txt');
        expect(result.bytes).toBe(100);
      });

      it('preserves nested directory paths in the DB record while flattening the storage key', async () => {
        /* Regression test for the silent-data-loss path: when codeapi reports a
         * file with a nested name like "test_folder/test_file.txt", LibreChat
         * used to feed it through `sanitizeFilename` (basename-only), which
         * persisted "test_file.txt" to the DB and made the file un-locatable on
         * the next prime() (cat /mnt/data/test_folder/test_file.txt would
         * 404). The fix: keep the path on the DB record (so primeFiles can
         * place it back at the same nested location), but flatten it for the
         * storage key (saveBuffer strategies key by single component). */
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });
        const mockSaveBuffer = jest.fn().mockResolvedValue('/uploads/saved.txt');
        getStrategyFunctions.mockReturnValue({ saveBuffer: mockSaveBuffer });

        const result = await processCodeOutput({
          ...baseParams,
          name: 'test_folder/test_file.txt',
        });

        // Storage key flattens `/` to `__` so on-disk strategies don't
        // accidentally create real subdirectories under uploads/.
        expect(mockSaveBuffer).toHaveBeenCalledWith(
          expect.objectContaining({
            fileName: 'mock-uuid-1234__test_folder__test_file.txt',
          }),
        );
        // DB row keeps the nested path verbatim — that's what primeFiles
        // ships back to the sandbox on the next turn.
        expect(result.filename).toBe('test_folder/test_file.txt');
        // Claim is also keyed by the path-preserving name so the
        // (filename, conversationId) compound key stays consistent.
        expect(mockClaimCodeFile).toHaveBeenCalledWith(
          expect.objectContaining({ filename: 'test_folder/test_file.txt' }),
        );
      });

      it('passes a NAME_MAX-aware budget to flattenArtifactPath when composing the storage key', async () => {
        /* Codex review P1: per-segment caps on the path-preserving form
         * aren't enough — once the segments are joined with `__` for the
         * storage key, deeply-nested or moderately long paths can still
         * exceed filesystem NAME_MAX (255) and cause `ENAMETOOLONG` in
         * saveBuffer. processCodeOutput must pass a file_id-aware budget
         * to flattenArtifactPath so the cap holds end-to-end. The unit
         * tests in `packages/api/src/utils/files.spec.ts` cover the
         * truncation logic itself; this test covers the integration. */
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });
        const mockSaveBuffer = jest.fn().mockResolvedValue('/uploads/saved.bin');
        getStrategyFunctions.mockReturnValue({ saveBuffer: mockSaveBuffer });

        const flattenSpy = require('@librechat/api').flattenArtifactPath;
        flattenSpy.mockClear();

        await processCodeOutput({ ...baseParams, name: 'a/b/c.csv' });

        // The handler should call flattenArtifactPath with both the
        // safeName AND a budget = NAME_MAX (255) minus the prefix
        // (`${file_id}__`). file_id mock is `mock-uuid-1234` (14 chars),
        // so the budget should be 255 - 14 - 2 = 239.
        expect(flattenSpy).toHaveBeenCalledWith(expect.any(String), 239);
      });

      it('passes the basename (not the full nested path) to classifyCodeArtifact and extractCodeArtifactText', async () => {
        /* Codex review P2: with the path-preserving sanitizer, `safeName`
         * can be a nested string like `reports.v1/Makefile`. The
         * classifier reads `extensionOf` against the full string, which
         * sees `.v1/Makefile` after the dotted-dir's first dot and
         * misclassifies the file as `other` (so text extraction is
         * skipped). Pass `path.basename(safeName)` instead. */
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });
        const mockSaveBuffer = jest.fn().mockResolvedValue('/uploads/saved.txt');
        getStrategyFunctions.mockReturnValue({ saveBuffer: mockSaveBuffer });

        await processCodeOutput({
          ...baseParams,
          name: 'reports.v1/Makefile',
        });

        expect(mockClassifyCodeArtifact).toHaveBeenCalledWith('Makefile', expect.any(String));
        expect(mockExtractCodeArtifactText).toHaveBeenCalledWith(
          expect.any(Buffer),
          'Makefile',
          expect.any(String),
          expect.any(String),
        );
      });

      it('should detect MIME type from buffer', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });
        determineFileType.mockResolvedValue({ mime: 'application/pdf' });

        const result = await processCodeOutput({ ...baseParams, name: 'document.pdf' });

        expect(determineFileType).toHaveBeenCalledWith(smallBuffer, true);
        expect(result.type).toBe('application/pdf');
      });

      it('should fallback to application/octet-stream for unknown types', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });
        determineFileType.mockResolvedValue(null);

        const result = await processCodeOutput({ ...baseParams, name: 'unknown.xyz' });

        expect(result.type).toBe('application/octet-stream');
      });
    });

    describe('inline text extraction', () => {
      it('should populate text on the file when extractor returns content', async () => {
        const buffer = Buffer.from('hello world\n', 'utf-8');
        mockAxios.mockResolvedValue({ data: buffer });
        determineFileType.mockResolvedValue({ mime: 'text/plain' });
        mockClassifyCodeArtifact.mockReturnValueOnce('utf8-text');
        mockExtractCodeArtifactText.mockResolvedValueOnce('hello world\n');

        const result = await processCodeOutput({ ...baseParams, name: 'note.txt' });

        expect(mockClassifyCodeArtifact).toHaveBeenCalledWith('note.txt', 'text/plain');
        expect(mockExtractCodeArtifactText).toHaveBeenCalledWith(
          buffer,
          'note.txt',
          'text/plain',
          'utf8-text',
        );
        expect(result.text).toBe('hello world\n');
        expect(createFile).toHaveBeenCalledWith(
          expect.objectContaining({ text: 'hello world\n' }),
          true,
        );
      });

      it('should set text to null when extractor returns null so updates clear stale values', async () => {
        const buffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: buffer });
        determineFileType.mockResolvedValue({ mime: 'application/octet-stream' });
        mockClassifyCodeArtifact.mockReturnValueOnce('other');
        mockExtractCodeArtifactText.mockResolvedValueOnce(null);

        const result = await processCodeOutput({ ...baseParams, name: 'archive.zip' });

        expect(result.text).toBeNull();
        const createCall = createFile.mock.calls[0][0];
        expect(createCall.text).toBeNull();
      });

      it('should overwrite a previously-stored text value when re-emitting a now-binary file', async () => {
        // Same filename + conversationId already has a stored text value;
        // claimCodeFile returns the existing record (isUpdate path).
        mockClaimCodeFile.mockResolvedValueOnce({
          file_id: 'existing-id',
          filename: 'output.bin',
          usage: 1,
          createdAt: '2024-01-01T00:00:00.000Z',
        });
        const binaryBuffer = Buffer.from([0x00, 0xff, 0x00, 0xff]);
        mockAxios.mockResolvedValue({ data: binaryBuffer });
        determineFileType.mockResolvedValue({ mime: 'application/octet-stream' });
        mockClassifyCodeArtifact.mockReturnValueOnce('other');
        mockExtractCodeArtifactText.mockResolvedValueOnce(null);

        await processCodeOutput({ ...baseParams, name: 'output.bin' });

        // null (not omitted) so $set clears any prior `text` value.
        const createCall = createFile.mock.calls[0][0];
        expect(createCall).toHaveProperty('text', null);
      });

      it('should not invoke text extraction for image files', async () => {
        const imageBuffer = Buffer.alloc(500);
        mockAxios.mockResolvedValue({ data: imageBuffer });
        convertImage.mockResolvedValue({ filepath: '/uploads/x.webp', bytes: 400 });

        await processCodeOutput({ ...baseParams, name: 'chart.png' });

        expect(mockClassifyCodeArtifact).not.toHaveBeenCalled();
        expect(mockExtractCodeArtifactText).not.toHaveBeenCalled();
      });
    });

    describe('file size limit enforcement', () => {
      it('should fallback to download URL when file exceeds size limit', async () => {
        // Set a small file size limit for this test
        fileSizeLimitConfig.value = 1000; // 1KB limit

        const largeBuffer = Buffer.alloc(5000); // 5KB - exceeds 1KB limit
        mockAxios.mockResolvedValue({ data: largeBuffer });

        const result = await processCodeOutput(baseParams);

        expect(logger.warn).toHaveBeenCalledWith(expect.stringContaining('exceeds size limit'));
        expect(result.filepath).toContain('/api/files/code/download/session-123/file-id-123');
        expect(result.expiresAt).toBeDefined();
        // Should not call createFile for oversized files (fallback path)
        expect(createFile).not.toHaveBeenCalled();

        // Reset to default for other tests
        fileSizeLimitConfig.value = 20 * 1024 * 1024;
      });
    });

    describe('fallback behavior', () => {
      it('should fallback to download URL when saveBuffer is not available', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });
        getStrategyFunctions.mockReturnValue({ saveBuffer: null });

        const result = await processCodeOutput(baseParams);

        expect(logger.warn).toHaveBeenCalledWith(
          expect.stringContaining('saveBuffer not available'),
        );
        expect(result.filepath).toContain('/api/files/code/download/');
        expect(result.filename).toBe('test-file.txt');
      });

      it('should fallback to download URL on axios error', async () => {
        mockAxios.mockRejectedValue(new Error('Network error'));

        const result = await processCodeOutput(baseParams);

        expect(result.filepath).toContain('/api/files/code/download/session-123/file-id-123');
        expect(result.conversationId).toBe('conv-123');
        expect(result.messageId).toBe('msg-123');
        expect(result.toolCallId).toBe('tool-call-123');
      });
    });

    describe('usage counter increment', () => {
      it('should set usage to 1 for new files', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        const result = await processCodeOutput(baseParams);

        expect(result.usage).toBe(1);
      });

      it('should increment usage for existing files', async () => {
        mockClaimCodeFile.mockResolvedValue({
          file_id: 'existing-id',
          usage: 5,
          createdAt: '2024-01-01',
        });
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        const result = await processCodeOutput(baseParams);

        expect(result.usage).toBe(6);
      });

      it('should handle existing file with undefined usage', async () => {
        mockClaimCodeFile.mockResolvedValue({
          file_id: 'existing-id',
          createdAt: '2024-01-01',
        });
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        const result = await processCodeOutput(baseParams);

        expect(result.usage).toBe(1);
      });
    });

    describe('metadata and file properties', () => {
      it('should include fileIdentifier in metadata', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        const result = await processCodeOutput(baseParams);

        expect(result.metadata).toEqual({
          fileIdentifier: 'session-123/file-id-123',
        });
      });

      it('should set correct context for code-generated files', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        const result = await processCodeOutput(baseParams);

        expect(result.context).toBe(FileContext.execute_code);
      });

      it('should include toolCallId and messageId in result', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        const result = await processCodeOutput(baseParams);

        expect(result.toolCallId).toBe('tool-call-123');
        expect(result.messageId).toBe('msg-123');
      });

      it('should call createFile with upsert enabled', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        await processCodeOutput(baseParams);

        expect(createFile).toHaveBeenCalledWith(
          expect.objectContaining({
            file_id: 'mock-uuid-1234',
            context: FileContext.execute_code,
          }),
          true, // upsert flag
        );
      });
    });

    describe('persistedMessageId (regression for cross-turn priming)', () => {
      /**
       * `getCodeGeneratedFiles` filters by `messageId IN <thread message ids>`
       * to scope files to the current branch. If `processCodeOutput` overwrote
       * the file's `messageId` with the current run's id on every update, a
       * file re-touched by a later turn (e.g. a failed read attempt that
       * re-shells the same filename) would lose its link to the assistant
       * message that originally produced it. Subsequent turns then can't find
       * it via `getCodeGeneratedFiles`, the priming chain has nothing to seed,
       * and the model thinks its own prior-turn artifact disappeared.
       *
       * Contract:
       * - On UPDATE (claimCodeFile returned an existing record): the persisted
       *   `messageId` is `claimed.messageId` (preserved). Falls back to the
       *   current run's `messageId` when the existing record predates the
       *   `messageId` field (legacy data).
       * - On CREATE (new file): the persisted `messageId` is the current run's.
       * - The runtime return value ALWAYS uses the current run's `messageId`
       *   via `Object.assign(file, { messageId, toolCallId })` so the artifact
       *   attaches to the correct tool_call in the live response.
       */

      /**
       * `processCodeOutput` mutates the file object after `createFile` returns
       * (`Object.assign(file, { messageId, toolCallId })`) so the runtime
       * caller sees the live messageId on the response. Reading
       * `createFile.mock.calls[0][0]` directly would therefore reflect the
       * post-mutation state because JS captures by reference. To assert
       * what was actually PERSISTED, snapshot the args at call time.
       */
      function snapshotCreateFileArgs() {
        const snapshots = [];
        createFile.mockImplementation(async (file) => {
          snapshots.push({ ...file });
          return {};
        });
        return snapshots;
      }

      it('preserves the original messageId in the persisted record on UPDATE', async () => {
        mockClaimCodeFile.mockResolvedValue({
          file_id: 'existing-id',
          filename: 'sentinel.txt',
          usage: 1,
          createdAt: '2024-01-01T00:00:00.000Z',
          messageId: 'turn-1-original-msg',
        });
        const persisted = snapshotCreateFileArgs();

        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        await processCodeOutput({
          ...baseParams,
          name: 'sentinel.txt',
          messageId: 'turn-2-current-run-msg',
        });

        expect(persisted[0].messageId).toBe('turn-1-original-msg');
      });

      it('falls back to current run messageId on UPDATE when claimed.messageId is undefined (legacy record)', async () => {
        // Legacy record predates the persistedMessageId tracking.
        mockClaimCodeFile.mockResolvedValue({
          file_id: 'legacy-id',
          filename: 'legacy.txt',
          usage: 1,
          createdAt: '2024-01-01T00:00:00.000Z',
          // messageId intentionally absent
        });
        const persisted = snapshotCreateFileArgs();

        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        await processCodeOutput({
          ...baseParams,
          name: 'legacy.txt',
          messageId: 'turn-N-current-run-msg',
        });

        expect(persisted[0].messageId).toBe('turn-N-current-run-msg');
      });

      it('uses the current run messageId on CREATE (no claimed record)', async () => {
        mockClaimCodeFile.mockResolvedValue({
          file_id: 'mock-uuid-1234',
          user: 'user-123',
        });
        const persisted = snapshotCreateFileArgs();

        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        await processCodeOutput({
          ...baseParams,
          messageId: 'turn-1-create-msg',
        });

        expect(persisted[0].messageId).toBe('turn-1-create-msg');
      });

      it('returns the CURRENT run messageId in the runtime response even on UPDATE (artifact attribution)', async () => {
        // The persisted DB record keeps the original messageId, but the
        // returned object surfaces the live messageId so the artifact lands
        // on the correct tool_call in this run's response.
        mockClaimCodeFile.mockResolvedValue({
          file_id: 'existing-id',
          filename: 'sentinel.txt',
          usage: 1,
          createdAt: '2024-01-01T00:00:00.000Z',
          messageId: 'turn-1-original-msg',
        });
        const persisted = snapshotCreateFileArgs();

        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        const result = await processCodeOutput({
          ...baseParams,
          name: 'sentinel.txt',
          messageId: 'turn-2-current-run-msg',
        });

        // DB preserves original
        expect(persisted[0].messageId).toBe('turn-1-original-msg');
        // Runtime return surfaces the live (current) messageId
        expect(result.messageId).toBe('turn-2-current-run-msg');
      });

      it('preserves the original messageId on UPDATE for image files too', async () => {
        // Same contract as text files; the image branch builds its own file
        // record and would silently regress if the ternary diverged there.
        mockClaimCodeFile.mockResolvedValue({
          file_id: 'existing-img',
          filename: 'chart.png',
          usage: 1,
          createdAt: '2024-01-01T00:00:00.000Z',
          messageId: 'turn-1-image-msg',
        });
        const persisted = snapshotCreateFileArgs();

        const imageBuffer = Buffer.alloc(500);
        mockAxios.mockResolvedValue({ data: imageBuffer });
        convertImage.mockResolvedValue({
          filepath: '/uploads/chart.webp',
          bytes: 400,
        });

        await processCodeOutput({
          ...baseParams,
          name: 'chart.png',
          messageId: 'turn-2-current-img-msg',
        });

        expect(persisted[0].messageId).toBe('turn-1-image-msg');
      });
    });

    describe('socket pool isolation', () => {
      it('should pass dedicated keepAlive:false agents to axios for processCodeOutput', async () => {
        const smallBuffer = Buffer.alloc(100);
        mockAxios.mockResolvedValue({ data: smallBuffer });

        await processCodeOutput(baseParams);

        const callConfig = mockAxios.mock.calls[0][0];
        expect(callConfig.httpAgent).toBe(codeServerHttpAgent);
        expect(callConfig.httpsAgent).toBe(codeServerHttpsAgent);
        expect(callConfig.httpAgent).toBeInstanceOf(http.Agent);
        expect(callConfig.httpsAgent).toBeInstanceOf(https.Agent);
        expect(callConfig.httpAgent.keepAlive).toBe(false);
        expect(callConfig.httpsAgent.keepAlive).toBe(false);
      });

      it('should pass dedicated keepAlive:false agents to axios for getSessionInfo', async () => {
        mockAxios.mockResolvedValue({
          data: [{ name: 'sess/fid', lastModified: new Date().toISOString() }],
        });

        await getSessionInfo('sess/fid', 'api-key');

        const callConfig = mockAxios.mock.calls[0][0];
        expect(callConfig.httpAgent).toBe(codeServerHttpAgent);
        expect(callConfig.httpsAgent).toBe(codeServerHttpsAgent);
        expect(callConfig.httpAgent.keepAlive).toBe(false);
        expect(callConfig.httpsAgent.keepAlive).toBe(false);
      });
    });
  });

  describe('readSandboxFile', () => {
    /**
     * `readSandboxFile` shells `cat <file_path>` through the sandbox
     * `/exec` endpoint. The `file_path` argument is model-controlled, so
     * the single-quote escaping is a security boundary — a regression
     * here would let a malicious filename break out of the `cat`
     * argument and inject arbitrary shell. Lock the contract in tests.
     */

    /** Pull the bash code that the helper would send to /exec, given
     * the file_path that the model supplied. */
    function execCodeFor(file_path) {
      mockAxios.mockResolvedValueOnce({ data: { stdout: '', stderr: '' } });
      return readSandboxFile({ file_path }).then(() => {
        const postData = mockAxios.mock.calls[0][0].data;
        return postData.code;
      });
    }

    describe('shell quoting (security boundary)', () => {
      it('wraps a plain filename in single quotes', async () => {
        const code = await execCodeFor('/mnt/data/sentinel.txt');
        expect(code).toBe(`cat '/mnt/data/sentinel.txt'`);
      });

      it("escapes a literal single-quote in the filename via the standard '\\'' sequence", async () => {
        // Adversarial filename: `quote'breakout.txt`. Naive
        // single-quoting would terminate the quoted string and
        // inject the trailing `breakout.txt'` as shell tokens.
        const code = await execCodeFor(`/mnt/data/quote'breakout.txt`);
        // Expected escape: end the string, escape a literal quote,
        // start a new string. POSIX-portable.
        expect(code).toBe(`cat '/mnt/data/quote'\\''breakout.txt'`);
      });

      it('does not interpret command substitution syntax inside the quoted argument', async () => {
        // `$(rm -rf /)` would expand if the path were unquoted or
        // double-quoted. Inside POSIX single-quotes it stays literal.
        const code = await execCodeFor('/mnt/data/$(rm -rf /).txt');
        expect(code).toBe(`cat '/mnt/data/$(rm -rf /).txt'`);
      });

      it('does not expand backtick command substitution inside the quoted argument', async () => {
        const code = await execCodeFor('/mnt/data/`whoami`.txt');
        expect(code).toBe(`cat '/mnt/data/\`whoami\`.txt'`);
      });

      it('keeps newlines literal inside the quoted argument', async () => {
        const code = await execCodeFor('/mnt/data/line1\nline2.txt');
        expect(code).toBe(`cat '/mnt/data/line1\nline2.txt'`);
      });

      it('keeps spaces and other shell metacharacters literal', async () => {
        const code = await execCodeFor('/mnt/data/file ; ls -la /etc/passwd');
        expect(code).toBe(`cat '/mnt/data/file ; ls -la /etc/passwd'`);
      });

      it('handles multiple consecutive single-quotes', async () => {
        const code = await execCodeFor(`a''b`);
        // Each `'` becomes the 4-char escape sequence.
        expect(code).toBe(`cat 'a'\\'''\\''b'`);
      });
    });
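A quoting function consistent with the expectations above fits in a few lines. This is an illustrative sketch only, not the production helper (whose name and location are not shown in this spec): each single quote becomes the 4-char sequence `'\''` — close the quoted string, emit an escaped literal quote, reopen the string.

```javascript
// Illustrative sketch of POSIX single-quote escaping. Inside single
// quotes, $(...), backticks, newlines, spaces, and `;` all stay literal,
// so only the quote character itself needs special handling.
function shellSingleQuote(filePath) {
  return `'${filePath.replace(/'/g, `'\\''`)}'`;
}

console.log(`cat ${shellSingleQuote(`/mnt/data/quote'breakout.txt`)}`);
// cat '/mnt/data/quote'\''breakout.txt'
```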

    describe('payload shape', () => {
      it('POSTs to /exec on the configured codeapi base URL with bash language', async () => {
        mockAxios.mockResolvedValueOnce({ data: { stdout: 'ok', stderr: '' } });

        await readSandboxFile({ file_path: '/mnt/data/x.txt' });

        const call = mockAxios.mock.calls[0][0];
        expect(call.method).toBe('post');
        expect(call.url).toBe('https://code-api.example.com/exec');
        expect(call.data.lang).toBe('bash');
      });

      it('omits session_id and files when not provided', async () => {
        mockAxios.mockResolvedValueOnce({ data: { stdout: '', stderr: '' } });

        await readSandboxFile({ file_path: '/mnt/data/x.txt' });

        const data = mockAxios.mock.calls[0][0].data;
        expect(data).not.toHaveProperty('session_id');
        expect(data).not.toHaveProperty('files');
      });

      it('forwards session_id when provided so the read lands in the seeded sandbox', async () => {
        mockAxios.mockResolvedValueOnce({ data: { stdout: '', stderr: '' } });

        await readSandboxFile({
          file_path: '/mnt/data/x.txt',
          session_id: 'sess-XYZ',
        });

        expect(mockAxios.mock.calls[0][0].data.session_id).toBe('sess-XYZ');
      });

      it('forwards files (non-empty array) so prior-turn artifacts are mounted', async () => {
        mockAxios.mockResolvedValueOnce({ data: { stdout: '', stderr: '' } });

        const files = [{ id: 'f1', name: 'sentinel.txt', session_id: 'sess-XYZ' }];
        await readSandboxFile({
          file_path: '/mnt/data/sentinel.txt',
          session_id: 'sess-XYZ',
          files,
        });

        expect(mockAxios.mock.calls[0][0].data.files).toEqual(files);
      });

      it('omits files when an empty array is provided (cleaner payload)', async () => {
        mockAxios.mockResolvedValueOnce({ data: { stdout: '', stderr: '' } });

        await readSandboxFile({
          file_path: '/mnt/data/x.txt',
          session_id: 'sess-XYZ',
          files: [],
        });

        expect(mockAxios.mock.calls[0][0].data).not.toHaveProperty('files');
      });

      it('uses dedicated keepAlive:false agents (matches processCodeOutput pool isolation)', async () => {
        mockAxios.mockResolvedValueOnce({ data: { stdout: '', stderr: '' } });

        await readSandboxFile({ file_path: '/mnt/data/x.txt' });

        const call = mockAxios.mock.calls[0][0];
        expect(call.httpAgent).toBe(codeServerHttpAgent);
        expect(call.httpsAgent).toBe(codeServerHttpsAgent);
      });
    });

    describe('response handling', () => {
      it('returns { content: stdout } on success', async () => {
        mockAxios.mockResolvedValueOnce({
          data: { stdout: 'sentinel-XYZ-1234\n', stderr: '' },
        });

        const result = await readSandboxFile({ file_path: '/mnt/data/sentinel.txt' });

        expect(result).toEqual({ content: 'sentinel-XYZ-1234\n' });
      });

      it('returns null when getCodeBaseURL is not configured', async () => {
        const { getCodeBaseURL } = require('@librechat/agents');
        getCodeBaseURL.mockReturnValueOnce('');

        const result = await readSandboxFile({ file_path: '/mnt/data/x.txt' });

        expect(result).toBeNull();
        expect(mockAxios).not.toHaveBeenCalled();
      });

      it('returns null when stdout is missing entirely (no content to surface)', async () => {
        // stdout absent + no stderr = nothing to report; caller turns this
        // into a model-visible "Failed to read" message.
        mockAxios.mockResolvedValueOnce({ data: { stderr: '' } });

        const result = await readSandboxFile({ file_path: '/mnt/data/x.txt' });

        expect(result).toBeNull();
      });

      it('throws when the command writes to stderr with no stdout (exposes the error to the caller)', async () => {
        mockAxios.mockResolvedValueOnce({
          data: { stdout: '', stderr: 'cat: /mnt/data/missing.txt: No such file or directory\n' },
        });

        await expect(readSandboxFile({ file_path: '/mnt/data/missing.txt' })).rejects.toThrow(
          'cat: /mnt/data/missing.txt: No such file or directory',
        );
      });

      it('returns stdout even when stderr is also present (stdout wins on partial-success)', async () => {
        // Some `cat` builds emit warnings on stderr while still producing
        // stdout (e.g. unusual line endings). Surface the content.
        mockAxios.mockResolvedValueOnce({
          data: { stdout: 'partial', stderr: 'warning: ...' },
        });

        const result = await readSandboxFile({ file_path: '/mnt/data/x.txt' });

        expect(result).toEqual({ content: 'partial' });
      });

      it('rethrows axios transport errors after logging via logAxiosError', async () => {
        const { logAxiosError } = require('@librechat/api');
        const transportError = Object.assign(new Error('connect ECONNREFUSED'), {
          code: 'ECONNREFUSED',
        });
        mockAxios.mockRejectedValueOnce(transportError);

        await expect(readSandboxFile({ file_path: '/mnt/data/x.txt' })).rejects.toBe(
          transportError,
        );
        expect(logAxiosError).toHaveBeenCalledWith(
          expect.objectContaining({
            message: expect.stringContaining('/mnt/data/x.txt'),
            error: transportError,
          }),
        );
      });
    });

    describe('timeout', () => {
      it('uses the same 15s timeout as processCodeOutput (consistent code-server SLA)', async () => {
        mockAxios.mockResolvedValueOnce({ data: { stdout: '', stderr: '' } });

        await readSandboxFile({ file_path: '/mnt/data/x.txt' });

        expect(mockAxios.mock.calls[0][0].timeout).toBe(15000);
      });
    });
  });

  describe('primeFiles reupload pushes FRESH sandbox ids (Pass-N review P2)', () => {
    /**
     * Regression: when a primed code file is missing/expired in the
     * sandbox (`getSessionInfo` returns null), `primeFiles` re-uploads
     * the file via `handleFileUpload` and persists the new
     * `fileIdentifier`. Before the fix, the in-memory `files[]` array
     * (now consumed by `buildInitialToolSessions` to seed
     * `Graph.sessions`) still received the STALE `(session_id, id)`
     * parsed from the original `fileIdentifier` at the top of the
     * loop. The DB record was correct but the seed referenced a
     * sandbox object that no longer existed — the first tool call
     * 404'd trying to mount it until the next turn re-read metadata.
     *
     * Fix: parse the FRESH `fileIdentifier` returned by upload and
     * push those ids into both the dedupe Map and the seed list.
     */

    const { getStrategyFunctions } = require('~/server/services/Files/strategies');
    const { updateFile, getFiles } = require('~/models');
    const { filterFilesByAgentAccess } = require('~/server/services/Files/permissions');

    /**
     * Mock the full strategy pair. `primeFiles` calls
     * `getStrategyFunctions(file.source)` for the download stream and
     * `getStrategyFunctions(FileSources.execute_code)` for the code-env
     * upload — both go through the same factory in production.
     */
    function setupReuploadMocks(newFileIdentifier) {
      const handleFileUpload = jest.fn().mockResolvedValue(newFileIdentifier);
      const getDownloadStream = jest.fn().mockResolvedValue('mock-stream');
      getStrategyFunctions.mockImplementation((source) => {
        if (source === 'execute_code') return { handleFileUpload };
        return { getDownloadStream };
      });
      updateFile.mockResolvedValue({});
      filterFilesByAgentAccess.mockImplementation(({ files }) => Promise.resolve(files));
      // getSessionInfo is mocked at module level via mockAxios; return null
      // to force the reupload path. Each `getSessionInfo` call hits axios.
      mockAxios.mockResolvedValue({ data: null });
      return { handleFileUpload, getDownloadStream };
    }
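The `<session_id>/<id>` split that the fix performs on the fresh identifier can be sketched as follows. The function name here is hypothetical, invented for illustration; the real parsing lives inside `primeFiles`:

```javascript
// Hypothetical helper (name invented for illustration): split a
// `fileIdentifier` of the form `<session_id>/<id>` into the pair that
// seeds the sandbox session. Splitting on the FIRST slash keeps any
// further path components inside `id`.
function parseFileIdentifier(fileIdentifier) {
  const idx = fileIdentifier.indexOf('/');
  return {
    session_id: fileIdentifier.slice(0, idx),
    id: fileIdentifier.slice(idx + 1),
  };
}

console.log(parseFileIdentifier('NEW_SESSION/NEW_ID'));
// { session_id: 'NEW_SESSION', id: 'NEW_ID' }
```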

    it('seed receives FRESH session_id + id parsed off the new fileIdentifier on reupload', async () => {
      const dbFile = {
        file_id: 'librechat-file-id',
        filename: 'sentinel.txt',
        filepath: '/uploads/sentinel.txt',
        source: 'local',
        context: 'execute_code',
        metadata: {
          /* Stale sandbox ref — this is what `getSessionInfo` will 404 on. */
          fileIdentifier: 'OLD_SESSION/OLD_ID',
        },
      };
      getFiles.mockResolvedValue([dbFile]);

      setupReuploadMocks('NEW_SESSION/NEW_ID');

      const result = await primeFiles({
        req: { user: { id: 'user-123', role: 'USER' } },
        tool_resources: {
          execute_code: { file_ids: ['librechat-file-id'], files: [] },
        },
        agentId: 'agent-id',
      });

      // The seed list (consumed by buildInitialToolSessions) MUST carry
      // the post-reupload ids — not the stale pre-reupload ones.
      expect(result.files).toEqual([
        { id: 'NEW_ID', session_id: 'NEW_SESSION', name: 'sentinel.txt' },
      ]);
    });

    it('persists the new fileIdentifier on the DB record (existing behavior, regression-locked)', async () => {
      const dbFile = {
        file_id: 'librechat-file-id',
        filename: 'sentinel.txt',
        filepath: '/uploads/sentinel.txt',
        source: 'local',
        context: 'execute_code',
        metadata: { fileIdentifier: 'OLD_SESSION/OLD_ID' },
      };
      getFiles.mockResolvedValue([dbFile]);

      setupReuploadMocks('NEW_SESSION/NEW_ID');

      await primeFiles({
        req: { user: { id: 'user-123', role: 'USER' } },
        tool_resources: {
          execute_code: { file_ids: ['librechat-file-id'], files: [] },
        },
        agentId: 'agent-id',
      });

      expect(updateFile).toHaveBeenCalledWith(
        expect.objectContaining({
          file_id: 'librechat-file-id',
          metadata: expect.objectContaining({ fileIdentifier: 'NEW_SESSION/NEW_ID' }),
        }),
      );
    });
  });
});