mirror of
https://github.com/danny-avila/LibreChat.git
synced 2026-05-13 16:07:30 +00:00
* 🧱 refactor: typed CodeEnvRef + kind discriminator + tenant-aware sandbox cache

Final cutover for the LibreChat ↔ codeapi sandbox file identity. Replaces the magic string `${session_id}/${file_id}?entity_id=...` with a typed, discriminated `CodeEnvRef`. Pre-release lockstep deploy with codeapi #1455 and agents #148; no legacy aliases retained.

## Final shape

```ts
type CodeEnvRef =
  | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'user'; id: string; storage_session_id: string; file_id: string };
```

`kind` drives codeapi's sessionKey: `<tenant>:<kind>:<id>[✌️<version>]` for shared kinds, `<tenant>:user:<userId>` for user-private (auth context provides `userId`). `version` is statically required for `kind: 'skill'` and forbidden otherwise via the discriminated union — the constraint holds at compile time for every consumer, not just codeapi's runtime validator. `id` is sessionKey-meaningful for `'skill'` / `'agent'`; informational only for `'user'` (codeapi resolves user identity from auth context).

## What changed

- `packages/data-provider/src/codeEnvRef.ts` — discriminated union + `CODE_ENV_KINDS` const-tuple keeps the runtime list and TS union locked together.
- Schemas: `metadata.codeEnvRef` and `SkillFile.codeEnvRef` enums tightened to `['skill', 'agent', 'user']`.
- `primeSkillFiles` writes `kind: 'skill'`, `id: skill._id`, `version: skill.version`. The cache-hit path reads `codeEnvRef` directly. Bumping `skill.version` on edit naturally invalidates the prior cache entry under the new sessionKey.
- `processCodeOutput` writes `kind: 'user'`, `id: req.user.id`. The output bucket is always user-scoped, regardless of which skill the execution invoked. A new regression test pins the asymmetry.
- `primeFiles` reupload preserves `kind`/`id`/`version?` from the existing ref so a skill-cache-miss reupload doesn't silently demote to the user bucket.
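The compile-time constraint can be mirrored at runtime. A minimal sketch of the kind/version rules the union encodes — `CODE_ENV_KINDS` appears in the PR, but `isValidCodeEnvRef` and its exact shape are illustrative, not code from this changeset:

```typescript
// Closed kind set mirrored at runtime; the TS union and this tuple stay locked together.
const CODE_ENV_KINDS = ['skill', 'agent', 'user'] as const;
type CodeEnvKind = (typeof CODE_ENV_KINDS)[number];

function isValidCodeEnvRef(ref: { kind?: string; version?: unknown }): boolean {
  if (!CODE_ENV_KINDS.includes(ref.kind as CodeEnvKind)) {
    return false;
  }
  const hasVersion = typeof ref.version === 'number';
  // `version` is required for 'skill' and forbidden for 'agent'/'user',
  // matching what the discriminated union enforces at compile time.
  return ref.kind === 'skill' ? hasVersion : !hasVersion;
}
```

Note that `'system'` would be rejected here too, consistent with dropping it from the enum.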
- `crud.js` upload functions (`uploadCodeEnvFile` / `batchUploadCodeEnvFiles`) thread `kind`/`id`/`version?` to the multipart form (codeapi #1455 option α). Without these on the wire, codeapi falls back to user bucketing and skill-cache invalidation never fires. Client-side validation mirrors codeapi's validator.
- `Files/process.js` — chat attachments use `kind: 'user'`; agent setup files use `kind: 'agent'`.
- Drops `entity_id` everywhere (struct, schema sub-docs, write paths, upload form fields). Drops `'system'` from the kind enum (no emitter ever existed).

## Test plan

- [x] `cd packages/data-provider && npx jest src/codeEnvRef.spec` — 4 / 4
- [x] `cd packages/data-schemas && npx jest` — 1447 / 1447
- [x] `cd packages/api && npx jest src/agents` — 81 / 81 in skillFiles + handlers + resources
- [x] `cd api && npx jest server/services/Files server/controllers/agents` — 436 / 436
- [x] `cd api && npx jest server/services/Files/Code` — 98 / 98 (incl. new "outputs are user-scoped regardless of which skill the execution invoked" regression and "reupload forwards kind/id/version from existing ref")
- [x] `npx tsc --noEmit -p packages/data-{provider,schemas}/tsconfig.json && npx tsc --noEmit -p packages/api/tsconfig.json` — clean (only pre-existing unrelated dev errors in storage/balance, untouched here)

## Deploy notes

- **24h cache-miss burst** on first deploy. Inputs (skill caches re-prime under the new sessionKey shape) and outputs (any pre-Phase C skill-output cached files become unreadable). Bounded by codeapi's 24h TTL.
- **Lockstep with codeapi #1455 and agents #148.** Either repo can land first since there are no aliases to drain, but the three deploys must overlap within the same maintenance window.
- **`@librechat/agents` bump to `3.1.79-dev.0`** required after agents #148 lands and is published.
## What this enables

Auth bridge work (JWT-based tenant/user identity between LC and codeapi) — codeapi now derives the sessionKey purely from `req.codeApiAuthContext.{ tenantId, userId }`, so the next chapter is replacing the header-asserted user identity with a verified-claim path.

* 🩹 fix: persist execute_code uploads under codeEnvRef metadata key

Codex review P1 (chatgpt-codex-connector). `Files/process.js` was storing the upload result under `metadata.fileIdentifier` even though:

- `uploadCodeEnvFile` now returns `{ storage_session_id, file_id }`, not the legacy magic string.
- The post-cutover schema (`File.metadata.codeEnvRef`) only declares `codeEnvRef` — mongoose strict mode silently strips unknown keys.
- All readers (`primeFiles`, `getCodeFilesByIds`, `categorizeFileForToolResources`, controller filtering) check `metadata.codeEnvRef`.

Net effect of the bug: chat-attached and agent-setup execute_code files would lose their sandbox reference on save, and `primeFiles` would skip them on subsequent code-execution turns — the file blob would still be available locally but never re-mounted in the sandbox.

Fix: construct the full `CodeEnvRef` (`{ kind, id, storage_session_id, file_id }`) at the write site and persist it under `metadata.codeEnvRef`. `BaseClient`'s "is this a code-env file" presence check accepts the new shape alongside the legacy `fileIdentifier` for back-compat with any pre-cutover records still in the database. Mirrors the same change in `processAttachments.spec.ts` (which re-implements the BaseClient logic for testability).

New regression tests in `process.spec.js` cover three cases:

- chat attachments (`messageAttachment=true`) → `kind: 'user'`
- agent setup (`messageAttachment=false`) → `kind: 'agent'`
- legacy `fileIdentifier` key is NOT persisted (would be schema-stripped)

* 🩹 fix: read storage_session_id on primed file refs (Codex P1)

Codex review (chatgpt-codex-connector).
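The write-site fix can be sketched under the stated branch rule (chat attachment → `'user'`, agent setup → `'agent'`); `buildAttachmentMetadata` and its parameter names are hypothetical, not the PR's actual function:

```typescript
// Hypothetical write-site sketch: persist the full ref under metadata.codeEnvRef
// (the only key the strict schema declares), never the legacy fileIdentifier key.
type AttachmentRef =
  | { kind: 'user'; id: string; storage_session_id: string; file_id: string }
  | { kind: 'agent'; id: string; storage_session_id: string; file_id: string };

function buildAttachmentMetadata(
  uploadResult: { storage_session_id: string; file_id: string },
  opts: { messageAttachment: boolean; userId: string; agentId: string },
): { codeEnvRef: AttachmentRef } {
  const codeEnvRef: AttachmentRef = opts.messageAttachment
    ? { kind: 'user', id: opts.userId, ...uploadResult }
    : { kind: 'agent', id: opts.agentId, ...uploadResult };
  // Under mongoose strict mode an undeclared key (e.g. `fileIdentifier`)
  // would be silently stripped on save, so only codeEnvRef is emitted.
  return { codeEnvRef };
}
```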
After Phase B's per-file `session_id` → `storage_session_id` rename, `primeFiles` emits the new field — but `seedCodeFilesIntoSessions` was still reading `files[0].session_id` for the representative session and `f.session_id` for the dedupe key. In runs with only primed attachments (no skill seed), `representativeSessionId` was `undefined`, the function returned the unchanged map, and `seedCodeFilesIntoSessions` silently dropped the entire batch. The first `execute_code` call then started without `_injected_files` and the agent couldn't see prior-turn artifacts.

Fix:

- `codeFilesSession.ts`: read `f.storage_session_id` for both the dedupe key and the representative session id. JSDoc updated to match the new field name.
- `callbacks.js`: the two output-file persistence paths read `file.session_id` to pass to `processCodeOutput` — switch to `file.storage_session_id`. The original comment explicitly says this should be the STORAGE session, which is exactly the field Phase B renamed.
- `codeFilesSession.spec.ts`: fixture builder uses `storage_session_id` and `kind: 'user'` to match the post-cutover `CodeEnvFile` shape.

Lockstep coordination: this matches the post-bump shape of `@librechat/agents` 3.1.79+. CI tsc errors against the currently-pinned 3.1.78 are expected and resolve when the dep bumps in this PR before merge.

* 📦 chore: Bump `@librechat/agents` to version 3.1.80-dev.0 in package-lock and package.json files

* 🪪 fix: thread kind/id/version through codeapi /download URLs (Phase C α)

Symmetric fix for the upload-side wire change in 537725a. Codeapi's `sessionAuth` middleware now requires `kind`/`id`/`version?` on every download/freshness URL — without them it 400s with "kind must be one of: skill, agent, user" before serving the file.

Three sites construct codeapi-side URLs that go through `sessionAuth`:

- `processCodeOutput` (`Files/Code/process.js`): `/download/<sess>/<id>` for freshly-generated sandbox outputs.
  Always `kind: 'user'` + `id: req.user.id` — code-output files are always user-private, regardless of which skill the run invoked.
- `getSessionInfo` (`Files/Code/process.js`): `/sessions/<sess>/objects/<id>` for the 23h freshness check. Pulls kind/id/version straight off the `codeEnvRef` already in scope — skill files stay skill-bucketed, user files stay user-bucketed.
- `/code/download/:session_id/:fileId` LC route (`routes/files/files.js`): proxies to codeapi for manual downloads. Only code-output files travel this route, so `kind: 'user'` + `id: req.user.id`.

The `getCodeOutputDownloadStream` helper in `crud.js` now takes an `identity` param, validated by a `buildCodeEnvDownloadQuery` helper that mirrors `appendCodeEnvFileIdentity`'s shape rules: kind required from the closed `{skill, agent, user}` set, version required for 'skill' and forbidden otherwise. Bad callers fail fast on the client instead of round-tripping a 400.

Also cleans up two log-noise sources reported alongside the 400:

- `logAxiosError` in `packages/api/src/utils/axios.ts` was dumping `error.response.data` raw. With `responseType: 'arraybuffer'` that's a `Buffer` (~4 chars per byte after JSON serialization); with `responseType: 'stream'` it's a `Readable` whose internal state serializes the entire ring buffer + socket. New `renderResponseData` decodes small buffers as UTF-8 (truncated past 2KB) and stubs streams as `'[stream]'`. Diagnostics stay useful, and log lines stop being megabytes.
- `/code/download` route's catch was a bare `logger.error('...', error)`, bypassing the redactor. Switched to `logAxiosError` so it benefits from the same buffer/stream handling.

Tests updated to match the new contract:

- crud.spec: `getCodeOutputDownloadStream` fixtures pass `userIdentity`; new cases cover skill identity (with version), bad-kind rejection, and skill-without-version rejection.
- process.spec: `getSessionInfo` test passes a full `codeEnvRef` object.
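A minimal sketch of the buffer/stream redaction idea: the `renderResponseData` name and the 2KB threshold come from the commit text, while everything else (the exact truncation marker, detecting streams via a `pipe` method) is assumed:

```typescript
// Keep axios error diagnostics readable: decode small response bodies,
// truncate large ones, and never serialize a live stream into a log line.
const MAX_DECODED_BYTES = 2048; // 2KB threshold stated in the commit

function renderResponseData(data: unknown): unknown {
  if (Buffer.isBuffer(data)) {
    const text = data.subarray(0, MAX_DECODED_BYTES).toString('utf8');
    return data.length > MAX_DECODED_BYTES ? `${text}… [truncated]` : text;
  }
  // Anything pipe-able would serialize its internal ring buffer + socket
  // state under JSON.stringify, so stub it instead.
  if (data !== null && typeof data === 'object' && typeof (data as { pipe?: unknown }).pipe === 'function') {
    return '[stream]';
  }
  return data;
}
```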
* ♻️ refactor: extract codeEnv identity helpers into packages/api

Per the project convention that new backend code lives in TypeScript under `packages/api`, moves `appendCodeEnvFileIdentity` and `buildCodeEnvDownloadQuery` from `api/server/services/Files/Code/crud.js` into a new `packages/api/src/files/code/identity.ts` module. Both helpers are pure validators that mirror codeapi's `parseUploadSessionKeyInput` server-side rules (closed kind set, `version` required for `'skill'` and forbidden otherwise) — they deserve TS support and a dedicated spec rather than living as JSDoc-typed helpers in the legacy `/api` workspace.

The new module:

- Exports a `CodeEnvIdentity` interface using the `librechat-data-provider` `CodeEnvKind` discriminated union.
- Adds 13 unit tests in `identity.spec.ts` covering the validation matrix (skill+version, agent, user, and every rejection path) plus URL encoding for the download query.
- Is re-exported from `packages/api/src/files/code/index.ts` alongside `classify`, `extract`, and `form`.

Consumer updates:

- `api/server/services/Files/Code/crud.js`: drops the local helpers and imports them from `@librechat/api`. Net -64 lines.
- `api/server/services/Files/Code/process.js`: same.
- Test mocks for `@librechat/api` in three spec files now stub the helpers' validation behavior locally rather than pulling them through `requireActual` (which would drag in provider-config init-time side effects). The package's `exports` field only surfaces the root barrel, so leaf imports aren't reachable from the legacy `/api` test setup.

No runtime behavior change. Identity validation rules and emitted form/query shapes are byte-for-byte identical pre/post.

* 🪪 fix: emit resource_id alongside id on _injected_files (skill 403 fix)

Companion to the codeapi #1455 fix and agents 3.1.80-dev.1 — the wire shape for shared-kind files now requires `resource_id` distinct from the storage `id`.
Without this LC change, codeapi's sessionKey re-derivation on every shared-kind /exec rejects with 403 session_key_mismatch:

    cached:  legacy:skill:69dcf561...✌️59 (signed at upload, skill _id)
    derived: legacy:skill:ysPwEURuPk-...✌️59 (storage nanoid)

Emit sites updated:

- `primeInvokedSkills` cache-hit path: `resource_id: ref.id` (the persisted skill `_id` from `codeEnvRef.id`); `id: ref.file_id` unchanged (storage uuid).
- `primeInvokedSkills` fresh-upload path: `resource_id: skill._id.toString()` on every primed file (the `allPrimedFiles` builder type now carries the field).
- `processCodeOutput`'s `pushFile` (Code/process.js): `resource_id: ref.id` — for `kind: 'user'` this is informational (codeapi derives the sessionKey from auth context) but emitted for shape uniformity with shared kinds.

Bumps `@librechat/agents` to `^3.1.80-dev.1` (the version that ships the matching `CodeEnvFile.resource_id` field).

## Test plan

- [x] `cd packages/api && npx jest src/agents` — 67 / 67 pass (skillFiles fixtures updated to assert `resource_id` on the emitted CodeSessionContext.files).
- [x] `cd api && npx jest server/services/Files server/controllers/agents` — 445 / 445 pass (process.spec fixtures updated for the reupload + cache-hit emission).
- [x] `npx tsc --noEmit -p packages/api/tsconfig.json` — clean.

* fix(skill-tool-call): carry resource_id through primeSkillFiles → artifact

Codeapi was 400ing every /exec following a `handle_skill` tool call with `resource_id is invalid` (`type: 'undefined'`). Both code paths in `primeSkillFiles` (cache-hit + fresh-upload) returned files without `resource_id`/`kind`/`version`, and the artifact in `handlers.ts` forwarded the stripped shape into `tc.codeSessionContext.files` → `_injected_files`. `primeInvokedSkills` (the NL-detected loader) had already been fixed end-to-end; this commit aligns the tool-invoked path with the same contract: `resource_id` = `skill._id.toString()`, `kind: 'skill'`, `version` = the skill's monotonic counter.
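The id / resource_id split can be illustrated with a hypothetical mapper for a cache-hit skill file — the field semantics come from the commit text, but the function and interface themselves are illustrative:

```typescript
// Wire shape for a shared-kind injected file: codeapi re-derives the
// sessionKey from resource_id, so it must carry the persisted skill _id,
// while id stays the storage uuid.
interface InjectedFile {
  id: string;          // storage uuid — unchanged by this fix
  resource_id: string; // persisted skill _id — what the sessionKey was signed with
  kind: 'skill';
  version: number;
  storage_session_id: string;
}

function toInjectedFile(ref: {
  kind: 'skill';
  id: string;
  file_id: string;
  storage_session_id: string;
  version: number;
}): InjectedFile {
  return {
    id: ref.file_id,
    resource_id: ref.id,
    kind: ref.kind,
    version: ref.version,
    storage_session_id: ref.storage_session_id,
  };
}
```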
Tests added to `skillFiles.spec.ts` lock the contract on `primeSkillFiles` directly so future refactors can't silently drop the resource identity again.

* fix(handlers.spec): align session_id → storage_session_id rename + kind discriminator

Pre-existing TS errors against the post-rename `CodeEnvFile` shape: the test file still used `session_id` on per-file objects (renamed to `storage_session_id` in agents Phase B/C) and was missing the `kind` discriminator the discriminated union requires. Both the inputs and the matching `expect.toEqual(...)` mirrors were updated together so the runtime equality check still holds. Lines 723-732 stay as-is — they sit behind `as unknown as ToolCallRequest` and TS already skipped them.

* chore: fix `@librechat/agents`, correct version to 3.1.80-dev.0 in package.json files

* chore: bump `@librechat/agents` to version 3.1.80-dev.1 in package.json and package-lock.json

* chore: bump `@librechat/agents` to version 3.1.80-dev.2

* feat(observability): trace file priming chain from primeCodeFiles to _injected_files

Diagnosing the user-upload "files=[] on first /exec" bug requires seeing where in the LC chain a file ref disappears. Prior to this patch, the chain (primeCodeFiles → primedCodeFiles → initialSessions → CodeSessionContext → _injected_files) was opaque end-to-end:

- primeCodeFiles silently dropped files without `metadata.codeEnvRef`
- reuploadFile catches all errors and continues with no signal
- the handlers.ts handoff to codeapi never logged what it was sending

After this patch, a single grep on `[primeCodeFiles]` plus `[code-env:inject]` shows the full per-file path:

    [primeCodeFiles] in: file_ids=N resourceFiles=M
    [primeCodeFiles] file=<id> path=skip reason=no-codeenvref filename=...
    [primeCodeFiles] file=<id> path=cache-hit-by-session storage_session_id=...
    [primeCodeFiles] file=<id> path=reupload reason=no-uploadtime ...
    [primeCodeFiles] file=<id> path=reupload reason=stale ...
    [primeCodeFiles] file=<id> path=reupload-success oldSession=... newSession=... newFileId=...
    [primeCodeFiles] file=<id> path=reupload-failed session=...
    [primeCodeFiles] file=<id> path=fresh-active storage_session_id=...
    [primeCodeFiles] out: returned=N skippedNoRef=M reuploadFailures=K
    [code-env:inject] tool=<name> files=N missingResourceId=K (debug)
    [code-env:inject] M/N files missing resource_id ... (warn)
    [code-env:inject] tool=<name> _injected_files=0 ... (warn)

The boundary log warns when LC sends zero injected files on a code-execution tool call — that's the user's actual symptom showing up on the LC side instead of having to correlate against codeapi's `Request received { files: [] }`.

The tag was chosen as `[code-env:inject]` rather than `[handoff:exec]` to avoid collision with the app-level "handoff" semantic (subagent handoff workflow).

Structural cleanup in primeFiles: replaced the `if (ref) { ... }` nesting with an early `if (!ref) continue` so the per-path instrumentation hooks land at top-level scope instead of indented inside a conditional. Behavior unchanged; pushFile / reuploadFile identical.

Spec fixtures (handlers.spec.ts, codeFilesSession.spec.ts) updated to include `resource_id` on `CodeEnvFile` literals — required by the post-3.1.80-dev.2 type now installed.
## Test plan

- [x] `cd packages/api && npx jest src/agents/handlers.spec.ts src/agents/codeFilesSession.spec.ts src/agents/skillFiles.spec.ts` — 69/69 pass
- [x] `cd api && npx jest server/services/Files/Code/process.spec.js` — 84/84 pass
- [x] `npx tsc --noEmit -p packages/api` — clean
- [x] `npx eslint` on all four touched files — clean

* chore: add CONSOLE_JSON_STRING_LENGTH to .env.example for JSON log string length configuration

* fix(files): align codeapi upload filename with LC's sanitized DB filename

User-attached files for code execution were uploading to codeapi under `file.originalname` (the raw upload filename, which may contain spaces / special chars) while LC's DB record stored the sanitized form (`sanitizeFilename(file.originalname)`, underscores). Codeapi preserves whatever filename the upload sent, so the sandbox saw `/mnt/data/<originalname>` while LC's `primeFiles` toolContext text + `_injected_files.name` referenced `file.filename` (sanitized).

Visible failure: the agent gets a system prompt saying

    /mnt/data/librechat_code_api_-_active_customer_-_2025-11-05.xlsx

…tries that path, hits `FileNotFoundError`, then notices the sandbox's actual `Available files` line says

    /mnt/data/librechat code api - active customer - 2025-11-05.xlsx

…and retries with spaces, which succeeds. This wastes a tool call per upload and leaks raw filenames into model context.

Fix: sanitize once and use the sanitized form in both the codeapi upload AND the LC DB record. Sandbox path = LC toolContext text = in-memory ref name. No drift.

The reupload path (`Code/process.js` line 867 `filename: file.filename`) already uses the sanitized DB name, so it stays consistent with the fresh-upload path after this change.

## Test plan

- [x] `cd api && npx jest server/services/Files/process` — 32/32 pass
- [x] `npx eslint` on the touched file — clean

* chore: bump `@librechat/agents` to version 3.1.80-dev.3 in package.json and package-lock.json
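The sanitize-once fix can be sketched as follows; the regex stands in for LC's actual `sanitizeFilename` (assumed here to collapse disallowed characters to underscores), and `prepareCodeUpload` is a hypothetical helper, not the PR's code:

```typescript
// Stand-in for LC's sanitizeFilename: collapse anything outside [A-Za-z0-9_.-]
// into a single underscore (assumed behavior, matching the underscored paths
// quoted in the commit message).
function sanitizeFilename(name: string): string {
  return name.replace(/[^\w.-]+/g, '_');
}

// Sanitize exactly once; the same value flows to both the codeapi multipart
// form and the DB record, so the sandbox path and the toolContext text can
// never drift apart.
function prepareCodeUpload(originalname: string): { uploadFilename: string; dbFilename: string } {
  const sanitized = sanitizeFilename(originalname);
  return { uploadFilename: sanitized, dbFilename: sanitized };
}
```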
1189 lines
38 KiB
JavaScript
const fs = require('fs');
const path = require('path');
const mime = require('mime');
const { v4 } = require('uuid');
const {
  isUUID,
  megabyte,
  FileContext,
  FileSources,
  imageExtRegex,
  EModelEndpoint,
  EToolResources,
  mergeFileConfig,
  AgentCapabilities,
  checkOpenAIStorage,
  removeNullishValues,
  isAssistantsEndpoint,
  getEndpointFileConfig,
  documentParserMimeTypes,
} = require('librechat-data-provider');
const { logger } = require('@librechat/data-schemas');
const {
  sanitizeFilename,
  parseText,
  processAudioFile,
  getStorageMetadata,
} = require('@librechat/api');
const {
  convertImage,
  resizeAndConvert,
  resizeImageBuffer,
} = require('~/server/services/Files/images');
const { addResourceFileId, deleteResourceFileId } = require('~/server/controllers/assistants/v2');
const { getOpenAIClient } = require('~/server/controllers/assistants/helpers');
const { loadAuthValues } = require('~/server/services/Tools/credentials');
const { getFileStrategy } = require('~/server/utils/getFileStrategy');
const { checkCapability } = require('~/server/services/Config');
const { LB_QueueAsyncCall } = require('~/server/utils/queue');
const { getStrategyFunctions } = require('./strategies');
const { determineFileType } = require('~/server/utils');
const { STTService } = require('./Audio/STTService');
const db = require('~/models');

/**
 * Creates a modular file upload wrapper that ensures filename sanitization
 * across all storage strategies. This prevents storage-specific implementations
 * from having to handle sanitization individually.
 *
 * @param {Function} uploadFunction - The storage strategy's upload function
 * @returns {Function} - Wrapped upload function with sanitization
 */
const createSanitizedUploadWrapper = (uploadFunction) => {
  return async (params) => {
    const { req, file, file_id, ...restParams } = params;

    // Create a modified file object with sanitized original name
    // This ensures consistent filename handling across all storage strategies
    const sanitizedFile = {
      ...file,
      originalname: sanitizeFilename(file.originalname),
    };

    return uploadFunction({ req, file: sanitizedFile, file_id, ...restParams });
  };
};

/**
 * Enqueues the delete operation to the leaky bucket queue if necessary, or adds it directly to promises.
 *
 * @param {object} params - The passed parameters.
 * @param {ServerRequest} params.req - The express request object.
 * @param {MongoFile} params.file - The file object to delete.
 * @param {Function} params.deleteFile - The delete file function.
 * @param {Promise[]} params.promises - The array of promises to await.
 * @param {string[]} params.resolvedFileIds - The array of resolved file IDs.
 * @param {OpenAI | undefined} [params.openai] - If an OpenAI file, the initialized OpenAI client.
 */
function enqueueDeleteOperation({ req, file, deleteFile, promises, resolvedFileIds, openai }) {
  if (checkOpenAIStorage(file.source)) {
    // Enqueue to leaky bucket
    promises.push(
      new Promise((resolve, reject) => {
        LB_QueueAsyncCall(
          () => deleteFile(req, file, openai),
          [],
          (err, result) => {
            if (err) {
              logger.error('Error deleting file from OpenAI source', err);
              reject(err);
            } else {
              resolvedFileIds.push(file.file_id);
              resolve(result);
            }
          },
        );
      }),
    );
  } else {
    // Add directly to promises
    promises.push(
      deleteFile(req, file)
        .then(() => resolvedFileIds.push(file.file_id))
        .catch((err) => {
          logger.error('Error deleting file', err);
          return Promise.reject(err);
        }),
    );
  }
}

// TODO: refactor as currently only image files can be deleted this way
// as other filetypes will not reside in public path
/**
 * Deletes a list of files from the server filesystem and the database.
 *
 * @param {Object} params - The params object.
 * @param {MongoFile[]} params.files - The file objects to delete.
 * @param {ServerRequest} params.req - The express request object.
 * @param {DeleteFilesBody} params.req.body - The request body.
 * @param {string} [params.req.body.agent_id] - The agent ID if the file is associated with an agent.
 * @param {string} [params.req.body.assistant_id] - The assistant ID if the file is associated with an assistant.
 * @param {string} [params.req.body.tool_resource] - The tool resource if the assistant file is associated with a tool resource.
 *
 * @returns {Promise<void>}
 */
const processDeleteRequest = async ({ req, files }) => {
  const appConfig = req.config;
  const resolvedFileIds = [];
  const deletionMethods = {};
  const promises = [];

  /** @type {Record<string, OpenAI | undefined>} */
  const client = { [FileSources.openai]: undefined, [FileSources.azure]: undefined };
  const initializeClients = async () => {
    if (appConfig.endpoints?.[EModelEndpoint.assistants]) {
      const openAIClient = await getOpenAIClient({
        req,
        overrideEndpoint: EModelEndpoint.assistants,
      });
      client[FileSources.openai] = openAIClient.openai;
    }

    if (!appConfig.endpoints?.[EModelEndpoint.azureOpenAI]?.assistants) {
      return;
    }

    const azureClient = await getOpenAIClient({
      req,
      overrideEndpoint: EModelEndpoint.azureAssistants,
    });
    client[FileSources.azure] = azureClient.openai;
  };

  if (req.body.assistant_id !== undefined) {
    await initializeClients();
  }

  const agentFiles = [];

  for (const file of files) {
    const source = file.source ?? FileSources.local;
    if (req.body.agent_id && req.body.tool_resource) {
      agentFiles.push({
        tool_resource: req.body.tool_resource,
        file_id: file.file_id,
      });
    }

    if (source === FileSources.text) {
      resolvedFileIds.push(file.file_id);
      continue;
    }

    if (checkOpenAIStorage(source) && !client[source]) {
      await initializeClients();
    }

    const openai = client[source];

    if (req.body.assistant_id && req.body.tool_resource) {
      promises.push(
        deleteResourceFileId({
          req,
          openai,
          file_id: file.file_id,
          assistant_id: req.body.assistant_id,
          tool_resource: req.body.tool_resource,
        }),
      );
    } else if (req.body.assistant_id) {
      promises.push(openai.beta.assistants.files.del(req.body.assistant_id, file.file_id));
    }

    if (deletionMethods[source]) {
      enqueueDeleteOperation({
        req,
        file,
        deleteFile: deletionMethods[source],
        promises,
        resolvedFileIds,
        openai,
      });
      continue;
    }

    const { deleteFile } = getStrategyFunctions(source);
    if (!deleteFile) {
      throw new Error(`Delete function not implemented for ${source}`);
    }

    deletionMethods[source] = deleteFile;
    enqueueDeleteOperation({ req, file, deleteFile, promises, resolvedFileIds, openai });
  }

  if (agentFiles.length > 0) {
    promises.push(
      db.removeAgentResourceFiles({
        agent_id: req.body.agent_id,
        files: agentFiles,
      }),
    );
  }

  await Promise.allSettled(promises);
  await db.deleteFiles(resolvedFileIds);

  if (resolvedFileIds.length > 0) {
    try {
      await db.removeAgentResourceFilesFromAllAgents({ file_ids: resolvedFileIds });
    } catch (error) {
      logger.error('Error cleaning up orphaned agent file references', error);
    }
  }
};

/**
 * Processes a file URL using a specified file handling strategy. This function accepts a strategy name,
 * fetches the corresponding file processing functions (for saving and retrieving file URLs), and then
 * executes these functions in sequence. It first saves the file using the provided URL and then retrieves
 * the URL of the saved file. If any error occurs during this process, it logs the error and throws an
 * exception with an appropriate message.
 *
 * @param {Object} params - The parameters object.
 * @param {FileSources} params.fileStrategy - The file handling strategy to use.
 *   Must be a value from the `FileSources` enum, which defines different file
 *   handling strategies (like saving to Firebase, local storage, etc.).
 * @param {string} params.userId - The user's unique identifier. Used for creating user-specific paths or
 *   references in the file handling process.
 * @param {string} params.URL - The URL of the file to be processed.
 * @param {string} params.fileName - The name that will be used to save the file (including extension).
 * @param {string} params.basePath - The base path or directory where the file will be saved or retrieved from.
 * @param {FileContext} params.context - The context of the file (e.g., 'avatar', 'image_generation', etc.)
 * @param {string} [params.tenantId] - Optional tenant identifier for tenant-prefixed storage paths.
 * @returns {Promise<MongoFile>} A promise that resolves to the DB representation (MongoFile)
 *   of the processed file. It throws an error if the file processing fails at any stage.
 */
const processFileURL = async ({
  fileStrategy,
  userId,
  URL,
  fileName,
  basePath,
  context,
  tenantId,
}) => {
  const { saveURL, getFileURL } = getStrategyFunctions(fileStrategy);
  try {
    const savedFile = await saveURL({ userId, URL, fileName, basePath, tenantId });
    if (!savedFile) {
      throw new Error(`Strategy "${fileStrategy}" did not save "${fileName}"`);
    }

    const {
      bytes = 0,
      type = '',
      dimensions = {},
    } = typeof savedFile === 'string' ? {} : savedFile;
    const fallbackFileName =
      fileStrategy === FileSources.local || fileStrategy === FileSources.firebase
        ? `${userId}/${fileName}`
        : fileName;
    const filepath =
      typeof savedFile === 'string'
        ? savedFile
        : (savedFile.filepath ??
          (await getFileURL({ userId, fileName: fallbackFileName, basePath, tenantId })));
    if (!filepath) {
      throw new Error(`Strategy "${fileStrategy}" did not return a file URL for "${fileName}"`);
    }
    const storageMetadata = getStorageMetadata({
      filepath,
      source: fileStrategy,
      storageKey: typeof savedFile === 'string' ? undefined : savedFile.storageKey,
      storageRegion: typeof savedFile === 'string' ? undefined : savedFile.storageRegion,
    });

    return await db.createFile(
      {
        user: userId,
        file_id: v4(),
        bytes,
        filepath,
        ...storageMetadata,
        filename: fileName,
        source: fileStrategy,
        type,
        context,
        tenantId,
        width: dimensions.width,
        height: dimensions.height,
      },
      true,
    );
  } catch (error) {
    logger.error(`Error while processing the image with ${fileStrategy}:`, error);
    throw new Error(`Failed to process the image with ${fileStrategy}. ${error.message}`);
  }
};

/**
 * Applies the current strategy for image uploads.
 * Saves file metadata to the database with an expiry TTL.
 *
 * @param {Object} params - The parameters object.
 * @param {ServerRequest} params.req - The Express request object.
 * @param {Express.Response} [params.res] - The Express response object.
 * @param {ImageMetadata} params.metadata - Additional metadata for the file.
 * @param {boolean} params.returnFile - Whether to return the file metadata or return response as normal.
 * @returns {Promise<void>}
 */
const processImageFile = async ({ req, res, metadata, returnFile = false }) => {
  const { file } = req;
  const appConfig = req.config;
  const source = getFileStrategy(appConfig, { isImage: true });
  const { handleImageUpload } = getStrategyFunctions(source);
  const { file_id, temp_file_id, endpoint } = metadata;

  const { filepath, bytes, width, height, storageKey, storageRegion } = await handleImageUpload({
    req,
    file,
    file_id,
    endpoint,
  });
  const storageMetadata = getStorageMetadata({ filepath, source, storageKey, storageRegion });

  const result = await db.createFile(
    {
      user: req.user.id,
      file_id,
      temp_file_id,
      bytes,
      filepath,
      ...storageMetadata,
      filename: file.originalname,
      context: FileContext.message_attachment,
      source,
      type: `image/${appConfig.imageOutputType}`,
      width,
      height,
      tenantId: req.user.tenantId,
    },
    true,
  );

  if (returnFile) {
    return result;
  }
  res.status(200).json({ message: 'File uploaded and processed successfully', ...result });
};

/**
 * Applies the current strategy for image uploads and
 * returns minimal file metadata, without saving to the database.
 *
 * @param {Object} params - The parameters object.
 * @param {ServerRequest} params.req - The Express request object.
 * @param {FileContext} params.context - The context of the file (e.g., 'avatar', 'image_generation', etc.)
 * @param {boolean} [params.resize=true] - Whether to resize and convert the image to target format. Default is `true`.
 * @param {{ buffer: Buffer, width: number, height: number, bytes: number, filename: string, type: string, file_id: string }} [params.metadata] - Required metadata for the file if resize is false.
 * @returns {Promise<{ filepath: string, filename: string, source: string, type: string }>}
 */
const uploadImageBuffer = async ({ req, context, metadata = {}, resize = true }) => {
  const appConfig = req.config;
  const source = getFileStrategy(appConfig, { isImage: true });
  const { saveBuffer } = getStrategyFunctions(source);
  let { buffer, width, height, bytes, filename, file_id, type } = metadata;
  if (resize) {
    file_id = v4();
    type = `image/${appConfig.imageOutputType}`;
    ({ buffer, width, height, bytes } = await resizeAndConvert({
      inputBuffer: buffer,
      desiredFormat: appConfig.imageOutputType,
    }));
    filename = `${path.basename(req.file.originalname, path.extname(req.file.originalname))}.${
      appConfig.imageOutputType
    }`;
  }
  const fileName = `${file_id}-${filename}`;
  const filepath = await saveBuffer({
    userId: req.user.id,
    fileName,
    buffer,
    tenantId: req.user.tenantId,
  });
  const storageMetadata = getStorageMetadata({ filepath, source });
  return await db.createFile(
    {
      user: req.user.id,
      file_id,
      bytes,
      filepath,
      ...storageMetadata,
      filename,
      context,
      source,
      type,
      width,
      height,
      tenantId: req.user.tenantId,
    },
    true,
  );
};

/**
 * Applies the current strategy for file uploads.
 * Saves file metadata to the database with an expiry TTL.
 * Files must be deleted from the server filesystem manually.
 *
 * @param {Object} params - The parameters object.
 * @param {ServerRequest} params.req - The Express request object.
 * @param {Express.Response} params.res - The Express response object.
 * @param {FileMetadata} params.metadata - Additional metadata for the file.
 * @returns {Promise<void>}
 */
const processFileUpload = async ({ req, res, metadata }) => {
  const appConfig = req.config;
  const isAssistantUpload = isAssistantsEndpoint(metadata.endpoint);
  const assistantSource =
    metadata.endpoint === EModelEndpoint.azureAssistants ? FileSources.azure : FileSources.openai;
  // Use the configured file strategy for regular file uploads (not vectordb)
  const source = isAssistantUpload ? assistantSource : appConfig.fileStrategy;
  const { handleFileUpload } = getStrategyFunctions(source);
  const { file_id, temp_file_id = null } = metadata;

  /** @type {OpenAI | undefined} */
  let openai;
  if (checkOpenAIStorage(source)) {
    ({ openai } = await getOpenAIClient({ req }));
  }

  const { file } = req;
  const sanitizedUploadFn = createSanitizedUploadWrapper(handleFileUpload);
  const {
    id,
    bytes,
    filename,
    filepath: _filepath,
    storageKey: _storageKey,
    storageRegion: _storageRegion,
    embedded,
    height,
    width,
  } = await sanitizedUploadFn({
    req,
    file,
    file_id,
    openai,
  });

  if (isAssistantUpload && !metadata.message_file && !metadata.tool_resource) {
    await openai.beta.assistants.files.create(metadata.assistant_id, {
      file_id: id,
    });
  } else if (isAssistantUpload && !metadata.message_file) {
    await addResourceFileId({
      req,
      openai,
      file_id: id,
      assistant_id: metadata.assistant_id,
      tool_resource: metadata.tool_resource,
    });
  }

  let filepath = isAssistantUpload ? `${openai.baseURL}/files/${id}` : _filepath;
  let storageMetadata = getStorageMetadata({
    filepath,
    source,
    storageKey: _storageKey,
    storageRegion: _storageRegion,
  });
  if (isAssistantUpload && file.mimetype.startsWith('image')) {
    const result = await processImageFile({
      req,
      file,
      metadata: { file_id: v4() },
      returnFile: true,
    });
    filepath = result.filepath;
    storageMetadata = getStorageMetadata({
      filepath,
      source: result.source,
      storageKey: result.storageKey,
      storageRegion: result.storageRegion,
    });
  }

  const result = await db.createFile(
    {
      user: req.user.id,
      file_id: id ?? file_id,
      temp_file_id,
      bytes,
      filepath,
      ...storageMetadata,
      filename: filename ?? sanitizeFilename(file.originalname),
      context: isAssistantUpload ? FileContext.assistants : FileContext.message_attachment,
      model: isAssistantUpload ? req.body.model : undefined,
      type: file.mimetype,
      embedded,
      source,
      height,
      width,
      tenantId: req.user.tenantId,
    },
    true,
  );
  res.status(200).json({ message: 'File uploaded and processed successfully', ...result });
};

/**
 * Applies the current strategy for agent file uploads.
 * Saves file metadata to the database with an expiry TTL.
 * Files must be deleted from the server filesystem manually.
 *
 * @param {Object} params - The parameters object.
 * @param {ServerRequest} params.req - The Express request object.
 * @param {Express.Response} params.res - The Express response object.
 * @param {FileMetadata} params.metadata - Additional metadata for the file.
 * @returns {Promise<void>}
 */
const processAgentFileUpload = async ({ req, res, metadata }) => {
  const { file } = req;
  const appConfig = req.config;
  const { agent_id, tool_resource, file_id, temp_file_id = null } = metadata;

  const messageAttachment = !!metadata.message_file;

  if (agent_id && !tool_resource && !messageAttachment) {
    throw new Error('No tool resource provided for agent file upload');
  }

  if (tool_resource === EToolResources.file_search && file.mimetype.startsWith('image')) {
    throw new Error('Image uploads are not supported for file search tool resources');
  }

  if (!messageAttachment && !agent_id) {
    throw new Error('No agent ID provided for agent file upload');
  }

  const isImage = file.mimetype.startsWith('image');
  let fileInfoMetadata;
  const entity_id = messageAttachment === true ? undefined : agent_id;
  const basePath = mime.getType(file.originalname)?.startsWith('image') ? 'images' : 'uploads';
  if (tool_resource === EToolResources.execute_code) {
    const isCodeEnabled = await checkCapability(req, AgentCapabilities.execute_code);
    if (!isCodeEnabled) {
      throw new Error('Code execution is not enabled for Agents');
    }
    const { handleFileUpload: uploadCodeEnvFile } = getStrategyFunctions(FileSources.execute_code);
    const stream = fs.createReadStream(file.path);
    /* Resource identity for codeapi's sessionKey:
     * - chat attachments (messageAttachment=true): `kind: 'user'`; codeapi
     *   buckets under `<tenant>:user:<authContext.userId>` regardless of `id`.
     * - agent setup files (messageAttachment=false): `kind: 'agent'`, shared
     *   per agent identity. `id` carries the agent id. */
    const codeKind = messageAttachment === true ? 'user' : 'agent';
    const codeId = messageAttachment === true ? req.user.id : agent_id;
    /* Upload under the same sanitized filename LC stores in its DB
     * (`fileInfo.filename` below uses `sanitizeFilename(originalname)`).
     * Codeapi/file_server use this as the on-disk name in the sandbox
     * — `/mnt/data/<filename>` — and `primeFiles`'s `toolContext` text
     * + `_injected_files.name` both reference `file.filename`. Sending
     * the unsanitized `file.originalname` here makes the sandbox path
     * (with spaces / special chars) drift from what LC tells the model
     * is available, causing FileNotFoundError on the first reference. */
    const sandboxFilename = sanitizeFilename(file.originalname);
    const uploaded = await uploadCodeEnvFile({
      req,
      stream,
      filename: sandboxFilename,
      kind: codeKind,
      id: codeId,
    });
    /* Persist under the structured `codeEnvRef` shape — the only key the
     * post-cutover schema (`metadata.codeEnvRef`) and downstream readers
     * (`primeFiles`, `getCodeFilesByIds`, `categorizeFileForToolResources`,
     * controller filtering) accept. Storing under the legacy
     * `fileIdentifier` key would be silently dropped by mongoose strict
     * mode, and the file would lose its sandbox reference on subsequent
     * priming turns. */
    fileInfoMetadata = {
      codeEnvRef: {
        kind: codeKind,
        id: codeId,
        storage_session_id: uploaded.storage_session_id,
        file_id: uploaded.file_id,
      },
    };
  } else if (tool_resource === EToolResources.file_search) {
    const isFileSearchEnabled = await checkCapability(req, AgentCapabilities.file_search);
    if (!isFileSearchEnabled) {
      throw new Error('File search is not enabled for Agents');
    }
    // Note: file search processing continues to the dual storage logic below
  } else if (tool_resource === EToolResources.context) {
    const { file_id, temp_file_id = null } = metadata;

    /**
     * @param {object} params
     * @param {string} params.text
     * @param {number} params.bytes
     * @param {string} [params.filepath]
     * @param {string} [params.type='text/plain']
     * @return {Promise<void>}
     */
    const createTextFile = async ({ text, bytes, filepath, type = 'text/plain' }) => {
      const textBytes = Buffer.byteLength(text, 'utf8');
      if (textBytes > 15 * megabyte) {
        throw new Error(
          `Extracted text from "${file.originalname}" exceeds the 15MB storage limit (${Math.round(textBytes / megabyte)}MB). Try a shorter document.`,
        );
      }
      const fileInfo = removeNullishValues({
        text,
        bytes,
        file_id,
        temp_file_id,
        user: req.user.id,
        type,
        filepath: filepath ?? file.path,
        source: FileSources.text,
        filename: file.originalname,
        model: messageAttachment ? undefined : req.body.model,
        context: messageAttachment ? FileContext.message_attachment : FileContext.agents,
        tenantId: req.user.tenantId,
      });

      if (!messageAttachment && tool_resource) {
        await db.addAgentResourceFile({
          file_id,
          agent_id,
          tool_resource,
          updatingUserId: req?.user?.id,
        });
      }
      const result = await db.createFile(fileInfo, true);
      return res
        .status(200)
        .json({ message: 'Agent file uploaded and processed successfully', ...result });
    };

    const fileConfig = mergeFileConfig(appConfig.fileConfig);

    const shouldUseConfiguredOCR =
      appConfig?.ocr != null &&
      fileConfig.checkType(file.mimetype, fileConfig.ocr?.supportedMimeTypes || []);

    const shouldUseDocumentParser =
      !shouldUseConfiguredOCR && documentParserMimeTypes.some((regex) => regex.test(file.mimetype));

    const shouldUseOCR = shouldUseConfiguredOCR || shouldUseDocumentParser;

    const resolveDocumentText = async () => {
      if (shouldUseConfiguredOCR) {
        try {
          const ocrStrategy = appConfig?.ocr?.strategy ?? FileSources.document_parser;
          const { handleFileUpload } = getStrategyFunctions(ocrStrategy);
          return await handleFileUpload({ req, file, loadAuthValues });
        } catch (err) {
          logger.error(
            `[processAgentFileUpload] Configured OCR failed for "${file.originalname}", falling back to document_parser:`,
            err,
          );
        }
      }
      try {
        const { handleFileUpload } = getStrategyFunctions(FileSources.document_parser);
        return await handleFileUpload({ req, file, loadAuthValues });
      } catch (err) {
        logger.error(
          `[processAgentFileUpload] Document parser failed for "${file.originalname}":`,
          err,
        );
      }
    };

    if (shouldUseConfiguredOCR && !(await checkCapability(req, AgentCapabilities.ocr))) {
      throw new Error('OCR capability is not enabled for Agents');
    }

    if (shouldUseOCR) {
      const ocrResult = await resolveDocumentText();
      if (ocrResult) {
        const { text, bytes, filepath: ocrFileURL } = ocrResult;
        return await createTextFile({ text, bytes, filepath: ocrFileURL });
      }
      throw new Error(
        `Unable to extract text from "${file.originalname}". The document may be image-based and require an OCR service to process.`,
      );
    }

    const shouldUseSTT = fileConfig.checkType(
      file.mimetype,
      fileConfig.stt?.supportedMimeTypes || [],
    );

    if (shouldUseSTT) {
      const sttService = await STTService.getInstance();
      const { text, bytes } = await processAudioFile({ req, file, sttService });
      return await createTextFile({ text, bytes });
    }

    const shouldUseText = fileConfig.checkType(
      file.mimetype,
      fileConfig.text?.supportedMimeTypes || [],
    );

    if (!shouldUseText) {
      throw new Error(`File type ${file.mimetype} is not supported for text parsing.`);
    }

    const { text, bytes } = await parseText({ req, file, file_id });
    return await createTextFile({ text, bytes, type: file.mimetype });
  }

  // Dual storage pattern for RAG files: storage + vector DB
  let storageResult, embeddingResult;
  const isImageFile = file.mimetype.startsWith('image');
  const source = getFileStrategy(appConfig, { isImage: isImageFile });

  if (tool_resource === EToolResources.file_search) {
    // FIRST: upload to storage for a permanent backup (S3/local/etc.)
    const { handleFileUpload } = getStrategyFunctions(source);
    const sanitizedUploadFn = createSanitizedUploadWrapper(handleFileUpload);
    storageResult = await sanitizedUploadFn({
      req,
      file,
      file_id,
      basePath,
      entity_id,
    });

    // SECOND: upload to the vector DB
    const { uploadVectors } = require('./VectorDB/crud');

    embeddingResult = await uploadVectors({
      req,
      file,
      file_id,
      entity_id,
    });

    // Vector status is stored at the root level, so no metadata is needed
    fileInfoMetadata = {};
  } else {
    // Standard single storage for non-RAG files
    const { handleFileUpload } = getStrategyFunctions(source);
    const sanitizedUploadFn = createSanitizedUploadWrapper(handleFileUpload);
    storageResult = await sanitizedUploadFn({
      req,
      file,
      file_id,
      basePath,
      entity_id,
    });
  }

  let {
    bytes,
    filename,
    filepath: _filepath,
    storageKey: _storageKey,
    storageRegion: _storageRegion,
    height,
    width,
  } = storageResult;
  // For RAG files, use the embedding result; for others, use the storage result
  let embedded = storageResult.embedded;
  if (tool_resource === EToolResources.file_search) {
    embedded = embeddingResult?.embedded;
    filename = embeddingResult?.filename || filename;
  }

  let filepath = _filepath;
  let storageMetadata = getStorageMetadata({
    filepath,
    source,
    storageKey: _storageKey,
    storageRegion: _storageRegion,
  });

  if (!messageAttachment && tool_resource) {
    await db.addAgentResourceFile({
      file_id,
      agent_id,
      tool_resource,
      updatingUserId: req?.user?.id,
    });
  }

  if (isImage) {
    const result = await processImageFile({
      req,
      file,
      metadata: { file_id: v4() },
      returnFile: true,
    });
    filepath = result.filepath;
    storageMetadata = getStorageMetadata({
      filepath,
      source: result.source,
      storageKey: result.storageKey,
      storageRegion: result.storageRegion,
    });
  }

  const fileInfo = removeNullishValues({
    user: req.user.id,
    file_id,
    temp_file_id,
    bytes,
    filepath,
    ...storageMetadata,
    filename: filename ?? sanitizeFilename(file.originalname),
    context: messageAttachment ? FileContext.message_attachment : FileContext.agents,
    model: messageAttachment ? undefined : req.body.model,
    metadata: fileInfoMetadata,
    type: file.mimetype,
    embedded,
    source,
    height,
    width,
    tenantId: req.user.tenantId,
  });

  const result = await db.createFile(fileInfo, true);

  res.status(200).json({ message: 'Agent file uploaded and processed successfully', ...result });
};

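/*
 * Illustrative sketch (documentation only, not used at runtime): the
 * structured `codeEnvRef` persisted by `processAgentFileUpload` above
 * follows the typed, discriminated shape shared with codeapi. `kind`
 * drives codeapi's sessionKey — `<tenant>:<kind>:<id>` for shared kinds
 * ('skill' additionally carries a statically required `version`), and
 * `<tenant>:user:<userId>` for user-private files, where codeapi resolves
 * the user from auth context and treats `id` as informational only.
 *
 *   // type CodeEnvRef =
 *   //   | { kind: 'skill'; id: string; storage_session_id: string; file_id: string; version: number }
 *   //   | { kind: 'agent'; id: string; storage_session_id: string; file_id: string }
 *   //   | { kind: 'user';  id: string; storage_session_id: string; file_id: string };
 */
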
/**
 * @param {object} params - The params object.
 * @param {OpenAI} params.openai - The OpenAI client instance.
 * @param {string} params.file_id - The ID of the file to retrieve.
 * @param {string} params.userId - The user ID.
 * @param {string} [params.filename] - The name of the file. `undefined` for `file_citation` annotations.
 * @param {boolean} [params.saveFile=false] - Whether to save the file metadata to the database.
 * @param {boolean} [params.updateUsage=false] - Whether to update file usage in the database.
 */
const processOpenAIFile = async ({
  openai,
  file_id,
  userId,
  filename,
  saveFile = false,
  updateUsage = false,
}) => {
  const _file = await openai.files.retrieve(file_id);
  const originalName = filename ?? (_file.filename ? path.basename(_file.filename) : undefined);
  const filepath = `${openai.baseURL}/files/${userId}/${file_id}${
    originalName ? `/${originalName}` : ''
  }`;
  const type = mime.getType(originalName ?? file_id);
  const source =
    openai.req.body.endpoint === EModelEndpoint.azureAssistants
      ? FileSources.azure
      : FileSources.openai;
  const file = {
    ..._file,
    type,
    file_id,
    filepath,
    usage: 1,
    user: userId,
    context: _file.purpose,
    source,
    model: openai.req.body.model,
    filename: originalName ?? file_id,
    tenantId: openai.req?.user?.tenantId,
  };

  if (saveFile) {
    await db.createFile(file, true);
  } else if (updateUsage) {
    try {
      await db.updateFileUsage({ file_id });
    } catch (error) {
      logger.error('Error updating file usage', error);
    }
  }

  return file;
};

/**
 * Process OpenAI image files, convert to target format, save and return file metadata.
 * @param {object} params - The params object.
 * @param {ServerRequest} params.req - The Express request object.
 * @param {Buffer} params.buffer - The image buffer.
 * @param {string} params.file_id - The file ID.
 * @param {string} params.filename - The filename.
 * @param {string} params.fileExt - The file extension.
 * @returns {Promise<MongoFile>} The file metadata.
 */
const processOpenAIImageOutput = async ({ req, buffer, file_id, filename, fileExt }) => {
  const currentDate = new Date();
  const formattedDate = currentDate.toISOString();
  const appConfig = req.config;
  const _file = await convertImage(req, buffer, undefined, `${file_id}${fileExt}`);

  // Create only one file record with the correct information
  const file = {
    ..._file,
    usage: 1,
    user: req.user.id,
    type: mime.getType(fileExt),
    createdAt: formattedDate,
    updatedAt: formattedDate,
    source: getFileStrategy(appConfig, { isImage: true }),
    context: FileContext.assistants_output,
    file_id,
    filename,
    tenantId: req.user.tenantId,
  };
  db.createFile(file, true);
  return file;
};

/**
 * Retrieves and processes an OpenAI file based on its type.
 *
 * @param {Object} params - The params passed to the function.
 * @param {OpenAIClient} params.openai - The OpenAI client instance.
 * @param {RunClient} params.client - The LibreChat client instance: either refers to `openai` or `streamRunManager`.
 * @param {string} params.file_id - The ID of the file to retrieve.
 * @param {string} [params.basename] - The basename of the file (if image); e.g., 'image.jpg'. `undefined` for `file_citation` annotations.
 * @param {boolean} [params.unknownType] - Whether the file type is unknown.
 * @returns {Promise<{file_id: string, filepath: string, source: string, bytes?: number, width?: number, height?: number} | null>}
 * - Returns null if `file_id` is not defined; else, the file metadata if successfully retrieved and processed.
 */
async function retrieveAndProcessFile({
  openai,
  client,
  file_id,
  basename: _basename,
  unknownType,
}) {
  if (!file_id) {
    return null;
  }

  let basename = _basename;
  const processArgs = { openai, file_id, filename: basename, userId: client.req.user.id };

  // If no basename is provided, return only the file metadata
  if (!basename) {
    return await processOpenAIFile({ ...processArgs, saveFile: true });
  }

  const fileExt = path.extname(basename);
  if (client.attachedFileIds?.has(file_id) || client.processedFileIds?.has(file_id)) {
    return processOpenAIFile({ ...processArgs, updateUsage: true });
  }

  /**
   * @returns {Promise<Buffer>} The file data buffer.
   */
  const getDataBuffer = async () => {
    const response = await openai.files.content(file_id);
    const arrayBuffer = await response.arrayBuffer();
    return Buffer.from(arrayBuffer);
  };

  let dataBuffer;
  if (unknownType || !fileExt || imageExtRegex.test(basename)) {
    try {
      dataBuffer = await getDataBuffer();
    } catch (error) {
      logger.error('Error downloading file from OpenAI:', error);
      dataBuffer = null;
    }
  }

  if (!dataBuffer) {
    return await processOpenAIFile({ ...processArgs, saveFile: true });
  }

  // If the filetype is unknown, inspect the file
  if (dataBuffer && (unknownType || !fileExt)) {
    const detectedExt = await determineFileType(dataBuffer);
    const isImageOutput = detectedExt && imageExtRegex.test('.' + detectedExt);

    if (!isImageOutput) {
      return await processOpenAIFile({ ...processArgs, saveFile: true });
    }

    return await processOpenAIImageOutput({
      file_id,
      req: client.req,
      buffer: dataBuffer,
      filename: basename,
      fileExt: detectedExt,
    });
  } else if (dataBuffer && imageExtRegex.test(basename)) {
    return await processOpenAIImageOutput({
      file_id,
      req: client.req,
      buffer: dataBuffer,
      filename: basename,
      fileExt,
    });
  } else {
    logger.debug(`[retrieveAndProcessFile] Non-image file type detected: ${basename}`);
    return await processOpenAIFile({ ...processArgs, saveFile: true });
  }
}

/**
 * Converts a base64 string (optionally a data URL) to a buffer.
 * @param {string} base64String
 * @returns {{ buffer: Buffer, type: string }} The decoded buffer and the MIME type from the data-URL prefix ('' if absent).
 */
function base64ToBuffer(base64String) {
  try {
    const typeMatch = base64String.match(/^data:([A-Za-z-+/]+);base64,/);
    const type = typeMatch ? typeMatch[1] : '';

    const base64Data = base64String.replace(/^data:([A-Za-z-+/]+);base64,/, '');

    if (!base64Data) {
      throw new Error('Invalid base64 string');
    }

    return {
      buffer: Buffer.from(base64Data, 'base64'),
      type,
    };
  } catch (error) {
    throw new Error(`Failed to convert base64 to buffer: ${error.message}`);
  }
}

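/*
 * Example (illustrative; data truncated): `base64ToBuffer` accepts both a
 * bare base64 string and a data URL. With a data URL, the MIME type is
 * recovered from the `data:<type>;base64,` prefix; with bare base64, `type`
 * is the empty string.
 *
 *   const { buffer, type } = base64ToBuffer('data:image/png;base64,iVBORw0...');
 *   // type === 'image/png'; buffer holds the decoded image bytes
 */
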
async function saveBase64Image(
  url,
  { req, file_id: _file_id, filename: _filename, endpoint, context, resolution },
) {
  const appConfig = req.config;
  const effectiveResolution = resolution ?? appConfig.fileConfig?.imageGeneration ?? 'high';
  const file_id = _file_id ?? v4();
  let filename = `${file_id}-${_filename}`;
  const { buffer: inputBuffer, type } = base64ToBuffer(url);
  if (!path.extname(_filename)) {
    const extension = mime.getExtension(type);
    if (extension) {
      filename += `.${extension}`;
    } else {
      throw new Error(`Could not determine file extension from MIME type: ${type}`);
    }
  }

  const image = await resizeImageBuffer(inputBuffer, effectiveResolution, endpoint);
  const source = getFileStrategy(appConfig, { isImage: true });
  const { saveBuffer } = getStrategyFunctions(source);
  const filepath = await saveBuffer({
    userId: req.user.id,
    fileName: filename,
    buffer: image.buffer,
    tenantId: req.user.tenantId,
  });
  const storageMetadata = getStorageMetadata({ filepath, source });
  return await db.createFile(
    {
      type,
      source,
      context,
      file_id,
      filepath,
      ...storageMetadata,
      filename,
      user: req.user.id,
      bytes: image.bytes,
      width: image.width,
      height: image.height,
      tenantId: req.user.tenantId,
    },
    true,
  );
}

/**
 * Filters a file based on its size and the endpoint origin.
 *
 * @param {Object} params - The parameters for the function.
 * @param {ServerRequest} params.req - The request object from Express; `endpoint`, `endpointType`, `file_id`, `width`, and `height` are read from `req.body`.
 * @param {boolean} [params.image] - Whether the expected file is an image.
 * @param {boolean} [params.isAvatar] - Whether the expected file is a user or entity avatar.
 * @returns {void}
 *
 * @throws {Error} If a file exception is caught (invalid file size or type, lack of metadata).
 */
function filterFile({ req, image, isAvatar }) {
  const { file } = req;
  const { endpoint, endpointType, file_id, width, height } = req.body;

  if (!file_id && !isAvatar) {
    throw new Error('No file_id provided');
  }

  if (file.size === 0) {
    throw new Error('Empty file uploaded');
  }

  /* parse to validate api call, throws error on fail */
  if (!isAvatar) {
    isUUID.parse(file_id);
  }

  if (!endpoint && !isAvatar) {
    throw new Error('No endpoint provided');
  }

  const appConfig = req.config;
  const fileConfig = mergeFileConfig(appConfig.fileConfig);

  const endpointFileConfig = getEndpointFileConfig({
    endpoint,
    fileConfig,
    endpointType,
  });
  const fileSizeLimit =
    isAvatar === true ? fileConfig.avatarSizeLimit : endpointFileConfig.fileSizeLimit;

  if (file.size > fileSizeLimit) {
    throw new Error(
      `File size limit of ${fileSizeLimit / megabyte} MB exceeded for ${
        isAvatar ? 'avatar upload' : `${endpoint} endpoint`
      }`,
    );
  }

  const isSupportedMimeType = fileConfig.checkType(
    file.mimetype,
    endpointFileConfig.supportedMimeTypes,
  );

  if (!isSupportedMimeType) {
    throw new Error('Unsupported file type');
  }

  if (!image || isAvatar === true) {
    return;
  }

  if (!width) {
    throw new Error('No width provided');
  }

  if (!height) {
    throw new Error('No height provided');
  }
}

module.exports = {
  filterFile,
  processFileURL,
  saveBase64Image,
  processImageFile,
  uploadImageBuffer,
  processFileUpload,
  processDeleteRequest,
  processAgentFileUpload,
  retrieveAndProcessFile,
};