From d8a9517c9d1645c4e4da8345fd0da59d846f129f Mon Sep 17 00:00:00 2001
From: jim800121chen
Date: Sun, 17 May 2026 22:47:28 +0800
Subject: [PATCH] =?UTF-8?q?feat(task-scheduler):=20Phase=200.8b=20?=
 =?UTF-8?q?=E2=80=94=20API=20key=20auth=20+=20/result=20endpoint?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Switch the auth pillar from an OAuth 2.0 resource server to a pre-shared
API key (visionA ↔ converter, 1:1 internal trust). Add a
GET /api/v1/jobs/:id/result streaming endpoint so the visionA backend can
relay NEF downloads.

Phase A (auth switch):
- Add apiKeyMiddleware (constant-time compare, tokenFingerprint, 4 audit events)
- Remove the OAuth middleware + JWKS (oauthClient is kept for promote → FAA)
- Move 4 endpoints onto requireApiKey
- Add the TRUST_PROXY env var + Express trust proxy setting (forensic source_ip)

Phase B (/result endpoint):
- Streaming NEF download with a 5 min timeout + a concurrency cap of 10
- Two-tier rate limit (burst 5/10s + sustained 20/min)
- Bandwidth quota (1 GB/hr + 6 GB/24hr) keyed by token_fingerprint
- Range header silently ignored + Accept-Ranges: none
- filename quote-escape + RFC 5987 fallback + sanitize
- 8 /result audit events (full forensic coverage)

Design evolution log: docs/TODO-visionA-integration-v2.md (5/2 OAuth →
5/16 API key → 5/16 download via converter; corresponds to visionA repo
ADR-015/016)

Tests: 597 → 666 (+69), all 29 suites pass
Security: APPROVE WITH CONDITIONS (single-instance deployment, 6 new env vars, 24 hr monitoring)
npm audit: 3 vulns → 0 (transitive AWS SDK XML chain)

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 apps/task-scheduler/README.md | 299 +++-
 apps/task-scheduler/docs/openapi.yaml | 357 +++-
 apps/task-scheduler/env.example | 168 +-
 apps/task-scheduler/package-lock.json | 78 +-
 .../src/__tests__/config.test.js | 68 +-
 apps/task-scheduler/src/app.js | 28 +-
 .../auth/__tests__/apiKeyMiddleware.test.js | 874 ++++++++++
 .../src/auth/__tests__/jwks.test.js | 285 ----
 .../src/auth/__tests__/middleware.test.js | 763 ---------
 .../src/auth/apiKeyMiddleware.js | 350 ++++
 apps/task-scheduler/src/auth/jwks.js | 155 --
 apps/task-scheduler/src/auth/middleware.js | 286 ----
 apps/task-scheduler/src/config.js | 143 +-
 .../__tests__/perClientRateLimit.test.js | 2 +-
 .../src/middleware/perClientRateLimit.js | 81 +-
 .../src/middleware/resultBandwidthQuota.js | 305 ++++
 .../src/middleware/resultStreamConcurrency.js | 156 ++
 .../src/middleware/uploadConcurrency.js | 2 +-
 .../__tests__/createJob.integration.test.js | 343 ++--
 .../v1/__tests__/getJobs.integration.test.js | 323 ++--
 .../v1/__tests__/promote.integration.test.js | 298 ++--
 .../v1/__tests__/result.integration.test.js | 1060 ++++++++++++
 .../__tests__/v1-routes.integration.test.js | 59 +-
 apps/task-scheduler/src/routes/v1/index.js | 131 +-
 apps/task-scheduler/src/routes/v1/jobs.js | 43 +-
 apps/task-scheduler/src/routes/v1/promote.js | 21 +-
 apps/task-scheduler/src/routes/v1/result.js | 606 +++++++
 docs/TODO-visionA-integration-v2.md | 713 ++++++++
 docs/autoflow/04-architecture/TDD.md | 1508 +++-------------
 docs/autoflow/04-architecture/api/api-jobs.md | 295 ++++
 .../04-architecture/api/api-promote.md | 183 ++
 .../04-architecture/api/api-result.md | 1513 +++++++++++++++++
 docs/autoflow/04-architecture/auth.md | 286 ++++
 docs/autoflow/04-architecture/database.md | 187 ++
 docs/autoflow/04-architecture/design-doc.md | 829 ++++-----
 docs/autoflow/04-architecture/infra.md | 306 ++++
 .../autoflow/04-architecture/observability.md | 274 +++
 docs/autoflow/04-architecture/performance.md | 158 ++
 docs/autoflow/04-architecture/security.md | 213 ++-
 39 files changed, 9639 insertions(+), 4110 deletions(-)
 create mode 100644 apps/task-scheduler/src/auth/__tests__/apiKeyMiddleware.test.js
 delete mode 100644 apps/task-scheduler/src/auth/__tests__/jwks.test.js
 delete mode 100644 apps/task-scheduler/src/auth/__tests__/middleware.test.js
 create mode 100644 apps/task-scheduler/src/auth/apiKeyMiddleware.js
 delete mode 100644 apps/task-scheduler/src/auth/jwks.js
 delete mode 100644 apps/task-scheduler/src/auth/middleware.js
 create mode 100644 apps/task-scheduler/src/middleware/resultBandwidthQuota.js
 create mode 100644 apps/task-scheduler/src/middleware/resultStreamConcurrency.js
 create mode 100644 apps/task-scheduler/src/routes/v1/__tests__/result.integration.test.js
 create mode 100644 apps/task-scheduler/src/routes/v1/result.js
 create mode 100644 docs/TODO-visionA-integration-v2.md
 create mode 100644 docs/autoflow/04-architecture/api/api-jobs.md
 create mode 100644 docs/autoflow/04-architecture/api/api-promote.md
 create mode 100644 docs/autoflow/04-architecture/api/api-result.md
 create mode 100644 docs/autoflow/04-architecture/auth.md
 create mode 100644 docs/autoflow/04-architecture/database.md
 create mode 100644 docs/autoflow/04-architecture/infra.md
 create mode 100644 docs/autoflow/04-architecture/observability.md
 create mode 100644 docs/autoflow/04-architecture/performance.md

diff --git a/apps/task-scheduler/README.md b/apps/task-scheduler/README.md
index 62408db..61cd6cb 100644
--- a/apps/task-scheduler/README.md
+++ b/apps/task-scheduler/README.md
@@ -44,7 +44,8 @@ task-scheduler is the only application-layer component exposed upstream in Phase 1; it is responsible for:
 | Web framework | Express | 4.x |
 | Queue | Redis Stream + ioredis | 5.x |
 | Object storage | MinIO (S3-compatible, AWS SDK v3) | latest |
-| Authentication | OAuth 2.0 + JWT (jose) | jose 5.x |
+| External authentication | Pre-shared API key (Phase 0.8b) | n/a |
+| Authentication to FAA | OAuth 2.0 client_credentials | jose 5.x |
 | Upload | multer (memoryStorage) | 1.4.x |
 | Rate limiting | express-rate-limit | 6.x |
 | Security headers | helmet | 7.x |
@@ -61,8 +62,9 @@ task-scheduler is the only application-layer component exposed upstream in Phase 1; it is responsible for:
 | Docker / docker-compose (optional) | 24.x+ |
 | Redis | 7.x (required in both dev and prod) |
 | MinIO | latest (must be enabled for POST /api/v1/jobs) |
-| Member Center | OAuth 2.0 Authorization Server; provides the JWKS / token endpoints |
+| Member Center | OAuth 2.0 Authorization Server, **used only during promote** (converter → FAA token); since visionA → converter switched to an API key, JWKS is no longer needed |
 | File Access Agent | Called during promote; must support `PUT /files/{key}` |
+| `CONVERTER_API_KEY` | 64 hex chars, generated with `openssl rand -hex 32`, shared with visionA-backend |
 
 If dev has no real Member Center / FAA, placeholder values can be used (see `env.example`).
 
@@ -76,8 +78,10 @@ If dev has no real Member Center / FAA, placeholder values can be used (see `env.e
 cd apps/task-scheduler
 cp env.example .env
 # Edit .env and replace at least the following placeholders with real values:
-# - MEMBER_CENTER_* (if you actually want to call Member Center)
-# - KNERON_CONVERTER_CLIENT_SECRET
+# - CONVERTER_API_KEY (external visionA → converter auth; required; generate with `openssl rand -hex 32`)
+# - MEMBER_CENTER_TOKEN_URL (used during promote to obtain the FAA token)
+# - KNERON_CONVERTER_CLIENT_ID / CLIENT_SECRET (converter identity for promote)
+# - FILE_ACCESS_AGENT_* (promote target)
 # - MINIO_* (if STORAGE_BACKEND=minio)
 
 npm install
@@ -134,9 +138,10 @@ apps/task-scheduler/
 │   ├── config.js          ← Central env loading; fail-fast at startup
 │   ├── redis.js           ← Redis client + helpers
 │   ├── auth/
-│   │   ├── jwks.js        ← jose remote JWKS cache + jwtVerify
-│   │   ├── middleware.js  ← requireAuth(scope) Express middleware
-│   │   └── oauthClient.js ← Converter as OAuth Client (client_credentials)
+│   │   ├── apiKeyMiddleware.js ← requireApiKey() Express middleware (from Phase 0.8b A3:
+│   │   │                         visionA → converter auth; replaces the former OAuth resource server)
+│   │   └── oauthClient.js      ← Converter as OAuth Client (client_credentials,
+│   │                             used during promote to obtain the FAA token)
 │   ├── fileAccessAgent/
 │   │   ├── client.js      ← FAA HTTP client (PUT only, retries + 401 invalidate)
 │   │   └── errors.js
@@ -190,11 +195,8 @@ apps/task-scheduler/
 |------|------|
 | `REDIS_URL` | Redis connection (including password) |
 | `STORAGE_BACKEND` | `local` / `minio`; POST /api/v1/jobs requires `minio` |
-| `MEMBER_CENTER_ISSUER` | Expected JWT `iss` |
-| `MEMBER_CENTER_JWKS_URL` | JWKS endpoint (for token verification) |
-| `MEMBER_CENTER_TOKEN_URL` | Token endpoint (to obtain the promote token) |
-| `KNERON_CONVERTER_AUDIENCE` | Accepted JWT `aud` |
-| `KNERON_CONVERTER_CLIENT_ID` | Converter's own OAuth client |
+| `MEMBER_CENTER_TOKEN_URL` | Used during promote to obtain the FAA token (converter-side OAuth client) |
+| `KNERON_CONVERTER_CLIENT_ID` | Converter's own OAuth client identity (for promote) |
 | `KNERON_CONVERTER_CLIENT_SECRET` | **Never commit to git; use a secret manager** |
 | `FILE_ACCESS_AGENT_BASE_URL` | Promote target; https is enforced in production |
 | `FILE_ACCESS_AGENT_AUDIENCE` | `aud` of the promote token |
@@ -202,6 +204,16 @@ apps/task-scheduler/
 
 When `STORAGE_BACKEND=minio`, you also need: `MINIO_ENDPOINT_URL` / `MINIO_BUCKET` / `MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY`.
 
+### 5.1b API Key (visionA → converter auth; required in stage / prod as of Phase 0.8b)
+
+| Variable | Purpose |
+|------|------|
+| `CONVERTER_API_KEY` | 64-hex-char pre-shared key; the credential for external `/api/v1/*` auth. If unset, every `/api/v1/*` request returns 503 `service_unavailable` (fail-secure) |
+
+- Generate: `openssl rand -hex 32`
+- Configure: use the **same string** on both sides, in converter `.env` and visionA `.env.stage`
+- See §7 Auth flow + `docs/autoflow/04-architecture/auth.md`
+
 ### 5.2 Optional (with sensible defaults)
 
 Covers:
@@ -210,11 +222,13 @@ apps/task-scheduler/
   default 10MB, `MULTIPART_REF_IMAGES_MAX_COUNT` default 100)
 - Upload concurrency (`MAX_CONCURRENT_UPLOADS` default 5, `UPLOAD_RETRY_AFTER_SECONDS` default 30)
 - Rate limit (`API_V1_RATE_LIMIT_WINDOW_MS` default 5min, `API_V1_RATE_LIMIT_MAX` default 300)
-- JWKS behavior (`JWKS_CACHE_MAX_AGE_MS`, `JWKS_COOLDOWN_MS`, `JWT_CLOCK_TOLERANCE_SEC`)
-- OAuth client (`OAUTH_TOKEN_REFRESH_SKEW_MS`, `OAUTH_TOKEN_TIMEOUT_MS`)
+- OAuth client (converter → FAA, promote only): `OAUTH_TOKEN_REFRESH_SKEW_MS`, `OAUTH_TOKEN_TIMEOUT_MS`
 - Promote timeout (`PROMOTE_TIMEOUT_MS` default 300s)
-- Tenant isolation (`CONVERTER_TENANT_ID`; empty string = no check)
-- Scope name overrides (`CONVERTER_SCOPE_WRITE` / `CONVERTER_SCOPE_READ`)
+
+> **Removed in Phase 0.8b A4**: `MEMBER_CENTER_ISSUER` / `MEMBER_CENTER_JWKS_URL` /
+> `KNERON_CONVERTER_AUDIENCE` / `CONVERTER_TENANT_ID` / `CONVERTER_SCOPE_*` /
+> `JWKS_*` / `JWT_CLOCK_TOLERANCE_SEC`. These were only needed in OAuth resource-server
+> mode and are unused after the switch to an API key. If a deployment still sets them, the server ignores them at startup (no error).
 
 ### 5.3 Security reminders
 
@@ -229,17 +243,54 @@ apps/task-scheduler/
 
 ### 6.1 Phase 1 external API (`/api/v1/*`)
 
-| Method | Path | scope | Description |
-|------|------|-------|------|
-| POST | `/api/v1/jobs` | `converter:job.write` | Create a conversion job (multipart) |
-| GET | `/api/v1/jobs` | `converter:job.read` | Recovery list (user_id required) |
-| GET | `/api/v1/jobs/:id` | `converter:job.read` | Single job status (with ETag) |
-| POST | `/api/v1/jobs/:id/promote` | `converter:job.write` | Move the result file to FAA |
-| POST | `/api/v1/jobs/:id/download-tokens` | `converter:job.read` | **Reserved for Phase 2**, returns 501 |
-| DELETE | `/api/v1/jobs/:id` | `converter:job.write` | **Reserved for Phase 2**, returns 501 |
+All endpoints authenticate uniformly with `Authorization: Bearer ` (from Phase 0.8b A3 onward);
+the API key is complete proof that "the caller is visionA", with no read/write scope split.
+
+| Method | Path | Description |
+|------|------|------|
+| POST | `/api/v1/jobs` | Create a conversion job (multipart) |
+| GET | `/api/v1/jobs` | Recovery list (user_id required) |
+| GET | `/api/v1/jobs/:id` | Single job status (with ETag) |
+| POST | `/api/v1/jobs/:id/promote` | Move the result file to FAA |
+| GET | `/api/v1/jobs/:id/result` | **New in Phase 0.8b Phase B**: NEF binary stream proxy for visionA-backend |
+| POST | `/api/v1/jobs/:id/download-tokens` | **Reserved for Phase 2**, returns 501 |
+| DELETE | `/api/v1/jobs/:id` | **Reserved for Phase 2**, returns 501 |
 
 Full spec, all schemas, and examples for every error case: see [`docs/openapi.yaml`](./docs/openapi.yaml).
 
+#### 6.1.a `/result` endpoint details (Phase 0.8b Phase B)
+
+`GET /api/v1/jobs/:id/result` is a streaming proxy (200 + `application/octet-stream`) that lets
+visionA-backend pull the NEF result file directly from the Converter Bucket. It replaces the "visionA → obtain a delegated download
+token → FAA" path (that path never worked because MC never implemented the endpoint).
+
+**Security limits** (aligned with [api-result.md §9 / §15](../../docs/autoflow/04-architecture/api/api-result.md)):
+
+| Limit | Default | env override | Failure response |
+|------|--------|----------|---------|
+| Burst rate limit | 5 req / 10s per token_fingerprint | `RESULT_RATE_LIMIT_BURST_PER_10S` | 429 `rate_limit_exceeded` + `limit_type: burst` |
+| Sustained rate limit | 20 req / 1min per token_fingerprint | `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` | 429 `rate_limit_exceeded` + `limit_type: sustained` |
+| Hourly bandwidth quota | 1 GB / hr per token_fingerprint | `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` | 429 `bandwidth_quota_exceeded` + `limit_type: bandwidth_hourly` |
+| Daily bandwidth quota | 6 GB / 24hr per token_fingerprint | `RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES` | 429 `bandwidth_quota_exceeded` + `limit_type: bandwidth_daily` |
+| Concurrent stream cap | 10 concurrent streams (per instance) | `MAX_CONCURRENT_RESULT_STREAMS` | 503 `service_busy` + `Retry-After: 30` |
+| Stream response timeout | 5 minutes | `RESULT_STREAM_TIMEOUT_MS` | Connection destroyed + audit log `result.stream_timeout` |
+
+**Range header handling**: silently ignored; the response is always a full 200 + `Accept-Ranges: none`
+(no 416, no slicing). A received Range header writes the audit log event `result.range_attempted` (INFO).
+See [api-result.md §10](../../docs/autoflow/04-architecture/api/api-result.md).
+
+**12 audit log events** (aligned with [api-result.md §11.3](../../docs/autoflow/04-architecture/api/api-result.md)):
+`result.streamed` / `result.stream_error` / `result.client_closed` / `result.stream_timeout` /
+`result.not_found` / `result.not_completed` / `result.expired` / `result.storage_unavailable` /
+`result.rate_limited` / `result.bandwidth_quota_exceeded` / `result.range_attempted` /
+`result.filename_assertion_failed`. Each event carries the five A.7 fields (source_ip / token_fingerprint /
+request_id / http_method / http_path) plus the four /result fields (job_id / size_bytes / duration_ms /
+stream_completed, included as relevant to the event type).
+
+**Multi-instance limitation**: all of the in-memory counters above are per-process; before a Phase 2 multi-instance
+deployment they must move to a Redis backend, otherwise every limit is effectively multiplied by the instance count. See [security.md
+candidate #8](../../docs/autoflow/04-architecture/security.md) (HIGH).
+
 ### 6.2 Legacy / internal API (`/jobs/*`, exposed only on the internal-network vhost)
 
 Behavior for the Web UI is 100% unchanged (the T4 refactor is only "move + abstract"):
@@ -261,48 +312,160 @@ apps/task-scheduler/
 
 ---
 
-## 7. Auth flow
+## 7. Auth flow (Phase 0.8b)
 
-### 7.1 Upstream consumer (visionA-backend) obtains a token
+> **Design evolution**: from Phase 0.8b onward, external visionA → converter auth moves from OAuth `client_credentials`
+> to a pre-shared API key (1:1 internal trust). converter → FAA still uses OAuth client_credentials.
+> For the historical OAuth resource-server design, see visionA repo `ADR-014` / `ADR-015` v2.1.
 
-Converter is an OAuth 2.0 Resource Server. Consumers should use the `client_credentials`
-grant to obtain a service-to-service token from Member Center:
+### 7.1 visionA → Converter (API key)
 
-```
-POST {member-center}/oauth/token
-Content-Type: application/x-www-form-urlencoded
+#### 7.1.1 Setup
 
-grant_type=client_credentials
-&client_id=
-&client_secret=
-&scope=converter:job.write converter:job.read
-&audience=kneron_converter_api
+1. In converter `.env` (or a secret manager), set:
+   ```bash
+   CONVERTER_API_KEY=$(openssl rand -hex 32)
+   ```
+   This yields 64 hex chars (32 random bytes, i.e. 256 bits of entropy).
+
+2. On the visionA side, set the **same string** in `.env.stage`:
+   ```bash
+   VISIONA_CONVERTER_API_KEY=
+   ```
+
+#### 7.1.2 Example calls
+
+```bash
+# Health check (no API key required)
+curl http://localhost:4000/health
+
+# Create a job (API key required)
+curl -X POST http://localhost:4000/api/v1/jobs \
+  -H "Authorization: Bearer $CONVERTER_API_KEY" \
+  -F "model=@./model.onnx" \
+  -F "user_id=alice" \
+  -F "model_id=1001" \
+  -F "version=v1.0.0" \
+  -F "platform=520"
+
+# Query job status
+curl -H "Authorization: Bearer $CONVERTER_API_KEY" \
+  http://localhost:4000/api/v1/jobs/
 ```
 
-### 7.2 Converter-side verification
+#### 7.1.3 Middleware behavior
 
 On every incoming `/api/v1/*` request:
 
-1. Verify the Bearer token signature (`jose.createRemoteJWKSet` + `jwtVerify`)
-2. `iss` / `aud` / `exp` (with 60 seconds of clock skew)
-3. `scope` (the endpoint's required scope must be in the token claim)
-4. `tenant_id` (checked if `CONVERTER_TENANT_ID` is non-empty)
-5. `client_id` (used for rate limiting / logging / job isolation)
+1. Parse `Authorization: Bearer `
+2. Constant-time compare with `crypto.timingSafeEqual` (prevents timing attacks)
+3. On success, set `req.auth`:
+   ```js
+   req.auth = {
+     sub: 'visionA-service',
+     clientId: 'visionA-service',
+     tenantId: null,
+     scopes: ['converter:job.write', 'converter:job.read'],
+     raw: { authType: 'api_key' },
+   };
+   ```
 
 On verification failure:
 
-- Return the standard v1 error format (`{error: {code, message, details, request_id}}`)
-- **Set the `Connection: close` header + `req.socket.destroy()`**: stops an
-  unauthorized client from continuing to push a large body. This is best-effort; the real body limit
-  relies on Nginx `client_max_body_size` (deployment layer)
+| Scenario | HTTP | error.code |
+|------|------|-----------|
+| Missing Authorization header / not Bearer format / empty token | 401 | `invalid_token` |
+| Token does not match `CONVERTER_API_KEY` | 401 | `invalid_token` |
+| `CONVERTER_API_KEY` env not set (fail-secure) | 503 | `service_unavailable` |
 
-### 7.3 Converter obtains the promote token
+For all failures:
+- Return the standard v1 error format (`{error: {code, message, request_id}}`)
+- Set `Connection: close` + `req.socket.destroy()` to stop an unauthorized client from continuing to push a large body (best-effort; the real body limit relies on Nginx `client_max_body_size`)
 
-During promote, Converter switches to acting as an OAuth Client and uses `client_credentials` to obtain a
-`files:upload.write` scope token, then PUTs to FAA.
+#### 7.1.5 Audit log (Phase 0.8b A7)
 
-Tokens are cached per scope and proactively refreshed 60s before expiry; when FAA returns 401, the
-cache is invalidated automatically and the request is retried once.
+Every `/api/v1/*` request writes one audit log entry (JSON, stdout):
+
+| `action` | When | Fields |
+|----------|------|------|
+| `auth.api_key.authenticated` | Verification succeeded | level=INFO, `source_ip`, `token_fingerprint`, `request_id`, `http_method`, `http_path`, `client_id` |
+| `auth.api_key.missing` | Missing Authorization / malformed / empty token | level=WARN, `source_ip`, `request_id`, `http_method`, `http_path` (no fingerprint) |
+| `auth.api_key.invalid` | Token mismatch | level=WARN, `source_ip`, `request_id`, `http_method`, `http_path`, `token_fingerprint` (fingerprint of the wrong token) |
+| `auth.api_key.not_configured` | `CONVERTER_API_KEY` env not set | level=ERROR, `source_ip`, `request_id`, `http_method`, `http_path` (no fingerprint; the caller's token is not leaked) |
+
+Key design points:
+
+- **`source_ip` comes from `req.ip`**: it relies on `app.set('trust proxy', ...)` being configured correctly (see the `TRUST_PROXY` env var). A wrong setting either strips source_ip of its forensic value or lets an attacker spoof it.
+- **`token_fingerprint` = first 12 hex chars of `sha256(token)` (48-bit identifier space)**: enough to cluster multiple callers sharing one key, or repeated attempts by the same attacker, without revealing the token itself.
+- **Never log the token itself**: failure paths log only the fingerprint.
+
+Example (success path):
+
+```json
+{
+  "service": "task-scheduler",
+  "timestamp": "2026-05-16T10:30:00.123Z",
+  "level": "INFO",
+  "action": "auth.api_key.authenticated",
+  "auth_type": "api_key",
+  "client_id": "visionA-service",
+  "source_ip": "203.0.113.42",
+  "request_id": "7c6e4f3b-...",
+  "http_method": "POST",
+  "http_path": "/api/v1/jobs",
+  "token_fingerprint": "8a1b3c2d4e5f"
+}
+```
+
+Example (failure path, wrong token):
+
+```json
+{
+  "service": "task-scheduler",
+  "timestamp": "2026-05-16T10:30:01.456Z",
+  "level": "WARN",
+  "action": "auth.api_key.invalid",
+  "auth_type": "api_key",
+  "source_ip": "203.0.113.99",
+  "request_id": "abc1-...",
+  "http_method": "POST",
+  "http_path": "/api/v1/jobs",
+  "token_fingerprint": "f9e8d7c6b5a4"
+}
+```
+
+⚠️ **`TRUST_PROXY` env configuration (critical!)**:
+
+| Deployment topology | `TRUST_PROXY` setting | Risk |
+|---------|-------------------|------|
+| Local dev / test environment | Leave empty (defaults to `loopback`) | None |
+| Stage / prod (1 Nginx layer in front) | `TRUST_PROXY=1` | None |
+| Stage / prod (cloud LB + Nginx) | `TRUST_PROXY=2` | None |
+| Anywhere | `TRUST_PROXY=true` (trust every hop) | ⚠️ An attacker can forge `X-Forwarded-For` to spoof the audit log |
+
+Too strict (leaving `loopback` in stage / prod) → `source_ip` is always Nginx's internal IP and forensics become useless. Too permissive (`true`) → an attacker can forge the IP. **It must match the actual number of proxy hops in the deployment.** See `env.example` §16 or the [Express trust proxy docs](https://expressjs.com/en/guide/behind-proxies.html).
+
+#### 7.1.4 Rotation procedure
+
+1. Stop both sides (or accept a brief window of 401s)
+2. Generate a new key with `openssl rand -hex 32`
+3. Update `.env` on both sides to the new key
+4. Redeploy converter first, then visionA
+5. Verify: any `/api/v1/*` endpoint called with the new key should return 200
+
+See `docs/autoflow/04-architecture/auth.md` §4.
+
+### 7.2 Converter → File Access Agent (OAuth client_credentials, retained)
+
+During the promote flow (`POST /api/v1/jobs/:id/promote`), Converter acts as an OAuth Client,
+obtains a `files:upload.write` scope token via `client_credentials`, and PUTs the result file to FAA.
+**Completely unchanged in Phase 0.8b.**
+
+Tokens are cached per scope and proactively refreshed 60s before expiry; when FAA returns 401, the
+cache is invalidated automatically and the request is retried once. See `src/auth/oauthClient.js`.
+
+Required env: `MEMBER_CENTER_TOKEN_URL` / `KNERON_CONVERTER_CLIENT_ID` /
+`KNERON_CONVERTER_CLIENT_SECRET` / `FILE_ACCESS_AGENT_*`.
 
 ---
 
@@ -312,10 +475,7 @@ cache is invalidated automatically and the request is retried once.
 |------|------|------|
 | 400 | `validation_error` | Invalid field format (`details.fields[]` lists the specific fields) |
 | 400 | `invalid_multipart` | Multipart parse failure, missing required file, or extension mismatch |
-| 401 | `invalid_token` | Invalid JWT / bad signature / missing claim |
-| 401 | `token_expired` | JWT expired |
-| 403 | `insufficient_scope` | Insufficient scope (`details.required_scope` / `provided_scopes`) |
-| 403 | `tenant_mismatch` | tenant_id mismatch |
+| 401 | `invalid_token` | API key mismatch / missing Authorization header / malformed |
 | 404 | `job_not_found` | Job does not exist or does not belong to this client (existence is not leaked) |
 | 404 | `not_found` | Path does not exist |
 | 409 | `user_has_active_job` | The same user already has an unfinished job (`details.active_job_*`) |
@@ -329,8 +489,9 @@ cache is invalidated automatically and the request is retried once.
 | 501 | `not_implemented` | Endpoint reserved for Phase 2 |
 | 502 | `storage_unavailable` | MinIO write failed |
 | 502 | `file_gateway_unavailable` | FAA unavailable / rejected the request |
-| 503 | `auth_service_unavailable` | Failed to obtain a token from Member Center |
+| 503 | `auth_service_unavailable` | Failed to obtain a token from Member Center (**promote stage only**, the converter → FAA chain) |
 | 503 | `service_busy` | Upload concurrency is full (`Retry-After` header) |
+| 503 | `service_unavailable` | `CONVERTER_API_KEY` env not set (fail-secure for the external visionA → converter API) |
 
 For the full response schema, see [`docs/openapi.yaml`](./docs/openapi.yaml#components/schemas/ApiError).
 
@@ -395,7 +556,7 @@ response body also includes `dependencies.{redis, member_center, file_access_agen
 
 ## 11. Known accepted risks in Phase 1
 
-> This section is a summary; for the full content see [`.autoflow/04-architecture/security.md`](../../.autoflow/04-architecture/security.md).
+> This section is a summary; for the full content see [`docs/autoflow/04-architecture/security.md`](../../docs/autoflow/04-architecture/security.md).
 
 ### 11.1 user_id trust boundary (most important)
 
@@ -441,7 +602,7 @@ response body also includes `dependencies.{redis, member_center, file_access_agen
 ## 12. Testing
 
 ```bash
-npm test                 # Run all unit + integration tests (630 tests, ~4 seconds)
+npm test                 # Run all unit + integration tests (~640 tests after Phase 0.8b A6, < 10 seconds)
 npm test -- --watch      # Watch mode
 npm test -- src/auth     # Run only the auth module's tests
 ```
@@ -460,8 +621,10 @@ For CI: `npm test`.
 | Symptom | Likely cause | Troubleshooting |
 |------|---------|------|
 | Exits with code 1 immediately on startup | Missing env | Check the `[Scheduler] Config validation failed` log; compare against `env.example` |
-| 401 invalid_token / token_expired | Clock skew, or the JWKS cache has not fetched the new kid | Check the server clock and the reachability of `MEMBER_CENTER_JWKS_URL` |
+| Startup warning `config.api_key_not_set` | `CONVERTER_API_KEY` env not set | Set `CONVERTER_API_KEY` to 64 hex chars (`openssl rand -hex 32`); while unset, `/api/v1/*` always returns 503 |
+| 401 invalid_token | API key mismatch / missing Authorization header / malformed | Confirm that the `CONVERTER_API_KEY` string is exactly the same on visionA and converter |
 | Client connection drops right after a 401 | By design (`Connection: close` + `socket.destroy()`) | Expected behavior; stops the client from continuing to push the body |
+| 503 service_unavailable on `/api/v1/*` | `CONVERTER_API_KEY` not set on the converter | Set the env var and restart |
 | 409 user_has_active_job although the previous job already failed | The active_job lock was not released | Check that the worker done listener is running; in the worst case the 7-day TTL clears it automatically |
 | 502 storage_unavailable | MinIO unreachable / auth error | Check the `MINIO_*` env vars and whether the bucket exists |
 | 502 file_gateway_unavailable | FAA 5xx or 4xx rejection (not 401) | Check the server log `promote.faa_put_failed`; investigate on the FAA side |
@@ -473,9 +636,10 @@ For CI: `npm test`.
 
 More details:
 
-- `.autoflow/04-architecture/TDD.md` (full spec)
-- `.autoflow/04-architecture/security.md` (security model / accepted risks)
-- `.autoflow/05-implementation/tasks-phase1.md` (task breakdown and decision log)
+- `docs/autoflow/04-architecture/TDD.md` (full spec index)
+- `docs/autoflow/04-architecture/auth.md` (Phase 0.8b API key auth design)
+- `docs/autoflow/04-architecture/security.md` (security model / accepted risks)
+- `.autoflow/05-implementation/` (per-branch implementation notes and Phase 0.8b A1–A6 reports)
 
 ---
 
@@ -485,11 +649,12 @@ For CI: `npm test`.
 |------|------|
 | [`docs/openapi.yaml`](./docs/openapi.yaml) | Phase 1 external API spec (for consumers such as visionA-backend to import) |
 | [`env.example`](./env.example) | Full list of environment variables (with descriptions, defaults, and whether each is required) |
-| `../../.autoflow/04-architecture/TDD.md` | Full technical design document |
-| `../../.autoflow/04-architecture/security.md` | Security model / accepted risks / Phase 2 candidates |
-| `../../.autoflow/04-architecture/design-doc.md` | Architecture decisions (why these approaches were chosen) |
-| `../../.autoflow/02-prd/PRD.md` | Product requirements / user stories |
-| `../../.autoflow/05-implementation/tasks-phase1.md` | T1-T11 task breakdown and review log |
+| `../../docs/autoflow/04-architecture/TDD.md` | Full technical design document |
+| `../../docs/autoflow/04-architecture/auth.md` | Phase 0.8b API key auth design (visionA → converter) + FAA OAuth client (retained) |
+| `../../docs/autoflow/04-architecture/security.md` | Security model / accepted risks / Phase 2 candidates |
+| `../../docs/autoflow/04-architecture/design-doc.md` | Architecture decisions (why these approaches were chosen) |
+| `../../docs/autoflow/02-prd/PRD.md` | Product requirements / user stories |
+| `../../docs/TODO-visionA-integration-v2.md` | Phase 0.8b handoff notes for the visionA integration |
 
 ---
 
diff --git a/apps/task-scheduler/docs/openapi.yaml b/apps/task-scheduler/docs/openapi.yaml
index 60c1d39..2eee6d3 100644
--- a/apps/task-scheduler/docs/openapi.yaml
+++ b/apps/task-scheduler/docs/openapi.yaml
@@ -37,18 +37,34 @@ info:
     2. Poll the job status until it is `completed` or `failed`
     3. Promote (push) a successful result file to File Access Agent / the NAS model library
 
-    ## Authentication
+    ## Authentication (from Phase 0.8b)
 
-    All `/api/v1/*` endpoints require `Authorization: Bearer `; the token
-    must be issued by Innovedus Member Center with `aud=kneron_converter_api` and carry the matching scope.
+    All `/api/v1/*` endpoints require `Authorization: Bearer `.
 
-    Converter is an OAuth 2.0 Resource Server. Upstream consumers should use the
-    `client_credentials` grant to obtain a service-to-service token.
+    Converter uses a **pre-shared API key** (1:1 internal trust, replacing the OAuth resource-server model).
+    The API key is complete proof that "the caller is visionA"; there is no read/write scope split and no audience / tenant check.
+
+    - Generate: `openssl rand -hex 32` (64 hex chars / 256 bits of entropy)
+    - visionA and converter must use **exactly the same string**
+    - See `docs/autoflow/04-architecture/auth.md`
+
+    ### Audit log (Phase 0.8b A7)
+
+    Every `/api/v1/*` request (successful or failed) writes one audit log entry containing `source_ip`,
+    `token_fingerprint` (first 12 hex chars of sha256; the token cannot be recovered from it), `request_id`,
+    `http_method`, and `http_path`. See README §7.1.5. Note for the visionA side: avoid having
+    different caller instances share the same source IP (otherwise forensic attribution degrades).
+
+    > **History**: before Phase 0.8b the plan was OAuth `client_credentials` + JWT verification
+    > (`aud=kneron_converter_api`, `scope=converter:job.{read,write}`),
+    > but OAuth is over-engineered for a 1:1 trust relationship and stage hit 4 blockers (see visionA repo
+    > `ADR-014` / `ADR-015` v2.1), so an API key was adopted instead. The converter → FAA OAuth
+    > `client_credentials` chain (used internally during promote) **is kept unchanged**.
 
     ## user_id and the trust boundary
 
-    `user_id` does not come from a JWT claim but from a multipart form field (POST) or
-    the query string (GET). Converter **fully trusts** that the user_id supplied by the caller is
+    `user_id` does not come from the auth credential (the API key has no notion of a user) but from a multipart form
+    field (POST) or the query string (GET). Converter **fully trusts** that the user_id supplied by the caller is
     correct and performs no user-level ACL.
 
     This is a design risk deliberately accepted in Phase 1; see:
@@ -106,11 +122,11 @@ tags:
     description: Reserved routes; in Phase 1 they always return 501 not_implemented
 
 # =============================================================================
-# Global security default: except where security: [] is declared, every path requires a Bearer JWT
+# Global security default: except where security: [] is declared, every path requires ApiKeyAuth (Bearer scheme)
 # =============================================================================

 security:
-  - BearerAuth: []
+  - ApiKeyAuth: []
 
 paths:
 
@@ -222,7 +238,7 @@ paths:
         5. Return 201 `created`
 
       security:
-        - BearerAuth: [converter:job.write]
+        - ApiKeyAuth: []
       parameters:
         - $ref: '#/components/parameters/XRequestId'
       requestBody:
@@ -321,8 +337,6 @@ paths:
                       request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
         '401':
           $ref: '#/components/responses/Unauthorized'
-        '403':
-          $ref: '#/components/responses/Forbidden'
         '409':
           description: This user_id already has a job in progress
           content:
@@ -399,7 +413,10 @@ paths:
                       message: The file storage service is temporarily unavailable; please retry later
                       request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
         '503':
-          description: Concurrent uploads exceed the process semaphore limit
+          description: |
+            Two cases:
+            - `service_busy`: concurrent uploads exceed the process semaphore limit
+            - `service_unavailable`: `CONVERTER_API_KEY` env is not set on the server (fail-secure)
           content:
             application/json:
               schema:
@@ -414,6 +431,12 @@ paths:
                         retry_after_seconds: 30
                         max_concurrent: 5
                       request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+                api_key_not_configured:
+                  value:
+                    error:
+                      code: service_unavailable
+                      message: API key not configured
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
 
   # =========================================================================
   # GET /api/v1/jobs: recovery list (user_id required)
@@ -433,15 +456,16 @@ paths:
 
         ## Isolation
 
-        Listed jobs are always filtered automatically by the `client_id` in the token; that is, jobs with the same user_id but
-        created by a different client do not appear in the results.
+        Listed jobs are always filtered automatically by the caller's `client_id`; that is, jobs with the same user_id but created by a different client
+        do not appear in the results. From Phase 0.8b A3 onward, on the API key path `client_id` is hard-coded to
+        `'visionA-service'`; the isolation logic is kept for future multi-caller expansion.
 
         ## Pagination
 
         Uses a base64-url-encoded opaque cursor. Clients should not assume the cursor's content format (it may change to
         keyset in the future). When there is no more data, `next_cursor: null`.
       security:
-        - BearerAuth: [converter:job.read]
+        - ApiKeyAuth: []
       parameters:
         - name: user_id
           in: query
@@ -563,10 +587,10 @@ paths:
                       request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
         '401':
           $ref: '#/components/responses/Unauthorized'
-        '403':
-          $ref: '#/components/responses/Forbidden'
         '429':
           $ref: '#/components/responses/RateLimited'
+        '503':
+          $ref: '#/components/responses/ServiceUnavailable'
 
   # ===========================================================================
   # GET /api/v1/jobs/:id: single job status + ETag
@@ -584,8 +608,9 @@ paths:
 
         ## Client isolation
 
-        Even if the jobId really exists, if the `client_id` in the token does not match the job's
-        `created_by_client_id`, **404 is always returned** (existence is not leaked).
+        Even if the jobId really exists, if the caller's `client_id` does not match the job's `created_by_client_id`,
+        **404 is always returned** (existence is not leaked). From Phase 0.8b A3 onward, on the API key path
+        `client_id` is hard-coded to `'visionA-service'`.
 
         ## ETag
 
@@ -594,7 +619,7 @@ paths:
         If the job is unchanged, 304 is returned (no body)
 
       security:
-        - BearerAuth: [converter:job.read]
+        - ApiKeyAuth: []
       parameters:
         - name: If-None-Match
           in: header
@@ -744,12 +769,12 @@ paths:
                 type: string
         '401':
           $ref: '#/components/responses/Unauthorized'
-        '403':
-          $ref: '#/components/responses/Forbidden'
         '404':
           $ref: '#/components/responses/JobNotFound'
         '429':
           $ref: '#/components/responses/RateLimited'
+        '503':
+          $ref: '#/components/responses/ServiceUnavailable'
 
     delete:
       tags: [Phase 2 (Reserved)]
@@ -759,7 +784,7 @@ paths:
         Endpoint planned for Phase 2. In Phase 1 it always returns 501 `not_implemented`.
       deprecated: false
       security:
-        - BearerAuth: [converter:job.write]
+        - ApiKeyAuth: []
       responses:
         '501':
           $ref: '#/components/responses/NotImplemented'
@@ -798,7 +823,7 @@ paths:
         `auth_service_unavailable`.
 
       security:
-        - BearerAuth: [converter:job.write]
+        - ApiKeyAuth: []
       parameters:
         - $ref: '#/components/parameters/XRequestId'
       requestBody:
@@ -859,8 +884,6 @@ paths:
                       request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
         '401':
           $ref: '#/components/responses/Unauthorized'
-        '403':
-          $ref: '#/components/responses/Forbidden'
         '404':
           $ref: '#/components/responses/JobNotFound'
         '409':
@@ -925,7 +948,10 @@ paths:
                       message: The file access service rejected this request
                       request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
         '503':
-          description: The auth service cannot issue the token needed for promote
+          description: |
+            Two cases:
+            - `auth_service_unavailable`: during promote, converter failed to obtain the FAA token from Member Center
+            - `service_unavailable`: `CONVERTER_API_KEY` env is not set on the server (fail-secure)
           content:
             application/json:
               schema:
@@ -937,6 +963,197 @@ paths:
                       code: auth_service_unavailable
                       message: The auth service cannot currently issue the required token; please retry later
                       request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+                api_key_not_configured:
+                  value:
+                    error:
+                      code: service_unavailable
+                      message: API key not configured
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+
+  # ===========================================================================
+  # Phase 0.8b Phase B: GET /api/v1/jobs/{id}/result (streaming proxy)
+  # ===========================================================================
+
+  /api/v1/jobs/{id}/result:
+    parameters:
+      - $ref: '#/components/parameters/JobIdPath'
+    get:
+      tags: [Result]
+      summary: 'Stream NEF result binary for visionA-backend'
+      operationId: getJobResult
+      description: |
+        New in Phase 0.8b Phase B (replaces the Phase 2 `/download-tokens`).
+
+        visionA-backend uses this endpoint to stream the NEF result file directly from the Converter Bucket.
+        It replaces the "visionA → obtain a delegated download token → FAA" path (which never worked because MC never implemented it).
+
+        **Security limits** (per `token_fingerprint = sha256(api_key).slice(0,12)`):
+        - Burst rate limit: 5 req / 10s (429 + `limit_type: burst`)
+        - Sustained rate limit: 20 req / 1min (429 + `limit_type: sustained`)
+        - Hourly bandwidth quota: 1 GB / hr (429 + `limit_type: bandwidth_hourly`)
+        - Daily bandwidth quota: 6 GB / 24hr (429 + `limit_type: bandwidth_daily`)
+        - Concurrent stream cap: 10 concurrent streams / instance (503 + `Retry-After: 30`)
+        - Stream response timeout: 5 minutes (connection destroyed)
+
+        **Range header handling**: silently ignored; always returns the full body with 200 and sets `Accept-Ranges: none`
+        (no 416, no sliced MinIO request). A received Range header writes the audit log event
+        `result.range_attempted` (INFO, forensic baseline).
+
+        See `docs/autoflow/04-architecture/api/api-result.md`.
+      security:
+        - ApiKeyAuth: []
+      responses:
+        '200':
+          description: NEF binary stream (full file; Range / partial content not supported)
+          headers:
+            Content-Type:
+              schema:
+                type: string
+                example: application/octet-stream
+            Content-Length:
+              schema:
+                type: integer
+              description: NEF object size in bytes (obtained from MinIO)
+            Content-Disposition:
+              schema:
+                type: string
+                example: 'attachment; filename="yolov5s_720.nef"; filename*=UTF-8''''yolov5s_720.nef'
+              description: |
+                `attachment; filename="_.nef"; filename*=UTF-8''`
+
+                - `filename`: ASCII-safe + quote-escaped
+                - `filename*`: RFC 5987 extended form (reserved for future Unicode support)
+                - If source_filename or platform is missing → fallback `job_.nef`
+            Accept-Ranges:
+              schema:
+                type: string
+                example: none
+              description: 'Explicitly signals that Range requests are not supported (RFC 7233 §2.3)'
+          content:
+            application/octet-stream:
+              schema:
+                type: string
+                format: binary
+        '401':
+          $ref: '#/components/responses/Unauthorized'
+        '404':
+          description: |
+            - `job_not_found`: the jobID does not exist
+            - `result_not_found`: completed, but result_object_keys contains no NEF
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/Error'
+              examples:
+                JobNotFound:
+                  value:
+                    error:
+                      code: job_not_found
+                      message: Job 550e8400-e29b-41d4-a716-446655440000 not found
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+                ResultNotFound:
+                  value:
+                    error:
+                      code: result_not_found
+                      message: Job 550e8400-e29b-41d4-a716-446655440000 completed but no NEF result available
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+        '409':
+          description: Job is not completed yet (status !== 'COMPLETED')
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/Error'
+              examples:
+                NotCompleted:
+                  value:
+                    error:
+                      code: job_not_completed
+                      message: 'Job 550e8400-e29b-41d4-a716-446655440000 is ONNX; result only available after completion'
+                      details:
+                        current_status: ONNX
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+        '410':
+          description: The NEF has expired (removed by the converter MinIO lifecycle / expires_at < now)
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/Error'
+              examples:
+                Expired:
+                  value:
+                    error:
+                      code: result_expired
+                      message: 'Job 550e8400-e29b-41d4-a716-446655440000 result expired at 2026-05-10T00:00:00Z; re-convert to get a fresh result'
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+        '429':
+          description: |
+            Rate limit or bandwidth quota exceeded.
+            - `rate_limit_exceeded`: request-based limit exceeded (`limit_type: burst | sustained`)
+            - `bandwidth_quota_exceeded`: bandwidth quota exceeded (`limit_type: bandwidth_hourly | bandwidth_daily`)
+          headers:
+            Retry-After:
+              schema:
+                type: integer
+              description: Suggested retry delay (seconds)
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/Error'
+              examples:
+                BurstLimitHit:
+                  value:
+                    error:
+                      code: rate_limit_exceeded
+                      message: Too many requests; please try again later
+                      details:
+                        limit_type: burst
+                        retry_after_seconds: 10
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+                BandwidthQuotaHit:
+                  value:
+                    error:
+                      code: bandwidth_quota_exceeded
+                      message: Download quota exhausted; please try again later
+                      details:
+                        limit_type: bandwidth_hourly
+                        retry_after_seconds: 2847
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+        '502':
+          description: MinIO temporarily unavailable (5xx / connection error)
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/Error'
+              examples:
+                StorageUnavailable:
+                  value:
+                    error:
+                      code: storage_unavailable
+                      message: Unable to read the result file; please retry later
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
+        '503':
+          description: |
+            - `service_busy`: concurrent streams at the limit (concurrent cap)
+            - `stream_timeout`: 5-minute response stream timeout
+            - `service_unavailable`: CONVERTER_API_KEY not configured
+          headers:
+            Retry-After:
+              schema:
+                type: integer
+          content:
+            application/json:
+              schema:
+                $ref: '#/components/schemas/Error'
+              examples:
+                ServiceBusy:
+                  value:
+                    error:
+                      code: service_busy
+                      message: Server is busy; please try again later
+                      details:
+                        limit_type: concurrent
+                        retry_after_seconds: 30
+                      request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff
 
   # ===========================================================================
   # Phase 2 reserved endpoints: always 501 not_implemented
@@ -954,7 +1171,7 @@ paths:
         In Phase 1 it always returns 501 `not_implemented`.
       deprecated: false
       security:
-        - BearerAuth: [converter:job.read]
+        - ApiKeyAuth: []
       responses:
         '501':
           $ref: '#/components/responses/NotImplemented'
@@ -966,36 +1183,29 @@ paths:
 
 components:
   securitySchemes:
-    BearerAuth:
+    ApiKeyAuth:
       type: http
       scheme: bearer
-      bearerFormat:
JWT + bearerFormat: APIKey description: | - OAuth 2.0 Bearer JWT,由 Innovedus Member Center 簽發。 + Phase 0.8b 起,visionA → converter 採 pre-shared API key 認證(1:1 internal trust)。 - - `iss`:與 `MEMBER_CENTER_ISSUER` env 相符 - - `aud`:含 `kneron_converter_api`(或 env `KNERON_CONVERTER_AUDIENCE` 值) - - `exp`:未過期(含 60 秒 clock skew) - - `scope`:空白分隔字串,必須包含端點要求的 scope - - `client_id`:必須有(用於識別與 rate limit) - - `tenant_id`(可選):若 server 設了 `CONVERTER_TENANT_ID`,須吻合 + - Header 格式:`Authorization: Bearer ` + - Key 為 64 hex chars(256 bits 熵),由 `openssl rand -hex 32` 產生 + - visionA 與 converter 兩端用**完全相同字串**(兩端 env 各自設定) + - 不分 read/write scope(API key 即「caller 是 visionA」的完整證明) + - 不檢查 issuer / audience / tenant / expiration + - Server 端用 `crypto.timingSafeEqual` constant-time compare 防 timing attack - 建議消費者使用 `client_credentials` grant(見下方 `OAuth2ClientCredentials` scheme, - 提供 SDK generator 自動處理 token 取得與 refresh)。 + 失敗: + - 缺 Authorization / 非 Bearer 格式 / token 為空 / key 不符 → 401 `invalid_token` + - server `CONVERTER_API_KEY` env 未設定 → 503 `service_unavailable`(fail-secure) - OAuth2ClientCredentials: - type: oauth2 - description: | - OAuth 2.0 client credentials grant — 用於服務間(VisionA → Converter)認證。 - Member Center 簽發 JWT,Converter 透過 JWKS 驗簽(`MEMBER_CENTER_JWKS_URL`)。 + 詳見 `docs/autoflow/04-architecture/auth.md` §1 + §4。 - Token endpoint URL 由部署環境決定,請查 `MEMBER_CENTER_TOKEN_URL` env 或 ops 文件。 - flows: - clientCredentials: - tokenUrl: https://member-center.example.com/oauth/token - scopes: - 'converter:job.write': 建立 / 修改 job(POST /jobs, POST /jobs/:id/promote) - 'converter:job.read': 查詢 job(GET /jobs, GET /jobs/:id) + > **歷史**:原先設計用 OAuth Bearer JWT(`BearerAuth` + `OAuth2ClientCredentials` + > schemes),詳見 visionA repo `ADR-014` / `ADR-015` v2.1。Phase 0.8b 改 API key 後 + > 兩個 OAuth scheme 已從本 spec 移除。 parameters: @@ -1041,46 +1251,45 @@ components: responses: Unauthorized: - description: Token 無效 / 過期 / 簽章錯 + description: | + API key 不符 / 缺 Authorization header / 非 Bearer 格式 / token 為空。
+ + Phase 0.8b 起,所有 401 一律為 `invalid_token`(沒有 `token_expired` — + API key 無過期概念;沒有 `insufficient_scope` / `tenant_mismatch` — + API key 不分 scope / tenant)。 content: application/json: schema: $ref: '#/components/schemas/ApiError' examples: - invalid_token: + missing_header: + summary: 缺 Authorization header value: error: code: invalid_token - message: Token 無效或已過期 + message: 缺少或格式錯誤的 Authorization header(需為 Bearer ) request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff - token_expired: + wrong_key: + summary: API key 不符 value: error: - code: token_expired - message: Token 已過期 + code: invalid_token + message: API key 驗證失敗 request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff - Forbidden: - description: scope 不足 / tenant 不符 + ServiceUnavailable: + description: | + Server 端 `CONVERTER_API_KEY` env 未設定(fail-secure)。設好 env 重啟即可。 content: application/json: schema: $ref: '#/components/schemas/ApiError' examples: - insufficient_scope: + api_key_not_configured: value: error: - code: insufficient_scope - message: token 缺少必要權限 - details: - required_scope: converter:job.write - provided_scopes: [converter:job.read] - request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff - tenant_mismatch: - value: - error: - code: tenant_mismatch - message: tenant_id 與 Converter 配置不符 + code: service_unavailable + message: API key not configured request_id: 7c6e4f3b-1a2b-4c3d-9e8f-aabbccddeeff JobNotFound: @@ -1155,9 +1364,6 @@ components: - validation_error - invalid_multipart - invalid_token - - token_expired - - insufficient_scope - - tenant_mismatch - job_not_found - not_found - user_has_active_job @@ -1170,6 +1376,7 @@ components: - file_gateway_unavailable - auth_service_unavailable - service_busy + - service_unavailable - rate_limit_exceeded - internal_error - not_implemented diff --git a/apps/task-scheduler/env.example b/apps/task-scheduler/env.example index bc6331a..4b4e26c 100644 --- a/apps/task-scheduler/env.example +++ b/apps/task-scheduler/env.example @@ -1,5 +1,5 @@ 
############################################################################### -# Task Scheduler 環境變數範本(Phase 1 完整版,T10 收斂) +# Task Scheduler 環境變數範本(Phase 0.8b 收斂版) # # 三類分區(依顯示順序): # 1. 必填(production 必須設真實值)— 缺漏會 fail-fast,process exit code 1 @@ -10,6 +10,15 @@ # - 切勿 commit `.env`(已在 .gitignore;歷史 commit 待 D7 處理) # - production 用 secret manager(Vault / AWS Secrets Manager),不要直接設環境變數 # - 任何含 `REPLACE-ME` 字樣或 `.invalid` TLD 的值,部署前必須替換 +# +# Phase 0.8b A4 砍除(visionA → converter 改用 API key、不再走 OAuth/JWKS): +# - MEMBER_CENTER_ISSUER / MEMBER_CENTER_JWKS_URL +# - KNERON_CONVERTER_AUDIENCE / CONVERTER_TENANT_ID +# - CONVERTER_SCOPE_WRITE / CONVERTER_SCOPE_READ +# - JWKS_CACHE_MAX_AGE_MS / JWKS_COOLDOWN_MS / JWT_CLOCK_TOLERANCE_SEC +# converter → FAA OAuth client 鏈條保留不動(MEMBER_CENTER_TOKEN_URL / +# KNERON_CONVERTER_CLIENT_* / FILE_ACCESS_AGENT_* / OAUTH_TOKEN_*),那條與 +# visionA → converter 對外認證無關。 ############################################################################### @@ -79,47 +88,61 @@ MINIO_LIFECYCLE_DAYS=7 # ============================================================================= -# 7. OAuth / Member Center(必填) +# 7. 
visionA → converter API Key 認證(Phase 0.8b 必填) # ============================================================================= # -# ⚠️ 下方 `*.invalid` 主機名都是 RFC 2606 保留 TLD,DNS 永不解析。 -# 本地開發跑「不需 OAuth 的 legacy /jobs 流程」可直接照抄; -# production 部署前務必替換為真實 Member Center URL,否則 token 驗證 / 取得會 DNS 失敗。 +# visionA-backend ↔ converter 為 1:1 internal trust,採 pre-shared API key +# 取代 OAuth client_credentials(ADR-015 v2.1)。converter 端只需這一把 key, +# 無 issuer / audience / scope / tenant 概念。 # -# 三組 URL 通常來自同一個 Member Center 服務: -# - ISSUER:JWT 的 iss claim 比對基準 -# - JWKS_URL:取公鑰用,做 JWT 簽章驗證 -# - TOKEN_URL:Converter 自己取 token 用(client_credentials grant) +# 產生: +# $ openssl rand -hex 32 +# # 輸出 64 hex chars(32 bytes = 256 bits 熵),對齊 NIST SP 800-131A +# +# 部署: +# - converter `.env`:CONVERTER_API_KEY=<產出字串> +# - visionA `.env.stage`:VISIONA_CONVERTER_API_KEY=<相同字串> +# - **兩端必須完全相同**;rotate 時雙端同時更新 +# +# 安全: +# - 絕不 commit 進 git(`.env` 已在 .gitignore) +# - 絕不寫入 log / Slack / email / 對話 +# - 每個環境獨立 key(dev / stage / prod 各自 `openssl rand -hex 32`) +# - 詳見 docs/autoflow/04-architecture/auth.md §1 + §4 +# +# 為什麼是 optional(不 requireEnv): +# - dev 環境可能還沒設 key(local 跑 legacy Web UI 路徑也應該能啟動) +# - apiKeyMiddleware 自己會做 fail-fast(env 未設時對外 API 一律回 503) +# - 啟動時會印 warn log 提醒,避免無聲問題 + +CONVERTER_API_KEY= + + +# ============================================================================= +# 8.
Member Center Token Endpoint(converter → FAA OAuth client 用,保留) +# ============================================================================= +# +# 給 oauthClient.js 取 service token 用(promote 流程把結果檔 PUT 到 FAA 之前要 +# 先用 client_credentials grant 取 token)。visionA → converter 路線**不再經此**。 +# +# ⚠️ `.invalid` 為 RFC 2606 保留 TLD,DNS 永不解析。本地開發跑「不需 promote 的 +# legacy /jobs 流程」可直接照抄;production 部署前務必替換為真實 Member Center +# URL,否則 promote 階段取 token 會 DNS 失敗。 -MEMBER_CENTER_ISSUER=https://auth.example.invalid -MEMBER_CENTER_JWKS_URL=https://auth.example.invalid/.well-known/jwks MEMBER_CENTER_TOKEN_URL=https://auth.example.invalid/oauth/token # ============================================================================= -# 8. Converter 身份(必填) +# 9. Converter OAuth Client 身份(converter → FAA 用,保留) # ============================================================================= # -# Converter 同時是: -# - Resource Server:接收 visionA-backend 的 token,audience 必須為 KNERON_CONVERTER_AUDIENCE -# - OAuth Client:自己去 Member Center 取 token 打 File Access Agent;身份用 client_id / secret +# Converter 在 promote 階段以 OAuth client 身份去 Member Center 取 +# `files:upload.write` scope token,再 PUT 到 File Access Agent。 +# 兩者必須成對出現。 -KNERON_CONVERTER_AUDIENCE=kneron_converter_api KNERON_CONVERTER_CLIENT_ID=kneron_converter_dev KNERON_CONVERTER_CLIENT_SECRET=REPLACE-ME-IN-PRODUCTION -# 若需 tenant 隔離,設此值;空字串代表不檢查 tenant claim -CONVERTER_TENANT_ID= - - -# ============================================================================= -# 9. Scope 命名(可選,預設對齊 TDD §8) -# ============================================================================= -# 通常不需改;除非 Member Center 端命名不一樣 - -# CONVERTER_SCOPE_WRITE=converter:job.write -# CONVERTER_SCOPE_READ=converter:job.read - # ============================================================================= # 10. 
File Access Agent(必填) @@ -143,17 +166,7 @@ FILE_ACCESS_AGENT_AUDIENCE=file_access_api # ============================================================================= -# 12. JWKS / JWT 行為(可選) -# ============================================================================= -# 預設值對齊 TDD §5.1。 - -# JWKS_CACHE_MAX_AGE_MS=600000 # JWKS cache 有效期(10 分鐘) -# JWKS_COOLDOWN_MS=30000 # 同 kid 連續 miss 的 cooldown(30 秒) -# JWT_CLOCK_TOLERANCE_SEC=60 # 時鐘偏差容忍(秒) - - -# ============================================================================= -# 13. OAuth Client cache(可選) +# 12. OAuth Client cache(converter → FAA 用,可選) # ============================================================================= # OAUTH_TOKEN_REFRESH_SKEW_MS=60000 # token 距 expiresAt 還剩多少 ms 主動 refresh @@ -161,7 +174,7 @@ FILE_ACCESS_AGENT_AUDIENCE=file_access_api # ============================================================================= -# 14. Multipart 上傳上限(可選,T10 修 D5) +# 13. Multipart 上傳上限(可選,T10 修 D5) # ============================================================================= # # 為什麼用 env: @@ -176,7 +189,7 @@ FILE_ACCESS_AGENT_AUDIENCE=file_access_api # ============================================================================= -# 15. Upload concurrency(可選,T10 修 D5) +# 14. Upload concurrency(可選,T10 修 D5) # ============================================================================= # # 為什麼需要: @@ -191,10 +204,81 @@ FILE_ACCESS_AGENT_AUDIENCE=file_access_api # ============================================================================= -# 16. Per-client_id rate limit(可選,T3 起) +# 15. 
Per-client_id rate limit(可選,T3 起) # ============================================================================= # 對 /api/v1/* 套用,window 內每個 client_id 最多 max 個 request。 # 預設 5min / 300 req(對齊 TDD §1.1)。 +# +# Phase 0.8b A3 後,clientId 在 API key 路線下寫死為 `'visionA-service'`; +# rate limit 仍然套用、實質為 visionA → converter 整體上限。 # API_V1_RATE_LIMIT_WINDOW_MS=300000 # API_V1_RATE_LIMIT_MAX=300 + + +# ============================================================================= +# 16. Trust Proxy(可選,Phase 0.8b A7) +# ============================================================================= +# 給 Express `app.set('trust proxy', ...)` 用,影響 `req.ip` 取得真實 caller IP。 +# 直接影響 audit log 中 `source_ip` 欄位的 forensic 價值。 +# +# 為什麼需要: +# converter 跑在 Nginx / cloud LB 後面時,Node 看到的 remote address 永遠是 +# 反代理的內網 IP;必須信任 `X-Forwarded-For` header 才能取真實 caller IP。 +# +# ⚠️ 安全提醒(極度重要): +# - 設過寬(如 `true`)→ attacker 可偽造 `X-Forwarded-For` 欺騙 audit log +# - 設過嚴(如 stage / prod 留 'loopback')→ source_ip 永遠是反代理 IP、forensic 失效 +# - 必須跟實際部署架構一致;不確定時請問 DevOps / SRE +# +# 部署建議: +# - local dev / 測試環境:留空(預設 `loopback`,只信任 127.0.0.1 / ::1) +# - stage / prod(前面 1 層 Nginx):TRUST_PROXY=1 +# - stage / prod(cloud LB + Nginx 兩層):TRUST_PROXY=2 +# - 明確 CIDR:TRUST_PROXY="loopback, 10.0.0.0/8" +# +# 接受值:boolean(`true` / `false`)/ 整數 / 字串(含 'loopback' / 'linklocal' +# / 'uniquelocal' / CIDR)。詳見 Express 文件: +# https://expressjs.com/en/guide/behind-proxies.html + +# TRUST_PROXY=1 + + +# ============================================================================= +# 17. 
GET /api/v1/jobs/:id/result(Phase 0.8b Phase B 新增) +# ============================================================================= +# +# `/result` 端點給 visionA-backend streaming proxy NEF binary。所有限制 +# 都以 `token_fingerprint`(sha256(api_key).slice(0,12))為 bucket key。 +# +# 設計:docs/autoflow/04-architecture/api/api-result.md §9 / §15 +# +# 為什麼 single instance 部署可接受 in-memory counter: +# - Phase 0.8b 部署是單 Node process;counter 在 process 內 atomic +# - Phase 2 多 instance 部署前必切 Redis(否則 limit 會被「乘以 instance 數」 +# 放鬆);見 security.md 候補 #8(HIGH) + +# Stream response timeout(AC-7)— 預設 5 分鐘(300_000 ms) +# 5 min 最低 throughput ≈ 1.7 MB/s;合法 client 即使中等網路也能拿完 500MB +# RESULT_STREAM_TIMEOUT_MS=300000 + +# Concurrent stream cap(AC-4)— 預設 10 個同時 stream(per-instance) +# 超過 → 503 service_busy + Retry-After: 30 +# 平衡:normal load P95 < 5、留 2× headroom;blast radius 可控 +# MAX_CONCURRENT_RESULT_STREAMS=10 + +# Burst rate limit(AC-2)— 預設 5 req / 10 sec(per token_fingerprint) +# 阻擋短時間 burst 攻擊;允許 visionA retry pattern +# RESULT_RATE_LIMIT_BURST_PER_10S=5 +# RESULT_RATE_LIMIT_BURST_WINDOW_MS=10000 + +# Sustained rate limit(AC-2)— 預設 20 req / 1 min(per token_fingerprint) +# 涵蓋 visionA P95 normal load + 1.7× headroom;阻擋持續 mass request +# RESULT_RATE_LIMIT_SUSTAINED_PER_MIN=20 +# RESULT_RATE_LIMIT_SUSTAINED_WINDOW_MS=60000 + +# Bandwidth quota(AC-3)— 預設 1 GB / hr + 6 GB / 24hr(per token_fingerprint) +# 阻擋 attacker 用「剛好踩線 req count」配大檔的 bandwidth abuse +# 1 GB = 1073741824 / 6 GB = 6442450944 +# RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES=1073741824 +# RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES=6442450944 diff --git a/apps/task-scheduler/package-lock.json b/apps/task-scheduler/package-lock.json index f3ebd95..555ec00 100644 --- a/apps/task-scheduler/package-lock.json +++ b/apps/task-scheduler/package-lock.json @@ -856,13 +856,14 @@ } }, "node_modules/@aws-sdk/xml-builder": { - "version": "3.972.16", - "resolved": "https://registry.npmjs.org/@aws-sdk/xml-builder/-/xml-builder-3.972.16.tgz", - "integrity": 
"sha512-iu2pyvaqmeatIJLURLqx9D+4jKAdTH20ntzB6BFwjyN7V960r4jK32mx0Zf7YbtOYAbmbtQfDNuL60ONinyw7A==", + "version": "3.972.24", + "resolved": "https://registry.npmjs.org/@aws-sdk/xml-builder/-/xml-builder-3.972.24.tgz", + "integrity": "sha512-V8z5YcDPfsvzrBlj0xR1vhRtocblhYbqdreCJB/voGd4Sr5zjNAeWxexbnqVtskTJe0vFb5KMqbSL++ePl+zRw==", "license": "Apache-2.0", "dependencies": { - "@smithy/types": "^4.13.1", - "fast-xml-parser": "5.5.8", + "@nodable/entities": "2.1.0", + "@smithy/types": "^4.14.1", + "fast-xml-parser": "5.7.3", "tslib": "^2.6.2" }, "engines": { @@ -1799,6 +1800,18 @@ "@jridgewell/sourcemap-codec": "^1.4.14" } }, + "node_modules/@nodable/entities": { + "version": "2.1.0", + "resolved": "https://registry.npmjs.org/@nodable/entities/-/entities-2.1.0.tgz", + "integrity": "sha512-nyT7T3nbMyBI/lvr6L5TyWbFJAI9FTgVRakNoBqCD+PmID8DzFrrNdLLtHMwMszOtqZa8PAOV24ZqDnQrhQINA==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/nodable" + } + ], + "license": "MIT" + }, "node_modules/@sinclair/typebox": { "version": "0.27.10", "resolved": "https://registry.npmjs.org/@sinclair/typebox/-/typebox-0.27.10.tgz", @@ -2301,9 +2314,9 @@ } }, "node_modules/@smithy/types": { - "version": "4.13.1", - "resolved": "https://registry.npmjs.org/@smithy/types/-/types-4.13.1.tgz", - "integrity": "sha512-787F3yzE2UiJIQ+wYW1CVg2odHjmaWLGksnKQHUrK/lYZSEcy1msuLVvxaR/sI2/aDe9U+TBuLsXnr3vod1g0g==", + "version": "4.14.2", + "resolved": "https://registry.npmjs.org/@smithy/types/-/types-4.14.2.tgz", + "integrity": "sha512-P+otAxbV4CqBybp7EkcJCrig63yE2E7PuNVOmilVMRcx/O+QDzGULTrKsq4DV13gSfak9ObPrWaHl/9bL5YcWw==", "license": "Apache-2.0", "dependencies": { "tslib": "^2.6.2" @@ -3768,9 +3781,9 @@ "license": "MIT" }, "node_modules/fast-xml-builder": { - "version": "1.1.4", - "resolved": "https://registry.npmjs.org/fast-xml-builder/-/fast-xml-builder-1.1.4.tgz", - "integrity": "sha512-f2jhpN4Eccy0/Uz9csxh3Nu6q4ErKxf0XIsasomfOihuSUa3/xw6w8dnOtCDgEItQFJG8KyXPzQXzcODDrrbOg==", + 
"version": "1.2.0", + "resolved": "https://registry.npmjs.org/fast-xml-builder/-/fast-xml-builder-1.2.0.tgz", + "integrity": "sha512-00aAWieqff+ZJhsXA4g1g7M8k+7AYoMUUHF+/zFb5U6Uv/P0Vl4QZo84/IcufzYalLuEj9928bXN9PbbFzMF0Q==", "funding": [ { "type": "github", @@ -3779,13 +3792,14 @@ ], "license": "MIT", "dependencies": { - "path-expression-matcher": "^1.1.3" + "path-expression-matcher": "^1.5.0", + "xml-naming": "^0.1.0" } }, "node_modules/fast-xml-parser": { - "version": "5.5.8", - "resolved": "https://registry.npmjs.org/fast-xml-parser/-/fast-xml-parser-5.5.8.tgz", - "integrity": "sha512-Z7Fh2nVQSb2d+poDViM063ix2ZGt9jmY1nWhPfHBOK2Hgnb/OW3P4Et3P/81SEej0J7QbWtJqxO05h8QYfK7LQ==", + "version": "5.7.3", + "resolved": "https://registry.npmjs.org/fast-xml-parser/-/fast-xml-parser-5.7.3.tgz", + "integrity": "sha512-C0AaNuC+mscy6vrAQKAc/rMq+zAPHodfHGZu4sGVehvAQt/JLG1O5zEcYcXSY5zSqr4YVgxsB+pHXTq0i7eDlg==", "funding": [ { "type": "github", @@ -3794,9 +3808,10 @@ ], "license": "MIT", "dependencies": { - "fast-xml-builder": "^1.1.4", - "path-expression-matcher": "^1.2.0", - "strnum": "^2.2.0" + "@nodable/entities": "^2.1.0", + "fast-xml-builder": "^1.1.7", + "path-expression-matcher": "^1.5.0", + "strnum": "^2.2.3" }, "bin": { "fxparser": "src/cli/cli.js" @@ -5767,9 +5782,9 @@ } }, "node_modules/path-expression-matcher": { - "version": "1.2.0", - "resolved": "https://registry.npmjs.org/path-expression-matcher/-/path-expression-matcher-1.2.0.tgz", - "integrity": "sha512-DwmPWeFn+tq7TiyJ2CxezCAirXjFxvaiD03npak3cRjlP9+OjTmSy1EpIrEbh+l6JgUundniloMLDQ/6VTdhLQ==", + "version": "1.5.0", + "resolved": "https://registry.npmjs.org/path-expression-matcher/-/path-expression-matcher-1.5.0.tgz", + "integrity": "sha512-cbrerZV+6rvdQrrD+iGMcZFEiiSrbv9Tfdkvnusy6y0x0GKBXREFg/Y65GhIfm0tnLntThhzCnfKwp1WRjeCyQ==", "funding": [ { "type": "github", @@ -6494,9 +6509,9 @@ } }, "node_modules/strnum": { - "version": "2.2.2", - "resolved": "https://registry.npmjs.org/strnum/-/strnum-2.2.2.tgz", - 
"integrity": "sha512-DnR90I+jtXNSTXWdwrEy9FakW7UX+qUZg28gj5fk2vxxl7uS/3bpI4fjFYVmdK9etptYBPNkpahuQnEwhwECqA==", + "version": "2.3.0", + "resolved": "https://registry.npmjs.org/strnum/-/strnum-2.3.0.tgz", + "integrity": "sha512-ums3KNd42PGyx5xaoVTO1mjU1bH3NpY4vsrVlnv9PNGqQj8wd7rJ6nEypLrJ7z5vxK5RP0yMLo6J/Gsm62DI5Q==", "funding": [ { "type": "github", @@ -6804,6 +6819,21 @@ "node": "^12.13.0 || ^14.15.0 || >=16.0.0" } }, + "node_modules/xml-naming": { + "version": "0.1.0", + "resolved": "https://registry.npmjs.org/xml-naming/-/xml-naming-0.1.0.tgz", + "integrity": "sha512-k8KO9hrMyNk6tUWqUfkTEZbezRRpONVOzUTnc97VnCvyj6Tf9lyUR9EDAIeiVLv56jsMcoXEwjW8Kv5yPY52lw==", + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/NaturalIntelligence" + } + ], + "license": "MIT", + "engines": { + "node": ">=16.0.0" + } + }, "node_modules/xtend": { "version": "4.0.2", "resolved": "https://registry.npmjs.org/xtend/-/xtend-4.0.2.tgz", diff --git a/apps/task-scheduler/src/__tests__/config.test.js b/apps/task-scheduler/src/__tests__/config.test.js index 21ce740..2eb917c 100644 --- a/apps/task-scheduler/src/__tests__/config.test.js +++ b/apps/task-scheduler/src/__tests__/config.test.js @@ -17,10 +17,11 @@ const ENV_KEYS_TO_BACKUP = [ // 必填(缺漏 throw)— 測試前必須補齊 - 'MEMBER_CENTER_ISSUER', - 'MEMBER_CENTER_JWKS_URL', + // Phase 0.8b A4 砍除:MEMBER_CENTER_ISSUER / MEMBER_CENTER_JWKS_URL / + // KNERON_CONVERTER_AUDIENCE(visionA → converter 不再走 OAuth/JWKS) + // 保留:MEMBER_CENTER_TOKEN_URL / KNERON_CONVERTER_CLIENT_ID/SECRET(oauthClient + // for converter → FAA 仍需) 'MEMBER_CENTER_TOKEN_URL', - 'KNERON_CONVERTER_AUDIENCE', 'KNERON_CONVERTER_CLIENT_ID', 'KNERON_CONVERTER_CLIENT_SECRET', 'FILE_ACCESS_AGENT_BASE_URL', @@ -32,11 +33,12 @@ const ENV_KEYS_TO_BACKUP = [ 'MAX_CONCURRENT_UPLOADS', 'UPLOAD_RETRY_AFTER_SECONDS', // 其他 optional - 'CONVERTER_TENANT_ID', - 'CONVERTER_SCOPE_WRITE', - 'CONVERTER_SCOPE_READ', + // Phase 0.8b A4 砍除:CONVERTER_TENANT_ID / CONVERTER_SCOPE_* / JWKS_* / 
+ // JWT_CLOCK_TOLERANCE_SEC(visionA → converter 已不做 tenant / scope check) 'PROMOTE_TIMEOUT_MS', 'NODE_ENV', + // Phase 0.8b A2 新增 + 'CONVERTER_API_KEY', ]; let backedUpEnv = {}; @@ -61,10 +63,8 @@ function restoreEnv() { function setMinimumValidEnv() { // 滿足必填 — 用 .invalid placeholder(DNS 不解析,安全) - process.env.MEMBER_CENTER_ISSUER = 'https://auth.test.invalid'; - process.env.MEMBER_CENTER_JWKS_URL = 'https://auth.test.invalid/.well-known/jwks'; + // Phase 0.8b A4:砍除 ISSUER / JWKS_URL / CONVERTER_AUDIENCE(不再讀) process.env.MEMBER_CENTER_TOKEN_URL = 'https://auth.test.invalid/oauth/token'; - process.env.KNERON_CONVERTER_AUDIENCE = 'kneron_converter_api'; process.env.KNERON_CONVERTER_CLIENT_ID = 'kneron_converter_test'; process.env.KNERON_CONVERTER_CLIENT_SECRET = 'test-secret'; process.env.FILE_ACCESS_AGENT_BASE_URL = 'https://files.test.invalid'; @@ -187,10 +187,54 @@ describe('config — multipart object is frozen', () => { }); }); +describe('config — converter.apiKey (Phase 0.8b)', () => { + // 抑制 warn / info log,避免測試輸出被結構化 log 蓋掉 + let _origWarn; + let _origLog; + beforeAll(() => { + _origWarn = console.warn; + _origLog = console.log; + console.warn = () => {}; + console.log = () => {}; + }); + afterAll(() => { + console.warn = _origWarn; + console.log = _origLog; + }); + + it('defaults to empty string when CONVERTER_API_KEY env not set', () => { + // 不設 env — 看預設行為(warn-only,不 throw) + const cfg = loadConfigFresh(); + expect(cfg.converter.apiKey).toBe(''); + }); + + it('reads CONVERTER_API_KEY from env when set', () => { + process.env.CONVERTER_API_KEY = + 'a3f9b2c1d8e7f6a5b4c3d2e1f0987654321fedcba9876543210abcdef1234567'; + const cfg = loadConfigFresh(); + expect(cfg.converter.apiKey).toBe( + 'a3f9b2c1d8e7f6a5b4c3d2e1f0987654321fedcba9876543210abcdef1234567' + ); + }); + + it('trims whitespace from CONVERTER_API_KEY', () => { + process.env.CONVERTER_API_KEY = ' secretkey123 '; + const cfg = loadConfigFresh(); + expect(cfg.converter.apiKey).toBe('secretkey123'); + 
}); + + it('does not throw when CONVERTER_API_KEY is empty (warn-only)', () => { + delete process.env.CONVERTER_API_KEY; + expect(() => loadConfigFresh()).not.toThrow(); + }); +}); + describe('config — fail fast on missing required env (regression check)', () => { - it('throws when MEMBER_CENTER_ISSUER missing', () => { - delete process.env.MEMBER_CENTER_ISSUER; - expect(() => loadConfigFresh()).toThrow(/MEMBER_CENTER_ISSUER/); + // Phase 0.8b A4:MEMBER_CENTER_ISSUER 已被砍除(不再驗 JWKS) + // 改測 MEMBER_CENTER_TOKEN_URL(oauthClient 用、仍必填)作為 regression check + it('throws when MEMBER_CENTER_TOKEN_URL missing', () => { + delete process.env.MEMBER_CENTER_TOKEN_URL; + expect(() => loadConfigFresh()).toThrow(/MEMBER_CENTER_TOKEN_URL/); }); it('throws when KNERON_CONVERTER_CLIENT_SECRET missing', () => { diff --git a/apps/task-scheduler/src/app.js b/apps/task-scheduler/src/app.js index 5378fa8..588f0d9 100644 --- a/apps/task-scheduler/src/app.js +++ b/apps/task-scheduler/src/app.js @@ -60,9 +60,35 @@ function createApp(deps, opts) { const app = express(); + // Phase 0.8b A7:trust proxy — 影響 `req.ip` 取真實 caller IP(給 audit log forensic 用)。 + // + // 設計:從 opts.config.trustProxy 取(config.js 已從 TRUST_PROXY env 讀取並 normalize), + // 缺漏 fallback 'loopback'(最安全的預設、只信任 localhost)。 + // + // ⚠️ 安全 trade-off:設過寬(如 `true`)會讓 attacker 可偽造 X-Forwarded-For 欺騙 + // audit log。每個部署環境必須與實際 hop 數一致。 + const trustProxyValue = + opts && opts.config && Object.prototype.hasOwnProperty.call(opts.config, 'trustProxy') + ? opts.config.trustProxy + : 'loopback'; + app.set('trust proxy', trustProxyValue); + + // 啟動時印一行 INFO log,讓 ops 能在啟動 log 確認當前 trust proxy 設定(不含敏感資訊)。 + // eslint-disable-next-line no-console + console.log( + JSON.stringify({ + level: 'INFO', + service: 'task-scheduler', + action: 'app.trust_proxy_configured', + // 注意:值本身不算 secret(部署架構描述),可直接印出方便除錯 + trust_proxy: typeof trustProxyValue === 'string' ? 
trustProxyValue : String(trustProxyValue), + timestamp: new Date().toISOString(), + }) + ); + app.use(helmet()); // T3:requestId 必須早於所有需要 log 或回 error response 的 middleware, - // 確保 morgan / errorHandler / requireAuth 都能拿到 req.requestId。 + // 確保 morgan / errorHandler / requireApiKey 都能拿到 req.requestId。 app.use(requestIdMiddleware); app.use(compression()); app.use(morgan('short')); diff --git a/apps/task-scheduler/src/auth/__tests__/apiKeyMiddleware.test.js b/apps/task-scheduler/src/auth/__tests__/apiKeyMiddleware.test.js new file mode 100644 index 0000000..73c1113 --- /dev/null +++ b/apps/task-scheduler/src/auth/__tests__/apiKeyMiddleware.test.js @@ -0,0 +1,874 @@ +/** + * Unit tests for src/auth/apiKeyMiddleware.js + * + * Phase 0.8b A1 寫 20 case smoke、A6 補完成完整套件: + * 1. Happy path(A1 / 維持) + * 2. 驗證失敗各情境(A1 / 維持) + * 3. fail-fast:env 未設定(A1 / 維持) + * 4. _internals smoke(A1 / 維持) + * + * A6 新增: + * 5. 邊界:特殊字元(/ = unicode)+ 超長 key 不 DoS + * 6. 邊界:Authorization header 為陣列(多值)→ Express 行為(取第一個或拒絕) + * 7. error path:sendApiKeyError 在 headersSent 情境的 destroy 行為 + * 8. Log 不洩漏:spy console.error/log 確認任何 path 都沒印出 expected key 或 token + * 9. 
Timing (sanity check): run N wrong keys (different prefix / different suffix / entirely different) + * and require the median elapsed times to stay within 5x of each other (a loose sanity check, not a strict timing test) + * + * Test strategy: + * - inject via deps.expectedApiKey instead of touching config.js (avoids env side effects) + * - reuse the makeReqResNext fixture from middleware.test.js (a simplified req/res mock) + * - silence the middleware's error logs (they are expected when verification fails) + */ + +'use strict'; + +const apiKeyModule = require('../apiKeyMiddleware'); + +// ---------------------------------------------------------------------------- +// Shared fixtures +// ---------------------------------------------------------------------------- + +const TEST_API_KEY = 'a3f9b2c1d8e7f6a5b4c3d2e1f0987654321fedcba9876543210abcdef1234567'; +const WRONG_API_KEY = 'wrongwrongwrongwrongwrongwrongwrongwrongwrongwrongwrongwrongwron'; + +// Silence the middleware's error logs (the 503 fail-fast / unexpected_error logs are expected; +// assertions check the status code, not the log content) +// Silence the middleware's audit logs (A7: both success and failure paths print structured INFO/WARN logs; +// that output is expected on the happy path but clutters jest stdout; tests that need a spy install and restore their own) +let _origError; +let _origLog; +beforeAll(() => { + _origError = console.error; + _origLog = console.log; + console.error = () => {}; + console.log = () => {}; +}); +afterAll(() => { + console.error = _origError; + console.log = _origLog; +}); + +/** + * Build a simplified req / res / next triple (mirrors makeReqResNext in middleware.test.js). + */ +function makeReqResNext(authHeader, opts = {}) { + const socket = { + destroyed: false, + destroy: jest.fn(function destroyImpl() { + socket.destroyed = true; + }), + }; + const req = { + headers: authHeader === undefined ?
{} : { authorization: authHeader }, + socket, + requestId: 'req-test-api-key-001', + ...opts.reqExtra, + }; + + const headers = {}; + const finishListeners = []; + const res = { + headersSent: opts.headersSentInitially === true, + statusCode: 200, + body: null, + setHeader: jest.fn((k, v) => { + headers[k] = v; + }), + getHeader: (k) => headers[k], + status: jest.fn(function statusImpl(code) { + res.statusCode = code; + return res; + }), + json: jest.fn(function jsonImpl(body) { + res.body = body; + res.headersSent = true; + // Simulate the 'finish' event (fired asynchronously, on the next microtask) + Promise.resolve().then(() => { + for (const l of finishListeners.splice(0)) { + try { + l(); + } catch (_) { + /* noop */ + } + } + }); + return res; + }), + once: jest.fn((evt, cb) => { + if (evt === 'finish') finishListeners.push(cb); + }), + on: jest.fn((evt, cb) => { + if (evt === 'finish') finishListeners.push(cb); + }), + _flush: () => new Promise((resolve) => setImmediate(resolve)), + }; + + const next = jest.fn(); + return { req, res, next, socket, headers }; +} + +// ---------------------------------------------------------------------------- +// Tests — requireApiKey: happy path +// ---------------------------------------------------------------------------- + +describe('requireApiKey — happy path', () => { + it('should call next() and set req.auth on correct API key', () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next, socket } = makeReqResNext(`Bearer ${TEST_API_KEY}`); + + middleware(req, res, next); + + expect(next).toHaveBeenCalledTimes(1); + expect(socket.destroy).not.toHaveBeenCalled(); + expect(res.status).not.toHaveBeenCalled(); + expect(req.auth).toEqual({ + sub: 'visionA-service', + clientId: 'visionA-service', + tenantId: null, + scopes: ['converter:job.write', 'converter:job.read'], + // Phase 0.8b Phase B: tokenFingerprint feeds the /result endpoint's bandwidth quota / + // rate limit / audit log (§9.2 / §11.2). Its value is
sha256(token).slice(0,12). + tokenFingerprint: apiKeyModule._internals.tokenFingerprint(TEST_API_KEY), + raw: { authType: 'api_key' }, + }); + // Sanity: the fingerprint is 12 hex chars (48 bits) + expect(req.auth.tokenFingerprint).toMatch(/^[a-f0-9]{12}$/); + }); + + it('should accept lowercase "bearer" prefix', () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next } = makeReqResNext(`bearer ${TEST_API_KEY}`); + + middleware(req, res, next); + + expect(next).toHaveBeenCalledTimes(1); + expect(req.auth).toBeDefined(); + }); +}); + +// ---------------------------------------------------------------------------- +// Tests — requireApiKey: failure paths +// ---------------------------------------------------------------------------- + +describe('requireApiKey — failure paths', () => { + it('should 401 invalid_token when Authorization header missing', async () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next, socket, headers } = makeReqResNext(undefined); + + middleware(req, res, next); + await res._flush(); + + expect(res.statusCode).toBe(401); + expect(res.body.error.code).toBe('invalid_token'); + expect(res.body.error.request_id).toBe('req-test-api-key-001'); + expect(headers['Connection']).toBe('close'); + expect(socket.destroy).toHaveBeenCalledTimes(1); + expect(next).not.toHaveBeenCalled(); + expect(req.auth).toBeUndefined(); + }); + + it('should 401 invalid_token when Authorization is not Bearer format', async () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next, socket } = makeReqResNext('Basic abc123'); + + middleware(req, res, next); + await res._flush(); + + expect(res.statusCode).toBe(401); + expect(res.body.error.code).toBe('invalid_token'); + expect(socket.destroy).toHaveBeenCalledTimes(1); + expect(next).not.toHaveBeenCalled(); + }); + + it('should 401 invalid_token when token is
empty after Bearer', async () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next, socket } = makeReqResNext('Bearer '); + + middleware(req, res, next); + await res._flush(); + + expect(res.statusCode).toBe(401); + expect(res.body.error.code).toBe('invalid_token'); + expect(socket.destroy).toHaveBeenCalledTimes(1); + expect(next).not.toHaveBeenCalled(); + }); + + it('should 401 invalid_token when API key does not match', async () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next, socket, headers } = makeReqResNext(`Bearer ${WRONG_API_KEY}`); + + middleware(req, res, next); + await res._flush(); + + expect(res.statusCode).toBe(401); + expect(res.body.error.code).toBe('invalid_token'); + expect(headers['Connection']).toBe('close'); + expect(socket.destroy).toHaveBeenCalledTimes(1); + expect(next).not.toHaveBeenCalled(); + expect(req.auth).toBeUndefined(); + }); + + it('should 401 invalid_token when token length differs from expected', async () => { + // Shorter than expected: exercises the branch where constantTimeEquals compares lengths first + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next, socket } = makeReqResNext('Bearer short-token'); + + middleware(req, res, next); + await res._flush(); + + expect(res.statusCode).toBe(401); + expect(res.body.error.code).toBe('invalid_token'); + expect(socket.destroy).toHaveBeenCalledTimes(1); + expect(next).not.toHaveBeenCalled(); + }); +}); + +// ---------------------------------------------------------------------------- +// Tests — requireApiKey: fail-fast on missing config +// ---------------------------------------------------------------------------- + +describe('requireApiKey — fail-fast on missing config', () => { + it('should 503 service_unavailable when expectedApiKey is empty string', async () => { + // Explicitly inject an empty string to simulate an unset CONVERTER_API_KEY env var + const middleware =
apiKeyModule.requireApiKey({ expectedApiKey: '' }); + const { req, res, next, socket, headers } = makeReqResNext(`Bearer ${TEST_API_KEY}`); + + middleware(req, res, next); + await res._flush(); + + expect(res.statusCode).toBe(503); + expect(res.body.error.code).toBe('service_unavailable'); + expect(res.body.error.message).toBe('API key not configured'); + expect(res.body.error.request_id).toBe('req-test-api-key-001'); + expect(headers['Connection']).toBe('close'); + // The 503 path must also destroy the socket (stops the client from sending more body) + expect(socket.destroy).toHaveBeenCalledTimes(1); + expect(next).not.toHaveBeenCalled(); + expect(req.auth).toBeUndefined(); + }); + + it('should 503 even when Authorization header is missing (config check first)', async () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: '' }); + const { req, res, next } = makeReqResNext(undefined); + + middleware(req, res, next); + await res._flush(); + + // The fail-fast check runs before the token check, so a missing header still yields 503 + expect(res.statusCode).toBe(503); + expect(res.body.error.code).toBe('service_unavailable'); + expect(next).not.toHaveBeenCalled(); + }); +}); + +// ---------------------------------------------------------------------------- +// A6 addition: edge cases, special characters + very long key +// ---------------------------------------------------------------------------- + +describe('requireApiKey — A6 edge cases: special characters + very long key', () => { + it('correctly matches API key containing slashes / equals / pluses', () => { + // base64-like keys may contain /, = and + characters + const specialKey = 'abc/def+ghi==xyz'; + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: specialKey }); + const { req, res, next } = makeReqResNext(`Bearer ${specialKey}`); + + middleware(req, res, next); + + expect(next).toHaveBeenCalledTimes(1); + expect(res.status).not.toHaveBeenCalled(); + expect(req.auth).toBeDefined(); + }); + + it('correctly matches API key containing unicode (utf8 bytes preserved)', () => { + // Real API keys will not use unicode in practice (it complicates deployment), but constantTimeEquals + // should still safely handle a
utf8 byte comparison + const unicodeKey = '密鑰-test-🔑-fixture'; + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: unicodeKey }); + const { req, res, next } = makeReqResNext(`Bearer ${unicodeKey}`); + + middleware(req, res, next); + + expect(next).toHaveBeenCalledTimes(1); + expect(req.auth).toBeDefined(); + }); + + it('rejects unicode key when bytes differ from expected', async () => { + const expected = '密鑰-test-A'; + const wrong = '密鑰-test-B'; // same byte length, only the last byte differs + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: expected }); + const { req, res, next } = makeReqResNext(`Bearer ${wrong}`); + + middleware(req, res, next); + await res._flush(); + + expect(res.statusCode).toBe(401); + expect(res.body.error.code).toBe('invalid_token'); + expect(next).not.toHaveBeenCalled(); + }); + + it('handles very long key without DoS (1MB) — constantTimeEquals is O(n) but reasonable', () => { + // A 1MB key is still O(n) under a constant-time compare, but n = 1MB is a light + // operation for a Node Buffer. Verify it neither throws nor hangs. + // 1MB rather than 10MB: for ASCII-only input Buffer.from(s, 'utf8') uses 1 byte/char, + // and a 10MB string would still add memory pressure in the jest worker. 1MB is enough for a sanity check. + const longKey = 'x'.repeat(1024 * 1024); // 1 MB + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: longKey }); + const { req, res, next } = makeReqResNext(`Bearer ${longKey}`); + + const start = Date.now(); + middleware(req, res, next); + const elapsed = Date.now() - start; + + expect(next).toHaveBeenCalledTimes(1); + // A 1MB constant-time compare should finish well within the 500ms bound (under 10ms on most modern machines) + expect(elapsed).toBeLessThan(500); + }); + + it('rejects very long wrong key without DoS', async () => { + const expected = 'x'.repeat(1024 * 1024); + const wrong = 'y'.repeat(1024 * 1024); + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: expected }); + const { req, res, next } = makeReqResNext(`Bearer ${wrong}`); + + const start = Date.now(); + middleware(req, res, next); + await res._flush(); + const elapsed
= Date.now() - start; + + expect(res.statusCode).toBe(401); + expect(next).not.toHaveBeenCalled(); + expect(elapsed).toBeLessThan(500); + }); +}); + +// ---------------------------------------------------------------------------- +// A6 addition: edge cases, multi-valued Authorization header +// ---------------------------------------------------------------------------- + +describe('requireApiKey — A6 edge cases: Authorization header boundary values', () => { + it('handles Authorization header that is an array (Node http duplicates) — extractBearerToken returns null', async () => { + // For a duplicated Authorization header, Node http by default keeps only the first value, as a string; but if an upstream framework + // passes an array directly, our typeof-string check treats it as invalid. Verify that behavior. + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next } = makeReqResNext(undefined); + // Overwrite headers.authorization with an array directly + req.headers.authorization = [`Bearer ${TEST_API_KEY}`, `Bearer ${WRONG_API_KEY}`]; + + middleware(req, res, next); + await res._flush(); + + // typeof an array is 'object', so extractBearerToken returns null → 401 + expect(res.statusCode).toBe(401); + expect(res.body.error.code).toBe('invalid_token'); + expect(next).not.toHaveBeenCalled(); + }); + + it('accepts Authorization header with extra whitespace around the token (trimmed)', async () => { + // "Bearer" + several spaces + "abc" → the regex captures "abc" (after trim) + // "Bearer abc " → the regex captures "abc " and trims it to "abc" + // Both cases must still go through the constant-time compare (this test uses "Bearer abc " to cover trimming) + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: 'abc' }); + const { req, res, next } = makeReqResNext('Bearer abc '); + + middleware(req, res, next); + + expect(next).toHaveBeenCalledTimes(1); + expect(req.auth).toBeDefined(); + }); +}); + +// ---------------------------------------------------------------------------- +// A6 addition: error path, sendApiKeyError destroy behavior when headersSent is already true +// ---------------------------------------------------------------------------- + +describe('requireApiKey — A6 error path: sendApiKeyError with headersSent', () => { +
it('still destroys socket when res.headersSent is true (double protection)', async () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + // Deliberately start with res.headersSent = true to simulate the race where the response + // has already started but the middleware still runs (should not happen in theory; sendApiKeyError guards against it anyway) + const { req, res, next, socket } = makeReqResNext(undefined, { + headersSentInitially: true, + }); + + middleware(req, res, next); + await res._flush(); + + // res.status must not be called (headers were already sent) + expect(res.status).not.toHaveBeenCalled(); + // but the socket must still be destroyed (stops the client from sending more body) + expect(socket.destroy).toHaveBeenCalledTimes(1); + expect(next).not.toHaveBeenCalled(); + }); + + it('handles socket.destroy throwing without crashing middleware', async () => { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next } = makeReqResNext(undefined); + // Make socket.destroy throw (simulates a detached or broken socket) + req.socket.destroy = jest.fn(() => { + throw new Error('socket already detached'); + }); + + // must not throw + expect(() => middleware(req, res, next)).not.toThrow(); + await res._flush(); + + // the 401 is still sent normally + expect(res.statusCode).toBe(401); + expect(res.body.error.code).toBe('invalid_token'); + expect(next).not.toHaveBeenCalled(); + }); +}); + +// ---------------------------------------------------------------------------- +// A6 addition: logs never leak the secret +// ---------------------------------------------------------------------------- + +describe('requireApiKey — A6 logs do not leak the secret', () => { + it('does not log the expected API key in any failure path', async () => { + // Replace console.* with spies (beforeAll above set them to no-ops) + const errSpy = jest.fn(); + const logSpy = jest.fn(); + const warnSpy = jest.fn(); + const origErr = console.error; + const origLog = console.log; + const origWarn = console.warn; + console.error = errSpy; + console.log = logSpy; + console.warn = warnSpy; + + try { + const expectedSecret =
'SECRET-EXPECTED-KEY-zzz-zzz-zzz-zzz-zzz-zzz-zzz-zzz'; + const wrongSecret = 'SECRET-WRONG-KEY-yyy-yyy-yyy-yyy-yyy-yyy-yyy-yyy-yyy'; + + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: expectedSecret }); + + // Exercise every failure path + for (const authHeader of [ + undefined, + 'Basic xxx', + 'Bearer ', + `Bearer ${wrongSecret}`, + `Bearer ${'X'.repeat(expectedSecret.length)}`, // same length but mismatched + ]) { + const { req, res, next } = makeReqResNext(authHeader); + middleware(req, res, next); + await res._flush(); + } + + // Fail-fast path (503) + const blankMw = apiKeyModule.requireApiKey({ expectedApiKey: '' }); + const { req, res, next } = makeReqResNext(`Bearer ${expectedSecret}`); + blankMw(req, res, next); + await res._flush(); + + // Collect all log output + const allOutputs = [ + ...errSpy.mock.calls, + ...logSpy.mock.calls, + ...warnSpy.mock.calls, + ] + .flat() + .map((x) => (typeof x === 'string' ? x : JSON.stringify(x))) + .join(' | '); + + // must never contain the expected secret + expect(allOutputs).not.toContain(expectedSecret); + // nor the wrong secret (a submitted token must never be echoed into the logs) + expect(allOutputs).not.toContain(wrongSecret); + } finally { + console.error = origErr; + console.log = origLog; + console.warn = origWarn; + } + }); + + it('logs auth.api_key.not_configured action when expected is empty (without leaking)', async () => { + const errSpy = jest.fn(); + const origErr = console.error; + console.error = errSpy; + + try { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: '' }); + const { req, res, next } = makeReqResNext(`Bearer ${TEST_API_KEY}`); + middleware(req, res, next); + await res._flush(); + + // a not_configured log line should have been printed + const logs = errSpy.mock.calls.flat().join(' | '); + expect(logs).toContain('auth.api_key.not_configured'); + // but it must still not contain TEST_API_KEY itself + expect(logs).not.toContain(TEST_API_KEY); + } finally { + console.error = origErr; + } + }); +}); + +// ---------------------------------------------------------------------------- +// A6 addition: timing sanity check (not a strict
timing test) +// ---------------------------------------------------------------------------- + +describe('requireApiKey — A6 timing sanity check', () => { + it('rejection time for different-prefix vs different-suffix wrong keys is similar (constant-time compare)', () => { + // This test is a sanity check, not a strict timing measurement (jest workers are too noisy for that). + // Point: constantTimeEquals must not short-circuit when the first byte already differs, + // nor take noticeably longer to reject when only the last byte differs. + // + // Run N times per key and compare median elapsed times; a slowest/fastest ratio of 5x or more fails the sanity check. + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + + function measureNRuns(wrongKey, n) { + const samples = []; + for (let i = 0; i < n; i += 1) { + const { req, res, next } = makeReqResNext(`Bearer ${wrongKey}`); + const t0 = process.hrtime.bigint(); + middleware(req, res, next); + const t1 = process.hrtime.bigint(); + samples.push(Number(t1 - t0)); // nanoseconds + } + // Take the median (robust to GC outliers) + samples.sort((a, b) => a - b); + return samples[Math.floor(samples.length / 2)]; + } + + const N = 100; + // Different prefix: differs from the first byte on + const wrongPrefix = 'Z' + TEST_API_KEY.slice(1); + // Different suffix: differs only at the last byte + const wrongSuffix = TEST_API_KEY.slice(0, -1) + 'Z'; + // Entirely different + const wrongFull = 'Z'.repeat(TEST_API_KEY.length); + + const tPrefix = measureNRuns(wrongPrefix, N); + const tSuffix = measureNRuns(wrongSuffix, N); + const tFull = measureNRuns(wrongFull, N); + + // The three times should be close (that is what constant time means). The bound is deliberately loose (under 5x) + // because the jest environment is noisy; this is only a sanity check (a strict timing test belongs in an isolated env). + const times = [tPrefix, tSuffix, tFull]; + const minT = Math.min(...times); + const maxT = Math.max(...times); + const ratio = maxT / minT; + + expect(ratio).toBeLessThan(5); // sanity: must stay under 5x + }); +}); + +// ---------------------------------------------------------------------------- +// Tests — _internals.constantTimeEquals smoke +// ---------------------------------------------------------------------------- + +describe('_internals.constantTimeEquals',
() => { + const { constantTimeEquals } = apiKeyModule._internals; + + it('returns true for identical strings', () => { + expect(constantTimeEquals('abc123', 'abc123')).toBe(true); + expect(constantTimeEquals(TEST_API_KEY, TEST_API_KEY)).toBe(true); + }); + + it('returns false for different strings of same length', () => { + expect(constantTimeEquals('abc123', 'xyz456')).toBe(false); + expect(constantTimeEquals(TEST_API_KEY, WRONG_API_KEY)).toBe(false); + }); + + it('returns false for strings of different length (must not throw)', () => { + // This case matters: timingSafeEqual throws a RangeError when the lengths differ, + // so our helper must compare lengths first to avoid the throw + expect(() => constantTimeEquals('short', 'much-longer-string')).not.toThrow(); + expect(constantTimeEquals('short', 'much-longer-string')).toBe(false); + expect(constantTimeEquals('', 'a')).toBe(false); + }); + + it('returns false for non-string inputs', () => { + expect(constantTimeEquals(null, 'abc')).toBe(false); + expect(constantTimeEquals('abc', null)).toBe(false); + expect(constantTimeEquals(undefined, undefined)).toBe(false); + expect(constantTimeEquals(123, '123')).toBe(false); + expect(constantTimeEquals({}, {})).toBe(false); + }); + + it('returns true for both empty strings (edge case)', () => { + // Two empty strings: both buffers have length 0, and timingSafeEqual accepts 0-length buffers + // This case is not a security concern (the middleware's own fail-fast rejects an empty expected key) + expect(constantTimeEquals('', '')).toBe(true); + }); + + // A6 addition: special chars + it('handles utf8 byte length differences correctly', () => { + // CJK characters take 3 bytes per char in UTF-8 + expect(constantTimeEquals('密', '密')).toBe(true); + expect(constantTimeEquals('密', '碼')).toBe(false); // same byte length, different bytes + expect(constantTimeEquals('密', 'a')).toBe(false); // different byte length (3 vs 1) + }); +}); + +// ---------------------------------------------------------------------------- +// Tests — _internals.extractBearerToken smoke +// ---------------------------------------------------------------------------- +
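The `_internals` smoke tests above pin down three small helper contracts. As a hedged illustration only (the body of `apiKeyMiddleware.js` is not part of this section, so these implementations are assumptions, not the shipped code), a Node sketch that satisfies those assertions puts `crypto.timingSafeEqual` behind a byte-length guard, uses a case-insensitive `Bearer` regex with trimming, and computes the `sha256(token).slice(0,12)` fingerprint described in the happy-path test:

```javascript
'use strict';

// Hypothetical sketch of the three _internals helpers; the real
// apiKeyMiddleware.js may differ in details.
const crypto = require('crypto');

// Length-guarded constant-time string compare.
function constantTimeEquals(a, b) {
  // Non-string input never matches (null, undefined, numbers, objects).
  if (typeof a !== 'string' || typeof b !== 'string') return false;
  const bufA = Buffer.from(a, 'utf8');
  const bufB = Buffer.from(b, 'utf8');
  // crypto.timingSafeEqual throws a RangeError on unequal byte lengths,
  // so return early instead of letting it throw.
  if (bufA.length !== bufB.length) return false;
  // Two empty buffers compare equal, matching the '' vs '' test.
  return crypto.timingSafeEqual(bufA, bufB);
}

// Extract the token from "Bearer <token>" (the scheme is case-insensitive).
function extractBearerToken(header) {
  // Arrays (duplicated headers), objects and numbers are rejected outright.
  if (typeof header !== 'string') return null;
  const match = /^Bearer\s+(.*)$/i.exec(header);
  if (!match) return null;
  const token = match[1].trim();
  return token.length > 0 ? token : null;
}

// One-way, log-safe identifier: first 12 hex chars (48 bits) of sha256(token).
function tokenFingerprint(token) {
  if (typeof token !== 'string' || token.length === 0) return '';
  return crypto.createHash('sha256').update(token, 'utf8').digest('hex').slice(0, 12);
}
```

The byte-length guard is what lets the different-length cases return `false` instead of throwing; it can reveal whether the lengths match, but never the key's content.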
+describe('_internals.extractBearerToken', () => { + const { extractBearerToken } = apiKeyModule._internals; + + it('extracts token from "Bearer <token>"', () => { + expect(extractBearerToken('Bearer abc123')).toBe('abc123'); + }); + + it('accepts lowercase bearer prefix', () => { + expect(extractBearerToken('bearer abc123')).toBe('abc123'); + }); + + it('trims trailing whitespace', () => { + expect(extractBearerToken('Bearer abc123 ')).toBe('abc123'); + }); + + it('returns null for empty token after Bearer', () => { + expect(extractBearerToken('Bearer ')).toBeNull(); + expect(extractBearerToken('Bearer ')).toBeNull(); + }); + + it('returns null for non-Bearer prefix', () => { + expect(extractBearerToken('Basic abc123')).toBeNull(); + expect(extractBearerToken('Token abc123')).toBeNull(); + }); + + it('returns null for undefined / null / empty', () => { + expect(extractBearerToken(undefined)).toBeNull(); + expect(extractBearerToken(null)).toBeNull(); + expect(extractBearerToken('')).toBeNull(); + }); + + // A6 addition: array / object input + it('returns null for non-string inputs (arrays, objects, numbers)', () => { + expect(extractBearerToken(['Bearer abc'])).toBeNull(); + expect(extractBearerToken({ raw: 'Bearer abc' })).toBeNull(); + expect(extractBearerToken(123)).toBeNull(); + }); + + it('handles tokens with special chars (slash, equals, plus)', () => { + expect(extractBearerToken('Bearer abc/def+ghi==xyz')).toBe('abc/def+ghi==xyz'); + }); +}); + +// ---------------------------------------------------------------------------- +// Tests — _internals.tokenFingerprint smoke (Phase 0.8b A7) +// ---------------------------------------------------------------------------- + +describe('_internals.tokenFingerprint', () => { + const { tokenFingerprint } = apiKeyModule._internals; + + it('returns 12 hex chars for non-empty token', () => { + const fp = tokenFingerprint(TEST_API_KEY); + expect(typeof fp).toBe('string'); + expect(fp.length).toBe(12); +
expect(/^[0-9a-f]{12}$/.test(fp)).toBe(true); + }); + + it('returns same fingerprint for same token (deterministic)', () => { + expect(tokenFingerprint(TEST_API_KEY)).toBe(tokenFingerprint(TEST_API_KEY)); + }); + + it('returns different fingerprints for different tokens', () => { + const fpA = tokenFingerprint(TEST_API_KEY); + const fpB = tokenFingerprint(WRONG_API_KEY); + expect(fpA).not.toBe(fpB); + }); + + it('does NOT contain any substring of original token (one-way)', () => { + // Reverse check: for 4-char slices of the token (sampled every 8 chars), the fingerprint must contain none of them + const fp = tokenFingerprint(TEST_API_KEY); + for (let i = 0; i < TEST_API_KEY.length - 4; i += 8) { + const slice = TEST_API_KEY.slice(i, i + 4); + expect(fp).not.toContain(slice); + } + }); + + it('returns empty string for empty / non-string input', () => { + expect(tokenFingerprint('')).toBe(''); + expect(tokenFingerprint(null)).toBe(''); + expect(tokenFingerprint(undefined)).toBe(''); + expect(tokenFingerprint(123)).toBe(''); + }); +}); + +// ---------------------------------------------------------------------------- +// Tests — A7 audit log (success / missing / invalid / unconfigured) +// ---------------------------------------------------------------------------- + +describe('requireApiKey — A7 audit log', () => { + const { tokenFingerprint } = apiKeyModule._internals; + + /** + * Intercept console.log / console.error and collect every audit log string for assertions. + * Note: beforeAll already muted both; this overrides them, and each test calls restore() in its finally block. + */ + function spyAuditLogs() { + const logs = []; + const origLog = console.log; + const origErr = console.error; + const capture = (raw) => { + if (typeof raw === 'string') logs.push(raw); + else logs.push(JSON.stringify(raw)); + }; + console.log = (...args) => args.forEach(capture); + console.error = (...args) => args.forEach(capture); + return { + logs, + restore() { + console.log = origLog; + console.error = origErr; + }, + }; + } + + it('writes auth.api_key.authenticated audit log on success (with source_ip + fingerprint +
request_id)', () => { + const spy = spyAuditLogs(); + try { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next } = makeReqResNext(`Bearer ${TEST_API_KEY}`, { + reqExtra: { ip: '203.0.113.42', method: 'POST', path: '/api/v1/jobs' }, + }); + + middleware(req, res, next); + + expect(next).toHaveBeenCalledTimes(1); + const audit = spy.logs.find((l) => l.includes('auth.api_key.authenticated')); + expect(audit).toBeDefined(); + const parsed = JSON.parse(audit); + expect(parsed.level).toBe('INFO'); + expect(parsed.action).toBe('auth.api_key.authenticated'); + expect(parsed.auth_type).toBe('api_key'); + expect(parsed.client_id).toBe('visionA-service'); + expect(parsed.source_ip).toBe('203.0.113.42'); + expect(parsed.request_id).toBe('req-test-api-key-001'); + expect(parsed.http_method).toBe('POST'); + expect(parsed.http_path).toBe('/api/v1/jobs'); + expect(parsed.token_fingerprint).toBe(tokenFingerprint(TEST_API_KEY)); + // must never contain the token itself + expect(audit).not.toContain(TEST_API_KEY); + } finally { + spy.restore(); + } + }); + + it('writes auth.api_key.missing audit log on missing Authorization (no fingerprint)', async () => { + const spy = spyAuditLogs(); + try { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next } = makeReqResNext(undefined, { + reqExtra: { ip: '203.0.113.43', method: 'GET', path: '/api/v1/jobs/x' }, + }); + + middleware(req, res, next); + await res._flush(); + + const audit = spy.logs.find((l) => l.includes('auth.api_key.missing')); + expect(audit).toBeDefined(); + const parsed = JSON.parse(audit); + expect(parsed.level).toBe('WARN'); + expect(parsed.action).toBe('auth.api_key.missing'); + expect(parsed.source_ip).toBe('203.0.113.43'); + expect(parsed.request_id).toBe('req-test-api-key-001'); + expect(parsed.http_method).toBe('GET'); + expect(parsed.http_path).toBe('/api/v1/jobs/x'); + // the missing path has no token → there must be no token_fingerprint field +
expect(parsed.token_fingerprint).toBeUndefined(); + } finally { + spy.restore(); + } + }); + + it('writes auth.api_key.invalid audit log on wrong token (with fingerprint of wrong token, NOT token itself)', async () => { + const spy = spyAuditLogs(); + try { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: TEST_API_KEY }); + const { req, res, next } = makeReqResNext(`Bearer ${WRONG_API_KEY}`, { + reqExtra: { ip: '203.0.113.44', method: 'POST', path: '/api/v1/jobs' }, + }); + + middleware(req, res, next); + await res._flush(); + + const audit = spy.logs.find((l) => l.includes('auth.api_key.invalid')); + expect(audit).toBeDefined(); + const parsed = JSON.parse(audit); + expect(parsed.level).toBe('WARN'); + expect(parsed.action).toBe('auth.api_key.invalid'); + expect(parsed.source_ip).toBe('203.0.113.44'); + expect(parsed.request_id).toBe('req-test-api-key-001'); + // fingerprint of the wrong token (for forensics; lets requests from the same attacker be clustered) + expect(parsed.token_fingerprint).toBe(tokenFingerprint(WRONG_API_KEY)); + // must contain neither the wrong token itself nor the expected token + expect(audit).not.toContain(WRONG_API_KEY); + expect(audit).not.toContain(TEST_API_KEY); + } finally { + spy.restore(); + } + }); + + it('writes auth.api_key.not_configured audit log on 503 (with source_ip + request_id, no fingerprint)', async () => { + const spy = spyAuditLogs(); + try { + const middleware = apiKeyModule.requireApiKey({ expectedApiKey: '' }); + const { req, res, next } = makeReqResNext(`Bearer ${TEST_API_KEY}`, { + reqExtra: { ip: '203.0.113.45', method: 'POST', path: '/api/v1/jobs' }, + }); + + middleware(req, res, next); + await res._flush(); + + const audit = spy.logs.find((l) => l.includes('auth.api_key.not_configured')); + expect(audit).toBeDefined(); + const parsed = JSON.parse(audit); + expect(parsed.level).toBe('ERROR'); + expect(parsed.action).toBe('auth.api_key.not_configured'); + expect(parsed.source_ip).toBe('203.0.113.45'); + expect(parsed.request_id).toBe('req-test-api-key-001'); +
expect(parsed.http_method).toBe('POST'); + expect(parsed.http_path).toBe('/api/v1/jobs'); + // the not_configured path must not log token_fingerprint (even if the caller sent a token) + expect(parsed.token_fingerprint).toBeUndefined(); + // must never contain the token itself + expect(audit).not.toContain(TEST_API_KEY); + } finally { + spy.restore(); + } + }); + + it('audit log NEVER contains expected or wrong token verbatim across all 4 paths', async () => { + const spy = spyAuditLogs(); + try { + const expectedSecret = 'AUDIT-SECRET-EXPECTED-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'; + const wrongSecret = 'AUDIT-SECRET-WRONG-bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'; + + // path 1: success + const mwOk = apiKeyModule.requireApiKey({ expectedApiKey: expectedSecret }); + let ctx = makeReqResNext(`Bearer ${expectedSecret}`, { + reqExtra: { ip: '198.51.100.1', method: 'POST', path: '/api/v1/jobs' }, + }); + mwOk(ctx.req, ctx.res, ctx.next); + + // path 2: missing + ctx = makeReqResNext(undefined); + mwOk(ctx.req, ctx.res, ctx.next); + await ctx.res._flush(); + + // path 3: invalid + ctx = makeReqResNext(`Bearer ${wrongSecret}`); + mwOk(ctx.req, ctx.res, ctx.next); + await ctx.res._flush(); + + // path 4: not_configured (503) + const mwBlank = apiKeyModule.requireApiKey({ expectedApiKey: '' }); + ctx = makeReqResNext(`Bearer ${expectedSecret}`); + mwBlank(ctx.req, ctx.res, ctx.next); + await ctx.res._flush(); + + const all = spy.logs.join(' | '); + expect(all).not.toContain(expectedSecret); + expect(all).not.toContain(wrongSecret); + } finally { + spy.restore(); + } + }); +}); diff --git a/apps/task-scheduler/src/auth/__tests__/jwks.test.js b/apps/task-scheduler/src/auth/__tests__/jwks.test.js deleted file mode 100644 index 00754cf..0000000 --- a/apps/task-scheduler/src/auth/__tests__/jwks.test.js +++ /dev/null @@ -1,285 +0,0 @@ -/** - * Unit + Integration tests for src/auth/jwks.js - * - * Test strategy: - * - under Node CJS, jose fetches the JWKS directly via node:http / node:https (not global.fetch), - * so this test starts a local http server that serves a JWKS endpoint and lets
jose actually fetch it - * - covers valid tokens, expiry, wrong issuer, wrong audience, bad signature, missing token, alg=none, and similar cases - * - verifies RemoteJWKSet module-level cache hits (_resetForTests) - */ - -'use strict'; - -const http = require('http'); -const { generateKeyPair, exportJWK, SignJWT } = require('jose'); - -const jwksModule = require('../jwks'); - -const TEST_ISSUER = 'https://auth.test.local'; -const TEST_AUDIENCE = 'kneron_converter_api'; - -/** - * Start a local http server where GET /.well-known/jwks returns a JWK Set. - * - * @param {Array} jwks - JWK array (with kid / alg / use) - * @returns {Promise<{server: import('http').Server, url: string}>} - */ -async function startJwksServer(jwks) { - const server = http.createServer((req, res) => { - if (req.url === '/.well-known/jwks') { - res.writeHead(200, { 'Content-Type': 'application/json' }); - res.end(JSON.stringify({ keys: jwks })); - return; - } - res.writeHead(404); - res.end(); - }); - await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve)); - const addr = server.address(); - return { server, url: `http://127.0.0.1:${addr.port}/.well-known/jwks` }; -} - -async function signTestJwt(privateKey, kid, payload, expirationTime) { - const now = Math.floor(Date.now() / 1000); - const exp = expirationTime !== undefined ?
expirationTime : now + 300; - return new SignJWT(payload) - .setProtectedHeader({ alg: 'RS256', kid }) - .setIssuedAt(now) - .setExpirationTime(exp) - .setIssuer(TEST_ISSUER) - .setAudience(TEST_AUDIENCE) - .sign(privateKey); -} - -describe('src/auth/jwks', () => { - let privateKey; - let publicJwk; - const KID = 'test-key-1'; - let jwksServer; - let jwksUrl; - - beforeAll(async () => { - const { privateKey: priv, publicKey: pub } = await generateKeyPair('RS256', { - modulusLength: 2048, - }); - privateKey = priv; - publicJwk = await exportJWK(pub); - - const started = await startJwksServer([ - { ...publicJwk, kid: KID, use: 'sig', alg: 'RS256' }, - ]); - jwksServer = started.server; - jwksUrl = started.url; - }); - - afterAll(async () => { - if (jwksServer) { - await new Promise((resolve) => jwksServer.close(resolve)); - } - }); - - beforeEach(() => { - // Reset the module-level cache before each test so tests do not affect each other - jwksModule._resetForTests(); - }); - - describe('getJWKS', () => { - it('should require jwksUrl', () => { - expect(() => jwksModule.getJWKS('')).toThrow(/jwksUrl is required/); - expect(() => jwksModule.getJWKS(null)).toThrow(/jwksUrl is required/); - }); - - it('should throw on invalid URL', () => { - expect(() => jwksModule.getJWKS('not-a-url')).toThrow(/Invalid JWKS URL/); - }); - - it('should return the same instance for the same URL (module-level cache)', () => { - const a = jwksModule.getJWKS(jwksUrl); - const b = jwksModule.getJWKS(jwksUrl); - expect(a).toBe(b); - }); - - it('should return different instances for different URLs', () => { - const a = jwksModule.getJWKS(jwksUrl); - const b = jwksModule.getJWKS('http://127.0.0.1:1/other-jwks'); - expect(a).not.toBe(b); - }); - }); - - describe('verifyToken', () => { - let baseOpts; - beforeAll(() => { - baseOpts = { - jwksUrl, - issuer: TEST_ISSUER, - audience: TEST_AUDIENCE, - clockToleranceSec: 60, - }; - }); - - it('should verify a valid token', async () => { - const token = await signTestJwt(privateKey, KID, { - sub: 'user-1', -
client_id: 'client-1', - scope: 'converter:job.write', - }); - - const result = await jwksModule.verifyToken(token, baseOpts); - expect(result).toBeDefined(); - expect(result.payload.sub).toBe('user-1'); - expect(result.payload.client_id).toBe('client-1'); - expect(result.payload.scope).toBe('converter:job.write'); - }); - - it('should throw ERR_JWT_EXPIRED for expired token', async () => { - // Expired 1 hour ago, well beyond clockTolerance (60 seconds) - const expired = Math.floor(Date.now() / 1000) - 3600; - const token = await signTestJwt( - privateKey, - KID, - { sub: 'user-1', scope: 'converter:job.write' }, - expired - ); - - await expect(jwksModule.verifyToken(token, baseOpts)).rejects.toMatchObject({ - code: 'ERR_JWT_EXPIRED', - }); - }); - - it('should throw on wrong issuer', async () => { - const token = await new SignJWT({ sub: 'user-1', scope: 'converter:job.write' }) - .setProtectedHeader({ alg: 'RS256', kid: KID }) - .setIssuedAt() - .setExpirationTime('5m') - .setIssuer('https://wrong.issuer.example') - .setAudience(TEST_AUDIENCE) - .sign(privateKey); - - await expect(jwksModule.verifyToken(token, baseOpts)).rejects.toThrow(); - }); - - it('should throw on wrong audience', async () => { - const token = await new SignJWT({ sub: 'user-1', scope: 'converter:job.write' }) - .setProtectedHeader({ alg: 'RS256', kid: KID }) - .setIssuedAt() - .setExpirationTime('5m') - .setIssuer(TEST_ISSUER) - .setAudience('wrong-audience') - .sign(privateKey); - - await expect(jwksModule.verifyToken(token, baseOpts)).rejects.toThrow(); - }); - - it('should throw on signature mismatch (different signing key, same kid)', async () => { - const { privateKey: otherPriv } = await generateKeyPair('RS256', { - modulusLength: 2048, - }); - const token = await signTestJwt(otherPriv, KID, { - sub: 'user-1', - scope: 'converter:job.write', - }); - - await expect(jwksModule.verifyToken(token, baseOpts)).rejects.toThrow(); - }); - - it('should throw on missing kid (no matching key in JWKS)', async () => { -
const token = await signTestJwt(privateKey, 'unknown-kid', { - sub: 'user-1', - scope: 'converter:job.write', - }); - - await expect(jwksModule.verifyToken(token, baseOpts)).rejects.toThrow(); - }); - - it('should reject empty token', async () => { - await expect(jwksModule.verifyToken('', baseOpts)).rejects.toMatchObject({ - code: 'ERR_JWS_INVALID', - }); - }); - - it('should reject malformed token (not a JWT)', async () => { - await expect( - jwksModule.verifyToken('not-a-real-jwt', baseOpts) - ).rejects.toThrow(); - }); - - it('should require options.issuer', async () => { - await expect( - jwksModule.verifyToken('x.y.z', { jwksUrl, audience: TEST_AUDIENCE }) - ).rejects.toThrow(/issuer is required/); - }); - - it('should require options.audience', async () => { - await expect( - jwksModule.verifyToken('x.y.z', { jwksUrl, issuer: TEST_ISSUER }) - ).rejects.toThrow(/audience is required/); - }); - - it('should reject alg=none token', async () => { - const header = Buffer.from(JSON.stringify({ alg: 'none', kid: KID })).toString( - 'base64url' - ); - const payload = Buffer.from( - JSON.stringify({ - sub: 'user-1', - iss: TEST_ISSUER, - aud: TEST_AUDIENCE, - exp: Math.floor(Date.now() / 1000) + 300, - scope: 'converter:job.write', - }) - ).toString('base64url'); - const unsignedToken = `${header}.${payload}.`; - - await expect(jwksModule.verifyToken(unsignedToken, baseOpts)).rejects.toThrow(); - }); - - // Sec m3: HMAC algorithms must be rejected (defense against algorithm-confusion attacks) - it('should reject HMAC alg=HS256 token (Sec m3 algorithms pin)', async () => { - // Even if an attacker signs a token using the JWKS RSA public key as the HMAC secret, - // the algorithms are pinned to RSA/ECDSA, so jose rejects it outright with an error. - const fakeSecret = new TextEncoder().encode('fake-hmac-secret-32-bytes-long-x'); - const token = await new SignJWT({ - sub: 'user-1', - scope: 'converter:job.write', - }) - .setProtectedHeader({ alg: 'HS256', kid: KID }) - .setIssuedAt() - .setExpirationTime('5m') - .setIssuer(TEST_ISSUER) - .setAudience(TEST_AUDIENCE) -
.sign(fakeSecret); - - await expect( - jwksModule.verifyToken(token, baseOpts) - ).rejects.toThrow(); - }); - - it('should expose ALLOWED_JWT_ALGS list (Sec m3)', () => { - const algs = jwksModule.ALLOWED_JWT_ALGS; - expect(Array.isArray(algs)).toBe(true); - expect(algs).toContain('RS256'); - expect(algs).toContain('ES256'); - expect(algs).toContain('PS256'); - expect(algs).not.toContain('HS256'); - expect(algs).not.toContain('none'); - }); - - it('should accept token within clock skew tolerance', async () => { - // A token that expired 30 seconds ago should still pass with clockTolerance = 60 seconds - const justExpired = Math.floor(Date.now() / 1000) - 30; - const token = await new SignJWT({ - sub: 'user-1', - scope: 'converter:job.write', - }) - .setProtectedHeader({ alg: 'RS256', kid: KID }) - .setIssuedAt(justExpired - 600) - .setExpirationTime(justExpired) - .setIssuer(TEST_ISSUER) - .setAudience(TEST_AUDIENCE) - .sign(privateKey); - - const result = await jwksModule.verifyToken(token, baseOpts); - expect(result.payload.sub).toBe('user-1'); - }); - }); -}); diff --git a/apps/task-scheduler/src/auth/__tests__/middleware.test.js b/apps/task-scheduler/src/auth/__tests__/middleware.test.js deleted file mode 100644 index 46ff6ac..0000000 --- a/apps/task-scheduler/src/auth/__tests__/middleware.test.js +++ /dev/null @@ -1,763 +0,0 @@ -/** - * Unit + Integration tests for src/auth/middleware.js - * - * Test focus: - * 1. Every verification failure path (missing header / bad signature / wrong issuer / wrong audience / expired / insufficient scope / tenant mismatch) - * 2. M2: every failure must - * - set the `Connection: close` header - * - destroy req.socket after res 'finish' - * 3. Success path: req.auth is set + next() is called - * 4.
Integration:用 supertest + Express 真打一次,確認 socket 真的被斷 - */ - -'use strict'; - -const express = require('express'); -const http = require('http'); -const { generateKeyPair, exportJWK, SignJWT } = require('jose'); - -// 注意:這份 test 用 jest.resetModules + 注入版 verify,不依賴真實 config -const middlewareModule = require('../middleware'); - -// ---------------------------------------------------------------------------- -// 共用 fixture -// ---------------------------------------------------------------------------- - -const TEST_CONFIG = { - memberCenter: { - issuer: 'https://auth.test.local', - jwksUrl: 'https://auth.test.local/.well-known/jwks', - tokenUrl: '', - }, - converter: { - audience: 'kneron_converter_api', - clientId: '', - clientSecret: '', - tenantId: '', - scopeWrite: 'converter:job.write', - scopeRead: 'converter:job.read', - }, - fileAccessAgent: { baseUrl: '', audience: 'file_access_api' }, - jwks: { cacheMaxAgeMs: 600000, cooldownMs: 30000, clockToleranceSec: 60 }, -}; - -const TEST_CONFIG_WITH_TENANT = { - ...TEST_CONFIG, - converter: { ...TEST_CONFIG.converter, tenantId: 'tenant-A' }, -}; - -let privateKey; -let publicJwk; -const KID = 'test-key-1'; - -beforeAll(async () => { - const { privateKey: priv, publicKey: pub } = await generateKeyPair('RS256', { - modulusLength: 2048, - }); - privateKey = priv; - publicJwk = await exportJWK(pub); -}); - -// 抑制驗證失敗時 middleware 的 warn log(避免測試輸出被結構化 log 蓋掉) -// 這些 warn 是「驗證失敗時必輸出」的正常行為,已由斷言驗證 status / code, -// log 內容不是斷言對象。 -let _origWarn; -beforeAll(() => { - _origWarn = console.warn; - console.warn = () => {}; -}); -afterAll(() => { - console.warn = _origWarn; -}); - -/** - * 簽一個測試 JWT。 - */ -async function makeToken(overrides = {}, opts = {}) { - const now = Math.floor(Date.now() / 1000); - const payload = { - sub: 'user-1', - client_id: 'client-1', - scope: 'converter:job.write', - ...overrides, - }; - const expirationTime = - opts.expirationTime !== undefined ? 
opts.expirationTime : now + 300; - const signKey = opts.signKey || privateKey; - const kid = opts.kid || KID; - const issuer = opts.issuer || TEST_CONFIG.memberCenter.issuer; - const audience = opts.audience || TEST_CONFIG.converter.audience; - - return new SignJWT(payload) - .setProtectedHeader({ alg: 'RS256', kid }) - .setIssuedAt(now) - .setExpirationTime(expirationTime) - .setIssuer(issuer) - .setAudience(audience) - .sign(signKey); -} - -/** - * 假的 verify function(注入版)— 直接用 jose.jwtVerify 但不打網路。 - * 用內建的 JWKSet (從 publicJwk 建)。 - */ -function makeInjectedVerify() { - // 動態 import jwtVerify 與 createLocalJWKSet(jose v5+) - const { jwtVerify, createLocalJWKSet } = require('jose'); - const localJwks = createLocalJWKSet({ - keys: [{ ...publicJwk, kid: KID, use: 'sig', alg: 'RS256' }], - }); - - return async function injectedVerify(token, options) { - return jwtVerify(token, localJwks, { - issuer: options.issuer, - audience: options.audience, - clockTolerance: options.clockToleranceSec, - }); - }; -} - -/** - * 建立一組假的 req / res / next,內含 spy 給 socket.destroy。 - */ -function makeReqResNext(authHeader) { - const socket = { - destroyed: false, - destroy: jest.fn(function destroyImpl() { - socket.destroyed = true; - }), - }; - const req = { - headers: authHeader === undefined ? 
{} : { authorization: authHeader }, - socket, - requestId: 'req-test-001', - }; - - // 簡化版 res:只關心 setHeader / status / json / on('finish') / headersSent - const headers = {}; - const finishListeners = []; - const res = { - headersSent: false, - statusCode: 200, - body: null, - setHeader: jest.fn((k, v) => { - headers[k] = v; - }), - getHeader: (k) => headers[k], - status: jest.fn(function statusImpl(code) { - res.statusCode = code; - return res; - }), - json: jest.fn(function jsonImpl(body) { - res.body = body; - res.headersSent = true; - // 模擬 'finish' 事件(async,下個 microtask 觸發) - Promise.resolve().then(() => { - for (const l of finishListeners.splice(0)) { - try { - l(); - } catch (_) { - /* noop */ - } - } - }); - return res; - }), - once: jest.fn((evt, cb) => { - if (evt === 'finish') finishListeners.push(cb); - }), - on: jest.fn((evt, cb) => { - if (evt === 'finish') finishListeners.push(cb); - }), - _flush: () => - new Promise((resolve) => - // 等下個 microtask - setImmediate(resolve) - ), - }; - - const next = jest.fn(); - return { req, res, next, socket, headers }; -} - -// ---------------------------------------------------------------------------- -// Tests -// ---------------------------------------------------------------------------- - -describe('requireAuth — 驗證失敗路徑', () => { - let verify; - beforeAll(() => { - verify = makeInjectedVerify(); - }); - - it('should 401 + destroy when Authorization header missing', async () => { - const middleware = middlewareModule.requireAuth('converter:job.write', { - config: TEST_CONFIG, - verify, - }); - const { req, res, next, socket, headers } = makeReqResNext(undefined); - - await middleware(req, res, next); - await res._flush(); - - expect(res.statusCode).toBe(401); - expect(res.body.error.code).toBe('invalid_token'); - expect(res.body.error.request_id).toBe('req-test-001'); - expect(headers['Connection']).toBe('close'); - expect(socket.destroy).toHaveBeenCalledTimes(1); - expect(next).not.toHaveBeenCalled(); - }); 
-
-  it('should 401 + destroy when Authorization header malformed (not Bearer)', async () => {
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket, headers } = makeReqResNext('Basic abc123');
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(401);
-    expect(res.body.error.code).toBe('invalid_token');
-    expect(headers['Connection']).toBe('close');
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should 401 + destroy when token is empty after Bearer', async () => {
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext('Bearer ');
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(401);
-    expect(res.body.error.code).toBe('invalid_token');
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should 401 token_expired + destroy when token is expired', async () => {
-    const expiredToken = await makeToken({}, { expirationTime: 100 }); // exp=100 is in 1970, long expired
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket, headers } = makeReqResNext(`Bearer ${expiredToken}`);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(401);
-    expect(res.body.error.code).toBe('token_expired');
-    expect(headers['Connection']).toBe('close');
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should 401 invalid_token + destroy when issuer is wrong', async () => {
-    const token = await makeToken({}, { issuer: 'https://evil.example.com' });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(401);
-    expect(res.body.error.code).toBe('invalid_token');
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should 401 invalid_token + destroy when audience is wrong', async () => {
-    const token = await makeToken({}, { audience: 'wrong-audience' });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(401);
-    expect(res.body.error.code).toBe('invalid_token');
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should 401 invalid_token + destroy when signature is wrong', async () => {
-    const { privateKey: otherPriv } = await generateKeyPair('RS256', {
-      modulusLength: 2048,
-    });
-    // The KID matches, but the signature does not
-    const token = await makeToken({}, { signKey: otherPriv });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(401);
-    expect(res.body.error.code).toBe('invalid_token');
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should 403 insufficient_scope + destroy when scope is missing', async () => {
-    const token = await makeToken({ scope: 'converter:job.read' }); // no .write scope
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket, headers } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(403);
-    expect(res.body.error.code).toBe('insufficient_scope');
-    expect(res.body.error.details).toEqual({
-      required_scope: 'converter:job.write',
-      provided_scopes: ['converter:job.read'],
-    });
-    expect(headers['Connection']).toBe('close');
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should 403 insufficient_scope when scope claim is empty', async () => {
-    const token = await makeToken({ scope: '' });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(403);
-    expect(res.body.error.code).toBe('insufficient_scope');
-    expect(res.body.error.details.provided_scopes).toEqual([]);
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should 403 tenant_mismatch + destroy when tenant_id differs', async () => {
-    const token = await makeToken({ tenant_id: 'tenant-B' });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG_WITH_TENANT,
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(res.statusCode).toBe(403);
-    expect(res.body.error.code).toBe('tenant_mismatch');
-    expect(res.body.error.details.expected_tenant).toBe('tenant-A');
-    // Must not leak the token's tenant_id
-    expect(res.body.error.details).not.toHaveProperty('actual_tenant');
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should not check tenant when config.tenantId is empty', async () => {
-    const token = await makeToken({ tenant_id: 'any-tenant' });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG, // tenantId = ''
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    expect(next).toHaveBeenCalledTimes(1);
-    expect(socket.destroy).not.toHaveBeenCalled();
-  });
-});
-
-describe('requireAuth — verification success paths', () => {
-  let verify;
-  beforeAll(() => {
-    verify = makeInjectedVerify();
-  });
-
-  it('should call next() and set req.auth on valid token with correct scope', async () => {
-    const token = await makeToken({
-      sub: 'user-99',
-      client_id: 'visionA-backend',
-      scope: 'converter:job.write converter:job.read',
-      tenant_id: 'tenant-A',
-    });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-
-    expect(next).toHaveBeenCalledTimes(1);
-    expect(socket.destroy).not.toHaveBeenCalled();
-    expect(res.status).not.toHaveBeenCalled();
-    expect(req.auth).toBeDefined();
-    expect(req.auth.sub).toBe('user-99');
-    expect(req.auth.clientId).toBe('visionA-backend');
-    expect(req.auth.tenantId).toBe('tenant-A');
-    expect(req.auth.scopes).toEqual(['converter:job.write', 'converter:job.read']);
-    expect(req.auth.raw).toBeDefined();
-    expect(req.auth.raw.sub).toBe('user-99');
-  });
-
-  it('should support scp array claim (instead of scope string)', async () => {
-    const token = await makeToken({
-      sub: 'user-1',
-      scope: undefined,
-      scp: ['converter:job.write', 'converter:job.read'],
-    });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-
-    expect(next).toHaveBeenCalledTimes(1);
-    expect(req.auth.scopes).toEqual(['converter:job.write', 'converter:job.read']);
-  });
-
-  it('should fall back clientId to sub when client_id is absent', async () => {
-    const token = await makeToken({
-      sub: 'user-only',
-      client_id: undefined,
-      scope: 'converter:job.write',
-    });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next } = makeReqResNext(`Bearer ${token}`);
-
-    await middleware(req, res, next);
-
-    expect(next).toHaveBeenCalledTimes(1);
-    expect(req.auth.clientId).toBe('user-only');
-  });
-
-  it('should accept lowercase "bearer" prefix', async () => {
-    const token = await makeToken({ scope: 'converter:job.write' });
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next } = makeReqResNext(`bearer ${token}`);
-
-    await middleware(req, res, next);
-    expect(next).toHaveBeenCalledTimes(1);
-  });
-});
-
-describe('requireAuth — M2 connection-destroy behavior (unit level)', () => {
-  let verify;
-  beforeAll(() => {
-    verify = makeInjectedVerify();
-  });
-
-  it('should set Connection: close header BEFORE writing body', async () => {
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, headers } = makeReqResNext(undefined);
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    // Confirm setHeader was called before res.status
-    const setHeaderOrder = res.setHeader.mock.invocationCallOrder[0];
-    const statusOrder = res.status.mock.invocationCallOrder[0];
-    expect(setHeaderOrder).toBeLessThan(statusOrder);
-    expect(headers['Connection']).toBe('close');
-  });
-
-  it('should destroy socket only AFTER res finish event (not before)', async () => {
-    // Hand-rolled res that never fires 'finish' on its own, so we control exactly when it fires
-    const socket = { destroyed: false, destroy: jest.fn(() => { socket.destroyed = true; }) };
-    const finishListeners = [];
-    const headers = {};
-    const res = {
-      headersSent: false,
-      statusCode: 200,
-      body: null,
-      setHeader: jest.fn((k, v) => { headers[k] = v; }),
-      status: jest.fn(function s(code) { this.statusCode = code; return this; }),
-      json: jest.fn(function j(b) { this.body = b; this.headersSent = true; return this; }),
-      // Note: this once() only pushes the listener onto the array; it never fires 'finish' automatically
-      once: jest.fn((evt, cb) => { if (evt === 'finish') finishListeners.push(cb); }),
-      on: jest.fn((evt, cb) => { if (evt === 'finish') finishListeners.push(cb); }),
-    };
-    const req = { headers: {}, socket, requestId: 'req-test-001' };
-    const next = jest.fn();
-
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-
-    await middleware(req, res, next);
-
-    // res.status / res.json have run at this point, but 'finish' has not fired yet
-    expect(res.json).toHaveBeenCalledTimes(1);
-    expect(socket.destroy).not.toHaveBeenCalled();
-
-    // Fire 'finish' manually (mimics real Node behavior: it fires only after the response is fully written)
-    for (const cb of finishListeners.splice(0)) cb();
-
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-  });
-
-  it('should use res.once not res.on (to avoid duplicate destroy on retries)', async () => {
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next } = makeReqResNext(undefined);
-
-    await middleware(req, res, next);
-
-    expect(res.once).toHaveBeenCalledWith('finish', expect.any(Function));
-  });
-
-  it('should not throw if socket is already destroyed', async () => {
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next, socket } = makeReqResNext(undefined);
-    // Mark the socket as already destroyed
-    socket.destroyed = true;
-
-    await middleware(req, res, next);
-    await res._flush();
-
-    // destroyed=true, so destroy() must not be called again
-    expect(socket.destroy).not.toHaveBeenCalled();
-  });
-
-  it('should handle missing req.socket gracefully', async () => {
-    const middleware = middlewareModule.requireAuth('converter:job.write', {
-      config: TEST_CONFIG,
-      verify,
-    });
-    const { req, res, next } = makeReqResNext(undefined);
-    delete req.socket;
-
-    // Must not throw
-    await expect(middleware(req, res, next)).resolves.not.toThrow();
-    await res._flush();
-  });
-});
-
-// ----------------------------------------------------------------------------
-// Integration test: verify against a real Express app on a live http server that the connection is actually closed
-// ----------------------------------------------------------------------------
-
-describe('requireAuth — integration (real Express + http server)', () => {
-  let verify;
-  let app;
-  let server;
-  let baseUrl;
-
-  beforeAll(async () => {
-    verify = makeInjectedVerify();
-  });
-
-  beforeEach(async () => {
-    app = express();
-    // Minimal requestId middleware that mimics the T3 behavior
-    app.use((req, _res, n) => {
-      req.requestId = req.headers['x-request-id'] || 'req-int-001';
-      n();
-    });
-
-    app.get(
-      '/protected',
-      middlewareModule.requireAuth('converter:job.write', {
-        config: TEST_CONFIG,
-        verify,
-      }),
-      (req, res) => {
-        res.status(200).json({ ok: true, sub: req.auth.sub });
-      }
-    );
-
-    await new Promise((resolve) => {
-      server = app.listen(0, '127.0.0.1', resolve);
-    });
-    const addr = server.address();
-    baseUrl = `http://127.0.0.1:${addr.port}`;
-  });
-
-  afterEach(async () => {
-    if (server) {
-      await new Promise((resolve) => server.close(resolve));
-      server = null;
-    }
-  });
-
-  it('should return 401 with Connection: close and close the connection on missing token', async () => {
-    const res = await fetch(`${baseUrl}/protected`);
-    const body = await res.json();
-
-    expect(res.status).toBe(401);
-    expect(res.headers.get('connection')).toBe('close');
-    expect(body.error.code).toBe('invalid_token');
-    expect(body.error.request_id).toBe('req-int-001');
-  });
-
-  it('should return 200 + payload on valid token', async () => {
-    const token = await makeToken({ sub: 'user-int-1', scope: 'converter:job.write' });
-    const res = await fetch(`${baseUrl}/protected`, {
-      headers: { Authorization: `Bearer ${token}` },
-    });
-    const body = await res.json();
-
-    expect(res.status).toBe(200);
-    expect(body.ok).toBe(true);
-    expect(body.sub).toBe('user-int-1');
-  });
-
-  it('should return 403 insufficient_scope with correct details on integration path', async () => {
-    const token = await makeToken({ scope: 'converter:job.read' });
-    const res = await fetch(`${baseUrl}/protected`, {
-      headers: { Authorization: `Bearer ${token}` },
-    });
-    const body = await res.json();
-
-    expect(res.status).toBe(403);
-    expect(res.headers.get('connection')).toBe('close');
-    expect(body.error.code).toBe('insufficient_scope');
-    expect(body.error.details.required_scope).toBe('converter:job.write');
-  });
-
-  it('should detect socket close from client side after 401', async () => {
-    // Use the low-level http module (instead of fetch) to send the request and inspect the raw response
-    await new Promise((resolve, reject) => {
-      const url = new URL(`${baseUrl}/protected`);
-      const req = http.request(
-        {
-          hostname: url.hostname,
-          port: url.port,
-          path: url.pathname,
-          method: 'GET',
-        },
-        (res) => {
-          let raw = '';
-          res.on('data', (c) => {
-            raw += c.toString();
-          });
-          res.on('end', () => {
-            try {
-              expect(res.statusCode).toBe(401);
-              expect(res.headers.connection).toBe('close');
-              const body = JSON.parse(raw);
-              expect(body.error.code).toBe('invalid_token');
-              resolve();
-            } catch (e) {
-              reject(e);
-            }
-          });
-        }
-      );
-      req.on('error', reject);
-      req.end();
-    });
-  });
-});
-
-// ----------------------------------------------------------------------------
-// Helper / internals tests
-// ----------------------------------------------------------------------------
-
-describe('internals.extractBearerToken', () => {
-  const { extractBearerToken } = middlewareModule._internals;
-
-  it('returns null on undefined / empty', () => {
-    expect(extractBearerToken(undefined)).toBeNull();
-    expect(extractBearerToken('')).toBeNull();
-    expect(extractBearerToken(null)).toBeNull();
-  });
-
-  it('returns null on non-Bearer scheme', () => {
-    expect(extractBearerToken('Basic abc')).toBeNull();
-    expect(extractBearerToken('Token abc')).toBeNull();
-  });
-
-  it('returns trimmed token on valid Bearer', () => {
-    expect(extractBearerToken('Bearer xyz123')).toBe('xyz123');
-    expect(extractBearerToken('Bearer xyz123 ')).toBe('xyz123');
-    expect(extractBearerToken('bearer xyz123')).toBe('xyz123');
-  });
-
-  it('returns null when token portion is empty', () => {
-    expect(extractBearerToken('Bearer ')).toBeNull();
-    expect(extractBearerToken('Bearer ')).toBeNull();
-  });
-});
-
-describe('internals.extractScopes', () => {
-  const { extractScopes } = middlewareModule._internals;
-
-  it('parses space-separated scope string', () => {
-    expect(extractScopes({ scope: 'a b c' })).toEqual(['a', 'b', 'c']);
-  });
-
-  it('parses scp array', () => {
-    expect(extractScopes({ scp: ['a', 'b'] })).toEqual(['a', 'b']);
-  });
-
-  it('handles array scope claim', () => {
-    expect(extractScopes({ scope: ['a', 'b'] })).toEqual(['a', 'b']);
-  });
-
-  it('returns empty array when neither present', () => {
-    expect(extractScopes({})).toEqual([]);
-  });
-
-  it('strips empty string entries', () => {
-    expect(extractScopes({ scope: 'a b' })).toEqual(['a', 'b']);
-    expect(extractScopes({ scp: ['', 'a'] })).toEqual(['a']);
-  });
-});
-
-describe('internals.sendAuthError — edge cases', () => {
-  const { sendAuthError } = middlewareModule._internals;
-
-  it('does not double-write when headersSent already', () => {
-    const { req, res, socket } = makeReqResNext(undefined);
-    res.headersSent = true;
-
-    sendAuthError(req, res, 401, 'invalid_token', 'msg');
-
-    // Must not call setHeader, status, or json again
-    expect(res.setHeader).not.toHaveBeenCalled();
-    expect(res.status).not.toHaveBeenCalled();
-    // ...but still tries to destroy the socket (as a safeguard)
-    expect(socket.destroy).toHaveBeenCalledTimes(1);
-  });
-});
diff --git a/apps/task-scheduler/src/auth/apiKeyMiddleware.js b/apps/task-scheduler/src/auth/apiKeyMiddleware.js
new file mode 100644
index 0000000..5a8655f
--- /dev/null
+++ b/apps/task-scheduler/src/auth/apiKeyMiddleware.js
@@ -0,0 +1,350 @@
+/**
+ * `requireApiKey()` Express middleware (added in Phase 0.8b).
+ *
+ * Purpose: replaces OAuth JWT verification on the visionA → converter external API
+ * with a pre-shared API key (1:1 internal trust). The OAuth client
+ * (converter → FAA promote) is left completely unchanged.
+ *
+ * Design (aligned with docs/autoflow/04-architecture/auth.md §1):
+ *   1. Parse `Authorization: Bearer ` (keeps the existing Bearer format)
+ *   2. Constant-time compare with `crypto.timingSafeEqual` (defends against timing attacks)
+ *      - Lengths must be compared first: timingSafeEqual throws a RangeError on unequal lengths
+ *      - The length itself is not secret (the API key is a fixed 64 hex chars, which is public)
+ *   3. No scope / tenant check: a valid API key is complete proof that the caller is visionA
+ *   4. Failure behavior matches M2 in the old OAuth middleware's sendAuthError:
+ *      - set the Connection: close header
+ *      - write the 401 body with `res.status(401).json(...)`
+ *      - actively close the connection via `res.once('finish', () => req.socket.destroy())`
+ *   5. CONVERTER_API_KEY env unset (config.converter.apiKey empty) → 503
+ *      `service_unavailable`; never silently allow
+ *
+ * Why sendApiKeyError is inlined:
+ *   It reduces module coupling. After Phase 0.8b A4, visionA → converter has no other auth
+ *   middleware, so there is no need for a shared helper. The M2 logic is tiny (~15 lines)
+ *   and cheap to maintain.
+ *   (Phase 0.8b A4 removed the old OAuth middleware.js / jwks.js; this inline design is now
+ *   the only auth path for v1 endpoints.)
+ *
+ * req.auth shape (on success): kept identical to the old OAuth middleware so that downstream
+ * readers such as logging and the per-client rate limiter need no changes:
+ *   {
+ *     sub: 'visionA-service',
+ *     clientId: 'visionA-service',
+ *     tenantId: null,
+ *     scopes: ['converter:job.write', 'converter:job.read'],
+ *     raw: { authType: 'api_key' },
+ *   }
+ *
+ * Why scopes still carries two fixed values (even though endpoint handlers stop checking them after A3):
+ *   - The per-client rate limiter / logging infra may read `req.auth.scopes` (a grep during A3
+ *     found no actual dependency; keeping the placeholder is fail-safe)
+ *   - It signals "implicit full access" (visionA has 1:1 trust with converter and may
+ *     create/read jobs as well as promote)
+ *
+ * Logging rules (aligned with auth.md §1.8 + the Phase 0.8b A7 audit-log hardening):
+ *   - Once at startup: INFO (api_key_enabled, including api_key_length; **never the key itself**)
+ *   - Request received but the expected key is unset → ERROR (auth.api_key.not_configured)
+ *   - Catch-all exception in the middleware → ERROR (auth.api_key.unexpected_error)
+ *   - New in A7: both the success and failure paths write an audit log (with source_ip /
+ *     token_fingerprint / request_id) for forensics. token_fingerprint = sha256(token).slice(0,12):
+ *     a 48-bit identifier space, enough to tell several keys apart, and not reversible to the token.
+ *   - Never log the token itself, only its fingerprint.
+ *   - On failure paths the fingerprint is of the wrong token the attacker sent (for forensics:
+ *     repeated attempts by the same attacker can be clustered). The fingerprint gives the
+ *     attacker no reconnaissance value.
+ */
+
+'use strict';
+
+const crypto = require('crypto');
+
+/**
+ * Hash the token with SHA-256 and keep the first 12 hex chars as its fingerprint (48-bit identifier space).
+ *
+ * Design (Phase 0.8b A7, for audit-log forensics):
+ *   - 48 bits is enough to tell several keys apart (2^48 ≈ 2.8×10^14; the collision probability is
+ *     negligible for any realistic number of keys)
+ *   - SHA-256 cannot be reversed to recover the token (even an attacker holding the logs cannot restore the key)
+ *   - 12 hex chars is short enough not to bloat log lines, yet long enough to compare by eye
+ *   - Wrong tokens on failure paths are fingerprinted too, so forensics can cluster repeated attempts by the same attacker
+ *
+ * Why no cache:
+ *   - sha256 of a 64-byte string takes < 1μs on a modern CPU, so computing it per request is negligible
+ *   - A cache needs lifetime management, and its key would be the expected key itself (which must not sit in a module-level Map)
+ *   - Simpler to compute it directly on every request
+ *
+ * @param {string} token
+ * @returns {string} 12 hex chars
+ */
+function tokenFingerprint(token) {
+  if (typeof token !== 'string' || token.length === 0) return '';
+  return crypto.createHash('sha256').update(token, 'utf8').digest('hex').slice(0, 12);
+}
+
+/**
+ * Write an audit log entry (structured JSON to stdout).
+ *
+ * Design: takes a single fields object; this function adds timestamp / service, and the caller
+ * supplies level / action / other fields. Token contents never appear in fields (callers already avoid it).
+ *
+ * @param {object} fields
+ */
+function logAudit(fields) {
+  // eslint-disable-next-line no-console
+  console.log(
+    JSON.stringify({
+      service: 'task-scheduler',
+      timestamp: new Date().toISOString(),
+      ...fields,
+    })
+  );
+}
+
+/**
+ * sendApiKeyError: M2 socket-destroy behavior (stops a client from continuing to stream a
+ * large body after a 401 and exhausting memory).
+ *
+ * Strict order:
+ *   1. Connection: close header
+ *   2. res.status().json() writes the body
+ *   3. res.once('finish', () => req.socket.destroy()) closes the connection once the response is fully sent
+ *
+ * Inline implementation (no details parameter); the API key path does not need OAuth's details
+ * payload (required_scope / provided_scopes and similar fields do not apply).
+ *
+ * @param {import('express').Request} req
+ * @param {import('express').Response} res
+ * @param {number} status
+ * @param {string} code - error.code (e.g. 'invalid_token' / 'service_unavailable')
+ * @param {string} message - client-facing message (zh-TW)
+ */
+function sendApiKeyError(req, res, status, code, message) {
+  if (res.headersSent) {
+    // Safeguard: the response has already been sent, but still try to destroy the socket
+    try {
+      if (req.socket && !req.socket.destroyed) {
+        req.socket.destroy();
+      }
+    } catch (_) {
+      /* noop */
+    }
+    return;
+  }
+
+  res.setHeader('Connection', 'close');
+
+  res.status(status).json({
+    error: {
+      code,
+      message,
+      request_id: req.requestId || null,
+    },
+  });
+
+  res.once('finish', () => {
+    try {
+      if (req.socket && !req.socket.destroyed) {
+        req.socket.destroy();
+      }
+    } catch (_) {
+      /* noop: the client may already have closed the socket, or Node released it internally */
+    }
+  });
+}
+
+/**
+ * Parse the `Authorization: Bearer ` header.
+ * (Inline implementation; the regex is case-insensitive, matching HTTP auth-scheme rules and common client behavior)
+ *
+ * @param {string|undefined} headerValue
+ * @returns {string|null}
+ */
+function extractBearerToken(headerValue) {
+  if (typeof headerValue !== 'string' || headerValue.length === 0) {
+    return null;
+  }
+  // Case-insensitive: auth-scheme names are case-insensitive (RFC 7235 §2.1, which RFC 6750 builds on)
+  const match = headerValue.match(/^Bearer\s+(.+)$/i);
+  if (!match) {
+    return null;
+  }
+  const token = match[1].trim();
+  if (token === '') return null;
+  return token;
+}
+
+/**
+ * Constant-time string compare (defends against timing attacks).
+ *
+ * Notes:
+ *   - Lengths must be compared first (timingSafeEqual throws a RangeError on unequal lengths)
+ *   - The length itself is not secret (the API key length is public)
+ *   - Compare all bytes; never truncate
+ *
+ * @param {string} a
+ * @param {string} b
+ * @returns {boolean}
+ */
+function constantTimeEquals(a, b) {
+  if (typeof a !== 'string' || typeof b !== 'string') return false;
+  const bufA = Buffer.from(a, 'utf8');
+  const bufB = Buffer.from(b, 'utf8');
+  if (bufA.length !== bufB.length) return false;
+  return crypto.timingSafeEqual(bufA, bufB);
+}
+
+/**
+ * Create a requireApiKey middleware.
+ *
+ * Usage (from Phase 0.8b A3 on, replacing the old OAuth requireAuth(scope) middleware):
+ *   const { requireApiKey } = require('../../auth/apiKeyMiddleware');
+ *   router.post('/jobs', requireApiKey(), handler);
+ *
+ * @param {object} [deps] - dependency injection (for tests)
+ * @param {string} [deps.expectedApiKey] - plaintext API key; if omitted, it is lazily loaded from config
+ * @returns {import('express').RequestHandler}
+ */
+function requireApiKey(deps = {}) {
+  // Lazy-load config so tests need no env vars at require time (same pattern as middleware.js)
+  let expected = deps.expectedApiKey;
+  let configLoaded = expected !== undefined;
+
+  return function apiKeyMiddleware(req, res, next) {
+    try {
+      if (!configLoaded) {
+        // Load config on the first call only, so the env check does not run at require time
+        // (same lazy-load pattern as before: unit tests can require the module without env vars)
+        const config = require('../config').loadConfig();
+        expected = config.converter.apiKey;
+        configLoaded = true;
+      }
+
+      // A7 audit log: compute source_ip / request_id up front (all 4 paths use them)
+      const sourceIp = req.ip || null;
+      const requestId = req.requestId || null;
+      const method = req.method || null;
+      const path = req.path || (req.originalUrl ? req.originalUrl.split('?')[0] : null);
+
+      // Fail fast: API key not configured (env missing) → 503 service_unavailable.
+      // Never silently allow (that would leave an unconfigured stage fully open).
+      if (!expected || expected === '') {
+        // eslint-disable-next-line no-console
+        console.error(
+          JSON.stringify({
+            level: 'ERROR',
+            service: 'task-scheduler',
+            action: 'auth.api_key.not_configured',
+            message: 'CONVERTER_API_KEY env not set; rejecting all requests',
+            source_ip: sourceIp,
+            request_id: requestId,
+            http_method: method,
+            http_path: path,
+            timestamp: new Date().toISOString(),
+          })
+        );
+        return sendApiKeyError(
+          req,
+          res,
+          503,
+          'service_unavailable',
+          'API key not configured'
+        );
+      }
+
+      // 1. Extract the Bearer token
+      const token = extractBearerToken(req.headers && req.headers.authorization);
+      if (!token) {
+        // A7 audit log: missing / malformed Authorization header (no fingerprint: there is no token)
+        logAudit({
+          level: 'WARN',
+          action: 'auth.api_key.missing',
+          auth_type: 'api_key',
+          source_ip: sourceIp,
+          request_id: requestId,
+          http_method: method,
+          http_path: path,
+        });
+        // Client-facing zh-TW message: "Missing or malformed Authorization header (must be Bearer )"
+        return sendApiKeyError(
+          req,
+          res,
+          401,
+          'invalid_token',
+          '缺少或格式錯誤的 Authorization header(需為 Bearer )'
+        );
+      }
+
+      // 2. Constant-time compare
+      if (!constantTimeEquals(token, expected)) {
+        // A7 audit log: wrong token; attach the wrong token's fingerprint (never the token itself).
+        // Forensic value: repeated attempts with the same wrong key share a fingerprint and can be clustered.
+        logAudit({
+          level: 'WARN',
+          action: 'auth.api_key.invalid',
+          auth_type: 'api_key',
+          source_ip: sourceIp,
+          request_id: requestId,
+          http_method: method,
+          http_path: path,
+          token_fingerprint: tokenFingerprint(token),
+        });
+        // Client-facing zh-TW message: "API key verification failed"
+        return sendApiKeyError(req, res, 401, 'invalid_token', 'API key 驗證失敗');
+      }
+
+      // 3.
驗證成功 — 設 req.auth 給下游(對齊 OAuth middleware shape) + // + // Phase 0.8b Phase B 新增 tokenFingerprint:給 `/result` 端點 + // 的 bandwidth quota / rate limit / audit log 用(§9.2 / §11.2)。 + // bucket key 用 fingerprint 而非 clientId:1:1 trust 下所有 caller 都 + // `visionA-service`、bucket 平坦化;fingerprint 在 Phase 0.8b 等同 caller id、 + // 與 audit log forensic cross-correlate 同 key。 + req.auth = { + sub: 'visionA-service', + clientId: 'visionA-service', + tenantId: null, + scopes: ['converter:job.write', 'converter:job.read'], + tokenFingerprint: tokenFingerprint(expected), + raw: { authType: 'api_key' }, + }; + + // A7 audit log:authenticated request;token_fingerprint 給多把 key 並存場景 / forensic 用 + logAudit({ + level: 'INFO', + action: 'auth.api_key.authenticated', + auth_type: 'api_key', + client_id: 'visionA-service', + source_ip: sourceIp, + request_id: requestId, + http_method: method, + http_path: path, + token_fingerprint: tokenFingerprint(expected), + }); + + return next(); + } catch (err) { + // 兜底:理論上不該走到這裡(constantTimeEquals 已防 throw) + // 統一回 401 invalid_token,避免 5xx 洩漏內部細節 + // eslint-disable-next-line no-console + console.error( + JSON.stringify({ + level: 'ERROR', + action: 'auth.api_key.unexpected_error', + // 截短 message 避免 log injection + message: + err && err.message + ? 
String(err.message).slice(0, 100) + : 'unknown', + timestamp: new Date().toISOString(), + }) + ); + return sendApiKeyError(req, res, 401, 'invalid_token', 'API key 驗證失敗'); + } + }; +} + +module.exports = { + requireApiKey, + // 測試 / 內部用 + _internals: { + sendApiKeyError, + extractBearerToken, + constantTimeEquals, + tokenFingerprint, + }, +}; diff --git a/apps/task-scheduler/src/auth/jwks.js b/apps/task-scheduler/src/auth/jwks.js deleted file mode 100644 index 2c3be19..0000000 --- a/apps/task-scheduler/src/auth/jwks.js +++ /dev/null @@ -1,155 +0,0 @@ -/** - * JWKS cache 與 JWT 驗證封裝。 - * - * 採用 `jose` 套件的 `createRemoteJWKSet`,內建: - * - TTL cache(cacheMaxAge,預設 10 分鐘) - * - 失敗冷卻(cooldownDuration,預設 30 秒,避免 thundering herd) - * - 自動 stale-while-revalidate - * - 拒絕 alg=none(jose 預設) - * - cache 大小有上限(jose 預設) - * - * 範圍(T1): - * - 暴露 `getJWKS()` 給 middleware 用 - * - 暴露 `verifyToken(token, opts)` 一站式驗證 - * - 不負責 scope / tenant 檢查(middleware 處理) - * - * 安全注意: - * - 絕對不在 log 中印出 token 內容或 payload - * - 不接受 alg=none(jose 預設) - * - 不允許自帶的 key set(防止「JWKS poisoning」) - */ - -'use strict'; - -const { createRemoteJWKSet, jwtVerify } = require('jose'); - -/** - * 模組層級 cache:以 jwksUrl 為 key 共用一個 RemoteJWKSet 實例。 - * - * 為什麼用模組層級 cache 而非每次 new: - * - `createRemoteJWKSet` 內建 TTL cache 與 cooldown,重複 new 會破壞 cache 命中率 - * - 同一個 process 內所有 middleware 共用同一個 JWKSet - * - * 暴露 `_resetForTests()` 讓測試重置。 - */ -const _jwksByUrl = new Map(); - -/** - * 允許的 JWT 簽章演算法白名單(Sec m3 修正)。 - * - * 為什麼明確 pin: - * - 雖然 jose 預設拒絕 alg=none,但保留了 HMAC(`HS256`/`HS384`/`HS512`)作為 - * 合法選項;HMAC 簽章用對稱金鑰,attacker 拿到 JWKS 公鑰後可能用同一個 key - * 做 HMAC 偽造(演算法混淆攻擊) - * - 明確 pin 為非對稱演算法,攻擊面收窄 - * - * 選擇的 algs: - * - `RS256`:RSA SHA-256,OAuth 2.0 / OIDC 業界標準(Member Center 預期主用) - * - `ES256`:ECDSA P-256 SHA-256,新興 OIDC provider 常用(Auth0、Okta 等) - * - `PS256`:RSA-PSS SHA-256,比 RS256 更安全的 RSA 變體 - */ -const ALLOWED_JWT_ALGS = Object.freeze(['RS256', 'ES256', 'PS256']); - -/** - * 取得(或建立)對應 jwksUrl 的 RemoteJWKSet。 - * - * 
@param {string} jwksUrl - JWKS endpoint URL - * @param {{ cacheMaxAgeMs?: number, cooldownMs?: number }} [options] - * @returns {Function} - jose 的 RemoteJWKSet(可作為 jwtVerify 的第二參數) - */ -function getJWKS(jwksUrl, options = {}) { - if (typeof jwksUrl !== 'string' || jwksUrl.trim() === '') { - throw new Error('[jwks] jwksUrl is required'); - } - - const cached = _jwksByUrl.get(jwksUrl); - if (cached) { - return cached; - } - - const cacheMaxAgeMs = options.cacheMaxAgeMs ?? 10 * 60 * 1000; - const cooldownMs = options.cooldownMs ?? 30 * 1000; - - let url; - try { - url = new URL(jwksUrl); - } catch (err) { - throw new Error(`[jwks] Invalid JWKS URL: ${jwksUrl} (${err.message})`); - } - - const jwks = createRemoteJWKSet(url, { - cacheMaxAge: cacheMaxAgeMs, - cooldownDuration: cooldownMs, - }); - - _jwksByUrl.set(jwksUrl, jwks); - return jwks; -} - -/** - * 驗證 JWT token:簽章、issuer、audience、過期。 - * - * 不檢查 scope / tenant — 由 middleware 層處理。 - * - * @param {string} token - JWT compact token - * @param {{ - * jwksUrl: string, - * issuer: string, - * audience: string, - * clockToleranceSec?: number, - * cacheMaxAgeMs?: number, - * cooldownMs?: number, - * }} options - * @returns {Promise<{ payload: object, protectedHeader: object }>} - * - * @throws {Error} - jose 的 JOSEError 子類,呼叫端應檢查 `err.code`: - * - `ERR_JWT_EXPIRED` → token 過期 - * - `ERR_JWS_INVALID` → 簽章錯 - * - `ERR_JWS_SIGNATURE_VERIFICATION_FAILED` → 簽章驗證失敗 - * - `ERR_JWKS_NO_MATCHING_KEY` → JWKS 找不到 kid - * - `ERR_JWT_CLAIM_VALIDATION_FAILED` → issuer / audience 不符 - */ -async function verifyToken(token, options) { - if (typeof token !== 'string' || token === '') { - const err = new Error('Token is empty'); - err.code = 'ERR_JWS_INVALID'; - throw err; - } - if (!options || typeof options !== 'object') { - throw new Error('[jwks] verifyToken requires options'); - } - - const { jwksUrl, issuer, audience, clockToleranceSec = 60 } = options; - - if (!issuer) throw new Error('[jwks] options.issuer is required'); - if 
(!audience) throw new Error('[jwks] options.audience is required'); - - const jwks = getJWKS(jwksUrl, { - cacheMaxAgeMs: options.cacheMaxAgeMs, - cooldownMs: options.cooldownMs, - }); - - // jose.jwtVerify 預設拒絕 alg=none、會驗 signature、exp、nbf。 - // Sec m3:明確 pin algorithms 白名單,避免 HMAC 演算法混淆攻擊。 - return jwtVerify(token, jwks, { - issuer, - audience, - clockTolerance: clockToleranceSec, - algorithms: ALLOWED_JWT_ALGS, - }); -} - -/** - * 測試用:清空模組層級 cache。 - * 生產環境不應呼叫。 - */ -function _resetForTests() { - _jwksByUrl.clear(); -} - -module.exports = { - getJWKS, - verifyToken, - ALLOWED_JWT_ALGS, - _resetForTests, -}; diff --git a/apps/task-scheduler/src/auth/middleware.js b/apps/task-scheduler/src/auth/middleware.js deleted file mode 100644 index 6def9db..0000000 --- a/apps/task-scheduler/src/auth/middleware.js +++ /dev/null @@ -1,286 +0,0 @@ -/** - * `requireAuth(scope)` Express middleware。 - * - * 職責: - * 1. 驗證 `Authorization: Bearer ` - * 2. 透過 jose(JWKS)驗 issuer / audience / 簽章 / 過期 - * 3. 檢查 token 是否含 requiredScope - * 4. 若 config 有設 tenantId,檢查 token 的 tenant_id 是否吻合 - * 5. 驗證成功 → 把解析好的 auth 資訊掛到 req.auth,呼叫 next() - * 6. 驗證失敗 → 統一錯誤格式回 401/403,並**主動斷線**(M2) - * - * M2(Review m2 落實): - * Express 的 `res.status(401).json(...)` 不會主動關閉底層 socket;攻擊者若已 - * 開始上傳 500MB body,Node 會繼續往 socket buffer 灌資料,吃記憶體與頻寬。 - * 為此 sendAuthError 在 response 完整送出後(`res.on('finish')`)才 destroy - * socket,確保: - * (a) client 收得到 401/403 JSON - * (b) 後續的 body bytes 不會繼續被 Node 接收 - * - * **這只是「盡力而為」**。實際大檔護欄靠 Nginx `client_max_body_size 600M` - * (TDD §7.1,DevOps 任務),這層只是減輕應用層的負擔。 - * - * 已知限制: - * - 在 `res.on('finish')` 之前,Node 的 read buffer 仍可能累積一些 bytes - * (通常為 `highWaterMark`,預設 16KB) - * - 若用戶端用 HTTP/2 或 keep-alive,destroy socket 也會中斷該連線上的其他 - * pipelined request;T1 範圍內可接受(v1 端點目前只有 jobs/promote) - */ - -'use strict'; - -const { verifyToken } = require('./jwks'); - -/** - * 統一的錯誤回應 helper(M2 — 含 destroy 連線)。 - * - * 嚴格順序(**勿改**): - * 1. 
設 `Connection: close` header — 告訴 client 不要 reuse 連線 - * 2. 用 `res.status().json()` 把 401/403 JSON 寫出 - * 3. 監聽 `res.on('finish')` —— 在 response 已寫完且發送完畢後 —— - * destroy underlying socket,讓 client 沒辦法繼續灌 body - * - * 為什麼不能直接 `req.socket.destroy()` 在 send response 之前: - * 會在 response 還沒寫完就斷線,client 收不到 401 訊息,看到的是 - * ECONNRESET,無法判斷是被 reject 還是 server 異常。 - * - * 為什麼用 `req.socket` 而非 `res.socket`: - * 兩者通常是同一個 underlying socket;用 req.socket 可避免 res 在某些 - * 狀況下已被釋放的情境(例如 res 已 detach)。 - * - * @param {import('express').Request} req - * @param {import('express').Response} res - * @param {number} status - HTTP status code(401 / 403) - * @param {string} code - error.code(如 'invalid_token') - * @param {string} message - 對外訊息(zh-TW) - * @param {object} [details] - error.details(可選) - */ -function sendAuthError(req, res, status, code, message, details) { - // 雙重保險:若 response 已送過(不該發生但保險),不要 double send - if (res.headersSent) { - // 仍嘗試 destroy,避免 client 繼續灌 body - try { - if (req.socket && !req.socket.destroyed) { - req.socket.destroy(); - } - } catch (_) { - /* noop */ - } - return; - } - - res.setHeader('Connection', 'close'); - - const body = { - error: { - code, - message, - // request_id 由 T3 的 requestId middleware 提供;T1 階段尚未掛上時 - // 會是 undefined,這裡保持一致格式(即使 undefined 也輸出 key 以 - // 利下游解析)—— 但 JSON.stringify 會 omit undefined value。 - // T3 接管後會自動有值。 - request_id: req.requestId || null, - }, - }; - if (details !== undefined) { - body.error.details = details; - } - - res.status(status).json(body); - - // 在 response 完整送出(finish 事件)後 destroy socket。 - // - finish:response 寫完且 OS buffer 已 flush - // - 此時可安全 destroy,client 已收到完整 401/403 - // 用 once 避免多次觸發;包 try/catch 防止 socket 已被別處 destroy。 - res.once('finish', () => { - try { - if (req.socket && !req.socket.destroyed) { - req.socket.destroy(); - } - } catch (_) { - /* noop — socket 可能已被 client 主動關閉或 Node 內部釋放 */ - } - }); -} - -/** - * 解析 `Authorization: Bearer ` header。 - * - * @param {string|undefined} headerValue - * 
@returns {string|null} - 成功時回 token;格式錯或缺值回 null - */ -function extractBearerToken(headerValue) { - if (typeof headerValue !== 'string' || headerValue.length === 0) { - return null; - } - // 嚴格匹配 'Bearer ' 開頭(大小寫敏感對齊 RFC 6750;多數 client 用大寫) - // 允許大小寫不敏感以提高互操作性 - const match = headerValue.match(/^Bearer\s+(.+)$/i); - if (!match) { - return null; - } - const token = match[1].trim(); - if (token === '') return null; - return token; -} - -/** - * 從 token claims 中取出 scopes 陣列。 - * - * RFC 8693 / OAuth 2 的 `scope` claim 為「空白分隔字串」; - * 部分授權伺服器使用 `scp` claim(陣列)。本函數兩者都支援。 - * - * @param {object} claims - * @returns {string[]} - */ -function extractScopes(claims) { - if (Array.isArray(claims.scp)) { - return claims.scp.filter((s) => typeof s === 'string' && s.length > 0); - } - if (typeof claims.scope === 'string') { - return claims.scope.split(/\s+/).filter(Boolean); - } - if (Array.isArray(claims.scope)) { - return claims.scope.filter((s) => typeof s === 'string' && s.length > 0); - } - return []; -} - -/** - * 建立一個 requireAuth middleware。 - * - * 用法: - * const auth = require('./middleware'); - * app.post('/api/v1/jobs', auth.requireAuth(config.converter.scopeWrite), handler); - * - * @param {string} requiredScope - 此端點要求的 scope(如 'converter:job.write') - * @param {object} [deps] - 依賴注入(測試用) - * @param {object} [deps.config] - 完整 config object(從 config.loadConfig() 取) - * @param {Function} [deps.verify] - 注入版的 verifyToken(測試用) - * @returns {import('express').RequestHandler} - */ -function requireAuth(requiredScope, deps = {}) { - if (typeof requiredScope !== 'string' || requiredScope === '') { - throw new Error('[requireAuth] requiredScope is required and must be a string'); - } - - // Lazy-load config,讓測試能在 require 階段不需設環境變數 - let config = deps.config; - const verify = deps.verify || verifyToken; - - return async function authMiddleware(req, res, next) { - try { - if (!config) { - // 第一次呼叫才載入,避免測試時 import middleware 即觸發 config check - config = 
require('../config').loadConfig(); - } - - // 1. 取出 Bearer token - const token = extractBearerToken(req.headers && req.headers.authorization); - if (!token) { - return sendAuthError( - req, - res, - 401, - 'invalid_token', - '缺少或格式錯誤的 Authorization header(需為 Bearer )' - ); - } - - // 2. 透過 JWKS 驗 issuer / audience / 簽章 / 過期 - let result; - try { - result = await verify(token, { - jwksUrl: config.memberCenter.jwksUrl, - issuer: config.memberCenter.issuer, - audience: config.converter.audience, - clockToleranceSec: config.jwks.clockToleranceSec, - cacheMaxAgeMs: config.jwks.cacheMaxAgeMs, - cooldownMs: config.jwks.cooldownMs, - }); - } catch (err) { - // jose 的 error.code 對映到對外錯誤碼 - const errCode = err && err.code ? String(err.code) : ''; - - if (errCode === 'ERR_JWT_EXPIRED') { - return sendAuthError(req, res, 401, 'token_expired', 'Token 已過期'); - } - - // 簽章 / kid / 任何驗證失敗統一回 invalid_token,避免洩漏內部資訊 - // (安全考量:不告訴攻擊者「issuer 對了但 audience 錯了」這類細節) - // 注意:這裡也涵蓋了 issuer / audience 不符(ERR_JWT_CLAIM_VALIDATION_FAILED)。 - // 這是刻意的:對外只需知道「token 不被接受」即可。 - // log 細節給 ops 看(不含 token 內容)。 - // eslint-disable-next-line no-console - console.warn( - JSON.stringify({ - level: 'WARN', - action: 'auth.verify_failed', - error_code: errCode || 'unknown', - message: err && err.message ? err.message : 'verify failed', - timestamp: new Date().toISOString(), - }) - ); - return sendAuthError(req, res, 401, 'invalid_token', 'Token 驗證失敗'); - } - - const claims = result.payload; - - // 3. 檢查 scope - const scopes = extractScopes(claims); - if (!scopes.includes(requiredScope)) { - return sendAuthError(req, res, 403, 'insufficient_scope', 'Token 缺少必要權限', { - required_scope: requiredScope, - provided_scopes: scopes, - }); - } - - // 4. 
檢查 tenant(若 config.converter.tenantId 為空字串則跳過) - // TDD §5.1:「若有,等於 CONVERTER_TENANT_ID(Phase 1 可先 warn-only)」 - // 本實作採嚴格策略:config 設了就一定要對;空字串時不檢查。 - if (config.converter.tenantId) { - const claimTenant = claims.tenant_id; - if (claimTenant !== config.converter.tenantId) { - return sendAuthError(req, res, 403, 'tenant_mismatch', '租戶不符', { - expected_tenant: config.converter.tenantId, - // 不回傳 token 中真正的 tenant_id(避免資訊洩露) - }); - } - } - - // 5. 掛 req.auth 給下游使用 - req.auth = { - sub: claims.sub || null, - clientId: claims.client_id || claims.sub || null, - tenantId: claims.tenant_id || null, - scopes, - // 完整 claims 物件給需要的 handler 用;不暴露 token 字串 - raw: claims, - }; - - return next(); - } catch (err) { - // 兜底:理論上不該走到這裡 - // eslint-disable-next-line no-console - console.error( - JSON.stringify({ - level: 'ERROR', - action: 'auth.middleware_unexpected_error', - message: err && err.message ? err.message : 'unknown', - timestamp: new Date().toISOString(), - }) - ); - return sendAuthError(req, res, 401, 'invalid_token', 'Token 驗證失敗'); - } - }; -} - -module.exports = { - requireAuth, - // 測試 / 內部用 - _internals: { - sendAuthError, - extractBearerToken, - extractScopes, - }, -}; diff --git a/apps/task-scheduler/src/config.js b/apps/task-scheduler/src/config.js index a30d04f..21cf26a 100644 --- a/apps/task-scheduler/src/config.js +++ b/apps/task-scheduler/src/config.js @@ -1,9 +1,15 @@ /** * 集中讀取所有環境變數,啟動時 fail fast。 * - * 範圍:T1/T2 — 讀取 OAuth / JWKS / Converter 身份 / OAuth Client 相關欄位。 + * 範圍: + * - visionA → converter 對外 API 認證:CONVERTER_API_KEY(Phase 0.8b A2 新增) + * - converter → File Access Agent OAuth client:MEMBER_CENTER_TOKEN_URL / + * KNERON_CONVERTER_CLIENT_ID / KNERON_CONVERTER_CLIENT_SECRET / + * FILE_ACCESS_AGENT_* / OAUTH_TOKEN_* (oauthClient.js 用) + * - 上傳行為控制:MULTIPART_* / MAX_CONCURRENT_UPLOADS / UPLOAD_RETRY_AFTER_SECONDS + * * 其他既有欄位(PORT, REDIS_URL, MINIO_*, JOB_DATA_DIR 等)暫時沿用 server.js - * 既有讀法,待 T4 重構時再合併進來。 + * 既有讀法,待後續重構合併進來。 * * 設計原則: * - 必填變數缺漏 → 立刻 
throw,避免進到 runtime 才爆炸 @@ -11,10 +17,16 @@ * - 對外 export 一個凍結 object,避免被改動 * * 變更歷程: - * - T1:先把 token URL / client id / client secret 設 optional,因 T1 沒呼叫 token endpoint - * - T2(本任務):實作 OAuth client,依 TDD §9 將上述三項收緊為必填(修 D1/D2) - * - T10:新增 multipart 與 uploadConcurrency 段(修 D5)。所有 multipart limit 與 - * per-process upload concurrency 上限由 env 控制,避免改原始碼才能調整。 + * - T1 / T2:實作 OAuth client(converter→FAA),收緊 token/client_id/client_secret 為必填 + * - T10:新增 multipart 與 uploadConcurrency 段(修 D5) + * - Phase 0.8b A2:新增 CONVERTER_API_KEY(visionA → converter 對外 API 認證) + * - Phase 0.8b A4:砍除 OAuth resource-server 相關 env(visionA → converter 不再走 + * OAuth/JWKS):MEMBER_CENTER_ISSUER / MEMBER_CENTER_JWKS_URL / + * KNERON_CONVERTER_AUDIENCE / CONVERTER_TENANT_ID / CONVERTER_SCOPE_* / + * JWKS_CACHE_MAX_AGE_MS / JWKS_COOLDOWN_MS / JWT_CLOCK_TOLERANCE_SEC + * (converter→FAA OAuth client 鏈條保留不動,那條與 visionA→converter 認證無關) + * - Phase 0.8b A7:新增 TRUST_PROXY(給 Express `app.set('trust proxy', ...)`, + * 讓 req.ip 可從 X-Forwarded-For 取到真實 caller IP;audit log forensic 用) */ 'use strict'; @@ -81,43 +93,102 @@ function optionalIntEnv(name, defaultValue) { * 讓 process 直接 exit(fail fast)。 * * @returns {Readonly<{ - * memberCenter: { issuer: string, jwksUrl: string, tokenUrl: string }, + * memberCenter: { tokenUrl: string }, * converter: { - * audience: string, * clientId: string, * clientSecret: string, - * tenantId: string, - * scopeWrite: string, - * scopeRead: string, + * apiKey: string, * }, + * trustProxy: boolean | number | string, * fileAccessAgent: { baseUrl: string, audience: string, promoteTimeoutMs: number }, - * jwks: { cacheMaxAgeMs: number, cooldownMs: number, clockToleranceSec: number }, * oauthClient: { refreshSkewMs: number, timeoutMs: number }, * multipart: { modelMaxBytes: number, refImageMaxBytes: number, refImagesMaxCount: number }, * uploadConcurrency: { maxConcurrent: number, retryAfterSeconds: number }, * }>} */ function loadConfig() { - // === Member Center(OAuth 
Authorization Server) === - const mcIssuer = requireEnv('MEMBER_CENTER_ISSUER'); - const mcJwksUrl = requireEnv('MEMBER_CENTER_JWKS_URL'); - // T2:對齊 TDD §9 改為必填。OAuth Client 取 token 必用此 endpoint。 + // === Member Center Token Endpoint(converter → FAA OAuth client 用,保留) === + // 給 oauthClient.js 取 service token 用;visionA→converter 路線已不再經 JWKS 驗證。 const mcTokenUrl = requireEnv('MEMBER_CENTER_TOKEN_URL'); - // === Converter as Resource Server(接收他人 token) === - const audience = requireEnv('KNERON_CONVERTER_AUDIENCE'); - - // === Converter as OAuth Client(呼叫 File Access Agent,僅 promote 用) === - // T2:對齊 TDD §9 將 client_id / client_secret 收緊為必填。兩者必須成對出現。 + // === Converter as OAuth Client(呼叫 File Access Agent,僅 promote 用,保留) === + // 兩者必須成對出現。promote handler 內 faaClient.putFile 取 token 用。 const clientId = requireEnv('KNERON_CONVERTER_CLIENT_ID'); const clientSecret = requireEnv('KNERON_CONVERTER_CLIENT_SECRET'); - // === Tenant 隔離(可選) === - const tenantId = optionalEnv('CONVERTER_TENANT_ID', ''); + // === Trust Proxy(Phase 0.8b A7,影響 req.ip 取得真實 caller IP)=== + // 設計參考:docs/autoflow/04-architecture/auth.md §1.9(audit log)+ Express docs + // + // 為什麼需要: + // audit log 要記錄 source_ip(forensic 用);converter 跑在 Nginx / cloud LB + // 後面時,Node 直接 socket 連線拿到的 IP 永遠是反代理的內網 IP,必須信任 + // `X-Forwarded-For` header 才能取到真實 caller。 + // + // ⚠️ 安全提醒: + // 設錯(如 stage / prod 留 'loopback' 或不設)→ source_ip 永遠是反代理 IP、 + // forensic 失效;設過寬(如 `true` 信任所有 hop)→ attacker 可偽造 + // `X-Forwarded-For: ` 欺騙 audit log。詳見 Express docs: + // https://expressjs.com/en/guide/behind-proxies.html + // + // 預設與部署建議: + // - 預設 `loopback`:只信任 127.0.0.1 / ::1(local dev / 測試環境安全) + // - stage / prod(前面 1 層 Nginx):`TRUST_PROXY=1` + // - stage / prod(cloud LB + Nginx 兩層):`TRUST_PROXY=2` + // - 明確 CIDR:`TRUST_PROXY=loopback, 10.0.0.0/8` + // + // Express 4 接受的值: + // - boolean('true' / 'false'):true 信任所有 hop(**不建議 prod 用**) + // - 純數字('1' / '2'):信任 N hop + // - 字串(含 'loopback' / 'linklocal' / 'uniquelocal' / CIDR) + 
// + // 我們 normalize:純數字字串 → 轉成 Number;其餘維持 string。 + const trustProxyRaw = optionalEnv('TRUST_PROXY', 'loopback'); + let trustProxy; + if (trustProxyRaw === 'true') { + trustProxy = true; + } else if (trustProxyRaw === 'false') { + trustProxy = false; + } else if (/^\d+$/.test(trustProxyRaw)) { + trustProxy = Number.parseInt(trustProxyRaw, 10); + } else { + trustProxy = trustProxyRaw; + } - // === Scope 命名(可覆寫,預設值對齊 TDD §8) === - const scopeWrite = optionalEnv('CONVERTER_SCOPE_WRITE', 'converter:job.write'); - const scopeRead = optionalEnv('CONVERTER_SCOPE_READ', 'converter:job.read'); + // === API Key(Phase 0.8b A2,visionA → converter 對外 API auth) === + // 設計參考:docs/autoflow/04-architecture/auth.md §1 + §4 + // + // 為什麼 optional(不 requireEnv): + // - dev 環境可能還沒設 key(local 跑 legacy Web UI 路徑也應該能啟動) + // - apiKeyMiddleware 自己會做 fail-fast(env 未設時 middleware 回 503) + // - 啟動時 warn-only log 提醒,避免無聲問題 + // + // 為什麼不 log key 本身:4.5 外洩處理風險高、log 一旦進日誌系統很難清除 + // 對齊 visionA repo 的 `api_key_set: boolean` pattern,只印 length 作為「有設」的證據 + const apiKey = optionalEnv('CONVERTER_API_KEY', ''); + if (!apiKey) { + // eslint-disable-next-line no-console + console.warn( + JSON.stringify({ + level: 'WARN', + action: 'config.api_key_not_set', + message: + 'CONVERTER_API_KEY env not set; API key middleware will reject all requests', + timestamp: new Date().toISOString(), + }) + ); + } else { + // eslint-disable-next-line no-console + console.log( + JSON.stringify({ + level: 'INFO', + action: 'config.api_key_enabled', + message: 'API key middleware enabled', + // 絕不印 key 本身;length 是公開資訊(key 長度固定 64 hex chars) + api_key_length: apiKey.length, + timestamp: new Date().toISOString(), + }) + ); + } // === File Access Agent(T7 起為必填)=== // T7:promote 流程已上線,FAA URL / audience 必須在啟動時驗證;少了就 fail-fast。 @@ -148,12 +219,7 @@ function loadConfig() { // 單檔 PUT timeout,預設 300s(500MB @ 5MB/s 下界),對齊 TDD §6.4。 const promoteTimeoutMs = optionalIntEnv('PROMOTE_TIMEOUT_MS', 300 * 1000); - // === JWKS cache 行為 
=== - const jwksCacheMaxAgeMs = optionalIntEnv('JWKS_CACHE_MAX_AGE_MS', 10 * 60 * 1000); // 10 分鐘 - const jwksCooldownMs = optionalIntEnv('JWKS_COOLDOWN_MS', 30 * 1000); // 30 秒 - const jwtClockToleranceSec = optionalIntEnv('JWT_CLOCK_TOLERANCE_SEC', 60); // 60 秒 - - // === OAuth Client(取 token 用,T2)=== + // === OAuth Client(converter → FAA 取 token 用,保留)=== // refresh skew:cache 內 token 距離 expiresAt 還有多少 ms 時就主動 refresh。 // 預設 60s,避免 race condition(取 token 時剛好過期)。 const oauthRefreshSkewMs = optionalIntEnv('OAUTH_TOKEN_REFRESH_SKEW_MS', 60 * 1000); @@ -234,28 +300,19 @@ function loadConfig() { return Object.freeze({ memberCenter: Object.freeze({ - issuer: mcIssuer, - jwksUrl: mcJwksUrl, tokenUrl: mcTokenUrl, }), converter: Object.freeze({ - audience, clientId, clientSecret, - tenantId, - scopeWrite, - scopeRead, + apiKey, }), + trustProxy, fileAccessAgent: Object.freeze({ baseUrl: faaBaseUrl, audience: faaAudience, promoteTimeoutMs, }), - jwks: Object.freeze({ - cacheMaxAgeMs: jwksCacheMaxAgeMs, - cooldownMs: jwksCooldownMs, - clockToleranceSec: jwtClockToleranceSec, - }), oauthClient: Object.freeze({ refreshSkewMs: oauthRefreshSkewMs, timeoutMs: oauthTimeoutMs, diff --git a/apps/task-scheduler/src/middleware/__tests__/perClientRateLimit.test.js b/apps/task-scheduler/src/middleware/__tests__/perClientRateLimit.test.js index 32dac51..272774e 100644 --- a/apps/task-scheduler/src/middleware/__tests__/perClientRateLimit.test.js +++ b/apps/task-scheduler/src/middleware/__tests__/perClientRateLimit.test.js @@ -29,7 +29,7 @@ afterAll(() => { /** * 啟動一個小 app: * - requestId middleware - * - 一個假的 requireAuth → 把 query.clientId 寫到 req.auth.clientId + * - 一個假的 requireApiKey stub → 把 query.clientId 寫到 req.auth.clientId * - perClientRateLimiter * - 一個 echo handler * - errorHandler 在最後 diff --git a/apps/task-scheduler/src/middleware/perClientRateLimit.js b/apps/task-scheduler/src/middleware/perClientRateLimit.js index 9960865..062c4d2 100644 --- 
a/apps/task-scheduler/src/middleware/perClientRateLimit.js +++ b/apps/task-scheduler/src/middleware/perClientRateLimit.js @@ -8,16 +8,16 @@ * - per-client_id 則對齊 TDD §1.1:300 req / 5 min per client_id,是商務層的 * 合約上限(vendor SLA) * - * 為什麼必須掛在 requireAuth 之後: + * 為什麼必須掛在 requireApiKey 之後(Phase 0.8b A3 起;先前為 requireAuth(scope)): * - 要拿 `req.auth.clientId` 當 keyGenerator 的 key - * - 沒驗證的 request 會在 requireAuth 階段就被 401 擋掉,不會走到 limiter - * - 結果:未驗證流量先被 IP-based limiter(外層)+ requireAuth 擋; + * - 沒驗證的 request 會在 requireApiKey 階段就被 401 擋掉,不會走到 limiter + * - 結果:未驗證流量先被 IP-based limiter(外層)+ requireApiKey 擋; * 驗證過的流量再被 per-client_id limiter(內層)擋 * * 為什麼必須掛在 multer 之前: * - multer 會把 multipart body 全部讀進 memoryStorage(最大 500MB) * - 若 limiter 在 multer 之後,超過 quota 的 client 仍會把 500MB 灌進 server 才拒 - * - 結論:requireAuth → perClientRateLimit → multer → handler 是唯一正確順序 + * - 結論:requireApiKey → perClientRateLimit → multer → handler 是唯一正確順序 * * 安全: * - express-rate-limit 預設用 memory store,是「per Node process」計數 @@ -48,6 +48,15 @@ const DEFAULT_MAX = 300; * @param {object} [opts] * @param {number} [opts.windowMs=300000] * @param {number} [opts.max=300] + * @param {(req: import('express').Request) => string} [opts.keyGenerator] + * - Phase 0.8b Phase B 新增:客製 bucket key 抽取(如 token_fingerprint)。 + * 不傳則沿用既有 clientId-based 行為(向後相容)。 + * @param {object} [opts.errorDetails] + * - Phase 0.8b Phase B 新增:429 ApiError details 額外欄位(如 + * `{ limit_type: 'burst' }`)。會 merge 進 details 與 `retry_after_seconds`。 + * @param {(req: import('express').Request, retryAfterSec: number) => void} [opts.onLimitExceeded] + * - Phase 0.8b Phase B 新增:限流命中時的 audit hook(不寫 audit 就傳 undef)。 + * 用於 /result 寫 `result.rate_limited` audit log;既有 endpoint 保持原行為。 * @returns {import('express').RequestHandler} */ function createPerClientRateLimiter(opts = {}) { @@ -56,33 +65,65 @@ function createPerClientRateLimiter(opts = {}) { : DEFAULT_WINDOW_MS; const max = Number.isInteger(opts.max) && opts.max > 0 ? 
opts.max : DEFAULT_MAX; + // 預設 keyGenerator:clientId-based(向後相容既有 endpoint) + // 新 endpoint(/result)可注入 keyGenerator 改用 token_fingerprint + const keyGenerator = + typeof opts.keyGenerator === 'function' + ? opts.keyGenerator + : function defaultKeyGenerator(req) { + // requireApiKey 已在前面跑過 → req.auth.clientId 必有;保險起見 fallback + // 到 IP,避免 undefined key 把所有 anon 計成同一個 bucket。 + const clientId = + req && req.auth && typeof req.auth.clientId === 'string' + ? req.auth.clientId + : null; + if (clientId) return `cid:${clientId}`; + // fallback 不應該發生(middleware 順序保證),這裡用 IP 防 NaN-keyed bucket + return `ip:${req.ip || 'unknown'}`; + }; + + const errorDetails = + opts.errorDetails && typeof opts.errorDetails === 'object' + ? opts.errorDetails + : null; + const onLimitExceeded = + typeof opts.onLimitExceeded === 'function' ? opts.onLimitExceeded : null; + return rateLimit({ windowMs, max, // 開啟標準 RateLimit-* header(RFC draft);同時保留 X-RateLimit-* legacy standardHeaders: true, legacyHeaders: true, - keyGenerator(req) { - // requireAuth 已在前面跑過 → req.auth.clientId 必有;保險起見 fallback - // 到 IP,避免 undefined key 把所有 anon 計成同一個 bucket。 - const clientId = - req && req.auth && typeof req.auth.clientId === 'string' - ? req.auth.clientId - : null; - if (clientId) return `cid:${clientId}`; - // fallback 不應該發生(middleware 順序保證),這裡用 IP 防 NaN-keyed bucket - return `ip:${req.ip || 'unknown'}`; - }, + keyGenerator, handler(req, res, next /* , options */) { // 統一走 errorHandler,回 v1 標準格式 // express-rate-limit 已經設好 Retry-After / RateLimit-* headers;不要 res.json 自己回 // 透過 next(ApiError) 走 errorHandler 才能含 request_id - const retryAfterSec = res.getHeader('Retry-After'); + const retryAfterHeader = res.getHeader('Retry-After'); + const retryAfterSec = + typeof retryAfterHeader === 'string' + ? 
Number(retryAfterHeader)
+          : retryAfterHeader;
+      // Audit hook (lets /result write result.rate_limited)
+      if (onLimitExceeded) {
+        try {
+          onLimitExceeded(req, retryAfterSec);
+        } catch (_) {
+          /* noop: an audit-log failure must not block the response */
+        }
+      }
+      const baseDetails = { retry_after_seconds: retryAfterSec };
+      const finalDetails = errorDetails
+        ? { ...baseDetails, ...errorDetails }
+        : baseDetails;
       return next(
-        new ApiError(429, 'rate_limit_exceeded', '請求頻率過高,請稍後再試', {
-          retry_after_seconds:
-            typeof retryAfterSec === 'string' ? Number(retryAfterSec) : retryAfterSec,
-        })
+        new ApiError(
+          429,
+          'rate_limit_exceeded',
+          '請求頻率過高,請稍後再試',
+          finalDetails
+        )
       );
     },
   });
diff --git a/apps/task-scheduler/src/middleware/resultBandwidthQuota.js b/apps/task-scheduler/src/middleware/resultBandwidthQuota.js
new file mode 100644
index 0000000..a6c1ff4
--- /dev/null
+++ b/apps/task-scheduler/src/middleware/resultBandwidthQuota.js
@@ -0,0 +1,305 @@
+/**
+ * Bandwidth quota middleware for `/result` (Phase 0.8b Phase B, AC-3 / api-result.md §9).
+ *
+ * Purpose: cap the total NEF binary bytes a caller (per `token_fingerprint`) may download
+ * within an hourly / daily window. A pre-check rejects a single oversized request, and a post-stream step
+ * adds the bytes actually streamed (not Content-Length, so an aborted client is not over-counted).
+ *
+ * Why a new module instead of reusing `perClientRateLimit.js`:
+ *   - The existing limiter works on the count axis (req count / window)
+ *   - This middleware works on the volume axis (bytes / window)
+ *   - The semantics differ: an attacker's lever is bandwidth, not req count (even a caller sitting
+ *     exactly at 20 req/min could still download 720 GB in 6 hr)
+ *
+ * Why reuse the token_fingerprint produced by `requireApiKey`:
+ *   - Keying buckets by `clientId` is useless under 1:1 trust: every caller is `visionA-service`,
+ *     so all traffic collapses into one bucket
+ *   - Under Phase 0.8b 1:1 trust, `token_fingerprint` is effectively the caller id; once Phase 2
+ *     introduces per-caller credentials it lines up automatically
+ *   - The audit log already records the fingerprint, so quota stats and forensics share a key and can be cross-correlated
+ *
+ * Two-phase check (pre-check + post-stream increment):
+ *   1. pre-check: use Content-Length (from a MinIO HEAD) to estimate how many bytes the request will use;
+ *      if used + estimated > quota → 429 (reject without wasting bandwidth)
+ *   2. post-stream: on res.on('finish'), add the bytes actually streamed
+ *      (the handler accumulates them into res._bytesStreamed in stream.on('data') and hands the total to this module)
+ *
+ * Multi-instance limitation:
+ *   The counter is in-memory (per process). It must move to Redis before the Phase 2 multi-instance deployment;
+ *   otherwise the quota is loosened by a factor of the instance count (2 × 1 GB/hr = 2 GB/hr effective quota).
+ *   See api-result.md §9.9 and security.md candidate #8.
+ *
+ * Window reset (fixed windows that restart on the first request after expiry, not sliding windows):
+ *   - hourly: the counter stores resetAt = now + 3600_000; once passed, the bytes reset and resetAt is renewed
+ *   - daily: the counter stores resetAt = now + 86_400_000; once passed, the bytes reset and resetAt is renewed
+ *   - The two axes reset independently (not a precise rolling window, but sufficient for detecting attackers)
+ *
+ * Audit log:
+ *   - Exceeding the quota always writes `result.bandwidth_quota_exceeded` (WARN level)
+ *   - Reuses the five A.7 fields, the four /result fields, and event-specific fields
+ *   - A rejection is itself a forensic signal (pattern: repeated hits from the same fingerprint → identify the abuser)
+ */
+
+'use strict';
+
+const { ApiError } = require('./errorHandler');
+
+const HOUR_MS = 60 * 60 * 1000;
+const DAY_MS = 24 * 60 * 60 * 1000;
+
+/**
+ * Defaults, aligned with the quantitative analysis in api-result.md §9.2 (normal-user P95 ≈ 120 req/min ÷ 10 callers
+ * ≈ 12 req/min/key, with 1.7× headroom).
+ */
+const DEFAULT_HOURLY_LIMIT_BYTES = 1024 * 1024 * 1024; // 1 GB
+const DEFAULT_DAILY_LIMIT_BYTES = 6 * 1024 * 1024 * 1024; // 6 GB
+
+/**
+ * Structured audit log (same format as promote / apiKeyMiddleware).
+ *
+ * @param {object} fields
+ */
+function defaultLogAudit(fields) {
+  // eslint-disable-next-line no-console
+  console.log(
+    JSON.stringify({
+      service: 'task-scheduler',
+      timestamp: new Date().toISOString(),
+      ...fields,
+    })
+  );
+}
+
+/**
+ * Create the bandwidth quota middleware (exported factory).
+ *
+ * @param {object} [opts]
+ * @param {number} [opts.hourlyLimitBytes=1073741824] - byte cap per 1 hr window
+ * @param {number} [opts.dailyLimitBytes=6442450944] - byte cap per 24 hr window
+ * @param {(req: import('express').Request) => string} [opts.keyGenerator]
+ *   - defaults to `req.auth?.tokenFingerprint || 'unknown'`
+ * @param {(fields: object) => void} [opts.onLog] - structured log hook (test-friendly)
+ * @returns {{
+ *   middleware: import('express').RequestHandler,
+ *   consume: (key: string, bytes: number) => void, // called by the handler after the stream completes
+ *   getState: (key: string) => { hourlyBytes: number, dailyBytes: number,
+ *     hourlyResetAt: number, dailyResetAt: number } | null,
+ * }}
+ */
+function createResultBandwidthQuota(opts = {}) {
+  const hourlyLimitBytes =
+    Number.isInteger(opts.hourlyLimitBytes) && opts.hourlyLimitBytes > 0
+      ? opts.hourlyLimitBytes
+      : DEFAULT_HOURLY_LIMIT_BYTES;
+  const dailyLimitBytes =
+    Number.isInteger(opts.dailyLimitBytes) && opts.dailyLimitBytes > 0
+      ? opts.dailyLimitBytes
+      : DEFAULT_DAILY_LIMIT_BYTES;
+  const keyGenerator =
+    typeof opts.keyGenerator === 'function'
+      ? opts.keyGenerator
+      : (req) =>
+          req && req.auth && typeof req.auth.tokenFingerprint === 'string'
+            ? req.auth.tokenFingerprint
+            : 'unknown';
+  const onLog = typeof opts.onLog === 'function' ? opts.onLog : defaultLogAudit;
+
+  /**
+   * In-memory counters: a Map from bucket key to counter state.
+   *
+   * Why a Map rather than a plain object: a Map is safe for arbitrary string keys (no `__proto__` collision),
+   * its operations are explicit, and its interface is closer to what a Redis-backed store needs in Phase 2.
+   */
+  const counters = new Map();
+
+  /**
+   * Get or create a counter, resetting any expired window automatically.
+   *
+   * @param {string} key
+   * @returns {{ hourlyBytes: number, hourlyResetAt: number,
+   *             dailyBytes: number, dailyResetAt: number }}
+   */
+  function getOrCreate(key) {
+    const now = Date.now();
+    let c = counters.get(key);
+    if (!c) {
+      c = {
+        hourlyBytes: 0,
+        hourlyResetAt: now + HOUR_MS,
+        dailyBytes: 0,
+        dailyResetAt: now + DAY_MS,
+      };
+      counters.set(key, c);
+      return c;
+    }
+    // Fixed-window reset: start a fresh window once resetAt has passed
+    if (now >= c.hourlyResetAt) {
+      c.hourlyBytes = 0;
+      c.hourlyResetAt = now + HOUR_MS;
+    }
+    if (now >= c.dailyResetAt) {
+      c.dailyBytes = 0;
+      c.dailyResetAt = now + DAY_MS;
+    }
+    return c;
+  }
+
+  /**
+   * Called by the handler from res.on('finish') once the stream completes; adds the bytes actually streamed.
+   *
+   * Why consume is separate from the middleware:
+   *   - The middleware runs before streaming and cannot know the real byte count (the pre-check uses an estimate)
+   *   - The handler calls consume only after the stream completes, when the accumulator holds the exact value
+   *   - Splitting the two phases simplifies unit tests (middleware and consume are tested independently)
+   *
+   * Guard: non-number, non-finite (NaN / Infinity), zero, or negative values are ignored (protects against caller
+   * mistakes).
+   *
+   * @param {string} key
+   * @param {number} bytes
+   */
+  function consume(key, bytes) {
+    if (typeof bytes !== 'number' || !Number.isFinite(bytes) || bytes <= 0) {
+      return;
+    }
+    const c = getOrCreate(key);
+    c.hourlyBytes += bytes;
+    c.dailyBytes += bytes;
+  }
+
+  /**
+   * Exposed for unit tests / health checks.
+   *
+   * @param {string} key
+   */
+  function getState(key) {
+    const c = counters.get(key);
+    if (!c) return null;
+    return {
+      hourlyBytes: c.hourlyBytes,
+      hourlyResetAt: c.hourlyResetAt,
+      dailyBytes: c.dailyBytes,
+      dailyResetAt: c.dailyResetAt,
+    };
+  }
+
+  /**
+   * Express middleware: pre-check + audit log.
+   *
+   * Pre-check logic:
+   *   - If used >= limit (already full) → reject immediately (this checks `used` on its own, not used + 0, so it
+   *     stays correct when the handler does not know the estimated size)
+   *   - If used + estimatedSize > limit (the estimate would overflow) → reject
+   *   - estimatedSize comes from `req.estimatedResultSize`, which the caller (handler) sets before this middleware runs;
+   *     if it is not set, fall back conservatively to the `used >= limit` check alone (the safer side)
+   *
+   * @param {import('express').Request} req
+   * @param {import('express').Response} res
+   * @param {import('express').NextFunction} next
+   */
+  function middleware(req, res, next) {
+    const key = keyGenerator(req);
+    const c = getOrCreate(key);
+    const now = Date.now();
+
+    // Pre-check: estimatedResultSize is injected by the handler before this middleware runs (when the size is known).
+    // If it is absent, fall back conservatively to the used >= limit check (no incremental check).
+    const estimatedSize =
+      typeof req.estimatedResultSize === 'number' &&
+      Number.isFinite(req.estimatedResultSize) &&
+      req.estimatedResultSize > 0
+        ? req.estimatedResultSize
+        : 0;
+
+    // Shared audit-log context (five A.7 fields + four /result fields)
+    const auditBase = {
+      source_ip: req.ip || null,
+      token_fingerprint:
+        req.auth && typeof req.auth.tokenFingerprint === 'string'
+          ? req.auth.tokenFingerprint
+          : null,
+      request_id: req.requestId || null,
+      http_method: req.method || 'GET',
+      http_path: req.originalUrl || (req.url || ''),
+      job_id: req.params && req.params.id ? req.params.id : null,
+    };
+
+    // hourly check
+    if (
+      c.hourlyBytes >= hourlyLimitBytes ||
+      (estimatedSize > 0 && c.hourlyBytes + estimatedSize > hourlyLimitBytes)
+    ) {
+      const retryAfterSec = Math.max(1, Math.ceil((c.hourlyResetAt - now) / 1000));
+      onLog({
+        level: 'WARN',
+        action: 'result.bandwidth_quota_exceeded',
+        ...auditBase,
+        duration_ms: 0,
+        limit_type: 'bandwidth_hourly',
+        bytes_used_in_window: c.hourlyBytes,
+        retry_after_seconds: retryAfterSec,
+      });
+      res.setHeader('Retry-After', String(retryAfterSec));
+      return next(
+        new ApiError(
+          429,
+          'bandwidth_quota_exceeded',
+          '下載額度已用完,請稍後再試',
+          {
+            limit_type: 'bandwidth_hourly',
+            retry_after_seconds: retryAfterSec,
+          }
+        )
+      );
+    }
+
+    // daily check
+    if (
+      c.dailyBytes >= dailyLimitBytes ||
+      (estimatedSize > 0 && c.dailyBytes + estimatedSize > dailyLimitBytes)
+    ) {
+      const retryAfterSec = Math.max(1, Math.ceil((c.dailyResetAt - now) / 1000));
+      onLog({
+        level: 'WARN',
+        action: 'result.bandwidth_quota_exceeded',
+        ...auditBase,
+        duration_ms: 0,
+        limit_type: 'bandwidth_daily',
+        bytes_used_in_window: c.dailyBytes,
+        retry_after_seconds: retryAfterSec,
+      });
+      res.setHeader('Retry-After', String(retryAfterSec));
+      return next(
+        new ApiError(
+          429,
+          'bandwidth_quota_exceeded',
+          '下載額度已用完,請稍後再試',
+          {
+            limit_type: 'bandwidth_daily',
+            retry_after_seconds: retryAfterSec,
+          }
+        )
+      );
+    }
+
+    // Pass → the handler calls consume() later, once the stream completes
+    return next();
+  }
+
+  return {
+    middleware,
+    consume,
+    getState,
+    // For tests / monitoring
+    _internals: {
+      getOrCreate,
+      HOUR_MS,
+      DAY_MS,
+    },
+  };
+}
+
+module.exports = {
+  createResultBandwidthQuota,
+  DEFAULT_HOURLY_LIMIT_BYTES,
+  DEFAULT_DAILY_LIMIT_BYTES,
+};
diff --git a/apps/task-scheduler/src/middleware/resultStreamConcurrency.js b/apps/task-scheduler/src/middleware/resultStreamConcurrency.js
new file mode 100644
index 0000000..339dd11
--- /dev/null
+++ b/apps/task-scheduler/src/middleware/resultStreamConcurrency.js
@@ -0,0 +1,156 @@
+/**
+ * Concurrent stream cap middleware for `/result` (Phase 0.8b Phase B, AC-4 / api-result.md §15.3).
+ *
+ * Purpose: cap the number of `/result` streams in flight on this server (per-process counter).
+ * Above the cap, requests are rejected immediately (503 service_busy + Retry-After) to avoid:
+ *   - File descriptor exhaustion (the per-process fd limit set by the OS, commonly 1024 to 65536)
+ *   - Saturating upstream MinIO connections
+ *   - Slowloris attacks (10 slow connections × 5 min timeout = 50 slot-minutes held)
+ *
+ * Why a new module instead of reusing `uploadConcurrency.js`:
+ *   - `uploadConcurrency.js` means "the same jobID cannot upload twice" (per-job key)
+ *   - This middleware means "at most N streams server-wide" (global counter)
+ *   - The rejection responses differ: upload uses 409 conflict, download uses 503 service_busy
+ *   - The semantics are fundamentally different (mutual exclusion vs capacity), so they should not share one middleware
+ *
+ * Why a global counter rather than per-fingerprint:
+ *   - The attack surface of `/result` is exhaustion of server fds / connection pools, which are global resources
+ *   - Per-fingerprint caps are already covered by the rate limit / bandwidth quota (§9)
+ *   - Two layers of defense: the per-fingerprint axis stops individual abusers, the global axis stops server overload
+ *
+ * A slot is released in three cases (an idempotent flag guarantees it is released only once):
+ *   - `res.once('finish')`: the response was sent in full (200 happy path)
+ *   - `res.once('close')`: the underlying socket closed (client abort, error, timeout)
+ *   - `res.once('error')`: the response stream errored (rare; a safety net)
+ *
+ * Multi-instance limitation:
+ *   The counter is in-memory (per process). It must move to a Redis distributed semaphore before the
+ *   Phase 2 multi-instance deployment; otherwise the cap is loosened by a factor of the instance count.
+ *   See api-result.md §15.3 and security.md candidate #8.
+ *
+ * Audit log:
+ *   - Exceeding the cap always writes `result.rate_limited` (WARN, limit_type: 'concurrent')
+ *   - Reuses the five A.7 fields + four /result fields (rate_limited fires before streaming, so size_bytes
+ *     / stream_completed do not apply; only job_id + duration_ms = 0 are recorded)
+ *
+ * Why 503 rather than 429:
+ *   - 503 = server temporarily unavailable (capacity); the client should retry as-is (no need to slow down)
+ *   - 429 = the client exceeded a rate / quota; the client should slow down with exponential backoff
+ *   - Hitting the concurrent cap means the server is saturated, not that the caller is over its limit, so 503 applies
+ */
+
+'use strict';
+
+const { ApiError } = require('./errorHandler');
+
+const DEFAULT_MAX_CONCURRENT = 10;
+const DEFAULT_RETRY_AFTER_SECONDS = 30;
+
+/**
+ * Structured audit log.
+ *
+ * @param {object} fields
+ */
+function defaultLogAudit(fields) {
+  // eslint-disable-next-line no-console
+  console.log(
+    JSON.stringify({
+      service: 'task-scheduler',
+      timestamp: new Date().toISOString(),
+      ...fields,
+    })
+  );
+}
+
+/**
+ * Create the concurrency limiter middleware.
+ *
+ * @param {object} [opts]
+ * @param {number} [opts.maxConcurrent=10]
+ * @param {number} [opts.retryAfterSeconds=30]
+ * @param {(fields: object) => void} [opts.onLog]
+ * @returns {{
+ *   middleware: import('express').RequestHandler,
+ *   getInFlight: () => number,
+ *   getMax: () => number,
+ * }}
+ */
+function createResultStreamConcurrencyLimiter(opts = {}) {
+  const maxConcurrent =
+    Number.isInteger(opts.maxConcurrent) && opts.maxConcurrent > 0
+      ? opts.maxConcurrent
+      : DEFAULT_MAX_CONCURRENT;
+  const retryAfterSeconds =
+    Number.isInteger(opts.retryAfterSeconds) && opts.retryAfterSeconds > 0
+      ? opts.retryAfterSeconds
+      : DEFAULT_RETRY_AFTER_SECONDS;
+  const onLog = typeof opts.onLog === 'function' ? opts.onLog : defaultLogAudit;
+
+  let activeStreams = 0;
+
+  function middleware(req, res, next) {
+    if (activeStreams >= maxConcurrent) {
+      // Shared audit-log context
+      const auditBase = {
+        source_ip: req.ip || null,
+        token_fingerprint:
+          req.auth && typeof req.auth.tokenFingerprint === 'string'
+            ? req.auth.tokenFingerprint
+            : null,
+        request_id: req.requestId || null,
+        http_method: req.method || 'GET',
+        http_path: req.originalUrl || (req.url || ''),
+        job_id: req.params && req.params.id ? req.params.id : null,
+      };
+      onLog({
+        level: 'WARN',
+        action: 'result.rate_limited',
+        ...auditBase,
+        duration_ms: 0,
+        limit_type: 'concurrent',
+        retry_after_seconds: retryAfterSeconds,
+        active_streams_at_reject: activeStreams,
+      });
+      res.setHeader('Retry-After', String(retryAfterSeconds));
+      return next(
+        new ApiError(
+          503,
+          'service_busy',
+          '伺服器忙碌中,請稍後再試',
+          {
+            limit_type: 'concurrent',
+            retry_after_seconds: retryAfterSeconds,
+          }
+        )
+      );
+    }
+
+    // Acquire slot
+    activeStreams += 1;
+
+    // Idempotent release: whichever of finish / close / error fires first frees the slot, exactly once
+    let released = false;
+    const release = () => {
+      if (released) return;
+      released = true;
+      activeStreams = Math.max(0, activeStreams - 1);
+    };
+    res.once('finish', release);
+    res.once('close', release);
+    res.once('error', release);
+
+    return next();
+  }
+
+  return {
+    middleware,
+    getInFlight: () => activeStreams,
+    getMax: () => maxConcurrent,
+  };
+}
+
+module.exports = {
+  createResultStreamConcurrencyLimiter,
+  DEFAULT_MAX_CONCURRENT,
+  DEFAULT_RETRY_AFTER_SECONDS,
+};
diff --git a/apps/task-scheduler/src/middleware/uploadConcurrency.js b/apps/task-scheduler/src/middleware/uploadConcurrency.js
index f71bd61..1a56384 100644
--- a/apps/task-scheduler/src/middleware/uploadConcurrency.js
+++ b/apps/task-scheduler/src/middleware/uploadConcurrency.js
@@ -18,7 +18,7 @@
  * Design principles:
  *   - **Must be mounted before multer**: decide whether to accept the request before multipart parsing starts;
  *     if concurrency were checked after multer, 500 MB would already be in memory and the limit would be pointless
- *   - **Must be mounted after requireAuth + rate limit**: prevents unauthorized / over-quota traffic from
+ *   - **Must be mounted after requireApiKey + rate limit**: prevents unauthorized / over-quota traffic from
  *     crowding out the limited slots; let those two layers drop illegitimate traffic first
  *   - **Acquire on middleware entry, release on response close/finish**:
  *     `res.on('close')` covers every ending (success / error / abort), guaranteeing the counter
diff --git a/apps/task-scheduler/src/routes/v1/__tests__/createJob.integration.test.js b/apps/task-scheduler/src/routes/v1/__tests__/createJob.integration.test.js
index 0536b15..defaf83 100644
--- a/apps/task-scheduler/src/routes/v1/__tests__/createJob.integration.test.js
+++ b/apps/task-scheduler/src/routes/v1/__tests__/createJob.integration.test.js
@@ -1,17 +1,17 @@
 /**
- * POST /api/v1/jobs integration tests (T5).
+ * POST /api/v1/jobs integration tests (T5; Phase 0.8b A4 switched to API key, A6 cleaned out dead code).
  *
  * Test coverage:
- *   - 401 invalid_token: missing Authorization
- *   - 403 insufficient_scope: token lacks converter:job.write
+ *   - 401 invalid_token: missing Authorization / wrong API key / same length but mismatched / different length
  *   - 400 validation_error: missing fields / wrong file extension
  *   - 413 file_too_large: multer LIMIT_FILE_SIZE
  *   - 500 misconfiguration: STORAGE_BACKEND !== 'minio'
  *   - 502 storage_unavailable: MinIO write failed
+ *   - 503 service_unavailable: config.converter.apiKey is empty (fail-secure)
  *   - 409 user_has_active_job: the same user already has an active job (M5 focus)
- *   - 201 happy path: full flow, including ref_images
+ *   - 201 happy path: full flow, including ref_images + audit log fields
  *
- * Setup: createApp + injected mock deps (including an injected verify function),
+ * Setup: createApp + injected mock deps + requireApiKey({ expectedApiKey }),
  *   app.listen(0), then real HTTP calls via fetch / FormData.
  */

@@ -23,7 +23,10 @@ const { createApp } = require('../../../app');
 const { createSseService } = require('../../../services/sseService');
 const { createJobService } = require('../../../services/jobService');
 const { createUploader } = require('../../../middleware/upload');
-const { requireAuth } = require('../../../auth/middleware');
+const { requireApiKey } = require('../../../auth/apiKeyMiddleware');
+
+// Fixed API key injected by startApp, used for happy / valid-path tests
+const TEST_API_KEY = 'integration-test-api-key-12345678901234567890123456789012';

 // Mock luaScripts to control claim / release outcome without real Redis Lua
 jest.mock('../../../redis/luaScripts', () => ({
@@ -37,48 +40,23 @@ jest.mock('../../../redis/luaScripts', () => ({
 }));
 const { claimActiveJob, releaseActiveJob } = require('../../../redis/luaScripts');

+// Phase 0.8b A4: FAKE_CONFIG is heavily trimmed (the OAuth resource-server and jwks sections are gone).
+// Only the keys that createJobsRouter / createPromoteRouter actually destructure remain.
+// A6 check: grepping for `config.converter.scopeWrite` / `config.converter.scopeRead` under
+// src/ finds nothing, so no scope* placeholders need to be kept.
 const FAKE_CONFIG = Object.freeze({
-  memberCenter: {
-    issuer: 'https://auth.test.local',
-    jwksUrl: 'https://auth.test.local/.well-known/jwks',
-    tokenUrl: '',
-  },
   converter: {
-    audience: 'kneron_converter_api',
-    clientId: '',
-    clientSecret: '',
-    tenantId: '',
-    scopeWrite: 'converter:job.write',
-    scopeRead: 'converter:job.read',
+    apiKey: TEST_API_KEY,
   },
   fileAccessAgent: { baseUrl: '', audience: 'file_access_api' },
-  jwks: { cacheMaxAgeMs: 60000, cooldownMs: 30000, clockToleranceSec: 60 },
+  // multipart relies on existing defaults and needs no entry here (jobs.js falls back to 100 / its built-in defaults)
 });

 // ---------------------------------------------------------------------------
 // Helpers
+// The happy path always sends `Bearer ${TEST_API_KEY}`; error paths use any non-matching string to trigger 401.
 // ---------------------------------------------------------------------------

-/**
- * Build a verify function: returns claims or throws depending on the token string.
- */
-function makeVerifier({ tokens }) {
-  return async (token) => {
-    const entry = tokens[token];
-    if (!entry) {
-      const err = new Error('invalid token');
-      err.code = 'ERR_JWS_SIGNATURE_VERIFICATION_FAILED';
-      throw err;
-    }
-    if (entry.expired) {
-      const err = new Error('expired');
-      err.code = 'ERR_JWT_EXPIRED';
-      throw err;
-    }
-    return { payload: entry.claims };
-  };
-}
-
 function makeFakeRedis() {
   const store = new Map();
   return {
@@ -125,20 +103,25 @@ function makeFakeMinio({ uploadFails = false } = {}) {
 /**
  * Build an app with full deps (including the real v1 POST handler).
  *
- * To inject a fake verify function (so tests never hit a real JWKS), we bind
- * `requireAuth` to the verify mock before app start-up, then pass it through the v1 router's deps.config.
+ * Since Phase 0.8b A4, authentication uses a pre-shared API key instead of an OAuth JWT, so no verify mock is needed.
+ * This startApp assembles the router directly and injects `requireApiKey({ expectedApiKey: TEST_API_KEY })`,
+ * bypassing createApp's integration path (the bypass is kept because createApp does not accept injected middleware).
  *
- * However, jobs.js's buildCreateJobHandler currently requires requireAuth directly; injecting a
- * verify function would mean passing an extra `verify` from v1Deps to requireAuth. Minimal change: inject
- * verify into requireAuth's deps.
+ * @param {object} opts
+ * @param {'minio' | 'local'} [opts.storageBackend='minio']
+ * @param {boolean} [opts.uploadFails=false] - mock a MinIO upload failure
+ * @param {object} [opts.rateLimit]
+ * @param {number} [opts.maxFileSize] - for the 413 test (overrides multer's fileSize limit)
+ * @param {string} [opts.expectedApiKey] - the API key the middleware expects;
+ *   defaults to TEST_API_KEY; pass an empty string to exercise the 503 fail-secure path
  */
 async function startApp({
   storageBackend = 'minio',
   uploadFails = false,
   rateLimit = { windowMs: 60000, max: 1000 },
-  tokens,
-  maxFileSize, // for the 413 test (overrides multer's fileSize limit)
-}) {
+  maxFileSize,
+  expectedApiKey = TEST_API_KEY,
+} = {}) {
   const redis = makeFakeRedis();
   const minio = makeFakeMinio({ uploadFails });
   const sseService = createSseService();
@@ -150,15 +133,6 @@ async function startApp({
   });
   const uploader = createUploader(maxFileSize ? { maxFileSize } : undefined);

-  // Monkey-patching jobs.js's module here to keep requireAuth from actually hitting JWKS
-  // is too heavy; the more direct route is a thin app that mounts jobs.js's router directly but
-  // **pre-wires requireAuth** to use our verify mock.
-  //
-  // What we actually do: inject config + verify through jobs.js's createJobsRouter(deps)?
-  // Currently createJobsRouter(deps) calls requireAuth(scope, { config }) internally without passing verify.
-  // Solution: assemble the router directly, outside createApp, and inject verify there.
-
-  // For simplicity, build the app right here (instead of createApp's integration path)
   const app = express();
   const helmet = require('helmet');
   const cors = require('cors');
@@ -180,13 +154,9 @@ async function startApp({
   app.use(express.json({ limit: '10mb' }));
   app.use(express.urlencoded({ extended: true, limit: '10mb' }));

-  // v1 router with verify injection
+  // v1 router with API key injection (Phase 0.8b A4)
   const v1 = express.Router();
-  const verify = makeVerifier({ tokens });
-  const requireWriteAuth = requireAuth(FAKE_CONFIG.converter.scopeWrite, {
-    config: FAKE_CONFIG,
-    verify,
-  });
+  const requireWriteAuth = requireApiKey({ expectedApiKey });
   const perClientLimiter = createPerClientRateLimiter(rateLimit);
   const handler = jobsInternals.buildCreateJobHandler({
     jobService,
@@ -244,35 +214,6 @@ function buildFormData({ modelBuffer, modelFilename = 'model.onnx', refImages =
   return fd;
 }

-const HAPPY_TOKENS = {
-  'good-write-token': {
-    claims: {
-      sub: 'kneron_converter_client',
-      client_id: 'visionA-backend-client',
-      scope: 'converter:job.write converter:job.read',
-    },
-  },
-  'read-only-token': {
-    claims: {
-      sub: 'reader',
-      client_id: 'visionA-backend-client',
-      scope: 'converter:job.read', // missing write
-    },
-  },
-  'expired-token': {
-    expired: true,
-    claims: {},
-  },
-  // A second client for the 409 conflict scenario (so quota accumulated by earlier tests does not interfere)
-  'good-write-token-alt': {
-    claims: {
-      sub: 'kneron_converter_client',
-      client_id: 'visionA-backend-client-alt',
-      scope: 'converter:job.write',
-    },
-  },
-};
-
 beforeAll(() => {
   jest.spyOn(console, 'log').mockImplementation(() => {});
   jest.spyOn(console, 'warn').mockImplementation(() => {});
@@ -307,7 +248,7 @@ const happyFields = () => ({
 describe('POST /api/v1/jobs — auth', () => {
   let ctx;
   beforeEach(async () => {
-    ctx = await startApp({ tokens: HAPPY_TOKENS });
+    ctx = await startApp();
   });
   afterEach(async () => {
     await ctx.close();
@@ -325,7 +266,7 @@ describe('POST /api/v1/jobs — auth', () => {
     expect(typeof body.error.request_id).toBe('string');
   });

-  it('returns 401 invalid_token when Bearer token unknown', async () => {
+  it('returns 401 invalid_token when Bearer token unknown (different length)', async () => {
     const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
     const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
       method: 'POST',
@@ -337,32 +278,21 @@ describe('POST /api/v1/jobs — auth', () => {
     expect(body.error.code).toBe('invalid_token');
   });

-  it('returns 401 token_expired with expired token', async () => {
+  // A6 addition: a same-length but mismatched key (verifies the constant-time compare checks every byte)
+  it('returns 401 invalid_token when Bearer token has same length but different bytes', async () => {
+    // Same length as TEST_API_KEY (TEST_API_KEY.length, currently 57 characters) but entirely different bytes
+    const sameLenWrongKey = 'X'.repeat(TEST_API_KEY.length);
+    expect(sameLenWrongKey.length).toBe(TEST_API_KEY.length);
+
     const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
     const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
       method: 'POST',
-      headers: { Authorization: 'Bearer expired-token' },
+      headers: { Authorization: `Bearer ${sameLenWrongKey}` },
       body: fd,
     });
     expect(res.status).toBe(401);
     const body = await res.json();
-    expect(body.error.code).toBe('token_expired');
-  });
-
-  it('returns 403 insufficient_scope with read-only token', async () => {
-    const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
-    const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
-      method: 'POST',
-      headers: { Authorization: 'Bearer read-only-token' },
-      body: fd,
-    });
-    expect(res.status).toBe(403);
-    const body = await res.json();
-    expect(body.error.code).toBe('insufficient_scope');
-    expect(body.error.details).toMatchObject({
-      required_scope: 'converter:job.write',
-    });
-    expect(body.error.details.provided_scopes).toEqual(['converter:job.read']);
+    expect(body.error.code).toBe('invalid_token');
   });

   it('sets Connection: close on 401 (M2)', async () => {
@@ -378,6 +308,104 @@ describe('POST /api/v1/jobs — auth', () => {
     // so check the status and the subsequent connection behavior instead (fetch handles the latter); at minimum the status is 401
     expect(res.status).toBe(401);
   });
+
+  // A6 addition: fail-secure path. CONVERTER_API_KEY not configured on the server → 503
+  it('returns 503 service_unavailable when CONVERTER_API_KEY env not configured', async () => {
+    const blankCtx = await startApp({ expectedApiKey: '' });
+    try {
+      const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
+      // Even the correct key must be rejected (fail-secure: everything is refused while the env is unset)
+      const res = await fetch(`${blankCtx.baseUrl}/api/v1/jobs`, {
+        method: 'POST',
+        headers: { Authorization: `Bearer ${TEST_API_KEY}` },
+        body: fd,
+      });
+      expect(res.status).toBe(503);
+      const body = await res.json();
+      expect(body.error.code).toBe('service_unavailable');
+      expect(body.error.message).toBe('API key not configured');
+    } finally {
+      await blankCtx.close();
+    }
+  });
+
+  // A7 addition: audit log for an authenticated request
+  it('writes auth.api_key.authenticated audit log on successful auth (source_ip + fingerprint + request_id)', async () => {
+    // beforeAll already mocks console.log; pull the audit log out of the mock's recorded calls
+    const logCalls = console.log.mock.calls;
+    const callCountBefore = logCalls.length;
+
+    const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
+    // The claim result is unimportant here (without this stub claimActiveJob resolves to undefined);
+    // the handler's later path does not affect the audit log
+    claimActiveJob.mockResolvedValueOnce({ ok: true });
+    await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
+      method: 'POST',
+      headers: { Authorization: `Bearer ${TEST_API_KEY}` },
+      body: fd,
+    });
+
+    const newCalls = logCalls.slice(callCountBefore).flat().filter((x) => typeof x === 'string');
+    const auditLine = newCalls.find((l) => l.includes('auth.api_key.authenticated'));
+    expect(auditLine).toBeDefined();
+    const parsed = JSON.parse(auditLine);
+    expect(parsed.level).toBe('INFO');
+    expect(parsed.action).toBe('auth.api_key.authenticated');
+    expect(parsed.client_id).toBe('visionA-service');
+    expect(typeof parsed.source_ip).toBe('string'); // 127.0.0.1 or ::ffff:127.0.0.1
+    expect(typeof parsed.request_id).toBe('string');
+    expect(parsed.http_method).toBe('POST');
+    expect(parsed.http_path).toBe('/jobs'); // under an Express sub-router, req.path is '/jobs'
+    expect(typeof parsed.token_fingerprint).toBe('string');
+    expect(parsed.token_fingerprint.length).toBe(12);
+    // Must never contain TEST_API_KEY itself
+    expect(auditLine).not.toContain(TEST_API_KEY);
+  });
+
+  // A7 addition: audit log when Authorization is missing
+  it('writes auth.api_key.missing audit log when Authorization absent (no fingerprint)', async () => {
+    const logCalls = console.log.mock.calls;
+    const callCountBefore = logCalls.length;
+
+    const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
+    await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
+      method: 'POST',
+      body: fd,
+    });
+
+    const newCalls = logCalls.slice(callCountBefore).flat().filter((x) => typeof x === 'string');
+    const auditLine = newCalls.find((l) => l.includes('auth.api_key.missing'));
+    expect(auditLine).toBeDefined();
+    const parsed = JSON.parse(auditLine);
+    expect(parsed.action).toBe('auth.api_key.missing');
+    expect(typeof parsed.source_ip).toBe('string');
+    expect(parsed.token_fingerprint).toBeUndefined();
+  });
+
+  // A7 addition: invalid-token audit log (records the wrong token's fingerprint, NOT the token itself)
+  it('writes auth.api_key.invalid audit log with fingerprint but NOT token verbatim', async () => {
+    const logCalls = console.log.mock.calls;
+    const callCountBefore = logCalls.length;
+    const wrongToken = 'A7-wrong-token-for-audit-log-test-zzzzzzzzzzzzzzzzzzzzzzz';
+
+    const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
+    await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
+      method: 'POST',
+      headers: { Authorization: `Bearer ${wrongToken}` },
+      body: fd,
+    });
+
+    const newCalls = logCalls.slice(callCountBefore).flat().filter((x) => typeof x === 'string');
+    const auditLine = newCalls.find((l) => l.includes('auth.api_key.invalid'));
+    expect(auditLine).toBeDefined();
+    const parsed = JSON.parse(auditLine);
+    expect(parsed.action).toBe('auth.api_key.invalid');
+    expect(typeof parsed.token_fingerprint).toBe('string');
+    expect(parsed.token_fingerprint.length).toBe(12);
+    // Must never contain wrongToken itself
+    expect(auditLine).not.toContain(wrongToken);
+    // Nor TEST_API_KEY
+    expect(auditLine).not.toContain(TEST_API_KEY);
+  });
 });

 // ---------------------------------------------------------------------------
@@ -387,7 +415,7 @@ describe('POST /api/v1/jobs — auth', () => {
 describe('POST /api/v1/jobs — validation', () => {
   let ctx;
   beforeEach(async () => {
-    ctx = await startApp({ tokens: HAPPY_TOKENS });
+    ctx = await startApp();
   });
   afterEach(async () => {
     await ctx.close();
@@ -400,7 +428,7 @@ describe('POST /api/v1/jobs — validation', () => {
     });
     const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
       method: 'POST',
-      headers: { Authorization: 'Bearer good-write-token' },
+      headers: { Authorization: `Bearer ${TEST_API_KEY}` },
       body: fd,
     });
     expect(res.status).toBe(400);
@@ -413,7 +441,7 @@ describe('POST /api/v1/jobs — validation', () => {
     const fd = buildFormData({ fields: happyFields() });
     const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
       method: 'POST',
-      headers: { Authorization: 'Bearer good-write-token' },
+      headers: { Authorization: `Bearer ${TEST_API_KEY}` },
       body: fd,
     });
     expect(res.status).toBe(400);
@@ -430,7 +458,7 @@ describe('POST /api/v1/jobs — validation', () => {
     });
     const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
       method: 'POST',
-      headers: { Authorization: 'Bearer good-write-token' },
+      headers: { Authorization: `Bearer ${TEST_API_KEY}` },
       body: fd,
     });
     expect(res.status).toBe(400);
@@ -448,7 +476,7 @@ describe('POST /api/v1/jobs — validation', () => {
     });
     const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
       method: 'POST',
-      headers: { Authorization: 'Bearer good-write-token' },
+      headers: { Authorization: `Bearer ${TEST_API_KEY}` },
       body: fd,
     });
     expect(res.status).toBe(400);
@@ -463,7 +491,7 @@ describe('POST /api/v1/jobs — validation', () => {
     });
     const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
       method: 'POST',
-      headers: { Authorization: 'Bearer good-write-token' },
+      headers: { Authorization: `Bearer ${TEST_API_KEY}` },
       body: fd,
     });
     expect(res.status).toBe(400);
@@ -478,13 +506,13 @@ describe('POST /api/v1/jobs — validation', () => {

 describe('POST /api/v1/jobs — misconfiguration', () => {
   it('returns 500 misconfiguration when STORAGE_BACKEND !== minio', async () => {
-    const ctx = await startApp({ storageBackend: 'local', tokens: HAPPY_TOKENS });
+    const ctx = await startApp({ storageBackend: 'local' });
     try {
       claimActiveJob.mockResolvedValueOnce({ ok: true }); // shouldn't be reached
       const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
       const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
         method: 'POST',
-        headers: { Authorization: 'Bearer good-write-token' },
+        headers: { Authorization: `Bearer ${TEST_API_KEY}` },
         body: fd,
       });
       expect(res.status).toBe(500);
@@ -504,15 +532,12 @@ describe('POST /api/v1/jobs — misconfiguration', () => {

 describe('POST /api/v1/jobs — storage failure (M5 方案 A)', () => {
   it('returns 502 storage_unavailable and Redis stays clean', async () => {
-    const ctx = await startApp({
-      uploadFails: true,
-      tokens: HAPPY_TOKENS,
-    });
+    const ctx = await startApp({ uploadFails: true });
     try {
       const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
       const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
         method: 'POST',
-        headers: { Authorization: 'Bearer good-write-token' },
+        headers: { Authorization: `Bearer ${TEST_API_KEY}` },
         body: fd,
       });
       expect(res.status).toBe(502);
@@ -534,7 +559,7 @@ describe('POST /api/v1/jobs — storage failure (M5 方案 A)', () => {

 describe('POST /api/v1/jobs — 409 user_has_active_job', () => {
   // Sec M4: when active_job already exists, the pre-check rejects before any MinIO write, avoiding write amplification
   it('returns 409 via pre-check (Sec M4) — no MinIO write when active_job exists', async () => {
-    const ctx = await startApp({ tokens: HAPPY_TOKENS });
+    const ctx = await startApp();
     try {
       // Seed an active job record (the M4 pre-check will GET it first)
       ctx.redis.store.set('user:visionA-user-12345:active_job', 'existing-job-id');
@@ -552,7 +577,7 @@ describe('POST /api/v1/jobs — 409 user_has_active_job', () => {
       const fd = buildFormData({ modelBuffer: Buffer.from('mmmm'), fields: happyFields() });
       const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
         method: 'POST',
-        headers: { Authorization: 'Bearer good-write-token' },
+        headers: { Authorization: `Bearer ${TEST_API_KEY}` },
         body: fd,
       });

@@ -576,7 +601,7 @@ describe('POST /api/v1/jobs — 409 user_has_active_job', () => {
   // Race scenario: the pre-check passes (active_job does not exist) but the Lua claim returns a conflict
   // (two clients pass the pre-check at the same time; only one of them can win the Lua claim)
   it('returns 409 via Lua conflict (race) — MinIO uploaded then cleanup called', async () => {
-    const ctx = await startApp({ tokens: HAPPY_TOKENS });
+    const ctx = await startApp();
     try {
       // ★ The pre-check does not fire (active_job is not in Redis)
       // but the Lua claim simulates the post-race conflict
@@ -601,7 +626,7 @@ describe('POST /api/v1/jobs — 409 user_has_active_job', () => {
       const fd = buildFormData({ modelBuffer: Buffer.from('mmmm'), fields: happyFields() });
       const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
         method: 'POST',
-        headers: { Authorization: 'Bearer good-write-token' },
+        headers: { Authorization: `Bearer ${TEST_API_KEY}` },
         body: fd,
       });

@@ -626,7 +651,7 @@ describe('POST /api/v1/jobs — 409 user_has_active_job', () => {
   // Reviewer Major-1: when the Lua claim conflicts and getJob(claimResult.activeJobId) finds no record
   // (race: another worker deleted the active job record concurrently), fall back to returning only active_job_id
   it('falls back to {active_job_id} only when active job record disappeared (Reviewer Major-1)', async () => {
-    const ctx = await startApp({ tokens: HAPPY_TOKENS });
+    const ctx = await startApp();
     try {
       claimActiveJob.mockResolvedValueOnce({
         ok: false,
@@ -638,7 +663,7 @@ describe('POST /api/v1/jobs — 409 user_has_active_job', () => {
       const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() });
       const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
         method: 'POST',
-        headers: { Authorization: 'Bearer good-write-token' },
+        headers: { Authorization: `Bearer ${TEST_API_KEY}` },
         body: fd,
       });
       expect(res.status).toBe(409);
@@ -658,7 +683,7 @@ describe('POST /api/v1/jobs — 409 user_has_active_job', () => {

 describe('POST /api/v1/jobs — 201 happy path', () => {
   it('creates job successfully with model + ref_images', async () => {
-    const ctx = await startApp({ tokens: HAPPY_TOKENS });
+    const ctx = await startApp();
     try {
       claimActiveJob.mockResolvedValueOnce({ ok: true });

@@ -673,7 +698,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => {

       const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, {
         method: 'POST',
-        headers: { Authorization: 'Bearer good-write-token' },
+        headers: { Authorization: `Bearer ${TEST_API_KEY}` },
         body: fd,
       });

@@ -711,7 +736,8 @@ describe('POST /api/v1/jobs — 201 happy path', () => {
       const stored = JSON.parse(claimArgs.jobJson);
       expect(stored.origin).toBe('api');
       expect(stored.user_id).toBe('visionA-user-12345');
-      expect(stored.created_by_client_id).toBe('visionA-backend-client');
+      // Phase 0.8b A4: apiKeyMiddleware hard-codes clientId='visionA-service'
+      expect(stored.created_by_client_id).toBe('visionA-service');
       expect(stored.input.filename).toBe('model.onnx');
       expect(stored.input.ref_images_count).toBe(2);
       expect(stored.input.size_bytes).toBe(Buffer.from('model-content').length);
@@ -724,6 +750,13 @@ describe('POST /api/v1/jobs — 201 happy path', () => {
         enable_sim_fixed: false,
         enable_sim_hw: false,
       });
+      // Phase 0.8b Phase B AC-B1: source_filename and platform are written at the top level
+      //   - source_filename is the sanitized safeFilename (same value as input.filename),
+      //     used by GET /jobs/:id/result to build the download filename
+      //   - the top-level platform field mirrors parameters.platform, so buildFilename
+      //     does not have to reach into the parameters object
+      expect(stored.source_filename).toBe('model.onnx');
+      expect(stored.platform).toBe('520');

       // enqueue was also called (onnx queue)
       expect(ctx.redis.xadd).toHaveBeenCalledTimes(1);
@@ -742,7 +775,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => {
   });

   it('handles 0 ref_images correctly', async () => {
-    const ctx = await startApp({ tokens: HAPPY_TOKENS });
+    const ctx = await startApp();
     try {
claimActiveJob.mockResolvedValueOnce({ ok: true }); const oversizedRefImage = Buffer.alloc(10 * 1024 * 1024 + 1024, 0x42); // 10MB + 1KB @@ -808,7 +841,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => { }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, { method: 'POST', - headers: { Authorization: 'Bearer good-write-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, body: fd, }); expect(res.status).toBe(413); @@ -830,7 +863,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => { // Sec M3:version XSS → 400 validation_error it('returns 400 validation_error when version contains XSS (Sec M3)', async () => { - const ctx = await startApp({ tokens: HAPPY_TOKENS }); + const ctx = await startApp(); try { const fd = buildFormData({ modelBuffer: Buffer.from('m'), @@ -838,7 +871,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => { }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, { method: 'POST', - headers: { Authorization: 'Bearer good-write-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, body: fd, }); expect(res.status).toBe(400); @@ -852,7 +885,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => { // Sec M1:user_id XSS → 400 validation_error it('returns 400 validation_error when user_id contains XSS (Sec M1)', async () => { - const ctx = await startApp({ tokens: HAPPY_TOKENS }); + const ctx = await startApp(); try { const fd = buildFormData({ modelBuffer: Buffer.from('m'), @@ -860,7 +893,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => { }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, { method: 'POST', - headers: { Authorization: 'Bearer good-write-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, body: fd, }); expect(res.status).toBe(400); @@ -874,7 +907,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => { // Sec M1:user_id wildcard → 400 it('returns 400 validation_error when user_id contains wildcards (Sec M1)', async () => { - const ctx = await 
startApp({ tokens: HAPPY_TOKENS }); + const ctx = await startApp(); try { const fd = buildFormData({ modelBuffer: Buffer.from('m'), @@ -882,7 +915,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => { }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, { method: 'POST', - headers: { Authorization: 'Bearer good-write-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, body: fd, }); expect(res.status).toBe(400); @@ -901,7 +934,7 @@ describe('POST /api/v1/jobs — 201 happy path', () => { describe('POST /api/v1/jobs — enqueue failure rollback (Sec M2 + Reviewer Major-2)', () => { it('releases active_job when enqueue throws (best-effort)', async () => { - const ctx = await startApp({ tokens: HAPPY_TOKENS }); + const ctx = await startApp(); try { // claim succeeds claimActiveJob.mockResolvedValueOnce({ ok: true }); @@ -915,7 +948,7 @@ describe('POST /api/v1/jobs — enqueue failure rollback (Sec M2 + Reviewer Majo const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, { method: 'POST', - headers: { Authorization: 'Bearer good-write-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, body: fd, }); @@ -947,7 +980,7 @@ describe('POST /api/v1/jobs — enqueue failure rollback (Sec M2 + Reviewer Majo }); it('still returns 500 when releaseActiveJob also throws (fire-and-forget; no double error)', async () => { - const ctx = await startApp({ tokens: HAPPY_TOKENS }); + const ctx = await startApp(); try { claimActiveJob.mockResolvedValueOnce({ ok: true }); releaseActiveJob.mockImplementationOnce(async () => { @@ -960,7 +993,7 @@ describe('POST /api/v1/jobs — enqueue failure rollback (Sec M2 + Reviewer Majo const fd = buildFormData({ modelBuffer: Buffer.from('m'), fields: happyFields() }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, { method: 'POST', - headers: { Authorization: 'Bearer good-write-token' }, + headers: { Authorization: `Bearer 
${TEST_API_KEY}` },
body: fd, }); @@ -1066,7 +1099,7 @@ describe('POST /api/v1/jobs — mount-time STORAGE_BACKEND check (Sec M5)', () = }); const { port } = server.address(); try { - // GET is implemented as of T6: a request without a token should return 401 (meaning GET is mounted behind requireAuth) + // GET is implemented as of T6: a request without a token should return 401 (meaning GET is mounted behind requireApiKey) // A 401 proves more precisely than a 501 that the GET route is mounted and the auth middleware is applied const getRes = await fetch(`http://127.0.0.1:${port}/api/v1/jobs`); expect(getRes.status).toBe(401); @@ -1090,7 +1123,7 @@ describe('POST /api/v1/jobs — mount-time STORAGE_BACKEND check (Sec M5)', () = describe('POST /api/v1/jobs — filename sanitization (end-to-end)', () => { it('sanitizes malicious model filename to safe object key', async () => { - const ctx = await startApp({ tokens: HAPPY_TOKENS }); + const ctx = await startApp(); try { claimActiveJob.mockResolvedValueOnce({ ok: true }); const fd = buildFormData({ @@ -1101,7 +1134,7 @@ describe('POST /api/v1/jobs — filename sanitization (end-to-end)', () => { }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, { method: 'POST', - headers: { Authorization: 'Bearer good-write-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, body: fd, }); expect(res.status).toBe(201); @@ -1128,8 +1161,8 @@ describe('POST /api/v1/jobs — filename sanitization (end-to-end)', () => { describe('POST /api/v1/jobs — createApp wiring smoke test', () => { it('createApp(deps, opts.config) wires v1 POST handler with auth', async () => { // This test uses the real createApp path to verify that app.js passes v1Deps through to the v1 router correctly. - // createApp uses the real requireAuth (without verify), so it is enough to check "no token → 401"; - // that alone proves the wiring is correct (miswiring would give 404 or 501). 
+ // createApp uses the real requireApiKey (lazy-loaded from config.converter.apiKey), + // so it is enough to check "no token → 401"; that alone proves the wiring is correct (miswiring would give 404 or 501). claimActiveJob.mockResolvedValue({ ok: true }); const redis = makeFakeRedis(); @@ -1158,7 +1191,7 @@ describe('POST /api/v1/jobs — createApp wiring smoke test', () => { }); const { port } = server.address(); try { - // no token → should go through requireAuth → 401
+ // no token → should go through requireApiKey → 401 const res = await fetch(`http://127.0.0.1:${port}/api/v1/jobs`, { method: 'POST', body: new FormData(), diff --git a/apps/task-scheduler/src/routes/v1/__tests__/getJobs.integration.test.js b/apps/task-scheduler/src/routes/v1/__tests__/getJobs.integration.test.js index ff25038..7eda741 100644 --- a/apps/task-scheduler/src/routes/v1/__tests__/getJobs.integration.test.js +++ b/apps/task-scheduler/src/routes/v1/__tests__/getJobs.integration.test.js @@ -1,13 +1,14 @@ /** - * GET /api/v1/jobs/:id + GET /api/v1/jobs integration tests (T6). + * GET /api/v1/jobs/:id + GET /api/v1/jobs integration tests (T6; Phase 0.8b A4 switched to API key, A6 cleanup). * * Test scope: - * - 401 invalid_token: missing Authorization - * - 403 insufficient_scope: token lacks converter:job.read + * - 401 invalid_token: missing Authorization / wrong API key / same length but mismatched + * - 503 service_unavailable: CONVERTER_API_KEY env not set * - GET /:id: * - 404 job_not_found: does not exist - * - 404 job_not_found: cross-client (does not leak existence) - * - 200 happy path: full record + external status mapping + * - 404 job_not_found: cross-client (does not leak existence; client_id is hard-coded on the API key path, + * so this case seeds a different created_by_client_id directly via the mock to verify the isolation logic is still intact) + * - 200 happy path: full record + external status mapping + audit log fields * - ETag header present * - 304 Not Modified: If-None-Match matches * - 200 + new ETag: If-None-Match does not match @@ -18,7 +19,6 @@ * - 200 happy path: list, filtered by client * - status filter (in_progress / completed / failed / all) * - limit / cursor pagination - * - cross-client isolation (the same user_id does not see another client's jobs) * - limit > 50 → 400 */ @@ -28,7 +28,10 @@ const express = require('express'); const { createSseService } = require('../../../services/sseService'); const { createJobService } = 
require('../../../services/jobService'); -const { requireAuth } = require('../../../auth/middleware'); +const { requireApiKey } = require('../../../auth/apiKeyMiddleware'); + +// Fixed API key injected into startApp +const TEST_API_KEY = 'integration-test-api-key-12345678901234567890123456789012'; // Mock luaScripts to avoid real Redis Lua loading
jest.mock('../../../redis/luaScripts', () => ({ @@ -41,45 +44,10 @@ jest.mock('../../../redis/luaScripts', () => ({ }, })); -const FAKE_CONFIG = Object.freeze({ - memberCenter: { - issuer: 'https://auth.test.local', - jwksUrl: 'https://auth.test.local/.well-known/jwks', - tokenUrl: '', - }, - converter: { - audience: 'kneron_converter_api', - clientId: '', - clientSecret: '', - tenantId: '', - scopeWrite: 'converter:job.write', - scopeRead: 'converter:job.read', - }, - fileAccessAgent: { baseUrl: '', audience: 'file_access_api' }, - jwks: { cacheMaxAgeMs: 60000, cooldownMs: 30000, clockToleranceSec: 60 }, -}); - // --------------------------------------------------------------------------- // Helpers // --------------------------------------------------------------------------- -function makeVerifier({ tokens }) { - return async (token) => { - const entry = tokens[token]; - if (!entry) { - const err = new Error('invalid token'); - err.code = 'ERR_JWS_SIGNATURE_VERIFICATION_FAILED'; - throw err; - } - if (entry.expired) { - const err = new Error('expired'); - err.code = 'ERR_JWT_EXPIRED'; - throw err; - } - return { payload: entry.claims }; - }; -} - function makeFakeRedis() { const store = new Map(); const sets = new Map(); @@ -145,8 +113,16 @@ function makeFakeMinio() { /** * Starts the app with the GET endpoints. + * + * @param {object} [opts] + * @param {object} [opts.rateLimit] + * @param {string} [opts.expectedApiKey] - API key the middleware expects; defaults to TEST_API_KEY; + * pass an empty string to test the 503 fail-secure path */ -async function startApp({ tokens, rateLimit = { windowMs: 60000, max: 1000 } }) { +async function startApp({ + rateLimit = { windowMs: 60000, max: 1000 }, + expectedApiKey = TEST_API_KEY, +} = {}) { const redis = makeFakeRedis(); const minio = makeFakeMinio(); const sseService = createSseService(); @@ -172,13 +148,9 @@ async function startApp({ tokens, rateLimit = { windowMs: 60000, max: 1000 } }) app.use(morgan('short')); 
app.use(express.json({ limit: '10mb' })); - // v1 router with verify
mock injected into requireAuth + // v1 router with API key injection (Phase 0.8b A4) const v1 = express.Router(); - const verify = makeVerifier({ tokens }); - const requireReadAuth = requireAuth(FAKE_CONFIG.converter.scopeRead, { - config: FAKE_CONFIG, - verify, - }); + const requireReadAuth = requireApiKey({ expectedApiKey }); const perClientLimiter = createPerClientRateLimiter(rateLimit); const getJobHandler = jobsInternals.buildGetJobHandler({ jobService }); @@ -205,34 +177,6 @@ async function startApp({ tokens, rateLimit = { windowMs: 60000, max: 1000 } }) }); } -const HAPPY_TOKENS = { - 'good-read-token': { - claims: { - sub: 'visionA-backend', - client_id: 'cid-A', - scope: 'converter:job.read converter:job.write', - }, - }, - 'good-read-token-B': { - claims: { - sub: 'visionA-backend', - client_id: 'cid-B', - scope: 'converter:job.read', - }, - }, - 'write-only-token': { - claims: { - sub: 'someone', - client_id: 'cid-A', - scope: 'converter:job.write', // lacks read - }, - }, - 'expired-token': { - expired: true, - claims: {}, - }, -}; - beforeAll(() => { jest.spyOn(console, 'log').mockImplementation(() => {}); jest.spyOn(console, 'warn').mockImplementation(() => {}); @@ -243,13 +187,13 @@ afterAll(() => { }); // --------------------------------------------------------------------------- -// Shared auth tests (GET /jobs and GET /jobs/:id both require the read scope) +// Shared auth tests (GET /jobs and GET /jobs/:id are both mounted behind requireApiKey) // --------------------------------------------------------------------------- describe('GET /api/v1/jobs* — auth', () => { let ctx; beforeEach(async () => { - ctx = await startApp({ tokens: HAPPY_TOKENS }); + ctx = await startApp(); }); afterEach(async () => { await ctx.close(); @@ -268,31 +212,74 @@ describe('GET /api/v1/jobs* — auth', () => { expect(res.status).toBe(401); }); - it('GET /:id returns 401 token_expired with expired token', async () => { + it('GET /:id returns 401 invalid_token with wrong API key (different length)', async () => { const res = 
await fetch(`${ctx.baseUrl}/api/v1/jobs/some-id`, { - headers: { Authorization: 'Bearer expired-token' }, + headers: { Authorization: `Bearer wrong-api-key` }, }); expect(res.status).toBe(401); - expect((await res.json()).error.code).toBe('token_expired'); + expect((await res.json()).error.code).toBe('invalid_token'); }); - it('GET /jobs returns 403 with write-only token (insufficient_scope)', async () => { - const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1`, { - headers: { Authorization: 'Bearer write-only-token' }, - }); - expect(res.status).toBe(403); - const body = await res.json(); - expect(body.error.code).toBe('insufficient_scope'); - expect(body.error.details).toMatchObject({ - required_scope: 'converter:job.read', + // Added in A6: same-length but mismatched key + it('GET /:id returns 401 invalid_token with same-length wrong API key', async () => { + const sameLenWrongKey = 'X'.repeat(TEST_API_KEY.length); + const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/some-id`, { + headers: { Authorization: `Bearer ${sameLenWrongKey}` }, }); + expect(res.status).toBe(401); + expect((await res.json()).error.code).toBe('invalid_token'); }); - it('GET /:id returns 403 with write-only token', async () => { - const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/x`, { - headers: { Authorization: 'Bearer write-only-token' }, + // Added in A6: fail-secure path + it('GET /:id returns 503 service_unavailable when CONVERTER_API_KEY not configured', async () => { + const blankCtx = await startApp({ expectedApiKey: '' }); + try { + // must be rejected even when TEST_API_KEY is sent (fail-secure) + const res = await fetch(`${blankCtx.baseUrl}/api/v1/jobs/some-id`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, + }); + expect(res.status).toBe(503); + const body = await res.json(); + expect(body.error.code).toBe('service_unavailable'); + expect(body.error.message).toBe('API key not configured'); + } finally { + await blankCtx.close(); + } + }); + + it('GET /jobs 
returns 503 service_unavailable when
CONVERTER_API_KEY not configured', async () => { + const blankCtx = await startApp({ expectedApiKey: '' }); + try { + const res = await fetch(`${blankCtx.baseUrl}/api/v1/jobs?user_id=u1`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, + }); + expect(res.status).toBe(503); + expect((await res.json()).error.code).toBe('service_unavailable'); + } finally { + await blankCtx.close(); + } + }); + + // Added in A7: an authenticated GET writes an audit log (including source_ip + fingerprint + GET method) + it('writes auth.api_key.authenticated audit log on GET with method=GET in log', async () => { + const logCalls = console.log.mock.calls; + const callCountBefore = logCalls.length; + + // GET /jobs?user_id=visionA-user-12345: even if jobService has no data, the middleware still passes and writes the audit log + await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=visionA-user-12345`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); - expect(res.status).toBe(403); + + const newCalls = logCalls.slice(callCountBefore).flat().filter((x) => typeof x === 'string'); + const auditLine = newCalls.find((l) => l.includes('auth.api_key.authenticated')); + expect(auditLine).toBeDefined(); + const parsed = JSON.parse(auditLine); + expect(parsed.action).toBe('auth.api_key.authenticated'); + expect(parsed.http_method).toBe('GET'); + expect(typeof parsed.source_ip).toBe('string'); + expect(typeof parsed.token_fingerprint).toBe('string'); + expect(parsed.token_fingerprint.length).toBe(12); + expect(auditLine).not.toContain(TEST_API_KEY); }); }); @@ -303,7 +290,7 @@ describe('GET /api/v1/jobs* — auth', () => { describe('GET /api/v1/jobs/:id', () => { let ctx; beforeEach(async () => { - ctx = await startApp({ tokens: HAPPY_TOKENS }); + ctx = await startApp(); }); afterEach(async () => { await ctx.close(); @@ -313,7 +300,7 @@ describe('GET /api/v1/jobs/:id', () => { const job = { job_id: jobId, user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'BIE', stage: 
'bie', progress: 50, @@ -352,7 +339,7 @@
describe('GET /api/v1/jobs/:id', () => { it('returns 404 job_not_found when job does not exist', async () => { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/nonexistent`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(404); const body = await res.json(); @@ -361,9 +348,9 @@ describe('GET /api/v1/jobs/:id', () => { }); it('returns 404 (not 403) when job belongs to different client (no info leak)', async () => { - seedJob('foreign-job', { created_by_client_id: 'cid-B' }); + seedJob('foreign-job', { created_by_client_id: 'cid-B-foreign' }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/foreign-job`, { - headers: { Authorization: 'Bearer good-read-token' }, // cid-A + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, // caller clientId is 'visionA-service' }); expect(res.status).toBe(404); const body = await res.json(); @@ -374,7 +361,7 @@ describe('GET /api/v1/jobs/:id', () => { it('returns 200 with full job shape for owner', async () => { seedJob('my-job'); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/my-job`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(200); const body = await res.json(); @@ -401,7 +388,7 @@ describe('GET /api/v1/jobs/:id', () => { it('strips internal field created_by_client_id from response', async () => { seedJob('my-job'); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/my-job`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); const body = await res.json(); expect(body).not.toHaveProperty('created_by_client_id'); @@ -414,7 +401,7 @@ describe('GET /api/v1/jobs/:id', () => { stage_timings: { onnx: null, bie: null, nef: null }, }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/newly-created`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer
${TEST_API_KEY}` }, }); const body = await res.json(); expect(body.status).toBe('created'); @@ -433,7 +420,7 @@ describe('GET /api/v1/jobs/:id', () => { }, }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/done-job`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); const body = await res.json(); expect(body.status).toBe('completed'); @@ -457,7 +444,7 @@ describe('GET /api/v1/jobs/:id', () => { }, }); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/failed-job`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); const body = await res.json(); expect(body.status).toBe('failed'); @@ -471,7 +458,7 @@ describe('GET /api/v1/jobs/:id', () => { it('returns ETag header on 200 response', async () => { seedJob('etag-job'); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/etag-job`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(200); const etag = res.headers.get('etag'); @@ -481,14 +468,14 @@ describe('GET /api/v1/jobs/:id', () => { it('returns 304 Not Modified when If-None-Match matches', async () => { seedJob('etag-match-job'); const first = await fetch(`${ctx.baseUrl}/api/v1/jobs/etag-match-job`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); const etag = first.headers.get('etag'); expect(etag).toBeTruthy(); const second = await fetch(`${ctx.baseUrl}/api/v1/jobs/etag-match-job`, { headers: { - Authorization: 'Bearer good-read-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'If-None-Match': etag, }, }); @@ -502,7 +489,7 @@ describe('GET /api/v1/jobs/:id', () => { seedJob('etag-mismatch-job'); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/etag-mismatch-job`, { headers: { - Authorization: 'Bearer good-read-token', + Authorization: `Bearer ${TEST_API_KEY}`, 
'If-None-Match': 'W/"stale"', }, }); @@ -516,7 +503,7 @@ describe('GET /api/v1/jobs/:id', () => { seedJob('star-job'); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/star-job`, { headers: { - Authorization: 'Bearer good-read-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'If-None-Match': '*', }, }); @@ -531,7 +518,7 @@ describe('GET /api/v1/jobs/:id', () => { describe('GET /api/v1/jobs (list)', () => { let ctx; beforeEach(async () => { - ctx = await startApp({ tokens: HAPPY_TOKENS }); + ctx = await startApp(); }); afterEach(async () => { await ctx.close(); @@ -549,7 +536,7 @@ describe('GET /api/v1/jobs (list)', () => { it('returns 400 validation_error when user_id missing', async () => { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(400); const body = await res.json(); @@ -561,7 +548,7 @@ describe('GET /api/v1/jobs (list)', () => { const res = await fetch( `${ctx.baseUrl}/api/v1/jobs?user_id=${encodeURIComponent('')}`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, } ); expect(res.status).toBe(400); @@ -574,7 +561,7 @@ describe('GET /api/v1/jobs (list)', () => { const res = await fetch( `${ctx.baseUrl}/api/v1/jobs?user_id=${encodeURIComponent('../etc/passwd')}`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, } ); expect(res.status).toBe(400); @@ -582,7 +569,7 @@ describe('GET /api/v1/jobs (list)', () => { it('returns 400 when user_id contains wildcard (*)', async () => { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=${encodeURIComponent('*')}`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(400); }); @@ -591,7 +578,7 @@ describe('GET /api/v1/jobs (list)', () => { const 
res = await fetch( `${ctx.baseUrl}/api/v1/jobs?user_id=${encodeURIComponent('u1:malicious')}`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, } ); expect(res.status).toBe(400); @@ -600,14 +587,14 @@ describe('GET /api/v1/jobs (list)', () => { it('returns 400 when user_id is too long', async () => { const long = 'a'.repeat(129); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=${long}`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(400); }); it('returns empty list when user has no jobs', async () => { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u-empty`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(200); const body = await res.json(); @@ -619,7 +606,7 @@ describe('GET /api/v1/jobs (list)', () => { { job_id: 'created-1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'ONNX', stage: 'onnx', progress: 0, @@ -630,7 +617,7 @@ describe('GET /api/v1/jobs (list)', () => { { job_id: 'running-1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'BIE', stage: 'bie', progress: 50, @@ -640,7 +627,7 @@ describe('GET /api/v1/jobs (list)', () => { { job_id: 'completed-1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'COMPLETED', progress: 100, created_at: '2026-04-25T10:00:00Z', @@ -648,7 +635,7 @@ describe('GET /api/v1/jobs (list)', () => { }, ]); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(200); const body = await res.json(); @@ -661,7 +648,7 @@ describe('GET /api/v1/jobs (list)', () 
=> { { job_id: 'completed-1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'COMPLETED', progress: 100, created_at: '2026-04-25T10:00:00Z', @@ -670,7 +657,7 @@ describe('GET /api/v1/jobs (list)', () => { { job_id: 'running-1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'BIE', progress: 50, created_at: '2026-04-25T11:00:00Z', @@ -680,7 +667,7 @@ describe('GET /api/v1/jobs (list)', () => { const res = await fetch( `${ctx.baseUrl}/api/v1/jobs?user_id=u1&status=completed`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, } ); const body = await res.json(); @@ -694,7 +681,7 @@ describe('GET /api/v1/jobs (list)', () => { { job_id: 'completed-1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'COMPLETED', created_at: '2026-04-25T10:00:00Z', updated_at: '2026-04-25T10:00:00Z', @@ -702,7 +689,7 @@ describe('GET /api/v1/jobs (list)', () => { { job_id: 'running-1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'BIE', created_at: '2026-04-25T11:00:00Z', updated_at: '2026-04-25T11:00:00Z', @@ -710,7 +697,7 @@ describe('GET /api/v1/jobs (list)', () => { { job_id: 'failed-1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'FAILED', error: { stage: 'bie' }, created_at: '2026-04-25T09:00:00Z', @@ -718,7 +705,7 @@ describe('GET /api/v1/jobs (list)', () => { }, ]); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1&status=all`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); const body = await res.json(); expect(body.total).toBe(3); @@ -728,7 +715,7 @@ describe('GET /api/v1/jobs (list)', () => { const res = await fetch( 
`${ctx.baseUrl}/api/v1/jobs?user_id=u1&status=invalid_status`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, } ); expect(res.status).toBe(400); @@ -736,35 +723,57 @@ describe('GET /api/v1/jobs (list)', () => { expect(body.error.details.fields.map((f) => f.field)).toContain('status'); }); - it('CRITICAL: cross-client isolation — same user_id different client gets nothing', async () => { - // user u1 has a job under cid-B, but cid-A must not see it + // A6 (rewritten): cross-client isolation. The production client-isolation logic + // (the created_by_client_id comparison in jobs.js listJobsByUser) must remain intact. + // + // Since Phase 0.8b A3, the caller clientId on the API key path is hard-coded to 'visionA-service', but + // Redis may still hold job records left over from the old system, other systems, or future callers (with a different + // created_by_client_id). This case seeds Redis directly to simulate "Redis contains foreign + // jobs" and verifies that listJobsByUser still filters out jobs that do not belong to this caller. + // + // Key point: this verifies defense in depth in the production code, not an OAuth + // multi-client scenario. Removing the isolation logic would leak data across clients. + it('cross-client isolation: foreign-client jobs in Redis are filtered out', async () => { seedJobs('u1', [ { - job_id: 'B-job-1', + job_id: 'mine-1', user_id: 'u1', - created_by_client_id: 'cid-B', // belongs to cid-B + created_by_client_id: 'visionA-service', // belongs to this caller + status: 'BIE', + created_at: '2026-04-25T12:00:00Z', + updated_at: '2026-04-25T12:00:00Z', + }, + { + job_id: 'foreign-1', + user_id: 'u1', + created_by_client_id: 'some-other-system', // does not belong to this caller (left over in Redis) status: 'BIE', created_at: '2026-04-25T11:00:00Z', updated_at: '2026-04-25T11:00:00Z', }, + { + job_id: 'foreign-legacy', + user_id: 'u1', + created_by_client_id: null, // legacy job (no client_id recorded) + 
status: 'BIE', + created_at: '2026-04-25T10:00:00Z', + updated_at: '2026-04-25T10:00:00Z', + }, ]); - const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1`, { - headers: { Authorization: 'Bearer good-read-token' }, // cid-A + const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1&status=all`, { +
headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(200); const body = await res.json(); - expect(body.total).toBe(0); - expect(body.jobs).toEqual([]); - // switching to cid-B's token should make the job visible - const resB = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1`, { - headers: { Authorization: 'Bearer good-read-token-B' }, - }); - expect(resB.status).toBe(200); - const bodyB = await resB.json(); - expect(bodyB.total).toBe(1); - expect(bodyB.jobs[0].job_id).toBe('B-job-1'); + // only jobs belonging to this caller (visionA-service) should be visible + const jobIds = body.jobs.map((j) => j.job_id); + expect(jobIds).toContain('mine-1'); + expect(jobIds).not.toContain('foreign-1'); + // how legacy jobs (created_by_client_id=null) are handled is up to the production code (the current + // implementation treats "no client_id" as belonging to no caller and hides them); verify that + expect(jobIds).not.toContain('foreign-legacy'); }); it('strips internal field created_by_client_id from list items', async () => { @@ -772,14 +781,14 @@ describe('GET /api/v1/jobs (list)', () => { { job_id: 'j1', user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'BIE', created_at: '2026-04-25T12:00:00Z', updated_at: '2026-04-25T12:00:00Z', }, ]); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1&status=all`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); const body = await res.json(); expect(body.jobs[0]).not.toHaveProperty('created_by_client_id'); @@ -791,7 +800,7 @@ describe('GET /api/v1/jobs (list)', () => { jobs.push({ job_id: `j${i}`, user_id: 'u1', - created_by_client_id: 'cid-A', + created_by_client_id: 'visionA-service', status: 'BIE', // sorted newest to oldest: j5 j4 j3 j2 j1 created_at: `2026-04-25T${10 + i}:00:00Z`, @@ -803,7 +812,7 @@ describe('GET /api/v1/jobs (list)', () => { const page1 = await fetch( 
`${ctx.baseUrl}/api/v1/jobs?user_id=u1&status=all&limit=2`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}`
}, } ); const p1Body = await page1.json(); @@ -814,7 +823,7 @@ describe('GET /api/v1/jobs (list)', () => { const page2 = await fetch( `${ctx.baseUrl}/api/v1/jobs?user_id=u1&status=all&limit=2&cursor=${encodeURIComponent(p1Body.next_cursor)}`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, } ); const p2Body = await page2.json(); @@ -824,7 +833,7 @@ describe('GET /api/v1/jobs (list)', () => { const page3 = await fetch( `${ctx.baseUrl}/api/v1/jobs?user_id=u1&status=all&limit=2&cursor=${encodeURIComponent(p2Body.next_cursor)}`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, } ); const p3Body = await page3.json(); @@ -834,7 +843,7 @@ describe('GET /api/v1/jobs (list)', () => { it('returns 400 when limit > 50', async () => { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1&limit=51`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(400); const body = await res.json(); @@ -843,14 +852,14 @@ describe('GET /api/v1/jobs (list)', () => { it('returns 400 when limit is non-integer', async () => { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1&limit=abc`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(400); }); it('returns 400 when limit is 0', async () => { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1&limit=0`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.status).toBe(400); }); @@ -859,7 +868,7 @@ describe('GET /api/v1/jobs (list)', () => { const res = await fetch( `${ctx.baseUrl}/api/v1/jobs?user_id=u1&cursor=not-valid-base64-!!!`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer 
${TEST_API_KEY}` }, } ); expect(res.status).toBe(400); @@ -869,7 +878,7 @@ describe('GET /api/v1/jobs (list)', () => { it('returns response with X-Request-Id header', async () => { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs?user_id=u1`, { - headers: { Authorization: 'Bearer good-read-token' }, + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, }); expect(res.headers.get('x-request-id')).toBeTruthy(); }); diff --git a/apps/task-scheduler/src/routes/v1/__tests__/promote.integration.test.js b/apps/task-scheduler/src/routes/v1/__tests__/promote.integration.test.js index 9fd0159..69a8318 100644 --- a/apps/task-scheduler/src/routes/v1/__tests__/promote.integration.test.js +++ b/apps/task-scheduler/src/routes/v1/__tests__/promote.integration.test.js @@ -1,24 +1,21 @@ /** - * POST /api/v1/jobs/:id/promote integration tests (T7). + * POST /api/v1/jobs/:id/promote integration tests (T7; Phase 0.8b: switched to API key in A4, cleaned up in A6). * - * Test scope (aligned with the tasks-phase1.md §3.7 acceptance criteria): - * - 401 invalid_token: missing Authorization - * - 403 insufficient_scope: token lacks converter:job.write + * Test scope: + * - 401 invalid_token: missing Authorization / wrong API key / same length but mismatched + * - 503 service_unavailable: CONVERTER_API_KEY env not set * - 404 job_not_found: job does not exist - * - 404 job_not_found: job exists but belongs to another client (existence not leaked) + * - 404 job_not_found: job exists but created_by_client_id does not match (defense-in-depth check) * - 400 validation_error: targets missing / invalid source / duplicate source * - 422 invalid_object_key: contains ..
/ backslash / control characters / leading slash * - 409 job_not_ready_for_promote: status !== 'COMPLETED' * - 409 source_not_available: job produced no output for that stage * - 200 happy path: completed job + all targets uploaded successfully + promoted: true written back * - 200 idempotent: a second promote of the same job → FAA is not called again; the existing promoted_object_keys are returned - * - 502 file_gateway_unavailable: FAA 5xx still failing after retries - * - 502 file_gateway_unavailable: FAA 4xx (non-401) + * - 502 file_gateway_unavailable: FAA 5xx still failing after retries / FAA 4xx (non-401) * - 503 auth_service_unavailable: FAA still returns 401 after a retry * - SECURITY: logs do not contain the FAA token; error messages do not leak FAA internals - * - Stream: does not buffer the whole file (simulates a large file with stream.Readable and verifies minio.getObjectStream is called) - * - * Startup: createApp + injected mock deps (including an injected verify function) + mock faaClient + mock minio. + * - Stream: does not buffer the whole file */ 'use strict'; @@ -28,7 +25,10 @@ const { Readable } = require('stream'); const { createSseService } = require('../../../services/sseService'); const { createJobService } = require('../../../services/jobService'); -const { requireAuth } = require('../../../auth/middleware'); +const { requireApiKey } = require('../../../auth/apiKeyMiddleware'); + +// Fixed API key injected into startApp +const TEST_API_KEY = 'integration-test-api-key-12345678901234567890123456789012'; // Mock luaScripts (same pattern as createJob.integration) jest.mock('../../../redis/luaScripts', () => ({ @@ -41,45 +41,10 @@ jest.mock('../../../redis/luaScripts', () => ({ }, })); -const FAKE_CONFIG = Object.freeze({ - memberCenter: { - issuer: 'https://auth.test.local', - jwksUrl: 'https://auth.test.local/.well-known/jwks', - tokenUrl: '', - }, - converter: { - audience: 'kneron_converter_api', - clientId: '', - clientSecret: '', - tenantId: '', - scopeWrite: 'converter:job.write', - scopeRead: 'converter:job.read', - }, - fileAccessAgent: { baseUrl: 'https://files.test.local', audience: 'file_access_api' }, - jwks: { cacheMaxAgeMs: 60000, cooldownMs: 30000, clockToleranceSec: 60 }, -}); - // --------------------------------------------------------------------------- // helpers //
--------------------------------------------------------------------------- -function makeVerifier({ tokens }) { - return async (token) => { - const entry = tokens[token]; - if (!entry) { - const err = new Error('invalid token'); - err.code = 'ERR_JWS_SIGNATURE_VERIFICATION_FAILED'; - throw err; - } - if (entry.expired) { - const err = new Error('expired'); - err.code = 'ERR_JWT_EXPIRED'; - throw err; - } - return { payload: entry.claims }; - }; -} - function makeFakeRedis() { const store = new Map(); return { @@ -157,15 +122,23 @@ function makeFakeFaaClient(opts = {}) { } /** - * Build the app directly (createApp's v1Deps injection chain is too indirect when mocking verify). + * Build the app directly (createApp's v1Deps injection chain is too indirect for mocking). + * + * @param {object} [opts] + * @param {object} [opts.redis] + * @param {object} [opts.minio] + * @param {object} [opts.faaClient] + * @param {object} [opts.rateLimit] + * @param {string} [opts.expectedApiKey] - the API key the middleware expects; defaults to TEST_API_KEY; + * pass an empty string to test the 503 fail-secure path */ async function startApp({ - tokens, redis, minio, faaClient, rateLimit = { windowMs: 60000, max: 1000 }, -}) { + expectedApiKey = TEST_API_KEY, +} = {}) { redis = redis || makeFakeRedis(); minio = minio || makeFakeMinio(); const sseService = createSseService(); @@ -194,13 +167,9 @@ async function startApp({ app.use(express.json({ limit: '10mb' })); app.use(express.urlencoded({ extended: true, limit: '10mb' })); - // v1 router with verify injection + // v1 router with API key injection (Phase 0.8b A4) const v1 = express.Router(); - const verify = makeVerifier({ tokens }); - const requireWriteAuth = requireAuth(FAKE_CONFIG.converter.scopeWrite, { - config: FAKE_CONFIG, - verify, - }); + const requireWriteAuth = requireApiKey({ expectedApiKey }); const perClientLimiter = createPerClientRateLimiter(rateLimit); const handler = promoteModule._internals.buildPromoteHandler({ jobService, @@ -233,30 +202,6 @@ async function startApp({ }); } -const TOKENS = { - 'good-write-token': { - claims: { - sub:
'kneron_converter_client', - client_id: 'visionA-client-A', - scope: 'converter:job.write converter:job.read', - }, - }, - 'good-write-token-other-client': { - claims: { - sub: 'other', - client_id: 'visionA-client-B', // a different client - scope: 'converter:job.write', - }, - }, - 'read-only-token': { - claims: { - sub: 'reader', - client_id: 'visionA-client-A', - scope: 'converter:job.read', - }, - }, -}; - /** * Build a mock job record in the completed state. */ @@ -265,7 +210,9 @@ function makeCompletedJob(overrides = {}) { return { job_id: jobId, user_id: 'u1', - created_by_client_id: 'visionA-client-A', + // Phase 0.8b A4: apiKeyMiddleware hard-codes clientId='visionA-service', so the job's + // created_by_client_id must match it; otherwise the client isolation check in promote.js returns 404. + created_by_client_id: 'visionA-service', origin: 'api', status: 'COMPLETED', stage: null, @@ -311,7 +258,7 @@ afterAll(() => { describe('POST /api/v1/jobs/:id/promote — auth', () => { it('returns 401 invalid_token without Authorization', async () => { - const ctx = await startApp({ tokens: TOKENS, faaClient: makeFakeFaaClient() }); + const ctx = await startApp({ faaClient: makeFakeFaaClient() }); try { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/abc/promote`, { method: 'POST', @@ -326,21 +273,77 @@ describe('POST /api/v1/jobs/:id/promote — auth', () => { } }); - it('returns 403 insufficient_scope with read-only token', async () => { - const ctx = await startApp({ tokens: TOKENS, faaClient: makeFakeFaaClient() }); + // A6 addition: a key of the same length that does not match + it('returns 401 invalid_token with same-length wrong API key', async () => { + const ctx = await startApp({ faaClient: makeFakeFaaClient() }); try { + const sameLenWrongKey = 'X'.repeat(TEST_API_KEY.length); const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/abc/promote`, { method: 'POST', headers: { - Authorization: 'Bearer read-only-token', + Authorization: `Bearer ${sameLenWrongKey}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ targets: [] }), }); -
expect(res.status).toBe(403); + expect(res.status).toBe(401); + expect((await res.json()).error.code).toBe('invalid_token'); + } finally { + await ctx.close(); + } + }); + + // A6 addition: fail-secure path + it('returns 503 service_unavailable when CONVERTER_API_KEY env not configured', async () => { + const ctx = await startApp({ + faaClient: makeFakeFaaClient(), + expectedApiKey: '', + }); + try { + const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/abc/promote`, { + method: 'POST', + headers: { + Authorization: `Bearer ${TEST_API_KEY}`, + 'Content-Type': 'application/json', + }, + body: JSON.stringify({ targets: [] }), + }); + expect(res.status).toBe(503); const body = await res.json(); - expect(body.error.code).toBe('insufficient_scope'); - expect(body.error.details.required_scope).toBe('converter:job.write'); + expect(body.error.code).toBe('service_unavailable'); + expect(body.error.message).toBe('API key not configured'); + } finally { + await ctx.close(); + } + }); + + // A7 addition: an authenticated promote writes an audit log (with source_ip + fingerprint + POST) + it('writes auth.api_key.authenticated audit log on POST promote', async () => { + const ctx = await startApp({ faaClient: makeFakeFaaClient() }); + try { + const logCalls = console.log.mock.calls; + const callCountBefore = logCalls.length; + + await fetch(`${ctx.baseUrl}/api/v1/jobs/some-id/promote`, { + method: 'POST', + headers: { + Authorization: `Bearer ${TEST_API_KEY}`, + 'Content-Type': 'application/json', + }, + body: JSON.stringify({ targets: [{ source: 'nef', target_object_key: 'foo/bar.nef' }] }), + }); + + const newCalls = logCalls.slice(callCountBefore).flat().filter((x) => typeof x === 'string'); + const auditLine = newCalls.find((l) => l.includes('auth.api_key.authenticated')); + expect(auditLine).toBeDefined(); + const parsed = JSON.parse(auditLine); + expect(parsed.action).toBe('auth.api_key.authenticated'); + expect(parsed.http_method).toBe('POST'); + expect(typeof parsed.source_ip).toBe('string'); +
+ expect(typeof parsed.token_fingerprint).toBe('string'); + expect(parsed.token_fingerprint.length).toBe(12); + // must never contain TEST_API_KEY itself + expect(auditLine).not.toContain(TEST_API_KEY); } finally { await ctx.close(); } @@ -353,12 +356,12 @@ describe('POST /api/v1/jobs/:id/promote — auth', () => { describe('POST /api/v1/jobs/:id/promote — 404 / client isolation', () => { it('returns 404 job_not_found when job does not exist', async () => { - const ctx = await startApp({ tokens: TOKENS, faaClient: makeFakeFaaClient() }); + const ctx = await startApp({ faaClient: makeFakeFaaClient() }); try { const res = await fetch(`${ctx.baseUrl}/api/v1/jobs/nonexistent/promote`, { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -373,22 +376,32 @@ describe('POST /api/v1/jobs/:id/promote — 404 / client isolation', () => { } }); - it('returns 404 job_not_found when job belongs to different client (no leak)', async () => { + // A6 (rewritten): cross-client isolation, a defense-in-depth check of production code. + // + // After Phase 0.8b A3, the API key path hard-codes the caller clientId as 'visionA-service', but + // Redis may still hold job records left over from the old system, other systems, or future callers (with a different + // created_by_client_id). This case seeds Redis directly to simulate a foreign job there and + // verifies that the client isolation logic in the promote handler still rejects it (promote.js L355). + // + // Key point: this verifies production-code defense in depth, not an OAuth multi-client + // scenario. Removing the isolation logic would let one client operate on another client's jobs. + it('cross-client isolation: foreign-client job returns 404 (no info leak)', async () => { const faa = makeFakeFaaClient(); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { - const job = makeCompletedJob({ - job_id: 'job-foreign', - created_by_client_id: 'visionA-client-B', // does not belong to client-A + // seed a completed job whose created_by_client_id !== 'visionA-service' + const foreignJob = makeCompletedJob({ + job_id:
'foreign-job-001', + created_by_client_id: 'some-other-system', }); - ctx.redis.store.set('job:job-foreign', JSON.stringify(job)); + ctx.redis.store.set('job:foreign-job-001', JSON.stringify(foreignJob)); const res = await fetch( - `${ctx.baseUrl}/api/v1/jobs/job-foreign/promote`, + `${ctx.baseUrl}/api/v1/jobs/foreign-job-001/promote`, { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', // client-A token + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -396,10 +409,11 @@ describe('POST /api/v1/jobs/:id/promote — 404 / client isolation', () => { }), } ); + // must not leak existence: the response is identical to "job genuinely does not exist" expect(res.status).toBe(404); const body = await res.json(); expect(body.error.code).toBe('job_not_found'); - // should not call FAA + // FAA must not be called at all (the isolation check runs before the FAA call) expect(faa.putFile).not.toHaveBeenCalled(); } finally { await ctx.close(); @@ -414,7 +428,7 @@ describe('POST /api/v1/jobs/:id/promote — 404 / client isolation', () => { describe('POST /api/v1/jobs/:id/promote — validation', () => { let ctx; beforeEach(async () => { - ctx = await startApp({ tokens: TOKENS, faaClient: makeFakeFaaClient() }); + ctx = await startApp({ faaClient: makeFakeFaaClient() }); const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); }); @@ -428,7 +442,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({}), @@ -445,7 +459,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ targets: [] }), @@ -461,7 +475,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { -
Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -479,7 +493,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -498,7 +512,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -516,7 +530,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -533,7 +547,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -556,7 +570,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -577,7 +591,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -597,7 +611,7 @@ describe('POST /api/v1/jobs/:id/promote — validation', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -620,7 +634,7 @@ 
describe('POST /api/v1/jobs/:id/promote — validation', () => { describe('POST /api/v1/jobs/:id/promote — state checks', () => { it('returns 409 job_not_ready_for_promote when status is not COMPLETED', async () => { const faa = makeFakeFaaClient(); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob({ status: 'BIE', stage: 'bie', progress: 50 }); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -630,7 +644,7 @@ describe('POST /api/v1/jobs/:id/promote — state checks', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -651,7 +665,7 @@ describe('POST /api/v1/jobs/:id/promote — state checks', () => { it('returns 409 source_not_available when job has no output for source', async () => { const faa = makeFakeFaaClient(); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob({ // 故意只留 onnx,沒 bie / nef @@ -664,7 +678,7 @@ describe('POST /api/v1/jobs/:id/promote — state checks', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -694,7 +708,7 @@ describe('POST /api/v1/jobs/:id/promote — 200 happy path', () => { { ok: true, result: { etag: 'faa-etag-nef', sizeBytes: 1048576 } }, ], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -704,7 +718,7 @@ describe('POST /api/v1/jobs/:id/promote — 200 happy path', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer 
${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -759,7 +773,7 @@ describe('POST /api/v1/jobs/:id/promote — 200 happy path', () => { { ok: true, result: { etag: 'etag-nef', sizeBytes: 200 } }, ], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -769,7 +783,7 @@ describe('POST /api/v1/jobs/:id/promote — 200 happy path', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -799,7 +813,7 @@ describe('POST /api/v1/jobs/:id/promote — 200 happy path', () => { const faa = makeFakeFaaClient({ outcomes: [{ ok: true, result: { etag: 'e', sizeBytes: 1 } }], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -807,7 +821,7 @@ describe('POST /api/v1/jobs/:id/promote — 200 happy path', () => { await fetch(`${ctx.baseUrl}/api/v1/jobs/job-completed-001/promote`, { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -826,7 +840,7 @@ describe('POST /api/v1/jobs/:id/promote — 200 happy path', () => { describe('POST /api/v1/jobs/:id/promote — idempotency', () => { it('returns 200 + existing promoted_object_keys without re-calling FAA', async () => { const faa = makeFakeFaaClient(); // 不應該被呼叫 - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const previouslyPromoted = [ { @@ -849,7 +863,7 @@ describe('POST /api/v1/jobs/:id/promote — idempotency', () => { { method: 'POST', 
headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -897,7 +911,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { }, ], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -907,7 +921,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -934,7 +948,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { const faa = makeFakeFaaClient({ outcomes: [{ ok: false, error: new FAATimeoutError('PUT timeout 300000ms') }], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -944,7 +958,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -971,7 +985,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { }, ], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -981,7 +995,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: 
JSON.stringify({ @@ -1009,7 +1023,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { }, ], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -1019,7 +1033,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -1041,7 +1055,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { minio.headObject = jest.fn(async () => { throw new Error('minio is down'); }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa, minio }); + const ctx = await startApp({ faaClient: faa, minio }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -1051,7 +1065,7 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -1079,6 +1093,16 @@ describe('POST /api/v1/jobs/:id/promote — FAA failures', () => { // --------------------------------------------------------------------------- describe('POST /api/v1/jobs/:id/promote — createApp wiring smoke test', () => { + // A6:FAKE_CONFIG inline 化(之前散落在頂部,已隨清 OAuth dead code 移除; + // 此 wiring smoke test 仍需要 config 觸發 createApp 內 promote 路徑) + const WIRING_CONFIG = Object.freeze({ + converter: { apiKey: TEST_API_KEY }, + fileAccessAgent: { + baseUrl: 'https://files.test.local', + audience: 'file_access_api', + }, + }); + it('createApp wires faaClient when opts.config has FAA baseUrl (no token → 401, not 501)', async () => { const { createApp } = require('../../../app'); const { createUploader } = 
require('../../../middleware/upload'); @@ -1098,7 +1122,7 @@ describe('POST /api/v1/jobs/:id/promote — createApp wiring smoke test', () => { redis, jobService, sseService, minio, uploader }, { frontendUrl: 'http://localhost:3000', - config: FAKE_CONFIG, // 含 fileAccessAgent.baseUrl,會觸發 lazy build faaClient + config: WIRING_CONFIG, // 含 fileAccessAgent.baseUrl,會觸發 lazy build faaClient rateLimit: { windowMs: 60000, max: 100 }, storageBackend: 'minio', } @@ -1109,7 +1133,7 @@ describe('POST /api/v1/jobs/:id/promote — createApp wiring smoke test', () => }); const { port } = server.address(); try { - // 沒帶 token → 應走 requireAuth → 401 + // 沒帶 token → 應走 requireApiKey → 401 // 若 promote 沒被 wire 起來,會回 501(fallback) const res = await fetch( `http://127.0.0.1:${port}/api/v1/jobs/abc/promote`, @@ -1186,7 +1210,7 @@ describe('POST /api/v1/jobs/:id/promote — SECURITY', () => { }, ], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -1196,7 +1220,7 @@ describe('POST /api/v1/jobs/:id/promote — SECURITY', () => { { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ @@ -1218,7 +1242,7 @@ describe('POST /api/v1/jobs/:id/promote — SECURITY', () => { const faa = makeFakeFaaClient({ outcomes: [{ ok: true, result: { etag: 'e', sizeBytes: 1 } }], }); - const ctx = await startApp({ tokens: TOKENS, faaClient: faa }); + const ctx = await startApp({ faaClient: faa }); try { const job = makeCompletedJob(); ctx.redis.store.set('job:job-completed-001', JSON.stringify(job)); @@ -1231,7 +1255,7 @@ describe('POST /api/v1/jobs/:id/promote — SECURITY', () => { await fetch(`${ctx.baseUrl}/api/v1/jobs/job-completed-001/promote`, { method: 'POST', headers: { - Authorization: 'Bearer good-write-token', + 
Authorization: `Bearer ${TEST_API_KEY}`, 'Content-Type': 'application/json', }, body: JSON.stringify({ diff --git a/apps/task-scheduler/src/routes/v1/__tests__/result.integration.test.js b/apps/task-scheduler/src/routes/v1/__tests__/result.integration.test.js new file mode 100644 index 0000000..92de86a --- /dev/null +++ b/apps/task-scheduler/src/routes/v1/__tests__/result.integration.test.js @@ -0,0 +1,1060 @@ +/** + * GET /api/v1/jobs/:id/result integration tests (Phase 0.8b Phase B). + * + * Aligned with docs/autoflow/04-architecture/api/api-result.md §14 IT-1 through IT-9 + * + AC-1 through AC-12 + R1-R12 reviewer verification items. + * + * Coverage: + * - IT-1 Happy path (200 + Accept-Ranges: none + Content-Disposition + audit) + * - IT-2 401 missing / wrong API key + * - IT-3 404 job_not_found + * - IT-4 404 result_not_found (completed but no NEF) + * - IT-5 409 job_not_completed + * - IT-6 410 result_expired (expires_at < now) + * - IT-7 410 result_expired (MinIO returns null) + * - IT-8 429 burst rate limit hit + * - IT-9 429 sustained rate limit hit + * - IT-10 503 concurrent stream cap + * - IT-11 429 bandwidth quota hourly + * - IT-12 Range header writes an audit log + returns the full body with 200 + * - IT-13 filename assertion fallback (including buildFilename internals) + * - IT-14 502 storage_unavailable (MinIO throws) + * - IT-15 audit log forensics: cross-event traceability + * + * Why build the app directly instead of using createApp: createApp's v1Deps injection chain is too indirect for mocking. + * Same pattern as promote.integration.test.js. + */ + +'use strict'; + +const express = require('express'); +const http = require('http'); +const { Readable } = require('stream'); + +const { requireApiKey } = require('../../../auth/apiKeyMiddleware'); + +const TEST_API_KEY = + 'integration-test-api-key-12345678901234567890123456789012'; + +// --------------------------------------------------------------------------- +// helpers +// --------------------------------------------------------------------------- + +function makeFakeJobService(jobs = {}) { + const store = new Map(Object.entries(jobs)); + return { + store, + getJob:
jest.fn(async (jobId) => store.get(jobId) || null), + }; +} + +function makeFakeMinioStorage({ failOn = null, returnNullOn = null } = {}) { + return { + getObjectStream: jest.fn(async (key) => { + if (failOn === key) { + const err = new Error('mock minio failure'); + err.name = 'NetworkingError'; + throw err; + } + if (returnNullOn === key) { + return null; + } + const body = Buffer.from(`nef-content-for-${key}`); + return { + stream: Readable.from([body]), + contentLength: body.length, + contentType: 'application/octet-stream', + }; + }), + }; +} + +/** + * Build the app directly (createApp's v1Deps injection chain is too indirect for mocking; this also lets us inject an audit log spy). + * + * @param {object} opts + * @param {object} opts.jobService + * @param {object} opts.minioStorage + * @param {object} [opts.bandwidthQuota] - optional test quota to inject (with specific limits / pre-loaded state) + * @param {{ burst?: { max?: number, windowMs?: number }, + * sustained?: { max?: number, windowMs?: number } }} [opts.rateLimit] + * @param {{ maxConcurrent?: number, retryAfterSeconds?: number }} [opts.concurrency] + * @param {(fields: object) => void} [opts.onLog] - audit log spy to inject + * @param {string} [opts.expectedApiKey=TEST_API_KEY] + */ +async function startApp(opts = {}) { + const { + jobService, + minioStorage, + bandwidthQuota, + rateLimit = {}, + concurrency = { maxConcurrent: 10, retryAfterSeconds: 30 }, + onLog, + expectedApiKey = TEST_API_KEY, + } = opts; + + const { + createResultRouter, + } = require('../result'); + const { + createPerClientRateLimiter, + } = require('../../../middleware/perClientRateLimit'); + const { + createResultStreamConcurrencyLimiter, + } = require('../../../middleware/resultStreamConcurrency'); + const { + createResultBandwidthQuota, + } = require('../../../middleware/resultBandwidthQuota'); + const { errorHandler } = require('../../../middleware/errorHandler'); + const { + requestIdMiddleware, + } = require('../../../middleware/requestId'); + + const tokenFingerprintKeyGen = (req) => + req && req.auth && typeof
req.auth.tokenFingerprint === 'string' + ? `tf:${req.auth.tokenFingerprint}` + : `ip:${req && req.ip ? req.ip : 'unknown'}`; + + const burstLimiter = createPerClientRateLimiter({ + windowMs: (rateLimit.burst && rateLimit.burst.windowMs) || 10 * 1000, + max: (rateLimit.burst && rateLimit.burst.max) || 5, + keyGenerator: tokenFingerprintKeyGen, + errorDetails: { limit_type: 'burst' }, + onLimitExceeded: + typeof onLog === 'function' + ? (req, retryAfterSec) => + onLog({ + level: 'WARN', + action: 'result.rate_limited', + source_ip: req.ip || null, + token_fingerprint: + req.auth && typeof req.auth.tokenFingerprint === 'string' + ? req.auth.tokenFingerprint + : null, + request_id: req.requestId || null, + http_method: 'GET', + http_path: req.originalUrl || req.url || '', + job_id: req.params && req.params.id ? req.params.id : null, + duration_ms: 0, + limit_type: 'burst', + retry_after_seconds: retryAfterSec, + }) + : undefined, + }); + const sustainedLimiter = createPerClientRateLimiter({ + windowMs: + (rateLimit.sustained && rateLimit.sustained.windowMs) || 60 * 1000, + max: (rateLimit.sustained && rateLimit.sustained.max) || 20, + keyGenerator: tokenFingerprintKeyGen, + errorDetails: { limit_type: 'sustained' }, + onLimitExceeded: + typeof onLog === 'function' + ? (req, retryAfterSec) => + onLog({ + level: 'WARN', + action: 'result.rate_limited', + source_ip: req.ip || null, + token_fingerprint: + req.auth && typeof req.auth.tokenFingerprint === 'string' + ? req.auth.tokenFingerprint + : null, + request_id: req.requestId || null, + http_method: 'GET', + http_path: req.originalUrl || req.url || '', + job_id: req.params && req.params.id ? 
req.params.id : null, + duration_ms: 0, + limit_type: 'sustained', + retry_after_seconds: retryAfterSec, + }) + : undefined, + }); + + const quota = + bandwidthQuota || + createResultBandwidthQuota({ + hourlyLimitBytes: 1024 * 1024 * 1024, // 1 GB default + dailyLimitBytes: 6 * 1024 * 1024 * 1024, + // per spec §9.5: the bucket key is token_fingerprint (not clientId) + keyGenerator: (req) => + req && req.auth && typeof req.auth.tokenFingerprint === 'string' + ? req.auth.tokenFingerprint + : 'unknown', + onLog, + }); + + const concurrencyLimiter = createResultStreamConcurrencyLimiter({ + maxConcurrent: concurrency.maxConcurrent, + retryAfterSeconds: concurrency.retryAfterSeconds, + onLog, + }); + + const resultRouter = createResultRouter({ + jobService, + minioStorage, + bandwidthQuota: quota, + onLog, + }); + + const app = express(); + app.use(requestIdMiddleware); + app.use(express.json()); + + // Note: this test mounts the router directly instead of going through createV1Router (avoids an overly deep mock dependency chain) + app.use( + '/api/v1/jobs/:id/result', + requireApiKey({ expectedApiKey }), + burstLimiter, + sustainedLimiter, + quota.middleware, + concurrencyLimiter.middleware, + resultRouter + ); + app.use('/api/v1', errorHandler); + + return new Promise((resolve) => { + const server = app.listen(0, '127.0.0.1', () => { + const { port } = server.address(); + resolve({ + server, + baseUrl: `http://127.0.0.1:${port}`, + jobService, + minioStorage, + bandwidthQuota: quota, + concurrencyLimiter, + close: () => new Promise((r) => server.close(r)), + }); + }); + }); +} + +/** + * Minimal HTTP client (avoids a node-fetch dependency). + * + * @param {string} url + * @param {object} [opts] + * @returns {Promise<{ status: number, headers: object, body: Buffer, bodyText: string }>} + */ +function httpGet(url, opts = {}) { + return new Promise((resolve, reject) => { + const headers = opts.headers || {}; + const req = http.get(url, { headers }, (res) => { + const chunks = []; + res.on('data', (c) => chunks.push(c)); + res.on('end', () => { + const body =
Buffer.concat(chunks); + resolve({ + status: res.statusCode, + headers: res.headers, + body, + bodyText: body.toString('utf8'), + }); + }); + }); + req.on('error', reject); + req.end(); + }); +} + +function buildJob(overrides = {}) { + return { + job_id: 'job-xyz-123', + status: 'COMPLETED', + stage: 'nef', + progress: 100, + created_at: '2026-05-10T00:00:00Z', + updated_at: '2026-05-10T00:10:00Z', + expires_at: '2099-05-17T00:00:00Z', + source_filename: 'yolov5s.onnx', + platform: '720', + result_object_keys: { nef: 'jobs/job-xyz-123/output/result.nef' }, + ...overrides, + }; +} + +// --------------------------------------------------------------------------- +// tests +// --------------------------------------------------------------------------- + +describe('GET /api/v1/jobs/:id/result — integration', () => { + let ctx; + afterEach(async () => { + if (ctx && ctx.close) await ctx.close(); + ctx = null; + }); + + // ------------------------------------------------------------------------- + // IT-1: Happy path + // ------------------------------------------------------------------------- + describe('IT-1 Happy path', () => { + test('returns 200 + correct headers + streams NEF binary + audit log', async () => { + const job = buildJob(); + const auditLogs = []; + const onLog = (fields) => auditLogs.push(fields); + + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + onLog, + }); + + const res = await httpGet(`${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, + }); + + expect(res.status).toBe(200); + expect(res.headers['content-type']).toBe('application/octet-stream'); + expect(res.headers['accept-ranges']).toBe('none'); + expect(res.headers['content-disposition']).toMatch( + /attachment; filename="yolov5s_720\.nef"; filename\*=UTF-8''yolov5s_720\.nef/ + ); + expect(res.body.toString('utf8')).toBe( + 
'nef-content-for-jobs/job-xyz-123/output/result.nef' + ); + + // Audit log: a result.streamed event must be present (with the five A.7 fields + the four /result fields) + const streamed = auditLogs.find( + (l) => l.action === 'result.streamed' + ); + expect(streamed).toBeDefined(); + expect(streamed.level).toBe('INFO'); + expect(streamed.job_id).toBe('job-xyz-123'); + expect(streamed.source_ip).toBeTruthy(); + expect(streamed.token_fingerprint).toBeTruthy(); + expect(streamed.request_id).toBeTruthy(); + expect(streamed.http_method).toBe('GET'); + expect(streamed.http_path).toMatch(/\/api\/v1\/jobs\/job-xyz-123\/result/); + expect(streamed.size_bytes).toBeGreaterThan(0); + expect(streamed.stream_completed).toBe(true); + expect(typeof streamed.duration_ms).toBe('number'); + }); + + test('uses legacy output.nef_path when result_object_keys is missing (backward compatibility)', async () => { + const job = buildJob({ + result_object_keys: null, + output: { nef_path: 'legacy/jobs/old/result.nef' }, + }); + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + }); + const res = await httpGet(`${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, + }); + expect(res.status).toBe(200); + expect(res.body.toString('utf8')).toBe( + 'nef-content-for-legacy/jobs/old/result.nef' + ); + }); + + test('falls back to job_.nef when an existing job has no source_filename / platform', async () => { + const job = buildJob({ + source_filename: null, + platform: null, + }); + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + }); + const res = await httpGet(`${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, + }); + expect(res.status).toBe(200); + expect(res.headers['content-disposition']).toMatch( + /filename="job_job-xyz-123\.nef"/ + ); + }); + }); + + // ------------------------------------------------------------------------- + // IT-2: 
401 missing / wrong API key + // ------------------------------------------------------------------------- + describe('IT-2 Authentication failures', () => { + test('returns 401 when Authorization header missing', async () => { + ctx = await startApp({ + jobService: makeFakeJobService({}), + minioStorage: makeFakeMinioStorage(), + }); + const res = await httpGet(`${ctx.baseUrl}/api/v1/jobs/abc/result`); + expect(res.status).toBe(401); + const parsed = JSON.parse(res.bodyText); + expect(parsed.error.code).toBe('invalid_token'); + }); + + test('returns 401 when API key wrong', async () => { + ctx = await startApp({ + jobService: makeFakeJobService({}), + minioStorage: makeFakeMinioStorage(), + }); + const res = await httpGet(`${ctx.baseUrl}/api/v1/jobs/abc/result`, { + headers: { Authorization: 'Bearer wrong-key-' + 'x'.repeat(50) }, + }); + expect(res.status).toBe(401); + }); + }); + + // ------------------------------------------------------------------------- + // IT-3: 404 job_not_found + // ------------------------------------------------------------------------- + test('IT-3 returns 404 job_not_found when jobID does not exist', async () => { + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({}), + minioStorage: makeFakeMinioStorage(), + onLog: (f) => auditLogs.push(f), + }); + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/nonexistent/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + expect(res.status).toBe(404); + const parsed = JSON.parse(res.bodyText); + expect(parsed.error.code).toBe('job_not_found'); + expect(parsed.error.request_id).toBeTruthy(); + const notFoundLog = auditLogs.find((l) => l.action === 'result.not_found'); + expect(notFoundLog).toBeDefined(); + expect(notFoundLog.reason).toBe('job_not_found'); + expect(notFoundLog.level).toBe('WARN'); + }); + + // ------------------------------------------------------------------------- + // IT-4: 404 result_not_found + // 
------------------------------------------------------------------------- + test('IT-4 returns 404 result_not_found when completed but no NEF key', async () => { + const job = buildJob({ + result_object_keys: null, + output: null, + }); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + onLog: (f) => auditLogs.push(f), + }); + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + expect(res.status).toBe(404); + const parsed = JSON.parse(res.bodyText); + expect(parsed.error.code).toBe('result_not_found'); + const notFoundLog = auditLogs.find((l) => l.action === 'result.not_found'); + expect(notFoundLog).toBeDefined(); + expect(notFoundLog.reason).toBe('no_nef_key'); + }); + + // ------------------------------------------------------------------------- + // IT-5: 409 job_not_completed + // ------------------------------------------------------------------------- + test('IT-5 returns 409 when job is still running', async () => { + const job = buildJob({ status: 'ONNX' }); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + onLog: (f) => auditLogs.push(f), + }); + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + expect(res.status).toBe(409); + const parsed = JSON.parse(res.bodyText); + expect(parsed.error.code).toBe('job_not_completed'); + expect(parsed.error.details.current_status).toBe('ONNX'); + const log = auditLogs.find((l) => l.action === 'result.not_completed'); + expect(log).toBeDefined(); + expect(log.current_status).toBe('ONNX'); + }); + + // ------------------------------------------------------------------------- + // IT-6: 410 result_expired (expires_at in past) + // 
------------------------------------------------------------------------- + test('IT-6 returns 410 when expires_at is in the past', async () => { + const job = buildJob({ expires_at: '2020-01-01T00:00:00Z' }); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + onLog: (f) => auditLogs.push(f), + }); + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + expect(res.status).toBe(410); + const parsed = JSON.parse(res.bodyText); + expect(parsed.error.code).toBe('result_expired'); + const log = auditLogs.find((l) => l.action === 'result.expired'); + expect(log).toBeDefined(); + expect(log.expires_at).toBe('2020-01-01T00:00:00Z'); + expect(typeof log.expired_by_ms).toBe('number'); + }); + + // ------------------------------------------------------------------------- + // IT-7: 410 result_expired (MinIO returns null) + // ------------------------------------------------------------------------- + test('IT-7 returns 410 when the MinIO object is missing (likely removed by a lifecycle rule)', async () => { + const job = buildJob(); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage({ + returnNullOn: 'jobs/job-xyz-123/output/result.nef', + }), + onLog: (f) => auditLogs.push(f), + }); + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + expect(res.status).toBe(410); + const parsed = JSON.parse(res.bodyText); + expect(parsed.error.code).toBe('result_expired'); + const log = auditLogs.find( + (l) => l.action === 'result.expired' && l.reason === 'minio_object_missing' + ); + expect(log).toBeDefined(); + }); + + // ------------------------------------------------------------------------- + // IT-8: 429 burst rate limit + // 
------------------------------------------------------------------------- + test('IT-8 returns 429 + limit_type: burst after exceeding burst window', async () => { + const job = buildJob(); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + rateLimit: { + burst: { max: 3, windowMs: 10 * 1000 }, + sustained: { max: 1000, windowMs: 60 * 1000 }, + }, + onLog: (f) => auditLogs.push(f), + }); + // 5 requests (the first 3 get 200, the last 2 get 429) + const results = []; + for (let i = 0; i < 5; i += 1) { + // eslint-disable-next-line no-await-in-loop + results.push( + await httpGet(`${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, + }) + ); + } + const limited = results.filter((r) => r.status === 429); + expect(limited.length).toBeGreaterThan(0); + const parsed = JSON.parse(limited[0].bodyText); + expect(parsed.error.code).toBe('rate_limit_exceeded'); + expect(parsed.error.details.limit_type).toBe('burst'); + expect(limited[0].headers['retry-after']).toBeDefined(); + + // Audit log + const log = auditLogs.find( + (l) => l.action === 'result.rate_limited' && l.limit_type === 'burst' + ); + expect(log).toBeDefined(); + }); + + // ------------------------------------------------------------------------- + // IT-9: 429 sustained rate limit + // ------------------------------------------------------------------------- + test('IT-9 returns 429 + limit_type: sustained after exceeding sustained window', async () => { + const job = buildJob(); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + rateLimit: { + burst: { max: 1000, windowMs: 10 * 1000 }, + sustained: { max: 3, windowMs: 60 * 1000 }, + }, + onLog: (f) => auditLogs.push(f), + }); + const results = []; + for (let i = 0; i < 5; i += 1) { + // eslint-disable-next-line no-await-in-loop + 
results.push( + await httpGet(`${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, + }) + ); + } + const limited = results.filter((r) => r.status === 429); + expect(limited.length).toBeGreaterThan(0); + const parsed = JSON.parse(limited[0].bodyText); + expect(parsed.error.code).toBe('rate_limit_exceeded'); + expect(parsed.error.details.limit_type).toBe('sustained'); + + const log = auditLogs.find( + (l) => l.action === 'result.rate_limited' && l.limit_type === 'sustained' + ); + expect(log).toBeDefined(); + }); + + // ------------------------------------------------------------------------- + // IT-10: 503 concurrent stream cap + // ------------------------------------------------------------------------- + test('IT-10 returns 503 service_busy when concurrent cap exceeded', async () => { + // Slow MinIO stream (holds the slot until the test finishes) + maxConcurrent = 1 + const releaseStream = []; + const slowMinio = { + getObjectStream: jest.fn(async () => { + // Create a paused stream and hand it to the test via releaseStream so the test controls when data is pushed + const stream = new Readable({ read() {} }); + releaseStream.push(stream); + return { + stream, + contentLength: 100, + contentType: 'application/octet-stream', + }; + }), + }; + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': buildJob() }), + minioStorage: slowMinio, + concurrency: { maxConcurrent: 1, retryAfterSeconds: 30 }, + // High rate limits so the burst limiter does not fire first + rateLimit: { + burst: { max: 1000, windowMs: 10 * 1000 }, + sustained: { max: 1000, windowMs: 60 * 1000 }, + }, + onLog: (f) => auditLogs.push(f), + }); + + // Fire one slow request first (not awaited; it holds the only slot) + const slowReq = httpGet(`${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, { + headers: { Authorization: `Bearer ${TEST_API_KEY}` }, + }); + // Give the slow request time to acquire the slot + await new Promise((r) => setTimeout(r, 50)); + + // A second request should get 503 + const second = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { 
headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + + expect(second.status).toBe(503); + const parsed = JSON.parse(second.bodyText); + expect(parsed.error.code).toBe('service_busy'); + expect(parsed.error.details.limit_type).toBe('concurrent'); + expect(second.headers['retry-after']).toBe('30'); + + const log = auditLogs.find( + (l) => l.action === 'result.rate_limited' && l.limit_type === 'concurrent' + ); + expect(log).toBeDefined(); + + // Release the slow stream (push data, then end it so slowReq resolves) + releaseStream.forEach((s) => { + s.push(Buffer.from('x'.repeat(100))); + s.push(null); + }); + await slowReq; + }, 10000); + + // ------------------------------------------------------------------------- + // IT-11: bandwidth quota hourly hit + // ------------------------------------------------------------------------- + test('IT-11 returns 429 bandwidth_quota_exceeded when hourly quota hit', async () => { + const { + createResultBandwidthQuota, + } = require('../../../middleware/resultBandwidthQuota'); + const auditLogs = []; + const quota = createResultBandwidthQuota({ + hourlyLimitBytes: 100, // 100-byte cap + dailyLimitBytes: 1000, + onLog: (f) => auditLogs.push(f), + }); + // Pre-fill the quota (simplifies test setup). The bucket key must be the fingerprint of the token the request will send. + const { _internals } = require('../../../auth/apiKeyMiddleware'); + const realFingerprint = _internals.tokenFingerprint(TEST_API_KEY); + quota.consume(realFingerprint, 100); // exhaust the hourly cap for TEST_API_KEY's bucket + + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': buildJob() }), + minioStorage: makeFakeMinioStorage(), + bandwidthQuota: quota, + onLog: (f) => auditLogs.push(f), + }); + + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + expect(res.status).toBe(429); + const parsed = JSON.parse(res.bodyText); + expect(parsed.error.code).toBe('bandwidth_quota_exceeded'); + 
expect(parsed.error.details.limit_type).toBe('bandwidth_hourly'); + expect(res.headers['retry-after']).toBeDefined(); + + const log = auditLogs.find( + (l) => + l.action === 'result.bandwidth_quota_exceeded' && + l.limit_type === 'bandwidth_hourly' + ); + expect(log).toBeDefined(); + expect(typeof log.bytes_used_in_window).toBe('number'); + expect(typeof log.retry_after_seconds).toBe('number'); + }); + + // ------------------------------------------------------------------------- + // IT-12: Range header silently ignored + audit log + // ------------------------------------------------------------------------- + test('IT-12 Range header silently ignored, returns 200 + Accept-Ranges:none + audit', async () => { + const job = buildJob(); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + onLog: (f) => auditLogs.push(f), + }); + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { + headers: { + Authorization: `Bearer ${TEST_API_KEY}`, + Range: 'bytes=0-7', + }, + } + ); + // 200 (not 206, not 416) + expect(res.status).toBe(200); + expect(res.headers['accept-ranges']).toBe('none'); + // Full body (not partial) + expect(res.body.toString('utf8')).toBe( + 'nef-content-for-jobs/job-xyz-123/output/result.nef' + ); + + // Audit log: result.range_attempted + const log = auditLogs.find((l) => l.action === 'result.range_attempted'); + expect(log).toBeDefined(); + expect(log.level).toBe('INFO'); + expect(log.range_header_received).toBe('bytes=0-7'); + }); + + // ------------------------------------------------------------------------- + // IT-6: stream timeout (spec §14.7 IT-6, §15.2: 5 min by default, overridable via env) + // ------------------------------------------------------------------------- + // + // Why a real, short timeout (150 ms) instead of jest.useFakeTimers: + // - res.setTimeout / req.setTimeout are Node-internal socket-level timers + // fired by libuv; jest fake timers (lolex) do not intercept socket timers, + 
// so simulating them with fake timers would just hang. + // - RESULT_STREAM_TIMEOUT_MS can already be overridden via env (getStreamTimeoutMs reads it lazily); + // shrinking the timeout to 150 ms lets the test verify the behaviour in < 1 s without flakiness. + // - A paused Readable (never pushes end) simulates a slow client + slow MinIO, + // verifying that after 5 min (shrunk to 150 ms) res is destroyed and an audit log is written. + test('IT-6 stream timeout destroys response + writes result.stream_timeout audit', async () => { + const originalEnv = process.env.RESULT_STREAM_TIMEOUT_MS; + process.env.RESULT_STREAM_TIMEOUT_MS = '150'; + try { + // Slow MinIO stream: a paused Readable that never pushes a chunk + // or push(null), simulating 'MinIO opened the connection but returned no data' + const slowMinio = { + getObjectStream: jest.fn(async () => ({ + stream: new Readable({ read() {} }), + contentLength: 100, + contentType: 'application/octet-stream', + })), + }; + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': buildJob() }), + minioStorage: slowMinio, + // High rate limits / concurrency so they do not fire first + rateLimit: { + burst: { max: 1000, windowMs: 10 * 1000 }, + sustained: { max: 1000, windowMs: 60 * 1000 }, + }, + concurrency: { maxConcurrent: 10, retryAfterSeconds: 30 }, + onLog: (f) => auditLogs.push(f), + }); + + // Send the request and expect the server to destroy the connection (after res.destroy + // the client sees socket hang up / ECONNRESET) + let clientError = null; + let clientResponse = null; + try { + clientResponse = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + } catch (err) { + clientError = err; + } + + // The server should time out after ~150 ms and destroy the response; + // the client sees socket hang up (or headers already sent but the body cut off). + // Both are valid timeout outcomes (depending on whether headers went out before the timeout). + const timeoutSignalled = + clientError !== null || + (clientResponse !== null && + // headers sent, body cut off (content-length=100 but 0 bytes received) + clientResponse.body.length < 100); + expect(timeoutSignalled).toBe(true); + + // Give the audit log time to be written (after res.destroy, close is emitted on setImmediate) + await new Promise((r) => 
setTimeout(r, 50)); + + const timeoutLog = auditLogs.find( + (l) => l.action === 'result.stream_timeout' + ); + expect(timeoutLog).toBeDefined(); + expect(timeoutLog.level).toBe('WARN'); + expect(timeoutLog.stream_completed).toBe(false); + expect(timeoutLog.timeout_ms).toBe(150); + expect(typeof timeoutLog.bytes_streamed_at_timeout).toBe('number'); + // The paused stream never pushed a chunk → bytes_streamed = 0 + expect(timeoutLog.bytes_streamed_at_timeout).toBe(0); + // All five A.7 fields + four /result fields are present + expect(timeoutLog.job_id).toBe('job-xyz-123'); + expect(timeoutLog.source_ip).toBeTruthy(); + expect(timeoutLog.token_fingerprint).toBeTruthy(); + expect(timeoutLog.request_id).toBeTruthy(); + expect(timeoutLog.http_method).toBe('GET'); + expect(timeoutLog.http_path).toMatch(/\/api\/v1\/jobs\/job-xyz-123\/result/); + expect(typeof timeoutLog.duration_ms).toBe('number'); + expect(timeoutLog.size_bytes).toBe(0); + + // The same request must not also log result.streamed (mutually exclusive terminal states) + const streamedLog = auditLogs.find((l) => l.action === 'result.streamed'); + expect(streamedLog).toBeUndefined(); + } finally { + if (originalEnv === undefined) { + delete process.env.RESULT_STREAM_TIMEOUT_MS; + } else { + process.env.RESULT_STREAM_TIMEOUT_MS = originalEnv; + } + } + }, 5000); + + // ------------------------------------------------------------------------- + // IT-13: filename assertion fallback + // ------------------------------------------------------------------------- + test('IT-13 buildFilename assertion fallback when sanitize bug introduces unsafe chars', () => { + const { + _internals, + } = require('../result'); + const auditLogs = []; + const onLog = (f) => auditLogs.push(f); + + // Simulate an upstream sanitize bug: pass a stem containing `"` + const result = _internals.buildFilename( + { + source_filename: 'evil".onnx', + platform: '720', + job_id: 'safe-uuid', + }, + onLog, + { source_ip: '1.1.1.1', job_id: 'safe-uuid' } + ); + // Assertion fails → fall back to job_.nef + expect(result).toBe('job_safe-uuid.nef'); + // 
writes an audit log + const log = auditLogs.find( + (l) => l.action === 'result.filename_assertion_failed' + ); + expect(log).toBeDefined(); + expect(log.expected_pattern).toBe('^[A-Za-z0-9._-]+$'); + // .onnx is stripped → stem = 'evil"', final candidate = 'evil"_720.nef' + expect(log.actual_filename).toBe('evil"_720.nef'); + }); + + test('IT-13 buildFilename happy path produces safe filename', () => { + const { _internals } = require('../result'); + const result = _internals.buildFilename({ + source_filename: 'model.onnx', + platform: '720', + job_id: 'safe-uuid', + }); + expect(result).toBe('model_720.nef'); + }); + + test('IT-13 buildFilename fallback when platform missing', () => { + const { _internals } = require('../result'); + const result = _internals.buildFilename({ + source_filename: 'model.onnx', + platform: null, + job_id: 'safe-uuid', + }); + expect(result).toBe('job_safe-uuid.nef'); + }); + + test('IT-13 buildContentDisposition includes both filename and filename*', () => { + const { _internals } = require('../result'); + const result = _internals.buildContentDisposition('model_720.nef'); + expect(result).toBe( + 'attachment; filename="model_720.nef"; filename*=UTF-8\'\'model_720.nef' + ); + }); + + test('IT-13 buildContentDisposition escapes backslash + quote', () => { + const { _internals } = require('../result'); + // Call buildContentDisposition directly (the buildFilename assertion blocks unsafe input, + // but buildContentDisposition still escapes on its own as defense in depth) + const result = _internals.buildContentDisposition('a"b\\c.nef'); + expect(result).toContain('filename="a\\"b\\\\c.nef"'); + }); + + // ------------------------------------------------------------------------- + // IT-14: 502 storage_unavailable + // ------------------------------------------------------------------------- + test('IT-14 returns 502 storage_unavailable when MinIO throws', async () => { + const job = buildJob(); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 
'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage({ + failOn: 'jobs/job-xyz-123/output/result.nef', + }), + onLog: (f) => auditLogs.push(f), + }); + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + expect(res.status).toBe(502); + const parsed = JSON.parse(res.bodyText); + expect(parsed.error.code).toBe('storage_unavailable'); + const log = auditLogs.find( + (l) => l.action === 'result.storage_unavailable' + ); + expect(log).toBeDefined(); + expect(log.error_name).toBe('NetworkingError'); + // Must not leak the MinIO endpoint / object key + expect(log).not.toHaveProperty('endpoint'); + expect(log.error_message).toBeUndefined(); + }); + + // ------------------------------------------------------------------------- + // IT-15: audit log forensic cross-event + // ------------------------------------------------------------------------- + test('IT-15 audit log includes A.7 five fields + /result four fields', async () => { + const job = buildJob(); + const auditLogs = []; + ctx = await startApp({ + jobService: makeFakeJobService({ 'job-xyz-123': job }), + minioStorage: makeFakeMinioStorage(), + onLog: (f) => auditLogs.push(f), + }); + const res = await httpGet( + `${ctx.baseUrl}/api/v1/jobs/job-xyz-123/result`, + { headers: { Authorization: `Bearer ${TEST_API_KEY}` } } + ); + expect(res.status).toBe(200); + const streamed = auditLogs.find((l) => l.action === 'result.streamed'); + expect(streamed).toBeDefined(); + + // The five A.7 fields + expect(streamed.source_ip).toBeTruthy(); + expect(streamed.token_fingerprint).toBeTruthy(); + expect(streamed.request_id).toBeTruthy(); + expect(streamed.http_method).toBe('GET'); + expect(streamed.http_path).toBeTruthy(); + + // The four /result fields + expect(streamed.job_id).toBe('job-xyz-123'); + expect(typeof streamed.size_bytes).toBe('number'); + expect(typeof streamed.duration_ms).toBe('number'); + expect(streamed.stream_completed).toBe(true); + }); +}); + +// 
--------------------------------------------------------------------------- +// extractNefObjectKey unit tests +// --------------------------------------------------------------------------- +describe('extractNefObjectKey unit', () => { + const { _internals } = require('../result'); + test('returns result_object_keys.nef when present', () => { + expect( + _internals.extractNefObjectKey({ + result_object_keys: { nef: 'new/path/result.nef' }, + }) + ).toBe('new/path/result.nef'); + }); + test('falls back to output.nef_path when result_object_keys missing', () => { + expect( + _internals.extractNefObjectKey({ + output: { nef_path: 'legacy/result.nef' }, + }) + ).toBe('legacy/result.nef'); + }); + test('returns null when both missing', () => { + expect(_internals.extractNefObjectKey({})).toBeNull(); + expect(_internals.extractNefObjectKey(null)).toBeNull(); + }); + test('returns null when keys exist but value is empty string', () => { + expect( + _internals.extractNefObjectKey({ result_object_keys: { nef: '' } }) + ).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// bandwidthQuota unit tests +// --------------------------------------------------------------------------- +describe('createResultBandwidthQuota unit', () => { + const { + createResultBandwidthQuota, + } = require('../../../middleware/resultBandwidthQuota'); + + test('consume accumulates bytes per key', () => { + const q = createResultBandwidthQuota({ + hourlyLimitBytes: 1000, + dailyLimitBytes: 10000, + }); + q.consume('fp-A', 300); + q.consume('fp-A', 200); + q.consume('fp-B', 500); + expect(q.getState('fp-A').hourlyBytes).toBe(500); + expect(q.getState('fp-A').dailyBytes).toBe(500); + expect(q.getState('fp-B').hourlyBytes).toBe(500); + }); + + test('consume ignores invalid bytes (negative / NaN / non-number)', () => { + const q = createResultBandwidthQuota({ + hourlyLimitBytes: 1000, + dailyLimitBytes: 10000, + }); + q.consume('fp-A', 100); + 
q.consume('fp-A', -50); // ignored + q.consume('fp-A', NaN); // ignored + q.consume('fp-A', 'abc'); // ignored + expect(q.getState('fp-A').hourlyBytes).toBe(100); + }); + + test('hourly window reset after HOUR_MS elapsed', () => { + const q = createResultBandwidthQuota({ + hourlyLimitBytes: 1000, + dailyLimitBytes: 10000, + }); + q.consume('fp-A', 500); + // Manually rewind the reset time (test-friendly: manipulates internal state directly) + const state = q._internals.getOrCreate('fp-A'); + state.hourlyResetAt = Date.now() - 1; // already expired + q.consume('fp-A', 100); // should trigger a reset + expect(q.getState('fp-A').hourlyBytes).toBe(100); + }); +}); + +// --------------------------------------------------------------------------- +// resultStreamConcurrency unit tests +// --------------------------------------------------------------------------- +describe('createResultStreamConcurrencyLimiter unit', () => { + const { + createResultStreamConcurrencyLimiter, + } = require('../../../middleware/resultStreamConcurrency'); + + test('getInFlight starts at 0 and getMax returns configured value', () => { + const lim = createResultStreamConcurrencyLimiter({ + maxConcurrent: 5, + retryAfterSeconds: 10, + }); + expect(lim.getInFlight()).toBe(0); + expect(lim.getMax()).toBe(5); + }); +}); diff --git a/apps/task-scheduler/src/routes/v1/__tests__/v1-routes.integration.test.js b/apps/task-scheduler/src/routes/v1/__tests__/v1-routes.integration.test.js index 0b420df..7bb2356 100644 --- a/apps/task-scheduler/src/routes/v1/__tests__/v1-routes.integration.test.js +++ b/apps/task-scheduler/src/routes/v1/__tests__/v1-routes.integration.test.js @@ -7,7 +7,7 @@ * 3. A valid external X-Request-Id is reused * 4. An invalid external X-Request-Id is ignored and the server generates its own * 5. Legacy routes are unaffected (they still return the original format) - * 6. **D4 fix verification**: requireAuth + requestId middleware chaining; the 401 response + * 6. 
**D4 fix verification**: requireApiKey + requestId middleware chaining; the 401 response + * body contains a real UUID (not null) * * Setup: createApp + injected mock deps, app.listen(0), real HTTP calls via fetch(). @@ -23,7 +23,10 @@ const { createSseService } = require('../../../services/sseService'); const { createJobService } = require('../../../services/jobService'); const { createUploader } = require('../../../middleware/upload'); -const { requireAuth } = require('../../../auth/middleware'); +// Phase 0.8b A4: visionA → converter now authenticates with an API key. +// The D4 fix case uses requireApiKey({ expectedApiKey }) to verify 401 + request_id chaining +// (independent of the auth provider; it only exercises the middleware chain). +const { requireApiKey } = require('../../../auth/apiKeyMiddleware'); const { requestIdMiddleware } = require('../../../middleware/requestId'); const UUID_V4_REGEX = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i; @@ -415,11 +418,12 @@ describe('Phase 2 reserved endpoints (Minor-2 fix): return 501 not_implemented', ( }); // --------------------------------------------------------------------------- -// D4 fix verification: requireAuth + requestId chaining +// D4 fix verification: requireApiKey + requestId chaining // --------------------------------------------------------------------------- -describe('D4 fix: requireAuth + requestId middleware chaining', () => { - // This test is independent of the v1 router; it assembles a minimal app to verify the chaining +describe('D4 fix: requireApiKey + requestId middleware chaining', () => { + // Verifies 401 + request_id chaining (independent of the auth provider; it only exercises the middleware chain). + // This test is independent of the v1 router; it assembles a minimal app to verify the chaining. 
let server; let baseUrl; @@ -428,31 +432,8 @@ describe('D4 fix: requireAuth + requestId middleware chaining', () => { app.use(requestIdMiddleware); app.get( '/protected', - requireAuth('converter:job.write', { - config: { - memberCenter: { - issuer: 'https://auth.test.local', - jwksUrl: 'https://auth.test.local/.well-known/jwks', - tokenUrl: '', - }, - converter: { - audience: 'kneron_converter_api', - clientId: '', - clientSecret: '', - tenantId: '', - scopeWrite: 'converter:job.write', - scopeRead: 'converter:job.read', - }, - 
fileAccessAgent: { baseUrl: '', audience: 'file_access_api' }, - jwks: { cacheMaxAgeMs: 60000, cooldownMs: 30000, clockToleranceSec: 60 }, - }, - // verify always throws to simulate an invalid token (this test only cares about request_id on the 401 path) - verify: async () => { - const e = new Error('signature failed'); - e.code = 'ERR_JWS_SIGNATURE_VERIFICATION_FAILED'; - throw e; - }, - }), + // Inject a fixed expectedApiKey; requests without the correct key trigger 401 + requireApiKey({ expectedApiKey: 'correct-api-key-for-d4-test' }), (_req, res) => res.status(200).json({ ok: true }) ); @@ -508,4 +489,22 @@ describe('D4 fix: requireAuth + requestId middleware chaining', () => { expect(body.error.request_id).not.toBeNull(); expect(body.error.request_id).toMatch(UUID_V4_REGEX); }); + + // Added in A7: the 401 path also writes an audit log (with source_ip + request_id; no fingerprint on the missing-key path) + it('401 path writes auth.api_key.missing audit log with source_ip and request_id', async () => { + const logCalls = console.log.mock.calls; + const callCountBefore = logCalls.length; + + await fetch(`${baseUrl}/protected`); + + const newCalls = logCalls.slice(callCountBefore).flat().filter((x) => typeof x === 'string'); + const auditLine = newCalls.find((l) => l.includes('auth.api_key.missing')); + expect(auditLine).toBeDefined(); + const parsed = JSON.parse(auditLine); + expect(parsed.action).toBe('auth.api_key.missing'); + expect(typeof parsed.source_ip).toBe('string'); + expect(parsed.request_id).toMatch(UUID_V4_REGEX); + // The missing-key path has no token, so no fingerprint should be logged + expect(parsed.token_fingerprint).toBeUndefined(); + }); }); diff --git a/apps/task-scheduler/src/routes/v1/index.js b/apps/task-scheduler/src/routes/v1/index.js index a76985b..3281409 100644 --- a/apps/task-scheduler/src/routes/v1/index.js +++ b/apps/task-scheduler/src/routes/v1/index.js @@ -21,8 +21,8 @@ * └── /jobs/:id/promote — POST (promote router, mergeParams reads :id) * * Note: - * T3 does not mount requireAuth; T5/T6/T7 add it before each handler when implementing the endpoints. - * The per-client_id rate limiter (planned in T3) is not mounted yet either; it is tightly coupled to requireAuth's ordering, + * T3 does not mount 
requireApiKey; when T5/T6/T7 implement their endpoints, each adds it in front of its own handler. + * The per-client_id rate limiter (planned in T3) is not mounted yet either; its placement is tightly coupled to requireApiKey ordering, * so it is deferred until T5 onward needs clientId, to avoid premature coupling. */ @@ -32,7 +32,18 @@ const express = require('express'); const { createJobsRouter } = require('./jobs'); const { createPromoteRouter } = require('./promote'); +const { createResultRouter } = require('./result'); const { errorHandler, ApiError } = require('../../middleware/errorHandler'); +const { requireApiKey } = require('../../auth/apiKeyMiddleware'); +const { + createPerClientRateLimiter, +} = require('../../middleware/perClientRateLimit'); +const { + createResultBandwidthQuota, +} = require('../../middleware/resultBandwidthQuota'); +const { + createResultStreamConcurrencyLimiter, +} = require('../../middleware/resultStreamConcurrency'); /** * Create the /api/v1 router. @@ -46,6 +57,18 @@ const { errorHandler, ApiError } = require('../../middleware/errorHandler'); * @param {string} [deps.storageBackend] - 'minio' / 'local'; validated when the T5 handler starts * @returns {import('express').Router} */ +/** + * Parse an integer env var (invalid or missing → null, so the caller falls back to its default). + * + * @param {string|undefined} raw + * @returns {number|null} + */ +function parseEnvInt(raw) { + if (typeof raw !== 'string' || !/^\d+$/.test(raw)) return null; + const n = Number.parseInt(raw, 10); + return Number.isInteger(n) && n > 0 ? 
n : null; +} + function createV1Router(deps = {}) { const router = express.Router(); @@ -66,6 +89,110 @@ function createV1Router(deps = {}) { }); router.use('/jobs/:id/promote', promoteRouter); + // Phase 0.8b Phase B: GET /api/v1/jobs/:id/result, a streaming proxy for visionA-backend + // + // Wiring order (**do not change**; aligned with api-result.md §15): + // requireApiKey → resultBurstLimiter → resultSustainedLimiter → + // resultBandwidthQuota → resultStreamConcurrency → handler + // + // Principles: + // - auth first (so unauthenticated traffic cannot consume quota slots) + // - the request-count axis (burst + sustained) is cheaper than the bandwidth axis, so it goes first + // - the concurrency cap comes after bandwidth: a request rejected on bandwidth never acquires a concurrent slot by mistake + // + // When deps are missing, createResultRouter falls back to 501 internally (no middleware chain is mounted) + if ( + deps.jobService && + deps.minio && + typeof deps.minio.getObjectStream === 'function' + ) { + const tokenFingerprintKeyGen = (req) => + req && req.auth && typeof req.auth.tokenFingerprint === 'string' + ? `tf:${req.auth.tokenFingerprint}` + : `ip:${req && req.ip ? req.ip : 'unknown'}`; + + const bandwidthQuota = createResultBandwidthQuota({ + hourlyLimitBytes: parseEnvInt( + process.env.RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES + ), + dailyLimitBytes: parseEnvInt( + process.env.RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES + ), + keyGenerator: (req) => + req && req.auth && typeof req.auth.tokenFingerprint === 'string' + ? 
req.auth.tokenFingerprint + : 'unknown', + }); + + const concurrencyLimiter = createResultStreamConcurrencyLimiter({ + maxConcurrent: parseEnvInt(process.env.MAX_CONCURRENT_RESULT_STREAMS), + retryAfterSeconds: 30, + }); + + // AC-9 audit hook: also write an audit log whenever a rate limit is hit + const buildRateLimitAuditHook = (limitType) => + function onLimitExceeded(req, retryAfterSec) { + // eslint-disable-next-line no-console + console.log( + JSON.stringify({ + service: 'task-scheduler', + timestamp: new Date().toISOString(), + level: 'WARN', + action: 'result.rate_limited', + source_ip: req.ip || null, + token_fingerprint: + req.auth && typeof req.auth.tokenFingerprint === 'string' + ? req.auth.tokenFingerprint + : null, + request_id: req.requestId || null, + http_method: 'GET', + http_path: req.originalUrl || req.url || '', + job_id: req.params && req.params.id ? req.params.id : null, + duration_ms: 0, + limit_type: limitType, + retry_after_seconds: retryAfterSec, + }) + ); + }; + + const resultBurstLimiter = createPerClientRateLimiter({ + windowMs: parseEnvInt(process.env.RESULT_RATE_LIMIT_BURST_WINDOW_MS) || 10 * 1000, + max: parseEnvInt(process.env.RESULT_RATE_LIMIT_BURST_PER_10S) || 5, + keyGenerator: tokenFingerprintKeyGen, + errorDetails: { limit_type: 'burst' }, + onLimitExceeded: buildRateLimitAuditHook('burst'), + }); + const resultSustainedLimiter = createPerClientRateLimiter({ + windowMs: + parseEnvInt(process.env.RESULT_RATE_LIMIT_SUSTAINED_WINDOW_MS) || 60 * 1000, + max: + parseEnvInt(process.env.RESULT_RATE_LIMIT_SUSTAINED_PER_MIN) || 20, + keyGenerator: tokenFingerprintKeyGen, + errorDetails: { limit_type: 'sustained' }, + onLimitExceeded: buildRateLimitAuditHook('sustained'), + }); + + const resultRouter = createResultRouter({ + jobService: deps.jobService, + minioStorage: deps.minio, + bandwidthQuota, + }); + + router.use( + '/jobs/:id/result', + requireApiKey(), + resultBurstLimiter, + resultSustainedLimiter, + bandwidthQuota.middleware, +
concurrencyLimiter.middleware, + resultRouter + ); + } else { + // Missing-deps fallback: use createResultRouter's internal 501 path + const resultRouterFallback = createResultRouter({}); + router.use('/jobs/:id/result', resultRouterFallback); + } + // /api/v1/jobs/*: POST / GET / GET :id const jobsRouter = createJobsRouter(deps); router.use('/jobs', jobsRouter); diff --git a/apps/task-scheduler/src/routes/v1/jobs.js b/apps/task-scheduler/src/routes/v1/jobs.js index 3f9f2ec..d86d683 100644 --- a/apps/task-scheduler/src/routes/v1/jobs.js +++ b/apps/task-scheduler/src/routes/v1/jobs.js @@ -13,17 +13,19 @@ * - handlers are thin and only do HTTP I/O; all business logic is delegated to jobService / * validators / sanitize utils * - Middleware order (**do not change**): - * POST: requireAuth(scope:write) → perClientRateLimiter → uploader.fields(...) → createJobHandler - * GET : requireAuth(scope:read) → perClientRateLimiter → getJob(byId|list)Handler + * POST: requireApiKey() → perClientRateLimiter → uploader.fields(...) → createJobHandler + * GET : requireApiKey() → perClientRateLimiter → getJob(byId|list)Handler * Rationale: - * 1. requireAuth first → unauthenticated traffic is blocked before multer ingests large files (M2) + * 1. requireApiKey first → unauthenticated traffic is blocked before multer ingests large files (M2) + * (since Phase 0.8b A3 this replaces the OAuth scope check; the API key alone is + * full proof that "the caller is visionA", so OAuth scopes are no longer distinguished) * 2. rate limiter before multer → an over-quota client cannot push 500MB in * 3. 
multer last → multipart is parsed only after both auth and quota pass * - Write order: MinIO first, then Lua (M5 option A) * - GET /:id and the GET list share perClientRateLimiter (every request counts against that client_id's quota) * * Failure cases (TDD §1.4.2 + §14): - * - 401/403 handled by requireAuth + * - 401 / 503 handled by requireApiKey (Phase 0.8b A3: replaces the previous OAuth 401/403) * - 429 handled by the perClientRateLimiter handler (converted to ApiError) * - 413 triggered by multer LIMIT_FILE_SIZE (→ multerErrorAdapter converts it to the v1 format) * - 400 returned uniformly by the validator (including details.field) @@ -46,7 +48,11 @@ const express = require('express'); const { v4: uuidv4 } = require('uuid'); const { ApiError } = require('../../middleware/errorHandler'); -const { requireAuth } = require('../../auth/middleware'); +// Phase 0.8b A3: the external visionA → converter API switches to a pre-shared API key (replacing OAuth JWT). +// Design: docs/autoflow/04-architecture/auth.md §1 + TODO-visionA-integration-v2 §3.5 +// The old OAuth scope check / tenant check are removed on the pure API key path; scope / tenantId +// are hardcoded by apiKeyMiddleware for downstream use (compatible with the existing logging / rate limiter read pattern). +const { requireApiKey } = require('../../auth/apiKeyMiddleware'); const { createPerClientRateLimiter } = require('../../middleware/perClientRateLimit'); const { validateCreateJobRequest } = require('./validators/createJob'); const { toExternalStatus } = require('../../services/statusMapper'); @@ -498,7 +504,8 @@ function buildListJobsHandler(deps) { ? req.auth.clientId : null; if (!clientId) { - // requireAuth already guarantees clientId; this is just a safeguard + // requireApiKey already guarantees clientId is set (fixed to 'visionA-service' since A3, never null); + // this fallback is a safety net and should never trigger in practice return next(new ApiError(401, 'invalid_token', 'Token has no client_id')); } @@ -730,6 +737,18 @@ function buildCreateJobHandler(deps) { created_by_client_id: req.auth && req.auth.clientId ? 
req.auth.clientId : null, + // Added in Phase 0.8b Phase B (AC-B1): used by GET /jobs/:id/result to build the download filename + // - source_filename: taken from the already-sanitized safeFilename (same value as input.filename; + // redundant but semantically clear; buildFilename reads this field in the result handler) + // - Why not store the raw modelFile.originalname: originalname may contain XSS / control characters / + // path traversal patterns; even though browsers do not render the Content-Disposition header, + // it may still be echoed in logs / error messages; the sanitized version is defense-in-depth + // - platform: top-level platform field (copied from parameters.platform), so buildFilename + // can assemble the download filename without digging into the parameters object + // - existing jobs (without these fields) fall back to job_${jobId}.nef in buildFilename; no migration needed + source_filename: input.safeFilename, + platform: parameters && parameters.platform ? parameters.platform : null, + input: { filename: input.safeFilename, object_key: writeResult.inputObjectKey, @@ -951,7 +970,8 @@ function createJobsRouter(deps = {}) { ); }); } else { - const requireWriteAuth = requireAuth(config.converter.scopeWrite, { config }); + // A3: requireApiKey() takes no scope argument (the API key alone is full proof that "the caller is visionA") + const requireWriteAuth = requireApiKey(); const perClientLimiter = createPerClientRateLimiter(rateLimit || {}); const handler = buildCreateJobHandler({ jobService, @@ -969,8 +989,8 @@ function createJobsRouter(deps = {}) { : 100; // T10 D5: the concurrency limiter must be mounted **before** multer (so a request is not rejected only after 500MB has been ingested) - // but after requireAuth + rate limit (so unauthorized traffic cannot occupy slots) - // Order: requireAuth → rate limit → concurrency → multer → multerErrorAdapter → handler + // but after requireApiKey + rate limit (so unauthorized traffic cannot occupy slots) + // Order: requireApiKey → rate limit → concurrency → multer → multerErrorAdapter → handler const middlewareChain = [requireWriteAuth, perClientLimiter]; if (uploadConcurrencyLimiter) { middlewareChain.push(uploadConcurrencyLimiter); @@ -1012,12 +1032,13 @@ function createJobsRouter(deps = {}) { // → fall back to 501 (consistent with POST), so the v1-routes integration test can still // pass through the fallback path without full deps. if (jobService && config) { - const 
requireReadAuth = requireAuth(config.converter.scopeRead, { config }); + // A3: GET endpoints also switch to requireApiKey() (read and write are no longer separated by scope) + const requireReadAuth = requireApiKey(); const perClientLimiterRead = createPerClientRateLimiter(rateLimit || {}); const getJobHandler = buildGetJobHandler({ jobService }); const listJobsHandler = buildListJobsHandler({ jobService }); - // Order: requireAuth → rate limit → handler + // Order: requireApiKey → rate limit → handler // Why GET also goes through perClientRateLimiter: // it stops a single client from slowing Redis down by polling GET /jobs (each call is cheap, but 10000 req // can still saturate Redis). 300 req / 5 min is ample for polling (1 req/s). diff --git a/apps/task-scheduler/src/routes/v1/promote.js b/apps/task-scheduler/src/routes/v1/promote.js index b444115..3afa8a6 100644 --- a/apps/task-scheduler/src/routes/v1/promote.js +++ b/apps/task-scheduler/src/routes/v1/promote.js @@ -3,8 +3,9 @@ * * Flow (aligned with TDD §1.4.5, §2.10, §6.1-§6.5, tasks-phase1.md §2 T7): * - * 1. requireAuth('converter:job.write') - * ├── 401 invalid_token / 403 insufficient_scope (actively destroys the connection, T1 M2) + * 1. requireApiKey() (Phase 0.8b A3: replaces the OAuth scope check) + * ├── 401 invalid_token (API key missing / mismatched; actively destroys the connection) + * ├── 503 service_unavailable (CONVERTER_API_KEY env not set) * └── ok → continue * * 2. 
validate body: targets non-empty, source ∈ {onnx, bie, nef}, target_object_key safe @@ -37,7 +38,10 @@ * - **No leakage**: FAA internal error messages are never passed to v1 clients; they are uniformly converted to 502 / 503 + a generic message. * * Authentication: - * In T7, mounts `requireAuth('converter:job.write')` (same scope as POST /jobs). + * Since Phase 0.8b A3, mounts `requireApiKey()` (replacing the OAuth scope check); + * the visionA → converter path uses a pre-shared API key. + * Note: `faaClient.putFile(...)` in this handler (converter → FAA) still uses OAuth + * client_credentials (oauthClient.js is kept); the two chains are independent and do not affect each other. */ 'use strict'; @@ -45,7 +49,11 @@ const express = require('express'); const { ApiError } = require('../../middleware/errorHandler'); -const { requireAuth } = require('../../auth/middleware'); +// Phase 0.8b A3: the external visionA → converter API switches to a pre-shared API key. +// The old `requireAuth('converter:job.write')` scope check is removed on the pure API key path. +// Note: the `converter → FAA` OAuth client_credentials chain in this router (faaClient +// fetches a token and issues the PUT) is left unchanged; it is unrelated to visionA → converter auth. +const { requireApiKey } = require('../../auth/apiKeyMiddleware'); const { createPerClientRateLimiter } = require('../../middleware/perClientRateLimit'); const { createFaaClient } = require('../../fileAccessAgent/client'); const { @@ -593,11 +601,12 @@ function createPromoteRouter(deps = {}) { return router; } - const requireWriteAuth = requireAuth(config.converter.scopeWrite, { config }); + // A3: requireApiKey(); read/write permissions are no longer split by scope (the API key is full proof) + const requireWriteAuth = requireApiKey(); const perClientLimiter = createPerClientRateLimiter(rateLimit || {}); const handler = buildPromoteHandler({ jobService, minio, faaClient }); - // Order is locked: requireAuth → perClientRateLimit → (JSON already parsed globally by app.use(express.json)) → handler + // Order is locked: requireApiKey → perClientRateLimit → (JSON already parsed globally by app.use(express.json)) → handler router.post('/', requireWriteAuth, perClientLimiter, handler); return router; diff --git a/apps/task-scheduler/src/routes/v1/result.js b/apps/task-scheduler/src/routes/v1/result.js new file mode 100644 index 0000000..f80d4f4 --- 
/dev/null +++ b/apps/task-scheduler/src/routes/v1/result.js @@ -0,0 +1,606 @@ +/** + * /api/v1/jobs/:id/result: added in Phase 0.8b Phase B (replaces the Phase 2 delegated download token). + * + * Purpose: visionA-backend uses this endpoint to stream the NEF + * result file directly from the Converter Bucket (streaming proxy). It replaces the "visionA → obtain delegated download token → FAA" + * path (which never worked because MC never implemented the endpoint). + * + * Design (source of truth: docs/autoflow/04-architecture/api/api-result.md, + * after the Phase B design hardening of 2026-05-17): + * - AC-1: requireApiKey applied + * - AC-2: two-tier rate limit (burst 5/10s + sustained 20/min, bucket key = token_fingerprint) + * - AC-3: bandwidth quota (1 GB/hr + 6 GB/24hr, per token_fingerprint) + * - AC-4: concurrent stream cap = 10 (per instance) + * - AC-5: Range header silently ignored; always returns the full body with 200 + * - AC-6: a received Range header writes the audit log `result.range_attempted` + * - AC-7: response stream timeout 5 min (res.setTimeout + req.setTimeout) + * - AC-8: stream cleanup (finish / close / error) + stream_completed flag + * - AC-9: 12 audit log event types; A.7's five fields + four /result fields + * - AC-10: audit log INFO/WARN levels aligned with §11 + * - AC-11: filename quote-escape + RFC 5987 fallback + fail-secure assertion + * - AC-12: response header `Accept-Ranges: none` (explicitly declares no Range support) + * + * Failure paths (api-result.md §4.1): + * 401 invalid_token / 404 job_not_found / 404 result_not_found / + * 409 job_not_completed / 410 result_expired / 422 invalid_request / + * 429 rate_limit_exceeded / 429 bandwidth_quota_exceeded / + * 502 storage_unavailable / 503 service_busy / 503 stream_timeout + * + * Wiring order (**do not change**; applied by v1/index.js): + * requireApiKey() → resultBurstLimiter → resultSustainedLimiter → + * resultBandwidthQuota.middleware → resultStreamConcurrency.middleware → handler + * + * Note: this router uses `mergeParams: true`; v1/index.js mounts it at + * `/jobs/:id/result`, and the handler reads `:id` on the `/` path. + */ + +'use strict'; + +const express = require('express'); + +const { ApiError } = require('../../middleware/errorHandler'); + +// 
--------------------------------------------------------------------------- +// Constants +// --------------------------------------------------------------------------- + +/** + * Stream response timeout (default 5 minutes, overridable via env `RESULT_STREAM_TIMEOUT_MS`). + * Aligned with api-result.md §15.2: for a 500MB result, 5 min implies a minimum throughput of + * ≈ 1.7 MB/s (≈ 13 Mbps); a 10 Mbps link (≈ 1.25 MB/s) moves only ≈ 375MB in 5 min. + */ +const DEFAULT_STREAM_TIMEOUT_MS = 5 * 60 * 1000; + +/** + * Read the stream timeout from env (lazily, so tests can change env after require). + * + * @returns {number} + */ +function getStreamTimeoutMs() { + const raw = process.env.RESULT_STREAM_TIMEOUT_MS; + if (raw && /^\d+$/.test(raw)) { + const n = Number.parseInt(raw, 10); + if (n > 0) return n; + } + return DEFAULT_STREAM_TIMEOUT_MS; +} + +// --------------------------------------------------------------------------- +// helpers +// --------------------------------------------------------------------------- + +/** + * Structured audit log (same format as apiKeyMiddleware / promote). + * + * @param {object} fields + */ +function defaultLogAudit(fields) { + // eslint-disable-next-line no-console + console.log( + JSON.stringify({ + service: 'task-scheduler', + timestamp: new Date().toISOString(), + ...fields, + }) + ); +} + +/** + * Get the NEF object key from a job record (two paths, aligned with promote.js getJobOutputKey). + * + * @param {object} job + * @returns {string|null} + */ +function extractNefObjectKey(job) { + if (!job || typeof job !== 'object') return null; + // New format (after T9) + if ( + job.result_object_keys && + typeof job.result_object_keys === 'object' && + typeof job.result_object_keys.nef === 'string' && + job.result_object_keys.nef.length > 0 + ) { + return job.result_object_keys.nef; + } + // Old format (backward compatible) + if ( + job.output && + typeof job.output === 'object' && + typeof job.output.nef_path === 'string' && + job.output.nef_path.length > 0 + ) { + return job.output.nef_path; + } + return null; +} + +/** + * AC-11: buildFilename + defense-in-depth assertion. + * + * Rules (api-result.md §3.2 + §13.4a): + * - take 
`job.source_filename` and strip common model extensions to get the stem + * - take `job.platform` (lowercased) + * - both present → `${stem}_${platform}.nef` + * - either missing → fallback `job_${jobId}.nef` + * - **assertion**: the result must match `/^[A-Za-z0-9._-]+$/`; otherwise fail-secure by falling back + * to `job_${jobId}.nef` and writing the audit log `result.filename_assertion_failed` + * + * Why fail-secure instead of throwing: + * `/result` must return the NEF to legitimate callers; an assertion failure must not deny service, + * so it falls back to a safe filename and writes an alert log. + * + * @param {object} job + * @param {(fields: object) => void} [logAuditFn] + * @param {object} [auditContext] - A.7's five fields + four /result fields + * @returns {string} + */ +function buildFilename(job, logAuditFn, auditContext) { + const sourceFilename = + job && typeof job.source_filename === 'string' ? job.source_filename : ''; + const platform = + job && typeof job.platform === 'string' ? job.platform.toLowerCase() : ''; + + // Strip common extensions (onnx / tflite / pb / h5 / pt / pth) + const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, ''); + const jobId = job && typeof job.job_id === 'string' ? job.job_id : 'unknown'; + const safeFallback = `job_${jobId}.nef`; + + const candidate = stem && platform ? `${stem}_${platform}.nef` : safeFallback; + + // Defense-in-depth assertion (§13.4a) + if (!/^[A-Za-z0-9._-]+$/.test(candidate)) { + if (typeof logAuditFn === 'function') { + const baseContext = + auditContext && typeof auditContext === 'object' ? auditContext : {}; + logAuditFn({ + level: 'ERROR', + action: 'result.filename_assertion_failed', + ...baseContext, + expected_pattern: '^[A-Za-z0-9._-]+$', + actual_filename: candidate.slice(0, 100), // truncated to 100 chars to avoid log injection + }); + } + return safeFallback; + } + return candidate; +} + +/** + * AC-11: build the Content-Disposition header value (quote-escape + RFC 5987 fallback). + * + * Rules (api-result.md §13.4a): + * 1. ASCII-safe: non-ASCII characters in the plain `filename` parameter are replaced with `_` + * (the Phase B sanitizer already enforces a whitelist, so non-ASCII cannot occur; this is defense-in-depth) + * 2. Escape both backslash and double quote + * 3. 
RFC 5987 `filename*=UTF-8''...` extended syntax (reserved for future Unicode filenames) + * + * @param {string} filename + * @returns {string} + */ +function buildContentDisposition(filename) { + const safeFilename = typeof filename === 'string' ? filename : ''; + // 1. ASCII-safe: replace non-ASCII with _ (cannot occur in Phase B, but defense-in-depth) + // eslint-disable-next-line no-control-regex + const ascii = safeFilename.replace(/[^\x20-\x7E]/g, '_'); + // 2. quote-escape: escape `\` first (so the `\` added for `"` is not escaped again), then `"` + const escaped = ascii.replace(/\\/g, '\\\\').replace(/"/g, '\\"'); + // 3. RFC 5987 extended (filename*), reserved for future Unicode filenames. + // encodeURIComponent leaves ' ( ) * unescaped, but RFC 5987 attr-char excludes them, so percent-encode them too. + const encoded = encodeURIComponent(safeFilename).replace( + /['()*]/g, + (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}` + ); + return `attachment; filename="${escaped}"; filename*=UTF-8''${encoded}`; +} + +// --------------------------------------------------------------------------- +// handler factory +// --------------------------------------------------------------------------- + +/** + * Create the GET /:id/result handler. + * + * @param {object} deps + * @param {{ getJob: (id: string) => Promise }} deps.jobService + * @param {{ getObjectStream: (key: string) => Promise<{ stream: any, contentLength?: number, contentType?: string } | null> }} deps.minioStorage + * @param {{ consume: (key: string, bytes: number) => void }} [deps.bandwidthQuota] - + * if provided, consume(token_fingerprint, bytesStreamed) is called when the stream completes + * @param {(fields: object) => void} [deps.onLog] - audit log hook (test-friendly) + * @returns {import('express').RequestHandler} + */ +function buildResultHandler(deps) { + if (!deps || typeof deps !== 'object') { + throw new Error('[buildResultHandler] deps is required'); + } + const { jobService, minioStorage, bandwidthQuota } = deps; + if (!jobService || typeof jobService.getJob !== 'function') { + throw new Error('[buildResultHandler] deps.jobService.getJob is required'); + } + if (!minioStorage || typeof minioStorage.getObjectStream !== 'function') { + throw new Error( + '[buildResultHandler] deps.minioStorage.getObjectStream is required' + ); + } + const 
logAudit = + typeof deps.onLog === 'function' ? deps.onLog : defaultLogAudit; + + return async function resultHandler(req, res, next) { + const startedAtMs = Date.now(); + const jobId = req.params && req.params.id ? req.params.id : null; + + // Shared audit log context (A.7's five fields + four /result fields) + const auditBase = { + source_ip: req.ip || null, + token_fingerprint: + req.auth && typeof req.auth.tokenFingerprint === 'string' + ? req.auth.tokenFingerprint + : null, + request_id: req.requestId || null, + http_method: 'GET', + http_path: req.originalUrl || (req.url || ''), + job_id: jobId, + }; + + // AC-5 + AC-6: silently ignore the Range header + write an audit log + // + // Why INFO rather than WARN: a Range header is not itself an attack (it is standard HTTP), but its presence on + // `/result` is an anomalous signal (visionA never sends Range) → worth recording, not worth alerting on. + if (req.headers && req.headers.range) { + logAudit({ + level: 'INFO', + action: 'result.range_attempted', + ...auditBase, + // truncated to 100 chars to avoid log injection + range_header_received: String(req.headers.range).slice(0, 100), + }); + // No 416 and no Range parsing → continue with the full-body 200 flow + } + + try { + // 1. Validate the jobId path param + if (typeof jobId !== 'string' || jobId === '') { + return next( + new ApiError(422, 'invalid_request', 'job id is required') + ); + } + + // 2. Fetch the job record + const job = await jobService.getJob(jobId); + if (!job) { + logAudit({ + level: 'WARN', + action: 'result.not_found', + ...auditBase, + duration_ms: Date.now() - startedAtMs, + reason: 'job_not_found', + }); + return next( + new ApiError(404, 'job_not_found', `Job ${jobId} not found`) + ); + } + + // 3. 
Status check: a result is downloadable only once the job is COMPLETED + // Internal statuses are uppercase ('COMPLETED' / 'ONNX' / 'BIE' / 'NEF' / 'FAILED') + if (job.status !== 'COMPLETED') { + logAudit({ + level: 'WARN', + action: 'result.not_completed', + ...auditBase, + duration_ms: Date.now() - startedAtMs, + current_status: job.status || null, + }); + return next( + new ApiError( + 409, + 'job_not_completed', + `Job ${jobId} is ${job.status}; result only available after completion`, + { current_status: job.status || null } + ) + ); + } + + // 4. expires_at check (lifecycle expiry) + if (job.expires_at) { + const expiresAtMs = new Date(job.expires_at).valueOf(); + const nowMs = Date.now(); + if (Number.isFinite(expiresAtMs) && expiresAtMs < nowMs) { + logAudit({ + level: 'WARN', + action: 'result.expired', + ...auditBase, + duration_ms: Date.now() - startedAtMs, + expires_at: job.expires_at, + expired_by_ms: nowMs - expiresAtMs, + }); + return next( + new ApiError( + 410, + 'result_expired', + `Job ${jobId} result expired at ${job.expires_at}; re-convert to get a fresh result` + ) + ); + } + } + + // 5. Resolve the NEF object key (two paths) + const nefKey = extractNefObjectKey(job); + if (!nefKey) { + logAudit({ + level: 'WARN', + action: 'result.not_found', + ...auditBase, + duration_ms: Date.now() - startedAtMs, + reason: 'no_nef_key', + }); + return next( + new ApiError( + 404, + 'result_not_found', + `Job ${jobId} completed but no NEF result available` + ) + ); + } + + // 6. MinIO stream + let mstream; + try { + mstream = await minioStorage.getObjectStream(nefKey); + } catch (err) { + // Do not log err.message: it may contain the MinIO endpoint / region / object key + // Log only err.name / err.code, for classification + logAudit({ + level: 'ERROR', + action: 'result.storage_unavailable', + ...auditBase, + duration_ms: Date.now() - startedAtMs, + error_name: err && err.name ? err.name : 'unknown', + error_code: err && err.code ? 
err.code : null, + }); + return next( + new ApiError( + 502, + 'storage_unavailable', + 'Unable to read the result file; please retry later' + ) + ); + } + if (!mstream || !mstream.stream) { + // MinIO getObjectStream returned null → the object was already removed by lifecycle + logAudit({ + level: 'WARN', + action: 'result.expired', + ...auditBase, + duration_ms: Date.now() - startedAtMs, + reason: 'minio_object_missing', + }); + return next( + new ApiError( + 410, + 'result_expired', + `Job ${jobId} NEF object not found in storage (likely expired)` + ) + ); + } + + // 7. Stream preparation (build the filename before any setHeader call) + const filename = buildFilename(job, logAudit, auditBase); + const contentDispositionValue = buildContentDisposition(filename); + + // 8. AC-12 + set response headers + // + // Why send `Accept-Ranges: none` explicitly: + // - RFC 7233 §2.3: a server that does not support range requests MAY send `Accept-Ranges: none` + // to advise clients not to attempt one + // - if omitted, clients may speculatively try Range; once it is explicit, well-behaved clients will not + // - silently ignoring Range requests → reveals nothing about whether the server understands Range (feature + // detection fails, minimizing attack surface) + res.setHeader( + 'Content-Type', + mstream.contentType || 'application/octet-stream' + ); + if (typeof mstream.contentLength === 'number') { + res.setHeader('Content-Length', String(mstream.contentLength)); + } + res.setHeader('Content-Disposition', contentDispositionValue); + res.setHeader('Accept-Ranges', 'none'); + + // 9. 
AC-7 + AC-8 + AC-9: byte counting / timeout / cleanup + const streamTimeoutMs = getStreamTimeoutMs(); + let bytesStreamed = 0; + let streamCompleted = false; + let finalEventSent = false; + + // Shared "write the final event exactly once" logic (prevents double firing from finish + close) + const sendFinalEvent = (action, level, extraFields = {}) => { + if (finalEventSent) return; + finalEventSent = true; + logAudit({ + level, + action, + ...auditBase, + size_bytes: bytesStreamed, + duration_ms: Date.now() - startedAtMs, + stream_completed: streamCompleted, + ...extraFields, + }); + }; + + // AC-7: response stream timeout (covers both the res side and the req side) + // + // Node's default server.timeout = 0 (no limit); an attacker doing a slow read (1 byte/30s) + // could hold the socket + the MinIO upstream connection for hours. + // For a 500MB result, 5 min implies ≈ 1.7 MB/s (≈ 13 Mbps) minimum throughput; a 10 Mbps + // link (≈ 1.25 MB/s) moves only ≈ 375MB in 5 min. + // Caveat: res.setTimeout / req.setTimeout set socket *inactivity* timeouts (the timer resets on + // socket activity), so as written they bound idle time rather than total transfer time. + res.setTimeout(streamTimeoutMs, () => { + sendFinalEvent('result.stream_timeout', 'WARN', { + timeout_ms: streamTimeoutMs, + bytes_streamed_at_timeout: bytesStreamed, + }); + // Destroy both ends + if ( + mstream.stream && + typeof mstream.stream.destroy === 'function' && + !mstream.stream.destroyed + ) { + mstream.stream.destroy(); + } + if (!res.destroyed) { + res.destroy(new Error('Response stream timeout')); + } + }); + req.setTimeout(streamTimeoutMs); + + // AC-8: byte counting + cleanup + mstream.stream.on('data', (chunk) => { + bytesStreamed += chunk && typeof chunk.length === 'number' ? chunk.length : 0; + }); + + // AC-8: stream error (MinIO disconnect / network) + mstream.stream.on('error', (streamErr) => { + sendFinalEvent('result.stream_error', 'ERROR', { + error_type: 'minio_disconnect', + error_message: + streamErr && streamErr.message + ? 
String(streamErr.message).slice(0, 100) + : 'unknown', + }); + if (!res.destroyed) res.destroy(streamErr); + }); + + // AC-8: stream completed ('end' emitted, no error) + mstream.stream.on('end', () => { + streamCompleted = true; + }); + + // Client disconnected → clean up the MinIO stream (avoids a connection leak) + req.on('close', () => { + if ( + mstream.stream && + typeof mstream.stream.destroy === 'function' && + !mstream.stream.destroyed + ) { + mstream.stream.destroy(); + } + }); + + // res.on('finish') means the response was sent completely + res.on('finish', () => { + if (streamCompleted) { + sendFinalEvent('result.streamed', 'INFO', { + content_length: + typeof mstream.contentLength === 'number' + ? mstream.contentLength + : null, + }); + // AC-3: bandwidth quota consume (quota is deducted only for a successful stream) + if ( + bandwidthQuota && + typeof bandwidthQuota.consume === 'function' && + auditBase.token_fingerprint + ) { + try { + bandwidthQuota.consume(auditBase.token_fingerprint, bytesStreamed); + } catch (_) { + /* noop: quota consume should not throw, but guard anyway */ + } + } + } else { + // Rare: the response finished but the stream never ended (partial stream + finish?) + // + // Why `partial_stream` (enum value still to be added to spec §11.5): + // in pipe mode, res normally emits 'finish' only after the stream emits 'end'; + // if streamCompleted=false yet res finished, the most likely cause is a race after `res.destroy()` + // (e.g. after a stream timeout triggers destroy, the underlying socket + // flushes already-buffered chunks and emits finish first), or a backpressure anomaly. + // For forensics, `partial_stream` is more precise than `unknown` and lets ops pinpoint + // this class of race condition directly; the Architect needs to add this enum value to spec §11.5. + sendFinalEvent('result.stream_error', 'INFO', { + error_type: 'partial_stream', + error_message: 'response finished without stream end', + }); + } + }); + + // res.on('close') covers the client-abort / error cases + res.on('close', () => { + if (!finalEventSent) { + // Stream not fully sent and no final event written yet → client_closed + sendFinalEvent('result.client_closed', 'INFO', {}); + } + }); + + // 10. 
AC-5: pipe the stream (Range header already ignored, Accept-Ranges: none already set) + mstream.stream.pipe(res); + } catch (err) { + // Last-resort catch: should not happen (the try/catch blocks above cover everything); do not leak internal errors + logAudit({ + level: 'ERROR', + action: 'result.stream_error', + ...auditBase, + size_bytes: 0, + duration_ms: Date.now() - startedAtMs, + stream_completed: false, + error_type: 'unknown', + error_message: + err && err.message ? String(err.message).slice(0, 100) : 'unknown', + }); + return next(err); + } + }; +} + +// --------------------------------------------------------------------------- +// router factory +// --------------------------------------------------------------------------- + +/** + * Create the result router. + * + * @param {object} [deps] + * @param {object} [deps.jobService] + * @param {object} [deps.minioStorage] - the minio facade, same as promote's + * @param {object} [deps.bandwidthQuota] - the return value of createResultBandwidthQuota + * @param {(fields: object) => void} [deps.onLog] + * @returns {import('express').Router} + */ +function createResultRouter(deps = {}) { + const router = express.Router({ mergeParams: true }); + const { jobService, minioStorage } = deps; + + // Missing-deps fallback → 501 (same pattern as jobs / promote) + if (!jobService || !minioStorage) { + const missing = []; + if (!jobService) missing.push('jobService'); + if (!minioStorage) missing.push('minioStorage'); + const missingList = missing.join(', '); + router.get('/', (req, res, next) => { + return next( + new ApiError( + 501, + 'not_implemented', + `GET /api/v1/jobs/:id/result requires jobService / minioStorage to be injected; the current environment is incompletely configured, missing dependencies: ${missingList}` + ) + ); + }); + return router; + } + + const handler = buildResultHandler({ + jobService, + minioStorage, + bandwidthQuota: deps.bandwidthQuota, + onLog: deps.onLog, + }); + + router.get('/', handler); + return router; +} + +module.exports = { + createResultRouter, + // For tests / internal use + _internals: { + buildResultHandler, + buildFilename, + buildContentDisposition, + extractNefObjectKey, + getStreamTimeoutMs, +
defaultLogAudit, + DEFAULT_STREAM_TIMEOUT_MS, + }, +}; diff --git a/docs/TODO-visionA-integration-v2.md b/docs/TODO-visionA-integration-v2.md new file mode 100644 index 0000000..a632f10 --- /dev/null +++ b/docs/TODO-visionA-integration-v2.md @@ -0,0 +1,713 @@ +# visionA Cloud integration: converter scheduler handoff doc v2 + +> **Date**: 2026-05-16 +> **Background**: visionA Cloud Phase 0.8b, switching from OAuth client_credentials to an API key + redesigning the download path +> **Supersedes**: this file replaces `docs/TODO-visionA-integration.md` (written 5/2 for the ADR-014 design, now obsolete) +> **Corresponds to**: the visionA repo's `docs/autoflow/04-architecture/adr/adr-015-server-to-server-api-key.md` v2.1 + `adr-016-download-via-converter.md` v1.0 + +--- + +## 1. Why this handoff doc exists + +### The original 5/2 design (ADR-014, now superseded) + +visionA → converter / FAA used **OAuth `client_credentials`** + an MC (Member Center) service token + scopes (`converter:job.read/write` / `files:download.delegate`). + +### 5/9: stage e2e hit 4 blockers → ADR-015 switches to an API key + +| # | Blocker | +|---|---------| +| 1 | MC stage has not registered the two scopes `converter:job.read/write` | +| 2 | The converter image is a 5-week-old build with no OAuth middleware and no `/api/v1/jobs` endpoint | +| 3 | converter is missing the `MEMBER_CENTER_*` env vars | +| 4 | The state of OAuth integration on the FAA side is uncertain (maintained by warrenchen) | + +**User decision**: for 1:1 internal trust (visionA ↔ converter is one-to-one), OAuth is over-engineered; switch to a pre-shared API key. + +### 5/16: grepping the MC + FAA source revealed a design gap in ADR-014 §2 → ADR-016 changes the download path + +```bash +grep -rn "delegated\|DownloadToken" member_center/src --include="*.cs" +# 0 matches: the MC source has no endpoint to issue / validate delegated download tokens +``` + +MC **never implemented** the two endpoints ADR-014 §2 assumed: "issue delegated download token" and "validate delegated download token". FAA's `MemberCenterDelegatedDownloadTokenValidator.cs` assumes MC exposes an `_options.DownloadTokenValidationPath` introspection endpoint, **which is also a wrong assumption**. + +→ **The delegated token chain has been broken ever since it was written on 5/2**; nobody noticed only because it never actually ran end to end. + +**User decision**: leave MC and FAA untouched. Redesign so that visionA downloads through converter's `GET /api/v1/jobs/{id}/result` as a relay (ADR-016). + +--- + +## 2.
converter scheduler 需要做的 2 件事 + +| # | 範圍 | 動 code 多少 | 風險 | +|---|------|-------------|------| +| **任務 A** | 加 API key middleware(取代 OAuth JWT 驗證、或並存)| ~50-100 行 | 低(現有 `requireAuth` pattern 可借鑑)| +| **任務 B** | 加 `GET /api/v1/jobs/:id/result` endpoint | ~80 行 + test | 低(`getObjectStream` 已存在、Phase 2 預留位已有 routing 慣例)| + +兩個都在 `apps/task-scheduler/src/` 內、單一 repo。 + +--- + +## 3. 任務 A — API key middleware + +### 3.1 設計取捨:API key only vs API key + OAuth 並存 + +**推薦:並存模式**(最少 breaking change) + +| 設計 | 動 code | 影響 | 推薦? | +|------|---------|------|-------| +| **A. 純 API key**(砍 OAuth)| 砍 `src/auth/middleware.js` + `src/auth/jwks.js` + `src/auth/oauthClient.js` | 既有 OAuth caller 全部要 migrate(如有)| ❌ 風險高 | +| **B. 並存**(OAuth + API key 二選一)| 新增 `src/auth/apiKeyMiddleware.js` + 改 `routes/v1/index.js` wire | 既有 OAuth 路徑完全不動、API key 是額外 path | ✅ **推薦** | +| **C. 純 API key + 保留 OAuth helper code**(不 wire)| 既有 `auth/` 留著但不啟用 | 模糊、未來容易誤啟用 | ⚠️ 不推薦 | + +採 **B 並存**,理由: +1. visionA 是 1 個 caller、走 API key +2. 其他既有 caller(如 jimchen 手動測試、CI、未來其他產品線)仍可用 OAuth +3. converter 不需要強迫所有 caller 一次 migrate +4. 
如果未來 100% caller 都用 API key、再砍 OAuth path + +### 3.2 實作(新增 `src/auth/apiKeyMiddleware.js`) + +對齊既有 `requireAuth(scope)` API surface,讓 routes/v1/jobs.js 可以無痛換掛。 + +```javascript +// src/auth/apiKeyMiddleware.js +// +// API key middleware — Phase 0.8b 為 visionA 整合新增。 +// +// 設計: +// - 接受 Authorization: Bearer +// - 用 crypto.timingSafeEqual constant-time compare(避免 timing attack) +// - 不驗 scope / tenant — 1:1 internal trust,API key 就是「caller 是 visionA」的完整證明 +// - 對齊既有 sendAuthError 模式(含 destroy socket M2 行為) +// +// 對應 visionA repo 的: +// - ADR-015 v2.1 §1 visionA → converter +// - ADR-015 v2.1 §3.5.1 reference middleware implementation (Go) +// - 本檔是 Node.js port +// +// 如何接: +// const { requireAuth } = require('./middleware'); // 既有 OAuth +// const { requireApiKey } = require('./apiKeyMiddleware'); // 新 API key +// const auth = requireApiKey() || requireAuth('converter:job.write'); // 不能直接這樣寫,看 §3.3 +// +// 看 §3.3「並存策略」實際 wire 方式。 + +'use strict'; + +const crypto = require('crypto'); + +/** + * 解析 Bearer header(複用 ./middleware.js 內部 helper 邏輯)。 + */ +function extractBearerToken(headerValue) { + if (typeof headerValue !== 'string' || headerValue.length === 0) return null; + const match = headerValue.match(/^Bearer\s+(.+)$/i); + if (!match) return null; + const token = match[1].trim(); + return token === '' ? 
null : token; +} + +/** + * sendApiKeyError — 對齊既有 ./middleware.js sendAuthError 的 destroy socket M2 行為。 + * + * 不直接 require ./middleware._internals.sendAuthError 是因為它含 request_id 邏輯、 + * 跟 API key 不同 context;inline 一個簡化版避免循環依賴。 + */ +function sendApiKeyError(req, res, status, code, message) { + if (res.headersSent) { + try { if (req.socket && !req.socket.destroyed) req.socket.destroy(); } catch (_) {} + return; + } + res.setHeader('Connection', 'close'); + res.status(status).json({ + error: { + code, + message, + request_id: req.requestId || null, + }, + }); + res.once('finish', () => { + try { if (req.socket && !req.socket.destroyed) req.socket.destroy(); } catch (_) {} + }); +} + +/** + * Constant-time string compare。 + * + * 重要: + * - 必須長度先比(避免 timingSafeEqual 在長度不同時 throw) + * - 長度不算 secret(公開資訊) + */ +function constantTimeEquals(a, b) { + if (typeof a !== 'string' || typeof b !== 'string') return false; + const bufA = Buffer.from(a, 'utf8'); + const bufB = Buffer.from(b, 'utf8'); + if (bufA.length !== bufB.length) return false; + return crypto.timingSafeEqual(bufA, bufB); +} + +/** + * 建立一個 requireApiKey middleware。 + * + * @param {object} [deps] — 測試注入 + * @param {string} [deps.expectedApiKey] — 明文 API key;不傳則 lazy load from config + * @returns {import('express').RequestHandler} + */ +function requireApiKey(deps = {}) { + let expected = deps.expectedApiKey; + + return function apiKeyMiddleware(req, res, next) { + try { + if (!expected) { + // Lazy-load config(對齊 ./middleware.js pattern) + const config = require('../config').loadConfig(); + expected = config.converter.apiKey; // §3.4 加入 config 後可讀 + } + + // Fail-fast: API key 未設定就拒絕所有 request(不要 silently allow) + if (!expected || expected === '') { + // eslint-disable-next-line no-console + console.error(JSON.stringify({ + level: 'ERROR', + action: 'auth.api_key.not_configured', + message: 'CONVERTER_API_KEY env not set; rejecting all requests', + timestamp: new Date().toISOString(), + })); + return 
sendApiKeyError(req, res, 503, 'service_unavailable', 'API key not configured'); + } + + const token = extractBearerToken(req.headers && req.headers.authorization); + if (!token) { + return sendApiKeyError(req, res, 401, 'invalid_token', + '缺少或格式錯誤的 Authorization header(需為 Bearer )'); + } + + if (!constantTimeEquals(token, expected)) { + return sendApiKeyError(req, res, 401, 'invalid_token', 'API key 驗證失敗'); + } + + // 驗證成功 — 設 req.auth 給下游使用(對齊 OAuth middleware 的 req.auth shape) + req.auth = { + sub: 'visionA-service', // 固定值(API key 沒 sub) + clientId: 'visionA-service', + tenantId: null, // API key 不帶 tenant + scopes: ['converter:job.write', 'converter:job.read'], // implicit full access + raw: { authType: 'api_key' }, + }; + + return next(); + } catch (err) { + // eslint-disable-next-line no-console + console.error(JSON.stringify({ + level: 'ERROR', + action: 'auth.api_key.unexpected_error', + message: err && err.message ? err.message : 'unknown', + timestamp: new Date().toISOString(), + })); + return sendApiKeyError(req, res, 401, 'invalid_token', 'API key 驗證失敗'); + } + }; +} + +module.exports = { + requireApiKey, + _internals: { extractBearerToken, constantTimeEquals, sendApiKeyError }, +}; +``` + +### 3.3 並存策略 — 二選一 middleware + +最簡單做法:寫一個 `requireApiKeyOrOAuth(oauthScope)` wrapper。 + +```javascript +// src/auth/middleware.js 末尾加: + +const { requireApiKey } = require('./apiKeyMiddleware'); + +/** + * 並存 middleware — 先試 API key、不行 fallback OAuth。 + * + * 用法(取代既有 routes/v1/jobs.js + promote.js 的 requireAuth(scope) 呼叫): + * const auth = require('./middleware'); + * router.post('/jobs', auth.requireApiKeyOrOAuth('converter:job.write'), handler); + * + * 行為: + * 1. 沒帶 Authorization header → 401(API key middleware 處理) + * 2. 帶的 token 是 visionA 的 API key(constant-time match)→ API key path 過、req.auth 設好、next() + * 3. 
帶的 token 不 match API key → 走 OAuth JWT 驗證(既有 requireAuth) + * - 過 → next() + * - 不過 → 401(OAuth middleware 處理) + * + * 為什麼 API key 優先:API key compare 快(constant-time string compare),JWT 驗證慢(JWKS fetch + verify)。 + */ +function requireApiKeyOrOAuth(oauthScope) { + const apiKey = requireApiKey(); + const oauth = requireAuth(oauthScope); + + return function combinedAuth(req, res, next) { + // 先攔截 response — 如果 API key 過了 next(),就完成;如果 API key 寫了 401,我們 swap 成試 OAuth + const originalSetHeader = res.setHeader.bind(res); + const originalStatus = res.status.bind(res); + const originalJson = res.json.bind(res); + + let apiKeyRejected = false; + let pendingResponse = null; + + // 暫時 mock res 來看 API key middleware 的決定 + const mockRes = { + setHeader: (...args) => { /* swallow */ }, + status: (code) => ({ json: (body) => { apiKeyRejected = true; pendingResponse = { code, body }; return mockRes; } }), + headersSent: false, + once: () => {}, + }; + + apiKey(req, mockRes, (err) => { + if (err) return next(err); + if (apiKeyRejected) { + // API key 失敗 → 試 OAuth(真的 res 給 OAuth 用) + return oauth(req, res, next); + } + // API key 過 → next() 已被 apiKey middleware 呼叫 + return next(); + }); + }; +} +``` + +**注意**:上面 `requireApiKeyOrOAuth` 用 mock res 攔截 API key 的 401 response、實作有點 hacky。**比較乾淨的做法**是兩個 middleware 都不直接 send response、而是 `next(err)` 給統一 error handler。但這需要改 `sendAuthError` 行為、變動範圍大。 + +**建議**:先用上面 hacky 版本 ship、後續 refactor 改成 `next(err)` 模式。 + +**或者**:更簡單——**直接砍 OAuth、只用 API key**(如果你確定沒其他 caller)。看下面 §3.5 決策樹。 + +### 3.4 Config 加 API key + +```javascript +// src/config.js — 在 converter 段落內加: + +// 在 schema / loadConfig 內: +const config = { + // ... 既有欄位 + converter: { + // ... 既有 audience / scopeWrite / scopeRead / tenantId + apiKey: process.env.CONVERTER_API_KEY || '', // Phase 0.8b 新增 + }, +}; + +// 啟動時驗證(fail-fast): +function validateConfig(config) { + // 既有驗證... 
+  // apiKey is not mandatory (OAuth-only deployments stay possible), but log a clear startup message
+  if (!config.converter.apiKey) {
+    // eslint-disable-next-line no-console
+    console.warn(JSON.stringify({
+      level: 'WARN',
+      action: 'config.api_key_not_set',
+      message: 'CONVERTER_API_KEY env not set; API key middleware will reject all requests',
+      timestamp: new Date().toISOString(),
+    }));
+  } else {
+    // eslint-disable-next-line no-console
+    console.log(JSON.stringify({
+      level: 'INFO',
+      action: 'config.api_key_enabled',
+      message: 'API key middleware enabled',
+      // Never print the key itself (same idea as visionA's api_key_set boolean)
+      api_key_length: config.converter.apiKey.length,
+      timestamp: new Date().toISOString(),
+    }));
+  }
+}
+```
+
+### 3.5 Decision tree: API key only vs side by side
+
+**First answer this**: besides visionA, will any other caller hit converter `/api/v1/*` **now or in the near term**?
+
+| Answer | Recommendation |
+|------|------|
+| No / not sure | **API key only** (remove OAuth): simple, less code, no second path to maintain |
+| Yes / soon | **Side by side**: use `requireApiKeyOrOAuth` from §3.3 |
+| Only jimchen's manual testing | **API key only**, and just use the API key yourself |
+
+**My (jimchen's) recommendation**: **API key only**, because:
+- visionA is the only real caller
+- Manual testing can use the same API key
+- Side by side adds code complexity (the hacky `requireApiKeyOrOAuth`) and maintenance cost
+- If a second caller ever appears, OAuth can be added back then
+
+→ **If you choose API key only**: change every `requireAuth(scope)` to `requireApiKey()` (4 endpoints), delete `auth/middleware.js` + `auth/jwks.js` + `auth/oauthClient.js`, and remove the OAuth fields from `config.js`.
+
+---
+
+## 4. Task B: add the `GET /api/v1/jobs/:id/result` endpoint
+
+Follows the **ADR-016 §1** spec in the visionA repo.
+
+### 4.1 API spec (**this is the contract visionA calls; it must not change**)
+
+| Field | Value |
+|------|---|
+| **Method + Path** | `GET /api/v1/jobs/:id/result` |
+| **Auth** | `Authorization: Bearer ` |
+| **Query** | None |
+| **Body** | None |
+
+#### Response 200 (success)
+
+```http
+HTTP/1.1 200 OK
+Content-Type: application/octet-stream
+Content-Length: 
+Content-Disposition: attachment; filename="_.nef"
+
+
+```
+
+**Important**:
+- Use **streaming**; don't buffer the whole file first (a NEF can be hundreds of MB)
+- `Content-Length` is required (visionA uses it to decide on timeouts)
+- The converter builds the `Content-Disposition` filename (visionA overrides it with `defaultDownloadFilename`, but the converter must still send one)
+
+#### Response 4xx/5xx (errors)
+
+| HTTP | error.code | Situation |
+|------|-----------|------|
+| 401 | `invalid_token` | API key wrong / missing |
+| 404 | `job_not_found` | The job ID does not exist |
+| 409 | `job_not_completed` | The job hasn't completed (still running / failed) |
+| **410** | `result_expired` | **The converter's MinIO copy has been purged (after the 7-day expires_at)** |
+| 502 | `storage_unavailable` | MinIO unreachable |
+| 503 | `service_unavailable` | Other transient errors |
+
+Body format:
+```json
+{
+  "error": {
+    "code": "job_not_found",
+    "message": "Job not found",
+    "request_id": ""
+  }
+}
+```
+
+### 4.2 Implementation (new `src/routes/v1/result.js`)
+
+```javascript
+// src/routes/v1/result.js
+//
+// GET /api/v1/jobs/:id/result: added in Phase 0.8b for the visionA download path.
+//
+// Counterparts in the visionA repo:
+// - ADR-016 §1 API spec
+// - conversion.md v0.6 §2.3 ConverterClient.GetResult method
+//
+// Design:
+// - Streams the NEF binary from MinIO back to the caller (no buffering)
+// - Four failure cases map to 4xx: 401 (auth) / 404 (job missing) / 409 (not completed) / 410 (expired and purged)
+// - The Phase 2 download-tokens placeholder endpoint (returns 501) stays; it is not withdrawn
+
+'use strict';
+
+const express = require('express');
+const { ApiError } = require('../../middleware/errorHandler');
+
+/**
+ * @param {object} deps
+ * @param {object} deps.jobService - existing job service (getJob method)
+ * @param {object} deps.minioStorage - storage facade (getObjectStream method)
+ * @returns {express.Router}
+ */
+function createResultRouter(deps = {}) {
+  const { jobService, minioStorage } = deps;
+  if (!jobService) throw new Error('[createResultRouter] jobService is required');
+  if (!minioStorage) throw new Error('[createResultRouter] minioStorage is required');
+
+  const router = express.Router({ mergeParams: true }); // mergeParams makes :id available
+
+  router.get('/', async (req, res, next) => {
+    try {
+      const jobId = req.params.id;
+      if (!jobId) {
+        return next(new ApiError(400, 'invalid_request', 'job id is required'));
+      }
+
+      // 1. Fetch the job record
+      const job = await jobService.getJob(jobId);
+      if (!job) {
+        return next(new ApiError(404, 'job_not_found', `Job ${jobId} not found`));
+      }
+
+      // 2. Check status: the result is only available once the job has completed
+      if (job.status !== 'completed') {
+        return next(new ApiError(409, 'job_not_completed',
+          `Job ${jobId} is ${job.status}; result only available after completion`));
+      }
+
+      // 3. Check expires_at: expired NEFs have already been purged from MinIO
+      if (job.expires_at && new Date(job.expires_at) < new Date()) {
+        return next(new ApiError(410, 'result_expired',
+          `Job ${jobId} result expired at ${job.expires_at}; re-convert to get a fresh result`));
+      }
+
+      // 4. Resolve the result NEF object key
+      //    Same dual-path logic as promote.js extractSourceObjectKey
+      //    (new result_object_keys format / legacy output format)
+      const nefKey = extractNefObjectKey(job);
+      if (!nefKey) {
+        return next(new ApiError(404, 'result_not_found',
+          `Job ${jobId} completed but no NEF result available`));
+      }
+
+      // 5. Get the stream + metadata from MinIO
+      const result = await minioStorage.getObjectStream(nefKey);
+      if (!result) {
+        // MinIO has no such object, contradicting the job record
+        // (usually purged on expiry without the record being updated)
+        return next(new ApiError(410, 'result_expired',
+          `Job ${jobId} NEF object not found in storage (likely expired)`));
+      }
+
+      // 6. Write the response headers
+      res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
+      if (result.contentLength) {
+        res.setHeader('Content-Length', String(result.contentLength));
+      }
+      const filename = buildFilename(job);
+      res.setHeader('Content-Disposition',
+        `attachment; filename="${filename}"`);
+
+      // 7. Pipe the stream back to the client
+      result.stream.on('error', (streamErr) => {
+        // Headers may already be sent here, so the status code can't change;
+        // the only option is to destroy the connection so the client sees ECONNRESET
+        // eslint-disable-next-line no-console
+        console.error(JSON.stringify({
+          level: 'ERROR',
+          action: 'result.stream_error',
+          job_id: jobId,
+          error: streamErr.message,
+          timestamp: new Date().toISOString(),
+        }));
+        if (!res.destroyed) res.destroy(streamErr);
+      });
+
+      res.on('close', () => {
+        // Client aborted the download: destroy the stream to release the MinIO connection.
+        // Listen on res rather than req: since Node.js 16, req 'close' signals that the
+        // request has completed, not that the client disconnected.
+        if (!res.writableFinished && result.stream && typeof result.stream.destroy === 'function') {
+          result.stream.destroy();
+        }
+      });
+
+      result.stream.pipe(res);
+    } catch (err) {
+      return next(err);
+    }
+  });
+
+  return router;
+}
+
+/**
+ * Get the NEF object key from the job record (dual path: new format + legacy format).
+ * Same logic as promote.js extractSourceObjectKey.
+ */
+function extractNefObjectKey(job) {
+  // New format
+  if (job.result_object_keys && typeof job.result_object_keys === 'object'
+    && typeof job.result_object_keys.nef === 'string'
+    && job.result_object_keys.nef.length > 0) {
+    return job.result_object_keys.nef;
+  }
+  // Legacy format (backward compatible)
+  if (job.output && typeof job.output === 'object'
+    && typeof job.output.nef_path === 'string'
+    && job.output.nef_path.length > 0) {
+    return job.output.nef_path;
+  }
+  return null;
+}
+
+/**
+ * Build the download filename.
+ *
+ * Rule: _.nef
+ * Example: yolov5s.onnx + KL720 → yolov5s_kl720.nef
+ *
+ * Fallback: job_.nef (edge case: source_filename missing)
+ */
+function buildFilename(job) {
+  const sourceFilename = job.source_filename || '';
+  const platform = (job.platform || '').toLowerCase();
+
+  // Strip the model extension (every extension POST /api/v1/jobs accepts)
+  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
+
+  if (stem && platform) {
+    return `${stem}_${platform}.nef`;
+  }
+  return `job_${job.job_id || 'unknown'}.nef`;
+}
+
+module.exports = { createResultRouter, _internals: { extractNefObjectKey, buildFilename } };
+```
+
+### 4.3 Wire it into the v1 router
+
+```javascript
+// src/routes/v1/index.js: add the result router.
+
+const { createResultRouter } = require('./result');
+const { requireApiKey } = require('../../auth/apiKeyMiddleware'); // or requireApiKeyOrOAuth
+
+function createV1Router(deps) {
+  const router = express.Router();
+
+  // ... existing jobs / promote
+
+  // New: result endpoint (GET /api/v1/jobs/:id/result)
+  // Note: mounted at /jobs/:id/result; mergeParams picks up :id
+  router.use('/jobs/:id/result',
+    requireApiKey(), // or requireApiKeyOrOAuth('converter:job.read')
+    createResultRouter({
+      jobService: deps.jobService,
+      minioStorage: deps.minioStorage,
+    }));
+
+  // errorHandler stays mounted last
+  router.use(errorHandler);
+  return router;
+}
+```
+
+### 4.4 Tests (new `src/routes/v1/__tests__/result.integration.test.js`)
+
+Cover at least these cases:
+- ✅ 200 happy path: completed job + NEF present + not expired → the full stream, with correct Content-Type / Content-Length / Content-Disposition
+- ❌ 404 job does not exist
+- ❌ 409 job still running
+- ❌ 410 job expired (expires_at in the past)
+- ❌ 401 missing / wrong API key (if using requireApiKey)
+
+Follow the fixture / mock pattern of the existing `__tests__/getJobs.integration.test.js`.
+
+---
+
+## 5. Deployment order (important: visionA and converter must stay in step)
+
+**The wrong order takes the whole stage environment down.** The correct order:
+
+```
+Step 1: Implement and deploy the converter first (API key verification + result endpoint)
+  → set the CONVERTER_API_KEY env to the same value visionA uses
+  → the converter now accepts both OAuth (existing) and the API key (new)
+  → existing callers are unaffected
+
+Step 2: Verify the new converter endpoint works
+  → curl GET /api/v1/jobs/<some completed job>/result with the Bearer key
+  → confirm 200 + a NEF binary stream
+
+Step 3: Deploy the visionA backend (already ready, commit 9e29ebf)
+  → align the VISIONA_CONVERTER_API_KEY env with CONVERTER_API_KEY
+  → visionA now calls the converter with the API key and uses the new GetResult endpoint
+
+Step 4: e2e verification
+  → user upload → init → poll → promote → download
+  → all green = done
+
+Step 5 (optional): Remove the converter OAuth path
+  → once no other callers are confirmed, delete the OAuth middleware + jwks + oauthClient
+  → remove the MEMBER_CENTER_* env vars
+```
+
+---
+
+## 6. How to generate CONVERTER_API_KEY
+
+```bash
+openssl rand -hex 32
+# Output: 64 hex characters, e.g. a3f9b2c1d8e7f6a5b4c3d2e1f0987654321fedcba9876543210abcdef1234567
+```
+
+**Deployment**:
+- converter stage: put it in `kneron_model_converter/apps/task-scheduler/.env` or the corresponding docker-compose env
+- visionA stage: put `VISIONA_CONVERTER_API_KEY=...` in `~/visionA/.env.stage`
+- **Both sides must use exactly the same string**
+
+**Security**:
+- ⚠️ **Never commit it to git** (`.gitignore` already excludes `.env`; verify once)
+- ⚠️ **Never paste it into Slack / email / chats**
+- ⚠️ **Never log it** (the middleware logs `api_key_length` or an `api_key_set: true` boolean, never the key itself)
+- Use a separate key per environment (dev / stage / prod each get their own `openssl rand -hex 32`)
+
+---
+
+## 7. The existing promote flow is unchanged
+
+**Important**: the converter promote flow (`POST /api/v1/jobs/:id/promote` → the converter PUTs to FAA itself) **does not change at all**.
+
+- visionA still calls converter promote, but after the promote response it **no longer pulls the NEF from FAA** (the v0.6 design pulls it from converter GetResult instead)
+- The converter promote response still includes `target_object_key` (on FAA); visionA no longer uses it, but the converter keeps its promote logic
+- The converter → FAA OAuth client_credentials chain stays (out of scope here)
+
+→ In other words, visionA calls `/promote` (promotion still targets FAA) and also `/result` (to fetch the NEF for the user's download); **visionA calls both endpoints**.
+
+---
+
+## 8. The Phase 2 `/download-tokens` placeholder endpoint
+
+The Phase 2 placeholder documented in `apps/task-scheduler/README.md`:
+
+```
+POST /api/v1/jobs/:id/download-tokens    converter:job.read    Phase 2 placeholder, returns 501
+```
+
+**This does not conflict with ADR-016 and stays.** `/download-tokens` is for future short-TTL tokens that let browsers download directly from the converter; `/result` here is the stream-proxy entry point for the visionA backend. The two endpoints serve different purposes.
+
+---
+
+## 9. Condensed checklist
+
+If you want to skip the details above and just work through a checklist:
+
+### Phase A: API key middleware
+- [ ] Create `src/auth/apiKeyMiddleware.js` (copy the §3.2 code)
+- [ ] Update `src/config.js`: add the `converter.apiKey` field, read from the `CONVERTER_API_KEY` env
+- [ ] Update `src/routes/v1/index.js`: change every `requireAuth(scope)` to `requireApiKey()` (if going API key only)
+- [ ] Add unit tests for `apiKeyMiddleware` (happy path / missing / wrong key / constant-time behavior)
+- [ ] Update `.env.example`: add a `CONVERTER_API_KEY=` placeholder
+- [ ] Update the README.md auth section (OAuth → API key)
+
+### Phase B: `/result` endpoint
+- [ ] Create `src/routes/v1/result.js` (copy the §4.2 code)
+- [ ] Update `src/routes/v1/index.js` to wire in the result router
+- [ ] Add integration tests covering 4 cases (200 / 404 / 409 / 410)
+- [ ] Update `apps/task-scheduler/README.md` with the `/result` endpoint description
+
+### Phase C: Deployment
+- [ ] Generate the stage `CONVERTER_API_KEY` with `openssl rand -hex 32`
+- [ ] Set it in the converter stage `.env` / docker-compose env
+- [ ] Set it in the visionA stage `.env.stage` as `VISIONA_CONVERTER_API_KEY=`
+- [ ] Redeploy the converter
+- [ ] curl-verify that `/result` works with the API key
+- [ ] Redeploy visionA
+- [ ] Run the full e2e: upload → poll → promote → download
+
+---
+
+## 10. Reference documents (visionA repo)
+
+| Document | Purpose |
+|------|------|
+| `docs/autoflow/04-architecture/adr/adr-015-server-to-server-api-key.md` v2.1 | Why an API key; §3.5.1 Go reference middleware (to be ported to Node) |
+| `docs/autoflow/04-architecture/adr/adr-016-download-via-converter.md` v1.0 | Why the `/result` endpoint was added; full analysis of 6 alternatives |
+| `docs/autoflow/04-architecture/conversion.md` v0.6.1 §2.3 | ConverterClient.GetResult method spec (visionA side) |
+| `docs/autoflow/04-architecture/api/api-conversion.md` v0.6 §4 | Public contract of the download endpoint (visionA backend → browser) |
+
+---
+
+## 11. Reminders for future me (jimchen)
+
+1. **Don't assume the MC team will cooperate**: when ADR-014 was written on 5/2, it assumed MC had a delegated-token endpoint, and it didn't. **Grep the MC source before changing anything that depends on MC.**
+2. **Grep all three: converter, FAA and MC**: for any future cross-repo integration design, **grep each party's source code to confirm the endpoints really exist before writing the ADR**.
+3. **Run e2e early**: Phase 0.8 was designed and finished by 5/4, but the first real e2e run on 5/9 hit a wall. **Once an integration design is done, and before writing code, run a real e2e with curl** (some endpoints can be mocked, but at least confirm that the auth / token / scope chain works).
diff --git a/docs/autoflow/04-architecture/TDD.md b/docs/autoflow/04-architecture/TDD.md
index 223c621..0b904a0 100644
--- a/docs/autoflow/04-architecture/TDD.md
+++ b/docs/autoflow/04-architecture/TDD.md
@@ -1,1392 +1,294 @@
-# TDD — Kneron Model Converter Public API (Phase 1)
+# TDD index — Kneron Model Converter Public API
 ## Author: Architect Agent
-## Status: Draft (before the three-way cross-review)
-## Last updated: 2026-04-25
-## Companion documents:
-- `design-doc.md` (architecture decisions)
-- `../02-prd/PRD.md` (requirements)
-- `../03-design/design-review.md` (UX feedback)
+## Status: Draft (Phase 0.8b rewrite + modularization)
+## Last updated: 2026-05-16
 
-This TDD focuses on Phase 1 implementation details. For the "why" behind every decision, see `design-doc.md`.
+> **Auth design evolution**: this TDD reflects the target state decided in Phase 0.8b. For the full history, see visionA repo `docs/autoflow/04-architecture/adr/adr-015-server-to-server-api-key.md` v2.1 + `adr-016-download-via-converter.md` v1.0.
+>
+> **Companions**: `design-doc.md` (architecture decisions), `../02-prd/PRD.md` (requirements), `../03-design/design-review.md` (UX feedback).
+
+---
 
 ## Change history
 
 | Date | Change | Author |
 |------|------|------|
-| 2026-04-25 | Initial Draft 1.0 | Architect Agent |
-| 2026-04-25 | Original-model upload path changed to a visionA-backend multipart upload directly to the Converter; POST /api/v1/jobs changed to multipart/form-data; removed the FAA `getFile()` / `headFile()` / `files:download.read` / `files:metadata.read` content; TBD-1, input_object_key and input_not_found content removed accordingly | Architect Agent |
+| 2026-04-25 | Initial Draft 1.0 (OAuth resource server + promote) | Architect Agent |
+| 2026-04-25 | Multipart upload path changed to a direct visionA → converter upload; removed FAA GET/HEAD | Architect Agent |
+| 2026-05-16 | **Phase 0.8b rewrite**: visionA → converter switched to an API key; added the `/result` endpoint; removed the OAuth resource server sections; split into an index + sub-files | Architect Agent |
 
 ---
 
-## 1. API specification (Phase 1 must-haves)
+## 1. Document structure
 
-### 1.1 General conventions
+This TDD was split into a modular structure during the Phase 0.8b rewrite:
 
-- **Base URL**: `https:///api/v1` (public vhost; only this path is exposed)
-- **Content-Type**:
-  - `POST /api/v1/jobs`: `multipart/form-data` (same as the existing Web UI `POST /jobs`)
-  - Other endpoints (GET / POST `/promote`): the request is `application/json; charset=utf-8`
-  - **All responses**: `application/json; charset=utf-8`
-- **Time format**: ISO 8601 UTC (e.g. `2026-04-25T12:00:00Z`)
-- **ID format**: `job_id` is a UUIDv4 (string)
-- **Auth**: `Authorization: Bearer ` (required everywhere except `/health`)
-- **Request ID**: if the client sends `X-Request-Id`, the response echoes the same value; otherwise the server generates a UUIDv4. Every log entry must record it.
-- **Rate limit**: 300 req / 5 min per `client_id` (response headers `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`)
-
-### 1.2 Unified error format
-
-All 4xx / 5xx responses:
-
-```json
-{
-  "error": {
-    "code": "string_code",
-    "message": "human readable message (zh-TW)",
-    "details": { /* optional; shape depends on code */ },
-    "request_id": "uuid-v4"
-  }
-}
-```
-
-### 1.3 Endpoint list
-
-| Method | Path | Description | Required scope |
-|------|------|------|---------|
-| GET | `/health` | Health check | — |
-| POST | `/api/v1/jobs` | Create a conversion job | `converter:job.write` |
-| GET | `/api/v1/jobs` | List jobs (with filters) | `converter:job.read` |
-| GET | `/api/v1/jobs/:id` | Single job status | `converter:job.read` |
-| POST | `/api/v1/jobs/:id/promote` | Copy files to File Access Agent | `converter:job.write` |
-
-**Phase 2 placeholders (return 501 Not Implemented in Phase 1)**:
-| Method | Path | Description |
-|------|------|------|
-| POST | `/api/v1/jobs/:id/download-tokens` | Exchange for a delegated download token (pending Member Center) |
-| DELETE | `/api/v1/jobs/:id` | Cancel a job |
-
-### 1.4 Endpoint details
-
-#### 1.4.1 `GET /health` (no auth)
-
-**Response 200**:
-```json
-{
-  "service": "kneron-converter-api",
-  "status": "healthy",
-  "version": "1.0.0",
-  "timestamp": "2026-04-25T12:00:00Z",
-  "dependencies": {
-    "redis": "connected",
-    "member_center": "reachable",
-    "file_access_agent": "reachable"
-  }
-}
-```
-
-**Response 503** (any dependency failing):
-```json
-{
-  "service": "kneron-converter-api",
-  "status": "unhealthy",
-  "dependencies": {
-    "redis": "disconnected",
-    "member_center": "reachable",
-    "file_access_agent": "reachable"
-  }
-}
-```
-
-Note: the Member Center / File Access Agent reachability checks can use a background cache (refreshed every 30 s) so that `/health` itself doesn't become slow.
+| File | Contents | Audience |
+|------|------|---------|
+| `TDD.md` (this file, the index) | Per-section summaries + links to sub-files | All |
+| `auth.md` | API key middleware design + list of removed OAuth resource server code + the retained OAuth client | Backend |
+| `api/api-jobs.md` | `POST/GET /jobs`, `GET /jobs/:id` spec | Backend, Reviewer, Testing |
+| `api/api-promote.md` | `POST /jobs/:id/promote` spec | Backend, Reviewer, Testing |
+| `api/api-result.md` | **New** `GET /jobs/:id/result` spec | Backend, Reviewer, Testing |
+| `database.md` | Redis schema + indexes + Lua scripts | Backend |
+| `infra.md` | Nginx / docker-compose / .env changes + deployment order | Backend, DevOps |
+| `performance.md` | SLOs + latency budget + load testing | Backend, Testing |
+| `observability.md` | Log format + sensitive-data protection + alerting | Backend |
+| `security.md` | Trust boundary + input validation + auth security | All |
+| `design-doc.md` | Architecture decisions + ADRs | All |
 
 ---
 
-#### 1.4.2 `POST /api/v1/jobs`
+## 2. Phase 0.8b change summary
 
-**Request**:
-```http
-POST /api/v1/jobs
-Authorization: Bearer 
-Content-Type: multipart/form-data; boundary=----WebKitFormBoundary...
-X-Request-Id: (optional)
+### 2.1 Public auth switched to an API key
 
-------WebKitFormBoundary...
-Content-Disposition: form-data; name="model"; filename="model.onnx"
-Content-Type: application/octet-stream
+- Remove `auth/middleware.js` (OAuth resource server) + `auth/jwks.js`
+- Add `auth/apiKeyMiddleware.js`
+- Switch the 4 existing endpoints to `requireApiKey()`
+- The new `/result` endpoint also uses `requireApiKey()`
 
-
-------WebKitFormBoundary...
-Content-Disposition: form-data; name="ref_images[]"; filename="img_0.jpg"
-Content-Type: image/jpeg
+See `auth.md` §1 + §3 (removal list).
 
-
-------WebKitFormBoundary...
-Content-Disposition: form-data; name="user_id"
+### 2.2 New `/result` endpoint
 
-visionA-user-12345
-------WebKitFormBoundary...
-Content-Disposition: form-data; name="model_id"
+- `GET /api/v1/jobs/:id/result`
+- Streaming proxy of the NEF from MinIO → caller
+- 4 kinds of 4xx + 2 kinds of 5xx cases
+- Dual-path NEF key resolution (new format + backward-compatible legacy format)
+- **2026-05-17 addendum**: rate limit (60 req/min, separate bucket), Range header protection (silently ignored), 8 audit log actions, and acceptance criteria for the Backend writing `source_filename` (§9-§14)
 
-1001
-------WebKitFormBoundary...
-Content-Disposition: form-data; name="version"
+See `api/api-result.md`.
 
-0001
-------WebKitFormBoundary...
-Content-Disposition: form-data; name="platform"
+### 2.3 Unchanged
 
-520
-------WebKitFormBoundary...
-Content-Disposition: form-data; name="enable_evaluate"
+- Promote flow (converter → FAA still uses OAuth client_credentials)
+- Redis schema (apart from confirming that the `source_filename` field exists)
+- Worker, MinIO bucket, Nginx layout
 
-false
-------WebKitFormBoundary...--
-```
+See `auth.md` §2 + `api/api-promote.md`.
 
-**Multer settings**:
-- `multer.memoryStorage()` (same as the existing Web UI `POST /jobs`)
-- `limits.fileSize`: 500MB (per-file limit for `model`)
-- `fields`: `model` (1 file), `ref_images[]` (`maxCount: 100`)
+### 2.4 Config changes
 
-**Field definitions**:
+- Removed: `MEMBER_CENTER_ISSUER` / `MEMBER_CENTER_JWKS_URL` / `KNERON_CONVERTER_AUDIENCE` / `JWKS_*` / `JWT_CLOCK_TOLERANCE_SEC`
+- Added: `CONVERTER_API_KEY`
+- Kept: `MEMBER_CENTER_TOKEN_URL` / `KNERON_CONVERTER_CLIENT_*` / `FILE_ACCESS_AGENT_*` / `OAUTH_*`
 
-| Field | Type | Location | Required | Validation |
+See `infra.md` §3.
+
+---
+
+## 3. System overview
+
+### 3.1 Roles
+
+- **Converter (this project)**: Node.js Task Scheduler + Python Worker
+- **visionA-backend**: Go service, the **only** caller of the Converter public API
+- **Member Center (MC)**: OAuth authorization server; after Phase 0.8b it is used **only** for Converter → FAA promote
+- **File Access Agent (FAA)**: file gateway at the NAS boundary, single-tenant per instance
+
+### 3.2 API endpoint list (after Phase 0.8b)
+
+| Method | Path | Auth | Description | Spec |
 |------|------|------|------|------|
-| `model` | file | multipart file | ✅ | Extension ∈ {`.onnx`, `.pt`, `.pth`, `.tflite`, `.h5`, `.pb`}; size ≤ 500MB |
-| `ref_images[]` | file[] | multipart file | ❌ | `image/*`; at most 100 images; same rules as the existing Web UI |
-| `user_id` | string | multipart field | ✅ | 1-128 characters, no `/`, `\` or `..`; format decided by VisionA |
-| `model_id` | string → int | multipart field | ✅ | After int conversion, 1 ≤ x ≤ 65535 |
-| `version` | string | multipart field | ✅ | 1-32 characters; a numeric string is recommended |
-| `platform` | string | multipart field | ✅ | enum: `520`, `720`, `530`, `630`, `730` |
-| `enable_evaluate` | string `'true'` / `'false'` | multipart field | ❌ | Default `'false'` |
-| `enable_sim_fp` | string `'true'` / `'false'` | multipart field | ❌ | Default `'false'` |
-| `enable_sim_fixed` | string `'true'` / `'false'` | multipart field | ❌ | Default `'false'` |
-| `enable_sim_hw` | string `'true'` / `'false'` | multipart field | ❌ | Default `'false'` |
-| `metadata` | string (JSON) | multipart field | ❌ | If present, must be a valid JSON object string; reserved for future extension |
+| GET | `/health` | — | Health check | `api/api-jobs.md` §3 |
+| POST | `/api/v1/jobs` | API key | Create a job | `api/api-jobs.md` §4 |
+| GET | `/api/v1/jobs` | API key | List / Recovery | `api/api-jobs.md` §6 |
+| GET | `/api/v1/jobs/:id` | API key | Single job status | `api/api-jobs.md` §5 |
+| POST | `/api/v1/jobs/:id/promote` | API key | Copy files to FAA | `api/api-promote.md` |
+| GET | `/api/v1/jobs/:id/result` | API key | **NEW**: stream the NEF | `api/api-result.md` |
+| POST | `/api/v1/jobs/:id/download-tokens` | API key | Phase 2, returns 501 | `api/api-jobs.md` §7 |
+| DELETE | `/api/v1/jobs/:id` | API key | Phase 2, returns 501 | `api/api-jobs.md` §7 |
 
-**Notes**:
-- All multipart field values are strings; the server must convert `'true'` / `'false'` → boolean and `model_id` → integer.
-- Fully aligned with the existing Web UI `POST /jobs` multipart fields; `user_id` is a new field for the public API (the Web UI doesn't need it).
-- Validation order: verify the OAuth token first, then the multipart body (so unauthenticated requests can't make the server swallow a large file). In practice, place the `requireAuth` middleware before the `multer` middleware so an invalid token is rejected before multer starts parsing.
+### 3.3 Existing paths (unchanged in Phase 0.8b)
 
-**Response 201 Created**:
-```json
-{
-  "job_id": "550e8400-e29b-41d4-a716-446655440000",
-  "status": "created",
-  "stage": "onnx",
-  "progress": 0,
-  "created_at": "2026-04-25T12:00:00Z",
-  "expires_at": "2026-05-02T12:00:00Z",
-  "user_id": "visionA-user-12345"
-}
-```
+| Method | Path | Purpose |
+|------|------|------|
+| POST | `/jobs` (multipart) | Existing Web UI upload |
+| GET | `/jobs/:id` | Web UI status query |
+| GET | `/jobs/:id/events` (SSE) | Web UI progress push |
+| GET | `/jobs/:id/download/:filename` | Web UI download |
+| GET | `/queues/stats` | Internal monitoring |
 
-**Error responses**:
-
-| Status | error.code | Situation |
-|------|-----------|------|
-| 400 | `validation_error` | Missing or malformed field (`details.field` lists the offending fields) |
-| 400 | `invalid_multipart` | multipart parse failure, missing required file / field, or wrong extension |
-| 401 | `invalid_token` | JWT invalid / expired / missing a claim |
-| 403 | `insufficient_scope` | Token lacks `converter:job.write` (`details.required_scope`) |
-| 403 | `tenant_mismatch` | The token's `tenant_id` doesn't match the Converter configuration |
-| 409 | `user_has_active_job` | The user_id already has a job in progress (see §1.5) |
-| 413 | `file_too_large` | Upload exceeds 500MB (triggered by multer `LIMIT_FILE_SIZE`) |
-| 500 | `misconfiguration` | `STORAGE_BACKEND !== 'minio'`, etc. |
-| 500 | `internal_error` | Other |
+These are served on the internal vhost: not exposed publicly and not behind auth.
 
 ---
 
-#### 1.4.3 `GET /api/v1/jobs/:id`
+## 4. Tech stack (unchanged)
 
-**Request**:
-```http
-GET /api/v1/jobs/550e8400-e29b-41d4-a716-446655440000
-Authorization: Bearer 
-If-None-Match: "etag-value" (optional)
-```
+| Layer | Choice |
+|------|------|
+| Backend framework | Node.js 18 + Express 4 |
+| Auth (public) | API key (`crypto.timingSafeEqual`, new in Phase 0.8b) |
+| Auth (promote) | OAuth client_credentials (jose / hand-written fetch) |
+| Database | Redis 7 |
+| Object storage | MinIO (Converter Bucket) |
+| Worker | Python 3.10+ |
+| Reverse proxy | Nginx |
+| Testing | Jest |
 
-**Response 200 OK**:
-```json
-{
-  "job_id": "550e8400-e29b-41d4-a716-446655440000",
-  "user_id": "visionA-user-12345",
-  "status": "running",
-  "stage": "bie",
-  "progress": 45,
-  "stage_progress": 60,
-  "created_at": "2026-04-25T12:00:00Z",
-  "updated_at": "2026-04-25T12:05:30Z",
-  "expires_at": "2026-05-02T12:00:00Z",
-  "stage_timings": {
-    "onnx": { "started_at": "2026-04-25T12:00:05Z", "completed_at": "2026-04-25T12:02:10Z" },
-    "bie": { "started_at": "2026-04-25T12:02:15Z", "completed_at": null },
-    "nef": null
-  },
-  "input": {
-    "filename": "model.onnx",
-    "object_key": "jobs/550e8400-e29b-41d4-a716-446655440000/input/model.onnx",
-    "size_bytes": 204800000,
-    "ref_images_count": 0
-  },
-  "result_object_keys": null,
-  "error": null,
-  "parameters": {
-    "model_id": 1001,
-    "version": "0001",
-    "platform": "520",
-    "enable_evaluate": false,
-    "enable_sim_fp": false,
-    "enable_sim_fixed": false,
-    "enable_sim_hw": false
-  },
-  "metadata": {
-    "source": "visionA-web",
-    "tags": ["experiment-001"]
-  },
-  "estimated_completion_at": null
-}
-```
-
-**State machine** (`status` field):
-
-- `created`: just created, waiting for the first stage to start
-- `running`: in some stage (the `stage` field is set)
-- `completed`: all done (`result_object_keys` is set, `stage=null`)
-- `failed`: failed (`error` is set)
-
-**`result_object_keys` on completion** (keys in the Converter Bucket):
-```json
-"result_object_keys": {
-  "onnx": "jobs/{job_id}/output/out.onnx",
-  "bie": "jobs/{job_id}/output/out.bie",
-  "nef": "jobs/{job_id}/output/out.nef"
-}
-```
-
-**`error` on failure**:
-```json
-"error": {
-  "stage": "bie",
-  "code": "quantization_failed",
-  "message": "Insufficient or malformed reference images; the BIE quantization stage failed",
-  "details": { "raw": "..." }
-}
-```
-
-**Response 304 Not Modified**: if `If-None-Match` matches the current ETag (the ETag should be a hash of `updated_at`).
-
-**Error responses**:
-
-| Status | error.code | Situation |
-|------|-----------|------|
-| 401/403 | Same as above | — |
-| 404 | `job_not_found` | The job does not exist, or does not belong to the calling client_id (avoids leaking information) |
+See `design-doc.md` §3.5.
 
 ---
 
-#### 1.4.4 `GET /api/v1/jobs` (list / Recovery)
-
-**Query parameters**:
-
-| Parameter | Type | Required | Description |
-|------|------|------|------|
-| `user_id` | string | ❌ | Filter by user_id (required for Recovery) |
-| `status` | string | ❌ | `in_progress` (= `created` ∪ `running`), `completed`, `failed`, `all` (default `all`) |
-| `limit` | int | ❌ | Default 20, maximum 100 |
-| `offset` | int | ❌ | Default 0 |
-| `created_after` | ISO 8601 | ❌ | Filter on `created_at >= created_after` |
-
-**Response 200**:
-```json
-{
-  "total": 2,
-  "limit": 20,
-  "offset": 0,
-  "items": [
-    { /* same format as GET /jobs/:id, but items are condensed: stage_timings.details and metadata may be omitted */ }
-  ]
-}
-```
-
-**Implementation note**: index by the `user:{user_id}:jobs` Set to avoid a full `KEYS job:*` scan (adopting Design review suggestion 4.1.2).
-
----
-
-#### 1.4.5 `POST /api/v1/jobs/:id/promote`
-
-**Request**:
-```http
-POST /api/v1/jobs/550e8400-.../promote
-Authorization: Bearer 
-Content-Type: application/json
-
-{
-  "targets": [
-    {
-      "source": "nef",
-      "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.nef"
-    },
-    {
-      "source": "bie",
-      "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.bie"
-    }
-  ]
-}
-```
-
-**Field definitions**:
-
-| Field | Type | Required | Description |
-|------|------|------|------|
-| `targets` | array | ✅ | Files to promote (at least 1) |
-| `targets[].source` | string | ✅ | enum: `onnx`, `bie`, `nef`: the corresponding job output file |
-| `targets[].target_object_key` | string | ✅ | Target key in File Access Agent (naming decided by VisionA) |
-
-**Response 200 OK**:
-```json
-{
-  "job_id": "550e8400-...",
-  "promoted": [
-    {
-      "source": "nef",
-      "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.nef",
-      "size_bytes": 10485760,
-      "file_access_agent_etag": "abc123",
-      "promoted_at": "2026-04-25T12:30:00Z"
-    },
-    {
-      "source": "bie",
-      "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.bie",
-      "size_bytes": 5242880,
-      "file_access_agent_etag": "def456",
-      "promoted_at": "2026-04-25T12:30:02Z"
-    }
-  ]
-}
-```
-
-**Error responses**:
-
-| Status | error.code | Situation |
-|------|-----------|------|
-| 400 | `validation_error` | Malformed targets, or source is not a valid stage |
-| 404 | `job_not_found` | Same as above |
-| 409 | `job_not_ready_for_promote` | `status != completed` (`details.current_status`) |
-| 409 | `source_not_available` | The job didn't produce that stage's output (e.g. only onnx ran, but nef was requested) |
-| 502 | `file_gateway_unavailable` | The File Access Agent PUT failed |
-| 503 | `auth_service_unavailable` | Failed to obtain the Converter's own token |
-
-**Retry semantics**: `promote` is idempotent (PUTting the same target_object_key twice gives the same result; File Access Agent overwrites it). Converter Bucket files are kept for 7 days, so retries are allowed.
-
-### 1.5 Key error payload examples
-
-#### `user_has_active_job` (adopting the Design review suggestion)
-
-```json
-{
-  "error": {
-    "code": "user_has_active_job",
-    "message": "The user already has a conversion job in progress",
-    "details": {
-      "active_job_id": "550e8400-...",
-      "active_job_status": "running",
-      "active_job_stage": "bie",
-      "active_job_progress": 45,
-      "active_job_created_at": "2026-04-25T12:00:00Z"
-    },
-    "request_id": "req-uuid"
-  }
-}
-```
-
-#### `insufficient_scope`
-
-```json
-{
-  "error": {
-    "code": "insufficient_scope",
-    "message": "The token is missing a required permission",
-    "details": {
-      "required_scope": "converter:job.write",
-      "provided_scopes": ["converter:job.read"]
-    },
-    "request_id": "req-uuid"
-  }
-}
-```
-
----
-
-## 2. Task Scheduler changes
-
-### 2.1 Suggested directory structure
+## 5.
專案結構 ``` apps/task-scheduler/ -├── server.js ← 既有,只作為 entry(初始化 + mount routes) +├── server.js ← Entry ├── src/ -│ ├── config.js ← 新:集中讀取所有 env(fail fast) -│ ├── redis.js ← 新:Redis client + helper +│ ├── config.js ← 集中讀 env(Phase 0.8b 改) +│ ├── redis.js ← Redis client │ ├── auth/ -│ │ ├── jwks.js ← 新:JWKS cache + JWT 驗證 -│ │ ├── middleware.js ← 新:Express middleware(驗 token + scope) -│ │ └── oauthClient.js ← 新:Converter 作為 OAuth client(token cache) +│ │ ├── apiKeyMiddleware.js ← 【新】Phase 0.8b +│ │ ├── oauthClient.js ← 【保留】promote 用 +│ │ ├── middleware.js ← 【砍】OAuth resource server +│ │ └── jwks.js ← 【砍】 │ ├── fileAccessAgent/ -│ │ ├── client.js ← 新:File Access Agent HTTP client(僅 PUT,promote 用) -│ │ └── errors.js ← 新:錯誤翻譯 +│ │ ├── client.js ← FAA HTTP client(保留) +│ │ └── errors.js ← 錯誤翻譯(保留) │ ├── routes/ -│ │ ├── legacy.js ← 既有路由(/jobs, /jobs/:id, /jobs/:id/events, ...) +│ │ ├── legacy.js ← 既有 /jobs/* 路由 │ │ └── v1/ -│ │ ├── index.js ← mount 新路由 -│ │ ├── jobs.js ← POST/GET /api/v1/jobs, GET /:id -│ │ └── promote.js ← POST /api/v1/jobs/:id/promote +│ │ ├── index.js ← v1 router 組裝(要改 wire result + 換 auth middleware) +│ │ ├── jobs.js ← POST/GET(要改換 requireApiKey) +│ │ ├── promote.js ← POST promote(要改換 requireApiKey) +│ │ └── result.js ← 【新】Phase 0.8b │ ├── services/ -│ │ ├── jobService.js ← 新:封裝 job CRUD、user 索引、active job 檢查 -│ │ └── doneListener.js ← 既有 listenDoneQueue 抽成 module +│ │ ├── jobService.js ← Job CRUD +│ │ └── doneListener.js ← Worker done event │ ├── middleware/ -│ │ ├── errorHandler.js ← 新:統一錯誤格式 -│ │ └── requestId.js ← 新:X-Request-Id +│ │ ├── errorHandler.js ← 統一錯誤 +│ │ └── requestId.js │ └── utils/ -│ └── logger.js ← 新:結構化 log -├── package.json -└── Dockerfile -``` - -**實作原則**:保守重構,既有功能不改語意,只「移動 + 抽象」。 - -### 2.2 auth middleware(T1) - -```javascript -// src/auth/middleware.js 骨架 - -const { verifyJwt, InsufficientScopeError } = require('./jwks'); -const config = require('../config'); - -function requireAuth(requiredScope) { - return async (req, res, 
next) => { - try { - const authHeader = req.headers.authorization || ''; - const match = authHeader.match(/^Bearer\s+(.+)$/); - if (!match) { - return sendError(res, 401, 'invalid_token', 'Missing bearer token', req); - } - - const token = match[1]; - const claims = await verifyJwt(token, { - issuer: config.memberCenter.issuer, - audience: config.converter.audience, - clockSkew: 60, - }); - - // scope 檢查 - const scopes = (claims.scope || '').split(' ').filter(Boolean); - if (!scopes.includes(requiredScope)) { - return sendError(res, 403, 'insufficient_scope', 'Missing required scope', req, { - required_scope: requiredScope, - provided_scopes: scopes, - }); - } - - // tenant 檢查(可選) - if (config.converter.tenantId && claims.tenant_id) { - if (claims.tenant_id !== config.converter.tenantId) { - return sendError(res, 403, 'tenant_mismatch', 'Tenant mismatch', req); - } - } - - // 記錄 claim 到 req 供下游使用 - req.auth = { - clientId: claims.client_id || claims.sub, - tenantId: claims.tenant_id || null, - scopes, - tokenClaims: claims, - }; - - next(); - } catch (err) { - // 具體錯誤類型處理 - if (err.code === 'ERR_JWT_EXPIRED') { - return sendError(res, 401, 'token_expired', 'Token expired', req); - } - if (err.code === 'ERR_JWKS_NO_MATCHING_KEY') { - return sendError(res, 401, 'invalid_token', 'Signature verification failed', req); - } - return sendError(res, 401, 'invalid_token', 'Token verification failed', req); - } - }; -} -``` - -### 2.3 JWKS cache(T1) - -採用 `jose` npm 套件的 `createRemoteJWKSet`,內建 TTL cache 與 stale-while-revalidate。 - -```javascript -// src/auth/jwks.js - -const { createRemoteJWKSet, jwtVerify } = require('jose'); -const config = require('../config'); - -const jwks = createRemoteJWKSet(new URL(config.memberCenter.jwksUrl), { - cacheMaxAge: 10 * 60 * 1000, // 10 min - cooldownDuration: 30 * 1000, // 30s 內不重複 refresh -}); - -async function verifyJwt(token, { issuer, audience, clockSkew }) { - const { payload } = await jwtVerify(token, jwks, { - issuer, - audience, 
- clockTolerance: clockSkew, - }); - return payload; -} - -module.exports = { verifyJwt }; -``` - -### 2.4 OAuth client(T2) - -```javascript -// src/auth/oauthClient.js - -const config = require('../config'); - -class OAuthClient { - constructor() { - this._cache = new Map(); // scope-key -> { token, expiresAt } - } - - async getToken(scope) { - const key = scope; - const cached = this._cache.get(key); - if (cached && cached.expiresAt - 60000 > Date.now()) { - return cached.token; - } - - const params = new URLSearchParams({ - grant_type: 'client_credentials', - client_id: config.converter.clientId, - client_secret: config.converter.clientSecret, - scope, - audience: config.fileAccessAgent.audience, - }); - - const res = await fetch(config.memberCenter.tokenUrl, { - method: 'POST', - headers: { 'Content-Type': 'application/x-www-form-urlencoded' }, - body: params.toString(), - }); - if (!res.ok) { - throw new Error(`token endpoint ${res.status}`); - } - const data = await res.json(); - const entry = { - token: data.access_token, - expiresAt: Date.now() + (data.expires_in || 3600) * 1000, - }; - this._cache.set(key, entry); - return entry.token; - } - - invalidate(scope) { - this._cache.delete(scope); - } -} - -module.exports = new OAuthClient(); -``` - -**錯誤處理**:呼叫端 catch 到失敗時回 503 `auth_service_unavailable`。 - -### 2.5 File Access Agent client(T6) - -Phase 1 Converter 只在 `promote` 階段呼叫 File Access Agent(寫入結果檔),**不需要 HEAD / GET**。 - -```javascript -// src/fileAccessAgent/client.js - -const config = require('../config'); -const oauthClient = require('../auth/oauthClient'); - -async function putFile(objectKey, stream, { contentType, contentLength }) { - const token = await oauthClient.getToken('files:upload.write'); - const res = await fetch( - `${config.fileAccessAgent.baseUrl}/files/${encodeURI(objectKey)}`, - { - method: 'PUT', - headers: { - Authorization: `Bearer ${token}`, - 'Content-Type': contentType, - 'Content-Length': String(contentLength), - }, - body: 
stream, - duplex: 'half', // Node 18 stream body 需要 - } - ); - if (!res.ok) throw new FAAError(res.status, await res.text()); - return await res.json(); -} - -module.exports = { putFile }; -``` - -**大檔 stream 處理(promote 用)**:從 MinIO `GetObjectCommand` 的 Body(stream)直接 pipe 到 fetch PUT body,確保不把整個結果檔載入記憶體。`POST /api/v1/jobs/:id/promote` 流程: - -``` -MinIO GetObjectCommand.Body (stream) - ↓ pipe -fetch PUT body (stream, duplex: 'half') - ↓ -File Access Agent -``` - -### 2.6 新路由群(T3) - -```javascript -// src/routes/v1/index.js - -const express = require('express'); -const jobsRouter = require('./jobs'); -const promoteRouter = require('./promote'); -const { requireAuth } = require('../../auth/middleware'); -const { apiV1RateLimit } = require('../../middleware/rateLimit'); - -const router = express.Router(); - -router.use(apiV1RateLimit); - -router.post('/jobs', requireAuth('converter:job.write'), jobsRouter.create); -router.get('/jobs', requireAuth('converter:job.read'), jobsRouter.list); -router.get('/jobs/:id', requireAuth('converter:job.read'), jobsRouter.get); -router.post('/jobs/:id/promote', requireAuth('converter:job.write'), promoteRouter.promote); - -// Phase 2 預留 -router.post('/jobs/:id/download-tokens', requireAuth('converter:job.read'), (req, res) => { - res.status(501).json({ - error: { code: 'not_implemented', message: 'Phase 2 功能,待 Member Center 補完', request_id: req.requestId }, - }); -}); -router.delete('/jobs/:id', requireAuth('converter:job.write'), (req, res) => { - res.status(501).json({ - error: { code: 'not_implemented', message: '尚未實作', request_id: req.requestId }, - }); -}); - -module.exports = router; -``` - -### 2.7 Redis 資料模型改造 - -#### 2.7.1 Job record(JSON,key = `job:{id}`)新增欄位 - -```jsonc -{ - // 既有欄位 - "job_id": "uuid", - "created_at": "...", - "updated_at": "...", - "status": "ONNX | BIE | NEF | COMPLETED | FAILED", // 注意:舊 Web UI 仍用大寫狀態 - "stage": "onnx | bie | nef | null", - "progress": 0, - "parameters": { /* model_id, version, 
platform, options */ }, - "output": { "bie_path": null, "nef_path": null }, - "error": null, - - // 新增欄位(Phase 1) - "origin": "api | web", // 來自新 API 或舊 Web UI - "user_id": "visionA-user-12345", - "tenant_id": "uuid-or-null", - "created_by_client_id": "kneron_converter_client_abc", - "input": { - "filename": "model.onnx", // multipart 原始檔名 - "object_key": "jobs/{job_id}/input/model.onnx", // Converter Bucket 內的 key - "size_bytes": 204800000, - "ref_images_count": 0 - }, - "stage_timings": { - "onnx": { "started_at": "...", "completed_at": "..." }, - "bie": { "started_at": "...", "completed_at": null }, - "nef": null - }, - "stage_progress": 0, // 0-100,當前 stage 內進度(Worker 推上來) - "expires_at": "2026-05-02T12:00:00Z", - "metadata": {} -} -``` - -**關於 `status` 大小寫**:既有 Web UI 會讀大寫(`ONNX`, `COMPLETED` 等)。新 API 對外回傳時需要**映射為小寫語意化狀態**(`created`, `running`, `completed`, `failed`)。映射表: - -| 內部 status | 對外 `status` + `stage` | -|------------|----------------------| -| `ONNX` | `running` + stage=`onnx` | -| `BIE` | `running` + stage=`bie` | -| `NEF` | `running` + stage=`nef` | -| `COMPLETED` | `completed` + stage=`null` | -| `FAILED` | `failed` + stage=<失敗時的 stage> | - -**注意**:既有 Scheduler `advanceJob` 把初始狀態設 `ONNX`,不區分「created」。新 API 建 job 後、onnx worker 接到前,依然是 `ONNX`。此時對外狀態應回 `created`(stage=onnx 但 stage_timings.onnx.started_at 為 null)。**實作上以 `stage_timings.onnx.started_at == null` 判斷是 `created` 還是 `running`。** - -#### 2.7.2 User 索引(新) - -| Key | 類型 | 用途 | TTL | -|-----|------|------|-----| -| `user:{user_id}:jobs` | Set | 該 user 所有 job_id(不分狀態) | 每次寫入時 `EXPIRE 7d` | -| `user:{user_id}:active_job` | String | 當前 in-progress job_id(= `created` 或 `running`)| 隨 job 結束刪除 | - -**寫入時機**(原子性用 MULTI 包): - -``` -建立 job: - MULTI - SET job:{id} {...} - SADD user:{user_id}:jobs {id} - EXPIRE user:{user_id}:jobs 604800 - SETNX user:{user_id}:active_job {id} # NX 是同使用者鎖的關鍵 - EXEC - - 若 SETNX 回 0 → 衝突,回滾(DEL job:{id}、SREM user:{user_id}:jobs {id}),回 409 - 若 SETNX 回 1 → 成功 - -完成 / 失敗時: - 
MULTI - SET job:{id} {...} - DEL user:{user_id}:active_job - EXEC - - 僅在 active_job 的 value 等於當前 job_id 時才 DEL(用 WATCH 或 Lua script 確保) -``` - -**Lua script(建議)**:確保「檢查 + 設 active + 寫 job」的原子性。 - -```lua --- claim_active_job.lua --- KEYS[1] = user:{user_id}:active_job --- KEYS[2] = job:{job_id} --- KEYS[3] = user:{user_id}:jobs --- ARGV[1] = job_id --- ARGV[2] = job_json --- ARGV[3] = ttl_seconds - -if redis.call('EXISTS', KEYS[1]) == 1 then - return {'conflict', redis.call('GET', KEYS[1])} -end -redis.call('SET', KEYS[1], ARGV[1]) -redis.call('SET', KEYS[2], ARGV[2]) -redis.call('SADD', KEYS[3], ARGV[1]) -redis.call('EXPIRE', KEYS[3], tonumber(ARGV[3])) -return {'ok'} -``` - -#### 2.7.3 避免 `KEYS *` 的實作 - -**錯誤做法**(既有 code 有用,但新 API 不用): -```javascript -const keys = await redis.keys('job:*'); // O(N) 阻塞 Redis -``` - -**新 API 列表查詢**: -```javascript -async function listJobsByUser(userId, { status, limit, offset }) { - const ids = await redis.smembers(`user:${userId}:jobs`); - const pipeline = redis.pipeline(); - for (const id of ids) pipeline.get(`job:${id}`); - const results = await pipeline.exec(); - let jobs = results.map(([err, raw]) => JSON.parse(raw)).filter(Boolean); - // status 過濾 - if (status === 'in_progress') { - jobs = jobs.filter(j => ['created', 'running'].includes(mapStatus(j))); - } else if (status && status !== 'all') { - jobs = jobs.filter(j => mapStatus(j) === status); - } - // 排序、分頁 - jobs.sort((a, b) => new Date(b.created_at) - new Date(a.created_at)); - return { total: jobs.length, items: jobs.slice(offset, offset + limit) }; -} -``` - -### 2.8 POST /api/v1/jobs 流程(T4) - -``` -1. requireAuth('converter:job.write') — middleware 驗 token(放在 multer 之前,避免未驗證就吃大檔) -2. 
multer 中介層處理 multipart(memoryStorage,fileSize=500MB): - - req.files.model[0](required) - - req.files['ref_images[]'] / req.files.ref_images(optional, maxCount=100) - - req.body.user_id / model_id / version / platform / enable_* - ├── LIMIT_FILE_SIZE → 413 file_too_large - ├── multer 其他錯誤 → 400 invalid_multipart - └── ok → 繼續 -3. 驗證 fields(joi / zod / 手寫): - - user_id, model_id, version, platform 必填 - - enable_* 轉 boolean - - model 檔副檔名白名單 - ├── 失敗 → 400 validation_error(details.field) -4. 檢查 STORAGE_BACKEND === 'minio' - ├── 否 → 500 misconfiguration -5. 生成 job_id(UUIDv4) -6. 嘗試 claim_active_job Lua script(見 §2.7.2) - ├── conflict → 回 409 user_has_active_job + 當前 active job 詳情 - └── ok → 繼續 -7. 同步寫入 MinIO(Converter Bucket): - - jobs/{job_id}/input/{sanitized_model_filename} ← req.files.model[0].buffer - - jobs/{job_id}/ref_images/{index}_{sanitized_filename} ← 每個 ref_image.buffer - - 失敗 → 回滾(DEL job:{id}, DEL user:{user_id}:active_job, SREM user:{user_id}:jobs {id}),回 502 `storage_unavailable` -8. 更新 job record(補 input.object_key、size_bytes、ref_images_count、stage_timings.onnx.started_at=now) -9. enqueueStage('onnx', job) -10. 回 201 + { job_id, status: 'created', ... } -``` - -**關鍵**: -- Auth middleware 必須在 multer 之前,避免未驗證就 parse 500MB 大檔 -- 第 7 步若任一檔案寫 MinIO 失敗必須回滾,避免 Redis 有 job 但 MinIO 沒檔 -- `claim_active_job` 之後才寫 MinIO,避免拿到鎖但 MinIO 失敗時還要補回滾 MinIO(順序:驗證 → 鎖 → 寫檔 → enqueue) - -**time complexity**:SLA p95 < 5s(200MB @ 50MB/s ≈ 4s multipart + 1s MinIO write)。500MB 檔案 ~12s(見 design-doc §6.1)。 - -### 2.9 GET /api/v1/jobs/:id 流程(T5) - -``` -1. requireAuth('converter:job.read') -2. 讀 job:{id} -3. 若不存在,回 404 job_not_found -4. 若 job.created_by_client_id !== req.auth.clientId → 回 404(不洩露) -5. 計算 ETag = hash(job.updated_at),若 If-None-Match 吻合 → 304 -6. 映射內部 status → 對外 status + stage -7. 回 200 + 序列化 response -``` - -### 2.10 promote 流程(T6) - -``` -1. requireAuth('converter:job.write') -2. 驗 body(targets 格式) -3. 讀 job:{id}(+ client 隔離檢查) -4. 
若 status != 'completed' → 409 job_not_ready_for_promote -5. 對每個 target: - a. 從 Converter Bucket 讀結果檔(stream) - b. faa.putFile(target.target_object_key, stream, ...) - c. 記錄 promoted_at / etag / size -6. 全部成功 → 回 200 + promoted[] -7. 部分失敗 → 回 502,details 標注哪些成功 / 失敗 -``` - -**冪等性**:promote 是冪等的(File Access Agent PUT 會覆蓋),可以重試。 - -### 2.11 Done listener 的改造 - -既有 `listenDoneQueue` 收到 worker done 事件時呼叫 `advanceJob`。新改動: - -- `advanceJob` 在 status 變化時同步更新 `stage_timings` -- 完成時自動 `DEL user:{user_id}:active_job`(Lua script 保證原子性) -- 失敗時同上 - -### 2.12 /health 升級 - -既有 `/health` 只檢查 Redis。新版加上: -- Member Center reachability(`GET /.well-known/openid-configuration`,背景 30s 一次,cache 結果) -- File Access Agent reachability(`GET /health`,同上) -- 回應 503 if 任一 critical dependency 異常 - ---- - -## 3. Worker 改造 - -**Phase 1 決定:Worker 不大改。** - -既有 `services/workers/s3_storage.py` 已支援從 MinIO 讀寫。Worker 只要看到 input 在 `jobs/{job_id}/input/` 路徑就開工,不需要知道 File Access Agent 的存在。 - -唯一需要改動的: - -1. **stage_progress 回報**(可選):Worker 處理過程中若能回報階段內進度(例如 30%、60%),可透過一個新的 Redis Stream `queue:progress` 推給 Scheduler。Phase 1 可先全回 0 或 100,後續增強。 -2. **`stage_timings` 的 started_at**:Worker 接到任務時用既有 done event 前,先寫一個 `stage_started` event。或者更簡單的做法:Scheduler 在 `enqueueStage` 時寫 `stage_timings.{stage}.started_at = now`。**建議採後者**,Worker 不動。 - ---- - -## 4. 資料模型與索引 - -### 4.1 為什麼不用 PostgreSQL - -- Phase 1 的資料模式簡單:job 是 state machine,user index 是 key-value -- 既有哲學是「Crash 即 Reset」,PG 會引入反向的持久化語意,反而變複雜 -- Redis Set 做 user 索引足以應付預期量(per user < 10 jobs / 7 天) -- 未來若要跨 Crash recovery / 多 instance HA,再評估 PG - -### 4.2 Redis 記憶體預估 - -- 每個 job record 約 2-4 KB(含 stage_timings 等) -- 每個 user index Set 每個元素 < 40 bytes -- 1000 並發 user × 10 jobs = 10k job record ≈ 40 MB(Redis 輕鬆) -- Converter Bucket lifecycle 7 天,Redis 也跟著 TTL 7 天,記憶體上限可控 - ---- - -## 5. 
OAuth 整合細節 - -### 5.1 token 驗證(resource server 身分) - -| Claim | 檢查 | -|-------|------| -| `iss` | 等於 `MC_ISSUER` | -| `aud` | 包含 `KNERON_CONVERTER_AUDIENCE`(支援 array 或 string)| -| `exp` | 未過期(含 60s clock skew)| -| `nbf` | 若有,已到 | -| `scope` | 空白分隔,包含 endpoint 要求的 scope | -| `client_id` | 必須有(記錄用)| -| `tenant_id` | 若有,等於 `CONVERTER_TENANT_ID`(Phase 1 可先 warn-only)| - -**JWKS 快取**:`jose.createRemoteJWKSet` 內建,TTL 10min,30s cooldown。 - -### 5.2 Converter 當 OAuth Client - -- `client_credentials` grant -- Phase 1 只需要一個 scope:`files:upload.write`(`aud=file_access_api`),僅 `promote` 時呼叫 -- Cache key = scope(未來擴充時若新增 scope,自動 per-scope cache) -- expires_in - 60s 時主動 refresh -- 失敗時 catch,轉 503 `auth_service_unavailable` - -### 5.3 Member Center 離線的影響 - -| 場景 | 影響 | 緩解 | -|------|------|------| -| JWKS fetch 失敗 | 新 kid 無法驗證 | cache 內還有舊 kid 的 key,舊 token 可過;新 token 會失敗 | -| token endpoint 失敗 | Converter 無法取新 token 打 File Access Agent(僅 promote 用)| cache 內 token 有效期內無影響;過期後 promote 會失敗 → 503。`POST /api/v1/jobs` 建 job 不受影響(只驗他人 token,不取自己 token)| -| discovery 失敗 | health check 標示 unhealthy | K8s / Docker 重啟不解決,需人工介入 | - ---- - -## 6. 
File Access Agent 整合 - -### 6.1 Object key 命名約定(建議) - -| 用途 | 建議命名 | 說明 | -|------|---------|------| -| promote 結果到模型庫(File Access Agent)| `visionA/models/{user_id}/{model_id}/v{version}/{filename}` | VisionA 決定 target_object_key(Converter 不強制命名規則)| -| Converter Bucket 內部(原始模型 input)| `jobs/{job_id}/input/{filename}` | Converter 自己管,multipart 上傳後寫入 | -| Converter Bucket 內部(參考圖片)| `jobs/{job_id}/ref_images/{index}_{filename}` | Converter 自己管 | -| Converter Bucket 內部(結果檔)| `jobs/{job_id}/output/{filename}` | Converter 自己管 | - -**約定**: -- `target_object_key`(promote 目標)的命名規則由 VisionA 定義,Converter 只做基本 sanity check(不能有 `..`、反斜線)。 -- Converter Bucket 內部 object key 由 Converter 控制,外部看不到也不需對齊。 -- Phase 1 不涉及 File Access Agent 上原始模型的 object key,該情境已不存在(原始模型直接 multipart 到 Converter)。 - -### 6.2 HTTP headers 一覽 - -| Request | Headers | -|---------|---------| -| PUT /files/{key}(promote 用)| `Authorization: Bearer `, `Content-Type`, `Content-Length` | - -**注意**:Phase 1 Converter 只對 File Access Agent 發 `PUT` 請求(promote 結果檔),不需要 HEAD / GET。 - -### 6.3 失敗重試策略(僅 PUT /files/{key}) - -| 錯誤 | Converter 行為 | -|------|--------------| -| 4xx(client error)| 不重試,直接回對應的 4xx 給 visionA-backend(例如 target_object_key 不合法)| -| 401(token 失效)| 強制 `oauthClient.invalidate('files:upload.write')`,重取 token 重試一次;仍失敗 → 503 `auth_service_unavailable` | -| 5xx(server error)| 重試最多 2 次(exponential backoff 500ms / 2000ms);全失敗 → 502 `file_gateway_unavailable` | -| network timeout | 同 5xx | - -### 6.4 Timeout - -- PUT /files/{key}:依檔案大小動態,預設 300s(500MB @ 最壞 5MB/s);由 `PROMOTE_TIMEOUT_MS` env 控制 - -### 6.5 大檔 stream - -- 使用 Node 18 原生 `fetch` + `body: ReadableStream` -- `duplex: 'half'` 旗標必要(Node 18.17+) -- 從 MinIO GetObjectCommand 的 Body(stream)直接 pipe 到 fetch PUT body -- 不做記憶體緩衝 - ---- - -## 7. 
部署架構 - -### 7.1 Nginx 設定(雙 vhost) - -```nginx -# /etc/nginx/conf.d/converter.conf - -# Upstream -upstream scheduler_upstream { - server scheduler:4000; - keepalive 32; -} - -# Public vhost(對公網,端口 443) -server { - listen 443 ssl http2; - server_name converter.innovedus.com; - - ssl_certificate /etc/nginx/certs/fullchain.pem; - ssl_certificate_key /etc/nginx/certs/privkey.pem; - - # 只 proxy /api/v1/* - location /api/v1/ { - proxy_pass http://scheduler_upstream; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - proxy_set_header X-Forwarded-Proto $scheme; - proxy_request_buffering off; # 大檔 stream - proxy_read_timeout 300s; - client_max_body_size 600M; # 容許略大於 500MB 的 multipart 上限(POST /api/v1/jobs 原始模型上傳) - } - - # /health 可公開 - location = /health { - proxy_pass http://scheduler_upstream; - } - - # 其他路徑 404 - location / { - return 404 '{"error":{"code":"not_found","message":"Not found"}}'; - default_type application/json; - } -} - -# Internal vhost(僅內網 bind,端口 80 綁內部 interface) -server { - listen 10.0.0.1:80; # 內部 IP,不對外 - server_name converter-internal.innovedus.com; - - # Web UI / 舊工具走的路徑 - location /jobs { - proxy_pass http://scheduler_upstream; - proxy_http_version 1.1; - proxy_set_header Host $host; - proxy_buffering off; # SSE 需要 - } - - location /queues/stats { - proxy_pass http://scheduler_upstream; - } - - # Web UI 靜態資源 - location / { - proxy_pass http://web:3000; - } -} -``` - -### 7.2 docker-compose.yml 變更 - -```yaml -services: - scheduler: - environment: - # 既有 - - PORT=4000 - - REDIS_URL=redis://redis:6379 - - STORAGE_BACKEND=minio - # ... 
MinIO 相關 - # 新增(Phase 1) - - MC_ISSUER=${MC_ISSUER} - - MC_JWKS_URL=${MC_JWKS_URL} - - MC_TOKEN_URL=${MC_TOKEN_URL} - - KNERON_CONVERTER_AUDIENCE=${KNERON_CONVERTER_AUDIENCE:-kneron_converter_api} - - KNERON_CONVERTER_CLIENT_ID=${KNERON_CONVERTER_CLIENT_ID} - - KNERON_CONVERTER_CLIENT_SECRET=${KNERON_CONVERTER_CLIENT_SECRET} - - FILE_ACCESS_AGENT_BASE_URL=${FILE_ACCESS_AGENT_BASE_URL} - - FILE_ACCESS_AGENT_AUDIENCE=${FILE_ACCESS_AGENT_AUDIENCE:-file_access_api} - - CONVERTER_TENANT_ID=${CONVERTER_TENANT_ID:-} - - CONVERTER_SCOPE_WRITE=${CONVERTER_SCOPE_WRITE:-converter:job.write} - - CONVERTER_SCOPE_READ=${CONVERTER_SCOPE_READ:-converter:job.read} - - API_V1_RATE_LIMIT_WINDOW_MS=${API_V1_RATE_LIMIT_WINDOW_MS:-300000} - - API_V1_RATE_LIMIT_MAX=${API_V1_RATE_LIMIT_MAX:-300} - - NODE_ENV=${NODE_ENV:-development} -``` - -### 7.3 `.env.example` 新增 - -```bash -# === OAuth (Member Center) === -MC_ISSUER=https://auth.innovedus.com -MC_JWKS_URL=https://auth.innovedus.com/.well-known/jwks -MC_TOKEN_URL=https://auth.innovedus.com/oauth/token - -# === Converter identity (Resource Server) === -KNERON_CONVERTER_AUDIENCE=kneron_converter_api - -# === Converter identity (OAuth Client,呼叫 File Access Agent 用) === -KNERON_CONVERTER_CLIENT_ID=kneron_converter -KNERON_CONVERTER_CLIENT_SECRET=change-me -CONVERTER_TENANT_ID= - -# === File Access Agent === -FILE_ACCESS_AGENT_BASE_URL=https://files.nas.internal -FILE_ACCESS_AGENT_AUDIENCE=file_access_api - -# === Scope 命名(可配置以防 Member Center owner 要求不同名稱)=== -CONVERTER_SCOPE_WRITE=converter:job.write -CONVERTER_SCOPE_READ=converter:job.read - -# === Rate Limit === -API_V1_RATE_LIMIT_WINDOW_MS=300000 -API_V1_RATE_LIMIT_MAX=300 +│ └── logger.js +├── docs/openapi.yaml ← 要改 security scheme(OAuth → bearer/api_key) +├── .env.example ← 要改(見 infra.md §4) +├── README.md ← 要改 auth 章節 +└── package.json ``` --- -## 8. Scope 設計總表(給跨團隊對齊用) +## 6. 
實作任務拆分(給 Backend)
 
-### 8.1 Converter 作為 Resource Server(接收端)
+按 Autoflow 增量式開發規範,每個任務 = 一個可獨立 review 的單位。
 
-| Scope | 用途 | 被誰取 |
-|-------|------|--------|
-| `converter:job.write` | 建 job、promote | visionA-backend |
-| `converter:job.read` | 查 job | visionA-backend |
-| (未來)`converter:admin.read` | 跨 client 查 job | 內部監控用 |
+### Phase A — API key middleware + auth 切換(取代 OAuth)
 
-### 8.2 Converter 作為 OAuth Client(發起端)
+| # | 任務 | 依賴 | 預估 | 驗收標準 |
+|---|------|------|------|---------|
+| A1 | 新建 `src/auth/apiKeyMiddleware.js` | — | 1d | unit test 全過:happy path、missing header、wrong key、constant-time、destroy socket、env 未設定 fail-fast |
+| A2 | 修 `src/config.js`:新增 `converter.apiKey`、移除 OAuth resource server 相關 env、保留 promote 相關 | — | 0.5d | config.test.js 過;啟動時 `CONVERTER_API_KEY` 未設只 warn(不 throw);OAuth resource server env 移除後 server 仍能啟動 |
+| A3 | 修 `src/routes/v1/index.js` / `jobs.js` / `promote.js`:`requireAuth(scope)` → `requireApiKey()` | A1, A2 | 0.5d | 既有 integration test 全過(401 行為改成 API key 模式驗);server 啟動正常 |
+| A4 | 砍 `src/auth/middleware.js` + `src/auth/jwks.js` + 相關 test | A3 | 0.5d | `git rm` + test runner 沒 broken import;search code base 沒有 reference 殘留 |
+| A5 | 修 `.env.example`、`docs/openapi.yaml`、`README.md`:移 OAuth resource server 段、加 `CONVERTER_API_KEY` | A4 | 0.5d | docs lint 過;OpenAPI security scheme 改 bearer / api_key |
+| A6 | Integration test:API key 驗證 4 個情境(happy / missing / wrong / 503) | A1-A5 | 1d | 全部過;既有 jobs / promote integration test 仍過 |
 
-| Scope | 用途 | 在哪裡用 |
-|-------|------|---------|
-| `files:upload.write` | PUT File Access Agent | promote 結果檔到 NAS 模型庫 |
+**Phase A 總工時**:~4d
 
-**Phase 1 僅需上述一個 scope。** Converter 完全不從 File Access Agent 讀取任何東西(原始模型已改為 visionA-backend 直接 multipart 上傳 Converter),因此不需要 `files:download.read` / `files:metadata.read`。
+### Phase B — `/result` endpoint
 
-### 8.3 Member Center 需要做的事(跨團隊協調,對應 progress.md 未解決問題)
+| # | 任務 | 依賴 | 預估 | 驗收標準 |
+|---|------|------|------|---------|
+| B1 | 確認 `jobService.createJob` 寫入
`source_filename` 欄位(檢查既有 code、補上若缺)| — | 0.5d | unit test 過;既有 job record 結構不破壞 |
+| B2 | 新建 `src/routes/v1/result.js`(含 `extractNefObjectKey`、`buildFilename`、stream handler)| B1, A1 | 1.5d | unit test 過:filename 各情境、雙路徑 key 解析、stream error / client close handling |
+| B3 | Wire `/result` 到 `src/routes/v1/index.js`(含 `requireApiKey` + per-client rate limiter)| B2 | 0.5d | server 啟動 + route table 正確;mergeParams 取 :id 通 |
+| B4 | Integration test:`/result` 8 個情境(200 happy / 401 / 404 job / 404 result / 409 / 410 expired / 410 minio miss / 502)| B2, B3 | 1d | 全部過 |
 
-1. 新增 resource audience `kneron_converter_api`
-2. 新增 OAuth client `kneron_converter`(供 Converter 自己用,grant=client_credentials)
-3. 為 visionA-backend 的 client 加上 `converter:job.write`、`converter:job.read` scope 授權
-4. 為 `kneron_converter` client 加上 `files:upload.write` scope 授權(**僅此一個,用於 promote**)
-5. 確認 `tenant_id` claim 是否在 S2S token 中可用
-6. (Phase 2)實作 `POST /file-access/download-tokens`
+**Phase B 總工時**:~3.5d
+
+### 任務排程建議
+
+**順序執行 A → B**(Backend 單人):
+
+- A1 + A2 可平行
+- A3 等 A1 + A2
+- A4 等 A3
+- A5 等 A4
+- A6 等 A5(整體 verify)
+- B1 + B2 可平行(B1 簡單,B2 是主要工作)
+- B3 等 B2
+- B4 等 B3
+
+預估總工時:~7.5 工作日(單人)。若可雙人並行,A 和 B 可分工,壓到 ~5d。
+
+### 與 visionA 端的 dependency
+
+| Backend 任務狀態 | visionA 端可以做什麼 |
+|---------------|------------------|
+| Phase A 完成、deploy stage | visionA 可以打 stage converter 的既有 endpoint 驗 API key 流程 |
+| Phase B 完成、deploy stage | visionA 可以打 `/result` endpoint 驗 streaming |
+| Phase A + B 都 deploy 完 | e2e 驗證(visionA repo commit 9e29ebf 已 ready) |
 
 ---
 
-## 9. 配置管理(完整環境變數清單)
+## 7.
測試策略
 
-| 變數 | 必填 | 預設 | 說明 |
-|------|------|------|------|
-| `PORT` | ❌ | `4000` | Scheduler listen port |
-| `NODE_ENV` | ❌ | `development` | Node 環境 |
-| `REDIS_URL` | ✅ | `redis://redis:6379` | Redis 連線 |
-| `JOB_DATA_DIR` | ❌ | `/data/jobs` | 舊 local 模式路徑 |
-| `FRONTEND_URL` | ❌ | `http://localhost:3000` | CORS |
-| `STORAGE_BACKEND` | ❌ | `local` | `local` / `minio` |
-| `MINIO_*` | 依 STORAGE_BACKEND | — | 既有 MinIO 參數 |
-| **新增(Phase 1)**| | | |
-| `MC_ISSUER` | ✅ | — | Member Center issuer URL |
-| `MC_JWKS_URL` | ✅ | — | JWKS endpoint |
-| `MC_TOKEN_URL` | ✅ | — | token endpoint |
-| `KNERON_CONVERTER_AUDIENCE` | ✅ | `kneron_converter_api` | 接受的 aud |
-| `KNERON_CONVERTER_CLIENT_ID` | ✅ | — | Converter 作為 client |
-| `KNERON_CONVERTER_CLIENT_SECRET` | ✅ | — | 嚴禁進 Git |
-| `FILE_ACCESS_AGENT_BASE_URL` | ✅ | — | File Access Agent URL(僅 promote 使用)|
-| `FILE_ACCESS_AGENT_AUDIENCE` | ✅ | `file_access_api` | File Access Agent 的 aud(僅 promote 使用)|
-| `CONVERTER_TENANT_ID` | ❌ | `""` | 若空則不做 tenant 檢查 |
-| `CONVERTER_SCOPE_WRITE` | ❌ | `converter:job.write` | 可覆寫 |
-| `CONVERTER_SCOPE_READ` | ❌ | `converter:job.read` | 可覆寫 |
-| `API_V1_RATE_LIMIT_WINDOW_MS` | ❌ | `300000` | 5 min |
-| `API_V1_RATE_LIMIT_MAX` | ❌ | `300` | 每 client_id |
-| `MULTIPART_MODEL_MAX_BYTES` | ❌ | `524288000` | `POST /api/v1/jobs` 模型檔大小上限(500MB,可覆寫)|
-| `MULTIPART_REF_IMAGES_MAX_COUNT` | ❌ | `100` | `POST /api/v1/jobs` ref_images 數量上限 |
-| `PROMOTE_TIMEOUT_MS` | ❌ | `300000` | promote 單檔 timeout |
+詳見 `performance.md` §7 + 各 `api/*.md` 的 test 章節。
 
-**Secret 管理**:`KNERON_CONVERTER_CLIENT_SECRET` 禁止進 Git。dev 用 `.env`,prod 建議由 Docker secrets / K8s secrets 注入。
+### 7.1 Unit test 覆蓋率目標
+
+- `apiKeyMiddleware`:100%(少量 code、必須全 cover)
+- `result.js`:90%
+- 既有 OAuth-related 改動:維持 ≥ 85%
+
+### 7.2 Integration test 必跑
+
+- API key 4 情境(happy / missing / wrong / 503)
+- 既有 jobs / promote 在 API key 模式下仍過
+- `/result` 8 情境(見 `api/api-result.md` §7.1)
+
+### 7.3 Manual stage e2e(部署後)
+
+- curl
驗:`/health`、`POST /jobs`、`GET /jobs/:id`、`POST /promote`、`GET /result`
+- visionA 端 e2e:完整 upload → poll → promote → download
 
 ---
 
-## 10. 向後相容與遷移
+## 8. 安全注意事項
 
-### 10.1 既有路徑行為(不變)
+詳見 `security.md`。重點:
 
-| 路徑 | Phase 1 行為 |
-|------|------------|
-| `POST /jobs` (multipart) | **不變**,繼續接收 Web UI 上傳 |
-| `GET /jobs/:id` | **不變**,`origin=web` 的 job 不過濾,`origin=api` 的 job 也看得到(內部 vhost 無授權,看不到差別)|
-| `GET /jobs/:id/events` (SSE) | **不變**,Web UI 繼續用 |
-| `GET /jobs/:id/download/:filename` | **不變**,Web UI 下載結果 |
-| `GET /jobs` | **不變**,列全部 |
-| `GET /health`, `GET /queues/stats` | **不變** |
-
-### 10.2 Web UI 何時遷移
-
-**非本次範圍**。未來若決定把 Web UI 也納入 OAuth,屬於獨立的 L 級任務,需要設計 Member Center 登入流程、token refresh 等 UX 細節。
-
-### 10.3 `STORAGE_BACKEND=local` 模式
-
-既有 local 模式(Shared Volume)保留運作。新 API 要求 `STORAGE_BACKEND=minio`,因為:
-- 從 multipart 收到的 buffer 要寫到某個 bucket 供 Worker 讀取
-- Shared Volume 路徑跨 container 複雜,未來跨主機部署也不適合
-
-**實作檢查**:`POST /api/v1/jobs` 啟動時檢查 `STORAGE_BACKEND === 'minio'`,若非則 500 `misconfiguration`。
+- `CONVERTER_API_KEY` 不進 git / log / Slack
+- `constant-time compare`(防 timing attack)
+- Sec C1 暫緩(`.env` history rewrite + secret rotation 在 Phase 1 ready 後做、含 CONVERTER_API_KEY)
+- Trust boundary:visionA 一旦被 compromise 可冒充任意 user_id(接受、與 OAuth 模型一致)
 
 ---
 
-## 11. 測試策略
+## 9.
風險與待確認 -### 11.1 Unit test(Jest / Mocha) - -- `auth/jwks.js`:mock JWKS 回應,測過期、簽章錯、aud 錯、scope 不足 -- `auth/oauthClient.js`:mock token endpoint,測 cache 命中、過期重取、失敗處理 -- `fileAccessAgent/client.js`:mock fetch,測 PUT 5xx 重試、401 invalidate 重試、timeout -- `services/jobService.js`:測 claim_active_job 的並發(模擬兩個 user_id 相同同時建 job) -- `routes/v1/jobs.js` multipart validation:mock `multer`,測超過 500MB、缺 `model`、model_id 非數字、platform 不在 enum、user_id 含 `/` -- Response schema 映射(內部 status → 對外 status + stage) - -### 11.2 Integration test - -- **Member Center mock**:用 `wiremock` 或手寫 Express mock 模擬 JWKS + token endpoint -- **File Access Agent mock**:模擬 PUT 的成功 / 失敗回應(promote 用) -- **Redis**:用真 Redis(docker-compose test 環境) -- **multipart 上傳**:用 `supertest` + `attach('model', buffer, 'model.onnx')` 測試真實 multipart 流程(小檔、中檔、邊界檔 499MB / 501MB) - -### 11.3 E2E test(黑箱) - -- 需真 Member Center + File Access Agent 測試環境(Phase 1 kickoff 前準備) -- 測試案例: - 1. 完整流程:multipart 上傳 → polling → promote 成功 - 2. 409 測試:同 user 連續建 job - 3. 權限測試:invalid token / 缺 scope / 錯 aud - 4. 錯誤路徑:上傳超過 500MB → 413、缺 `model` file → 400、promote File Access Agent 500 → 502 - 5. 多檔案大小測試:小檔(1MB)、中檔(50MB)、大檔(200MB、500MB)分別驗證 p95 - -### 11.4 負載測試 - -- `POST /api/v1/jobs` 不需高 QPS(實際使用量一個 user 分鐘級),但需驗證大檔 multipart 不會 OOM(測試 10 個 user 同時上傳 200MB) -- `GET /api/v1/jobs/:id` 是熱點(polling),測每秒 100 req per Scheduler instance -- p95 < 200ms 驗證(GET),p95 < 5s / 12s 驗證(POST 200MB / 500MB) +| # | 風險 | 影響 | 行動 | +|---|------|------|------| +| R1 | CONVERTER_API_KEY rotation 流程未自動化 | 低 | Phase 1 接受手動 | +| R2 | `/result` 高並發 stream 壓力 | 低 | NEF 通常小、visionA 是唯一 caller、QPS 可控 | +| R3 | Sec C1 暫緩(.env 進 git history) | 中 | Phase 1 ready 收尾後 rewrite | +| R4 | NEF 7 天過期後 client 重新轉檔 | 低 | API spec 已定義 410,visionA 端處理 | +| R5 | Phase 0.8b 部署期間「OAuth → API key」短暫不可用 | 低 | 既有 stage OAuth 從未跑通、不會有 regression | --- -## 12. 實作任務拆分(按 Autoflow 增量式開發規範) +## 10. 後續步驟 -每個任務 = 一個可獨立 review 的單位。Reviewer 會逐個審查。 - -| # | 任務 | 依賴 | 可並行? 
| 預估 | 驗收標準 | -|---|------|------|---------|------|---------| -| T1 | auth middleware + JWKS 驗證 | — | — | 3d | unit test 全過,能在空 route 上驗 mock token | -| T2 | Converter OAuth client(client_credentials + cache)| — | ✅ 與 T1 平行 | 2d | unit test 過,能對 mock token endpoint 取到並 cache | -| T3 | 新 `/api/v1/*` 路由骨架 + 錯誤格式統一 + request_id middleware | T1 | — | 2d | 所有新端點可通,回 501 是正常路徑 | -| T4 | POST /api/v1/jobs(multer 接收 multipart、寫 MinIO、active job 鎖、enqueue)| T1, T3 | — | 3d | 能建 job、409 正常、413 正常、回滾正常、大檔不 OOM | -| T5 | GET /api/v1/jobs + GET /api/v1/jobs/:id(含 ETag、client 隔離、user 索引)| T1, T3, T4 | ✅ 與 T6 平行 | 3d | Recovery 查詢正確、ETag 304 可用 | -| T6 | POST /api/v1/jobs/:id/promote(含 stream PUT、重試、FAA client)| T1, T2, T3 | ✅ 與 T5 平行 | 4d | 促進成功、冪等、失敗可重試 | -| T7 | 部署分流(Nginx 雙 vhost 設定 + docker-compose 更新)| — | ✅ 與 T1-T6 平行 | 1d | 內網可達 `/jobs`,公網只可達 `/api/v1/*` | -| T8 | OpenAPI 3.0 spec(手寫)+ 錯誤碼完整文件 | T3-T6 | — | 2d | spec lint 過,visionA-backend 能直接 import | - -**預估總工時**:3-4 人週(單人序列執行),若 2 人並行可壓到 2 週。對齊 PRD RICE Effort=4 的估算(較原估算略減,因為 T4 不再需要實作 FAA GET / HEAD 分支)。 - -**外部依賴觸發**: -- T1 需要 Member Center JWKS URL(可用 mock) -- T6 需要 File Access Agent 測試環境(或 mock PUT endpoint) -- T7 需要使用者確認部署拓撲 +1. 本 TDD 索引 + 子檔案送 PM / Design 三方互審 +2. 使用者審核 +3. Backend Agent 依 §6 的任務拆分增量開發 +4. Reviewer 每個任務把關 +5. Testing 整合測試 + e2e +6. DevOps 部署(converter 先 + visionA 後) --- -## 13. 
未解決 / 待確認事項(TBD) - -| # | 項目 | 影響 | 待誰確認 | -|---|------|------|---------| -| TBD-1 | Member Center 的 `tenant_id` claim 是否出現在 client_credentials token | T1 設定 | Member Center owner | -| TBD-2 | `kneron_converter_api` audience / `kneron_converter` client / scope 的最終命名 | T1, T2 | Member Center owner | -| TBD-3 | File Access Agent 的 base URL(測試環境、prod 環境)與 tenant_id | T6 | File Access Agent owner | -| TBD-4 | Rate limit 的實際值(300 req / 5min 是估算,需觀測後校準)| 上線後調整 | 觀測資料 | -| TBD-5 | Nginx 雙 vhost 的具體 IP / hostname(依部署拓撲)| T7 | 使用者 / DevOps | -| TBD-6 | stage_progress 的顆粒度(Worker 是否有能力回報 stage 內 %)| P2 feature | Worker 開發團隊 | - ---- - -## 14. 附錄:Error code 完整表 - -| Code | HTTP | 說明 | -|------|------|------| -| `validation_error` | 400 | 欄位格式錯誤(multipart field 缺漏、model_id 非數字、platform 不在 enum 等)| -| `invalid_multipart` | 400 | multipart parse 失敗、缺必要 file、副檔名不符 | -| `invalid_token` | 401 | JWT 無效 / 簽章錯 / 缺 claim | -| `token_expired` | 401 | JWT 過期 | -| `insufficient_scope` | 403 | scope 不足 | -| `tenant_mismatch` | 403 | tenant_id 不符 | -| `job_not_found` | 404 | job 不存在或不屬於 client(避免資訊洩露)| -| `not_found` | 404 | 路徑不存在 | -| `user_has_active_job` | 409 | 同 user 已有 in-progress job | -| `job_not_ready_for_promote` | 409 | promote 時 job 非 completed | -| `source_not_available` | 409 | promote 的 source stage 沒產出 | -| `file_too_large` | 413 | multipart 上傳超過 500MB(由 multer `LIMIT_FILE_SIZE` 觸發)| -| `invalid_object_key` | 422 | target_object_key 格式不合法 | -| `misconfiguration` | 500 | 伺服器設定錯誤(例:STORAGE_BACKEND 錯)| -| `storage_unavailable` | 502 | MinIO 寫入失敗(`POST /api/v1/jobs` 寫 input 時)| -| `internal_error` | 500 | 其他未分類 | -| `not_implemented` | 501 | Phase 2 功能 | -| `file_gateway_unavailable` | 502 | File Access Agent 失敗(僅 promote 使用)| -| `auth_service_unavailable` | 503 | Member Center 取 token 失敗(僅 promote 使用)| -| `service_unavailable` | 503 | 其他依賴失敗 | - ---- - -## 15. 
附錄:請求 / 回應速查 - -### 建 job(multipart) -```bash -curl -X POST https://converter.innovedus.com/api/v1/jobs \ - -H "Authorization: Bearer $TOKEN" \ - -F "model=@./model.onnx" \ - -F "user_id=u-12345" \ - -F "model_id=1001" \ - -F "version=0001" \ - -F "platform=520" \ - -F "enable_evaluate=false" \ - -F "enable_sim_fp=false" \ - -F "enable_sim_fixed=false" \ - -F "enable_sim_hw=false" -``` - -**含參考圖片**(可重複 `-F "ref_images[]=@..."`): -```bash -curl -X POST https://converter.innovedus.com/api/v1/jobs \ - -H "Authorization: Bearer $TOKEN" \ - -F "model=@./model.onnx" \ - -F "ref_images[]=@./img_0.jpg" \ - -F "ref_images[]=@./img_1.jpg" \ - -F "user_id=u-12345" \ - -F "model_id=1001" \ - -F "version=0001" \ - -F "platform=520" -``` - -### 查 job -```bash -curl -H "Authorization: Bearer $TOKEN" \ - https://converter.innovedus.com/api/v1/jobs/550e8400-... -``` - -### Recovery -```bash -curl -H "Authorization: Bearer $TOKEN" \ - 'https://converter.innovedus.com/api/v1/jobs?user_id=u-12345&status=in_progress' -``` - -### Promote -```bash -curl -X POST https://converter.innovedus.com/api/v1/jobs/550e8400-.../promote \ - -H "Authorization: Bearer $TOKEN" \ - -H "Content-Type: application/json" \ - -d '{ - "targets": [ - {"source": "nef", "target_object_key": "visionA/models/u-12345/m-1001/v0001/out.nef"} - ] - }' -``` - ---- - -## 16. 變更記錄 +## 11. 
變更記錄 | 日期 | 版本 | 變更 | 作者 | |------|------|------|------| -| 2026-04-25 | Draft 1.0 | 初版,Phase 1 完整規格 | Architect Agent | -| 2026-04-25 | Draft 1.1 | POST /api/v1/jobs 改 multipart/form-data;移除 FAA getFile/headFile 實作、`files:download.read`/`files:metadata.read` scope、`input_object_key` 欄位、`input_not_found` error code;新增 `invalid_multipart`/`file_too_large`/`storage_unavailable` error codes;TBD-1 刪除、TBD 重新編號;§2.5 File Access Agent client 僅保留 putFile;§2.8 POST jobs 流程改為 multer 接收→寫 MinIO;§6 FAA 整合精簡為僅 PUT | Architect Agent | +| 2026-04-25 | Draft 1.0 | 初版,Phase 1 完整規格(單檔 1390 行) | Architect Agent | +| 2026-04-25 | Draft 1.1 | Multipart 上傳路徑改 | Architect Agent | +| 2026-05-16 | Draft 2.0 | **Phase 0.8b 重寫**:API key + /result + 模組化拆分為索引 + 8 個子檔案 | Architect Agent | --- -**注意**:本 TDD 約 1390 行,已超過拆分門檻甚多。本次更新聚焦內容修正,暫不拆分;下輪更新強烈建議拆分為: -- `TDD.md`(索引) -- `TDD-api.md`(§1、§14、§15) -- `TDD-backend.md`(§2、§3、§4) -- `TDD-integration.md`(§5、§6) -- `TDD-infra.md`(§7、§9) -- `TDD-testing.md`(§11) +**附註**:本 TDD 從 1390 行單檔重組為 ~180 行索引 + 8 個子檔案。每個子檔案 < 500 行(單一職責),可獨立給 Backend / Reviewer / Testing 不同角色讀對應檔案、減少 context 負擔。 diff --git a/docs/autoflow/04-architecture/api/api-jobs.md b/docs/autoflow/04-architecture/api/api-jobs.md new file mode 100644 index 0000000..c2a6ad1 --- /dev/null +++ b/docs/autoflow/04-architecture/api/api-jobs.md @@ -0,0 +1,295 @@ +# API: `/api/v1/jobs`(POST / GET / GET :id) + +> **狀態**:Phase 1 完工(OAuth)→ Phase 0.8b 換 auth(API key);其他流程不變。 +> +> **配套**:`auth.md`、`api/api-result.md`、`api/api-promote.md`、`database.md`。 + +--- + +## 1. 通用約定 + +- **Base URL**:`https:///api/v1` +- **Content-Type**: + - `POST /api/v1/jobs`:`multipart/form-data` + - 其他 GET:response 為 `application/json; charset=utf-8` +- **時間格式**:ISO 8601 UTC +- **ID 格式**:`job_id` UUIDv4 +- **認證**:`Authorization: Bearer `(除 `/health` 外全部必要) +- **Request ID**:若 client 傳 `X-Request-Id`,回應帶同一值;未傳則 server 產 UUIDv4 +- **速率限制**:per `client_id`(API key 模式下固定 `visionA-service`)300 req / 5min + +--- + +## 2. 
統一錯誤格式 + +```json +{ + "error": { + "code": "string_code", + "message": "human readable message (zh-TW)", + "details": { /* 可選 */ }, + "request_id": "uuid-v4" + } +} +``` + +--- + +## 3. `GET /health`(不需 auth) + +**Response 200**: +```json +{ + "service": "kneron-converter-api", + "status": "healthy", + "version": "1.0.0", + "timestamp": "2026-05-16T12:00:00Z", + "dependencies": { + "redis": "connected", + "file_access_agent": "reachable" + } +} +``` + +**Phase 0.8b 變動**:移除 `member_center` dependency(不再驗 MC token);保留 `file_access_agent`(promote 用)+ `redis`。 + +**Response 503**:任一 critical dependency 失敗。 + +--- + +## 4. `POST /api/v1/jobs` + +### 4.1 Request + +```http +POST /api/v1/jobs +Authorization: Bearer +Content-Type: multipart/form-data; boundary=----... +X-Request-Id: (optional) + +------... +Content-Disposition: form-data; name="model"; filename="model.onnx" +Content-Type: application/octet-stream + + +------... +Content-Disposition: form-data; name="ref_images[]"; filename="img_0.jpg" +Content-Type: image/jpeg + + +------... +Content-Disposition: form-data; name="user_id" + +visionA-user-12345 +------... +Content-Disposition: form-data; name="model_id" + +1001 +------... +Content-Disposition: form-data; name="version" + +0001 +------... 
+Content-Disposition: form-data; name="platform" + +520 +------...-- +``` + +### 4.2 Multer 設定 + +- `multer.memoryStorage()` +- `limits.fileSize`: 500MB(`MULTIPART_MODEL_MAX_BYTES` env 可覆寫) +- `fields`: `model`(1 個 file)、`ref_images[]`(`maxCount: 100`) + +### 4.3 欄位定義 + +| 欄位 | 類型 | 位置 | 必填 | 驗證 | +|------|------|------|------|------| +| `model` | file | multipart file | ✅ | 副檔名 ∈ {`.onnx`, `.pt`, `.pth`, `.tflite`, `.h5`, `.pb`};大小 ≤ 500MB | +| `ref_images[]` | file[] | multipart file | ❌ | `image/*`;最多 100 張;單張 ≤ 10MB | +| `user_id` | string | multipart field | ✅ | 1-128 字元,嚴格白名單 `^[A-Za-z0-9._-]+$`,不含 `..` | +| `model_id` | string → int | multipart field | ✅ | 轉 int 後 1 ≤ x ≤ 65535 | +| `version` | string | multipart field | ✅ | 1-32 字元,嚴格白名單 `^[A-Za-z0-9._-]+$` | +| `platform` | string | multipart field | ✅ | enum: `520`, `720`, `530`, `630`, `730` | +| `enable_evaluate` | string `'true'`/`'false'` | multipart field | ❌ | 預設 `'false'` | +| `enable_sim_fp` | string `'true'`/`'false'` | multipart field | ❌ | 預設 `'false'` | +| `enable_sim_fixed` | string `'true'`/`'false'` | multipart field | ❌ | 預設 `'false'` | +| `enable_sim_hw` | string `'true'`/`'false'` | multipart field | ❌ | 預設 `'false'` | +| `metadata` | string(JSON)| multipart field | ❌ | 合法 JSON 物件字串 | + +### 4.4 Middleware 順序(**勿改**) + +``` +requireApiKey() + ↓ +perClientLimiter(per client_id rate limit) + ↓ +uploadConcurrencySemaphore(per-process MAX_CONCURRENT_UPLOADS) + ↓ +uploader.fields([{ name: 'model', maxCount: 1 }, { name: 'ref_images[]', maxCount: 100 }]) + ↓ +multerErrorAdapter(捕 multer LIMIT_FILE_SIZE → 413) + ↓ +createJobHandler +``` + +**理由**:API key middleware 必須在 multer 之前(避免未驗證就 parse 500MB 大檔);rate limiter 第二(超 quota 不該吃 multipart);upload concurrency semaphore 第三(防 OOM);multer 最後(auth + quota + concurrency 三重通過後才 parse)。 + +### 4.5 Response 201 Created + +```json +{ + "job_id": "550e8400-e29b-41d4-a716-446655440000", + "status": "created", + "stage": "onnx", + "progress": 0, + 
"created_at": "2026-05-16T12:00:00Z", + "expires_at": "2026-05-23T12:00:00Z", + "user_id": "visionA-user-12345" +} +``` + +### 4.6 Error responses + +| HTTP | error.code | 情境 | +|------|-----------|------| +| 400 | `validation_error` | 欄位缺漏或格式錯誤 | +| 400 | `invalid_multipart` | multipart parse 失敗、缺必要 file、副檔名不符 | +| 401 | `invalid_token` | API key 缺 / 不符 | +| 409 | `user_has_active_job` | user_id 已有進行中 job | +| 413 | `file_too_large` | model 檔超過 500MB | +| 500 | `misconfiguration` | `STORAGE_BACKEND !== 'minio'` | +| 502 | `storage_unavailable` | MinIO 寫入失敗 | +| 503 | `service_unavailable` | upload concurrency semaphore 滿、API key 未配置 | +| 503 | `service_unavailable` | upload semaphore 滿(含 `Retry-After` header) | + +--- + +## 5. `GET /api/v1/jobs/:id` + +### 5.1 Request + +```http +GET /api/v1/jobs/550e8400-... +Authorization: Bearer +If-None-Match: "etag-value" (optional) +``` + +### 5.2 Response 200 + +```json +{ + "job_id": "550e8400-...", + "user_id": "visionA-user-12345", + "status": "running", + "stage": "bie", + "progress": 45, + "stage_progress": 60, + "created_at": "...", + "updated_at": "...", + "expires_at": "...", + "stage_timings": { + "onnx": { "started_at": "...", "completed_at": "..." 
}, + "bie": { "started_at": "...", "completed_at": null }, + "nef": null + }, + "input": { + "filename": "model.onnx", + "object_key": "jobs/.../input/model.onnx", + "size_bytes": 204800000, + "ref_images_count": 0 + }, + "result_object_keys": null, + "error": null, + "parameters": { /* model_id, version, platform, enable_* */ }, + "metadata": {}, + "estimated_completion_at": null +} +``` + +### 5.3 狀態機(對外 `status` 欄位) + +- `created` — 剛建立,等第一階段開工 +- `running` — 正在某個 stage(`stage` 欄位有值) +- `completed` — 全部完成(`result_object_keys` 有值,`stage=null`) +- `failed` — 失敗(`error` 有值) + +**內部 → 對外映射**(statusMapper): + +| 內部 status | 對外 `status` + `stage` | +|------------|----------------------| +| `ONNX` + `stage_timings.onnx.started_at == null` | `created` + stage=onnx | +| `ONNX` + `started_at != null` | `running` + stage=onnx | +| `BIE` | `running` + stage=bie | +| `NEF` | `running` + stage=nef | +| `COMPLETED` | `completed` + stage=null | +| `FAILED` | `failed` + stage=<最後階段> | + +### 5.4 ETag 支援 + +ETag = hash(`job.updated_at`)。If-None-Match 吻合 → 304 Not Modified(省 body)。 + +### 5.5 Error responses + +| HTTP | error.code | 情境 | +|------|-----------|------| +| 401 | `invalid_token` | API key 缺 / 不符 | +| 404 | `job_not_found` | job 不存在 / 不屬於 client(避免資訊洩露)| + +--- + +## 6. `GET /api/v1/jobs`(列表 / Recovery) + +### 6.1 Query 參數 + +| 參數 | 類型 | 必填 | 說明 | +|------|------|------|------| +| `user_id` | string | **✅** | 過濾 user_id(強制必填,避免全掃)| +| `status` | string | ❌ | `in_progress` (= `created` ∪ `running`) / `completed` / `failed` / `all`(預設 `all`)| +| `limit` | int | ❌ | 預設 20,上限 100 | +| `offset` | int | ❌ | 預設 0 | +| `created_after` | ISO 8601 | ❌ | 過濾 `created_at >= created_after` | + +### 6.2 Response 200 + +```json +{ + "total": 2, + "limit": 20, + "offset": 0, + "items": [ + { /* 同 GET /jobs/:id 格式,精簡版 */ } + ] +} +``` + +### 6.3 實作 + +以 `user:{user_id}:jobs` Set 為索引,避免全掃 `KEYS job:*`。 + +--- + +## 7. 
Phase 2 預留端點(Phase 1 + 0.8b 回 501) + +| 方法 | 路徑 | 說明 | +|------|------|------| +| POST | `/api/v1/jobs/:id/download-tokens` | Phase 2 預留(未來 browser 直連 download 用)| +| DELETE | `/api/v1/jobs/:id` | 取消 job(Phase 2/3)| + +兩個都回 501 `not_implemented`。 + +--- + +## 8. 端點清單總表 + +| 方法 | 路徑 | 說明 | Auth | +|------|------|------|------| +| GET | `/health` | 健康檢查 | — | +| POST | `/api/v1/jobs` | 建立轉檔 job | API key | +| GET | `/api/v1/jobs` | 列出 job(user_id 必填)| API key | +| GET | `/api/v1/jobs/:id` | 單一 job 狀態 | API key | +| POST | `/api/v1/jobs/:id/promote` | 搬檔到 FAA | API key | +| GET | `/api/v1/jobs/:id/result` | **NEW** stream NEF | API key | +| POST | `/api/v1/jobs/:id/download-tokens` | Phase 2,回 501 | API key | +| DELETE | `/api/v1/jobs/:id` | Phase 2,回 501 | API key | diff --git a/docs/autoflow/04-architecture/api/api-promote.md b/docs/autoflow/04-architecture/api/api-promote.md new file mode 100644 index 0000000..300f78c --- /dev/null +++ b/docs/autoflow/04-architecture/api/api-promote.md @@ -0,0 +1,183 @@ +# API: `POST /api/v1/jobs/:id/promote` + +> **狀態**:Phase 1 完工 — Phase 0.8b 完全保留,只是對外 auth 換成 API key(converter → FAA 仍走 OAuth client_credentials)。 +> +> **配套**:`auth.md` §2(converter → FAA OAuth client 設計)。 + +--- + +## 1. 用途 + +把 Converter Bucket 中的轉檔結果檔(onnx / bie / nef)PUT 到 FAA NAS Bucket(長期儲存)。 + +--- + +## 2. 
Request + +```http +POST /api/v1/jobs/550e8400-.../promote +Authorization: Bearer +Content-Type: application/json + +{ + "targets": [ + { + "source": "nef", + "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.nef" + }, + { + "source": "bie", + "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.bie" + } + ] +} +``` + +### 2.1 Body + +| 欄位 | 類型 | 必填 | 說明 | +|------|------|------|------| +| `targets` | array | ✅ | 至少 1 個,最多 10 個 | +| `targets[].source` | string | ✅ | enum: `onnx`, `bie`, `nef` | +| `targets[].target_object_key` | string | ✅ | FAA 的目標 key(visionA 決定命名);長度 ≤ 1024、不可含 `..` / `\` / 控制字元 / 開頭 `/` / `?` / `#` / `%` | + +--- + +## 3. Response 200 + +```json +{ + "job_id": "550e8400-...", + "promoted": [ + { + "source": "nef", + "target_object_key": "visionA/models/user-12345/model-1001/v0001/out.nef", + "size_bytes": 10485760, + "file_access_agent_etag": "abc123", + "promoted_at": "2026-05-16T12:30:00Z" + }, + { + "source": "bie", + "target_object_key": "...", + "size_bytes": 5242880, + "file_access_agent_etag": "def456", + "promoted_at": "..." + } + ] +} +``` + +--- + +## 4. Error Responses + +| HTTP | error.code | 情境 | +|------|-----------|------| +| 400 | `validation_error` | targets 格式錯、source 非合法 stage、duplicate source | +| 401 | `invalid_token` | API key 缺 / 不符 | +| 404 | `job_not_found` | job 不存在 | +| 409 | `job_not_ready_for_promote` | status != COMPLETED(`details.current_status`)| +| 409 | `source_not_available` | job 沒產這個 stage 的結果 | +| 422 | `invalid_object_key` | target_object_key 格式不合法(含 reason)| +| 502 | `file_gateway_unavailable` | FAA PUT 失敗(4xx / 5xx / timeout 已重試 3 次)| +| 502 | `storage_unavailable` | MinIO HEAD / GET 失敗 | +| 503 | `auth_service_unavailable` | 取 FAA token 失敗(401 已 invalidate + retry 仍失敗)| + +--- + +## 5. 冪等性 + +`promote` 對同樣 `target_object_key` PUT 兩次結果一樣(FAA 會覆蓋)。 + +**Two-layer 冪等性**(保留 Phase 1 實作): + +1. 
**Job-level**:`job.promoted === true` → 直接回 200 + 既有 `promoted_object_keys`,不重打 FAA +2. **FAA-level**:FAA PUT 本身冪等,重試安全 + +--- + +## 6. 實作流程 + +``` +1. requireApiKey() → 401 +2. perClientLimiter → 429 +3. validate body → 400 / 422 +4. jobService.getJob(id) + client 隔離 → 404 +5. 冪等性 check(job.promoted === true → return 200) +6. status === 'COMPLETED' check → 409 +7. for each target (序列): + a. getJobOutputKey(job, target.source) → 409 source_not_available + b. minio.headObject(sourceKey) → 502 storage_unavailable + c. oauthClient.getServiceToken('files:upload.write') ← OAuth client(保留) + d. faaClient.putFile(targetKey, streamFactory, ...) → 502 / 503 + e. 收集 promoted result +8. jobService.markPromoted(jobId, ...) → log ERROR if 失敗(但 client 仍回 200,因為檔案實際已搬完) +9. return 200 + { job_id, promoted: [...] } +``` + +--- + +## 7. 重要決策(保留 Phase 1) + +### 7.1 序列 promote 各 target + +**為什麼序列**: +- FAA 端對單一 client 並發可能有限制 +- 失敗時容易判斷哪個 target 已成功 +- 大檔串流並發會放大記憶體 / CPU 壓力 + +### 7.2 Stream factory pattern + +`faaClient.putFile` 接受 `streamFactory: () => Promise`,每次 attempt 才呼叫 `minio.getObjectStream` 拿新 stream。 + +**為什麼**:HTTP body 不可 replay;attempt #1 5xx 失敗,attempt #2 必須拿新 stream。 + +### 7.3 Target_object_key 安全檢查 + +拒絕: +- 空字串、超長(> 1024) +- 開頭 `/`(避免被 FAA 解讀為絕對路徑) +- 含 `..`(路徑穿越) +- 含 `\`(Windows 路徑 / URL 注入) +- 含 `\0` / 控制字元(`\x00-\x1F`、`\x7F`) +- 含 `?` / `#`(URL query / fragment 注入) +- 含 `%`(雙重編碼攻擊,避免 `%2E%2E` 解碼為 `..`) + +### 7.4 FAA 錯誤分類 + +| FAA 錯誤 | 轉換成 v1 ApiError | +|---------|-------------------| +| `FAAUnauthorizedError`(已 retry 仍 401)| 503 `auth_service_unavailable` | +| `FAAClientError`(4xx 非 401)| 502 `file_gateway_unavailable`(拒絕細節,避免洩漏 FAA 內部訊息)| +| `FAAServerError`(5xx)/ `FAATimeoutError` | 502 `file_gateway_unavailable` | +| 其他 | 500 `internal_error` | + +### 7.5 FAA 重試策略 + +- 4xx 非 401:不重試(client error,重試無益) +- 401:`oauthClient.invalidate(scope)` + retry 1 次;仍 401 → 503 +- 5xx / timeout / network:重試 2 次(exponential backoff 500ms / 2000ms);全失敗 → 502 + +### 7.6 
markPromoted 失敗的處理 + +FAA 已成功(檔案在 NAS 上)但 Redis `markPromoted` 失敗: + +- Log ERROR +- 仍回 200 給 client(檔案實際已搬完) +- 下次 promote 同 job 時 `markPromoted` 會再嘗試(FAA PUT 冪等) +- 副作用:client 後續呼叫不會走 idempotent path、會再 PUT 一次(無害) + +--- + +## 8. Curl 範例 + +```bash +curl -X POST https://converter.innovedus.com/api/v1/jobs/550e8400-.../promote \ + -H "Authorization: Bearer $CONVERTER_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "targets": [ + {"source": "nef", "target_object_key": "visionA/models/u-12345/m-1001/v0001/out.nef"} + ] + }' +``` diff --git a/docs/autoflow/04-architecture/api/api-result.md b/docs/autoflow/04-architecture/api/api-result.md new file mode 100644 index 0000000..e77d76e --- /dev/null +++ b/docs/autoflow/04-architecture/api/api-result.md @@ -0,0 +1,1513 @@ +# API: `GET /api/v1/jobs/:id/result`(Phase 0.8b 新增、Phase B 設計 2026-05-17 強化) + +> **狀態**:Phase 0.8b 新增,取代原 Phase 2 delegated download token 設計;Phase B 啟動前經 Security review 強化 streaming 攻擊面 mitigation。 +> **配套**:visionA repo `adr-016-download-via-converter.md` v1.0、`design-doc.md` §3.3 / ADR-011。 +> **Phase B 設計強化來源**(2026-05-17): +> - Security review:`.autoflow/07-delivery/security-design-review-phase-b-2026-05-17.md`(4 Major + 3 Minor + 2 Suggestion) +> - Architect 採納範圍:M1-M4(rate limit + bandwidth quota + Range + stream timeout + concurrent cap)+ m1-m3(quote-escape + 429/503 status + audit log 欄位) +> - 詳見 §9(rate limit + bandwidth quota)、§10(Range)、§11(audit log)、§13.4a(filename defense-in-depth)、§15(streaming resource limits)、§14(acceptance criteria AC-1 到 AC-12) + +--- + +## 1. 用途 + +visionA-backend 用此 endpoint 從 Converter Bucket 直接拿 NEF 結果檔(streaming proxy)。 + +取代原本「visionA → 拿 delegated download token → FAA」路徑(該路徑因 MC 沒實作 endpoint 而從未跑通)。 + +--- + +## 2. 
Request + +```http +GET /api/v1/jobs/{id}/result HTTP/1.1 +Host: converter.innovedus.com +Authorization: Bearer +X-Request-Id: (optional) +``` + +### 2.1 Path params + +| 欄位 | 類型 | 說明 | +|------|------|------| +| `id` | string (UUIDv4) | Job ID | + +### 2.2 Query / Body + +**無**(streaming endpoint 不支援額外參數)。 + +### 2.3 Auth + Rate Limit + Bandwidth Quota + Concurrent Cap + +- `Authorization: Bearer `(API key middleware,見 `auth.md` §1) +- **Rate limit**(詳見 §9、Security 2026-05-17 review 後改): + - Burst:5 req / 10 sec per `token_fingerprint` + - Sustained:20 req / min per `token_fingerprint` +- **Bandwidth quota**(詳見 §9): + - Hourly:1 GB / hr per `token_fingerprint` + - Daily:6 GB / 24hr per `token_fingerprint` +- **Concurrent stream cap**(詳見 §15):max 10 同時 stream(per-instance) +- **Stream timeout**(詳見 §15):5 分鐘(超時 destroy connection) + +--- + +## 3. Response 200(成功) + +```http +HTTP/1.1 200 OK +Content-Type: application/octet-stream +Content-Length: +Content-Disposition: attachment; filename="_.nef" +X-Request-Id: + + +``` + +### 3.1 Headers + +| Header | 規則 | +|--------|------| +| `Content-Type` | `application/octet-stream`(或 MinIO HEAD 回傳的 `contentType`,預設 octet-stream) | +| `Content-Length` | NEF 物件大小 bytes(從 MinIO HEAD 取);**必須帶**,visionA 端用來決定 timeout | +| `Content-Disposition` | `attachment; filename=""`,filename 規則見 §3.2 | +| `X-Request-Id` | 沿用 request_id middleware 設定的 ID | + +### 3.2 Filename 規則 + +**格式**:`_.nef` + +| 輸入 | 結果 | +|------|------| +| `source_filename = yolov5s.onnx`、`platform = '720'` | `yolov5s_720.nef` | +| `source_filename = model.pt`、`platform = '530'` | `model_530.nef` | +| `source_filename` 缺失(極端情境)| `job_.nef`(fallback) | +| `platform` 缺失(極端情境)| `job_.nef`(fallback) | + +**注意**:`job.platform` 為 createJob validator 接受的數字字串(如 `'720'` / `'530'` / `'520'`、無 `KL` prefix)、`buildFilename` 透過 `.toLowerCase()` 標準化(對純數字字串無變化、保留同樣的標準化邏輯以兼容未來可能的字母混用 platform)。 + +**實作邏輯**: + +```javascript +function buildFilename(job) { + const sourceFilename = 
job.source_filename || ''; + const platform = (job.platform || '').toLowerCase(); + const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, ''); + if (stem && platform) { + return `${stem}_${platform}.nef`; + } + return `job_${job.job_id || 'unknown'}.nef`; +} +``` + +**邊界情境**: +- `source_filename` 含特殊字元(已 sanitized 由 `sanitizeFilename`)— 不再二次 sanitize +- `platform` 大小寫 — 統一 lower-case(對齊 visionA `defaultDownloadFilename` 慣例) + +### 3.3 Body + +NEF binary stream(Node Stream pipe)。 + +**不要 buffer 整個檔**。NEF 可能 < 50MB(常見)至 數百 MB(極端),buffer 會 OOM。 + +--- + +## 4. Response 4xx / 5xx + +統一格式: + +```json +{ + "error": { + "code": "", + "message": "", + "details": { /* 可選 */ }, + "request_id": "" + } +} +``` + +### 4.1 失敗情境 + +| HTTP | error.code | 情境 | 訊息範例 | +|------|-----------|------|---------| +| 401 | `invalid_token` | API key missing / 格式錯 / 不符 | API key 驗證失敗 | +| 404 | `job_not_found` | jobID 不存在 | Job {jobId} not found | +| 404 | `result_not_found` | job 已 completed 但 result_object_keys 內沒 NEF | Job {jobId} completed but no NEF result available | +| 409 | `job_not_completed` | job 還沒 completed(still running / failed) | Job {jobId} is {status}; result only available after completion | +| 410 | `result_expired` | converter MinIO 已過期清除(7 天 `expires_at` 後)| Job {jobId} result expired at {expires_at}; re-convert to get a fresh result | +| 422 | `invalid_request` | path param 異常 | job id is required | +| **429** | **`rate_limit_exceeded`** | **req/min 或 burst 超限**(**Security 必補**) | **請求頻率過高,請稍後再試**(含 `limit_type: 'burst' \| 'sustained'`) | +| **429** | **`bandwidth_quota_exceeded`** | **1hr/24hr bandwidth quota 超限**(**Security 必補**)| **下載額度已用完,請稍後再試**(含 `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'`) | +| 502 | `storage_unavailable` | MinIO 連不上 / `getObjectStream` throw | 無法讀取結果檔,請稍後重試 | +| **503** | **`service_busy`** | **Concurrent stream cap 達到上限**(**Security 必補**)| **伺服器忙碌中,請稍後再試**(含 `limit_type: 'concurrent'`、`Retry-After: 30`)| +| **503** | 
**`stream_timeout`** | **response stream 超時(5 分鐘)**(**Security 必補**)| **下載逾時,請重試** | +| 503 | `service_unavailable` | API key 未配置 / 其他暫時性錯誤 | API key not configured | + +### 4.2 status code 選擇邏輯 + +``` +if (API key invalid) → 401 +if (job not in Redis) → 404 job_not_found +if (job.status !== 'completed') → 409 job_not_completed +if (job.expires_at < now) → 410 result_expired +if (no nefKey extractable) → 404 result_not_found +if (minio.getObjectStream throw) + - if MinIO not found error → 410 result_expired + - else → 502 storage_unavailable +``` + +### 4.3 順序的重要性 + +**先檢查 status 再檢查 expires_at**:若 job 還 running,回 409 比 410 更精確(resource 還在、只是還沒完成)。 + +**最後檢查 nefKey extractable**:404 `result_not_found` 是「job 完成但沒 NEF」的特殊情境,應該不會發生(因為 NEF 是最後一階段、completed 就一定有),但保險。 + +### 4.4 Stream 中斷處理 + +Stream 開始後(headers 已送出)若 MinIO stream 出錯: + +- **不能改 status code**(headers 已發) +- 唯一動作:`res.destroy(streamErr)` + log ERROR + client 看到 `ECONNRESET` +- Client(visionA)應實作 retry 邏輯 + +Client 主動關連線(`req.on('close')`): + +- 主動 `result.stream.destroy()` 釋放 MinIO connection +- Log INFO(不算錯) + +--- + +## 5. 
與既有 endpoint 的關係 + +### 5.1 Job lifecycle 對應 + +``` +created + ↓ (Worker 處理) +running (stage = onnx → bie → nef) + ↓ +COMPLETED + result_object_keys.nef 有值 + ↓ ───→ GET /jobs/:id/result → 200 + NEF stream + ↓ + (7 天後 expires_at 過了) + ↓ +expired(NEF 在 MinIO 已被 lifecycle 清掉,job record 可能還在 Redis) + ↓ ───→ GET /jobs/:id/result → 410 result_expired +``` + +### 5.2 與 `/promote` 的關係 + +`/result` 與 `/promote` **獨立**: + +- `/promote`:把 NEF 從 Converter Bucket 搬到 FAA NAS Bucket(長期儲存) +- `/result`:從 Converter Bucket streaming 給 caller + +→ visionA 可以同時打兩個(promote 後 NAS 有檔;result 立即下載給 user)。 + +NEF 在 Converter Bucket 7 天後過期清掉、FAA NAS Bucket 永久(由 FAA 端 lifecycle 管理)。**過期後 `/result` 回 410,client 該重新轉檔**(不應該 fallback 去 FAA 拿,那會繞回 delegated download token 死路)。 + +### 5.3 與 Phase 2 預留 `/download-tokens` 的關係 + +`POST /api/v1/jobs/:id/download-tokens` 在 Phase 2 預留(回 501)。**不衝突**: + +- `/download-tokens`:未來給 browser 直連 converter download 用的 short-TTL token +- `/result`:給 visionA backend stream proxy 用 + +兩個用途不同,可共存。Phase 0.8b 不啟用 `/download-tokens`。 + +--- + +## 6. 實作細節 + +### 6.1 NEF object key 解析(雙路徑) + +對齊 promote 流程的 `getJobOutputKey`: + +```javascript +function extractNefObjectKey(job) { + // 新格式 + if (job.result_object_keys + && typeof job.result_object_keys === 'object' + && typeof job.result_object_keys.nef === 'string' + && job.result_object_keys.nef.length > 0) { + return job.result_object_keys.nef; + } + // 舊格式(向後相容) + if (job.output + && typeof job.output === 'object' + && typeof job.output.nef_path === 'string' + && job.output.nef_path.length > 0) { + return job.output.nef_path; + } + return null; +} +``` + +### 6.2 Streaming 流程 + +```javascript +router.get('/', async (req, res, next) => { + try { + const jobId = req.params.id; + if (!jobId) return next(new ApiError(400, 'invalid_request', 'job id is required')); + + // 1. 
拿 job record + const job = await jobService.getJob(jobId); + if (!job) return next(new ApiError(404, 'job_not_found', `Job ${jobId} not found`)); + + // 2. 檢查 status + if (job.status !== 'COMPLETED') { // internal status 是大寫 + return next(new ApiError(409, 'job_not_completed', + `Job ${jobId} is ${job.status}; result only available after completion`)); + } + + // 3. 檢查 expires_at + if (job.expires_at && new Date(job.expires_at) < new Date()) { + return next(new ApiError(410, 'result_expired', + `Job ${jobId} result expired at ${job.expires_at}`)); + } + + // 4. 解析 NEF object key + const nefKey = extractNefObjectKey(job); + if (!nefKey) { + return next(new ApiError(404, 'result_not_found', + `Job ${jobId} completed but no NEF result available`)); + } + + // 5. 從 MinIO 拿 stream + let result; + try { + result = await minioStorage.getObjectStream(nefKey); + } catch (err) { + logEvent({ level: 'ERROR', action: 'result.minio_failed', /* ... */ }); + return next(new ApiError(502, 'storage_unavailable', /* ... */)); + } + if (!result) { + return next(new ApiError(410, 'result_expired', + `Job ${jobId} NEF object not found in storage (likely expired)`)); + } + + // 6. 設 headers + res.setHeader('Content-Type', result.contentType || 'application/octet-stream'); + if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength)); + res.setHeader('Content-Disposition', `attachment; filename="${buildFilename(job)}"`); + + // 7. Stream pipe + result.stream.on('error', (streamErr) => { + logEvent({ level: 'ERROR', action: 'result.stream_error', /* ... 
*/ }); + if (!res.destroyed) res.destroy(streamErr); + }); + req.on('close', () => { + if (result.stream && typeof result.stream.destroy === 'function') { + result.stream.destroy(); + } + }); + result.stream.pipe(res); + } catch (err) { + return next(err); + } +}); +``` + +### 6.3 為什麼用 mergeParams + +router 掛在 `/jobs/:id/result`、handler 用 `/` path、`mergeParams: true` 才能讀到 `:id`: + +```javascript +const router = express.Router({ mergeParams: true }); +router.get('/', handler); +// in createV1Router: +router.use('/jobs/:id/result', requireApiKey(), perClientLimiter, createResultRouter({ ... })); +``` + +### 6.4 Log 規則 + +| 場景 | level | action | +|------|-------|--------| +| Happy path(200)| INFO | `result.success`(含 job_id、size_bytes、duration_ms) | +| 404 / 409 / 410 | INFO | `result.not_available`(含 reason) | +| 502 MinIO 失敗 | ERROR | `result.minio_failed`(含 error_name、error_code,不 log MinIO endpoint) | +| Stream 中斷(已送 headers)| ERROR | `result.stream_error` | +| Client 主動斷線 | INFO | `result.client_closed` | + +--- + +## 7. 
Test 範圍(Backend 實作 + Testing 驗證) + +### 7.1 Integration test(必做) + +- ✅ Happy path(200):completed job + 有 NEF + 不過期 → 完整 stream NEF binary、Content-Type / Content-Length / Content-Disposition 正確 +- ❌ 401(missing API key) +- ❌ 401(wrong API key) +- ❌ 404 `job_not_found`(jobID 不存在) +- ❌ 404 `result_not_found`(completed 但沒 NEF) +- ❌ 409 `job_not_completed`(status = ONNX / BIE / NEF / FAILED) +- ❌ 410 `result_expired`(expires_at 在過去) +- ❌ 410 `result_expired`(MinIO `getObjectStream` 回 null) +- ❌ 502 `storage_unavailable`(MinIO throw) +- ❌ 503 `service_unavailable`(CONVERTER_API_KEY 未設定 — 但其實這在 middleware 層、走全部 endpoint 都會中) + +### 7.2 Unit test + +- `extractNefObjectKey`:新格式、舊格式、缺失 → null +- `buildFilename`:標準情境、缺 source_filename、缺 platform、副檔名變體(.onnx / .pt / .tflite) +- Stream error handling(mock stream emit error) +- Client close handling(mock req emit close) + +### 7.3 Stress / 邊界 test(選做) + +- 大檔 stream(200MB NEF)— 確認記憶體不爆 +- 多並發 stream(10 個 client 同時下載)— 確認 Scheduler 不掛 +- Slow client(client 收得慢)— 確認 stream 不會無限堆 buffer + +--- + +## 8. Curl 範例 + +```bash +# Happy path +curl -i \ + -H "Authorization: Bearer $CONVERTER_API_KEY" \ + https://converter.innovedus.com/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/result \ + -o output.nef + +# 預期: +# HTTP/1.1 200 OK +# Content-Type: application/octet-stream +# Content-Length: 12345678 +# Content-Disposition: attachment; filename="yolov5s_720.nef" +``` + +```bash +# 過期情境 +curl -i \ + -H "Authorization: Bearer $CONVERTER_API_KEY" \ + https://converter.innovedus.com/api/v1/jobs/expired-job-id/result + +# 預期: +# HTTP/1.1 410 Gone +# Content-Type: application/json; charset=utf-8 +# {"error":{"code":"result_expired","message":"...","request_id":"..."}} +``` + +--- + +## 9. 
Rate Limit + Bandwidth Quota (Phase B design, **revised after the 2026-05-17 Security review**)
+
+> **Important change**: the Security review (`.autoflow/07-delivery/security-design-review-phase-b-2026-05-17.md` §1 Q3 / Major M4) rejected the original single-tier 60 req/min design.
+> **New design**: a two-tier request-based limit plus a two-tier bandwidth quota. A request count cannot tell large files from small ones, and the core attack surface of `/result` is **bandwidth**, not request count.
+
+### 9.1 Why `/result` needs its own rate limit + bandwidth quota
+
+| | `/jobs` write endpoint (existing) | `/result` download endpoint (Phase B) |
+|--------------|----------------------|---------------------------|
+| Existing quota | 300 req / 5 min per client_id | none |
+| Workload cost | CPU (multer parse) + MinIO write | MinIO read + sustained streaming (100MB+ per request) |
+| Blast radius (attacker holds the key) | Ties up the worker queue / fills MinIO | Traffic amplifier: 1 jobID = 100MB+ download; bandwidth drains quickly |
+| Limiting axis | Mainly request count | **Request count + bandwidth, both** (the attack surface is bandwidth, not request count) |
+
+Security's quantitative analysis (review §1 Q3):
+
+| Design | Normal user (P95 ≈ 120 req/min) | Attacker (100MB per request) |
+|------|-------------------------------|------------------------|
+| Original 60 req/min | **Too strict** (rejects retry bursts) | **Too loose** (6 GB/min = 800 Mbps, 8.6 TB/day ≈ $770/day cloud egress) |
+| New two-tier + bandwidth quota | Sufficient (20 req/min sustained + 5 req/10s burst covers the retry pattern) | The 1 GB/hr ceiling shuts down bandwidth attacks directly |
+
+Once `/result` is open it acts as a traffic amplifier: an attacker holding the key does not care how many requests it makes, only how many bytes each one pulls. **Limiting request count without limiting bandwidth provides no real protection.**
+
+### 9.2 Limit axes at a glance
+
+| Axis | Value | Purpose | Bucket key |
+|--------|------|------|-----------|
+| **Burst rate** | 5 req / 10 sec | Blocks short bursts | `token_fingerprint` |
+| **Sustained rate** | 20 req / min | Covers visionA's P95 normal load (120 req/min ÷ 10 callers ≈ 12 req/min per key, leaving 1.7× headroom); blocks sustained mass requests | `token_fingerprint` |
+| **Bandwidth hourly** | 1 GB / hr | Blocks mass NEF downloads (on its own it caps a saturating attacker at 24 GB/day, a bounded cost) | `token_fingerprint` |
+| **Bandwidth daily** | 6 GB / 24hr | Stops an attacker from dodging the hourly limit by pulling exactly 1 GB every hour | `token_fingerprint` |
+
+**The bucket key is `token_fingerprint` (the SHA-256 fingerprint already implemented in A.7)**:
+
+- Not `clientId`: under the current 1:1 trust every caller is `'visionA-service'`, so all buckets collapse into one and the key has no discriminating power
+- 
`token_fingerprint` is effectively the caller id under Phase 0.8b 1:1 trust; it lines up automatically once Phase 2 introduces per-caller credentials
+- Forensics: the audit log already records `token_fingerprint`, so rate-limit statistics and forensic events share one key and can be cross-correlated
+
+### 9.3 Why a two-tier request limit
+
+A single tier cannot accommodate visionA's exponential-backoff retry pattern:
+
+| Retry intervals | Requests in a 10 sec window | Requests in a 1 min window |
+|-----------|-----------------------|----------------------|
+| 1s / 5s / 15s (visionA `ConverterClient.GetResult` default) | 1-2 | 3-4 |
+
+- **Burst tier (5 req / 10s)**: allows retry bursts (legitimate retries are not rejected)
+- **Sustained tier (20 req / min)**: blocks sustained high-frequency requests (an attacker who hits steadily instead of bursting)
+- Both apply **at the same time**: tripping either one returns 429
+
+### 9.4 Why the bandwidth quota is mandatory
+
+A request count cannot distinguish attack patterns:
+
+| Scenario | Blocked by 20 req / min? | Actual bandwidth |
+|------|-------------------|--------------|
+| Normal user fetches one 100MB NEF | ❌ No (legitimate) | 100MB (legitimate) |
+| Attacker at 20 req/min × 6hr × 100MB | ❌ No (stays right at the limit) | **720 GB / 6hr ≈ $65 cloud egress per event** |
+
+With a 1 GB/hr bandwidth quota added:
+
+| Scenario | Blocked by the bandwidth quota? |
+|------|----------------------|
+| Normal user fetches one 100MB NEF | ❌ No (up to ~10 NEFs/hr is legitimate) |
+| Attacker stays at 20 req/min but with large files | ✅ 429 `bandwidth_quota_exceeded` after the 10th-11th NEF |
+| Attacker pulls exactly 1 GB per hour to dodge the hourly limit | ✅ The daily quota triggers after the 6th hour |
+
+### 9.5 Design: reuse one middleware, add another
+
+**Request-based rate limit (reuse the existing factory)**:
+
+- Reuse the `createPerClientRateLimiter` factory from `src/middleware/perClientRateLimit.js`
+- Create **2 independent limiter instances** (burst + sustained), both keyed on `token_fingerprint` (**this requires injecting a new `keyGenerator`**; the original factory keys on `req.auth.clientId`)
+- No breaking change to the factory interface; only an optional `keyGenerator` injection point is added
+
+**Bandwidth quota (new middleware)**:
+
+- New file `src/middleware/resultBandwidthQuota.js` (perClientRateLimit is not reused because the semantics differ)
+- In-memory counter (Phase 1 / Phase B run a single instance, so a Map / plain object is enough)
+- Two stages, pre-check + post-stream increment (see the §9.8 implementation skeleton)
+- Must switch to Redis before the Phase 2 multi-instance deployment (candidate #8)
+
+### 9.6 Status code + response
+
+**Request-based limit hit**:
+
+```http
+HTTP/1.1 429 Too Many Requests
+Retry-After: 30
+RateLimit-Limit: 20
+RateLimit-Remaining: 0
+RateLimit-Reset: 30
+Content-Type: application/json
+
+{
+  "error": {
+    "code": "rate_limit_exceeded",
+    "message": "請求頻率過高,請稍後再試",
+    "details": {
+      "limit_type": "sustained" | "burst",
+      "retry_after_seconds": 30
+    },
+    "request_id": "..."
+  }
+}
+```
+
+**Bandwidth quota hit**:
+
+```http
+HTTP/1.1 429 Too Many Requests
+Retry-After: 3600
+Content-Type: application/json
+
+{
+  "error": {
+    "code": "bandwidth_quota_exceeded",
+    "message": "下載額度已用完,請稍後再試",
+    "details": {
+      "limit_type": "bandwidth_hourly" | "bandwidth_daily",
+      "retry_after_seconds": 3600
+    },
+    "request_id": "..."
+  }
+}
+```
+
+**Why 429 and not 503**:
+
+- 429 (RFC 6585) = the client exceeded its request rate / quota; the client should slow down and back off exponentially
+- 503 = the server is temporarily unavailable; the client should retry the request as-is and need not lower its request rate
+- The two mean different things; visionA's retry logic must branch on this code
+
+### 9.7 Wiring order + implementation skeleton
+
+```javascript
+// src/routes/v1/index.js
+const resultBurstLimiter = createPerClientRateLimiter({
+  windowMs: 10 * 1000,   // 10 sec
+  max: 5,                // 5 req / 10s
+  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown',  // ← new keyGenerator
+  errorDetails: { limit_type: 'burst' },
+});
+const resultSustainedLimiter = createPerClientRateLimiter({
+  windowMs: 60 * 1000,   // 1 min
+  max: 20,               // 20 req / min
+  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown',
+  errorDetails: { limit_type: 'sustained' },
+});
+const resultBandwidthQuota = createResultBandwidthQuota({
+  hourlyLimitBytes: Number(process.env.RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES) || 1 * 1024 * 1024 * 1024,
+  dailyLimitBytes: Number(process.env.RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES) || 6 * 1024 * 1024 * 1024,
+  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown',
+});
+
+router.use('/jobs/:id/result',
+  requireApiKey(),          // 1. auth first; unauthenticated requests get 401
+  resultBurstLimiter,       // 2. burst tier
+  resultSustainedLimiter,   // 3. sustained tier
+  resultBandwidthQuota,     // 4.
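bandwidth pre-check + post-stream increment
+  resultStreamSemaphore,    // 5. concurrent stream cap (see §15)
+  createResultRouter({ ... }));
+```
+
+**Ordering principle**: auth goes first, so unauthenticated traffic never consumes quota slots. Request-based limits run before the bandwidth quota because a request-count check is cheaper than the bandwidth pre-check.
+
+### 9.8 Bandwidth quota implementation skeleton
+
+```javascript
+// src/middleware/resultBandwidthQuota.js (new file)
+// logAudit / ApiError: assumed project helpers in scope (not shown)
+function createResultBandwidthQuota({ hourlyLimitBytes, dailyLimitBytes, keyGenerator }) {
+  // In-memory counters (switch to Redis in Phase 2)
+  // Structure: Map of bucket key → { hourlyBytes, hourlyResetAt, dailyBytes, dailyResetAt }
+  const counters = new Map();
+
+  function getOrCreate(key) {
+    const now = Date.now();
+    let c = counters.get(key);
+    if (!c) {
+      c = { hourlyBytes: 0, hourlyResetAt: now + 3600_000,
+            dailyBytes: 0, dailyResetAt: now + 86_400_000 };
+      counters.set(key, c);
+    }
+    // Fixed-window reset
+    if (now >= c.hourlyResetAt) { c.hourlyBytes = 0; c.hourlyResetAt = now + 3600_000; }
+    if (now >= c.dailyResetAt) { c.dailyBytes = 0; c.dailyResetAt = now + 86_400_000; }
+    return c;
+  }
+
+  return function middleware(req, res, next) {
+    const key = keyGenerator(req);
+    const c = getOrCreate(key);
+
+    // Pre-check: estimate from Content-Length (obtained via a MinIO HEAD and stored on req.estimatedResultSize)
+    // If the size is unknown, the conservative option is to assume the max NEF size (e.g. 500MB);
+    // this skeleton falls back to 0, i.e. it only checks bytes already consumed in the window
+    // Note: the quota is actually charged when the stream ends; the pre-check only stops one download from overshooting up front
+    const estSize = req.estimatedResultSize || 0;
+    if (c.hourlyBytes + estSize > hourlyLimitBytes) {
+      const retryAfterSec = Math.ceil((c.hourlyResetAt - Date.now()) / 1000);
+      logAudit({ level: 'WARN', action: 'result.bandwidth_quota_exceeded',
+        limit_type: 'bandwidth_hourly', bytes_used_in_window: c.hourlyBytes,
+        retry_after_seconds: retryAfterSec,
+        /* + A.7 five fields + /result four fields */ });
+      res.setHeader('Retry-After', retryAfterSec);
+      return next(new ApiError(429, 'bandwidth_quota_exceeded',
+        '下載額度已用完,請稍後再試', { limit_type: 'bandwidth_hourly', retry_after_seconds: retryAfterSec }));
+    }
+    if (c.dailyBytes + estSize > dailyLimitBytes) {
+      const retryAfterSec = Math.ceil((c.dailyResetAt - Date.now()) / 1000);
+      // Same as the hourly branch, with limit_type: 'bandwidth_daily'
+      logAudit({ level: 'WARN', action: 'result.bandwidth_quota_exceeded',
+        limit_type: 'bandwidth_daily', bytes_used_in_window: c.dailyBytes,
+        retry_after_seconds: retryAfterSec,
+        /* + A.7 five fields + /result four fields */ });
+      res.setHeader('Retry-After', retryAfterSec);
+      return next(new ApiError(429, 'bandwidth_quota_exceeded',
+        '下載額度已用完,請稍後再試', { limit_type: 'bandwidth_daily', retry_after_seconds: retryAfterSec }));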
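+    }
+
+    // Charge the bytes actually streamed once the response is over.
+    // 'close' fires both after a normal 'finish' and when the connection ends early
+    // (client abort, res.destroy()); listening on 'finish' alone would miss partial streams.
+    res.once('close', () => {
+      const bytesStreamed = res._bytesStreamed || 0;  // accumulated by the handler in stream.on('data')
+      c.hourlyBytes += bytesStreamed;
+      c.dailyBytes += bytesStreamed;
+    });
+
+    next();
+  };
+}
+```
+
+**Why two stages (pre-check + post-stream)**:
+
+- The pre-check prevents a one-shot overshoot: if 950MB are already used, a new 200MB request is rejected immediately and no bandwidth is wasted
+- The post-stream increment is the ground truth: only bytes actually streamed count, including interrupted and partial streams
+- In the worst case (an attacker opens several streams at once, each just passing the pre-check), the two stages together overshoot by at most N × max_size (N = concurrent stream cap = 10, see §15)
+
+### 9.9 Limits of a multi-instance deployment
+
+The store is currently in-memory (a per-process counter). Phase 1 / 0.8b deploy a single instance, so this is acceptable.
+
+**Required before the Phase 2 multi-instance deployment** (already raised to HIGH, see security.md candidate #8):
+
+- Switch to a Redis store (the perClientRateLimit factory already has an `opts.store` injection point; the bandwidth quota uses a Redis `INCRBY` + `EXPIRE` counter)
+- Otherwise every quota is effectively multiplied by the instance count:
+  - 2 instances × 20 req/min = 40 req/min effective quota
+  - 2 instances × 1 GB/hr = 2 GB/hr effective bandwidth quota
+- All four axes (burst / sustained / hourly / daily) are affected; switching only some of them is not acceptable
+
+**Audit log during the Redis switch**: the switch should record a `service.rate_limit_store_switched` event (with from / to / timestamp) for forensics.
+
+### 9.10 Relationship to the Q4 audit log
+
+Every limit hit must write an audit log entry (see the §11 event list):
+
+- `result.rate_limited` (with `limit_type: 'burst' | 'sustained'`, `token_fingerprint`, `retry_after_seconds`)
+- `result.bandwidth_quota_exceeded` (with `limit_type: 'bandwidth_hourly' | 'bandwidth_daily'`, `token_fingerprint`, accumulated bytes, retry_after_seconds)
+
+Forensics: cluster limit hits by fingerprint → identify attack patterns / abusive keys.
+
+---
+
+## 10. Range Header / Partial Download Protection (**strengthened after the 2026-05-17 Security review**)
+
+> **Three mandatory Security additions** (review §1 Q2 / Major M3):
+> 1. The server **must** add an `Accept-Ranges: none` header to the response (stating non-support explicitly rather than omitting the header)
+> 2. When a Range header arrives, the server **silently ignores it and returns 200 with the full body** (no 416, no 206)
+> 3.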
When a Range header arrives, the handler **must write the audit event `result.range_attempted`** (for forensics, INFO level)
+
+### 10.1 Attack surface analysis
+
+An HTTP `Range` request (RFC 7233) lets a client fetch a specific byte range of a file. For a large-file streaming endpoint like `/result` this is a known attack vector:
+
+| Attack vector | Description | Risk to this system |
+|---------|------|--------------|
+| **Existence probing** | The attacker sends `Range: bytes=0-0` to probe whether a file exists (fetching a tiny slice to tell 200 from 410/404) | The 410 / 404 distinction already lets an attacker enumerate jobIDs without Range. Phase 0.8b also already accepts "holding the key = can download any jobID" (security.md §Trust Boundary), so the marginal risk of existence probing is close to 0 |
+| **Range request DoS** | The attacker sends many tiny Range requests (1 byte each), each triggering MinIO read overhead and amplifying server load | A real risk, but the §9 two-tier request limits (5 req/10s burst, 20 req/min sustained) cap the request rate per key |
+| **Overlapping range exhaustion** | Multiple ranges in a single request (`Range: bytes=0-100, 200-300, ...`); parsers that merge multiple ranges have a CVE history (Apache CVE-2011-3192, nginx CVE-2017-7529, etc.) | Implementing Range would require careful handling of multipart/byteranges responses, enlarging the attack surface |
+| **Slow Range pattern** | The attacker deliberately holds slow Range connections to tie up the MinIO connection pool for a long time | Mitigated by TCP + Node stream backpressure, but multiple Range connections amplify it |
+
+### 10.2 Design options
+
+| Option | Description | Assessment |
+|------|------|------|
+| **A. No Range support (recommended)** | **Silently ignore** the Range header and return 200 + the full stream | Simple; smallest attack surface |
+| B. Range support + protections | Implement single-range parsing, reject multi-range, enforce a minimum chunk size, rate-limit Range count | visionA has no stated need; ~200 extra lines of code + tests and more maintenance |
+| C. Range support + explicit multi-range rejection | Multi-range → 416 Range Not Satisfiable | Partial mitigation; still needs a single-range parser |
+
+### 10.3 Recommendation: option A (no Range support)
+
+**Rationale**:
+
+1. **visionA does not need Range**:
+   - `docs/autoflow/04-architecture/conversion.md` v0.6.1 §2.3: ConverterClient.GetResult is a single, unsegmented download
+   - The visionA backend streams the NEF straight to the browser on receipt; there is no resume / seek requirement
+2. **NEF sizes fit comfortably in a single-request stream**: typically < 100MB and < 500MB at the extreme, which Node streams over HTTP/1.1 handle reliably
+3.
**The existing `minio.getObjectStream` is expected to return the full stream**: implementing Range would mean passing byteRangeStart / byteRangeEnd to the MinIO client, widening the API surface
+4. **Smallest attack surface**: no Range parsing, no multipart/byteranges responses, no need for a parser hardened against that CVE history
+
+### 10.4 Implementation details
+
+**Behavior when a Range header arrives**:
+
+```javascript
+// src/routes/v1/result.js
+router.get('/', async (req, res, next) => {
+  // ... existing steps 1-5: fetch job / check status / expires / nefKey / MinIO stream
+
+  // 6. Set headers
+  res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
+  if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength));
+  res.setHeader('Content-Disposition', `attachment; filename="${buildFilename(job)}"`);
+
+  // Important: Range requests are explicitly unsupported
+  //
+  // Design: a received Range header is silently ignored; respond 200 + the full stream
+  // No 416: keeps an attacker from probing via the 416 (Range supported) vs 200 (not supported) difference
+  // No 'Accept-Ranges: bytes': avoids suggesting the client can retry with Range
+  res.setHeader('Accept-Ranges', 'none');  // RFC 7233 §2.3: tells the client the server does not support ranges
+
+  // 7.
Stream pipe (existing)
+  result.stream.pipe(res);
+});
+```
+
+**Why not 416**:
+
+- 416 (Range Not Satisfiable) means "the Range is syntactically valid but does not overlap the resource"
+- If the client sends a Range header and we return 416, the attacker learns that the server **can parse Range** (it just refused this one)
+- If the client sends a Range header and we silently ignore it and return the full stream with 200, the attacker cannot tell whether the server understands Range
+- The latter is safer (feature detection fails) and fully compatible with well-behaved clients, which must handle a plain 200 anyway
+
+**Why send `Accept-Ranges: none` instead of omitting the header**:
+
+- RFC 7233 §2.3: a server that does not support range requests MAY send `Accept-Ranges: none` to advise the client not to attempt one
+- `Accept-Ranges: none` tells the client explicitly "do not try Range"
+- Without the header a client may still try Range speculatively (RFC 7233 lets clients send range requests without having seen `Accept-Ranges`)
+
+### 10.5 Contract change for visionA
+
+**API spec note**:
+- A Range header from visionA never yields 206 Partial Content; the server always returns 200 with the full stream
+- If visionA ever genuinely needs resume / seek, re-evaluate (Phase 2 candidate)
+
+**Document in §2.2 Query / Body**: "None; a Range header is ignored if received"
+
+### 10.6 Monitoring recommendations + mandatory audit event
+
+When a Range header arrives, the handler **must** write a dedicated audit event `result.range_attempted` (not merely a boolean flag on `result.requested`):
+
+```javascript
+// After entering the handler, before any Range handling:
+if (req.headers && req.headers.range) {
+  logAudit({
+    level: 'INFO',   // not WARN: attackers are expected to probe; this is a forensic baseline and must not trigger alerts
+    action: 'result.range_attempted',
+    // A.7 five fields
+    source_ip: req.ip,
+    token_fingerprint: req.auth?.tokenFingerprint,
+    request_id: req.requestId,
+    http_method: 'GET',
+    http_path: req.originalUrl,
+    // /result-specific
+    job_id: req.params.id,
+    // event-specific
+    range_header_received: String(req.headers.range).slice(0, 100),  // truncated to 100 chars to bound size and limit log injection
+  });
+}
+```
+
+**Why INFO rather than WARN**:
+
+- A Range header is **not an attack** in itself (standard HTTP/1.1; many clients send it automatically)
+- But a Range header on `/result` is an **anomalous signal** (visionA never sends one) → worth recording, not worth alerting on
+- WARN is reserved for genuine anomalies (rate limited / stream timeout / MinIO failed)
+
+**Normal vs anomalous patterns**:
+
+- Normal: the `range_header_received` field almost never appears (visionA does not send Range)
+- Anomalous: a sudden burst of `result.range_attempted` from the same token_fingerprint → possibly an attacker probing for Range support / trying different byte ranges
+
+**Anomaly detection
candidates** (Phase 2):
+
+- alert: same fingerprint with > 10 `result.range_attempted` within 1 hour → trigger a manual review
+- alert: same fingerprint trying many different Range values in a short time → trigger a forensic snapshot
+
+**Note: the original `result.requested` has been superseded by the new event list** (see §11: explicit terminal events such as `result.streamed` / `result.stream_error` replace it, and there is no longer a `result.requested` entry event).
+
+---
+
+## 11. Audit Log (Phase B follows the A.7 pattern, **extended after the 2026-05-17 Security review**)
+
+> **Mandatory Security additions** (review §1 Q4 / Minor m3):
+> - Add the missing events (3 in the review, 4 at the implementation layer): `result.rate_limited`, `result.range_attempted`, `result.stream_timeout`, `result.bandwidth_quota_exceeded`
+> - Every `result.*` event **must** carry the A.7 five fields + the four /result-specific fields
+> - Write 100% (no sampling): traffic is low (P95 < 1000 req/day) and bandwidth-quota forensics needs complete data
+
+### 11.1 Design principles
+
+Aligned with the A.7 audit log pattern in `apiKeyMiddleware.js`:
+- Structured JSON (stdout)
+- Always `console.log` / `console.error` (consistent with the existing audit log infrastructure)
+- Token contents are never written; the `requireApiKey` middleware computes the fingerprint, and the handler reads `req.auth.tokenFingerprint` and writes it into every audit event
+- **Every event carries the A.7 five fields** (none may be omitted) + **every `/result` event carries the 4 endpoint-specific fields**
+
+### 11.2 The A.7 five fields (required on every event)
+
+| Field | Source | Why required |
+|------|------|---------|
+| `source_ip` | `req.ip` (trust proxy already configured) | Forensics: cluster attacker IPs |
+| `token_fingerprint` | `req.auth.tokenFingerprint` (SHA-256, already implemented in A.7) | Forensics: cluster attacks from the same key + matches the rate-limit / bandwidth-quota bucket key |
+| `request_id` | `req.requestId` (set by middleware) | Cross-event tracing (links `auth.api_key.authenticated` ↔ `result.*`) |
+| `http_method` | `'GET'` (constant) | A.7 alignment (written even though constant, for consistent log analysis) |
+| `http_path` | `req.originalUrl` | A.7 alignment; lets forensics confirm the endpoint |
+
+### 11.3 The four `/result`-specific fields (required or optional depending on event type)
+
+| Field | Required when | Optional / not applicable when |
+|------|--------|----------------|
+| `job_id` | Every event (taken from `req.params.id`) | n/a |
+| `size_bytes` | `result.streamed` (success), `result.stream_error`, `result.client_closed`, `result.stream_timeout` (bytes streamed so far) | Not applicable to 4xx terminal events (streaming never started) |
+| `duration_ms` | Every terminal event (streamed / stream_error / client_closed / stream_timeout / not_*) | Not applicable to `range_attempted` (not a terminal event) |
+| 
`stream_completed` | `result.streamed` (true), `result.stream_error` (false), `result.client_closed` (false), `result.stream_timeout` (false) | Not applicable to 4xx terminal events |
+
+### 11.4 Event list (12 events, covering Security review Q4 + the Architect's original design)
+
+| Action | Level | Trigger | Required fields (beyond the A.7 five + /result four) |
+|--------|-------|---------|--------------------------------|
+| `result.streamed` | INFO | Stream fully sent (`stream.on('end')` and bytes = content_length) | `content_length` |
+| `result.stream_error` | ERROR | Stream fails midway (MinIO disconnect / network) | `error_type`, `error_message` (truncated to 100 chars) |
+| `result.client_closed` | INFO | Client disconnects (`req.on('close')` with bytes < content_length) | none |
+| `result.stream_timeout` | **WARN** | The 5-min response stream timeout fires (**mandatory Security addition**) | `timeout_ms`, `bytes_streamed_at_timeout` |
+| `result.not_found` | WARN | 404 `job_not_found` / `result_not_found` | `reason: 'job_not_found' \| 'no_nef_key'` |
+| `result.not_completed` | WARN | 409 `job_not_completed` | `current_status` |
+| `result.expired` | WARN | 410 `result_expired` | `expires_at`, `expired_by_ms` (now - expires_at) |
+| `result.storage_unavailable` | ERROR | 502 `storage_unavailable` (MinIO unreachable / throws) | `error_name`, `error_code` (does **not** include the MinIO endpoint URL) |
+| `result.rate_limited` | **WARN** | 429 rate limit hit (**mandatory Security addition**) | `limit_type: 'burst' \| 'sustained'`, `retry_after_seconds` |
+| `result.bandwidth_quota_exceeded` | **WARN** | 429 bandwidth quota hit (**mandatory Security addition**) | `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'`, `bytes_used_in_window`, `retry_after_seconds` |
+| `result.range_attempted` | **INFO** | Request carries a Range header (**mandatory Security addition**, forensic baseline) | `range_header_received` (sanitized, truncated to 100 chars) |
+| `result.filename_assertion_failed` | ERROR | `buildFilename` assertion fails (**defense-in-depth**, see §13) | `expected_pattern`, `actual_filename` (sanitized, truncated) |
+
+**Why `result.requested` was removed**:
+
+- The original design used `result.requested` as an entry event, plus a `range_header_present: boolean` flag for Range detection
+- The new design:
+  - 
`result.range_attempted` becomes a standalone forensic event (a clearer anomaly signal)
+  - `auth.api_key.authenticated` (already written in A.7) records that the caller came in, so `result.requested` is redundant
+  - Every request ends with exactly one **terminal event** (one of streamed / stream_error / client_closed / stream_timeout / not_found / not_completed / expired / storage_unavailable / rate_limited / bandwidth_quota_exceeded). `range_attempted` is an extra, non-terminal event (§11.3). Linking through request_id to `auth.api_key.authenticated` gives the full trace
+  - Less log volume (one terminal event per request instead of an entry event + a terminal event)
+
+### 11.5 `error_type` classification (stream interruptions)
+
+| `error_type` | Trigger | Level exception |
+|-------------|------|-----------|
+| `minio_disconnect` | MinIO stream emits error / socket reset | none |
+| `client_abort` | The client disconnects first (distinct from `client_closed`: `client_closed` comes from req close, `stream_error` from a stream error event) | none |
+| `network` | Other network-layer errors (DNS / TLS) | none |
+| `partial_stream` | Race condition where `streamCompleted=false` but `res.on('finish')` fires. Most likely the underlying socket flushes buffered chunks after `res.destroy()` and then emits `finish` (a side effect of the client interrupting the download / a slow network drain), or a backpressure anomaly | **INFO** (overrides the ERROR in §11.4; expected client-side behaviour, not an attack signal) |
+| `unknown` | Catch-all | none |
+
+**Level exception rule**: §11.4 sets `result.stream_error` to ERROR by default, but the `error_type = partial_stream` race condition is expected client behaviour (neither a server fault nor an attack signal). It is therefore downgraded to INFO to keep it out of the ERROR alert pipeline. Implemented in `apps/task-scheduler/src/routes/v1/result.js` (inside the `res.on('finish')` handler).
+
+### 11.6 Audit log examples
+
+**Happy path (successful stream)**:
+```json
+{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"INFO","action":"auth.api_key.authenticated","request_id":"abc-123","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result"}
+{"service":"task-scheduler","timestamp":"2026-05-17T01:23:48Z","level":"INFO","action":"result.streamed","request_id":"abc-123","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","size_bytes":52428800,"duration_ms":3210,"stream_completed":true,"content_length":52428800}
+```
+
+**Rate limit hit**:
+```json
+{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.rate_limited","request_id":"def-456","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":2,"limit_type":"burst","retry_after_seconds":10}
+```
+
+**Bandwidth quota hit**:
+```json
+{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.bandwidth_quota_exceeded","request_id":"ghi-789","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":3,"limit_type":"bandwidth_hourly","bytes_used_in_window":1073741824,"retry_after_seconds":2847}
+```
+
+**Range probing**:
+```json
+{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"INFO","action":"result.range_attempted","request_id":"jkl-012","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","range_header_received":"bytes=0-7"}
+```
+
+**Stream timeout**:
+```json
+{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.stream_timeout","request_id":"mno-345","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","size_bytes":1024,"duration_ms":300000,"stream_completed":false,"timeout_ms":300000,"bytes_streamed_at_timeout":1024}
+```
+
+**Expired**:
+```json
+{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.expired","request_id":"pqr-678","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":15,"expires_at":"2026-05-10T00:00:00Z","expired_by_ms":609825000}
+```
+
+### 11.7 What is never logged
+
+Aligned with the A.7 + `auth.md` §1.8 principles:
+- ❌ NEF binary content (any byte of it)
+- ❌ Raw token (the `requireApiKey` middleware already computes the fingerprint)
+- ❌ Full MinIO endpoint URL (avoids leaking infrastructure topology)
+- ❌ Full `Authorization` header value
+- ❌ Stack traces (a truncated message is enough)
+- ❌ Range header text beyond 100 characters (truncate and mark with `...`)
+
+### 11.8 Sampling strategy
+
+**Write 100%, no sampling**:
+
+- Traffic is low (P95 < 1000 req/day, Phase B estimate)
+- Bandwidth-quota forensics needs complete data (every byte counts toward the quota counter; a missing log entry is a forensic gap)
+- Anomalous events (rate_limited / range_attempted / stream_timeout / bandwidth_quota_exceeded) are always written at 100% for clustering
+
+**If traffic later grows beyond 100k/day** (Phase 2 candidate):
+- Consider sampling `result.streamed` down to 10%
+- But **keep 100% of 4xx/5xx and all anomalous events**
+
+### 11.9 Relationship to forensics
+
+**Cross-event tracing**: `request_id` links the two events `auth.api_key.authenticated` (written by the middleware) → `result.*` (the terminal event).
+
+**New cross-fingerprint tracing**: cluster all events from the same caller by `token_fingerprint`:
+
+- 1 fingerprint + many `result.rate_limited` → identify an abusive / misconfigured caller
+- 1 fingerprint + many `result.range_attempted` → identify Range probing
+- 1 fingerprint + many `result.bandwidth_quota_exceeded` → identify a mass-download attempt
+- 1 fingerprint + many `result.stream_timeout` → identify a slowloris attack
+
+---
+
+## 12.
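Security trade-off of distinguishing 404 from 410
+
+### 12.1 The problem
+
+§4.1 defines three "not found" cases:
+
+| HTTP | error.code | Case |
+|------|-----------|------|
+| 404 | `job_not_found` | jobID does not exist |
+| 404 | `result_not_found` | Job completed but has no NEF |
+| 410 | `result_expired` | NEF has expired and been purged |
+
+**The security concern**: distinguishing 404 from 410 lets an attacker detect whether a jobID ever existed:
+
+- Request jobID X → 404 `job_not_found` → X **never existed**
+- Request jobID Y → 410 `result_expired` → Y **existed, but its NEF has expired**
+
+→ An attacker could enumerate the jobID space, separate "unused" from "used-but-expired", and collect victim activity patterns.
+
+### 12.2 Assessment: risk to this system is close to 0
+
+**Precondition**: Phase 0.8b already accepts the `security.md §Trust Boundary` risk model: **an attacker holding CONVERTER_API_KEY can download the NEF of any jobID** (per-job auth is Phase 2 candidate #12).
+
+Under this precondition:
+
+| Attacker capability | Marginal risk added by distinguishing 404/410 |
+|-------------|-------------------------------|
+| Holds the key and knows a valid jobID | Downloads the NEF directly; the 404/410 distinction adds nothing |
+| Holds the key but knows no valid jobID | jobIDs are UUIDv4 (122 random bits), so brute-force enumeration is infeasible whether or not 404/410 are distinguished |
+| Holds the key + a list of leaked / guessable jobIDs | The distinction does reveal which IDs once existed, **but the attacker can already download them directly**, so knowing expired vs not-expired is worth very little |
+
+→ **Under the current trust model, the marginal risk of distinguishing 404/410 is close to 0.**
+
+### 12.3 The other side of the trade-off: UX / debug value
+
+Benefits of keeping 404 and 410 distinct:
+
+- visionA can tell "the user gave a wrong jobID" (404: ask the user to double-check) from "the job has expired" (410: ask the user to convert again); the resulting UX messages are far more precise
+- TODO-v2 §4.1 already specifies this and the visionA backend's ConverterClient.GetResult already implements the matching mapping; **reverting to a single 404 would break the contract with visionA**
+- Easier debugging: 410 vs 404 in the logs immediately shows whether it was a lifecycle purge or a wrong jobID
+
+### 12.4 Decision: **keep the TODO-v2 §4.1 spec unchanged** (404 and 410 stay separate)
+
+**Rationale**:
+
+1. Under the current trust model the added risk is marginal, and that attack surface is already accepted by §Trust Boundary
+2. High UX / debug value, and the contract with visionA is already fixed
+3.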
Phase 2 candidate #12 (per-job auth) is the real fix: with per-job auth an attacker cannot download other callers' jobs, and the 404/410 question disappears on its own
+
+### 12.5 Keeping the Phase 2 candidate in sync
+
+`security.md` Phase 2 candidate #12, "`/result` per-job authorization", is marked **MEDIUM** priority (already raised in A.7 follow-up §4). This decision holds **until** #12 ships. After that, unifying the error responses into 404 can be considered, removing the jobID-enumeration surface as defense-in-depth alongside #12.
+
+### 12.6 Documentation follow-ups
+
+- TODO-v2 §4.1 spec is **unchanged**
+- Record this trade-off in the `security.md` change history (the Architect adds a one-line entry on the next security.md update)
+- The audit log (§11) keeps `result.not_found` and `result.expired` distinct per this spec
+
+---
+
+## 13. `source_filename` write-point investigation + Backend acceptance criteria
+
+### 13.1 Findings
+
+`buildFilename(job)` (§3.2) reads `job.source_filename`. **Grep result**:
+
+```bash
+grep -rn "source_filename" apps/task-scheduler/src
+# Result: 0 matches
+```
+
+**Current state**: the createJob handler in `src/routes/v1/jobs.js` **never writes** a `source_filename` field into `jobRecord` (jobRecord is built at lines 721-774). Nothing in the Worker / legacy Web UI / API v1 chain writes it either.
+
+**Related fields that are written**:
+- `jobRecord.input.filename` (line 741): the **already sanitized** `safeFilename` (e.g. `model.onnx` → `model.onnx`, special characters stripped)
+- `safeModelFilename` comes from `sanitizeFilename(modelFile.originalname || 'model')` (validators/createJob.js:240)
+
+### 13.2 Design options
+
+| Option | Description | Assessment |
+|------|------|------|
+| **A. Backend adds `source_filename` in the createJob handler** | jobRecord gains `source_filename: modelFile.originalname \|\| null` (**raw, unsanitized**) | Contradicts §3.2's statement that "sourceFilename is already sanitized and is not sanitized again"; storing the raw originalname carries XSS / log injection risk |
+| **B. Backend writes the sanitized filename** (recommended) | jobRecord gains `source_filename: input.safeFilename` (same value as `input.filename`, an already sanitized safe string) | Matches the §3.2 assumption; no security risk; redundant but semantically clear |
+| C. `buildFilename` reads `job.input.filename` instead | Uses the existing field; no new `source_filename` | Smallest change. Downside: `input.filename` means "safe name of the input file", not "source filename for the output", so if another source (e.g. metadata) is used later, the logic ends up scattered |
+| D.
Backend makes `buildFilename` degrade gracefully | If `job.source_filename` is missing, read `job.input.filename`; only if both are missing fall back to `job_.nef` | Robust, but the implicit dependency makes debugging harder |
+
+### 13.3 Recommendation: option B, keeping D's fallback as a safety net
+
+**Backend task B1 acceptance criteria**:
+
+#### B1.1 createJob handler writes `source_filename`
+
+**File**: `apps/task-scheduler/src/routes/v1/jobs.js` lines 721-774 (jobRecord construction)
+
+**Change**: add `source_filename` **above** the `input` object (top level of jobRecord):
+
+```javascript
+const jobRecord = {
+  job_id: jobId,
+  status: 'ONNX',
+  // ... existing fields
+
+  // New in Phase B: used by GET /jobs/:id/result to build the download filename
+  // Source is the already sanitized safeFilename (same value as input.filename; redundant but semantically clear)
+  // Why not store the raw modelFile.originalname:
+  //   - originalname may contain XSS / control characters / path traversal patterns
+  //   - even though browsers do not render the Content-Disposition header, it can still be echoed in logs / error messages
+  //   - the sanitized version is defense-in-depth
+  source_filename: input.safeFilename,
+
+  input: {
+    filename: input.safeFilename,
+    // ... existing
+  },
+  // ... other existing fields
+};
+```
+
+#### B1.2 Acceptance criteria checklist
+
+- [ ] A write point for `jobRecord.source_filename` exists (around line ~740, after `input.safeFilename` is available)
+- [ ] The written value **must** be the sanitized string (`input.safeFilename`, not `modelFile.originalname`)
+- [ ] The write happens before `claimActiveAndCreate` (during jobRecord construction, not as a later update)
+- [ ] For existing jobs (no `source_filename` field), the `buildFilename` fallback still works (§3.2 fallback logic)
+- [ ] Unit tests cover:
+  - `source_filename` written, then read back (happy path)
+  - `buildFilename` fallback when `source_filename` is an empty string
+  - `buildFilename` fallback when `source_filename` is undefined (backward compatibility, existing jobs)
+
+#### B1.3 Confirm the `buildFilename` fallback logic
+
+Keep the §3.2 fallback logic:
+
+```javascript
+function buildFilename(job) {
+  const sourceFilename = job.source_filename || '';
+  const platform = (job.platform || '').toLowerCase();
+  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
+  if (stem && platform) {
+    return `${stem}_${platform}.nef`;
+  }
+  return `job_${job.job_id || 'unknown'}.nef`;
+}
+```
+
+**Backward compatibility**: existing jobs without `source_filename` → fall back to `job_.nef` (this does not
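crash, and exposes nothing beyond the jobID).
+
+#### B1.4 Investigate the platform field (handled together)
+
+`buildFilename` also reads `job.platform`. Grep `apps/task-scheduler/src/`:
+
+```bash
+grep -rn "platform" apps/task-scheduler/src/routes/v1/jobs.js | grep -v "//"
+```
+
+**Backend B1 must verify**: `job.platform` is written in the createJob handler (via `parameters.platform` or a top-level `job.platform`). The Architect did not verify this line by line within the grep range this time; Backend should confirm it as part of task B1. If it is missing too, apply the same acceptance criteria.
+
+### 13.4 Security considerations
+
+- **Do not store the raw `originalname`**: raw filenames can contain XSS payloads / control characters / RTL overrides / overlong strings
+- **The sanitized version is already enforced**: `safeFilename` goes through `sanitizeFilename` (character whitelist, truncated to 200, leading dots removed; see `security.md §Input Validation`)
+- **Content-Disposition header injection**: the filename is written into `Content-Disposition: attachment; filename="..."`, and an unescaped `"` could break the header; the sanitized version already forbids `"` / `\`, so it is safe
+
+### 13.4a Defense-in-depth: Content-Disposition header construction (**mandatory addition from the 2026-05-17 Security review**)
+
+> **Mandatory Security addition** (review §1 Q6 / Minor m1): even though `sanitizeFilename` already blocks `"` / `\`, the `Content-Disposition` header must still **explicitly** apply quote-escaping + an RFC 5987 fallback + a buildFilename assertion. This is defense-in-depth against bugs introduced when sanitization changes later.
+
+#### (1) Belt-and-suspenders quote-escape
+
+Even though sanitization should already block quotes / backslashes, the `Content-Disposition` value **must still be escaped explicitly** when it is written:
+
+```javascript
+// Before setHeader:
+const filename = buildFilename(job);
+const escapedFilename = filename.replace(/[\\"]/g, '\\$&');  // backslash-escape \ and " (quoted-pair)
+```
+
+**Why this is needed**:
+
+- Sanitization and setHeader live in different files; a future sanitization change might relax the whitelist (e.g. to allow Chinese characters), and without escaping on the setHeader side the header-injection risk would come back
+- Defense-in-depth: each layer is responsible for security at its own boundary and does not rely on upstream
+
+#### (2) RFC 5987 `filename*` fallback (reserved for future unicode support)
+
+In Phase B, `sanitizeFilename` restricts names to ASCII alphanumerics + `._-`, so no non-ASCII can appear; the header construction still **reserves a hook for the `filename*` extended syntax**:
+
+```javascript
+res.setHeader('Content-Disposition',
+  `attachment; filename="${escapedFilename}"; filename*=UTF-8''${encodeURIComponent(filename)}`);
+```
+
+**Why reserve it**:
+
+- If Phase 2 relaxes sanitization to allow unicode (e.g. a Chinese filename `模型_kl720.nef`), it must add the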
`filename*` parameter (RFC 5987) so browsers display the name correctly
+- Reserving the hook costs nothing now, and when Phase 2 opens up unicode the header construction code does not need to change
+- No side effect for today's ASCII-only names: `filename` and `filename*` carry the same value; browsers prefer `filename*` where supported and fall back to `filename` otherwise
+- Caveat for that future change: `encodeURIComponent` leaves `'`, `(`, `)` and `*` unescaped, but RFC 5987 `attr-char` does not allow them, so they would also need percent-encoding once the whitelist is widened (the current `[A-Za-z0-9._-]` whitelist never produces them)
+
+#### (3) buildFilename assertion (guards against bugs from future sanitization changes)
+
+Add a sanitization re-check assertion at the end of `buildFilename`:
+
+```javascript
+function buildFilename(job) {
+  const sourceFilename = job.source_filename || '';
+  const platform = (job.platform || '').toLowerCase();
+  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
+  const candidate = (stem && platform)
+    ? `${stem}_${platform}.nef`
+    : `job_${job.job_id || 'unknown'}.nef`;
+
+  // Defense-in-depth assertion:
+  // ensure the buildFilename result still matches the whitelist (catches bugs introduced by sanitization changes)
+  if (!/^[A-Za-z0-9._-]+$/.test(candidate)) {
+    // Should never happen; fail secure: log + fall back to the always-safe jobID-only name
+    logAudit({
+      level: 'ERROR',
+      action: 'result.filename_assertion_failed',
+      // A.7 five fields + /result four fields
+      // ...
+      expected_pattern: '^[A-Za-z0-9._-]+$',
+      actual_filename: candidate.slice(0, 100),  // truncated to 100 chars to limit log injection
+    });
+    return `job_${job.job_id || 'unknown'}.nef`;  // always-safe fallback (a UUIDv4 always matches the whitelist)
+  }
+
+  return candidate;
+}
+```
+
+**Why fail secure instead of throwing**:
+
+- `/result` must return the NEF to visionA; throwing on assertion failure would fail the whole request with a 5xx and hurt legitimate users
+- Falling back to `job_.nef` (a UUIDv4 always matches the whitelist) lets the stream complete while the audit log records the anomaly
+- The expected frequency is 0 (a firing assertion means the upstream sanitizer has a bug that needs an immediate fix); the audit log is the alert entry point
+
+#### (4) Mandatory Backend assertion tests
+
+Backend must add unit tests:
+
+```javascript
+describe('buildFilename assertion', () => {
+  it('returns sanitized result when input is clean', () => {
+    expect(buildFilename({ source_filename: 'model.onnx', platform: 'kl720', job_id: 'uuid' }))
+      .toBe('model_kl720.nef');
+  });
+
+  it('falls back to job_.nef when source_filename has invalid chars', () => {
+    // Simulate an upstream sanitization bug: a stem containing `"` (should never happen; defense-in-depth)
+    const result = buildFilename({ source_filename: 'evil".onnx', 
platform: 'kl720', job_id: 'safe-uuid' });
+    expect(result).toBe('job_safe-uuid.nef');
+    // ...and assert the audit log contains result.filename_assertion_failed
+  });
+
+  it('falls back when buildFilename result somehow contains invalid char', () => {
+    // Hypothetical: platform contains an illegal character (should never happen)
+    const result = buildFilename({ source_filename: 'model', platform: 'kl 720', job_id: 'safe-uuid' });
+    expect(result).toBe('job_safe-uuid.nef');
+  });
+});
+```
+
+#### (5) Set the `Accept-Ranges: none` header at the same time
+
+Already specified in §10; set it together with Content-Disposition before the stream is piped (§6.2):
+
+```javascript
+// Before stream.pipe(res):
+res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
+if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength));
+res.setHeader('Content-Disposition',
+  `attachment; filename="${escapedFilename}"; filename*=UTF-8''${encodeURIComponent(filename)}`);
+res.setHeader('Accept-Ranges', 'none'); // explicitly advertise that Range is not supported
+```
+
+### 13.5 Impact on existing jobs
+
+`source_filename` is a new field and **needs no migration**:
+
+- Existing jobs (created before Phase A) have `source_filename === undefined` → `buildFilename` falls back to `job_.nef`
+- Jobs created after Phase B all have `source_filename` → normal stem-based filename
+- Both kinds of job coexist for a transition period of ~7 days (existing jobs disappear entirely once they expire and are cleaned up)
+
+---
+
+## 15.
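Streaming Resource Limits (**required by the 2026-05-17 Security review**)
+
+> **Required by Security** (review §1 Q1 / Major M1 + M2): mitigations for the attack surface specific to a streaming endpoint. Mandatory when Phase B starts; must not be deferred.
+
+### 15.1 Attack surfaces and limits at a glance
+
+| Attack surface | Limit | Default | Behavior when triggered |
+|--------|------|-------|---------|
+| **Slowloris (slow reads hogging a connection)** | Stream response timeout | **5 minutes** (300_000 ms) | destroy res + destroy stream + log `result.stream_timeout` |
+| **Connection exhaustion (many streams opened at once)** | Concurrent stream cap (per instance) | **10 concurrent streams** | 503 `service_busy` + `Retry-After: 30` + log `result.rate_limited` (limit_type: concurrent) |
+| **Bandwidth abuse** | (see §9 bandwidth quota) | 1 GB/hr + 6 GB/24hr | 429 `bandwidth_quota_exceeded` (see §9) |
+| **Disk I/O DoS** | (mitigated as a side effect of the concurrent stream cap) | — | — |
+
+### 15.2 Stream Response Timeout (M1)
+
+#### Why it is required
+
+Node http server defaults:
+
+- `server.timeout = 0` (**no limit**)
+- `server.headersTimeout = 60_000ms` (only covers receiving the headers)
+- `server.requestTimeout = 300_000ms` (Node 18+; covers receiving the complete request)
+- **No response write timeout**: after connecting, an attacker who reads extremely slowly (1 byte/30s) can hold a Node socket plus a MinIO upstream connection for hours
+
+#### Design
+
+`res.setTimeout(STREAM_TIMEOUT_MS)`, default 5 minutes (300_000 ms). Note that `res.setTimeout()` arms the socket's *inactivity* timeout: it fires only after `STREAM_TIMEOUT_MS` with no socket activity, so a client that keeps the socket active, even at far below 13.3 Mbps, does not trip it. The quantified budget below is a total-duration cap, which additionally needs a wall-clock timer (e.g. a `setTimeout` cleared when `res` emits `close`).
+
+**Rationale for 5 minutes** (quantified):
+
+- Maximum NEF size: 500 MB (a reasonable upper bound; most are < 100 MB)
+- Minimum throughput tolerated within 5 min: 500 MB / 5 min = **100 MB/min ≈ 1.7 MB/s ≈ 13.3 Mbps**
+- A legitimate client on a moderate 10 Mbps link fetches a typical 100 MB NEF in about 80 s; a worst-case 500 MB NEF needs ≈ 6.7 min at 10 Mbps, so it only finishes within 5 min at ≥ 13.3 Mbps
+- A client pulling a 500 MB NEF at < 13.3 Mbps is therefore cut off at the 5-minute mark (once the wall-clock cap is in place)
+- A legitimately slow client (e.g. on a congested link) still completes a typical 100 MB NEF within 5 min as long as it sustains ≈ 2.7 Mbps
+
+**Overridable via env**: `RESULT_STREAM_TIMEOUT_MS` (default 300_000). If Phase 2 supports ultra-large (GB-scale) NEFs, raise it.
+
+#### Implementation skeleton
+
+```javascript
+// After setHeader, before stream.pipe(res):
+const STREAM_TIMEOUT_MS = Number(process.env.RESULT_STREAM_TIMEOUT_MS) || 300_000;
+let bytesStreamed = 0;
+const streamStartAt = Date.now();
+
+// Count streamed bytes (used by the audit log and the bandwidth quota increment)
+result.stream.on('data', (chunk) => {
+  bytesStreamed += chunk.length;
+});
+
+// Arm the timeout
+res.setTimeout(STREAM_TIMEOUT_MS, () => {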
+  logAudit({
+    level: 'WARN',
+    action: 'result.stream_timeout',
+    // the five A.7 fields
+    source_ip: req.ip,
+    token_fingerprint: req.auth?.tokenFingerprint,
+    request_id: req.requestId,
+    http_method: 'GET',
+    http_path: req.originalUrl,
+    // the four /result fields
+    job_id: jobId,
+    size_bytes: bytesStreamed,
+    duration_ms: Date.now() - streamStartAt,
+    stream_completed: false,
+    // event-specific
+    timeout_ms: STREAM_TIMEOUT_MS,
+    bytes_streamed_at_timeout: bytesStreamed,
+  });
+  // Tear down both ends: the upstream MinIO stream and the client response
+  if (result && result.stream && typeof result.stream.destroy === 'function') {
+    result.stream.destroy();
+  }
+  if (!res.destroyed) res.destroy(new Error('Response stream timeout'));
+});
+
+// Also set req.setTimeout to the same value (covers both directions)
+req.setTimeout(STREAM_TIMEOUT_MS);
+
+// Then stream.pipe(res)
+result.stream.pipe(res);
+```
+
+### 15.3 Concurrent Stream Cap (M2)
+
+#### Why it is required, and why it is new code instead of reusing `uploadConcurrency.js`
+
+**Express + Node have no per-process connection limit by default**:
+
+- Node's default is `server.maxConnections = Infinity`
+- An attacker with a valid key opens 1000 `/result` connections at once and instantly exhausts the fd table (typically 1024-65536) plus the MinIO upstream connections
+
+**Why not reuse `uploadConcurrency.js`**:
+
+| `uploadConcurrency.js` | `resultStreamConcurrency.js` (new) |
+|----------------------|------------------------------|
+| Limits "the same job_id cannot be uploaded twice" (per-job key) | Limits "the server serves at most N streams overall" (global counter) |
+| Semantics: mutual exclusion (one upload per job) | Semantics: capacity (the server serves at most N downloads at a time) |
+| Responds 409 conflict when triggered | Responds 503 service_busy + Retry-After when triggered |
+
+The two have **different semantics** and should not share one middleware, but the new one can **borrow the implementation structure** (lock acquire/release, `res.once('close')` cleanup).
+
+#### Design
+
+| Parameter | Value | Rationale |
+|------|---|------|
+| `maxConcurrent` | **10** | Balances 2× headroom over normal load (P95 < 5 concurrent streams; see the §9.1 numbers) against a controllable blast radius (with the M1 5-min timeout, 10 slow attacker connections hold at most 10 fds + 10 MinIO connections for 5 min each, i.e. 50 connection-minutes) |
+| `retryAfterSeconds` | 30 | Short retry interval (after the cap is hit, some slots should free up within 30 s) |
+
+**Multi-instance scaling**: currently a single instance, so 10 is an absolute limit; with multiple instances in Phase 2 the effective limit becomes 10 × N, or switch to a
Redis-backed distributed semaphore to keep a global limit.
+
+#### Implementation skeleton
+
+```javascript
+// src/middleware/resultStreamConcurrency.js (new file; logAudit and ApiError are assumed to be imported from existing project modules)
+function createResultStreamConcurrencyLimiter({ maxConcurrent, retryAfterSeconds }) {
+  let activeStreams = 0;
+
+  return {
+    middleware(req, res, next) {
+      if (activeStreams >= maxConcurrent) {
+        logAudit({
+          level: 'WARN',
+          action: 'result.rate_limited',
+          // the five A.7 fields
+          source_ip: req.ip,
+          token_fingerprint: req.auth?.tokenFingerprint,
+          request_id: req.requestId,
+          http_method: 'GET',
+          http_path: req.originalUrl,
+          // the four /result fields (only job_id applies; the others do not because streaming has not started)
+          job_id: req.params.id,
+          // event-specific
+          limit_type: 'concurrent',
+          retry_after_seconds: retryAfterSeconds,
+          active_streams_at_reject: activeStreams,
+        });
+        res.setHeader('Retry-After', retryAfterSeconds);
+        return next(new ApiError(503, 'service_busy',
+          '伺服器忙碌中,請稍後再試', // "Server is busy, please try again later"
+          { limit_type: 'concurrent', retry_after_seconds: retryAfterSeconds }));
+      }
+
+      // Acquire slot
+      activeStreams++;
+      let released = false;
+      const release = () => {
+        if (released) return;
+        released = true;
+        activeStreams--;
+      };
+
+      // Release the slot on whichever of finish / close / error happens first
+      res.once('finish', release);
+      res.once('close', release);
+      res.once('error', release);
+
+      next();
+    },
+
+    // Exposes internal state for health checks / monitoring (read-only; do not write)
+    getActiveCount: () => activeStreams,
+  };
+}
+```
+
+**Limits of a multi-instance deployment**:
+
+- The counter is in-memory (per process); acceptable for a single-instance deployment
+- Mandatory before a Phase 2 multi-instance deployment (raised to HIGH; see security.md backlog #8): switch to a Redis-backed distributed semaphore
+- Without the switch: N instances × 10 = an effective limit of N×10, and the blast radius grows N-fold
+
+### 15.4 How the limits work together
+
+The four mitigations form defense-in-depth:
+
+| Attack vector | M1 stream timeout | M2 concurrent cap | §9 rate limit | §9 bandwidth quota |
+|-----------|------------------|------------------|--------------|-----------------|
+| **Slowloris** (10 slow connections × 6 hr) | ✅ cuts each one at 5 min | ✅ the cap of 10 blocks the 11th | — | — |
+| **Connection exhaustion** (1000 connections that never read) | ✅ cut at 5 min | ✅ the 11th gets an immediate 503 | — | — |
+| **Mass download** (20 req/min × 100MB) | — | — | ✅ sustained cap of 20 req/min | ✅ hits the 1 GB/hr cap |
+| **Bandwidth abuse** (few requests, large files) | — | — | ❌ not blocked | ✅ hits the 1 GB/hr cap |
+| **Burst attack** (10 req/sec spike) | — | — | ✅ burst cap of 5 req/10s | — |
+
+**Attack surfaces not covered**:
+
+- **HTTP/2 stream multiplexing**: Phase 1 is still HTTP/1.1, so nothing is blocked yet. When Phase 2 moves to H2, explicitly cap `SETTINGS_MAX_CONCURRENT_STREAMS` at 100 per TCP connection (in Node's `http2` module this is the `settings.maxConcurrentStreams` server option)
+- **Compression bomb**: NEF is already binary, and helmet does not compress responses at all; confirm that no compression middleware and no nginx / reverse proxy gzips `Content-Type: application/octet-stream`
+- **MinIO socketTimeout alignment**: Phase 2 backlog #16 (new; see security.md)
+
+---
+
+## 14. Phase B Acceptance Criteria checklist for Backend (**rewritten after the 2026-05-17 Security review**)
+
+> **Important change**: the original B1-B9 acceptance criteria had to be expanded because of the 4 Major + 3 Minor findings from the Security review. The new list is numbered **AC-1 to AC-12** (matching the 12 acceptance criteria in Security review §3) and is the single source of truth for the Backend implementer.
+> Reviewers use it as a checklist; missing any item → the PR is not accepted.
+
+### 14.1 Middleware chain (AC-1 to AC-4)
+
+Order: `requireApiKey → resultBurstLimiter → resultSustainedLimiter → resultBandwidthQuota → resultStreamSemaphore → handler` (quota / semaphore must come after auth so unauthenticated traffic cannot occupy slots)
+
+| # | Item | Section | Acceptance criteria |
+|---|------|------|-------------------|
+| **AC-1** | Apply the `requireApiKey()` middleware to `/result` | §2.3 | Wired in v1/index.js, consistent with jobs/promote; on failure → 401 + proactive socket.destroy() |
+| **AC-2** | Apply the two-tier rate limit to `/result` | §9.2, §9.3, §9.5 | **burst**: 5 req / 10 sec + **sustained**: 20 req / min; bucket key is `token_fingerprint` (not clientId); over the limit → 429 `rate_limit_exceeded` + `Retry-After` + audit log `result.rate_limited` (with `limit_type: 'burst' \| 'sustained'`) |
+| **AC-3** | Apply the bandwidth quota to `/result` | §9.4, §9.8 | **hourly**: 1 GB / hr + **daily**: 6 GB / 24hr per `token_fingerprint`; in-memory counter (Redis in Phase 2); pre-check + post-stream increment; over the limit → 429 `bandwidth_quota_exceeded` + `Retry-After` + audit log (with `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'` and cumulative bytes) |
+| **AC-4** | Apply the concurrent stream cap to `/result` | §15.3 | `MAX_CONCURRENT_RESULT_STREAMS = 10` (overridable via env); **write a new** `src/middleware/resultStreamConcurrency.js` (do **not** reuse `uploadConcurrency.js`; the semantics differ); release on `res.once('finish' / 'close' / 'error')`; over the limit → 503 `service_busy` + `Retry-After: 30` + audit log `result.rate_limited` (with `limit_type: 'concurrent'`) |
+
+### 14.2 Range header handling (AC-5, AC-6)
+
+| # | Item | Section | Acceptance criteria |
+|---|------|------|-------------------|
+| **AC-5** | Range header silently ignored, with an explicit `Accept-Ranges: none` | §10.4 | The response must include `Accept-Ranges: none` (not omitted, not set to `bytes`); do not parse Range; never return 416; never slice the MinIO request |
+| **AC-6** | Range header written to audit log `result.range_attempted` | §10.6, §11.4 | When a request carries a Range header, always write the audit event (INFO level, not WARN) with `range_header_received` (sanitized and truncated to 100 chars) + the five A.7 fields + `job_id` |
+
+### 14.3 Streaming timeout / connection safety (AC-7, AC-8)
+
+| # | Item | Section | Acceptance criteria |
+|---|------|------|-------------------|
+| **AC-7** | 5-minute stream response timeout | §15.2 | `res.setTimeout(STREAM_TIMEOUT_MS)` (default 300_000 ms, overridable via env `RESULT_STREAM_TIMEOUT_MS`); also call `req.setTimeout` so both directions are covered; on timeout → `res.destroy()` + `result.stream.destroy()` + audit log `result.stream_timeout` |
+| **AC-8** | Cleanup on stream end / interruption / client close | §4.4, §6.2 | `stream.on('error')`: destroy + log `result.stream_error` (with `stream_completed: false`, bytes < contentLength); `req.on('close')`: destroy stream + log `result.client_closed` (with `stream_completed: false`); `stream.on('end')` with bytes = contentLength → log `result.streamed` (with `stream_completed: true`) |
+
+### 14.4 Audit log completeness (AC-9, AC-10)
+
+| # | Item | Section | Acceptance criteria |
+|---|------|------|-------------------|
+| **AC-9** | Every `result.*` event carries the five A.7 fields + the four /result fields | §11.2, §11.3 | **Five A.7 fields**: `source_ip`, `token_fingerprint`, `request_id`, `http_method`, `http_path` (required on every event); **four /result fields**: `job_id` (every event) + `size_bytes` / `duration_ms` / `stream_completed` (required depending on event type) |
+| **AC-10** | All 12 audit events implemented | §11.4 | `result.streamed` / `result.stream_error` / `result.client_closed` / `result.stream_timeout` / `result.not_found` / `result.not_completed` / `result.expired` / `result.storage_unavailable` / `result.rate_limited` / `result.bandwidth_quota_exceeded` / `result.range_attempted` / `result.filename_assertion_failed`: **12** in total (4 required by the Security review: `rate_limited`, `bandwidth_quota_exceeded`, `range_attempted`, `stream_timeout`; 7 from the Architect's original design; 1 for the filename assertion) |
+
+### 14.5 Filename / response headers (AC-11, AC-12)
+
+| # | Item | Section | Acceptance criteria |
+|---|------|------|-------------------|
+| **AC-11** | `Content-Disposition` defense-in-depth | §13.4a | (1) quote-escape: `filename.replace(/[\\"]/g, '\\$&')`; (2) RFC 5987 fallback: `filename*=UTF-8''${encodeURIComponent(filename)}`; (3) buildFilename assertion: `/^[A-Za-z0-9._-]+$/.test(candidate)`; on assertion failure → fail-secure fallback `job_.nef` + audit log `result.filename_assertion_failed`; (4) unit tests cover the assertion |
+| **AC-12** | The response never sets `Accept-Ranges: bytes` | §10.4, §14.2 AC-5 | The response header must explicitly be `Accept-Ranges: none`; it must not be omitted or set to `bytes` (deliberately repeats AC-5) |
+
+### 14.6 Sub-acceptance: existing `source_filename` write point (B1)
+
+Carried over from the original B1; not a Security addition, but Backend still has to do it:
+
+| # | Item | Section | Acceptance criteria |
+|---|------|------|-------------------|
+| **B1** | createJob handler writes `source_filename` | §13.3 | `jobRecord.source_filename = input.safeFilename` (line ~740) + unit tests covering the happy path / fallback |
+| **B1.5** | Confirm where `job.platform` is written | §13.3 | Confirm via grep; add it if missing |
+
+### 14.7 Integration test scenarios (6 required + the original happy-path test)
+
+**5 required by Security review §3.6 + the original happy-path series**:
+
+| # | Scenario | AC | Expected behavior |
+|---|------|---------|---------|
+| **IT-1** | Happy path: completed job + NEF present + not expired | — | 200 + full stream + correct Content-Type/Length/Disposition + Accept-Ranges: none + audit `result.streamed` (stream_completed: true) |
+| **IT-2** | Rate limit burst: fire 6 req / 10
sec | AC-2 | The 6th returns 429 + `limit_type: 'burst'` + `Retry-After` + audit log |
+| **IT-3** | Rate limit sustained: steadily fire 21 req / 1 min | AC-2 | The 21st returns 429 + `limit_type: 'sustained'` + audit log |
+| **IT-4** | Bandwidth quota hourly: cumulative downloads exceed 1 GB | AC-3 | Once over the limit, returns 429 `bandwidth_quota_exceeded` + `limit_type: 'bandwidth_hourly'` + audit log |
+| **IT-5** | Range header probing: request carries `Range: bytes=0-7` | AC-5, AC-6 | Still returns **200 with the full body** (not 206, not 416) + `Accept-Ranges: none` + audit log `result.range_attempted` with `range_header_received: 'bytes=0-7'` |
+| **IT-6** | Stream timeout: mock a slow-reading client (1 byte every 5 s) | AC-7 | After 5 min the server destroys the connection + audit log `result.stream_timeout` with `timeout_ms: 300000` |
+| **IT-7** | Concurrent stream cap: open 11 streams at once (mocked slow streams) | AC-4 | The 11th immediately returns 503 `service_busy` + `Retry-After: 30` + audit log `result.rate_limited` (concurrent) |
+| **IT-8** | Audit log forensics: cross-event traceability | AC-9, AC-10 | A single `request_id` links `auth.api_key.authenticated` → exactly one terminal `result.*` event; each event carries the five A.7 fields + the four /result fields |
+| **IT-9** | Filename assertion fallback: simulate an upstream sanitize bug by passing a stem containing `"` | AC-11 | Returns 200 + the filename is `job_.nef` (not `evil"_kl720.nef`) + audit log `result.filename_assertion_failed` |
+
+Existing tests (kept):
+
+- ❌ 401 (missing API key / wrong API key)
+- ❌ 404 `job_not_found` (jobID does not exist)
+- ❌ 404 `result_not_found` (completed but no NEF)
+- ❌ 409 `job_not_completed` (status = ONNX / BIE / NEF / FAILED)
+- ❌ 410 `result_expired` (expires_at in the past / MinIO `getObjectStream` returns null)
+- ❌ 502 `storage_unavailable` (MinIO throws)
+- ❌ 503 `service_unavailable` (CONVERTER_API_KEY not set)
+
+### 14.8 Out of Phase B scope (explicitly not done)
+
+| # | Item | Reason | Phase 2 backlog entry |
+|---|------|------|------------------|
+| ❌ | Per-job authorization (checking whether the caller created the job) | With today's 1:1 trust and a hard-coded client_id, a per-job check would always pass; per-caller credentials must come first | #12 (MEDIUM) |
+| ❌ | Range request support (206 Partial Content) | visionA has no need for it, and it enlarges the attack surface | Re-evaluate if a real need appears |
+| ❌ | HMAC signing of user_id | ADR-015
already decided against it | — |
+| ❌ | Multiple caller credentials (a separate API key per service) | Do it in Phase 2 when a second caller actually exists | #13 (LOW) |
+| ❌ | Redis store for multi-instance rate limit / bandwidth quota / concurrent cap | Single instance today; acceptable | #8 (**HIGH**, escalated) |
+| ❌ | Align MinIO socketTimeout with the stream timeout | Evaluate in Phase 2 | Backlog #16 |
+| ❌ | Return 404 uniformly for all 4xx (do not reveal lifecycle) | Must ship together with #12 per-job auth | Backlog #15 |
+
+### 14.9 Environment variables (Backend + DevOps in sync; new in Phase B)
+
+| Env | Default | Purpose |
+|-----|-------|------|
+| `RESULT_STREAM_TIMEOUT_MS` | 300000 (5 min) | Stream response timeout (AC-7) |
+| `MAX_CONCURRENT_RESULT_STREAMS` | 10 | Concurrent stream cap (AC-4) |
+| `RESULT_RATE_LIMIT_BURST_PER_10S` | 5 | Burst rate limit (AC-2) |
+| `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` | 20 | Sustained rate limit (AC-2) |
+| `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` | 1073741824 (1 GiB) | Hourly bandwidth quota (AC-3) |
+| `RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES` | 6442450944 (6 GiB) | Daily bandwidth quota (AC-3) |
+
+### 14.10 What the Reviewer must verify after Phase B
+
+| # | Check | Method |
+|---|--------|------|
+| R1 | The `source_filename` write point exists and stores the sanitized string | grep `apps/task-scheduler/src/routes/v1/jobs.js` |
+| R2 | Tokens / NEF binaries never appear in any log statement | grep `console.log\|console.error` + manual review |
+| R3 | All four middlewares (two-tier rate limit: burst + sustained, bandwidth quota, concurrent cap) are mounted on `/result` in the correct order | Check `router.use('/jobs/:id/result', ...)` in v1/index.js |
+| R4 | Range handling: no parsing, no 416, `Accept-Ranges: none` set, audit log `result.range_attempted` written | Read the handler's header section; grep `416` returns no hits |
+| R5 | 5-min stream response timeout + audit log | grep `res.setTimeout` + grep `result.stream_timeout` |
+| R6 | Concurrent stream cap = 10 + new middleware (uploadConcurrency not reused) | grep `resultStreamConcurrency` |
+| R7 | All 12 audit actions are written, each with the five A.7 fields + the four /result fields | grep `action: 'result\.` for at least 12 distinct actions + spot-check 3 events for complete fields |
+| R8 | Content-Disposition quote-escape + RFC 5987 + buildFilename assertion | Read setHeader / buildFilename +
grep `result.filename_assertion_failed` |
+| R9 | The bandwidth quota bucket key is `token_fingerprint` (not clientId) | Check the injected keyGenerator |
+| R10 | The OpenAPI spec is updated with the new status codes such as 429 / 503 | Check `docs/openapi.yaml` |
+| R11 | All 6 new integration tests (IT-2 to IT-7) are written + the original 4xx tests are kept | grep the test files |
+| R12 | The 6 new env vars are documented in the README / deploy docs | Check the README + `.env.example` |
diff --git a/docs/autoflow/04-architecture/auth.md b/docs/autoflow/04-architecture/auth.md
new file mode 100644
index 0000000..0701d17
--- /dev/null
+++ b/docs/autoflow/04-architecture/auth.md
@@ -0,0 +1,286 @@
+# Auth Design (Phase 0.8b)
+
+> **Scope**: auth for the external API called by visionA → converter, and the promote auth for converter → FAA.
+>
+> **Status**: Phase 0.8b rewrite; visionA → converter now uses an API key, while converter → FAA still uses OAuth client_credentials.
+>
+> **Related**: `design-doc.md` §3.2 / §3.3, the Trust Boundary section of `security.md`.
+>
+> **Design history**: visionA repo `adr-015-server-to-server-api-key.md` v2.1 (why an API key).
+
+---
+
+## 1. visionA → Converter: API key middleware
+
+### 1.1 Design overview
+
+- **Header**: `Authorization: Bearer ` (reuses the existing Bearer format)
+- **Comparison**: `crypto.timingSafeEqual` constant-time compare (defends against timing attacks)
+- **Length**: 64 hex chars (`openssl rand -hex 32`)
+- **On failure**: 401 `invalid_token` + proactive `socket.destroy()` (keeps the OAuth middleware's M2 behavior)
+- **req.auth shape**: set to fixed values once authenticated (no scope check)
+
+### 1.2 Middleware interface
+
+```javascript
+// src/auth/apiKeyMiddleware.js
+function requireApiKey(deps = {}) {
+  // deps.expectedApiKey can be injected for tests; in production it is lazily loaded from config
+  return function apiKeyMiddleware(req, res, next) { ...
};
+}
+```
+
+**Usage** (replaces the existing `requireAuth(scope)`):
+
+```javascript
+const { requireApiKey } = require('../../auth/apiKeyMiddleware');
+
+// Replaces requireAuth(config.converter.scopeWrite)
+router.post('/jobs', requireApiKey(), perClientLimiter, handler);
+router.get('/jobs', requireApiKey(), perClientLimiter, handler);
+router.get('/jobs/:id', requireApiKey(), perClientLimiter, handler);
+router.post('/jobs/:id/promote', requireApiKey(), perClientLimiter, handler);
+router.get('/jobs/:id/result', requireApiKey(), perClientLimiter, handler);
+```
+
+### 1.3 req.auth shape (after authentication)
+
+```javascript
+req.auth = {
+  sub: 'visionA-service',
+  clientId: 'visionA-service',
+  tenantId: null,
+  scopes: ['converter:job.write', 'converter:job.read'], // implicit full access
+  raw: { authType: 'api_key' },
+};
+```
+
+**Why it is designed this way**:
+- A fixed `clientId` lets the existing per-client rate limiter and logging infra work unchanged
+- The two `scopes` values are a "compatibility placeholder"; downstream handlers no longer check them (the middleware itself no longer performs scope checks)
+- `raw.authType: 'api_key'` classifies logs / metrics, so this field can tell auth types apart if OAuth is ever added back
+
+### 1.4 Failure cases
+
+| Case | HTTP | error.code | Message |
+|------|------|-----------|------|
+| Missing Authorization header | 401 | `invalid_token` | `缺少或格式錯誤的 Authorization header(需為 Bearer )` ("missing or malformed Authorization header; must be Bearer") |
+| Authorization is not in Bearer format | 401 | `invalid_token` | Same as above |
+| Token is an empty string | 401 | `invalid_token` | Same as above |
+| Token does not match CONVERTER_API_KEY (constant-time compare) | 401 | `invalid_token` | `API key 驗證失敗` ("API key verification failed") |
+| `CONVERTER_API_KEY` env not set (fail-fast) | 503 | `service_unavailable` | API key not configured |
+| Any unexpected exception | 401 | `invalid_token` | `API key 驗證失敗` (catch-all, so a 5xx cannot leak internal details) |
+
+### 1.5 Constant-time compare implementation
+
+```javascript
+function constantTimeEquals(a, b) {
+  if (typeof a !== 'string' || typeof b !== 'string') return false;
+  const bufA = Buffer.from(a, 'utf8');
+  const bufB = Buffer.from(b, 'utf8');
+  if (bufA.length !== bufB.length) return false; // compare lengths first (timingSafeEqual throws on unequal lengths)
+  return crypto.timingSafeEqual(bufA, bufB);
+}
+```
+
+**Notes**:
+- Comparing lengths first is required (`timingSafeEqual` throws a `RangeError` when the lengths differ)
+- The length itself is not a secret (key length is public; this project always uses 64 chars)
+- Compare every byte; never truncate
+
+### 1.6 Socket destroy behavior (M2, carried over)
+
+Aligned with `sendAuthError` in the existing `auth/middleware.js` (removed in this phase):
+
+1. Set the `Connection: close` header
+2. Write the response with `res.status(401).json({ error: {...} })`
+3. `res.once('finish', () => req.socket.destroy())` proactively closes the connection once the response is written
+
+**Why**: after a 401 the client may still be uploading a 500MB body, and Node keeps pulling that data into the socket buffer. Destroying the socket stops this from eating memory / bandwidth.
+
+### 1.7 Fail-fast behavior (CONVERTER_API_KEY not set)
+
+```javascript
+if (!expected || expected === '') {
+  // Log the misconfiguration (never print the key)
+  console.error(JSON.stringify({
+    level: 'ERROR',
+    action: 'auth.api_key.not_configured',
+    message: 'CONVERTER_API_KEY env not set; rejecting all requests',
+  }));
+  // Reject with 503; never silently allow
+  return sendApiKeyError(req, res, 503, 'service_unavailable', 'API key not configured');
+}
+```
+
+**Why not throw / process.exit**:
+- Throwing at startup is undesirable (the legacy Web UI path runs in the same process and should keep working)
+- But the external API must still be blocked (rejecting with 503 is safer than silently allowing)
+
+### 1.8 Logging rules
+
+| Scenario | Log level | Fields |
+|------|-----------|------|
+| API key configured at startup | INFO | `action: 'config.api_key_enabled'`, `api_key_length` (never the key itself) |
+| API key not configured at startup | WARN | `action: 'config.api_key_not_set'` |
+| Middleware receives a request but no API key is configured | ERROR | `action: 'auth.api_key.not_configured'` |
+| Middleware verification fails | (individual failures are not logged, to avoid log injection; counting them in metrics is enough) | — |
+| Middleware verification succeeds | (not logged; the downstream handler logs the request) | — |
+| Middleware catch-all exception | ERROR | `action: 'auth.api_key.unexpected_error'`, `error_message` truncated to 100 chars |
+
+**Never log**:
+- The API key itself (neither the expected nor the received value)
+- The full Authorization header
+- Any token / secret string
+
+---
+
+## 2.
Converter → FAA: OAuth client_credentials (kept unchanged)
+
+### 2.1 Scope
+
+In the promote flow (`POST /api/v1/jobs/:id/promote`), the Converter obtains a `files:upload.write` token under its own identity and PUTs the result files to FAA. **Phase 0.8b leaves this untouched**.
+
+For detailed client behavior see the existing `apps/task-scheduler/src/auth/oauthClient.js` (**kept**); this section only records the architectural decisions.
+
+### 2.2 Configuration
+
+| Environment variable | Purpose | Phase 0.8b status |
+|---------|------|---------------|
+| `MEMBER_CENTER_TOKEN_URL` | MC token endpoint | **Kept** |
+| `KNERON_CONVERTER_CLIENT_ID` | The Converter's client ID | **Kept** |
+| `KNERON_CONVERTER_CLIENT_SECRET` | The Converter's client secret | **Kept** |
+| `FILE_ACCESS_AGENT_AUDIENCE` | FAA audience (used when requesting a token) | **Kept** |
+| `FILE_ACCESS_AGENT_BASE_URL` | FAA API base URL | **Kept** |
+| `PROMOTE_TIMEOUT_MS` | FAA PUT timeout | **Kept** |
+| `OAUTH_TOKEN_REFRESH_SKEW_MS` | How many ms before expiresAt a cached token is proactively refreshed | **Kept** |
+| `OAUTH_TOKEN_TIMEOUT_MS` | Network timeout when requesting a token | **Kept** |
+
+### 2.3 Client behavior (unchanged)
+
+- `grant_type=client_credentials`
+- `Authorization: Basic base64(client_id:client_secret)` (RFC 6749 §2.3.1)
+- `scope=files:upload.write`, `audience=`
+- Token cache: per scope; a token counts as valid while the distance to expiresAt is > refreshSkewMs (default 60s)
+- In-flight Promise dedup (concurrent calls for the same scope send only one request)
+- AbortController timeout (default 10s)
+- Error classes: `OAuthClientError` (4xx, not retried) / `OAuthServerError` (5xx, retryable) / `OAuthTimeoutError` (network / timeout, retryable)
+- FAA returns 401 → `invalidate(scope)` + retry once; still 401 → 503 `auth_service_unavailable`
+
+---
+
+## 3.
Removal list (removed in Phase 0.8b)
+
+| File / module | Action |
+|------------|------|
+| `src/auth/middleware.js` (OAuth resource server) | **Remove** |
+| `src/auth/jwks.js` | **Remove** |
+| `src/auth/__tests__/middleware.test.js` | **Remove** |
+| `src/auth/__tests__/jwks.test.js` | **Remove** |
+| `src/auth/oauthClient.js` | **Keep** (used by promote) |
+| `src/auth/oauthClient.test.js` | **Keep** |
+| In `src/config.js`: `MEMBER_CENTER_ISSUER` / `MEMBER_CENTER_JWKS_URL` / `KNERON_CONVERTER_AUDIENCE` / `JWKS_*` / `JWT_CLOCK_TOLERANCE_SEC` | **Remove** |
+| In `src/config.js`: `MEMBER_CENTER_TOKEN_URL` / `KNERON_CONVERTER_CLIENT_*` / `FILE_ACCESS_AGENT_*` / `PROMOTE_TIMEOUT_MS` / `OAUTH_*` | **Keep** |
+| New in `src/config.js`: `CONVERTER_API_KEY` | **Add** |
+| `.env.example`: remove the OAuth resource server section, add a `CONVERTER_API_KEY=` placeholder | **Change** |
+| `README.md` auth section (OAuth → API key) | **Change** |
+| `docs/openapi.yaml` security scheme (OAuth → bearer / api_key) | **Change** |
+
+### 3.1 Unit test scope removed
+
+- JWT expired / bad signature / wrong aud / wrong iss / unknown kid / insufficient scope / tenant_mismatch
+- JWKS cache hit / miss / cooldown / algorithm pinning
+- The 401 / 403 checks in the existing `routes/v1/jobs.test.js` → changed to test the API key 401
+
+### 3.2 Unit test scope added
+
+- API key middleware:
+  - Happy path (correct key → next() + correct req.auth)
+  - Missing Authorization header → 401
+  - Authorization not in Bearer format → 401
+  - Empty token → 401
+  - Mismatched token → 401 (verify the constant-time behavior: a token with a different prefix must still run the full comparison)
+  - API key not configured (env missing) → 503
+  - Socket destroy behavior (the socket really is closed after the response is written)
+
+---
+
+## 4. CONVERTER_API_KEY management
+
+### 4.1 Generation
+
+```bash
+openssl rand -hex 32
+# Outputs 64 hex chars (32 random bytes = 256 bits, well above the 112-bit minimum security strength in NIST SP 800-57)
+```
+
+### 4.2 Where it is deployed
+
+| Environment | Location |
+|------|------|
+| dev | `apps/task-scheduler/.env` (gitignored) |
+| stage | docker-compose env / k8s secret |
+| prod | docker secret / k8s secret / cloud secrets manager |
+
+### 4.3 Keeping both sides in sync
+
+- visionA `.env.stage`: `VISIONA_CONVERTER_API_KEY=`
+- converter `.env`: `CONVERTER_API_KEY=`
+- **Both sides must use exactly the same string**
+
+### 4.4 Rotation procedure
+
+1. Stop the deployment on both sides (or accept a brief 401 window)
+2.
`openssl rand -hex 32` generates the new key
+3. Update `.env` on both sides
+4. Redeploy the converter first (it accepts the new key)
+5. Then redeploy visionA (it calls with the new key)
+6. Verify: `curl -H "Authorization: Bearer " https://converter.../api/v1/health` (`/health` itself has no auth, so repeat the check against an authenticated endpoint such as `GET /api/v1/jobs` to actually exercise the new key)
+
+**Near-zero downtime** (< 1 minute) variant: temporarily let the converter accept both the old and the new key (extend the middleware to compare against an array), switch visionA to the new key, then drop the old one. Phase 0.8b does not implement this optimization (a brief 401 window is accepted).
+
+### 4.5 Handling a leak
+
+- Rotate the key on both sides immediately
+- Review the audit log for suspicious requests before the rotation (trace them with `request_id` + `user_id`)
+- If there is anomalous activity (the same client_id with 100+ distinct user_ids in a short window), report it
+
+---
+
+## 5. Relationship to the existing promote flow
+
+```
+visionA-backend → converter:
+  POST /api/v1/jobs/:id/promote
+  Authorization: Bearer   ← API key
+    ↓
+  converter requireApiKey() middleware passes
+    ↓
+  converter promote handler:
+    1. read the job from Redis
+    2. status === 'COMPLETED' ?
+    3. for each target:
+       a. minio.headObject(sourceKey)
+       b. oauthClient.getServiceToken('files:upload.write')  ← OAuth client (kept)
+       c. faaClient.putFile(targetKey, stream, ...)
+    ↓
+  return 200 + { promoted: [...] }
+```
+
+→ **The API key applies only at the converter's external layer**; internally, converter → FAA still uses OAuth client_credentials.
+
+---
+
+## 6. Security checklist (Phase 0.8b)
+
+- [x] `crypto.timingSafeEqual` constant-time compare
+- [x] Lengths compared first to avoid a throw
+- [x] Key contents never logged (neither expected nor received)
+- [x] Fail-fast: an unset env never silently allows
+- [x] Socket destroy behavior aligned with the existing OAuth middleware
+- [x] req.auth shape matches what downstream handlers expect
+- [x] OAuth client (promote) code completely untouched
+- [x] Secret kept out of git (`.env` is in .gitignore, but the Sec C1 history rewrite is still pending)
+- [x] Structured logs without secrets
+- [ ] **Backend acceptance during implementation**: tests cover all the cases above
+- [ ] **Reviewer acceptance**: grep shows `CONVERTER_API_KEY` does not appear in any log statement
diff --git a/docs/autoflow/04-architecture/database.md b/docs/autoflow/04-architecture/database.md
new file mode 100644
index 0000000..f16578d
--- /dev/null
+++ b/docs/autoflow/04-architecture/database.md
@@ -0,0 +1,187 @@
+# Database Design
+
+> **Status**: Phase 1 complete; Phase 0.8b **changes nothing**.
+>
+> **Related**: `design-doc.md` §3.7, `api/api-jobs.md`.
+
+---
+
+## 1.
Why Redis rather than PostgreSQL
+
+- The Phase 1 data model is simple: a job is a state machine and the user index is key-value
+- The existing "crash means reset" philosophy suits Redis (PG's persistence would actually add complexity)
+- A Redis Set is enough for the user index (a single user has < 10 jobs within 7 days)
+- Re-evaluate PG if crash recovery / multi-instance HA is ever needed
+
+---
+
+## 2. Key layout
+
+| Key | Type | Purpose | TTL |
+|-----|------|------|-----|
+| `job:{job_id}` | String (JSON) | Full job record | 7 days |
+| `user:{user_id}:jobs` | Set | All of the user's job_ids (any status) | `EXPIRE 7d` on every write |
+| `user:{user_id}:active_job` | String | The current in-progress job_id (`created` or `running`) | Deleted when the job ends |
+| `ratelimit:client:{client_id}` | Managed by `express-rate-limit` | Per-client_id rate limit | 5 min |
+| `queue:onnx` / `queue:bie` / `queue:nef` | Redis Stream | Worker task queues | — |
+| `queue:done` | Redis Stream | Worker completion events | — |
+| `queue:progress` | Redis Stream | In-stage worker progress (optional, Phase 2) | — |
+
+---
+
+## 3. Job record schema
+
+```jsonc
+{
+  // Existing fields
+  "job_id": "uuid",
+  "created_at": "...",
+  "updated_at": "...",
+  "status": "ONNX | BIE | NEF | COMPLETED | FAILED",  // still uppercase internally
+  "stage": "onnx | bie | nef | null",
+  "progress": 0,
+  "parameters": {
+    "model_id": 1001,
+    "version": "0001",
+    "platform": "520",
+    "enable_evaluate": false,
+    "enable_sim_fp": false,
+    "enable_sim_fixed": false,
+    "enable_sim_hw": false
+  },
+  "output": {                    // old format (backward compatible)
+    "bie_path": null,
+    "nef_path": null,
+    "onnx_path": null
+  },
+  "result_object_keys": {        // new format
+    "onnx": "jobs/{job_id}/output/out.onnx",
+    "bie": "jobs/{job_id}/output/out.bie",
+    "nef": "jobs/{job_id}/output/out.nef"
+  },
+  "error": null,
+  "origin": "api | web",
+  "user_id": "visionA-user-12345",
+  "tenant_id": "uuid-or-null",
+  "created_by_client_id": "visionA-service",  // fixed value in API key mode
+  "source_filename": "model.onnx",            // new in Phase 0.8b (used for the /result download filename)
+  "input": {
+    "filename": "model.onnx",
+    "object_key": "jobs/{job_id}/input/model.onnx",
+    "size_bytes": 204800000,
+    "ref_images_count": 0
+  },
+  "stage_timings": {
+    "onnx": { "started_at": "...",
"completed_at": "..." },
+    "bie":  { "started_at": "...", "completed_at": null },
+    "nef":  null
+  },
+  "stage_progress": 0,
+  "expires_at": "2026-05-23T12:00:00Z",
+  "metadata": {},
+  "promoted": false,             // idempotency flag
+  "promoted_object_keys": []     // targets already promoted
+}
+```
+
+### 3.1 The `source_filename` field
+
+New requirement in Phase 0.8b: the `/result` endpoint needs this field to build the download filename.
+
+**Write point**: after multer receives the `model` file, the `POST /api/v1/jobs` handler writes `multipart.filename` (sanitized) into `job.source_filename`.
+
+**Backend task**: confirm that `jobService.createJob` writes this field (check the existing code, which may already do so; add it if not).
+
+---
+
+## 4. External status mapping (unchanged)
+
+See `api/api-jobs.md` §5.3.
+
+---
+
+## 5. User index design
+
+### 5.1 When keys are written
+
+```
+Create job:
+  MULTI
+    SET job:{id} {...}
+    SADD user:{user_id}:jobs {id}
+    EXPIRE user:{user_id}:jobs 604800
+    SETNX user:{user_id}:active_job {id}
+  EXEC
+
+  If SETNX returns 0 → conflict: roll back, return 409 (MULTI/EXEC cannot roll back on its own; §5.2's Lua script performs the check-and-claim atomically)
+  If SETNX returns 1 → success
+
+On completion / failure:
+  MULTI
+    SET job:{id} {...}
+    DEL user:{user_id}:active_job    # only DEL if the value == the current job_id (a compare-and-delete; MULTI alone cannot express it)
+  EXEC
+```
+
+### 5.2 Lua script (claim_active_job)
+
+```lua
+-- KEYS[1] = user:{user_id}:active_job
+-- KEYS[2] = job:{job_id}
+-- KEYS[3] = user:{user_id}:jobs
+-- ARGV[1] = job_id
+-- ARGV[2] = job_json
+-- ARGV[3] = ttl_seconds
+
+if redis.call('EXISTS', KEYS[1]) == 1 then
+  return {'conflict', redis.call('GET', KEYS[1])}
+end
+redis.call('SET', KEYS[1], ARGV[1])
+redis.call('SET', KEYS[2], ARGV[2])
+redis.call('SADD', KEYS[3], ARGV[1])
+redis.call('EXPIRE', KEYS[3], tonumber(ARGV[3]))
+return {'ok'}
+```
+
+### 5.3 Avoid `KEYS *`
+
+**Wrong**: `redis.keys('job:*')` is O(N) and blocks Redis.
+
+**Right**:
+```javascript
+const ids = await redis.smembers(`user:${userId}:jobs`);
+const pipeline = redis.pipeline();
+for (const id of ids) pipeline.get(`job:${id}`);
+const results = await pipeline.exec();
+```
+
+---
+
+## 6.
Memory estimate
+
+- Each job record is about 2-4 KB (including stage_timings and the like)
+- Each element of a user index Set is < 40 bytes
+- 1000 concurrent users × 10 jobs = 10k job records ≈ 40 MB at most
+
+Easily handled by Redis. The Converter bucket lifecycle is 7 days and the Redis TTL follows at 7 days, so the memory ceiling stays bounded.
+
+---
+
+## 7. M5 option A: write MinIO first, then the Lua claim
+
+Avoids the complexity of rolling back Redis when "the Lua claim succeeds but MinIO fails":
+
+- MinIO fails → return 502 immediately; Redis stays completely clean
+- Lua conflict / throw → clean up MinIO (fire-and-forget, with the 7d lifecycle as the safety net)
+- Enqueue fails → compensate by releasing the Redis claim + cleaning up MinIO
+
+---
+
+## 8. Phase 0.8b changes
+
+**None**. The database is completely unchanged.
+
+The only related changes:
+
+- In API key mode `created_by_client_id` is always `visionA-service` (the middleware sets `req.auth.clientId`); this is handler behavior, not a schema change
+- Confirm the `source_filename` field exists (the existing implementation may already have it; if not, Backend adds it as a Phase B task)
diff --git a/docs/autoflow/04-architecture/design-doc.md b/docs/autoflow/04-architecture/design-doc.md
index 30659c9..c4fbb42 100644
--- a/docs/autoflow/04-architecture/design-doc.md
+++ b/docs/autoflow/04-architecture/design-doc.md
@@ -1,28 +1,36 @@
-# Design Doc: Kneron Model Converter External API (L-size new feature)
+# Design Doc: Kneron Model Converter External API
 ## Author: Architect Agent
-## Status: Draft (before three-way cross review)
-## Last updated: 2026-04-25
-## Scope: Phase 1 (external API, OAuth2, File Access Agent integration, promote) + Phase 2 spec placeholders
+## Status: Draft (Phase 0.8b rewrite, before three-way cross review)
+## Last updated: 2026-05-16
+## Scope: Phase 1 complete + Phase 0.8b design pivot (API key + `/result` endpoint)
+
+> **Auth design history**: this document reflects the "target state" agreed for Phase 0.8b (API key + downloads relayed through `/result`). The full decision history lives in the visionA repo:
+> - `docs/autoflow/04-architecture/adr/adr-015-server-to-server-api-key.md` v2.1: why visionA → converter switched to a pre-shared API key
+> - `docs/autoflow/04-architecture/adr/adr-016-download-via-converter.md` v1.0: why downloads are now relayed through the converter's `/result`
+> - For the converter-side design history, see `git log docs/autoflow/04-architecture/design-doc.md` in this repo
 
 ## Change History
 
 | Date | Change | Author |
 |------|------|------|
-| 2026-04-25 | Initial Draft | Architect Agent |
-| 2026-04-25 | Original model upload path changed so visionA-backend uploads multipart directly to the Converter; removed the File Access Agent GET/HEAD S2S requirement (R1 / TBD-1 / §5.5 / ADR-002 input
部分);user_id 改放 multipart 欄位 | Architect Agent | +| 2026-04-25 | 初版 Draft;OAuth resource server + promote 設計 | Architect Agent | +| 2026-04-25 | 原始模型上傳路徑改 multipart 直傳;移除 FAA GET/HEAD 相關 | Architect Agent | +| 2026-05-16 | **Phase 0.8b 重寫**:visionA → converter 改 API key;新增 `GET /api/v1/jobs/:id/result` streaming endpoint;保留 converter → FAA OAuth client(promote 用) | Architect Agent | --- ## 0. 文件導讀 -本 Design Doc 聚焦「系統層級架構決策」。若你是工程師要開始寫程式,請看 `TDD.md`(或本專案若拆分為 `TDD-*.md`)。 +本 Design Doc 聚焦「系統層級架構決策」。工程師實作細節請看 `TDD.md` 索引及其子檔案。 對應文件: -- 產品需求:`../02-prd/PRD.md`(§1.2、§4.3、§4.4、§5.5、§5.6、§14、§15) +- 產品需求:`../02-prd/PRD.md` - 使用者流程:`../03-design/user-flow-cross-system.md` - 設計審閱:`../03-design/design-review.md` -- 專案健檢:`../00-onboarding/health-check.md` +- 專案健檢:`.autoflow/00-onboarding/health-check.md` +- 安全設計:`security.md` +- visionA repo 的 ADR-015 / ADR-016(caller 端設計脈絡) --- @@ -30,27 +38,33 @@ ### 1.1 背景 -Kneron Model Converter(下稱 Converter)目前是一個「只有 Web UI 的內部工具」,支援 AI 工程師以圖形化介面執行 ONNX → BIE → NEF 三階段模型轉檔。本次 L 級新功能將其擴展為「對外提供 OAuth2 保護 REST API 的服務」,讓 Innovedus 生態中的其他服務(首個消費者為 VisionA)能以程式化方式整合轉檔能力。 +Kneron Model Converter(下稱 Converter)原本是「只有 Web UI 的內部工具」,支援 AI 工程師以 GUI 執行 ONNX → BIE → NEF 三階段模型轉檔。L 級新功能將其擴展為「對外提供 REST API 的服務」,讓 Innovedus 生態中的其他服務(首個消費者為 visionA-backend)能以程式化方式整合轉檔能力。 -關鍵生態組成: -- **Converter(本專案)**:Node.js + Python Worker,部署在靠近 NAS 網段的位置 -- **visionA-backend**:Go 服務,Converter API 的消費者(Persona C) -- **Member Center**:OAuth2 / OIDC authorization server(C# / OpenIddict) -- **File Access Agent**:tenant 邊界內檔案閘道(C# / ASP.NET Core),駐守 NAS 側,單一 tenant per instance +Phase 1 上線後(2026-04-25 完工、5/9 部署 stage)撞了兩個 root cause、5/16 拍板設計轉向: -### 1.2 系統定位(新架構) +1. **5/9 stage e2e blocker**:MC 沒註冊 `converter:job.read/write` scope、converter image 過舊、FAA OAuth 狀態不明 → ADR-015 拍板 visionA ↔ converter 為 1:1 internal trust、改用 pre-shared API key +2. 
**5/16 grep MC source**:發現 MC 從未實作 `/file-access/download-tokens` endpoint、delegated download token 鏈從 5/2 寫完到現在一直是斷的 → ADR-016 拍板改成 visionA → converter `GET /api/v1/jobs/:id/result` 中轉 + +### 1.2 生態組成 + +- **Converter(本專案)**:Node.js Task Scheduler + Python Worker,部署在靠近 NAS 網段 +- **visionA-backend**:Go 服務,Converter 對外 API 的唯一 caller(Persona C) +- **Member Center(MC)**:OAuth2 / OIDC authorization server(C#)— Phase 0.8b 後 Converter 對外 API **不再經過 MC**;但 Converter → FAA 仍以 client_credentials 取 token,這條保留 +- **File Access Agent(FAA)**:tenant 邊界內檔案閘道(C# / ASP.NET Core),駐守 NAS 側,single-tenant per instance + +### 1.3 系統定位 ```mermaid flowchart TB subgraph AWS["AWS 側"] - VisionAFE["VisionA 前端"] - VisionABE["visionA-backend
(Go, Phase 1)"] - MC["Member Center
(OAuth2 + JWKS)"] + VisionAFE["visionA 前端"] + VisionABE["visionA-backend
(Go)"] + MC["Member Center
(只給 converter → FAA
取 promote token)"] end subgraph NAS["NAS 側(內部網段)"] subgraph ConverterNode["Converter 部署節點"] - Nginx["Nginx
(public + internal vhost)"] + Nginx["Nginx
(public + internal vhost)"] Scheduler["Task Scheduler
(Node.js Express)"] Redis["Redis
(job state + user index)"] Workers["Workers
(onnx / bie / nef)"] @@ -62,18 +76,23 @@ flowchart TB end VisionAFE -->|HTTPS| VisionABE - VisionABE -->|1. token
(client_credentials)| MC - VisionABE -->|2. POST /api/v1/jobs
multipart: model + user_id
(aud=kneron_converter_api)| Nginx + VisionABE -->|1. POST /api/v1/jobs
multipart
Authorization: Bearer
<API key>| Nginx Nginx -->|public vhost| Scheduler - Scheduler -->|驗 token
(JWKS)| MC - Scheduler -->|multer memory
寫入 input| ConvBucket - Scheduler -->|取 token
(client_credentials,
僅 promote 需要)| MC + Scheduler -->|constant-time
compare API key| Scheduler + Scheduler -->|寫 input| ConvBucket Scheduler -->|put / get job state| Redis - Scheduler -->|enqueue stage| Redis Workers -->|consume queue| Redis Workers -->|read input / write output| ConvBucket - Scheduler -->|讀 MinIO 暫存| ConvBucket - Scheduler -->|promote: PUT 結果檔
(files:upload.write)| FAA + + VisionABE -->|2. GET /api/v1/jobs/:id
(poll, Bearer API key)| Nginx + VisionABE -->|3. POST /api/v1/jobs/:id/promote
(Bearer API key)| Nginx + Scheduler -->|3a. 取 token
client_credentials| MC + Scheduler -->|3b. PUT 結果檔
(files:upload.write)| FAA + + VisionABE -->|4. GET /api/v1/jobs/:id/result
(Bearer API key)
NEW Phase 0.8b| Nginx + Scheduler -->|4a. stream 從 MinIO| ConvBucket + ConvBucket -->|NEF binary stream| Scheduler + Scheduler -->|stream proxy
application/octet-stream| VisionABE classDef aws fill:#ffe0b2,stroke:#ef6c00 classDef nas fill:#c8e6c9,stroke:#2e7d32 @@ -83,45 +102,46 @@ flowchart TB class Nginx,Scheduler new ``` -**關鍵資料流(Happy Path)**: +**關鍵資料流**(Phase 0.8b 後): -1. **取 token**(visionA-backend → Member Center) - visionA-backend 以 `client_credentials` 取得 `aud=kneron_converter_api` 的 access token(scope=`converter:job.write`)。 -2. **建 job**(visionA-backend → Converter,multipart/form-data) - visionA-backend 以 `POST /api/v1/jobs` 直接把原始模型 multipart 上傳到 Converter;Converter 驗 OAuth token 後把檔案寫入 Converter Bucket(`jobs/{job_id}/input/{filename}`),流程與既有 Web UI `POST /jobs` multipart 上傳一致(`multer.memoryStorage()`,`fileSize: 500MB`)。 -3. **轉檔**(Worker pool 處理,順序固定 onnx → bie → nef) - Workers 從 Converter Bucket 讀檔、處理、寫回結果檔。Phase 1 Converter **完全不從 File Access Agent 讀任何東西**。 -4. **polling 進度**(visionA-backend → Converter) +1. **建 job**(visionA-backend → Converter,multipart/form-data) + `POST /api/v1/jobs` 帶 `Authorization: Bearer `。Converter middleware 用 `crypto.timingSafeEqual` constant-time compare API key,通過後把 multipart 寫入 Converter Bucket(`jobs/{job_id}/input/{filename}`)。 +2. **轉檔**(Worker pool 順序處理 onnx → bie → nef) + Workers 從 Converter Bucket 讀寫,產出結果寫回 Converter Bucket。 +3. **Polling**(visionA-backend → Converter) 每 2-5 秒 `GET /api/v1/jobs/:id`。 -5. **promote 到 NAS**(visionA-backend → Converter → File Access Agent) - 使用者在 VisionA 前端按「加進模型庫」時,visionA-backend 呼叫 `POST /api/v1/jobs/:id/promote`,Converter 以自己的 OAuth client 身分取 `files:upload.write` token,把 Converter Bucket 中的結果檔 PUT 到 File Access Agent。 +4. **Promote 到 NAS**(visionA-backend → Converter → MC → FAA) + `POST /api/v1/jobs/:id/promote` 時 Converter 用既有 OAuth client(cached)取 `files:upload.write` token,PUT 結果檔到 FAA。**這條 OAuth 鏈條保留不變**。 +5. **Download**(visionA-backend → Converter,**Phase 0.8b 新增**) + `GET /api/v1/jobs/:id/result` streaming 從 Converter Bucket 把 NEF binary proxy 回 visionA-backend。**不經過 FAA、不經過 MC**。 --- ## 2. 
目標與非目標 (Goals and Non-Goals) -### Goals(Phase 1 必達) +### Goals(Phase 0.8b 後) -- [ ] 對外 API 以 OAuth2 Bearer 驗證,對齊 Innovedus Member Center(JWKS 驗簽) -- [ ] Converter 同時具備 **Resource Server**(驗他人 token)與 **OAuth Client**(取自己 token)雙重身分 -- [ ] 提供四個對外端點:`POST /api/v1/jobs`、`GET /api/v1/jobs`、`GET /api/v1/jobs/:id`、`POST /api/v1/jobs/:id/promote` -- [ ] 同使用者同時一個轉檔限制(以 `user_id` 為界,不是 `client_id`) +- [ ] 對外 API(visionA → converter)用 pre-shared API key 驗證 +- [ ] Converter 仍保留 **OAuth Client** 身分(promote 流程用 client_credentials 取 FAA token) +- [ ] 提供 5 個對外端點:`POST /api/v1/jobs`、`GET /api/v1/jobs`、`GET /api/v1/jobs/:id`、`POST /api/v1/jobs/:id/promote`、`GET /api/v1/jobs/:id/result`(**新增**) +- [ ] `/result` endpoint streaming proxy NEF binary(不 buffer,支援數百 MB) +- [ ] 同使用者同時一個轉檔限制(以 `user_id` 為界) - [ ] Recovery 支援(`GET /api/v1/jobs?user_id=...&status=in_progress`) - [ ] 既有 `/jobs/*` 舊路徑保留不動,Web UI 零影響 -- [ ] 部署分流:公網只開 `/api/v1/*`,`/jobs/*` 只在內部網段可達 -- [ ] OpenAPI 3.0 規格產出,供下游整合 -- [ ] API SLA 可觀測(p95、錯誤率、token failure rate) +- [ ] 部署分流:公網只開 `/api/v1/*`,`/jobs/*` 只在內網可達 +- [ ] OpenAPI 3.0 規格產出 +- [ ] API SLA 可觀測(p95、錯誤率) -### Non-Goals(Phase 1 明確不做) +### Non-Goals(Phase 0.8b 明確不做) -- [ ] 使用者直連下載(delegated download)— 阻塞於 Member Center endpoint,延至 Phase 2 -- [ ] Webhook / SSE 對外推送(polling 已足夠,見 ADR-004) -- [ ] Job 取消 / 重試(非本次範圍,API 僅保留路徑) +- [ ] **不再做 OAuth resource server**(visionA → converter 不驗 MC JWT;只有未來真有第二個 caller 才考慮回補) +- [ ] **不做 delegated download token**(MC 沒實作對應 endpoint、ADR-016 改成 `/result` 中轉) +- [ ] Webhook / SSE 對外推送(polling + `/result` 已足夠) +- [ ] Job 取消 / 重試(非本次範圍) - [ ] Job 持久化 / 跨 Crash recovery(維持「Crash 即 Reset」哲學) -- [ ] Web UI 改走新 OAuth 流程(本次 Phase 1 不動,見 ADR-006) -- [ ] 單階段轉換 API 後端對齊(既有 backlog,與本次獨立) -- [ ] 使用者層級 ACL(Converter 不管,責任邊界在 visionA-backend) -- [ ] 跨 tenant 隔離的複雜授權模型(本次設計為 single-tenant per Converter deployment,見 §5.3) +- [ ] Web UI 改走 API key 流程(內網工具,不動) +- [ ] 使用者層級 ACL(責任邊界在 visionA-backend) +- [ ] 多租戶(single-tenant per Converter deployment) 
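The first goal above, pre-shared API key authentication, can be sketched as Express-style middleware. This is a minimal, hypothetical sketch rather than the contents of `apiKeyMiddleware.js`: it follows the behavior described in §3.2 (Bearer header, constant-time compare with `crypto.timingSafeEqual`, 401 `invalid_token` followed by socket teardown, a fixed `req.auth` shape, and 503 when `CONVERTER_API_KEY` is unset). Hashing both keys with SHA-256 before comparing is an assumption of this sketch, chosen so `timingSafeEqual` always receives equal-length buffers; the `service_unavailable` error code on the 503 path and reading a request ID from `req.id` are also assumptions.

```javascript
'use strict';
const crypto = require('node:crypto');

// Hash both values so crypto.timingSafeEqual always gets two 32-byte buffers
// (it throws on a length mismatch) and the comparison time does not depend on
// how much of the presented key matches.
function sha256(value) {
  return crypto.createHash('sha256').update(value, 'utf8').digest();
}

// Fixed caller identity, shape taken from design-doc §3.2. A fresh object per
// request so downstream code cannot mutate a shared instance.
function visionAServiceAuth() {
  return {
    sub: 'visionA-service',
    clientId: 'visionA-service',
    tenantId: null,
    scopes: ['converter:job.write', 'converter:job.read'],
    raw: { authType: 'api_key' },
  };
}

// Error body follows the doc's {error: {code, message, details, request_id}} format.
function sendError(res, status, code, message, requestId) {
  res.status(status).json({
    error: { code, message, details: null, request_id: requestId ?? null },
  });
}

function requireApiKey(expectedKey = process.env.CONVERTER_API_KEY) {
  // Digest once at construction; null means "not configured".
  const expectedDigest = expectedKey ? sha256(expectedKey) : null;

  return function apiKeyMiddleware(req, res, next) {
    const requestId = req.id; // hypothetical: set by some request-id middleware
    if (expectedDigest === null) {
      // Fail fast: never silently allow requests when the key is missing.
      sendError(res, 503, 'service_unavailable', 'API key is not configured', requestId);
      return;
    }

    const header = req.headers.authorization || '';
    const match = /^Bearer\s+(\S+)\s*$/i.exec(header);
    const valid =
      match !== null && crypto.timingSafeEqual(sha256(match[1]), expectedDigest);

    if (!valid) {
      // Once the 401 has been handed off, drop the connection so a large
      // multipart body cannot keep streaming in.
      res.once('finish', () => req.socket?.destroy());
      sendError(res, 401, 'invalid_token', 'Missing or invalid API key', requestId);
      return;
    }

    req.auth = visionAServiceAuth();
    next();
  };
}

module.exports = { requireApiKey };
```

In a router this would be mounted ahead of the upload parser, e.g. `router.post('/jobs', requireApiKey(), upload.single('model'), createJobHandler)` (with `upload` and `createJobHandler` as hypothetical names), so an unauthenticated request is rejected before multer buffers its body. Registering the `finish` listener before sending the 401 means the socket is destroyed only after the response has been handed off.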
 
 ---
@@ -129,119 +149,158 @@ flowchart TB
 
 ### 3.1 Architecture pattern choice
 
-- **Choice**: keep the existing **monolithic Task Scheduler (Node.js Express) + Worker Pool (Python)** architecture. The external API is added as a new group of routes rather than as a separate service.
+- **Choice**: keep the existing **monolithic Task Scheduler (Node.js Express) + Worker Pool (Python)** architecture
 - **Rationale**:
-  1. Phase 1 scope is limited to an extra auth layer + an extra set of API endpoints + a single write to File Access Agent on promote, which does not justify the operational complexity of a new service.
-  2. The existing Crash = Reset philosophy favors a monolith: the Scheduler is stateless, so restart = recovery.
-  3. Old and new paths share the same Redis job record, so no cross-service synchronization is needed.
+  1. The Phase 0.8b change is limited to swapping the auth middleware + adding one streaming endpoint, which does not justify the operational complexity of a new service
+  2. The existing Crash = Reset philosophy favors a monolith: the Scheduler is stateless, so restart = recovery
 - **Trade-offs**:
-  - Cost: the Scheduler monolith grows (estimated +600 lines). Acceptable, because the API layer is I/O-bound rather than CPU-bound, and a single Node.js process can handle it.
-  - If QPS demand spikes in the future (e.g. > 500 RPS), the auth middleware and OAuth client can be extracted into a standalone sidecar, but not in Phase 1.
+  - The Scheduler monolith carries the network I/O of the download proxy; NEF files are usually < 50MB, and in stream mode the memory footprint is bounded by the Node stream / HTTP buffers (the whole file is never buffered)
+  - If download QPS becomes high, add sendfile / proxy_cache at the Nginx layer or split `/result` into a standalone microservice (not in Phase 1)
 
-### 3.2 Phased architecture evolution
+### 3.2 Auth strategy: API key only (1:1 internal trust)
 
-#### Phase 1 architecture (this release)
+#### Why switch from OAuth to an API key
 
-See the diagram in §1.2. Key changes:
+| Dimension | OAuth client_credentials | Pre-shared API key |
+|------|--------------------------|---------------------|
+| Cross-team dependency | MC must register the audience / client / scopes | Both sides just agree on a secret |
+| Deployment blocker | 5/9 stage hit the unregistered MC scopes | None |
+| Trust model | Multiple callers, scope-based authorization | 1:1 internal trust, full access |
+| Code complexity | JWKS cache + JWT verification + scope check | Constant-time string compare |
+| Token rotation | Managed by MC | Both sides manually sync `.env` |
+| Best fit | Multiple callers, cross-organization | Single caller, same organization |
 
-| Component | Added / modified? |
-|------|---------------|
-| Nginx | New split config for the public vhost (`/api/v1/*`) and internal vhost (`/jobs/*`) (see §7) |
-| Task Scheduler | New auth middleware, OAuth client, new `/api/v1/*` route group, user index, promote implementation |
-| Redis | New `user:{user_id}:jobs` Set index; new fields on the job record |
-| Workers | **No major changes** (Phase 1 keeps reading from and writing to the Converter Bucket; see the key design decision in §3.4) |
-| MinIO (Converter Bucket) | Unchanged |
-| File Access Agent | Not deployed by this project (deployed by the Innovedus ecosystem team) |
-| Member Center | Not deployed by this project |
+Today visionA ↔ converter is 1:1 internal trust (same company, single caller), so OAuth is over-engineering.
 
-**Estimated monthly infrastructure cost** (this project's side): Phase 1 adds no infrastructure, so cost is unchanged. The cost of the cross-team dependencies Member Center / File Access Agent is absorbed by those teams.
+#### API key middleware design
 
-#### Phase 2 architecture (reserved)
+- **Accepts**: `Authorization: Bearer ` (reuses the existing Bearer header format, so client and logging infra stay unchanged)
+- **Comparison**: constant-time compare with `crypto.timingSafeEqual` (prevents timing attacks)
+- **On failure**: 401 `invalid_token`, then proactively `socket.destroy()` after responding (carries over the M2 behavior of the existing OAuth middleware so a large upload body cannot keep streaming in)
+- **req.auth shape**: on success sets `req.auth = { sub: 'visionA-service', clientId: 'visionA-service', tenantId: null, scopes: ['converter:job.write', 'converter:job.read'], raw: { authType: 'api_key' } }`, so downstream handlers / rate limiter / logging need no major changes
+- **Fail-fast**: if `CONVERTER_API_KEY` is unset at startup → reject every request with 503 (never silently allow)
 
-Phase 2 brings no architectural change to Converter itself. The delegated download flow happens entirely between visionA-backend ↔ Member Center ↔ user browser ↔ File Access Agent and does not pass through Converter.
+See `auth.md` §1 for implementation details.
 
-The only possible Converter change: after Phase 2 ships, Converter could return more information when `promote` completes (e.g. `download_hint_object_key`) so visionA-backend can exchange it directly for a delegated token, but this is a nice-to-have; the Phase 1 API contract already suffices.
+#### Trust boundary simplification
 
-### 3.3 Technology choices (Technology Radar)
+| Risk | OAuth model | API key model |
+|------|-----------|-------------|
+| Caller compromised → can impersonate any user_id | Needs OBO / HMAC-signed user_id to mitigate | Same risk, but the API key itself is the full identity proof of the visionA service, so no OBO is needed |
+| Token / key rotation | Managed through MC | Manually rotate `.env` on both sides + redeploy |
+
+→ An API key is not more secure than OAuth (the trust boundary model is the same), but it is **not less secure either**. The only difference is the operational complexity of rotation.
+
+See `security.md` for details.
+
+### 3.3 `/result` endpoint: streaming proxy
+
+#### Why it is needed
+
+ADR-016 §1: MC never implemented `POST /file-access/download-tokens`, and FAA's `MemberCenterDelegatedDownloadTokenValidator` has never worked end to end. The delegated download token chain was already broken when Phase 1 was designed; nobody noticed because it had never been exercised e2e.
+
+→ visionA cannot download directly from FAA (the delegated token endpoint is missing). The design instead has visionA fetch the NEF from converter.
+
+#### Where it sits in the architecture
+
+```
+visionA-backend            Converter Scheduler             MinIO
+                                                           (Converter Bucket)
+
+  GET /jobs/:id/result
+  Authorization: Bearer
+
+  ────────────────────→
+                           requireApiKey middleware
+                           ────────────────────────
+                           getJob from Redis
+                           ────────────────────────
+                           check status / expires_at
+                           ────────────────────────
+                           extractNefObjectKey
+                           ────────────────────────
+                           minio.getObjectStream(nefKey)
+                           ─────────────────────────────────→
+                                                           stream
+                           ←─────────────────────────────────
+                           set headers:
+                             Content-Type: application/octet-stream
+                             Content-Length: ...
+                             Content-Disposition: attachment;
+                               filename="_.nef"
+                           pipe stream → response
+  ←────────────────────────  (streaming NEF binary)
+```
+
+#### Key design decisions
+
+1. **Never buffer the whole file**: use Node stream `pipe(res)`; an NEF can be several hundred MB
+2. **Content-Length is mandatory**: visionA uses it to decide its timeout
+3. **Filename rule**: `_.nef` (e.g. `yolov5s.onnx` + `KL720` → `yolov5s_kl720.nef`); fallback `job_.nef`
+4. **Two-path NEF key resolution**: supports the new format (`result_object_keys.nef`) + the legacy format (`output.nef_path`), matching the `getJobOutputKey` logic in the promote flow
+5. **Error responses (4xx / 5xx)**: 401 (invalid API key) / 404 (job_not_found) / 409 (job_not_completed) / 410 (result_expired, older than 7 days) / 502 (storage_unavailable) / 503 (service_unavailable)
+6. **Stream error handling**: if the stream fails after the headers are sent → the only option is `res.destroy()`, and the client sees ECONNRESET; on client disconnect (`req.on('close')`) → proactively release the MinIO stream connection
+
+See `api/api-result.md` for the full API spec.
+
+### 3.4 Phase 0.8b change scope overview
+
+| Component | Change |
+|------|------|
+| Nginx | **Unchanged** (the existing public vhost already proxies `/api/v1/*`) |
+| Task Scheduler (auth) | **Major change**: add `apiKeyMiddleware`; remove `auth/middleware.js` (OAuth) / `auth/jwks.js` (but keep `auth/oauthClient.js`, used by promote) |
+| Task Scheduler (routes) | **Minor change**: `POST /jobs` / `GET /jobs` / `GET /jobs/:id` / `POST /jobs/:id/promote` now mount `requireApiKey()`; add the `/jobs/:id/result` route + handler |
+| Task Scheduler (config) | Remove `MEMBER_CENTER_ISSUER` / `MEMBER_CENTER_JWKS_URL` / `KNERON_CONVERTER_AUDIENCE` / `JWKS_*`; keep `MEMBER_CENTER_TOKEN_URL` / `KNERON_CONVERTER_CLIENT_ID` / `KNERON_CONVERTER_CLIENT_SECRET` / `FILE_ACCESS_AGENT_*` (used by promote); add `CONVERTER_API_KEY` |
+| Redis data model | **Unchanged** |
+| Workers | **Unchanged** |
+| MinIO (Converter Bucket) | **Unchanged** |
+| FAA / MC | **Unchanged** (converter → FAA still uses OAuth client_credentials) |
+
+### 3.5 Technology choices (Technology Radar)
 
 | Layer | Technology | Status | Rationale | Exit cost |
 |------|---------|------|---------|---------|
-| Auth: JWT verification | `jose` (npm) | Adopt | Zero-dependency pure JS, supports remote JWKS + caching, widely used | Low (swap to `jsonwebtoken` + `jwks-rsa` by replacing one middleware) |
-| Auth: obtaining tokens | Hand-written lightweight HTTP client (`node-fetch` or Node 18 native fetch) + in-memory cache | Adopt | client_credentials is just one HTTP POST; no need for a large package like `openid-client` | Low |
-| HTTP client for File Access Agent | Node 18 native fetch + stream | Adopt | Streams large files, no extra deps | Low |
-| Rate Limit | `express-rate-limit` (existing) + per-client_id key extension | Adopt | Extending the existing package is enough | Low |
-| OpenAPI generation | Hand-written YAML + `@redocly/cli` or `swagger-ui-express` serving `/openapi.json` | Adopt | Hand-writing keeps Phase 1 under control and avoids unstable code-first output | Low |
-| Redis index | Redis Set (`user:{user_id}:jobs`) + existing `job:{id}` | Adopt | Phase 1 volume does not need PostgreSQL; keeps the stateless design consistent | Medium (a future PG migration needs dual writes) |
-| Observability | Structured logs (JSON) + Nginx access log | Trial | No Prometheus in Phase 1; deferred to Phase 2 | N/A |
+| Auth: API key compare | `crypto.timingSafeEqual` | Adopt | Node standard library, constant-time | Low |
+| Auth: obtaining the FAA token (promote) | Hand-written OAuth client + Basic auth + in-memory cache | Adopt | Existing implementation, proven in Phase 1 | Low |
+| HTTP client for FAA (promote) | Node 18 native fetch + stream | Adopt | Streams large files | Low |
+| Stream proxy (`/result`) | Node Stream `pipe(res)` | Adopt | Standard approach, bounded memory footprint | Low |
+| Rate Limit | `express-rate-limit` (existing) + per-client_id key | Adopt | Existing package | Low |
+| OpenAPI generation | Hand-written YAML | Adopt | Stable spec, human-reviewable | Low |
+| Redis index | Redis Set (`user:{user_id}:jobs`) + `job:{id}` | Adopt | Avoids introducing PG | Medium |
 
-### 3.4 Key design decision: upload path for the original model
+### 3.6 API design overview
 
-**Background**:
-- visionA-backend needs to hand the user-uploaded original model to Converter for conversion.
-- We originally considered uploading the file to File Access Agent first and having Converter pull it from there, but found that:
-  1. Until conversion succeeds and the user clicks "Add to model library", the original model **does not belong to the NAS model library**, so there is no reason to put it in File Access Agent first.
-  2. File Access Agent's `GET /files/{objectKey}` only accepts a delegated download token, so Converter cannot download with an S2S JWT (unless the other team extends the API, which would be yet another cross-team blocker).
-  3. Converter's existing Web UI `POST /jobs` already uses a multipart upload design (`multer.memoryStorage()`, `fileSize: 500MB`), so the external API can simply reuse the same path.
-
-**Decision**:
-**Phase 1 has visionA-backend upload directly to Converter as multipart, fully matching the existing Web UI behavior. In Phase 1, Converter reads nothing at all from File Access Agent.**
-
-Concrete flow:
-1. When `POST /api/v1/jobs` (multipart/form-data) arrives, the Scheduler verifies the OAuth token and receives the files with `multer.memoryStorage()` (`model` required, ≤500MB; `ref_images[]` optional, maxCount 100).
-2. The Scheduler checks whether `user_id` already has an in-progress job; if not, it writes the buffers to the Converter Bucket (`jobs/{job_id}/input/{filename}`, `jobs/{job_id}/ref_images/*`).
-3. Create the job record and enqueue it to the first stage.
-4. Workers read the input from the Converter Bucket (**exactly like the existing MinIO mode**) and write the outputs back to the Converter Bucket.
-5. On `promote`, the Scheduler uses its own OAuth client identity to obtain a `files:upload.write` token, reads the result from the Converter Bucket, and PUTs it to File Access Agent.
-
-**Why this choice**:
-- Zero changes to Worker code (the existing `STORAGE_BACKEND=minio` mode is reused as-is)
-- The external API upload path shares nearly 100% of the code with the existing Web UI (`multer` middleware, 500MB limit, storage path conventions)
-- In Phase 1, Converter only needs one scenario as an OAuth client calling File Access Agent (the promote write), and needs neither `files:download.read` nor `files:metadata.read`
-- Avoids blocking on an extension of File Access Agent's GET authorization model (the open item in the original ADR-002 is resolved)
-
-**Costs**:
-- The p95 of `POST /api/v1/jobs` depends on the multipart upload size (500MB takes roughly 5-30s on a typical network), so the SLA must be adjusted (see §6).
-- Large multipart uploads temporarily occupy Scheduler memory (`multer.memoryStorage()`), the same risk model as the existing Web UI.
-- visionA-backend must handle upload timeouts and retries itself (consistent with any file upload API).
-
-**Alternatives** (see ADR-002):
-- A. File goes into File Access Agent first and Converter pulls it: requires a cross-team extension of File Access Agent GET S2S authorization; blocked for Phase 1
-- B. File passes through visionA-backend in two uploads (VisionA frontend → backend → Converter): visionA-backend carries the large-file traffic twice, wasting bandwidth
-- C. Browser uploads directly to Converter via a presigned URL: no supporting infrastructure; not in Phase 1
-
-### 3.5 API design overview
-
-- **API style**: REST + JSON
+- **API style**: REST + JSON (`/result` is the exception and returns a binary stream)
 - **Base Path**: `/api/v1/*`
-- **Authentication**: every endpoint (except `/health`) requires `Authorization: Bearer `
+- **Authentication**: every endpoint (except `/health`) requires `Authorization: Bearer `
 - **Error format**: uniformly `{error: {code, message, details, request_id}}`
 - **Versioning**: breaking changes go to `/api/v2/*`; minor changes add backward-compatible fields within `/api/v1/*`
-- **Rate Limit**: keyed by `client_id` (from the token claim), default 300 requests / 5 min (tunable)
-- **ETag support**: `GET /api/v1/jobs/:id` supports `If-None-Match`; 304 Not Modified saves bandwidth (adopted from the Design recommendation)
+- **Rate Limit**: keyed by `client_id` (always `visionA-service` in API key mode), default 300 req / 5min
+- **ETag support**: `GET /api/v1/jobs/:id` supports `If-None-Match`; 304 Not Modified saves bandwidth
 
-See `TDD.md §1` for the full API spec.
+See the `TDD.md` index + the `api/*.md` sub-files for the full API spec.
 
-### 3.6 Data architecture
+### 3.7 Data architecture
 
 #### Core data (Redis)
 
 | Key | Type | Contents | TTL |
 |-----|------|------|-----|
-| `job:{job_id}` | String (JSON) | Full job record (new fields `user_id`, `tenant_id`, `created_by_client_id`, `metadata`, `stage_timings`, `expires_at`) | 7 days |
+| `job:{job_id}` | String (JSON) | Full job record (including `user_id`, `tenant_id`, `created_by_client_id`, `metadata`, `stage_timings`, `expires_at`, `result_object_keys`, `promoted_object_keys`) | 7 days |
 | `user:{user_id}:jobs` | Set | Set of that user's job_ids | Extended with each new job; 7 days recommended |
-| `user:{user_id}:active_job` | String | Current in-progress job_id (its existence means a job is active) | Deleted when the job completes |
+| `user:{user_id}:active_job` | String | Current in-progress job_id | Deleted when the job completes |
 | `ratelimit:client:{client_id}` | Managed by `express-rate-limit` | N/A | 5 min |
 
-Why no PostgreSQL: Phase 1 data volume is small (a single user typically has < 10 jobs within 7 days), which Redis handles easily. Re-evaluate PG if persistent job history or cross-crash recovery is needed in the future.
+No PostgreSQL: Phase 1 data volume is small (< 10 jobs per user within 7 days), which Redis handles easily.
 
-#### Data flow (Phase 1)
+#### Data flow (Phase 0.8b)
 
 ```mermaid
 flowchart LR
-    BE[visionA-backend] -->|1. POST /api/v1/jobs<br/>multipart: model + user_id| Sched[Scheduler]
-    Sched -->|2. verify token| MC[Member Center]
-    Sched -->|3. check user:active_job| Redis[(Redis)]
-    Sched -->|4. multer writes input| CB[(Converter Bucket)]
+    BE[visionA-backend] -->|1. POST /api/v1/jobs<br/>multipart + Bearer API key| Sched[Scheduler]
+    Sched -->|2. constant-time<br/>compare API key| Sched
+    Sched -->|3. check active_job| Redis[(Redis)]
+    Sched -->|4. multer writes input| CB[(Converter Bucket)]
     Sched -->|5. create job + index| Redis
     Sched -->|6. enqueue onnx| Redis
     Workers[Workers] -->|consume| Redis
@@ -250,9 +309,13 @@ flowchart LR
     BE -->|7. poll GET /api/v1/jobs/:id| Sched
     Sched -->|8. read job| Redis
     BE -->|9. POST /promote| Sched
-    Sched -->|10. get Converter token<br/>(files:upload.write, cache)| MC
+    Sched -->|10. get FAA token<br/>(client_credentials, cache)| MC[Member Center]
     Sched -->|11. read result| CB
     Sched -->|12. PUT result| FAA[File Access Agent]
+    BE -->|13. GET /jobs/:id/result<br/>NEW Phase 0.8b| Sched
+    Sched -->|14. stream NEF| CB
+    CB -->|15. NEF binary| Sched
+    Sched -->|16. stream proxy| BE
 ```
 
 ---
@@ -264,461 +327,251 @@ flowchart LR
 | Service | SLI | SLO | Source |
 |------|-----|-----|------|
 | `/api/v1/*` availability | 2xx+3xx requests / total requests | ≥ 99.5% (business hours) | PRD §9.2.1 |
-| `GET /api/v1/jobs/:id` p95 | 95th-percentile response time | < 200ms | PRD §9.2.1, Design 4.1.2 |
-| `POST /api/v1/jobs` p95 | 95th-percentile response time (includes the multipart upload to the Converter Bucket; depends on file size) | < 5s (200MB file) | Adjusted from the bandwidth assumptions in PRD §9.2.1 |
-| `POST /api/v1/jobs/:id/promote` p95 | 95th-percentile response time | < 3s | PRD §9.2.1, Design 4.1.3 |
-| Token validation failure rate | 401 / total requests | < 1% (excluding normal expiry) | PRD §9.2.1 |
-
-**About `POST /api/v1/jobs` p95**: because the model is now uploaded directly as multipart, latency is dominated by file size and the network bandwidth from visionA-backend to Converter. 200MB @ 50MB/s ≈ 4s; 500MB @ 50MB/s ≈ 10s. If the SLA is frequently exceeded, Phase 2 could split this into two endpoints: create job + upload chunk.
+| `GET /api/v1/jobs/:id` p95 | 95th-percentile response time | < 200ms | PRD §9.2.1 |
+| `POST /api/v1/jobs` p95 | Time until the multipart upload to MinIO completes | < 5s (200MB) / < 12s (500MB) | See `performance.md` |
+| `POST /api/v1/jobs/:id/promote` p95 | 95th-percentile response time | < 3s | PRD §9.2.1 |
+| `GET /api/v1/jobs/:id/result` p95 | TTFB (time to first byte) | < 500ms | New in Phase 0.8b; a full 50MB NEF stream over a 50MB/s link takes ≈ 1s |
+| API key validation failure rate | 401 / total requests | < 0.1% (a single known caller should not fail) | New in Phase 0.8b |
 
 ### 4.2 Fault tolerance
 
 | Failure scenario | Design response |
 |---------|---------|
-| Member Center JWKS unreachable | Local JWKS cache (TTL 10 min, stale-while-revalidate 24h), so tokens can still be verified during short outages |
-| Member Center token endpoint unreachable (fetching Converter's own token) | Token cache (no refetch while valid); after expiry retry 3 times, and on failure `promote` returns 503 + `auth_service_unavailable` for visionA-backend to retry |
-| Multipart upload fails / exceeds 500MB | `POST /api/v1/jobs` returns 400 `validation_error` or 413 `file_too_large`; no job record is created in Redis (no leftovers) |
-| File Access Agent promote fails | Converter Bucket files are kept for 7 days so the user can retry promote; API returns 502 |
-| Worker crash | Existing Crash = Reset mechanism: after restart, Workers continue consuming the Redis Stream, and in-progress jobs still in Redis are picked up (provided Redis survived) |
-| Redis crash | Consistent with the "Crash = Reset" philosophy: all jobs are lost and users resubmit |
-| Scheduler crash | Same as above; service resumes after restart. A running Worker's next-stage done event will not find a job record and is ignored (existing behavior) |
+| MC token endpoint unreachable (fetching Converter's own FAA token) | Token cache (no refetch while valid); after expiry retry 3 times, and on failure promote returns 503 + `auth_service_unavailable` |
+| Multipart upload fails / exceeds 500MB | `POST /api/v1/jobs` returns 400 / 413 |
+| FAA promote fails | Converter Bucket files are kept for 7 days; promote can be retried |
+| `/result` MinIO stream fails (object expired and cleaned up) | 410 `result_expired` |
+| Worker crash | Existing Crash = Reset; after restart, Workers continue consuming the Redis Stream |
+| Redis crash | Consistent with the "Crash = Reset" philosophy; all jobs are lost |
+| Scheduler crash | Service resumes after restart |
 
 ### 4.3 Disaster recovery
 
-Keep the existing **Crash = Reset**: RPO = no guarantee (Redis is wiped on restart), RTO < 30s (Docker restart). No new DR mechanism in this release.
+Keep the existing **Crash = Reset**: RPO = no guarantee, RTO < 30s (Docker restart).
 
 ---
 
 ## 5. Security architecture
 
-### 5.1 Threat model (STRIDE summary)
+See `security.md` for details. This section lists only the high-level principles.
+
+### 5.1 Threat model (STRIDE summary, Phase 0.8b version)
 
 | Threat | Risk | Mitigation |
 |------|------|------|
-| Spoofing (impersonating visionA-backend) | Medium | OAuth2 Bearer JWT (issued by Member Center), JWKS signature verification |
-| Spoofing (forged user_id) | Low (accepted) | Trust visionA-backend's user_id; Converter does no user ACL (explicit in PRD §5.6) |
-| Tampering (modifying job records) | Low | Redis sits on the internal network with no external access; only the Scheduler writes job records |
+| Spoofing (impersonating visionA-backend) | Medium | Constant-time API key compare; TLS-protected transport |
+| Spoofing (forged user_id) | Medium (accepted) | Trust visionA-backend's user_id; Converter does no user ACL |
+| Tampering (modifying job records) | Low | Redis sits on the internal network |
 | Repudiation (denying a call) | Medium | Log `client_id` + `request_id` + `user_id`, retained for 30 days |
-| Info Disclosure (one client seeing another's jobs) | Medium | Job query filtering: only return jobs whose `created_by_client_id` matches (see §5.3) |
-| DoS | Medium | Rate limit per `client_id`; 500MB file size cap (existing) |
-| Elevation of Privilege | Low | Strict scope checks: `converter:job.read` vs `converter:job.write` |
+| Info Disclosure (one client seeing another's jobs) | Low (Phase 0.8b has only 1 caller) | Job queries filter by `created_by_client_id` by default |
+| DoS | Medium | Rate limit per `client_id`; 500MB file size cap |
+| Elevation of Privilege | **Reduced** (the API key has no scopes, but there is only 1 caller) | The API key is the caller's complete identity |
 
-### 5.2 Security boundaries
+### 5.2 API key management
 
-#### 5.2.1 Endpoint auth requirements
+- Generation: `openssl rand -hex 32` (64 hex chars)
+- Deployment: put it in `.env` / docker-compose env / a k8s secret
+- Both sides must match: visionA `VISIONA_CONVERTER_API_KEY` = converter `CONVERTER_API_KEY`
+- Rotation: a separate key per environment (dev / stage / prod); on leak, rotate on both sides together + redeploy
+- **Never** put it in git / Slack / email / logs (logs record only `api_key_length` or the boolean `api_key_set: true`)
 
-| Endpoint | Auth required? | Required scope |
-|------|-----------|-------------|
-| `GET /health` | ❌ No | N/A |
-| `POST /api/v1/jobs` | ✅ | `converter:job.write` |
-| `GET /api/v1/jobs` | ✅ | `converter:job.read` |
-| `GET /api/v1/jobs/:id` | ✅ | `converter:job.read` |
-| `POST /api/v1/jobs/:id/promote` | ✅ | `converter:job.write` |
-| Legacy `/jobs/*` (paths used by the Web UI) | ❌ No OAuth | N/A |
-| `/jobs/*/events` (SSE) | ❌ No OAuth | N/A |
+### 5.3 Deployment split
 
-#### 5.2.2 Token validation checklist
-
-For every `/api/v1/*` request the middleware must check:
-
-1. The `Authorization: Bearer ` header is present
-2. The JWT is well-formed
-3. The JWT signature (verified against the Member Center JWKS)
-4. `iss` == the configured Member Center issuer
-5. `aud` contains `kneron_converter_api` (may be a string or an array)
-6. `exp` has not passed (allowing ±60s clock skew)
-7. `scope` contains the scope the endpoint requires (space-separated string)
-8. The `client_id` claim is present (for logging)
-9. (Optional) the `tenant_id` claim (see §5.3)
-
-**Failure handling**:
-- 1-5 fail → 401 `invalid_token`
-- 6 fails → 401 `token_expired`
-- 7 fails → 403 `insufficient_scope` + `details.required_scope`
-- 8 fails → 401 `invalid_token` (missing required claim)
-
-#### 5.2.3 The user_id boundary (method A: multipart field)
-
-**To repeat: user_id is not an authorization boundary.** Converter is designed as follows:
-
-| Operation | Handling |
-|------|---------|
-| `POST /api/v1/jobs` (create job) | Trust `user_id` from the multipart field and write it to the job record |
-| `GET /api/v1/jobs/:id` (get job) | Return the job record (the caller's user_id is not compared) |
-| `GET /api/v1/jobs?user_id=X` (list) | Filter by the `user_id` in the query (**the caller can query any user's jobs**) |
-| `POST /api/v1/jobs/:id/promote` | No user_id needed (only the job status is checked) |
-
-**Why this design**:
-- A service consumer (e.g. visionA-backend) legitimately needs to see all of its users' jobs (e.g. an admin monitoring dashboard)
-- The authorization boundary is at the `client_id` level: any client with the `converter:job.read` scope can query any job
-- Prevents cross-client linking (see §5.3)
-
-#### 5.2.4 Cross-client isolation (important)
-
-**Risk**: if clients A and B both integrate with Converter in the future, client B must not see jobs created by client A.
-
-**Design**: the job record stores `created_by_client_id`, and queries return only jobs with the same client_id **by default**.
-
-| Query | Default behavior |
-|------|---------|
-| `GET /api/v1/jobs` | Return only jobs where `created_by_client_id == token.client_id` |
-| `GET /api/v1/jobs/:id` | If `created_by_client_id != token.client_id`, return 404 (**not 403**, to avoid leaking information) |
-| `POST /api/v1/jobs/:id/promote` | Same as above |
-
-The first Phase 1 client is visionA-backend, for which this rule is transparent. New clients added later get isolation automatically.
-
-**Exception**: if an admin client needs to query across clients (e.g. for monitoring), a special `converter:admin.read` scope can be defined; not implemented in Phase 1, but there is room to add it.
-
-### 5.3 Tenant strategy (answering PM question A.1.3)
-
-**Decision**: Phase 1 is **single-tenant per Converter deployment**; Converter does not manage tenant isolation itself.
-
-Concretely:
-- Converter records `EXPECTED_TENANT_ID` in its config (from the `CONVERTER_TENANT_ID` environment variable).
-- When verifying a token, if the JWT has a `tenant_id` claim, check that it equals `EXPECTED_TENANT_ID`; otherwise 403 `tenant_mismatch`.
-- If the JWT has no `tenant_id` claim, follow the Member Center owner's decision (early Phase 1 may have no tenant claim, so log a warning without blocking).
-- The job record stores `tenant_id` (for future logging and auditing).
-- Every request from Converter to File Access Agent naturally carries Converter's own `tenant_id` (from its token), and File Access Agent's `INSTANCE_TENANT_ID` must match it.
-
-**Future (multi-tenant) extension path**:
-- For one Converter codebase to serve multiple tenants, logic must be added to route to the matching File Access Agent instance based on the token's tenant_id. Not in Phase 1.
-
-### 5.4 Protecting the legacy `/jobs/*` paths (deployment split)
-
-The `/jobs/*` paths used by the Web UI have **no OAuth**. If they were publicly reachable, callers could bypass the external API's scope checks and hit the Scheduler directly.
-
-**Solution**: a **deployment-level split** (see §7). Nginx uses two vhosts:
-- **public vhost** (443, public-facing): proxies only `/api/v1/*` to the Scheduler; every other path returns 404
-- **internal vhost** (reachable only from the internal network): proxies `/jobs/*`, `/health`, `/queues/stats`, `/jobs/*/events` to the Scheduler
-
-This way the Web UI (deployed on the internal network / behind a jump host) works normally, and the external API exposes only `/api/v1/*`.
-
-Adopts option B recommended in Design Review §2.3.
+Keep the existing design: Nginx dual vhost (public `/api/v1/*` + internal `/jobs/*`). See `infra.md` for details.
 
 ---
 
 ## 6. Performance engineering
 
-### 6.1 Latency budget (`POST /api/v1/jobs`)
+See `performance.md` for details. Highlights:
 
-| Stage | Budget |
-|------|------|
-| Nginx ingress (including multipart preprocessing) | 10ms |
-| JWT verification (JWKS cache hit) | 5ms |
-| Multipart receive (multer memory; depends on file size) | 200MB @ 50MB/s ≈ 4s |
-| Multipart validation (fields, mimetype, file extension) | 20ms |
-| Redis active_job lookup | 10ms |
-| MinIO PutObject (write to the Converter Bucket; buffer already in memory) | 200MB @ 200MB/s ≈ 1s |
-| Redis write of job record + index | 20ms |
-| Enqueue to Redis Stream | 10ms |
-| **Total budget (200MB file)** | **~5s (p95)** |
-| **Total budget (500MB file)** | **~12s (p95)** |
-
-**For very large files**: p95 exceeds the 5s SLA. Future options:
-- Switch to chunked upload (the frontend splits the file into chunks, multiple PUT /api/v1/jobs/:id/chunks): optional for Phase 2
-- Switch to an async upload mode (return 202 + job_id first, receive the remaining chunks in the background): optional for Phase 2
-
-### 6.2 Token cache strategy
-
-| Cache | TTL | Invalidation trigger |
-|-------|-----|---------|
-| JWKS (for JWT verification) | 10 min (proactive refresh) | Force one refresh on an unknown `kid` |
-| Converter's own access token | `expires_in - 60s` (refresh only near expiry) | Force one refresh on a 401 |
-
-### 6.3 Rate limit strategy
-
-| Scope | Limit | Motivation |
-|------|------|------|
-| Global (IP) | 200 req / 15min (existing) | Keep the existing protection |
-| Per `client_id` | 300 req / 5min | Prevents a single client from brute-force polling while allowing normal 2-5s polling (60-150 requests per 5 min is enough) |
+- `POST /api/v1/jobs` p95 < 5s (200MB) / < 12s (500MB)
+- `GET /api/v1/jobs/:id` p95 < 200ms
+- `GET /api/v1/jobs/:id/result` TTFB < 500ms (new SLO in Phase 0.8b)
+- `POST /api/v1/jobs/:id/promote` p95 < 3s
 
 ---
 
 ## 7. Deployment architecture
 
-### 7.1 Nginx dual-vhost split
+See `infra.md` for details. Highlights:
 
-Phase 1 uses a design of **one Nginx process with two `server` blocks** (option B):
-
-```
- ┌───────────────────────────────────────────────────────────────────┐
- │                      Nginx (single process)                       │
- │                                                                   │
- │  ┌──────────────────────────┐   ┌───────────────────────────────┐ │
- │  │ server {                 │   │ server {                      │ │
- │  │   listen 443 ssl;        │   │   listen 10.0.0.1:80;         │ │
- │  │   server_name            │   │   server_name                 │ │
- │  │     converter....com;    │   │     converter-internal...;    │ │
- │  │                          │   │                               │ │
- │  │   location /api/v1/ {}   │   │   location /jobs {}           │ │
- │  │   location = /health {}  │   │   location /queues/stats {}   │ │
- │  │   location / {           │   │   location / {                │ │
- │  │     return 404;          │   │     proxy_pass web:3000;      │ │
- │  │   }                      │   │   }                           │ │
- │  │ }                        │   │ }                             │ │
- │  │ (public vhost)           │   │ (internal vhost, intranet IP) │ │
- │  └────────────┬─────────────┘   └───────────────┬───────────────┘ │
- └───────────────┼─────────────────────────────────┼─────────────────┘
-                 │                                 │
-                 ▼                                 ▼
-         ┌──────────────────────────────────────────────────┐
-         │ Task Scheduler (:4000)                           │
-         │ - /api/v1/*  (OAuth-protected, public vhost only)│
-         │ - /jobs/*    (no auth, internal vhost only)      │
-         │ - /jobs/*/events (SSE)                           │
-         │ - /health, /queues/stats                         │
-         └──────────────────────────────────────────────────┘
-                                  ▲
-                                  │ (inbound from the internal vhost only)
-                                  │
-                 Web UI / internal tools (intranet)
-```
-
-In practice there are two ways to implement this:
-- **A. Two Nginx instances** (one public, one internal, each its own process)
-- **B. One Nginx process with two `server` blocks**, split by the interface in `listen` (public IP vs internal IP)
-
-Phase 1 uses **B** (simple config, fewer resources, one reload to manage). The ASCII diagram above shows the actual structure of option B; see `TDD.md §7.1` for the full Nginx config.
-
-### 7.2 docker-compose changes
-
-This release does **not** add File Access Agent / Member Center to docker-compose.yml (they are deployed by the other teams; Converter is only a client).
-
-New environment variables (see `TDD.md §9`):
-- `MC_ISSUER`, `MC_JWKS_URL`, `MC_TOKEN_URL`
-- `KNERON_CONVERTER_CLIENT_ID`, `KNERON_CONVERTER_CLIENT_SECRET`
-- `KNERON_CONVERTER_AUDIENCE` (receiving side)
-- `FILE_ACCESS_AGENT_BASE_URL`, `FILE_ACCESS_AGENT_AUDIENCE`
-- `CONVERTER_TENANT_ID`
-- `CONVERTER_SCOPES_REQUIRED_WRITE`, `CONVERTER_SCOPES_REQUIRED_READ`
+- Nginx dual vhost (`/api/v1/*` public, `/jobs/*` internal): **unchanged**
+- Environment variable changes: remove `MEMBER_CENTER_ISSUER` / `MEMBER_CENTER_JWKS_URL` / `KNERON_CONVERTER_AUDIENCE` / `JWKS_*`; add `CONVERTER_API_KEY`; keep `MEMBER_CENTER_TOKEN_URL` / `KNERON_CONVERTER_CLIENT_*` / `FILE_ACCESS_AGENT_*`
+- Deployment order: deploy converter first (old OAuth and new API key coexist for a while) → verify `/result` → deploy visionA → e2e → remove the OAuth leftovers
 
 ---
 
 ## 8. ADRs (Architecture Decision Records)
 
-### ADR-001: External API uses Member Center OAuth2 instead of a home-grown API key
+### ADR-001: External API uses Member Center OAuth2 (superseded)
 
-**Status**: Accepted
-**Background**: the external API needs an authentication mechanism. Options were (a) a home-grown API key, (b) Innovedus Member Center OAuth2.
+**Status**: Superseded by ADR-010 (Phase 0.8b)
 
-**Decision**: use **Member Center OAuth2**.
+**Reason**: the 5/9 stage e2e run hit the unregistered MC scopes; visionA ↔ converter is 1:1 internal trust, so OAuth is over-engineering.
 
-**Rationale**:
-1. The user already decided that Converter aligns with the Innovedus ecosystem (progress.md).
-2. OAuth2 is the industry standard, and visionA-backend integrates with Member Center anyway (Phase 1 plan).
-3. A home-grown API key means managing secret rotation, scopes, and audit ourselves, reinventing the wheel.
-4. Member Center already provides JWKS, a token endpoint, and client management; Converter only needs to implement the two OAuth2 roles, resource server + client.
-
-**Cost**: cross-team dependency (registering the audience, client, scopes); blocked if the Member Center owner does not cooperate. Listed as a risk in progress.md.
-
-**Alternatives**:
-- A. Home-grown API key: simple, but does not follow the ecosystem standard; high future migration cost
-- B. mTLS: high operational cost, not very friendly to the Node.js ecosystem
+→ `adr-015-server-to-server-api-key.md` v2.1 in the visionA repo records the full rationale for the pivot.
 
 ---
 
-### ADR-002: promote result files via "approach 2": Converter pushes to File Access Agent itself
+### ADR-002: promote result files via "approach 2": Converter pushes to FAA itself
 
-**Status**: Accepted
+**Status**: Accepted (**unchanged**, kept in Phase 0.8b)
 
-**Background**: converted results must be moved back to File Access Agent (`promote`). There are three ways to move the file:
+**Scope**: this ADR covers only the transfer path for promoting result files; the original model never goes into FAA. Converter → FAA still uses OAuth client_credentials + the `files:upload.write` scope.
 
-| Approach | Description | Pros / cons |
-|------|------|------|
-| 1 | Result file passes through visionA-backend (Converter → visionA-backend → File Access Agent) | Wastes bandwidth; visionA-backend has to carry large files |
-| 2 | Converter PUTs to File Access Agent itself (`files:upload.write`) | File stays on the NAS side; a single write |
-| 3 | File Access Agent pulls from the Converter Bucket | File Access Agent has no such feature and cannot be changed just for us |
-
-**Decision**: approach 2.
-
-**Rationale**:
-1. Saves traffic: Converter and File Access Agent are both on the NAS side and talk over direct HTTP.
-2. visionA-backend keeps a simple role (orchestration + forwarding the original file as multipart; it never touches the result file).
-3. Meets the explicit requirement in PRD US-13.
-4. File Access Agent's `PUT /files/{key}` already explicitly supports an S2S JWT + the `files:upload.write` scope (implemented in their existing API, so no cross-team blocker).
-
-**Scope clarification (updated 2026-04-25)**:
-This ADR **covers only the transfer path for promoting result files**. The original model never goes into File Access Agent (visionA-backend uploads it directly to Converter as multipart; see §3.4), so in Phase 1 Converter needs neither the `files:download.read` nor the `files:metadata.read` scope.
-
-**Costs**:
-- Converter must obtain its own service token (OAuth client logic). The token can be cached and failures retried.
-- Converter must be configured with `KNERON_CONVERTER_CLIENT_ID` + `CLIENT_SECRET` (secret management responsibility).
-
-**Alternatives** (ruled out):
-- Fallback A: have visionA-backend download the result from Converter and upload it to File Access Agent itself: wastes bandwidth twice; unreasonable
-- Fallback B: File Access Agent pulls: they have no such interface; not pursued
+The detailed decision content is carried over from the previous version (written 5/2).
 
 ---
 
-### ADR-003: pass user_id as a multipart field (method A), not as a token claim or a custom header
+### ADR-003: pass user_id as a multipart field
 
-**Status**: Accepted (decided by the user)
+**Status**: Accepted (**unchanged**, kept in Phase 0.8b)
 
-**Background**: we need to record who a job belongs to (the VisionA user ID). There are three options:
-
-| Method | How | Pros / cons |
-|------|------|------|
-| A | Put `user_id` in a multipart field; Converter trusts it | Uses the same path as other business fields such as model_id / version / platform, and matches the existing Web UI `POST /jobs` |
-| B | Member
Center 簽 token 時把 `user_id` 塞 claim | 看起來像「user 的 token」,但 client_credentials 本質是 S2S,不該綁 user | -| C | 自訂 `X-User-Id` header | 跟一般業務欄位(model_id 等)放不同位置,增加心智負擔 | - -**決定**:採方式 A(multipart 欄位)。 - -**理由**: -1. client_credentials 是服務對服務的 token,沒有「user」概念,不應把 user_id 放 claim -2. `POST /api/v1/jobs` 本身就是 multipart,多一個 `user_id` 欄位最自然,和 `model_id`、`version`、`platform` 等業務欄位放一起 -3. 與既有 Web UI `POST /jobs` 的 multipart 欄位路徑一致,程式碼可共用 validation -4. Converter 不做 user 層 ACL(PRD §5.6 明確),user_id 只用於業務邏輯(同使用者限制、查詢過濾) -5. 避免跟 Member Center 要客製化 claim,維持標準 OAuth2 - -**代價**:Converter 完全信任 visionA-backend 送的 user_id。若 visionA-backend 被入侵或 bug 亂送,可能造成「A 使用者看到 B 的 job」。 -- **緩解**:授權責任邊界在 visionA-backend,這是合理的 trust boundary。Converter 的日誌會記錄 user_id 變更頻率,可做異常監測(PRD §12.2 已列)。 +API key 模式下,user_id 仍是 visionA-backend 傳的 multipart field。Trust boundary 假設不變(visionA-backend 內部受控)。 --- ### ADR-004:Polling 而非 Webhook -**狀態**:Accepted(PM 已決策) +**狀態**:Accepted(**不變**) -**背景**:轉檔時間長(可能 30s-數分鐘),visionA-backend 需要知道何時完成。 +--- -**決定**:Phase 1 只提供 polling(`GET /api/v1/jobs/:id`),不實作 Webhook。 +### ADR-005:Phase 1 使用者下載改用 `/result` 中轉(**取代原 delegated download token 設計**) + +**狀態**:Accepted(Phase 0.8b 拍板,取代原 ADR-005「Phase 1 使用者下載延至 Phase 2」) + +**背景**:5/16 grep MC source 發現 MC 從未實作 `/file-access/download-tokens` endpoint;FAA 的 `MemberCenterDelegatedDownloadTokenValidator.cs` 假設 MC 有對應 introspection endpoint,也是假設錯了。delegated download token 鏈從 5/2 寫完到現在一直是斷的。 + +**決定**:不動 MC、不動 FAA。改設計成 visionA → converter `GET /api/v1/jobs/:id/result` 中轉。 **理由**: -1. 下游是另一個 backend 服務,polling 對它是標準做法 -2. Webhook 需要處理:retry、簽章、對方 endpoint 驗證、重放防護,surface area 太大 -3. Phase 1 先快速上線驗證產品價值,Webhook 可以 Phase 3 再考慮 +1. MC owner 時程不可控,延 Phase 2 不可行 +2. NEF 通常 < 50MB,streaming proxy 對 Converter Scheduler 不重 +3. 
走 converter 而非 FAA,與 `/promote` 同一個 caller 介面,visionA 端 client 邏輯統一 -**代價**:進度延遲 = polling 間隔。API 文件建議 2-5s 間隔,對 UX 足夠。 +**代價**: +- Converter Scheduler 多扛一條 download 路徑(streaming,記憶體足跡受控) +- NEF 在 Converter Bucket 7 天 TTL 後過期,client 需處理 410 `result_expired` +- 雙存(promote 後 NEF 仍在 Converter Bucket 7 天 + FAA NAS Bucket 永久),下載走 Converter Bucket、不下載 FAA + +**替代方案**:詳見 visionA repo `adr-016-download-via-converter.md` v1.0 §3(6 個方案完整分析)。 --- -### ADR-005:Phase 1 使用者下載延至 Phase 2 +### ADR-006:Phase 1 Web UI 不改 -**狀態**:Accepted(已由使用者決策) +**狀態**:Accepted(**不變**) -**背景**:Member Center 的 `POST /file-access/download-tokens` 尚未實作,該 endpoint 是使用者直連下載的前提。 +Web UI 仍走 `/jobs/*` 路徑、無 auth。Phase 0.8b 不動。 -**決定**:Phase 1 **完全不做**使用者下載功能,延至 Phase 2。 +--- + +### ADR-010:visionA → converter 改用 pre-shared API key(**Phase 0.8b 新增**) + +**狀態**:Accepted(2026-05-09 + 2026-05-16 雙重 user 拍板) + +**背景**: +1. 5/9 stage e2e 撞 MC 沒註冊 `converter:job.read/write` scope +2. converter image 過舊、缺 OAuth middleware +3. FAA OAuth 整合狀態不明 +4. visionA ↔ converter 是 1:1 internal trust,OAuth 過度設計 + +**決定**:visionA → converter 改用 pre-shared API key(`CONVERTER_API_KEY`),constant-time compare。 **理由**: -1. 阻塞於外部依賴(Member Center owner 的時程) -2. Phase 1 先讓「上傳 → 轉檔 → 搬進模型庫」閉環可以跑 -3. UX 缺口由 VisionA 產品團隊用 messaging 策略處理(Design 議題 #1) +1. 砍跨團隊依賴(不需要 MC 註冊任何東西) +2. visionA 是當前唯一 caller,OAuth 的 scope-based authorization 沒用上 +3. 
API key 本身已是 visionA 服務的完整身分證明,scope 概念對 1:1 trust 無意義 -**代價**:VisionA 使用者暫時沒有下載能力。VisionA 產品團隊需有 fallback(Design Review §6 議題 #1)。 +**代價**: +- 砍掉 `auth/middleware.js` (OAuth resource server)、`auth/jwks.js`、相關 unit test +- 砍 `MEMBER_CENTER_ISSUER` / `MEMBER_CENTER_JWKS_URL` / `KNERON_CONVERTER_AUDIENCE` / `JWKS_*` / `JWT_CLOCK_TOLERANCE_SEC` env +- 失去 scope-based fine-grained authorization(接受:1:1 trust 不需要) +- 失去多 caller 擴展彈性(未來真有第二個 caller 再加 OAuth 回來) + +**保留**: +- Converter → FAA 仍走 OAuth client_credentials(`files:upload.write` scope)→ `MEMBER_CENTER_TOKEN_URL` / `KNERON_CONVERTER_CLIENT_*` 保留 +- Promote 流程完全不動 + +**替代方案**:詳見 visionA repo `adr-015-server-to-server-api-key.md` v2.1 §3。 + +**保留設計脈絡的歷史記錄**:本 ADR 取代原 ADR-001。git history + visionA repo ADR-015 v2.1 是完整 audit trail。 --- -### ADR-006:Phase 1 Web UI 不改,維持既有 multipart 路徑 +### ADR-011:`/result` endpoint 採 streaming proxy(**Phase 0.8b 新增**) -**狀態**:Accepted(PM 已決策) +**狀態**:Accepted(2026-05-16 user 拍板) -**背景**:Web UI 目前走 `POST /jobs`(multipart)、`GET /jobs/:id/events`(SSE)、`GET /jobs/:id/download/...`。 +**背景**:見 ADR-005(原「Phase 1 使用者下載延至 Phase 2」設計已 superseded,改用 `/result` 中轉)。 -**決定**:**全部保留不動**,Web UI 與對外 API 兩套並存。 +**決定**:`GET /api/v1/jobs/:id/result` 用 Node Stream `pipe(res)` 把 MinIO `GetObjectStream` 直接 proxy 回 caller,不 buffer。 **理由**: -1. Web UI 是內部工具,persona 與 API 消費者不同 -2. 給 Web UI 加 OAuth 等於加登入流程,UX 倒退(Design Review §2.3) -3. 降低本次 L 級的範圍,避免同時改兩套 +1. NEF 可能數百 MB(極端情境),buffer 整個檔會 OOM +2. Stream 模式下 Scheduler 記憶體足跡受 Node fetch / HTTP buffer 控制(typical < 64KB per request) +3. Client(visionA)可以邊收邊處理,TTFB 目標 < 500ms,不必等完整下載結束 -**代價**:Web UI 的 `/jobs/*` 路徑若曝露公網會繞過 OAuth。 -- **緩解**:部署層分流(§5.4、§7),public Nginx 只 proxy `/api/v1/*`。 +**代價**: +- Stream error handling 較複雜(headers 已送出後 stream 中斷只能 `res.destroy()`) +- Content-Length 必須在 stream 開始前算好(從 MinIO HEAD 取) + +**替代方案**: +- A. Buffer 整個 NEF 再回(簡單但會 OOM)— 排除 +- B. 
Redirect 到 MinIO presigned URL(簡單但 MinIO 暴露公網風險)— 排除 +- C. Stream proxy(選擇) --- -## 9.
回應 PRD 附錄 A 疑問清單 - -| # | PM 疑問 | 架構決策 | -|---|---------|---------| -| A.1.1 | Web UI 要不要也改走 OAuth | **不改**,見 ADR-006。以 Nginx 分流保護。未來若要改,屬於獨立的 M/L 級任務 | -| A.1.2 | 同使用者限制的範圍 | **整個 Converter 服務共用 user_id 空間**。現階段第一個 client 是 visionA-backend,未來加 client 時若真的有 user_id 衝突風險,再考慮用 `(client_id, user_id)` 複合鍵(Phase 1 不做)| -| A.1.3 | tenant_id 策略 | **single-tenant per Converter deployment**。見 §5.3。Job record 會記錄 tenant_id 方便 audit | -| A.1.4 | Phase 2 fallback | 本架構文件不決定此議題(屬 VisionA 產品團隊決策)。我們的架構不阻擋 VisionA 選任一方案 | -| A.1.5 | API 採用度 baseline | 非架構議題。建議 Phase 1 上線後跑 1 個月 beta 再訂 SLA 目標 | -| A.1.6 | 既有 `[推測]` 標記清理 | 非本次範圍 | -| A.2.1 | Member Center owner 協調 | **阻塞項**,必須在 kickoff 前解決。建議的 naming:`kneron_converter_api` audience、`kneron_converter` client、scopes 見 TDD §8(Phase 1 Converter 只需 `files:upload.write` 一個 scope 打 FAA)| -| A.2.2 | File Access Agent deployment / tenant_id | 中度依賴(只 promote 時用)。Converter 在 setup 時需要 FILE_ACCESS_AGENT_BASE_URL + 確認 tenant_id 吻合;FAA 現有 `PUT /files/{key}` 已支援 S2S JWT,無需擴充 | -| A.2.3 | VisionA Phase 1 OAuth 整合時程 | 需雙方 kickoff 對齊 | -| A.2.4 | Member Center download-tokens 實作時程 | Phase 2 啟動觸發條件 | -| A.3.1 | scope 命名 | 建議採 `converter:job.write`、`converter:job.read`,格式對齊 File Access Agent 的 `files:*.*` 慣例 | -| A.3.2 | Effort 估算 | 見 TDD §12(按 T1-T8 拆分,預估 4-5 人週)| -| A.3.3 | OpenAPI 維護策略 | **手寫 YAML**(Phase 1),手動與實作同步。未來再評估自動生成 | -| A.3.4 | user_id 索引的 Redis 策略 | **新增 `user:{user_id}:jobs` Set + `user:{user_id}:active_job` String**。避免 `KEYS *` 全掃(Design 4.1.2 建議已採納)| - ---- - -## 10. 
回應 Design Review 7 條建議 - -| # | Design 建議 | 是否採納 | 說明 | -|---|-----------|---------|------| -| 1 | Response schema(`stage_timings`、`stage_progress`、`expires_at`、結構化 error)| ✅ 全採納 | 見 TDD §1 | -| 2 | 錯誤碼結構化(`{error: {code, message, details}}`,409 帶 active_job 詳情)| ✅ 全採納 | 見 TDD §1.5 | -| 3 | Polling 效能(p95 < 200ms,ETag 支援)| ✅ 採納 | 見 §6.1、TDD §1.3(ETag)| -| 4 | 部署隔離(避免 `/jobs/*` 公網曝光)| ✅ 採納方案 B(Nginx 分流) | 見 §7 | -| 5 | 預留擴展(metadata、ETA、DELETE 路徑、progress 顆粒度)| ✅ 大部分採納 | API 留 `metadata: {}`、保留 `DELETE` 路徑可回 501 Not Implemented(Phase 2 再啟用)| -| 6 | promote 同步(p95 < 3s)| ✅ 採納 | 見 §6.1。超過 10s timeout 的 async 模式 Phase 1 不做 | -| 7 | Rate limit per client_id | ✅ 採納 | 見 §6.3 | - ---- - -## 11. 風險與待確認事項(給使用者決策) - -> 註:2026-04-25 變更後,原 R1(File Access Agent GET S2S 授權問題)已移除(見 §0 變更歷程),現存 R1-R6 為原 R2-R6 的內容沿用原敘述挪上一格後重編;R7 為本次針對 multer memoryStorage 大檔並發 OOM 新增的風險。 +## 9. 風險與待確認事項 | # | 風險 / 議題 | 影響 | 行動 | |---|-----------|------|------| -| R1 | Member Center client / audience / scope 註冊時程 | 高 | Orchestrator 協助排跨團隊會議 | -| R2 | Member Center 是否支援 `tenant_id` claim?格式?| 中 | 待確認;Phase 1 可先不擋 tenant(warn log),等 Member Center 定案 | -| R3 | File Access Agent 的 `object_key` 命名約定與 VisionA 對齊(僅 promote 需要)| 中 | 見 TDD §6.1 建議,需 VisionA 確認 | -| R4 | JWKS cache stale 時 Member Center 更新 key 的同步策略 | 低 | 10 min TTL + 遇到未知 kid 強制 refresh 應足夠 | -| R5 | 大檔(> 200MB)multipart 上傳會超過 p95 SLA | 中 | Phase 1 接受(SLA 已調整為 5s @ 200MB、12s @ 500MB);若觀測到頻繁超 SLA,Phase 2 引入 chunked upload | -| R6 | docker-compose 本地開發時如何測 OAuth | 中 | 見 TDD §11(建議本地跑 Member Center docker-compose,或以 mock server)| -| R7 | Scheduler 同時承接多個 500MB multipart 上傳會吃光記憶體(`multer.memoryStorage()`)| 中 | Phase 1 依賴 `user_has_active_job` 鎖避免同 user 併發;若跨 user 併發成為瓶頸,改 `multer.diskStorage()` 或 streaming 上傳 | +| R1 | CONVERTER_API_KEY rotation 流程未自動化 | 低 | Phase 1 接受手動 rotation;外洩時雙端同步改 .env + redeploy | +| R2 | `/result` 高並發 stream 壓力 | 低 | NEF 通常小、visionA 是唯一 caller、QPS 可控;觀測後再加防護 | +| R3 | visionA 一旦被 compromise 可冒充任意 user_id | 中(接受)| 同 
OAuth 模型,本質 trust boundary 不變;audit log + anomaly detection 為主要 mitigation | +| R4 | Sec C1 暫緩(.env 進 git history)| 中 | Phase 1 ready 後做 history rewrite + rotate 所有 secret,包括 CONVERTER_API_KEY | +| R5 | 大檔 multipart OOM(多 user 並發)| 中 | 既有 `user_has_active_job` 鎖 + `MAX_CONCURRENT_UPLOADS` semaphore | +| R6 | NEF 7 天過期後 client 重新轉檔 | 低 | API spec 已定義 410 `result_expired`,visionA 端處理 | --- -## 12. Phase 1 / Phase 2 切分(架構層) +## 10. Phase 0.8b 切分(架構層) -### Phase 1 必做 +### Phase 0.8b 必做 -- auth middleware + scope 檢查 -- OAuth client + token cache -- 新路由群 `/api/v1/*`(POST jobs、GET jobs、GET jobs/:id、POST promote) -- Redis 資料模型擴充(user_id、tenant_id、索引) -- 同使用者一個轉檔限制 -- Recovery 查詢 -- 部署分流(Nginx 雙 vhost) -- OpenAPI 文件 +- API key middleware(`auth/apiKeyMiddleware.js`) +- 砍 OAuth resource server(`auth/middleware.js` / `auth/jwks.js`) +- 保留 OAuth client(`auth/oauthClient.js`,promote 用) +- `/api/v1/jobs/:id/result` endpoint +- Config 變動(移除 OAuth resource server 相關、新增 `CONVERTER_API_KEY`) +- 4 個既有 endpoint 改掛 `requireApiKey()`(取代 `requireAuth(scope)`) +- README / OpenAPI / .env.example 同步更新 -### Phase 2 預留(不改契約) +### Phase 2 預留(不在本次範圍) -- `POST /api/v1/jobs/:id/download-tokens`(等 Member Center 補完) -- `DELETE /api/v1/jobs/:id`(回 501 Phase 1 → 實作 Phase 2/3) -- `POST /api/v1/jobs/:id/webhooks`(預留結構,可回 501) -- Async promote 模式(若觀測到 p95 > SLA) - -### 阻塞條件 - -- **Phase 1 阻塞**:見 R1(Member Center 註冊),必須在 kickoff 前解除。File Access Agent 側 Phase 1 只需 `PUT /files/{key}` S2S 支援(對方現有 API 已支援,無阻塞) -- **Phase 2 阻塞**:Member Center `POST /file-access/download-tokens` 實作 +- `DELETE /api/v1/jobs/:id`(仍回 501) +- `POST /api/v1/jobs/:id/download-tokens`(仍回 501,未來 MC 補完再啟用) +- Webhook +- 觀測強化(Prometheus / OpenTelemetry) ### 觸發條件 -- Phase 1:三方交叉審閱通過 + 使用者審核 Design Doc + TDD + R1 解除 -- Phase 2:Member Center endpoint 實作完成並提供測試環境 +- Phase 0.8b:三方交叉審閱通過 + 使用者審核 + Backend 完成實作 → Reviewer → Testing → 部署 +- Phase 2:取決於 visionA 後續需求 + MC owner 時程 --- -## 13. 後續步驟 +## 11. 後續步驟 1. 本 Design Doc 送 PM / Design 交叉審閱 2. 
使用者審核最終版 -3. 跨團隊協調 R1(Member Center 註冊 audience / client / scope) -4. 工程師依 `TDD.md` 的任務清單(T1-T8)增量式開發 -5. 第二階段:Reviewer 每個任務把關 +3. Backend Agent 依 `TDD.md` 索引 + `auth.md` + `api/api-result.md` 的任務拆分(Phase A 6 個子任務 + Phase B 4 個子任務)增量開發 +4. Reviewer 每個任務把關 + Testing 整合測試 +5. 雙端對齊部署(converter 先 → visionA 後 → e2e) --- -**附註**:本 Design Doc 約 710 行,已超過建議拆分門檻(500 行)。本次更新聚焦內容修正,暫不拆分;下輪更新建議拆分為 `design-doc.md`(索引)+ 子模組。 +**附註**:本 Design Doc 約 470 行,未超過拆分門檻。詳細 TDD 內容拆分為 `TDD.md` 索引 + `auth.md` + `api/*.md` + `database.md` + `infra.md` + `performance.md` + `observability.md` 等子檔案。 diff --git a/docs/autoflow/04-architecture/infra.md b/docs/autoflow/04-architecture/infra.md new file mode 100644 index 0000000..859599a --- /dev/null +++ b/docs/autoflow/04-architecture/infra.md @@ -0,0 +1,306 @@ +# Infra 設計 + +> **狀態**:Phase 1 完工 — Phase 0.8b 只動 env,Nginx / docker-compose 結構不變。 +> +> **配套**:`design-doc.md` §7、`auth.md` §4(CONVERTER_API_KEY 管理)。 + +--- + +## 1. Nginx 雙 vhost 分流 + +維持 Phase 1 設計(**Phase 0.8b 不動**): + +- **public vhost**(443 對公網):只 proxy `/api/v1/*` + `/health` +- **internal vhost**(內部 IP 80):proxy `/jobs/*` + `/queues/stats` + Web UI + +``` + ┌─────────────────────────────────────────────────────────────┐ + │ Nginx(單一 process) │ + │ │ + │ ┌────────────────────────┐ ┌────────────────────────────┐ │ + │ │ server { │ │ server { │ │ + │ │ listen 443 ssl; │ │ listen 10.0.0.1:80; │ │ + │ │ server_name │ │ server_name │ │ + │ │ converter....com; │ │ converter-internal...; │ │ + │ │ │ │ │ │ + │ │ location /api/v1/ {} │ │ location /jobs {} │ │ + │ │ location = /health {} │ │ location /queues/stats {} │ │ + │ │ location / { │ │ location / { │ │ + │ │ return 404; │ │ proxy_pass web:3000; │ │ + │ │ } │ │ } │ │ + │ │ } │ │ } │ │ + │ │ (public vhost) │ │ (internal vhost, 內網 IP) │ │ + │ └───────────┬─────────────┘ └────────────┬────────────────┘ │ + └──────────────┼──────────────────────────────┼───────────────────┘ + │ │ + ▼ ▼ + 
┌──────────────────────────────────────────────────┐ + │ Task Scheduler (:4000) │ + │ - /api/v1/* (API key 保護,僅 public vhost 轉入)│ + │ - /jobs/* (無 auth,僅 internal vhost 轉入) │ + │ - /jobs/*/events(SSE) │ + │ - /health, /queues/stats │ + └──────────────────────────────────────────────────┘ +``` + +--- + +## 2. Nginx 完整設定(不變) + +```nginx +# /etc/nginx/conf.d/converter.conf + +upstream scheduler_upstream { + server scheduler:4000; + keepalive 32; +} + +# Public vhost +server { + listen 443 ssl http2; + server_name converter.innovedus.com; + + ssl_certificate /etc/nginx/certs/fullchain.pem; + ssl_certificate_key /etc/nginx/certs/privkey.pem; + + location /api/v1/ { + proxy_pass http://scheduler_upstream; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + proxy_request_buffering off; # 大檔 stream + proxy_read_timeout 300s; + client_max_body_size 600M; # multipart 上限略大於 500MB + } + + location = /health { + proxy_pass http://scheduler_upstream; + } + + location / { + return 404 '{"error":{"code":"not_found","message":"Not found"}}'; + default_type application/json; + } +} + +# Internal vhost +server { + listen 10.0.0.1:80; + server_name converter-internal.innovedus.com; + + location /jobs { + proxy_pass http://scheduler_upstream; + proxy_http_version 1.1; + proxy_set_header Host $host; + proxy_buffering off; # SSE 需要 + } + + location /queues/stats { + proxy_pass http://scheduler_upstream; + } + + location / { + proxy_pass http://web:3000; + } +} +``` + +--- + +## 3. 
docker-compose.yml 環境變數變動 + +### 3.1 Phase 0.8b 移除 + +```yaml +# 對外 API auth 不再走 OAuth +- MEMBER_CENTER_ISSUER +- MEMBER_CENTER_JWKS_URL +- KNERON_CONVERTER_AUDIENCE +- JWKS_CACHE_MAX_AGE_MS +- JWKS_COOLDOWN_MS +- JWT_CLOCK_TOLERANCE_SEC +``` + +### 3.2 Phase 0.8b 新增 + +```yaml +- CONVERTER_API_KEY=${CONVERTER_API_KEY} # 64 hex chars from `openssl rand -hex 32` +``` + +### 3.3 保留不動(promote 需要) + +```yaml +- MEMBER_CENTER_TOKEN_URL=${MEMBER_CENTER_TOKEN_URL} +- KNERON_CONVERTER_CLIENT_ID=${KNERON_CONVERTER_CLIENT_ID} +- KNERON_CONVERTER_CLIENT_SECRET=${KNERON_CONVERTER_CLIENT_SECRET} +- FILE_ACCESS_AGENT_BASE_URL=${FILE_ACCESS_AGENT_BASE_URL} +- FILE_ACCESS_AGENT_AUDIENCE=${FILE_ACCESS_AGENT_AUDIENCE} +- PROMOTE_TIMEOUT_MS=${PROMOTE_TIMEOUT_MS:-300000} +- OAUTH_TOKEN_REFRESH_SKEW_MS=${OAUTH_TOKEN_REFRESH_SKEW_MS:-60000} +- OAUTH_TOKEN_TIMEOUT_MS=${OAUTH_TOKEN_TIMEOUT_MS:-10000} +``` + +### 3.4 既有(不動) + +```yaml +- PORT=4000 +- NODE_ENV=${NODE_ENV:-development} +- REDIS_URL=${REDIS_URL} +- STORAGE_BACKEND=minio +- MINIO_* +- CONVERTER_TENANT_ID=${CONVERTER_TENANT_ID:-} # Phase 0.8b 仍保留(promote 流程仍可能用) +- API_V1_RATE_LIMIT_WINDOW_MS=${API_V1_RATE_LIMIT_WINDOW_MS:-300000} +- API_V1_RATE_LIMIT_MAX=${API_V1_RATE_LIMIT_MAX:-300} +- MULTIPART_MODEL_MAX_BYTES=${MULTIPART_MODEL_MAX_BYTES:-524288000} +- MULTIPART_REF_IMAGE_MAX_BYTES=${MULTIPART_REF_IMAGE_MAX_BYTES:-10485760} +- MULTIPART_REF_IMAGES_MAX_COUNT=${MULTIPART_REF_IMAGES_MAX_COUNT:-100} +- MAX_CONCURRENT_UPLOADS=${MAX_CONCURRENT_UPLOADS:-5} +- UPLOAD_RETRY_AFTER_SECONDS=${UPLOAD_RETRY_AFTER_SECONDS:-30} +``` + +### 3.5 變動移除原因 + +| Env | 為什麼移除 | Phase 1 用途 | +|-----|----------|-------------| +| `MEMBER_CENTER_ISSUER` | API key 不需要驗 issuer | OAuth resource server 驗 iss claim | +| `MEMBER_CENTER_JWKS_URL` | API key 不需要 JWKS | OAuth JWT 簽章驗證 | +| `KNERON_CONVERTER_AUDIENCE` | API key 不需要驗 aud | OAuth 驗 token 是給自己的 | +| `JWKS_*` | 沒有 JWKS cache 了 | JWKS 內部 cache 參數 | +| `JWT_CLOCK_TOLERANCE_SEC` | 沒有 JWT 驗證了 | JWT exp 
時鐘容忍 | + +--- + +## 4. `.env.example` 改動 + +### 4.1 移除段(OAuth resource server) + +```bash +# === OAuth (Member Center) === ← 整段移除 +MEMBER_CENTER_ISSUER=... +MEMBER_CENTER_JWKS_URL=... + +# === Converter identity (Resource Server) === ← 整段移除 +KNERON_CONVERTER_AUDIENCE=... + +# === JWKS cache === ← 整段移除 +JWKS_CACHE_MAX_AGE_MS=600000 +JWKS_COOLDOWN_MS=30000 +JWT_CLOCK_TOLERANCE_SEC=60 +``` + +### 4.2 新增段 + +```bash +# === Phase 0.8b: API key for visionA → converter === +# 用 `openssl rand -hex 32` 產 64 hex chars +# 雙端必須對齊:visionA `.env.stage` 的 VISIONA_CONVERTER_API_KEY 同值 +# 絕不進 git / log / Slack +CONVERTER_API_KEY= +``` + +### 4.3 保留段(不變,promote 用) + +```bash +# === Member Center token endpoint(converter → FAA promote 用)=== +MEMBER_CENTER_TOKEN_URL=https://auth.innovedus.com/oauth/token + +# === Converter identity (OAuth Client,promote 用) === +KNERON_CONVERTER_CLIENT_ID=kneron_converter +KNERON_CONVERTER_CLIENT_SECRET=change-me + +# === File Access Agent === +FILE_ACCESS_AGENT_BASE_URL=https://files.nas.internal +FILE_ACCESS_AGENT_AUDIENCE=file_access_api + +# === Promote / OAuth Client tunables === +PROMOTE_TIMEOUT_MS=300000 +OAUTH_TOKEN_REFRESH_SKEW_MS=60000 +OAUTH_TOKEN_TIMEOUT_MS=10000 + +# === Rate Limit === +API_V1_RATE_LIMIT_WINDOW_MS=300000 +API_V1_RATE_LIMIT_MAX=300 + +# === Multipart upload === +MULTIPART_MODEL_MAX_BYTES=524288000 +MULTIPART_REF_IMAGE_MAX_BYTES=10485760 +MULTIPART_REF_IMAGES_MAX_COUNT=100 +MAX_CONCURRENT_UPLOADS=5 +UPLOAD_RETRY_AFTER_SECONDS=30 +``` + +--- + +## 5. 
部署順序(Phase 0.8b) + +**重要**:錯誤順序會讓 stage 整段 down。正確順序: + +``` +Step 1: converter 端先實作完 + deploy + - 砍 OAuth middleware、加 API key middleware + - 加 /result endpoint + - 設 CONVERTER_API_KEY env + - 此時 converter 對外只認 API key(OAuth 已移除) + - 但既有 visionA stage 還在用 OAuth → 會撞 401 + ⚠️ 此 Step 應在 visionA stage 跑得通 OAuth 之前先完成(既然 visionA OAuth 還沒整合通過、本來就 401) + +Step 2: 驗證 converter 新 endpoint 可用 + - curl 打 GET /api/v1/jobs/<某 completed job>/result 帶 Bearer + - 確認 200 + NEF binary stream + - curl 打 POST /api/v1/jobs 用同把 key + - 確認 201 + job_id + +Step 3: visionA backend deploy(已 ready、commit 9e29ebf) + - VISIONA_CONVERTER_API_KEY env 跟 CONVERTER_API_KEY 對齊 + - visionA 用 API key 打 converter、走新的 GetResult endpoint + +Step 4: e2e 驗證 + - User upload → init → poll → promote → download + - 全綠 = 完成 +``` + +### 5.1 注意:5/9 stage 狀態 + +Phase 1 OAuth 從未在 stage 跑通(MC scope 沒註冊)。所以 Phase 0.8b 切換對「實際 e2e」是 **net positive**(從未 work → 開始 work)。Stage 不會有「OAuth 過了改 API key 變成 401」的 regression。 + +--- + +## 6. 安全配置 + +### 6.1 CONVERTER_API_KEY + +詳見 `auth.md` §4。 + +重點: +- 每環境獨立(dev / stage / prod) +- 64 hex chars(`openssl rand -hex 32`) +- 雙端對齊(visionA + converter) +- 絕不進 git +- Rotation 流程:手動同步 .env + redeploy + +### 6.2 Sec C1 暫緩(既有風險、不變) + +`.env` 一度被 commit 進 git history(5/2 健檢發現),已加入 `.gitignore` 但 history 仍可追溯。 + +**Phase 0.8b 階段**: +- 新增 `CONVERTER_API_KEY` 時注意**不要進 git** +- Phase 1 ready 後做一次 git history rewrite + 全 secret rotate(包括新加的 CONVERTER_API_KEY、既有的 OAuth client_secret、MinIO 等) + +--- + +## 7. CI/CD 影響 + +**無需改 CI**: + +- 既有 GitHub Actions 設定不變 +- 新加 `CONVERTER_API_KEY` 到 stage / prod secrets manager(Vault / k8s secret / docker secret) +- dev 用 `.env`(gitignored) + +--- + +## 8. 
Phase 2 預留 + +- 多 instance 部署:rate limiter 需從 process-local memory 改 Redis store +- 多 caller:可考慮加回 OAuth resource server(API key + OAuth 並存模式) +- Secrets manager 自動 rotation:整合 HashiCorp Vault / AWS Secrets Manager diff --git a/docs/autoflow/04-architecture/observability.md b/docs/autoflow/04-architecture/observability.md new file mode 100644 index 0000000..5aa8906 --- /dev/null +++ b/docs/autoflow/04-architecture/observability.md @@ -0,0 +1,274 @@ +# Observability 設計 + +> **狀態**:Phase 1 完工 — Phase 0.8b 新增 `/result` endpoint 的 log + metrics。 +> +> **配套**:`security.md`(log 不含 secret 規則)、`performance.md`(SLO 量測)。 + +--- + +## 1. 三支柱 + +Phase 1 + 0.8b:**Logs only**(Metrics / Traces 留 Phase 2)。 + +### 1.1 Logs(結構化 JSON) + +全部走 stdout,由 docker / k8s collector 撈走(不 ship 到外部)。 + +每筆 log 必含: + +| 欄位 | 範例 | +|------|------| +| `timestamp` | ISO 8601 `2026-05-16T12:00:00.123Z` | +| `level` | INFO / WARN / ERROR | +| `service` | `task-scheduler` | +| `action` | `domain.event`(如 `result.success`、`auth.api_key.not_configured`)| +| `request_id` | UUIDv4(中介層自動帶)| + +按 endpoint 額外欄位見下方各章。 + +### 1.2 Metrics(Phase 2) + +預留 Prometheus exposition。Phase 0.8b 不實作。 + +### 1.3 Traces(Phase 2) + +預留 OpenTelemetry。Phase 0.8b 不實作。 + +--- + +## 2. 
各 endpoint log 欄位 + +### 2.1 `POST /api/v1/jobs` + +```jsonc +{ + "level": "INFO", + "service": "task-scheduler", + "timestamp": "...", + "action": "jobs.created", // 或 jobs.create_failed + "request_id": "...", + "job_id": "...", + "user_id": "...", + "client_id": "visionA-service", + "model_filename": "model.onnx", // sanitized + "model_size_bytes": 204800000, + "ref_images_count": 0, + "platform": "520", + "duration_ms": 4231, + "error_code": null // or 'user_has_active_job' / 'file_too_large' etc +} +``` + +### 2.2 `GET /api/v1/jobs/:id` + +```jsonc +{ + "level": "INFO", + "action": "jobs.get_one", + "request_id": "...", + "job_id": "...", + "user_id": "...", + "client_id": "visionA-service", + "internal_status": "ONNX", // 內部大寫 + "external_status": "running", + "etag_match": false, + "duration_ms": 18 +} +``` + +### 2.3 `GET /api/v1/jobs` + +```jsonc +{ + "level": "INFO", + "action": "jobs.list", + "request_id": "...", + "user_id": "...", + "filter_status": "in_progress", + "result_count": 3, + "duration_ms": 25 +} +``` + +### 2.4 `POST /api/v1/jobs/:id/promote` + +```jsonc +{ + "level": "INFO", + "action": "promote.success", // 或 promote.idempotent_hit / promote.not_ready / promote.faa_put_failed + "request_id": "...", + "job_id": "...", + "client_id": "visionA-service", + "target_count": 1, + "duration_ms": 580, + "error_name": null // or 'FAAUnauthorizedError' / 'FAATimeoutError' etc +} +``` + +### 2.5 `GET /api/v1/jobs/:id/result`(Phase 0.8b 新增) + +```jsonc +{ + "level": "INFO", + "action": "result.success", // 或 result.not_available / result.minio_failed / result.stream_error / result.client_closed + "request_id": "...", + "job_id": "...", + "client_id": "visionA-service", + "nef_key": "jobs/.../output/out.nef", // server-controlled,不算敏感 + "size_bytes": 52428800, + "filename_sent": "yolov5s_kl720.nef", + "duration_ms": 1234, + "error_code": null, // or 'result_expired' / 'job_not_completed' / 'storage_unavailable' + "stream_completed": true // false if 
client closed mid-stream +} +``` + +**Result endpoint 特別注意**: + +- **不 log NEF binary 內容**(只 log object key + size) +- **stream_completed: false** 代表 client 中途斷線(可能正常、可能網路爛、可能 client bug) +- **error_code = stream_error**:headers 已送出後 stream 失敗,沒辦法回 4xx 給 client + +--- + +## 3. Auth 相關 log + +### 3.1 API key middleware + +```jsonc +{ + "level": "ERROR", + "action": "auth.api_key.not_configured", // env 未設定 + "message": "CONVERTER_API_KEY env not set; rejecting all requests" +} +``` + +```jsonc +{ + "level": "INFO", + "action": "config.api_key_enabled", // 啟動時印 + "message": "API key middleware enabled", + "api_key_length": 64, // 不印 key 本身 + "timestamp": "..." +} +``` + +**注意**:API key 驗證失敗(401)**不 log 個別 request**(每次失敗都 log 會:(1) 攻擊面被打就會 log 爆炸;(2) log injection 風險)。改 metrics 計數。 + +### 3.2 OAuth client(promote 取 FAA token) + +```jsonc +{ + "level": "INFO", + "service": "oauth-client", + "action": "oauth.token_obtained", + "scope": "files:upload.write", + "token_type": "Bearer", + "expires_in_sec": 3600, + "access_token_length": 1024 // 不印 token 本身 +} +``` + +```jsonc +{ + "level": "WARN", + "service": "oauth-client", + "action": "oauth.token_endpoint_error", + "scope": "files:upload.write", + "status": 401, + "error_code": "invalid_client" +} +``` + +--- + +## 4. 
敏感資料保護 + +### 4.1 絕對不 log + +- `Authorization` header 完整內容(含 API key、JWT) +- `CONVERTER_API_KEY`、`KNERON_CONVERTER_CLIENT_SECRET`、MinIO secret +- File body / model 內容 +- JWT payload 完整 dump +- FAA error body(可能含內部 endpoint / region 等) +- MinIO error message(可能含 endpoint / region / bucket name) + +### 4.2 可以 log + +- `client_id`、`user_id`(API key 模式下 client_id 固定為 `visionA-service`) +- `tenant_id` +- `request_id` +- File metadata:`filename`(sanitized)、`size_bytes`、`mimetype` +- Object key(server controlled,例如 `jobs/{job_id}/output/out.nef`) +- Error 分類資訊:`error_code`、`error_name`、`status`(HTTP) +- Duration、timestamp + +### 4.3 條件 log + +- IP:log 仍記、GDPR 場景可能需要遮罩 +- `model_filename`:已 sanitized、通常不視為敏感 +- 失敗時的 `error_message`:截短 100 chars 且不含 secret 才 log + +--- + +## 5. 日誌等級 + +| Level | 用途 | +|-------|------| +| DEBUG | 不用(production 不開)| +| INFO | 正常事件(job created、result.success、token_obtained 等)| +| WARN | 可恢復異常(FAA 5xx 重試、token cooldown、rate limit hit)| +| ERROR | 不可恢復 / 需人工關注(MinIO down、API key 未配置、stream 中斷)| + +--- + +## 6. 告警策略(Phase 0.8b 規劃,Phase 2 實作) + +| 等級 | 條件 | 回應時間 | +|------|------|---------| +| P0 | Scheduler down / Redis down | 15 min | +| P1 | API 5xx 比例 > 5% / 持續 5min | 1 hr | +| P1 | `auth.api_key.not_configured` 出現(代表 env 漏設)| 1 hr | +| P2 | `result.stream_error` 比例 > 1% | 當日 | +| P2 | `promote.faa_put_failed` 重試後仍失敗 | 當日 | +| P3 | Token cache miss 突增 | 下個工作日 | + +--- + +## 7. Dashboard(Phase 2 設計) + +**全域 dashboard**: + +- 每 endpoint QPS / 5min +- p50 / p95 / p99 延遲 +- 4xx / 5xx 比例 +- API key 401 比例(應接近 0%,> 0.1% 告警) + +**Result endpoint dashboard**(Phase 0.8b 新增): + +- `/result` QPS +- `result.success` / `result.not_available`(404/409/410 分布) +- stream_completed: true vs false 比例 +- 平均 NEF size + +--- + +## 8.
Phase 0.8b 變動總結 + +### 8.1 新增 + +- `result.*` action 系列 log(success / not_available / minio_failed / stream_error / client_closed) +- `auth.api_key.*` action 系列 log +- `config.api_key_*` 啟動 log + +### 8.2 移除 + +- `auth.verify_failed`(OAuth JWT 驗證失敗) +- `auth.middleware_unexpected_error`(OAuth middleware 兜底) +- JWKS-related log(沒有 JWKS 了) + +### 8.3 保留 + +- `jobs.created` / `jobs.get_one` / `jobs.list` +- `promote.*` 全系列 +- `oauth.token_*`(promote 用的 OAuth client log) diff --git a/docs/autoflow/04-architecture/performance.md b/docs/autoflow/04-architecture/performance.md new file mode 100644 index 0000000..26661ac --- /dev/null +++ b/docs/autoflow/04-architecture/performance.md @@ -0,0 +1,158 @@ +# Performance 設計 + +> **狀態**:Phase 1 完工 — Phase 0.8b 新增 `/result` endpoint 的延遲預算。 +> +> **配套**:`design-doc.md` §6、`api/api-result.md`。 + +--- + +## 1. SLO 總表 + +| 端點 | SLI | SLO | 量測方式 | +|------|-----|-----|---------| +| `/api/v1/*` 可用率 | 2xx+3xx / 總請求 | ≥ 99.5%(工作時段)| Nginx access log + structured app log | +| `GET /api/v1/jobs/:id` p95 | 回應時間 95 百分位 | < 200ms | structured log 的 duration_ms | +| `GET /api/v1/jobs` p95 | 回應時間 95 百分位 | < 500ms | 同上 | +| `POST /api/v1/jobs` p95(200MB)| multipart 上傳到 MinIO 寫完 | < 5s | 同上 | +| `POST /api/v1/jobs` p95(500MB)| 同上 | < 12s | 同上 | +| `POST /api/v1/jobs/:id/promote` p95 | 回應時間 95 百分位 | < 3s | 同上 | +| `GET /api/v1/jobs/:id/result` TTFB | First byte 時間 | < 500ms | Nginx access log + app log | +| `GET /api/v1/jobs/:id/result` 完整下載(50MB)| 完整 stream 結束時間 | < 2s @ 50MB/s 鏈路 | client 端量 | +| API key 驗證失敗率 | 401 / 總請求 | < 0.1%(同 caller 不該錯) | structured log | + +--- + +## 2. 
延遲預算 + +### 2.1 `POST /api/v1/jobs`(200MB 檔案) + +| 階段 | 預算 | 備註 | +|------|------|------| +| Nginx ingress(含 multipart 前置)| 10ms | | +| API key constant-time compare | 1ms | 64 hex chars 比對 | +| Multer 接收(memory)| 4000ms | 200MB @ 50MB/s | +| Validation(欄位、mimetype、副檔名)| 20ms | | +| Upload concurrency semaphore | 0-1000ms | 高並發時可能等 | +| Redis 查 active_job | 10ms | | +| MinIO PutObject | 1000ms | 200MB @ 200MB/s | +| Redis 寫 job record + 索引(Lua)| 20ms | | +| Enqueue 到 Redis Stream | 10ms | | +| **總預算 (200MB)** | **~5s p95** | | +| **總預算 (500MB)** | **~12s p95** | Multer 接收與 MinIO 寫入時間約為 200MB 的 2.5 倍 | + +### 2.2 `GET /api/v1/jobs/:id` + +| 階段 | 預算 | +|------|------| +| Nginx ingress | 5ms | +| API key compare | 1ms | +| Rate limiter check | 2ms | +| Redis GET job:{id} | 10ms | +| Client 隔離檢查 | 1ms | +| Status mapping + serialize | 5ms | +| ETag 計算 + compare | 2ms | +| **總預算** | **~30ms p50、< 200ms p95** | + +### 2.3 `POST /api/v1/jobs/:id/promote`(單 target,50MB NEF) + +| 階段 | 預算 | +|------|------| +| Nginx ingress | 5ms | +| API key compare | 1ms | +| Redis GET job | 10ms | +| 冪等性 check | 1ms | +| MinIO HEAD object | 50ms | +| OAuth token cache hit | 1ms | +| FAA PUT(50MB @ 100MB/s)| 500ms | +| Redis SET(markPromoted)| 10ms | +| **總預算** | **~600ms p50、< 3s p95** | + +OAuth token cache miss 會多 200-500ms(取 token 一次)。 + +### 2.4 `GET /api/v1/jobs/:id/result`(Phase 0.8b 新增) + +| 階段 | 預算 | +|------|------| +| Nginx ingress | 5ms | +| API key compare | 1ms | +| Rate limiter check | 2ms | +| Redis GET job | 10ms | +| Status / expires_at check | 1ms | +| MinIO GET stream init(含 HEAD)| 50ms | +| **TTFB** | **~70ms p50、< 500ms p95** | +| Stream NEF(50MB @ 50MB/s)| 1000ms(client 端量) | + +**TTFB**:headers 送出 + 第一個 byte 到達 client,這是 `/result` 的關鍵 SLO。完整下載時間取決於檔案大小和鏈路頻寬,不算 Scheduler 的 SLO。 + +--- + +## 3.
Token cache 策略 + +| Cache | TTL | 退出條件 | +|-------|-----|---------| +| ~~JWKS~~(**Phase 0.8b 移除**)| ~~10 min~~ | ~~遇到未知 kid 強制 refresh~~ | +| FAA service token(promote 用)| `expires_in - 60s` | 遇到 401 強制 refresh | + +--- + +## 4. Rate Limit 策略 + +| 範圍 | 限制 | 動機 | +|------|------|------| +| 全局(IP)| 200 req / 15min | 防匿名流量 / DDoS(既有)| +| Per `client_id`(API key 模式下固定 `visionA-service`)| 300 req / 5min | 防 polling 暴衝 | + +**Phase 0.8b 思考**:API key 模式下只有 1 個 caller、`client_id` 固定值,per-client rate limit 實質上就是 visionA 整體流量的上限。仍保留 per-client 結構,未來真有多 caller 時自動分流。 + +--- + +## 5. Streaming 記憶體足跡 + +### 5.1 `/result` stream(Phase 0.8b 新增) + +NEF 50MB stream: + +- MinIO client(aws-sdk):通常 16KB-64KB internal buffer +- Node HTTP response:highWaterMark 預設 16KB +- 整段 stream 期間 Scheduler heap 增量:**< 200KB per request** + +→ 1000 並發 stream 估算 < 200MB heap,可接受。 + +### 5.2 `POST /api/v1/jobs` multipart memoryStorage(既有) + +500MB multipart 寫入 memory: + +- 單個 request:500MB peak heap +- 5 並發(`MAX_CONCURRENT_UPLOADS=5`):2.5GB peak +- 容器 RAM 應 ≥ 4GB + +--- + +## 6. 觀測 + +詳見 `observability.md`。 + +每個 endpoint 必 log: + +- `action`(如 `result.success`、`promote.success`) +- `request_id` +- `client_id`、`user_id`(如可取) +- `duration_ms`(從 handler start 到 response end) +- 失敗時:`error_code`、`error_name`(不 log error_message 內容、避免洩漏) + +--- + +## 7.
Load test plan
+
+| Type | Duration | Purpose | Load |
+|------|------|------|------|
+| Steady-state | 30 min | Baseline validation | Estimated QPS (visionA polling: 5 users × 1 req/2s = 2.5 QPS) |
+| Step-load | Increment every 5 min | Find the scaling limit | Ramp up gradually to 50 QPS |
+| Spike | Instantaneous | Burst traffic | 100 QPS (40x the 2.5 QPS steady-state baseline) |
+| Soak | 6 hours | Memory leaks | Steady 5 QPS |
+
+`/result`-specific tests:
+
+- Large-file stream (200MB NEF): OOM test
+- Many concurrent streams (20 clients downloading at once): confirm the Scheduler stays up
+- Slow client (client reads slowly): confirm the stream does not pile up buffers
diff --git a/docs/autoflow/04-architecture/security.md b/docs/autoflow/04-architecture/security.md
index 154237e..bbda8a3 100644
--- a/docs/autoflow/04-architecture/security.md
+++ b/docs/autoflow/04-architecture/security.md
@@ -1,17 +1,20 @@
-# Security Notes — Phase 1
+# Security Notes — Phase 1 + Phase 0.8b
 
-> This document records the known security design decisions for Phase 1, the accepted risks, their mitigations, and candidate improvements for Phase 2.
+> This document records the known security design decisions for Phase 1 + 0.8b, the accepted risks, their mitigations, and candidate improvements for Phase 2.
 >
 > **When to update**: This file must be updated whenever a security review (Reviewer / Security Auditor) finds a new risk or changes an existing trust assumption.
+>
+> **Main Phase 0.8b change**: External auth for visionA → converter switches from OAuth JWT to a pre-shared API key (ADR-010). The trust boundary model is unchanged, but the auth mechanism is greatly simplified. See the "Auth Security" and "API Key Management" sections below.
 
 ## Index
 
 | Section | Contents |
 |---------|------|
-| [Trust Boundary](#trust-boundary重要-design-risk) | Trust in the source of user_id (risk accepted in Phase 1) |
+| [Trust Boundary](#trust-boundary重要-design-risk) | Trust in the source of user_id (risk accepted in Phase 1; risk model unchanged in Phase 0.8b) |
 | [Input Validation](#input-validation) | Input validation mechanisms in place |
 | [Storage Security](#storage-security) | MinIO object key control and cleanup strategy |
-| [Auth Security](#auth-security) | JWT / JWKS configuration, algorithm pin |
+| [Auth Security](#auth-security) | **Rewritten in Phase 0.8b**: API key, plus the (retained) OAuth client for promote |
+| [API Key Management](#api-key-management) | **New in Phase 0.8b**: CONVERTER_API_KEY rotation / deployment / leak response |
 | [Rate Limiting](#rate-limiting) | Two-tier rate limiter design |
 | [Logging](#logging) | Structured logging and sensitive-data protection |
 | [Phase 2 candidate list](#phase-2-候補方案清單) | Known designs that still need hardening |
@@ -24,36 +27,38 @@
 
 #### Design
 
-In `POST /api/v1/jobs`, `user_id` is passed in via a multipart
form field and is **not** derived from a JWT claim. The Converter fully trusts visionA-backend to pass in the correct `user_id`.
+In `POST /api/v1/jobs`, `user_id` is passed in via a multipart form field and is **not** derived from a token claim. The Converter fully trusts visionA-backend to pass in the correct `user_id`.
 
 ```
+After Phase 0.8b:
+
 visionA-backend                  Converter
      │                               │
-     ├── client_credentials ────────→│  (obtain access token)
+     ├── with CONVERTER_API_KEY ─────│  (pre-shared, identical on both sides)
      │                               │
      ├── POST /api/v1/jobs ─────────→│  Form-Data:
-     │   Authorization: Bearer …     │    user_id: "alice"  ← decided by visionA
-     │                               │    model:
-     │                               │    ...
+     │   Authorization:              │    user_id: "alice"  ← decided by visionA
+     │     Bearer <CONVERTER_        │    model:
+     │     API_KEY>                  │    ...
      │                               │
      │                               │  Converter side:
-     │                               │    - verifies the client via the token (OK)
+     │                               │    - constant-time compare of the API key (OK)
      │                               │    - trusts user_id as "the user who actually submitted"
-     │                               │    - no longer checks how user_id relates to the token
+     │                               │    - no longer checks how user_id relates to the caller
 ```
 
-#### Trust assumption (Phase 1)
+#### Trust assumption (Phase 1 + 0.8b)
 
 On the visionA-backend side:
 
 1. **Code security**: no XSS / SSRF / RCE vulnerabilities; the source of user_id is trustworthy
 2. **Infra security**: network ACLs, an IP allow-list, and TLS ensure that only visionA-backend can call this API
-3. **Credential management**: `client_secret` is never leaked, committed to git, or written to logs
+3. **Credential management**: `CONVERTER_API_KEY` is never leaked, committed to git, or written to logs (**changed in Phase 0.8b**: previously `client_secret`)
 4.
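**Audit log integrity**: visionA can trace "which real user triggered which conversion"
 
 #### Risk (accepted)
 
-**Once visionA-backend is compromised**, an attacker can reuse the same legitimate `client_credentials`:
+**Once visionA-backend is compromised**, an attacker can reuse the same legitimate `CONVERTER_API_KEY` (after Phase 0.8b; OAuth `client_credentials` in Phase 1):
 
 | Attack surface | Impact |
 |-------|------|
@@ -61,13 +66,22 @@ **Once visionA-backend is compromised**, an attacker can reuse the same legitimate `clien
 | Lock a specific user out for 7 days | The active_job conflict mechanism is weaponized (once any user_id is locked, even legitimate requests get 409) |
 | Forged jobs recorded in the victim user_id's history | The `user:{victim}:jobs` Set is polluted; later history queries show records that are not the user's own |
 | Inflate the victim's job count (if quota / billing exists) | If Phase 2 introduces per-user quota / billing, usage is wrongly charged to the victim |
+| Fetch any NEF (`/result`, new in Phase 0.8b) | In API key mode `/result` has no per-job authorization, so an attacker can fetch the NEF for any jobID |
 
-#### Phase 1 decision (user ruling, 2026-04-25)
+#### Phase 1 / 0.8b decision (user rulings, 2026-04-25 + 2026-05-16)
 
 **This risk is accepted.** Rationale:
+
 1. visionA-backend is an internally controlled system (not Internet-facing), so the probability of compromise is low
 2. Phase 1 focuses on getting the core pipeline working; security hardening is scheduled for Phase 2
-3. Introducing HMAC / OBO would add integration work on the visionA side, which the visionA team has not yet agreed to
+3. After the switch to an API key in Phase 0.8b (2026-05-16), the trust model is **neither less nor more secure than OAuth**:
+   - OAuth model: once client_credentials leak, an attacker can impersonate visionA to obtain tokens and create jobs
+   - API key model: once the API key leaks, an attacker calls the API directly; the attack surface is the same
+   - Only difference: OAuth tokens expire (short-lived), whereas the API key is a long-lived secret (so a rotation process matters more)
+4.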
Not introducing HMAC-signed user_id / OBO:
+   - HMAC still relies on a symmetric secret, so the same compromise scenario defeats it
+   - OBO requires MC to implement token exchange, which runs counter to the spirit of ADR-015 ("remove cross-team dependencies")
+   - Revisit if a real multi-caller requirement appears
 
 #### Mitigations (adopted in Phase 1)
 
@@ -233,34 +247,132 @@ Before `writeInputToMinIO`, the handler does a cheap GET of `user:{userId}:active_job`
 
 ## Auth Security
 
-### JWT Algorithm Pin (Sec m3)
+### Phase 0.8b: two axes
 
-`src/auth/jwks.js` explicitly pins the accepted JWT signing algorithms:
+| Axis | Direction | Auth mechanism | Phase 0.8b status |
+|----|------|----------------|---------------|
+| **External API** | visionA → converter | Pre-shared API key (`CONVERTER_API_KEY`) | **New (replaces OAuth)** |
+| **Promote** | converter → FAA | OAuth client_credentials (existing) | **Retained** |
 
-```js
-const ALLOWED_JWT_ALGS = ['RS256', 'ES256', 'PS256'];
+### 1. External API: API key middleware (new in Phase 0.8b)
+
+#### Design
+
+- **Header**: `Authorization: Bearer <CONVERTER_API_KEY>` (reuses the Bearer format, so clients need no changes)
+- **Comparison**: constant-time compare via `crypto.timingSafeEqual`
+- **Length**: 64 hex chars, generated with `openssl rand -hex 32` (256 bits of randomness, well above NIST's current 112-bit minimum security strength)
+- **req.auth**: on success, sets `req.auth.clientId = 'visionA-service'` (a fixed value used by the rate limiter and logging)
+- **Failure behavior**: 401 `invalid_token` plus a proactive `socket.destroy()` (carried over from the OAuth M2 behavior)
+- **Fail-fast**: if the env var is unset, every request gets 503 (never silently allowed)
+
+#### Why it is secure
+
+1. **Constant-time compare**: `crypto.timingSafeEqual` prevents timing attacks (an attacker cannot infer the key contents even with differential timing analysis)
+2. **Length check before timingSafeEqual**: avoids the `RangeError` that `timingSafeEqual` throws for inputs of different length; this reveals only the length, which is fixed (64) and not secret
+3. **64 hex chars** = 256 random bits (`openssl rand -hex 32`), so brute force is infeasible
+4. **Key contents are never logged** (in either direction: neither the expected nor the received value)
+
+#### Failure scenario summary
+
+See `auth.md` §1.4.
+
+### 2.
Promote: OAuth client_credentials (retained)
+
+The Converter still obtains a `files:upload.write` token under its own identity and PUTs the result file to FAA.
+
+#### Existing design (unchanged)
+
+- **JWT Algorithm Pin** (existing Sec m3): not applicable here. `auth/oauthClient.js` does not need to verify the JWT when parsing the MC response (the token is passed straight through to FAA); verification is FAA's responsibility, not the converter's
+- **Token cache**: per scope; a token counts as valid while the time remaining until `expiresAt` exceeds `refreshSkewMs` (default 60s)
+- **In-flight dedup**: concurrent requests for the same scope trigger only one token request
+- **AbortController timeout**: default 10s
+- **Error classification**: `OAuthClientError` (4xx, not retried) / `OAuthServerError` (5xx, retryable) / `OAuthTimeoutError` (network / timeout, retryable)
+
+#### 401 handling
+
+If the FAA PUT returns 401 → `oauthClient.invalidate(scope)` and retry once; if it still returns 401 → 503 `auth_service_unavailable`.
+
+### 3. Removed (Phase 0.8b)
+
+| Removed | Reason |
+|------|------|
+| `src/auth/middleware.js` (OAuth resource server) | visionA → converter no longer verifies JWTs |
+| `src/auth/jwks.js` | No JWKS cache needed |
+| `MEMBER_CENTER_ISSUER` / `JWKS_URL` env | No iss verification / no JWKS fetch |
+| `KNERON_CONVERTER_AUDIENCE` env | No aud verification |
+| `JWKS_CACHE_MAX_AGE_MS` / `JWKS_COOLDOWN_MS` / `JWT_CLOCK_TOLERANCE_SEC` env | No JWKS / JWT anymore |
+| ALLOWED_JWT_ALGS algorithm pin | No JWT verification anymore (FAA has its own algorithm pin) |
+
+### 4. Retained (still needed for promote)
+
+| Retained | Purpose |
+|------|------|
+| `src/auth/oauthClient.js` | converter → FAA OAuth client |
+| `MEMBER_CENTER_TOKEN_URL` env | Token endpoint |
+| `KNERON_CONVERTER_CLIENT_ID` / `_CLIENT_SECRET` env | The converter's identity as an OAuth client |
+| `FILE_ACCESS_AGENT_AUDIENCE` env | FAA audience (used when requesting tokens) |
+| `OAUTH_TOKEN_REFRESH_SKEW_MS` / `OAUTH_TOKEN_TIMEOUT_MS` env | Token cache behavior |
+
+---
+
+## API Key Management
+
+### 1. Generation
+
+```bash
+openssl rand -hex 32
+# Output: 64 hex chars
+# Example: a3f9b2c1d8e7f6a5b4c3d2e1f0987654321fedcba9876543210abcdef1234567
 ```
 
-Rejected:
-- `none` (jose rejects it by default, but it is still listed explicitly)
-- `HS256` / `HS384` / `HS512` (HMAC; avoids algorithm-confusion attacks)
+### 2.
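Deployment location
 
-### JWKS Cache
+| Environment | Location |
+|------|------|
+| dev | `apps/task-scheduler/.env` (gitignored) |
+| stage | docker-compose env / k8s secret |
+| prod | docker secret / k8s secret / cloud secrets manager |
 
-- TTL 10 minutes (overridable via the `JWKS_CACHE_MAX_AGE_MS` env)
-- 30-second cooldown (avoids a thundering herd when the JWKS endpoint fails)
-- Module-level cache (one shared RemoteJWKSet per jwksUrl)
+### 3. Keeping both sides in sync
 
-### Token verification
+- visionA `.env.stage`: `VISIONA_CONVERTER_API_KEY=<key>`
+- converter `.env`: `CONVERTER_API_KEY=<key>`
+- **Both sides must hold exactly the same string**
 
-| Check | jose default | Added by Converter |
-|-------|----------|----------------|
-| signature | ✅ | — |
-| exp | ✅ | clockTolerance 60s |
-| nbf | ✅ | — |
-| issuer | — | ✅ (`MEMBER_CENTER_ISSUER`) |
-| audience | — | ✅ (`KNERON_CONVERTER_AUDIENCE`) |
-| algorithm | Rejects none | ✅ pin to RS256/ES256/PS256 (Sec m3) |
+### 4. Separate key per environment
+
+dev / stage / prod each generate their own key with `openssl rand -hex 32`; keys are never shared.
+
+### 5. Rotation procedure
+
+1. Both sides prepare to stop their deployments (or accept a brief window of 401s)
+2. Generate a new key with `openssl rand -hex 32`
+3. Update `.env` on both sides
+4. Redeploy the converter first (it starts accepting the new key)
+5. Then redeploy visionA (it starts calling with the new key)
+6. Verify: `curl -i -H "Authorization: Bearer <new-key>" "https://converter.../api/v1/jobs?user_id=test&limit=1"` (quote the URL so the shell does not treat `&` as a background operator)
+
+**Near-zero-downtime** variant (not implemented in Phase 0.8b): temporarily let the converter accept both the old and the new key (extending the middleware to compare against an array), switch visionA to the new key, then remove the old key.
+
+### 6. Leak response
+
+- Rotate the key on both sides **immediately**
+- Review the audit log for suspicious requests before the rotation (trace them via `request_id` + `user_id`)
+- If there is anomalous activity (e.g. 100+ distinct user_ids under the same client_id in a short period), escalate and start forensics
+- Post-mortem: identify the leak source (git history? CI logs? Slack?) and harden accordingly
+
+### 7. Never do this
+
+- ❌ **Never** commit it to git (`.gitignore` already excludes `.env`; verify this once)
+- ❌ **Never** paste it into Slack / email / chat transcripts
+- ❌ **Never** print it to logs (the middleware logs `api_key_length` or an `api_key_set: true` boolean, never the key itself)
+- ❌ **Never** quote the actual value in a commit message / PR description
+
+### 8.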
Sec C1 deferred (existing risk, still applies in Phase 0.8b)
+
+`.env` was once committed to git history. Take **extreme** care that the `CONVERTER_API_KEY` added in Phase 0.8b does not repeat that mistake.
+
+**Once Phase 1 is ready**, a one-time git history rewrite will be done and every secret will be force-rotated (including CONVERTER_API_KEY, the existing OAuth client_secret, and the MinIO secret).
 
 ---
@@ -327,16 +439,22 @@ Request → IP-based limiter (200 req / 15min) ← app.js global
 
 | # | Item | Priority | Expected task |
 |---|------|-------|---------|
-| 1 | **HMAC-signed user_id or OBO token** (solves Trust Boundary) | HIGH | Phase 2: auth hardening |
-| 2 | **Git history rewrite** (purge the .env leak) | HIGH | Phase 1 ready wrap-up |
-| 3 | **MULTIPART_MODEL_MAX_BYTES env wiring** (currently hard-coded to 500MB) | MEDIUM | T10 |
-| 4 | **MAX_CONCURRENT_UPLOADS semaphore** (prevents OOM from concurrent multi-user uploads) | MEDIUM | T10 |
-| 5 | **Stream storage evaluation** (replace memoryStorage to fix OOM at the root) | MEDIUM | Phase 2: infra |
-| 6 | **Rate limiter Redis store** (prerequisite for multi-instance deployment) | MEDIUM | Phase 2: infra |
-| 7 | **Audit anomaly detection** (alerts on anomalous user_id patterns) | LOW | Phase 2: observability |
-| 8 | **Filename Unicode normalization** (extreme Unicode bypasses) | LOW | Phase 2: security polish |
-| 9 | **Metadata prototype pollution protection** (whitelisted keys) | LOW | Phase 2: security polish |
-| 10 | **Token revocation list / JWT blacklist** (not needed at present) | LOW | Phase 2: auth |
+| 1 | **OBO token / Token Exchange** (solves Trust Boundary; **Phase 0.8b decided against the interim HMAC approach**) | MEDIUM | Phase 2: auth hardening (prerequisite: MC implements Token Exchange, RFC 8693) |
+| 2 | **Git history rewrite** (purge the .env leak, **including the CONVERTER_API_KEY added in Phase 0.8b**) | HIGH | Phase 1 ready wrap-up |
+| 3 | **API key automatic rotation** (integrate a secrets manager / Vault) | MEDIUM | Phase 2: infra |
+| 4 | **API key coexistence mode** (middleware accepts both old and new keys, enabling near-zero-downtime rotation) | LOW | Phase 2: auth polish |
+| 5 | **MULTIPART_MODEL_MAX_BYTES env wiring** | DONE | Phase 1 T10 |
+| 6 | **MAX_CONCURRENT_UPLOADS semaphore** | DONE | Phase 1 T10 |
+| 7 | **Stream storage evaluation** (replace memoryStorage) | MEDIUM | Phase 2: infra |
+| 8 | **Rate limiter + bandwidth quota + concurrent stream cap Redis store** (prerequisite for multi-instance deployment) | **HIGH** (upgraded 2026-05-17: after Phase B,
`/result` is exposed, enlarging the attacker's blast radius; must be in place before any multi-instance deployment) | Phase 2: infra (prerequisite: a multi-instance requirement arises after Phase B) |
+| 9 | **Audit anomaly detection** (alerts on anomalous user_id patterns) | LOW | Phase 2: observability |
+| 10 | **Filename Unicode normalization** (extreme Unicode bypasses) | LOW | Phase 2: security polish |
+| 11 | **Metadata prototype pollution protection** (whitelisted keys) | LOW | Phase 2: security polish |
+| 12 | **`/result` per-job authorization** (in API key mode an attacker can fetch any jobID) | **MEDIUM** (upgraded per A.7 follow-up §4) | Phase 2: auth; consider client_id isolation; depends on #13 (per-caller credentials) |
+| 13 | **Re-add the OAuth resource server in coexistence mode / per-caller credentials** (multi-caller scenario) | **MEDIUM** (upgraded per A.7 follow-up §4) | Phase 2: once a second caller actually exists; prerequisite for #12 |
+| 14 | **jobID enumeration risk from distinguishing 404 vs 410 on `/result`** | LOW (marginal; can be handled together once #12 is done) | Phase 2: auth polish; after #12, consider unifying on 404 |
+| 15 | **When `/result` per-job auth is enabled, return 404 for all 4xx (do not reveal lifecycle)** (added 2026-05-17, Security review §1 Q5) | LOW → upgraded together with #12 as a subtask of #12 | Phase 2: auth; once #12 is done, a stolen key no longer grants full read access, so distinguishing 404/410 becomes a real leak and 404 should be used throughout |
+| 16 | **Align MinIO socketTimeout / connectTimeout with RESULT_STREAM_TIMEOUT_MS** (added 2026-05-17, Security review §1 Q1 / s2) | LOW | Phase 2: infra; check the MinIO SDK timeout settings and align them with the 5 min stream timeout (suggested socketTimeout = STREAM_TIMEOUT - 30s, leaving time for server-side teardown); prevents the MinIO-side timeout from firing before the response timeout, which would show the attacker stream_error instead of stream_timeout |
 
 ---
 
@@ -345,3 +463,6 @@ Request → IP-based limiter (200 req / 15min) ← app.js global
 | Date | Change | Trigger |
 |------|------|------|
 | 2026-04-25 | Initial version | T5 Reviewer + Security Audit fixes |
+| 2026-05-16 | Phase 0.8b rewrite: auth switched from OAuth to API key; added the API Key Management section; trust boundary risk model is the same as in Phase 1 (only the form of the secret changed); updated the Phase 2 candidate list (OBO lowered from HIGH to MEDIUM; added API key rotation items) | ADR-010 + ADR-011 |
+| 2026-05-17 | Pre-Phase B streaming/range design review: candidates #12 / #13 raised to MEDIUM; added candidate #14 (the jobID enumeration trade-off of distinguishing 404 vs 410; marginal risk under the current trust model; the TODO-v2 §4.1 spec is kept and documented); for the `/result`
design supplement sections (rate limit / Range / audit log / source_filename), see `api/api-result.md` §9-§14 | Security Auditor A.7 follow-up §4 + pre-Phase B design review |
+| 2026-05-17 | Phase B Security design review, round 2 adopted (4 Major + 3 Minor): candidate #8 raised from MEDIUM to **HIGH** (a Redis store is required before multi-instance deployment, covering three axes: rate limit, bandwidth quota, and concurrent cap); added candidate #15 (return 404 for all 4xx once per-job auth is enabled, in sync with #12) and candidate #16 (align MinIO socketTimeout with RESULT_STREAM_TIMEOUT_MS); `/result` design hardening: rate limit changed from a single 60 req/min tier to two tiers (5 req/10s burst + 20 req/min sustained) plus a bandwidth quota (1 GB/hr + 6 GB/24hr); added a 5 min stream response timeout and a per-instance concurrent stream cap of 10; Range header handling in three parts (Accept-Ranges: none + silently ignore + log range_attempted); audit log expanded from 8 to 12 events, with the five mandatory A.7 fields plus four /result-specific fields; filename defense in depth (quote-escape + RFC 5987 + buildFilename assertion); backend acceptance criteria expanded from B1-B9 to AC-1 through AC-12, plus 6 new integration tests; see `api/api-result.md` §9 / §10 / §11 / §13.4a / §15 / §14 | Security Auditor Phase B streaming/range design review §1 Q1-Q6 + §2 fixes + §3 acceptance criteria |
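The API key check described under Auth Security §1 above can be sketched as a small Express-style middleware. It covers the Bearer header, the length check before `crypto.timingSafeEqual`, the fail-fast 503 when the key is unset, and the fixed `req.auth.clientId`. This is a minimal sketch under stated assumptions, not the contents of `src/auth/apiKeyMiddleware.js`. The factory name `createApiKeyMiddleware`, the helper `sendJson`, and the 503 error code `auth_not_configured` are hypothetical. The audit events, `tokenFingerprint`, and the proactive `socket.destroy()` are left out.

```javascript
const crypto = require('node:crypto');

// Hypothetical factory; the real src/auth/apiKeyMiddleware.js may be shaped differently.
// Audit events, tokenFingerprint and socket.destroy() are intentionally omitted.
function createApiKeyMiddleware(expectedKey) {
  // Fail-fast: a missing or empty key means "not configured".
  const expected =
    typeof expectedKey === 'string' && expectedKey.length > 0
      ? Buffer.from(expectedKey, 'utf8')
      : null;

  return function requireApiKey(req, res, next) {
    if (expected === null) {
      // Refuse every request rather than silently allowing it.
      // 'auth_not_configured' is an assumed error code, not taken from the spec.
      sendJson(res, 503, { error: 'auth_not_configured' });
      return;
    }

    // Reuses the Bearer format: "Authorization: Bearer <CONVERTER_API_KEY>".
    const header = req.headers.authorization || '';
    const match = /^Bearer\s+(\S+)$/i.exec(header);
    const received = Buffer.from(match ? match[1] : '', 'utf8');

    // Length check first: crypto.timingSafeEqual throws a RangeError when the
    // buffers differ in length. The key length (64) is fixed and not secret.
    const valid =
      received.length === expected.length &&
      crypto.timingSafeEqual(received, expected);

    if (!valid) {
      sendJson(res, 401, { error: 'invalid_token' });
      return;
    }

    // Fixed identity consumed by the per-client rate limiter and logging.
    req.auth = { clientId: 'visionA-service' };
    next();
  };
}

// Hypothetical helper: writes a JSON error body on a Node/Express response.
function sendJson(res, status, body) {
  res.statusCode = status;
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify(body));
}
```

In an Express app the returned function would be mounted per route. For example, `app.get('/api/v1/jobs/:id/result', requireApiKey, handler)` with `requireApiKey = createApiKeyMiddleware(process.env.CONVERTER_API_KEY)`, where `handler` is a placeholder. Reading the key once at startup keeps the fail-fast check cheap. The near-zero-downtime rotation variant would loop over an array of candidate keys with the same length-check-then-compare logic.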