jim800121chen aeaecb8c06 fix(compose): Phase 0.8b deploy blocker: env passthrough + naming spec
Commit d8a9517 missed docker-compose.yml: the scheduler service environment block
did not pass through the new Phase 0.8b env vars, so even with the stage .env set the container could not read them;
after deploy, an undefined CONVERTER_API_KEY makes the service start up and reject all requests with 503.

docker-compose.yml:
- Add passthrough for 10 Phase 0.8b env vars (CONVERTER_API_KEY has no default, fail-secure;
  the others use ${VAR:-default}, fail-soft)
- Remove 9 deprecated OAuth resource-server env vars (MEMBER_CENTER_ISSUER / JWKS_URL /
  AUDIENCE / CONVERTER_TENANT_ID / SCOPE_* / JWKS_* / JWT_*)
- Keep 8 env vars used by promote → FAA (MEMBER_CENTER_TOKEN_URL /
  KNERON_CONVERTER_CLIENT_ID/SECRET / FILE_ACCESS_AGENT_* /
  OAUTH_TOKEN_* / PROMOTE_TIMEOUT_MS)

docs/autoflow/04-architecture/api/api-result.md §16:
- Add the Env Naming Reference Table (30 canonical env names)
- Decide that source code is the single source of truth; env.example aligns with it
- Confirm the naming spec for the 8 /result env vars + the other 22
- Keep a historical note: the Orchestrator previously used imagined abbreviated names (_MAX / _HOURLY_QUOTA /
  RESULT_CONCURRENT_STREAM_MAX), which caused naming confusion; §16 is the standard for future prompts to reference

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 01:01:59 +08:00

# API: `GET /api/v1/jobs/:id/result` (added in Phase 0.8b; hardened in the Phase B design, 2026-05-17)
> **Status**: Added in Phase 0.8b; replaces the original Phase 2 delegated download token design. Before Phase B kickoff, a Security review hardened the mitigations for the streaming attack surface.
> **Companion documents**: visionA repo `adr-016-download-via-converter.md` v1.0, `design-doc.md` §3.3 / ADR-011.
> **Sources of the Phase B design hardening** (2026-05-17):
> - Security review: `.autoflow/07-delivery/security-design-review-phase-b-2026-05-17.md` (4 Major + 3 Minor + 2 Suggestion)
> - Scope adopted by the Architect: M1-M4 (rate limit + bandwidth quota + Range + stream timeout + concurrent cap) + m1-m3 (quote-escape + 429/503 status + audit log fields)
> - See §9 (rate limit + bandwidth quota), §10 (Range), §11 (audit log), §13.4a (filename defense-in-depth), §15 (streaming resource limits), §14 (acceptance criteria AC-1 to AC-12)
---
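## 1. Purpose
visionA-backend uses this endpoint to fetch the NEF result file directly from the Converter Bucket (streaming proxy).
It replaces the original "visionA → obtain delegated download token → FAA" path (which never worked because MC did not implement the endpoint).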
---
## 2. Request
```http
GET /api/v1/jobs/{id}/result HTTP/1.1
Host: converter.innovedus.com
Authorization: Bearer <CONVERTER_API_KEY>
X-Request-Id: <uuid> (optional)
```
### 2.1 Path params
| Field | Type | Description |
|------|------|------|
| `id` | string (UUIDv4) | Job ID |
### 2.2 Query / Body
**None** (the streaming endpoint accepts no additional parameters).
### 2.3 Auth + Rate Limit + Bandwidth Quota + Concurrent Cap
- `Authorization: Bearer <CONVERTER_API_KEY>` (API key middleware, see `auth.md` §1)
- **Rate limit** (see §9; revised after the 2026-05-17 Security review):
  - Burst: 5 req / 10 sec per `token_fingerprint`
  - Sustained: 20 req / min per `token_fingerprint`
- **Bandwidth quota** (see §9):
  - Hourly: 1 GB / hr per `token_fingerprint`
  - Daily: 6 GB / 24hr per `token_fingerprint`
- **Concurrent stream cap** (see §15): max 10 simultaneous streams (per-instance)
- **Stream timeout** (see §15): 5 minutes (the connection is destroyed on timeout)
---
## 3. Response 200 (success)
```http
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: <NEF binary size>
Content-Disposition: attachment; filename="<source_filename_stem>_<chip>.nef"
X-Request-Id: <uuid>
```
### 3.1 Headers
| Header | Rule |
|--------|------|
| `Content-Type` | `application/octet-stream` (or the `contentType` returned by MinIO HEAD; defaults to octet-stream) |
| `Content-Length` | NEF object size in bytes (taken from MinIO HEAD); **required**, because visionA uses it to decide its timeout |
| `Content-Disposition` | `attachment; filename="<filename>"`; filename rules are in §3.2 |
| `X-Request-Id` | Reuses the ID set by the request_id middleware |
### 3.2 Filename rules
**Format**: `<source_filename_stem>_<chip>.nef`
| Input | Result |
|------|------|
| `source_filename = yolov5s.onnx`, `platform = '720'` | `yolov5s_720.nef` |
| `source_filename = model.pt`, `platform = '530'` | `model_530.nef` |
| `source_filename` missing (edge case) | `job_<job_id>.nef` (fallback) |
| `platform` missing (edge case) | `job_<job_id>.nef` (fallback) |
**Note**: `job.platform` is the numeric string accepted by the createJob validator (e.g. `'720'` / `'530'` / `'520'`, with no `KL` prefix). `buildFilename` normalizes it via `.toLowerCase()` (a no-op for purely numeric strings; the normalization is kept for compatibility with possible future platforms that mix in letters).
**Implementation logic**:
```javascript
function buildFilename(job) {
  const sourceFilename = job.source_filename || '';
  const platform = (job.platform || '').toLowerCase();
  // Strip a known model extension (case-insensitive) to get the stem
  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
  if (stem && platform) {
    return `${stem}_${platform}.nef`;
  }
  // Fallback when either the stem or the platform is missing
  return `job_${job.job_id || 'unknown'}.nef`;
}
```
**Edge cases**:
- `source_filename` contains special characters (already sanitized by `sanitizeFilename`): do not sanitize a second time
- `platform` letter case: always lower-cased (matches the visionA `defaultDownloadFilename` convention)
### 3.3 Body
NEF binary stream (Node Stream pipe).
**Do not buffer the whole file.** A NEF is commonly under 50 MB and in extreme cases several hundred MB; buffering it risks OOM.
---
## 4. Response 4xx / 5xx
Unified format:
```json
{
  "error": {
    "code": "<error_code>",
    "message": "<zh-TW message>",
    "details": { /* optional */ },
    "request_id": "<uuid>"
  }
}
```
### 4.1 Failure cases
| HTTP | error.code | Case | Example message |
|------|-----------|------|---------|
| 401 | `invalid_token` | API key missing / malformed / mismatched | API key 驗證失敗 ("API key verification failed") |
| 404 | `job_not_found` | jobID does not exist | Job {jobId} not found |
| 404 | `result_not_found` | job completed but `result_object_keys` has no NEF | Job {jobId} completed but no NEF result available |
| 409 | `job_not_completed` | job not yet completed (still running / failed) | Job {jobId} is {status}; result only available after completion |
| 410 | `result_expired` | expired and purged from the converter MinIO (7 days, `expires_at`) | Job {jobId} result expired at {expires_at}; re-convert to get a fresh result |
| 422 | `invalid_request` | malformed path param | job id is required |
| **429** | **`rate_limit_exceeded`** | **req/min or burst limit exceeded** (**required by Security**) | **請求頻率過高,請稍後再試** ("Too many requests, please try again later"); `limit_type: 'burst' \| 'sustained'` |
| **429** | **`bandwidth_quota_exceeded`** | **1hr/24hr bandwidth quota exceeded** (**required by Security**) | **下載額度已用完,請稍後再試** ("Download quota exhausted, please try again later"); `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'` |
| 502 | `storage_unavailable` | MinIO unreachable / `getObjectStream` throws | 無法讀取結果檔,請稍後重試 ("Unable to read the result file, please retry later") |
| **503** | **`service_busy`** | **Concurrent stream cap reached** (**required by Security**) | **伺服器忙碌中,請稍後再試** ("Server busy, please try again later"); `limit_type: 'concurrent'`, `Retry-After: 30` |
| **503** | **`stream_timeout`** | **response stream timed out (5 minutes)** (**required by Security**) | **下載逾時,請重試** ("Download timed out, please retry") |
| 503 | `service_unavailable` | API key not configured / other transient errors | API key not configured |
### 4.2 Status code selection logic
```
if (API key invalid) → 401
if (job not in Redis) → 404 job_not_found
if (job.status !== 'completed') → 409 job_not_completed
if (job.expires_at < now) → 410 result_expired
if (no nefKey extractable) → 404 result_not_found
if (minio.getObjectStream throw)
- if MinIO not found error → 410 result_expired
- else → 502 storage_unavailable
```
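### 4.3 Why the order matters
**Check status before expires_at**: while a job is running, 409 is more precise than 410 (the resource still exists; it just has not completed yet).
**Check whether nefKey is extractable last**: 404 `result_not_found` is the special case where the job completed but has no NEF (this should not happen, because NEF is the last stage and completed implies a NEF exists; the check is a safeguard).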
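### 4.4 Handling stream interruption
When the MinIO stream fails after streaming has started (headers already sent):
- **The status code cannot be changed** (headers already sent)
- The only action is `res.destroy(streamErr)` + log ERROR; the client sees `ECONNRESET`
- The client (visionA) should implement retry logic
When the client closes the connection itself (`req.on('close')`):
- Actively call `result.stream.destroy()` to release the MinIO connection
- Log INFO (not an error)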
---
## 5. Relationship to existing endpoints
### 5.1 Mapping to the job lifecycle
```
created
↓ (processed by Worker)
running (stage = onnx → bie → nef)
COMPLETED + result_object_keys.nef is set
↓ ───→ GET /jobs/:id/result → 200 + NEF stream
(7 days later, expires_at has passed)
expired (NEF already purged from MinIO by lifecycle; the job record may still be in Redis)
↓ ───→ GET /jobs/:id/result → 410 result_expired
```
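### 5.2 Relationship to `/promote`
`/result` and `/promote` are **independent**:
- `/promote`: moves the NEF from the Converter Bucket to the FAA NAS Bucket (long-term storage)
- `/result`: streams from the Converter Bucket to the caller
visionA can call both (promote puts the file on the NAS; result downloads it to the user immediately).
The NEF in the Converter Bucket expires and is purged after 7 days; the FAA NAS Bucket keeps it permanently (managed by the FAA lifecycle). **After expiry `/result` returns 410 and the client should re-convert**; it should not fall back to FAA (that leads back to the delegated download token dead end).
### 5.3 Relationship to the `/download-tokens` endpoint reserved for Phase 2
`POST /api/v1/jobs/:id/download-tokens` is reserved for Phase 2 (returns 501). **No conflict**:
- `/download-tokens`: short-TTL tokens for future direct browser-to-converter downloads
- `/result`: stream proxy for the visionA backend
The two serve different purposes and can coexist. Phase 0.8b does not enable `/download-tokens`.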
---
## 6. Implementation details
### 6.1 NEF object key resolution (two paths)
Aligned with `getJobOutputKey` in the promote flow:
```javascript
function extractNefObjectKey(job) {
  // New format
  if (job.result_object_keys
    && typeof job.result_object_keys === 'object'
    && typeof job.result_object_keys.nef === 'string'
    && job.result_object_keys.nef.length > 0) {
    return job.result_object_keys.nef;
  }
  // Old format (backward compatible)
  if (job.output
    && typeof job.output === 'object'
    && typeof job.output.nef_path === 'string'
    && job.output.nef_path.length > 0) {
    return job.output.nef_path;
  }
  return null;
}
```
### 6.2 Streaming flow
```javascript
router.get('/', async (req, res, next) => {
  try {
    const jobId = req.params.id;
    // 422 to match the `invalid_request` row in §4.1
    if (!jobId) return next(new ApiError(422, 'invalid_request', 'job id is required'));
    // 1. Fetch the job record
    const job = await jobService.getJob(jobId);
    if (!job) return next(new ApiError(404, 'job_not_found', `Job ${jobId} not found`));
    // 2. Check status
    if (job.status !== 'COMPLETED') { // internal status is upper-case
      return next(new ApiError(409, 'job_not_completed',
        `Job ${jobId} is ${job.status}; result only available after completion`));
    }
    // 3. Check expires_at
    if (job.expires_at && new Date(job.expires_at) < new Date()) {
      return next(new ApiError(410, 'result_expired',
        `Job ${jobId} result expired at ${job.expires_at}`));
    }
    // 4. Resolve the NEF object key
    const nefKey = extractNefObjectKey(job);
    if (!nefKey) {
      return next(new ApiError(404, 'result_not_found',
        `Job ${jobId} completed but no NEF result available`));
    }
    // 5. Get the stream from MinIO
    let result;
    try {
      result = await minioStorage.getObjectStream(nefKey);
    } catch (err) {
      logEvent({ level: 'ERROR', action: 'result.minio_failed', /* ... */ });
      return next(new ApiError(502, 'storage_unavailable', /* ... */));
    }
    if (!result) {
      return next(new ApiError(410, 'result_expired',
        `Job ${jobId} NEF object not found in storage (likely expired)`));
    }
    // 6. Set headers
    res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
    if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength));
    res.setHeader('Content-Disposition', `attachment; filename="${buildFilename(job)}"`);
    // 7. Stream pipe
    result.stream.on('error', (streamErr) => {
      // Headers are already sent, so the only option is to tear the response down
      logEvent({ level: 'ERROR', action: 'result.stream_error', /* ... */ });
      if (!res.destroyed) res.destroy(streamErr);
    });
    req.on('close', () => {
      // Client went away: release the MinIO connection
      if (result.stream && typeof result.stream.destroy === 'function') {
        result.stream.destroy();
      }
    });
    result.stream.pipe(res);
  } catch (err) {
    return next(err);
  }
});
```
### 6.3 Why mergeParams
The router is mounted at `/jobs/:id/result` and the handler uses the `/` path, so `mergeParams: true` is required to read `:id`:
```javascript
const router = express.Router({ mergeParams: true });
router.get('/', handler);
// in createV1Router:
router.use('/jobs/:id/result', requireApiKey(), perClientLimiter, createResultRouter({ ... }));
```
### 6.4 Log rules
| Scenario | level | action |
|------|-------|--------|
| Happy path (200) | INFO | `result.success` with job_id, size_bytes, duration_ms |
| 404 / 409 / 410 | INFO | `result.not_available` with reason |
| 502 MinIO failure | ERROR | `result.minio_failed` with error_name, error_code (do not log the MinIO endpoint) |
| Stream interrupted (headers already sent) | ERROR | `result.stream_error` |
| Client disconnects | INFO | `result.client_closed` |
---
## 7. Test scope (Backend implementation + Testing verification)
### 7.1 Integration tests (required)
- Happy path (200): completed job + NEF present + not expired → the full NEF binary is streamed; Content-Type / Content-Length / Content-Disposition are correct
- 401: missing API key
- 401: wrong API key
- 404 `job_not_found`: jobID does not exist
- 404 `result_not_found`: completed but no NEF
- 409 `job_not_completed`: status = ONNX / BIE / NEF / FAILED
- 410 `result_expired`: expires_at is in the past
- 410 `result_expired`: MinIO `getObjectStream` returns null
- 502 `storage_unavailable`: MinIO throws
- 503 `service_unavailable`: CONVERTER_API_KEY not set (in practice this is handled in the middleware and affects every endpoint)
### 7.2 Unit tests
- `extractNefObjectKey`: new format, old format, missing → null
- `buildFilename`: standard case, missing source_filename, missing platform, extension variants (.onnx / .pt / .tflite)
- Stream error handling (mock stream emits error)
- Client close handling (mock req emits close)
### 7.3 Stress / boundary tests (optional)
- Large-file stream (200 MB NEF): confirm memory does not blow up
- Concurrent streams (10 clients downloading at once): confirm the Scheduler stays up
- Slow client (the client reads slowly): confirm the stream does not buffer without bound
---
## 8. Curl examples
```bash
# Happy path
curl -i \
-H "Authorization: Bearer $CONVERTER_API_KEY" \
https://converter.innovedus.com/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/result \
-o output.nef
# Expected:
# HTTP/1.1 200 OK
# Content-Type: application/octet-stream
# Content-Length: 12345678
# Content-Disposition: attachment; filename="yolov5s_720.nef"
```
```bash
# Expired case
curl -i \
-H "Authorization: Bearer $CONVERTER_API_KEY" \
https://converter.innovedus.com/api/v1/jobs/expired-job-id/result
# Expected:
# HTTP/1.1 410 Gone
# Content-Type: application/json; charset=utf-8
# {"error":{"code":"result_expired","message":"...","request_id":"..."}}
```
---
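## 9. Rate Limit + Bandwidth Quota (Phase B design, **revised after the 2026-05-17 Security review**)
> **Important change**: the original single-tier 60 req/min design was rejected by the Security review (`.autoflow/07-delivery/security-design-review-phase-b-2026-05-17.md` §1 Q3 / Major M4).
> **New design**: two-tier req-based limit + two-tier bandwidth quota. Rationale: a request count cannot tell large files from small ones, and the core attack surface of `/result` is **bandwidth**, not request count.
### 9.1 Why `/result` needs its own rate limit + bandwidth quota
| | `/jobs` write endpoint (existing) | `/result` download endpoint (Phase B) |
|--------------|----------------------|---------------------------|
| Existing quota | 300 req / 5 min per client_id | |
| Workload cost | CPU (multer parse) + MinIO write | MinIO read + sustained streaming (can reach 100MB+ / req) |
| Blast radius (attacker obtains the key) | Occupies the worker queue / fills MinIO | Traffic amplifier: 1 jobID = 100MB+ download, quickly exhausting bandwidth |
| Limiting axis | Mainly req count | **Both req count and bandwidth** (the attack surface is bandwidth, not count) |
Security's quantitative analysis (review §1 Q3):
| Design | Normal user (P95 120 req/min) | Attacker (100MB per req) |
|------|-------------------------------|------------------------|
| 60 req/min | **Too strict** (kills legitimate retry bursts) | **Too loose** (6 GB/min = 800 Mbps; 8.6 TB/day = $770/day cloud egress) |
| two-tier + bandwidth quota | Sufficient (20 req/min sustained + 5 req/10s burst covers the retry pattern) | The 1 GB/hr ceiling directly blocks bandwidth attacks |
Once opened, `/result` is a "traffic amplifier": an attacker holding the key does not care about the number of requests, only how many bytes each one pulls. **Limiting count without limiting bandwidth = no real protection**.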
### 9.2 Summary of limiting axes
| Axis | Value | Purpose | bucket key |
|--------|------|------|-----------|
| **Burst rate** | 5 req / 10 sec | Blocks short bursts | `token_fingerprint` |
| **Sustained rate** | 20 req / min | Covers visionA P95 normal load (120 req/min ÷ 10 callers = 12 req/min/key, about 1.7× headroom); blocks sustained mass requests | `token_fingerprint` |
| **Bandwidth hourly** | 1 GB / hr | Blocks bulk NEF downloads (an attacker who maxes out this axis alone gets at most 24 GB/day, a bounded cost) | `token_fingerprint` |
| **Bandwidth daily** | 6 GB / 24hr | Blocks an attacker who pulls exactly 1 GB every hour to evade the hourly limit | `token_fingerprint` |
**The bucket key is `token_fingerprint` (SHA-256, already implemented in A.7)**:
- Not `clientId`: under the current 1:1 trust every caller is `'visionA-service'`, so the buckets collapse into one and lose discriminating power
- Under the Phase 0.8b 1:1 trust, `token_fingerprint` is effectively the caller id; once Phase 2 introduces per-caller credentials it lines up automatically
- Forensic use: the audit log already records `token_fingerprint`, so rate-limit statistics and forensics can be cross-correlated on the same key
### 9.3 Why a two-tier req limit
A single tier cannot handle visionA's exponential backoff retry pattern:
| Retry interval | req within a 10 sec window | req within a 1 min window |
|-----------|-----------------------|----------------------|
| 1s / 5s / 15s (visionA `ConverterClient.GetResult` default) | 1-2 | 3-4 |
- **Burst tier (5 req / 10s)**: allows retry bursts without killing legitimate retries
- **Sustained tier (20 req / min)**: blocks sustained high-frequency requests (an attacker who hammers steadily instead of bursting)
- Both are **in effect simultaneously**; hitting either one returns 429
### 9.4 Why the bandwidth quota is required
`req count` cannot distinguish attack patterns:
| Scenario | Blocked by 20 req / min? | Actual bandwidth |
|------|-------------------|--------------|
| Normal user pulls one 100MB NEF | No (legitimate) | 100MB (legitimate) |
| Attacker: 20 req/min × 6hr × 100MB | No (stays right at the limit) | **720 GB / 6hr ≈ $65 cloud egress / event** |
With the 1 GB/hr bandwidth quota added:
| Scenario | Blocked by the bandwidth quota? |
|------|----------------------|
| Normal user pulls one 100MB NEF | No (up to 10 NEF/hr is legitimate) |
| Attacker: 20 req/min at 100MB each | 429 `bandwidth_quota_exceeded` after the 10th-11th NEF |
| Attacker pulls exactly 1 GB every hour to evade the hourly limit | Daily quota triggers after 6 hr |
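### 9.5 Design: reuse + new middleware
**Req-based rate limit (reuses the existing factory)**:
- Reuse the `createPerClientRateLimiter` factory in `src/middleware/perClientRateLimit.js`
- Create **2 independent limiter instances** (burst + sustained), both using `token_fingerprint` as the bucket key (**a new `keyGenerator` must be injected**; the factory defaults to `req.auth.clientId`)
- The factory interface does not need to change; only a `keyGenerator` opts injection point is added
**Bandwidth quota (new middleware)**:
- New file `src/middleware/resultBandwidthQuota.js` (does not reuse perClientRateLimit; the semantics differ)
- In-memory counter (Phase 1 / Phase B is a single-instance deployment; a Map / object is enough)
- Two stages, pre-check + post-stream incr (wiring in §9.7, implementation skeleton in §9.8)
- Must switch to Redis before the Phase 2 multi-instance deployment (backlog #8)
### 9.6 Status code + response
**Req-based limit hit**: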
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
RateLimit-Limit: 20
RateLimit-Remaining: 0
RateLimit-Reset: 1700000000
Content-Type: application/json
```
**Bandwidth quota hit**:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 3600
Content-Type: application/json
```
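**Why 429 rather than 503**:
- 429 (RFC 6585) = the client's request rate / quota was exceeded; the client should slow down + back off exponentially
- 503 = the server is temporarily unavailable; the client may retry as-is and need not slow down
- The semantics differ, and visionA's retry logic must branch on the status code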
### 9.7 Wiring order + implementation skeleton
```javascript
// src/routes/v1/index.js
const resultBurstLimiter = createPerClientRateLimiter({
  windowMs: 10 * 1000, // 10 sec
  max: 5,              // 5 req / 10s
  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown', // new keyGenerator: bucket by token fingerprint
  errorDetails: { limit_type: 'burst' },
});
const resultSustainedLimiter = createPerClientRateLimiter({
  windowMs: 60 * 1000, // 1 min
  max: 20,             // 20 req / min
  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown',
  errorDetails: { limit_type: 'sustained' },
});
const resultBandwidthQuota = createResultBandwidthQuota({
  hourlyLimitBytes: Number(process.env.RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES) || 1 * 1024 * 1024 * 1024,
  dailyLimitBytes: Number(process.env.RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES) || 6 * 1024 * 1024 * 1024,
  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown',
});
router.use('/jobs/:id/result',
  requireApiKey(),        // 1. auth first; unauthenticated requests get 401
  resultBurstLimiter,     // 2. burst tier
  resultSustainedLimiter, // 3. sustained tier
  resultBandwidthQuota,   // 4. bandwidth pre-check + post-stream incr
  resultStreamSemaphore,  // 5. concurrent stream cap (see §15)
  createResultRouter({ ... }));
```
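**Ordering principle**: auth goes first (so unauthenticated traffic does not consume quota slots); the req-based limits go before bandwidth (the req limit is cheaper than the bandwidth pre-check).
### 9.8 Bandwidth quota implementation skeleton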
```javascript
// src/middleware/resultBandwidthQuota.js (new file)
function createResultBandwidthQuota({ hourlyLimitBytes, dailyLimitBytes, keyGenerator }) {
  // In-memory counters (switch to Redis in Phase 2)
  // Shape: Map<key, { hourlyBytes, hourlyResetAt, dailyBytes, dailyResetAt }>
  const counters = new Map();
  function getOrCreate(key) {
    const now = Date.now();
    let c = counters.get(key);
    if (!c) {
      c = { hourlyBytes: 0, hourlyResetAt: now + 3600_000,
            dailyBytes: 0, dailyResetAt: now + 86_400_000 };
      counters.set(key, c);
    }
    // Window reset
    if (now >= c.hourlyResetAt) { c.hourlyBytes = 0; c.hourlyResetAt = now + 3600_000; }
    if (now >= c.dailyResetAt) { c.dailyBytes = 0; c.dailyResetAt = now + 86_400_000; }
    return c;
  }
  // Shared rejection path for the hourly and daily windows
  function reject(res, next, limitType, resetAt) {
    const retryAfterSec = Math.ceil((resetAt - Date.now()) / 1000);
    logAudit({ level: 'WARN', action: 'result.bandwidth_quota_exceeded',
               limit_type: limitType, retry_after_seconds: retryAfterSec,
               /* + the five A.7 fields + the four /result fields */ });
    res.setHeader('Retry-After', retryAfterSec);
    return next(new ApiError(429, 'bandwidth_quota_exceeded',
      '下載額度已用完,請稍後再試', { limit_type: limitType, retry_after_seconds: retryAfterSec }));
  }
  return function middleware(req, res, next) {
    const key = keyGenerator(req);
    const c = getOrCreate(key);
    // Pre-check: estimate from Content-Length (taken from MinIO HEAD and stored on the request).
    // If the size is unknown at pre-check time, conservatively assume the largest NEF size (e.g. 500MB).
    // Note: the quota itself is charged when the stream ends; the pre-check only prevents
    // a single download from overshooting the quota in one go.
    const estSize = req.estimatedResultSize || 0;
    if (c.hourlyBytes + estSize > hourlyLimitBytes) {
      return reject(res, next, 'bandwidth_hourly', c.hourlyResetAt);
    }
    if (c.dailyBytes + estSize > dailyLimitBytes) {
      return reject(res, next, 'bandwidth_daily', c.dailyResetAt);
    }
    // Charge the bytes actually streamed. 'close' fires both after a completed response
    // and after a premature disconnect, so partial streams are counted too.
    res.once('close', () => {
      const bytesStreamed = res._bytesStreamed || 0; // accumulated by the handler in stream.on('data')
      c.hourlyBytes += bytesStreamed;
      c.dailyBytes += bytesStreamed;
    });
    next();
  };
}
```
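**Why two stages, pre-check + post-stream**:
- The pre-check blocks "one-shot overshoot": if 950MB is already used, a further 200MB request is rejected outright and wastes no bandwidth
- The post-stream incr is the ground truth: only bytes actually streamed (including interrupted partial streams) are counted
- Combined, the worst case (an attacker fires several requests at once that each just pass the pre-check) lets through at most N × max_size extra (N = concurrent stream cap = 10, see §15)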
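### 9.9 Limits of multi-instance deployment
The current store is in-memory (per-process counters). Phase 1 / 0.8b is deployed as a single instance, which is acceptable.
**Required before the Phase 2 multi-instance deployment** (raised to HIGH, security.md backlog #8):
- Switch to a Redis store (the perClientRateLimit factory already has an `opts.store` injection point; the bandwidth quota uses Redis `INCRBY` + `EXPIRE` counters)
- Otherwise the quotas are loosened by a factor of the instance count
  - 2 instances × 20 req/min = 40 req/min effective quota
  - 2 instances × 1 GB/hr = 2 GB/hr effective bandwidth quota
- This affects all four axes (burst / sustained / hourly / daily); switching only some of them is not enough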
**Audit log for the Redis switchover**: during the switch, record a `service.rate_limit_store_switched` event (with from / to / timestamp) for forensic use.
### 9.10 Relationship to the Q4 audit log
Every limit hit must be written to the audit log (event list in §11):
- `result.rate_limited`: with `limit_type: 'burst' | 'sustained'`, `token_fingerprint`, `retry_after_seconds`
- `result.bandwidth_quota_exceeded`: with `limit_type: 'bandwidth_hourly' | 'bandwidth_daily'`, `token_fingerprint`, cumulative bytes, retry_after_seconds
Forensic use: cluster limit hits by fingerprint → identify attack patterns / abuser keys.
---
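## 10. Range Header / Partial Download protection (**hardened after the 2026-05-17 Security review**)
> **Three things required by Security** (review §1 Q2 / Major M3):
> 1. The server **must** add an `Accept-Ranges: none` header to the response (state non-support explicitly rather than omit the header)
> 2. When a Range header is received, the server **silently ignores it and returns the full body with 200** (no 416, no 206)
> 3. When a Range header is received, the audit log event `result.range_attempted` **must be written** (for forensics, INFO level)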
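### 10.1 Attack surface analysis
An HTTP `Range` request (RFC 7233) lets a client fetch a specific byte range of a file. For a large-file streaming endpoint like `/result` it is a known attack vector:
| Attack vector | Description | Risk to this system |
|---------|------|--------------|
| **Existence probing** | The attacker probes for file existence with `Range: bytes=0-0`, fetching a tiny byte range to tell 200 from 410/404 | Even though 410 / 404 are distinguishable and the attacker can already enumerate jobIDs, Phase 0.8b already accepts "holding the key = able to download any jobID" (security.md §Trust Boundary), so the marginal risk of existence probing is close to 0 |
| **Range request DoS** | The attacker sends many small Range requests (1 byte each); each one incurs MinIO read overhead and amplifies server load | Real risk, but the §9 rate limits (5 req / 10 sec burst, 20 req / min sustained) cap what a single client can send |
| **Overlapping range exhaustion** | Multiple ranges in a single request (`Range: bytes=0-100, 200-300, ...`); range-handling code has a CVE history (Apache CVE-2011-3192 / Nginx CVE-2017-7529) | Implementing Range would require careful handling of multipart/byteranges responses, increasing the attack surface |
| **Slow Range pattern** | The attacker deliberately holds slow Range connections to occupy the MinIO connection pool for a long time | TCP + Node Stream backpressure mitigate this, but multiple Range connections amplify it |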
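### 10.2 Design options
| Option | Description | Assessment |
|------|------|------|
| **A. No Range support (recommended)** | On receiving a Range header, **silently ignore** it and return 200 + the full stream | Simple; smallest attack surface |
| B. Support Range + protections | Implement single-range parsing, reject multi-range, enforce a minimum chunk size, rate-limit the Range count | visionA has no explicit need; roughly 200 extra lines of code + tests; higher maintenance cost |
| C. Support Range + explicitly reject multi-range | Return 416 Range Not Satisfiable on multi-range | Partial mitigation; a single-range parser is still required |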
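### 10.3 Recommendation: Option A (no Range support)
**Rationale**:
1. **visionA does not need Range**:
   - `docs/autoflow/04-architecture/conversion.md` v0.6.1 §2.3: ConverterClient.GetResult is a one-shot download, not segmented
   - The visionA backend streams the NEF to the browser as soon as it receives it; there is no resume / seek requirement
2. **NEF size is within a reasonable range for a single-request stream**: commonly < 100MB, at most < 500MB; Node Stream + HTTP/1.1 chunked encoding handle this reliably
3. **The existing `minio.getObjectStream` is expected to return a full stream**: implementing Range would require passing byteRangeStart / byteRangeEnd to the MinIO client, enlarging the API surface
4. **Smallest attack surface**: no Range header parsing, no multipart/byteranges responses, no need for a CVE-history-aware parser
### 10.4 Implementation details
**Behavior when a Range header is received**: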
```javascript
// src/routes/v1/result.js
router.get('/', async (req, res, next) => {
  // ... existing steps 1-5: fetch job / check status / expires / nefKey / MinIO stream
  // 6. Set headers
  res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
  if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength));
  res.setHeader('Content-Disposition', `attachment; filename="${buildFilename(job)}"`);
  // Important: Range requests are explicitly not supported.
  //
  // Design: a received Range header is silently ignored; respond 200 + the full stream.
  // No 416: avoids letting an attacker tell "has Range support" (416) from "has none" (200).
  // No 'Accept-Ranges: bytes': avoids hinting that the client may retry with Range.
  res.setHeader('Accept-Ranges', 'none'); // RFC 7233 §2.3: states explicitly that the server does not support ranges
  // 7. Stream pipe (existing)
  result.stream.pipe(res);
});
```
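**Why not return 416**:
- 416 (Range Not Satisfiable) means the Range is syntactically valid but does not fit the file's extent
- If the client sends a Range header and we return 416, the attacker learns that the server **can parse Range** and merely refused this one
- If the client sends a Range header and we silently ignore it and return 200 with the full stream, the attacker cannot tell whether the server understands Range
- The latter is safer (feature detection fails) and fully compatible with well-behaved clients (a client that sent Range can still handle a 200)
**Why set `Accept-Ranges: none` rather than omit the header**:
- RFC 7233 §2.3: the server **explicitly states** its support status
- `Accept-Ranges: none` tells the client explicitly not to try Range
- If the header is omitted, a client may still speculatively send Range (HTTP's default assumption is that it may be supported)
### 10.5 Contract changes for visionA
**API spec note**:
- A Range header sent by visionA never yields 206 Partial Content; the server always returns 200 with the full stream
- If visionA ever genuinely needs resume / seek, this must be re-evaluated (Phase 2 backlog)
**Documented in §2.2 Query / Body**: "Range header is ignored when received".
### 10.6 Monitoring recommendations + mandatory audit event
When a Range header is received, the handler **must** write a separate audit event `result.range_attempted` (not merely a boolean flag on `result.requested`):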
```javascript
// After entering the handler, before any Range handling:
if (req.headers && req.headers.range) {
  logAudit({
    level: 'INFO', // not WARN: probing is expected, this is a forensic baseline and should not alert
    action: 'result.range_attempted',
    // Five A.7 fields
    source_ip: req.ip,
    token_fingerprint: req.auth?.tokenFingerprint,
    request_id: req.requestId,
    http_method: 'GET',
    http_path: req.originalUrl,
    // /result-specific
    job_id: req.params.id,
    // Event-specific
    range_header_received: String(req.headers.range).slice(0, 100), // truncated to 100 chars to limit log injection
  });
}
```
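**Why INFO rather than WARN**:
- A Range header is **not an attack** in itself (it is standard HTTP/1.1 and many clients send it automatically)
- A Range header on `/result` is nevertheless an **anomalous signal** (visionA never sends one) → worth recording, not worth alerting on
- WARN is reserved for "genuine anomalies" (rate limited / stream timeout / minio failed)
**Normal vs anomalous patterns**:
- Normal: the `range_header_received` field almost never appears (visionA does not send Range)
- Anomalous: a sudden burst of `result.range_attempted` from the same token_fingerprint → an attacker may be probing Range support / trying different byte ranges
**Anomaly detection candidates** (Phase 2):
- alert: the same fingerprint produces > 10 `result.range_attempted` within 1 hour → trigger manual review
- alert: the same fingerprint tries several different Range values in a short time → trigger a forensic snapshot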
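**Note: the original `result.requested` has been superseded by the new event list** (see §11; it is replaced by explicit terminal events such as `result.streamed` / `result.stream_error`, and there is no longer a `result.requested` entry event).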
---
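## 11. Audit Log (Phase B reuses the A.7 pattern, **extended after the 2026-05-17 Security review**)
> **Required by Security** (review §1 Q4 / Minor m3):
> - Add 3 events (4 at the implementation level): `result.rate_limited`, `result.range_attempted`, `result.stream_timeout`, `result.bandwidth_quota_exceeded`
> - Every `result.*` event **must** carry the five A.7 fields + the four `/result`-specific fields
> - Write 100% (no sampling): traffic is low (P95 < 1000 req/day) and bandwidth quota forensics need complete data
### 11.1 Design principles
Aligned with the A.7 audit log pattern in `apiKeyMiddleware.js`:
- Structured JSON to stdout
- Always use `console.log` / `console.error` (consistent with the existing audit log infra)
- Never write the token itself (the fingerprint is produced by the `requireApiKey` middleware; the handler reads it from `req.auth.tokenFingerprint` and writes it into every audit event)
- **Every event must carry the five A.7 fields** (none may be omitted) + **every `/result` event must carry the 4 endpoint-specific fields**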
### 11.2 The five A.7 fields (required on every event)
| Field | Source | Why required |
|------|------|---------|
| `source_ip` | `req.ip` (trust proxy already configured) | forensics: cluster attacker IPs |
| `token_fingerprint` | `req.auth.tokenFingerprint` (SHA-256, already implemented in A.7) | forensics: cluster attacks using the same key + matches the rate limit / bandwidth quota bucket key |
| `request_id` | `req.requestId` (set by middleware) | cross-event tracing (links `auth.api_key.authenticated` to `result.*`) |
| `http_method` | `'GET'` (fixed) | A.7 alignment (written even though fixed, for consistent log analysis) |
| `http_path` | `req.originalUrl` | A.7 alignment; lets forensics confirm the endpoint |
### 11.3 The four `/result`-specific fields (required or optional by event type)
| Field | Required when | Optional / not applicable when |
|------|--------|----------------|
| `job_id` | All events (from `req.params.id`) | n/a |
| `size_bytes` | `result.streamed` (success), `result.stream_error`, `result.client_closed`, `result.stream_timeout` (bytes streamed so far) | Not applicable to 4xx terminal events (streaming never started) |
| `duration_ms` | All "terminal" events (streamed / stream_error / client_closed / stream_timeout / not_*) | Not applicable to `range_attempted` (not a terminal event) |
| `stream_completed` | `result.streamed` (true), `result.stream_error` (false), `result.client_closed` (false), `result.stream_timeout` (false) | Not applicable to 4xx terminal events |
### 11.4 Event list (12 events; covers Security review Q4 + the Architect's original design)
| Action | Level | Trigger | Required fields (beyond the five A.7 + four /result fields) |
|--------|-------|---------|--------------------------------|
| `result.streamed` | INFO | Stream fully sent (`stream.on('end')` and bytes = content_length) | `content_length` |
| `result.stream_error` | ERROR | Stream fails midway (MinIO disconnect / network) | `error_type`, `error_message` (truncated to 100 chars) |
| `result.client_closed` | INFO | Client disconnects (`req.on('close')` + bytes < content_length) | n/a |
| `result.stream_timeout` | **WARN** | The 5 min response stream timeout fires (**required by Security**) | `timeout_ms`, `bytes_streamed_at_timeout` |
| `result.not_found` | WARN | 404 `job_not_found` / `result_not_found` | `reason: 'job_not_found' \| 'no_nef_key'` |
| `result.not_completed` | WARN | 409 `job_not_completed` | `current_status` |
| `result.expired` | WARN | 410 `result_expired` | `expires_at`, `expired_by_ms` (now - expires_at) |
| `result.storage_unavailable` | ERROR | 502 `storage_unavailable` (MinIO unreachable / throws) | `error_name`, `error_code` (**never** the MinIO endpoint URL) |
| `result.rate_limited` | **WARN** | 429 rate limit hit (**required by Security**) | `limit_type: 'burst' \| 'sustained'`, `retry_after_seconds` |
| `result.bandwidth_quota_exceeded` | **WARN** | 429 bandwidth quota hit (**required by Security**) | `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'`, `bytes_used_in_window`, `retry_after_seconds` |
| `result.range_attempted` | **INFO** | Request carries a Range header (**required by Security**, forensic baseline) | `range_header_received` (sanitized, truncated to 100 chars) |
| `result.filename_assertion_failed` | ERROR | `buildFilename` assertion fails (**defense-in-depth**, see §13) | `expected_pattern`, `actual_filename` (sanitized and truncated) |
**Why `result.requested` was removed**:
- The original design used `result.requested` as the "entry event" + `range_header_present: boolean` to express Range detection
- New design:
  - `result.range_attempted` becomes a separate forensic event (a clearer anomaly signal)
  - `auth.api_key.authenticated` (already written by A.7) covers the "caller arrived" record, so `result.requested` is redundant
  - Every request has exactly one **terminal event** (one of streamed / stream_error / client_closed / stream_timeout / not_* / rate_limited / bandwidth_quota_exceeded), plus the non-terminal `range_attempted` when a Range header is present (see §11.3); joining on request_id to `auth.api_key.authenticated` gives a complete trace
- Lower log volume (1 terminal event per request vs 2, entry + terminal)
### 11.5 `error_type` classification (stream interruption)
| `error_type` | Trigger | Level exception |
|-------------|------|-----------|
| `minio_disconnect` | MinIO stream emits error / socket reset | n/a |
| `client_abort` | The client disconnects first (distinct from `client_closed`: client_closed is a req close, stream_error is a stream error event) | n/a |
| `network` | Other network-layer errors (DNS / TLS) | n/a |
| `partial_stream` | Race condition where `res.on('finish')` fires while `streamCompleted=false` (most likely `res.destroy()` flushing buffered chunks on the underlying socket and emitting `finish`; a by-product of the client aborting the download / a slow network drain), or a backpressure anomaly | **INFO** (overrides the §11.4 ERROR; expected client-side behaviour, not an attack signal) |
| `unknown` | Catch-all | n/a |
**Level exception rule**: §11.4 `result.stream_error` defaults to ERROR, but when `error_type = partial_stream` (a race condition that is expected client behaviour, neither a server fault nor an attack signal) it is downgraded to INFO to avoid polluting the ERROR alert pipeline (implemented in the `res.on('finish')` handler in `apps/task-scheduler/src/routes/v1/result.js`).
### 11.6 Audit log examples
**Happy path (successful stream)**:
```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"INFO","action":"auth.api_key.authenticated","request_id":"abc-123","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result"}
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:48Z","level":"INFO","action":"result.streamed","request_id":"abc-123","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","size_bytes":52428800,"duration_ms":3210,"stream_completed":true,"content_length":52428800}
```
**Rate limit hit**
```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.rate_limited","request_id":"def-456","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":2,"limit_type":"burst","retry_after_seconds":10}
```
**Bandwidth quota hit**
```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.bandwidth_quota_exceeded","request_id":"ghi-789","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":3,"limit_type":"bandwidth_hourly","bytes_used_in_window":1073741824,"retry_after_seconds":2847}
```
**Range probing**
```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"INFO","action":"result.range_attempted","request_id":"jkl-012","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","range_header_received":"bytes=0-7"}
```
**Stream timeout**
```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.stream_timeout","request_id":"mno-345","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","size_bytes":1024,"duration_ms":300000,"stream_completed":false,"timeout_ms":300000,"bytes_streamed_at_timeout":1024}
```
**Expired**
```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.expired","request_id":"pqr-678","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":15,"expires_at":"2026-05-10T00:00:00Z","expired_by_ms":604800000}
```
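### 11.7 What must not be logged
Per A.7 and the principles in `auth.md` §1.8:
- Any byte of the NEF binary content
- The raw token (the fingerprint is already produced by the `requireApiKey` middleware)
- The full MinIO endpoint URL (avoids leaking infra topology)
- The full `Authorization` header value
- Stack traces (a truncated message is enough)
- The raw Range header beyond 100 characters (truncate and append `...`)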
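The Range header truncation rule above could look like the sketch below. `sanitizeRangeHeaderForLog` is a hypothetical name, and stripping control characters is an added assumption (the source only specifies truncation at 100 characters plus `...`).

```javascript
// Hypothetical helper: prepares a raw Range header for the range_header_received field.
const MAX_RANGE_LOG_LENGTH = 100;

function sanitizeRangeHeaderForLog(raw) {
  // Assumption: drop ASCII control characters so a crafted header cannot inject log lines.
  const text = String(raw ?? '').replace(/[\x00-\x1f\x7f]/g, '');
  return text.length > MAX_RANGE_LOG_LENGTH
    ? `${text.slice(0, MAX_RANGE_LOG_LENGTH)}...`
    : text;
}
```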
### 11.8 Sampling strategy
**Log 100% of events; no sampling**:
- Traffic is low (P95 < 1000 req/day, Phase B estimate)
- Bandwidth quota forensics need complete data: every byte counts toward the quota counter, so a missing log entry is a forensic gap
- Anomalous events (rate_limited / range_attempted / stream_timeout / bandwidth_quota_exceeded) are always logged at 100% so they can be clustered

**If traffic later grows beyond 100k/day** (Phase 2 candidate):
- Consider sampling `result.streamed` at 10%
- **Keep 100% of 4xx/5xx and all anomalous events**
### 11.9 Relationship to forensics
**Cross-event tracing**: join `auth.api_key.authenticated` (written by the middleware) to `result.*` (the terminal event written by the handler) on `request_id`.

**New: cross-fingerprint tracing**: cluster all events from the same caller by `token_fingerprint`:
- 1 fingerprint + many `result.rate_limited` → identifies an abuser or a misconfigured caller
- 1 fingerprint + many `result.range_attempted` → identifies a Range probing attempt
- 1 fingerprint + many `result.bandwidth_quota_exceeded` → identifies a mass-download attempt
- 1 fingerprint + many `result.stream_timeout` → identifies a slowloris attack
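The cross-fingerprint clustering above can be sketched as a simple aggregation over parsed audit log lines. `countAnomaliesByFingerprint` is a hypothetical helper for illustration; it assumes each event is an object with the `action` and `token_fingerprint` fields shown in §11.6.

```javascript
// Hypothetical forensic helper: counts watched actions per token_fingerprint.
function countAnomaliesByFingerprint(events, watchedActions) {
  const watched = new Set(watchedActions);
  const counts = new Map(); // token_fingerprint -> { action: count }
  for (const event of events) {
    if (!watched.has(event.action)) continue;
    const perAction = counts.get(event.token_fingerprint) ?? {};
    perAction[event.action] = (perAction[event.action] ?? 0) + 1;
    counts.set(event.token_fingerprint, perAction);
  }
  return counts;
}
```

A fingerprint whose count for one action far exceeds the others is the candidate to investigate; any alert threshold is a deployment decision outside this sketch.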
---
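## 12. Security trade-off of distinguishing 404 from 410
### 12.1 Problem
§4.1 specifies the following "not found" cases:
| HTTP | error.code | Case |
|------|-----------|------|
| 404 | `job_not_found` | jobID does not exist |
| 404 | `result_not_found` | Job completed but has no NEF |
| 410 | `result_expired` | NEF has expired and been purged |

**Security concern**: distinguishing 404 from 410 lets an attacker detect whether a jobID ever existed:
- jobID X returns 404 `job_not_found` → X **never existed**
- jobID Y returns 410 `result_expired` → Y **existed, but its NEF has expired**

An attacker can enumerate the jobID space, tell "unused" from "used but expired", and collect victim activity patterns.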
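### 12.2 Assessment: the risk to this system is close to zero
**Premise**: Phase 0.8b has already accepted the `security.md §Trust Boundary` risk model: **an attacker who obtains CONVERTER_API_KEY can download the NEF of any jobID** (per-job auth is Phase 2 candidate #12).

Under this premise:
| Attacker capability | Marginal risk added by the 404/410 distinction |
|-------------|-------------------------------|
| Has the key and knows a valid jobID | Downloads the NEF directly; the 404/410 distinction adds no capability |
| Has the key but knows no valid jobID | jobIDs are UUIDv4 (128 bits, 122 of them random), so brute-force enumeration is infeasible; the 404/410 distinction does not help the attacker |
| Has the key plus a list of leaked / guessable jobIDs | The distinction does reveal which IDs once existed, but **the attacker can already download directly**; knowing whether a result has expired is of very little value |

**The marginal risk of distinguishing 404 from 410 is close to zero under the current trust model.**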
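As a rough worked example of why brute-force enumeration is infeasible, the sketch below bounds the chance of hitting any valid jobID. The figure of one million valid jobs is a hypothetical assumption chosen to be generous to the attacker; the guess budget uses the 20 req/min sustained limit from AC-2, and 122 is the number of random bits in a UUIDv4.

```javascript
// Upper bound (union bound) on the probability that at least one of `guesses`
// uniformly random UUIDv4 values matches one of `validJobCount` existing jobIDs.
const UUID_V4_RANDOM_BITS = 122;

function enumerationHitProbability(validJobCount, guesses) {
  const space = 2 ** UUID_V4_RANDOM_BITS;
  return Math.min(1, (validJobCount * guesses) / space);
}

// One year of guessing at the sustained rate limit of 20 req/min.
const guessesPerYear = 20 * 60 * 24 * 365;
const hypotheticalValidJobs = 1_000_000; // assumption, not a measured figure
const probability = enumerationHitProbability(hypotheticalValidJobs, guessesPerYear);
```

Under these assumptions the bound is many orders of magnitude below any practical threshold, which is consistent with the "close to zero" assessment.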
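### 12.3 The other side of the trade-off: UX / debug value
Benefits of keeping 404 and 410 distinct:
- The visionA client can tell "the user supplied a wrong jobID" (404, prompt the user to double-check) from "the job has expired" (410, prompt the user to convert again); the difference in UX message precision is large
- TODO-v2 §4.1 is already fixed and the visionA backend's ConverterClient.GetResult already implements the mapping; **reverting to a uniform 404 would break the visionA-side contract**
- Debug friendliness: seeing 410 vs 404 in the log immediately shows whether the cause was lifecycle purging or a wrong jobID
### 12.4 Decision: **keep the TODO-v2 §4.1 spec unchanged** (404 and 410 stay separate)
**Rationale**:
1. Under the current trust model the risk added by the distinction is marginal; the attack surface has already been accepted in §Trust Boundary
2. The UX / debug value is high, and the visionA-side contract is already fixed
3. Phase 2 candidate #12 (per-job auth) is the root fix: once per-job auth is in place an attacker cannot download jobs that are not their own, and the 404/410 question disappears
### 12.5 Sync with Phase 2 candidates
`security.md` Phase 2 candidate #12, `/result` per-job authorization, is at **MEDIUM** priority (already upgraded in A.7 follow-up §4). This decision remains **in effect** until #12 is completed; afterwards, unifying the error response as 404 can be considered (removing the jobID enumeration attack surface and forming defense-in-depth together with #12).
### 12.6 Documentation items
- The TODO-v2 §4.1 spec is **unchanged**
- This trade-off is recorded in the `security.md` change history (the Architect adds a one-line entry at the next security.md update)
- The audit log (§11) distinction between `result.not_found` and `result.expired` follows this spec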
---
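## 13. `source_filename` write-point investigation + Backend acceptance criteria
### 13.1 Findings
`buildFilename(job)` (§3.2) reads `job.source_filename`. **Grep result**:
```bash
grep -rn "source_filename" apps/task-scheduler/src
# Result: 0 matches
```
**Current state**: the `jobRecord` built in the createJob handler of `src/routes/v1/jobs.js` **never writes** a `source_filename` field (lines 721-774, where jobRecord is constructed). There is no write point anywhere in the Worker / legacy Web UI / API v1 chain.

**Related fields that are written**:
- `jobRecord.input.filename` (line 741): the **already-sanitized** `safeFilename` (e.g. `model.onnx` stays `model.onnx`; special characters are stripped)
- `safeModelFilename` comes from `sanitizeFilename(modelFile.originalname || 'model')` (validators/createJob.js:240)
### 13.2 Design options
| Option | Description | Assessment |
|------|------|------|
| **A. Backend adds `source_filename` in the createJob handler** | jobRecord gains `source_filename: modelFile.originalname \|\| null` (**raw, unsanitized**) | Conflicts with the §3.2 statement that sourceFilename is already sanitized and is not sanitized a second time; storing the raw originalname carries XSS / log injection risk |
| **B. Backend writes the sanitized stem** (recommended) | jobRecord gains `source_filename: input.safeFilename` (same value as `input.filename`, an already-sanitized safe string) | Matches the §3.2 assumption; no security risk; redundant but semantically clear |
| C. `buildFilename` reads `job.input.filename` instead | Reuses the existing field without adding `source_filename` | Smallest change. Drawback: `input.filename` means "safe name of the input file", not "source filename for output"; if the source later changes (e.g. to metadata), the logic ends up scattered across several places |
| D. Backend writes + `buildFilename` degrades through fallbacks | When `job.source_filename` is missing, read `job.input.filename`; only when both are missing fall back to `job_<jobId>.nef` | Good fault tolerance, but the implicit dependency makes debugging harder |
### 13.3 Recommendation: Option B, keeping the fallback tolerance of D
**Backend B1 task acceptance criteria**:
#### B1.1 createJob handler writes `source_filename`
**File**: `apps/task-scheduler/src/routes/v1/jobs.js`, lines 721-774 (jobRecord construction)

**Change**: add `source_filename` at the top level of jobRecord, **above** the `input` object: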
```javascript
const jobRecord = {
  job_id: jobId,
  status: 'ONNX',
  // ... existing fields
  // Phase B addition: used by GET /jobs/:id/result to build the download filename.
  // The source is the already-sanitized safeFilename (same value as input.filename; redundant but semantically clear).
  // Why not store the raw modelFile.originalname?
  // - originalname may contain XSS payloads / control characters / path traversal patterns
  // - even though browsers do not render the Content-Disposition header, it may still be echoed in logs / error messages
  // - the sanitized version is defense-in-depth
  source_filename: input.safeFilename,
  input: {
    filename: input.safeFilename,
    // ... existing
  },
  // ... other existing fields
};
```
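#### B1.2 Acceptance criteria checklist
- [ ] A write point for `jobRecord.source_filename` exists (around line ~740, after `input.safeFilename` is obtained)
- [ ] The value written **must** be the sanitized string (`input.safeFilename`), not `modelFile.originalname`
- [ ] The write happens before `claimActiveAndCreate` (at jobRecord construction time; no after-the-fact update)
- [ ] For existing jobs without a `source_filename` field, the `buildFilename` fallback still works on read (§3.2 fallback logic)
- [ ] Unit tests cover:
  - `source_filename` written and then read (happy path)
  - `buildFilename` fallback when `source_filename` is an empty string
  - `buildFilename` fallback when `source_filename` is undefined (backward compatibility with existing jobs)
#### B1.3 Confirming the `buildFilename` fallback logic
The §3.2 fallback logic is kept: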
```javascript
function buildFilename(job) {
  const sourceFilename = job.source_filename || '';
  const platform = (job.platform || '').toLowerCase();
  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
  if (stem && platform) {
    return `${stem}_${platform}.nef`;
  }
  return `job_${job.job_id || 'unknown'}.nef`;
}
```
**Backward compatibility**: existing jobs without `source_filename` fall back to `job_<jobId>.nef` (no crash, and nothing beyond the jobID is disclosed).
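#### B1.4 `platform` field investigation (handled at the same time)
`buildFilename` also reads `job.platform`. Grep `apps/task-scheduler/src/`:
```bash
grep -rn "platform" apps/task-scheduler/src/routes/v1/jobs.js | grep -v "//"
```
**Backend B1 must verify** that a write point for `job.platform` exists in the createJob handler (via `parameters.platform` or a top-level `job.platform`). The Architect did not verify this line by line within the grep range; it is left for Backend to confirm as part of the B1 task. If it is also missing, add it under the same acceptance criteria.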
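### 13.4 Security considerations
- **The raw `originalname` is not stored**: the original filename may contain an XSS payload / control characters / RTL override / an overlong string
- **The sanitized version is already enforced**: `safeFilename` has passed through `sanitizeFilename` (character whitelist, truncation to 200, leading-dot removal; see `security.md §Input Validation`)
- **Content-Disposition header injection**: the filename is written into `Content-Disposition: attachment; filename="..."`; an unescaped `"` could break the header. The sanitized version already forbids `"` and `\`, so it is safe
### 13.4a Defense-in-depth: Content-Disposition header construction (**required by the Security review of 2026-05-17**)
> **Required by Security** (review §1 Q6 / Minor m1): even though `sanitizeFilename` already blocks `"` and `\`, the `Content-Disposition` header must still apply an **explicit** quote-escape, an RFC 5987 fallback, and a buildFilename assertion. This is defense-in-depth against bugs accidentally introduced by a later sanitize upgrade.
#### (1) Belt-and-suspenders quote-escape
Even though sanitize should already block quotes and backslashes, the value **must still be explicitly escaped** when `Content-Disposition` is written: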
```javascript
// Before setHeader:
const filename = buildFilename(job);
// Backslash-escape every \ and " so the quoted-string cannot be terminated early
const escapedFilename = filename.replace(/[\\"]/g, '\\$&');
```
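**Why this is needed**:
- Sanitize and setHeader live in different files. A future sanitize upgrade may relax the rules and allow additional characters (for example Chinese); without an escape at setHeader, the header injection risk returns
- Defense-in-depth principle: each layer is responsible for the safety of its own boundary and does not rely on upstream
#### (2) RFC 5987 `filename*` fallback (reserved for future unicode support)
In Phase B, `sanitizeFilename` restricts names to ASCII alnum plus `._-`, so no non-ASCII characters occur. The header construction nevertheless **reserves a hook for the `filename*` extended syntax**: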
```javascript
res.setHeader('Content-Disposition',
`attachment; filename="${escapedFilename}"; filename*=UTF-8''${encodeURIComponent(filename)}`);
```
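**Why reserve it**:
- If Phase 2 relaxes sanitize to allow unicode (e.g. the Chinese filename `模型_kl720.nef`), the `filename*` parameter (RFC 5987) must be added for browsers to display the name correctly
- Reserving the hook costs nothing now and reduces the header construction changes needed when unicode is opened up in Phase 2
- It has no side effect in the current ASCII-only scenario: `filename` and `filename*` carry the same value, and browsers prefer `filename*` when they support it and otherwise fall back to `filename`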
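The following sketch shows one way the reserved hook could be hardened ahead of Phase 2. It is not the source's implementation: `encodeRFC5987Value` and `buildContentDisposition` are hypothetical names. Two assumptions are worth stating. First, `encodeURIComponent` leaves `'`, `(`, `)` and `*` unescaped, and these are not valid unescaped characters in an RFC 5987 extended value, so they are percent-encoded here. Second, Node.js rejects header values containing characters above U+00FF, so the plain `filename` parameter gets an ASCII-only stand-in once unicode names are allowed.

```javascript
// Hypothetical helper: percent-encodes a value for the filename* parameter.
function encodeRFC5987Value(value) {
  return encodeURIComponent(value).replace(
    /['()*]/g,
    (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`,
  );
}

// Hypothetical helper: builds the full Content-Disposition header value.
function buildContentDisposition(filename) {
  // ASCII-only stand-in for the quoted filename parameter (assumption: '_' placeholder).
  const asciiFallback = filename.replace(/[^\x20-\x7e]/g, '_');
  const escaped = asciiFallback.replace(/[\\"]/g, '\\$&');
  return `attachment; filename="${escaped}"; filename*=UTF-8''${encodeRFC5987Value(filename)}`;
}
```

For the current ASCII-only whitelist this yields the same header as the snippet in (2); the extra handling only matters once the whitelist is relaxed.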
#### (3) buildFilename assertion (guards against bugs introduced by a sanitize upgrade)
Add a sanitization re-check assertion at the end of `buildFilename`:
```javascript
function buildFilename(job) {
  const sourceFilename = job.source_filename || '';
  const platform = (job.platform || '').toLowerCase();
  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
  const candidate = (stem && platform)
    ? `${stem}_${platform}.nef`
    : `job_${job.job_id || 'unknown'}.nef`;
  // Defense-in-depth assertion:
  // make sure the result still matches the whitelist (catches bugs introduced by a sanitize upgrade)
  if (!/^[A-Za-z0-9._-]+$/.test(candidate)) {
    // Should never happen; fail-secure: log, then fall back to the always-safe jobID-only name
    logAudit({
      level: 'ERROR',
      action: 'result.filename_assertion_failed',
      // A.7 five fields + /result four fields
      // ...
      expected_pattern: '^[A-Za-z0-9._-]+$',
      actual_filename: candidate.slice(0, 100), // truncate to 100 chars to limit log injection
    });
    return `job_${job.job_id || 'unknown'}.nef`; // always-safe fallback (a UUIDv4 matches the whitelist)
  }
  return candidate;
}
```
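**Why fail-secure instead of throw**:
- `/result` must return the NEF to visionA; throwing on assertion failure would turn the whole request into a 502 and affect legitimate users
- Falling back to `job_<jobId>.nef` (a UUIDv4 is guaranteed to match the whitelist) lets the stream complete while the audit log records the anomaly
- The expected frequency is 0; a triggered assertion means an upstream sanitize bug that needs an immediate fix, and the audit log is the alerting entry point
#### (4) Required Backend assertion tests
Backend must add these unit tests: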
```javascript
describe('buildFilename assertion', () => {
  it('returns sanitized result when input is clean', () => {
    expect(buildFilename({ source_filename: 'model.onnx', platform: 'kl720', job_id: 'uuid' }))
      .toBe('model_kl720.nef');
  });
  it('falls back to job_<id>.nef when source_filename has invalid chars', () => {
    // Simulates an upstream sanitize bug: a stem containing `"` (should not happen; defense-in-depth)
    const result = buildFilename({ source_filename: 'evil".onnx', platform: 'kl720', job_id: 'safe-uuid' });
    expect(result).toBe('job_safe-uuid.nef');
    // + assert that the audit log contains result.filename_assertion_failed
  });
  it('falls back when buildFilename result somehow contains invalid char', () => {
    // Hypothetical: platform contains an illegal character (should not happen)
    const result = buildFilename({ source_filename: 'model', platform: 'kl 720', job_id: 'safe-uuid' });
    expect(result).toBe('job_safe-uuid.nef');
  });
});
```
#### (5) Setting the `Accept-Ranges: none` header alongside
Already specified in §10; it is set together with Content-Disposition in the §6.2 setHeader step, before the stream is piped:
```javascript
// Before stream.pipe(res):
res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength));
res.setHeader('Content-Disposition',
  `attachment; filename="${escapedFilename}"; filename*=UTF-8''${encodeURIComponent(filename)}`);
res.setHeader('Accept-Ranges', 'none'); // explicitly declare that Range is not supported
```
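### 13.5 Impact on existing jobs
`source_filename` is a new field and **needs no migration**:
- Existing jobs (created before Phase A): `source_filename === undefined`, so `buildFilename` falls back to `job_<jobId>.nef`
- Jobs created after Phase B all have `source_filename` and get the normal stem-based filename
- The transition period in which both kinds of job coexist lasts about 7 days; it ends once the existing jobs expire and are purged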
---
## 15. Streaming Resource Limits (**required by the Security review of 2026-05-17**)
> **Required by Security** (review §1 Q1 / Major M1 + M2): mitigations for attack surface specific to a streaming endpoint. Mandatory before Phase B starts; cannot be deferred.
### 15.1 Attack surface and limits overview
| Attack surface | Limit | Default | Behaviour when triggered |
|--------|------|-------|---------|
| **Slowloris (slow reads hogging a connection)** | Stream response timeout | **5 minutes** (300_000 ms) | destroy res + destroy stream + log `result.stream_timeout` |
| **Connection exhaustion (many simultaneous streams)** | Concurrent stream cap (per-instance) | **10 concurrent streams** | 503 `service_busy` + `Retry-After: 30` + log `result.rate_limited` (limit_type: concurrent) |
| **Bandwidth abuse** | See §9 bandwidth quota | 1 GB/hr + 6 GB/24hr | 429 `bandwidth_quota_exceeded` (see §9) |
| **Disk I/O DoS** | Mitigated as a side effect of the concurrent stream cap | - | - |
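### 15.2 Stream Response Timeout (M1)
#### Why it is required
Node http server defaults:
- `server.timeout = 0` (**no limit**)
- `server.headersTimeout = 60_000ms` (covers header reception only)
- `server.requestTimeout = 300_000ms` (Node 18+; covers completion of the request)
- **There is no response write timeout**: after connecting, an attacker who reads extremely slowly (1 byte/30s) can hold a Node socket plus a MinIO upstream connection for hours
#### Design
Use `res.setTimeout(STREAM_TIMEOUT_MS)`, default 5 minutes (300_000 ms). Note that `res.setTimeout` sets the socket inactivity timeout, so it fires after `STREAM_TIMEOUT_MS` without socket activity; the arithmetic below treats 5 minutes as a bound on total transfer time, which holds only if the limit is also enforced as an absolute deadline.

**Rationale for 5 minutes** (quantified):
- Maximum NEF size: 500 MB (a reasonable upper bound; most are < 100 MB in practice)
- Minimum throughput tolerated within 5 min: 500 MB / 5 min = **100 MB/min ≈ 1.7 MB/s ≈ 13.3 Mbps**
- A legitimate client on a mid-range link (10 Mbps) needs about 400 s for a full 500 MB file, so it can complete files up to roughly 375 MB within 5 min; a typical < 100 MB NEF takes about 80 s
- An attacker reading a maximum-size file at < 13.3 Mbps is cut off by the timeout within 5 min
- For clients with a legitimately slow link (such as 3G mobile networks), the 5 min limit is expected to cover typical, smaller files; larger files on such links need a higher value

**Overridable via env**: `RESULT_STREAM_TIMEOUT_MS` (default 300_000). It can be raised in Phase 2 if ultra-large NEFs (GB scale) are supported.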
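The throughput arithmetic in the rationale can be checked with a few lines. This is a worked example only, using decimal units (1 MB = 10^6 bytes, 1 Mbps = 10^6 bits/s); the function names are hypothetical.

```javascript
// Minimum sustained throughput (Mbps) needed to move sizeBytes within timeoutMs.
function minSustainedThroughputMbps(sizeBytes, timeoutMs) {
  return (sizeBytes * 8) / (timeoutMs / 1000) / 1e6;
}

// Seconds needed to move sizeBytes over a link of linkMbps.
function secondsToTransfer(sizeBytes, linkMbps) {
  return (sizeBytes * 8) / (linkMbps * 1e6);
}

const maxNefBytes = 500e6;   // 500 MB upper bound from the rationale
const timeoutMs = 300_000;   // 5 minutes
const requiredMbps = minSustainedThroughputMbps(maxNefBytes, timeoutMs);
const secondsAt10Mbps = secondsToTransfer(maxNefBytes, 10);
```

A full 500 MB file over a 10 Mbps link therefore needs more than the 5 minute budget, while a 100 MB file needs about 80 seconds.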
#### Implementation skeleton
```javascript
// After setHeader, before stream.pipe(res):
const STREAM_TIMEOUT_MS = Number(process.env.RESULT_STREAM_TIMEOUT_MS) || 300_000;
let bytesStreamed = 0;
const streamStartAt = Date.now();
// Accumulate streamed bytes (used by the audit log and the bandwidth quota increment)
result.stream.on('data', (chunk) => {
  bytesStreamed += chunk.length;
});
// Arm the timeout
res.setTimeout(STREAM_TIMEOUT_MS, () => {
  logAudit({
    level: 'WARN',
    action: 'result.stream_timeout',
    // A.7 five fields
    source_ip: req.ip,
    token_fingerprint: req.auth?.tokenFingerprint,
    request_id: req.requestId,
    http_method: 'GET',
    http_path: req.originalUrl,
    // /result four fields
    job_id: jobId,
    size_bytes: bytesStreamed,
    duration_ms: Date.now() - streamStartAt,
    stream_completed: false,
    // event-specific
    timeout_ms: STREAM_TIMEOUT_MS,
    bytes_streamed_at_timeout: bytesStreamed,
  });
  // Destroy the upstream stream and the response so both ends are cleaned up
  if (result && result.stream && typeof result.stream.destroy === 'function') {
    result.stream.destroy();
  }
  if (!res.destroyed) res.destroy(new Error('Response stream timeout'));
});
// Also set req.setTimeout to the same value (covers both ends)
req.setTimeout(STREAM_TIMEOUT_MS);
// Then pipe the stream
result.stream.pipe(res);
```
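Because `res.setTimeout` is an inactivity timeout on the underlying socket, a client that keeps trickling reads may never trip it. If the 5 minute figure is meant as a hard ceiling on total transfer time, an absolute deadline timer could complement it. The sketch below is an assumption, not part of the source design; `armStreamDeadline` and its injectable `timers` parameter are hypothetical.

```javascript
// Hypothetical helper: enforces an absolute deadline on a streaming response.
// `timers` is injectable so the behaviour can be tested without waiting.
function armStreamDeadline(res, stream, deadlineMs, onDeadline,
  timers = { set: setTimeout, clear: clearTimeout }) {
  const handle = timers.set(() => {
    onDeadline(); // e.g. write the result.stream_timeout audit event
    if (stream && typeof stream.destroy === 'function') stream.destroy();
    if (!res.destroyed) res.destroy(new Error('Response stream deadline exceeded'));
  }, deadlineMs);
  // Disarm once the response has finished or the connection has closed.
  const disarm = () => timers.clear(handle);
  res.once('finish', disarm);
  res.once('close', disarm);
  return disarm;
}
```

It would be armed right before `result.stream.pipe(res)`, with `onDeadline` reusing the same `logAudit` payload as the inactivity handler.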
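### 15.3 Concurrent Stream Cap (M2)
#### Why it is required, and why it is newly written rather than reusing `uploadConcurrency.js`
**Express + Node impose no per-process connection limit by default**:
- Node leaves `server.maxConnections` unset by default (no limit)
- An attacker with a valid key can open 1000 `/result` connections at once and instantly exhaust the fd table (typically 1024-65536) plus MinIO upstream connections

**Why `uploadConcurrency.js` is not reused**:
| `uploadConcurrency.js` | `resultStreamConcurrency.js` (new) |
|----------------------|------------------------------|
| Restricts "the same job_id cannot be uploaded twice" (per-job key) | Restricts "at most N streams across the whole server" (global counter) |
| Semantics: mutual exclusion (one upload per job) | Semantics: capacity (the server serves at most N downloads at once) |
| On trigger: 409 conflict | On trigger: 503 service_busy + Retry-After |

The two have **different semantics** and should not share one middleware, but the **implementation structure can be used as a reference** (lock acquire / release, `res.once('close')` cleanup).
#### Design
| Parameter | Value | Rationale |
|------|---|------|
| `maxConcurrent` | **10** | Balances normal load (P95 < 5 concurrent streams per the §9.1 estimate, i.e. 2× headroom) against blast radius (an attacker holding 10 slow connections, combined with the M1 5 min timeout, occupies at most 10 fds and MinIO connections for 5 min each, 50 connection-minutes in total; controllable) |
| `retryAfterSeconds` | 30 | Retry interval (some slots should have been released within 30 s of an attacker hitting the wall) |

**Multi-instance scaling**: with the current single instance, 10 is the absolute ceiling. With multiple instances in Phase 2 it can be multiplied by the instance count (10 × N), or switched to a Redis distributed semaphore to keep a global ceiling.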
#### Implementation skeleton
```javascript
// src/middleware/resultStreamConcurrency.js (new file)
function createResultStreamConcurrencyLimiter({ maxConcurrent, retryAfterSeconds }) {
  let activeStreams = 0;
  return {
    middleware(req, res, next) {
      if (activeStreams >= maxConcurrent) {
        logAudit({
          level: 'WARN',
          action: 'result.rate_limited',
          // A.7 five fields
          source_ip: req.ip,
          token_fingerprint: req.auth?.tokenFingerprint,
          request_id: req.requestId,
          http_method: 'GET',
          http_path: req.originalUrl,
          // /result four fields (only job_id applies; the others do not because streaming has not started)
          job_id: req.params.id,
          // event-specific
          limit_type: 'concurrent',
          retry_after_seconds: retryAfterSeconds,
          active_streams_at_reject: activeStreams,
        });
        res.setHeader('Retry-After', retryAfterSeconds);
        // User-facing message: "Server is busy, please try again later"
        return next(new ApiError(503, 'service_busy',
          '伺服器忙碌中,請稍後再試',
          { limit_type: 'concurrent', retry_after_seconds: retryAfterSeconds }));
      }
      // Acquire a slot
      activeStreams++;
      let released = false;
      const release = () => {
        if (released) return; // idempotent: finish and close may both fire
        released = true;
        activeStreams--;
      };
      // Release the slot on whichever of finish / close / error fires first
      res.once('finish', release);
      res.once('close', release);
      res.once('error', release);
      next();
    },
    // Exposes internal state for health checks / monitoring (read-only; do not write to it directly)
    getActiveCount: () => activeStreams,
  };
}
```
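**Limitations for multi-instance deployment**:
- The counter is currently in-memory and per-process, which is acceptable for a single-instance deployment
- Mandatory before a multi-instance deployment in Phase 2 (already raised to HIGH as security.md candidate #8): switch to a Redis distributed semaphore
- Without the switch, N instances × 10 = N×10 is the effective ceiling, and the blast radius grows N-fold
### 15.4 How the limits work together
The four mitigations form defense-in-depth:
| Attack vector | M1 stream timeout | M2 concurrent cap | §9 rate limit | §9 bandwidth quota |
|-----------|------------------|------------------|--------------|-----------------|
| **Slowloris** (10 slow connections × 6 hr) | Cuts each one off at 5 min | Cap of 10 blocks the 11th | - | - |
| **Connection exhaustion** (1000 connections that never read) | Cut off at 5 min | The 11th gets an immediate 503 | - | - |
| **Mass download** (20 req/min × 100 MB) | - | - | 20 req/min sustained limit | Hits the 1 GB/hr wall |
| **Bandwidth abuse** (large files per request) | - | - | Not blocked | Hits the 1 GB/hr wall |
| **Burst attack** (10 req/sec spike) | - | - | 5 req/10s burst limit | - |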
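To make the §9 rate limit column concrete, the sketch below layers a burst window (5 requests per 10 s) over a sustained window (20 requests per minute), keyed by `token_fingerprint`. It is an illustrative fixed-window counter under stated assumptions, not the project's limiter: the function names are hypothetical, and the real middleware may use a different algorithm or library.

```javascript
// Hypothetical fixed-window limiter; `now` is injectable for testing.
function createFixedWindowLimiter({ max, windowMs, limitType, now = Date.now }) {
  const windows = new Map(); // key -> { startedAt, count }
  return function check(key) {
    const t = now();
    let w = windows.get(key);
    if (!w || t - w.startedAt >= windowMs) {
      w = { startedAt: t, count: 0 };
      windows.set(key, w);
    }
    if (w.count >= max) {
      const retryAfterSeconds = Math.max(1, Math.ceil((w.startedAt + windowMs - t) / 1000));
      return { allowed: false, limitType, retryAfterSeconds };
    }
    w.count += 1;
    return { allowed: true };
  };
}

// Two tiers: the burst check runs first, then the sustained check.
function createTwoTierLimiter({ now = Date.now } = {}) {
  const burst = createFixedWindowLimiter({ max: 5, windowMs: 10_000, limitType: 'burst', now });
  const sustained = createFixedWindowLimiter({ max: 20, windowMs: 60_000, limitType: 'sustained', now });
  return (key) => {
    const burstResult = burst(key);
    return burstResult.allowed ? sustained(key) : burstResult;
  };
}
```

The returned `limitType` and `retryAfterSeconds` map directly onto the `limit_type` and `retry_after_seconds` audit fields shown in §11.6.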
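**Attack surface not covered here**:
- **HTTP/2 stream multiplexing**: Phase 1 is still HTTP/1.1, so this is not blocked for now; when switching to H2 in Phase 2, `http2.SETTINGS_MAX_CONCURRENT_STREAMS = 100` must be set explicitly (per TCP connection)
- **Compression bomb**: NEF is already binary, and helmet does not enable gzip on octet-stream by default; confirm that nginx / the reverse proxy also does not gzip `Content-Type: application/octet-stream`
- **MinIO socketTimeout alignment**: Phase 2 candidate #16 (newly added to security.md)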
---
## 14. Phase B acceptance criteria master list for Backend (**rewritten after the Security review of 2026-05-17**)
> **Important change**: the original B1-B9 acceptance criteria had to be expanded because of the 4 Major + 3 Minor findings of the Security review. The new list is numbered **AC-1 to AC-12** (matching the 12 acceptance criteria in Security review §3) and is the single source of truth for the Backend implementer.
> Reviewers use this as a checklist; if any item is missing, the PR is not accepted.
### 14.1 Middleware chain (AC-1 to AC-4)
Order: `requireApiKey → resultBurstLimiter → resultSustainedLimiter → resultBandwidthQuota → resultStreamSemaphore → handler` (quota and semaphore must come after auth so that unauthenticated traffic cannot occupy slots).
| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-1** | `/result` applies the `requireApiKey()` middleware | §2.3 | Wired in v1/index.js, consistent with jobs/promote; on failure return 401 and actively call socket.destroy() |
| **AC-2** | `/result` applies a two-tier rate limit | §9.2, §9.3, §9.5 | **burst** (5 req / 10 sec) + **sustained** (20 req / min); bucket key is `token_fingerprint` (not clientId); over the limit → 429 `rate_limit_exceeded` + `Retry-After` + audit log `result.rate_limited` with `limit_type: 'burst' \| 'sustained'` |
| **AC-3** | `/result` applies a bandwidth quota | §9.4, §9.8 | **hourly** (1 GB / hr) + **daily** (6 GB / 24hr), per `token_fingerprint`; in-memory counter (Redis in Phase 2); pre-check + post-stream increment; over the limit → 429 `bandwidth_quota_exceeded` + `Retry-After` + audit log with `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'` and the cumulative bytes |
| **AC-4** | `/result` applies a concurrent stream cap | §15.3 | `MAX_CONCURRENT_RESULT_STREAMS = 10` (overridable via env); **newly written** `src/middleware/resultStreamConcurrency.js`, **not** a reuse of `uploadConcurrency.js` (different semantics); release on `res.once('finish' / 'close' / 'error')`; over the limit → 503 `service_busy` + `Retry-After: 30` + audit log `result.rate_limited` with `limit_type: 'concurrent'` |
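The AC-3 "pre-check + post-stream increment" pattern with an in-memory counter could be sketched as follows. This is an assumption-laden illustration: `createBandwidthQuota` and its return shape are hypothetical, usage is kept as a simple per-key log of `{ at, bytes }` entries, and `retryAfterSeconds` is simplified to the time until the oldest entry in the window expires.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Hypothetical in-memory quota tracker keyed by token_fingerprint; `now` is injectable.
function createBandwidthQuota({ hourlyBytes, dailyBytes, now = Date.now }) {
  const usageByKey = new Map(); // key -> [{ at, bytes }], oldest first

  function windowUsage(entries, windowMs) {
    const t = now();
    const inWindow = entries.filter((e) => e.at > t - windowMs);
    const bytes = inWindow.reduce((sum, e) => sum + e.bytes, 0);
    const oldestAt = inWindow.length > 0 ? inWindow[0].at : t;
    const retryAfterSeconds = Math.max(1, Math.ceil((oldestAt + windowMs - t) / 1000));
    return { bytes, retryAfterSeconds };
  }

  return {
    // Pre-check: call before streaming starts.
    check(key) {
      const entries = usageByKey.get(key) ?? [];
      const tiers = [
        ['bandwidth_hourly', HOUR_MS, hourlyBytes],
        ['bandwidth_daily', DAY_MS, dailyBytes],
      ];
      for (const [limitType, windowMs, limit] of tiers) {
        const usage = windowUsage(entries, windowMs);
        if (usage.bytes >= limit) {
          return { allowed: false, limitType, bytesUsedInWindow: usage.bytes,
            retryAfterSeconds: usage.retryAfterSeconds };
        }
      }
      return { allowed: true };
    },
    // Post-stream increment: call with the bytes actually streamed.
    record(key, bytes) {
      const cutoff = now() - DAY_MS;
      const entries = (usageByKey.get(key) ?? []).filter((e) => e.at > cutoff);
      entries.push({ at: now(), bytes });
      usageByKey.set(key, entries);
    },
  };
}
```

Because the increment happens after the stream, a single in-flight download can overshoot the quota by up to one file; whether that is acceptable is a design decision governed by §9, not by this sketch.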
### 14.2 Range header handling (AC-5, AC-6)
| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-5** | Range header silently ignored, with explicit `Accept-Ranges: none` | §10.4 | The response header must contain `Accept-Ranges: none` (not omitted, not set to `bytes`); Range is not parsed; no 416 is returned; the MinIO request is not sliced |
| **AC-6** | Range header audit log `result.range_attempted` | §10.6, §11.4 | When a request carries a Range header, the audit event must be written (INFO level, not WARN), with `range_header_received` (sanitized and truncated to 100 characters) + the A.7 five fields + `job_id` |
### 14.3 Streaming timeout / connection safety (AC-7, AC-8)
| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-7** | Stream response timeout of 5 minutes | §15.2 | `res.setTimeout(STREAM_TIMEOUT_MS)`, default 300_000 ms, overridable via env `RESULT_STREAM_TIMEOUT_MS`; also call `req.setTimeout` so both ends are covered; on timeout → `res.destroy()` + `result.stream.destroy()` + audit log `result.stream_timeout` |
| **AC-8** | Cleanup on stream end / interruption / client close | §4.4, §6.2 | `stream.on('error')`: destroy + log `result.stream_error` with `stream_completed: false` (bytes < contentLength); `req.on('close')`: destroy stream + log `result.client_closed` with `stream_completed: false`; `stream.on('end')` with bytes = contentLength: log `result.streamed` with `stream_completed: true` |
### 14.4 Audit log completeness (AC-9, AC-10)
| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-9** | Every `result.*` event contains the A.7 five fields + the /result four fields | §11.2, §11.3 | **A.7 five fields**: `source_ip`, `token_fingerprint`, `request_id`, `http_method`, `http_path` (required on every event); **/result four fields**: `job_id` (all events) + `size_bytes` / `duration_ms` / `stream_completed` (required depending on event type) |
| **AC-10** | All 12 audit events implemented | §11.4 | `result.streamed` / `result.stream_error` / `result.client_closed` / `result.stream_timeout` / `result.not_found` / `result.not_completed` / `result.expired` / `result.storage_unavailable` / `result.rate_limited` / `result.bandwidth_quota_exceeded` / `result.range_attempted` / `result.filename_assertion_failed`, **12 in total** (4 required by the Security review: `rate_limited` / `bandwidth_quota_exceeded` / `range_attempted` / `stream_timeout`; 7 from the Architect's original design; 1 for the filename assertion) |
### 14.5 Filename / response header (AC-11, AC-12)
| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-11** | `Content-Disposition` defense-in-depth | §13.4a | (1) quote-escape: `filename.replace(/[\\"]/g, '\\$&')`; (2) RFC 5987 fallback: `filename*=UTF-8''${encodeURIComponent(filename)}`; (3) buildFilename assertion: `/^[A-Za-z0-9._-]+$/.test(candidate)`; on assertion failure → fail-secure fallback to `job_<jobId>.nef` + audit log `result.filename_assertion_failed`; (4) unit tests cover the assertion |
| **AC-12** | Response does not set `Accept-Ranges: bytes` | §10.4, §14.2 AC-5 | The response header must explicitly be `Accept-Ranges: none` (may not be omitted, may not be set to `bytes`; repeats AC-5 for emphasis) |
### 14.6 Sub-acceptance: existing `source_filename` write point (B1)
Kept from the original B1 (not a Security addition, but Backend still has to do it):
| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **B1** | `source_filename` written in the createJob handler | §13.3 | `jobRecord.source_filename = input.safeFilename` (line ~740) + unit tests covering the happy path / fallback |
| **B1.5** | Confirm the `job.platform` write point | §13.3 | Confirm by grep; add it if missing |
### 14.7 Integration test scenarios (6 required + the original happy path test)
**6 required by Security review §3.6 + the original happy path series**:
| # | Scenario | Related AC | Expected behaviour |
|---|------|---------|---------|
| **IT-1** | Happy path: completed job + NEF + not expired | - | 200 + complete stream + correct Content-Type/Length/Disposition + Accept-Ranges: none + audit `result.streamed` (stream_completed: true) |
| **IT-2** | Rate limit burst: fire 6 req within 10 sec | AC-2 | The 6th returns 429 + `limit_type: 'burst'` + `Retry-After` + audit log |
| **IT-3** | Rate limit sustained: fire 21 req steadily within 1 min | AC-2 | The 21st returns 429 + `limit_type: 'sustained'` + audit log |
| **IT-4** | Bandwidth quota hourly: cumulative downloads exceed 1 GB | AC-3 | After the limit is exceeded, returns 429 `bandwidth_quota_exceeded` + `limit_type: 'bandwidth_hourly'` + audit log |
| **IT-5** | Range header probing: request with `Range: bytes=0-7` | AC-5, AC-6 | Still returns **200 with the full body** (not 206, not 416) + `Accept-Ranges: none` + audit log `result.range_attempted` with `range_header_received: 'bytes=0-7'` |
| **IT-6** | Stream timeout: mock slow-reading client (1 byte every 5 s) | AC-7 | After 5 min the server destroys the connection + audit log `result.stream_timeout` with `timeout_ms: 300000` |
| **IT-7** | Concurrent stream cap: open 11 streams at once (mock stream) | AC-4 | The 11th immediately returns 503 `service_busy` + `Retry-After: 30` + audit log `result.rate_limited` (concurrent) |
| **IT-8** | Audit log forensics: cross-event trace completeness | AC-9, AC-10 | One `request_id` links `auth.api_key.authenticated` to exactly one terminal `result.*` event; the event contains the A.7 five fields + the /result four fields |
| **IT-9** | Filename assertion fallback: simulate an upstream sanitize bug passing a stem containing `"` | AC-11 | 200 + filename `job_<jobId>.nef` (not `evil".onnx_kl720.nef`) + audit log `result.filename_assertion_failed` |

Existing tests (kept):
- 401 (missing API key / wrong API key)
- 404 `job_not_found` (jobID does not exist)
- 404 `result_not_found` (completed but no NEF)
- 409 `job_not_completed` (status = ONNX / BIE / NEF / FAILED)
- 410 `result_expired` (expires_at in the past / MinIO `getObjectStream` returns null)
- 502 `storage_unavailable` (MinIO throws)
- 503 `service_unavailable` (CONVERTER_API_KEY not set)
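### 14.8 Out of Phase B scope (explicitly not done)
| # | Item | Reason | Related Phase 2 candidate |
|---|------|------|------------------|
| - | Per-job authorization (check whether the caller created the job) | With the current 1:1 trust and hard-coded client_id, a per-job check would still pass; per-caller credentials must come first | #12 (MEDIUM) |
| - | Range request support (206 Partial Content) | No demand on the visionA side; increases attack surface | Re-evaluate if a real need arises |
| - | HMAC user_id signature | ADR-015 already decided against it | - |
| - | Per-caller credentials (separate API key per service) | Do it in Phase 2 once a second caller actually exists | #13 (LOW) |
| - | Redis store for multi-instance rate limit / bandwidth quota / concurrent cap | Single instance for now; acceptable | #8 (**HIGH**, upgraded) |
| - | Aligning MinIO socketTimeout with the stream timeout | Evaluate in Phase 2 | Candidate #16 |
| - | Returning a uniform 404 for 4xx (not revealing lifecycle) | Must be rolled out together with #12 per-job auth | Candidate #15 |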
### 14.9 Environment variables (Backend + DevOps in sync; new in Phase B)
| Env | Default | Purpose |
|-----|-------|------|
| `RESULT_STREAM_TIMEOUT_MS` | 300000 (5 min) | Stream response timeout (AC-7) |
| `MAX_CONCURRENT_RESULT_STREAMS` | 10 | Concurrent stream cap (AC-4) |
| `RESULT_RATE_LIMIT_BURST_PER_10S` | 5 | Burst rate limit (AC-2) |
| `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` | 20 | Sustained rate limit (AC-2) |
| `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` | 1073741824 (1 GB) | Hourly bandwidth quota (AC-3) |
| `RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES` | 6442450944 (6 GB) | Daily bandwidth quota (AC-3) |
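Reading these variables with a fallback default could look like the sketch below. The `parseEnvInt` shown here is a hypothetical stand-in written for illustration; the helper of the same name in the task-scheduler source may have a different signature and validation rules.

```javascript
// Hypothetical helper: reads a positive integer env var, falling back to a default
// when the variable is unset, empty, non-numeric, non-integer, or not positive.
function parseEnvInt(name, defaultValue, env = process.env) {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return defaultValue;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : defaultValue;
}

// Example: resolving the six Phase B settings with the defaults from the table above.
const resultLimits = {
  streamTimeoutMs: parseEnvInt('RESULT_STREAM_TIMEOUT_MS', 300000),
  maxConcurrentStreams: parseEnvInt('MAX_CONCURRENT_RESULT_STREAMS', 10),
  burstPer10s: parseEnvInt('RESULT_RATE_LIMIT_BURST_PER_10S', 5),
  sustainedPerMin: parseEnvInt('RESULT_RATE_LIMIT_SUSTAINED_PER_MIN', 20),
  hourlyQuotaBytes: parseEnvInt('RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES', 1073741824),
  dailyQuotaBytes: parseEnvInt('RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES', 6442450944),
};
```

Rejecting zero and negative values keeps a typo in a deployment file from silently disabling a limit.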
### 14.10 What the Reviewer must verify after Phase B is complete
| # | Check | Method |
|---|--------|------|
| R1 | A `source_filename` write point exists and writes the sanitized string | grep `apps/task-scheduler/src/routes/v1/jobs.js` |
| R2 | Token / NEF binary do not appear in any log statement | grep `console.log\|console.error` + manual review |
| R3 | The two-tier rate limit (burst + sustained), bandwidth quota and concurrent cap middleware are all mounted on `/result`, wired in the correct order | Read `router.use('/jobs/:id/result', ...)` in v1/index.js |
| R4 | Range header handling: not parsed, no 416, sets `Accept-Ranges: none`, writes audit log `result.range_attempted` | Read the handler headers + grep `416` returns no match |
| R5 | Stream response timeout of 5 min + audit log | grep `res.setTimeout` + grep `result.stream_timeout` |
| R6 | Concurrent stream cap = 10 + newly written middleware (no reuse of uploadConcurrency) | grep `resultStreamConcurrency` |
| R7 | All 12 audit log actions are written, each with the A.7 five fields + the /result four fields | grep `action: 'result\.` yields at least 12 distinct action matches + spot-check 3 events for field completeness |
| R8 | Content-Disposition quote-escape + RFC 5987 + buildFilename assertion | Read setHeader / buildFilename + grep `result.filename_assertion_failed` |
| R9 | The bandwidth quota bucket key is `token_fingerprint` (not clientId) | Read the injected keyGenerator |
| R10 | OpenAPI spec updated with the new status codes such as 429 / 503 | Read `docs/openapi.yaml` |
| R11 | The 6 new integration tests (IT-2 to IT-7) are all written, and the original 4xx test series is kept | grep the tests |
| R12 | The 6 new env vars are documented in the README / deploy docs | README + `.env.example` |
---
## 16. Env Naming Reference Table (**decided by the Architect on 2026-05-18 to prevent three-way naming drift**)
### 16.1 Why this section exists
During the Phase 0.8b Phase B deploy, DevOps found that env names were inconsistent across three parties: the Orchestrator prompt, the source code, and `env.example` §17 (the Orchestrator used `RESULT_RATE_LIMIT_SUSTAINED_MAX`; the source code uses `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` plus an additional `_WINDOW_MS`). This section provides the **authoritative canonical name list**. Any later deployment setting (docker-compose env injection, `.env*` template generation, Orchestrator task assignment, cross-references between documents) follows this section.

**Deciding principle**: the names the source code actually reads take precedence (the risk of changing code is greater than the risk of changing deployment). This table has been cross-checked against `apps/task-scheduler/src/routes/v1/index.js` L116-169, `apps/task-scheduler/src/routes/v1/result.js` L60, `apps/task-scheduler/src/config.js` L112-296, `apps/task-scheduler/src/services/jobService.js` L69, `apps/task-scheduler/src/storage/{local,minio}.js`, `apps/task-scheduler/src/auth/apiKeyMiddleware.js`, and `apps/task-scheduler/env.example`.
### 16.2 Canonical name list (every env read by task-scheduler)
#### 16.2.1 Specific to the `/result` endpoint (new in Phase 0.8b Phase B; within the scope of this spec)
| Canonical env name | Required? | Default | Purpose | Source code reads |
|--------------------|-----------|---------|---------|-------------------|
| `RESULT_STREAM_TIMEOUT_MS` | optional | `300000` (5 min) | AC-7 stream response timeout (§14.3 / §15.2) | `src/routes/v1/result.js` L60 (read lazily by `getStreamTimeoutMs`) |
| `MAX_CONCURRENT_RESULT_STREAMS` | optional | `10` | AC-4 concurrent stream cap (§14.1 / §15.3); per-instance counter; over the limit → 503 + `Retry-After: 30` | `src/routes/v1/index.js` L128 (parsed by `parseEnvInt`, then passed to `createResultStreamConcurrencyLimiter`) |
| `RESULT_RATE_LIMIT_BURST_PER_10S` | optional | `5` | AC-2 burst rate limit `max` (§9.2 / §14.1); per `token_fingerprint` | `src/routes/v1/index.js` L160 |
| `RESULT_RATE_LIMIT_BURST_WINDOW_MS` | optional | `10000` (10 s) | AC-2 burst rate limit `windowMs`; paired with `_PER_10S` | `src/routes/v1/index.js` L159 |
| `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` | optional | `20` | AC-2 sustained rate limit `max` (§9.2 / §14.1); per `token_fingerprint` | `src/routes/v1/index.js` L169 |
| `RESULT_RATE_LIMIT_SUSTAINED_WINDOW_MS` | optional | `60000` (1 min) | AC-2 sustained rate limit `windowMs`; paired with `_PER_MIN` | `src/routes/v1/index.js` L167 |
| `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` | optional | `1073741824` (1 GB) | AC-3 hourly bandwidth quota (§9.4 / §14.1); per `token_fingerprint` | `src/routes/v1/index.js` L116 |
| `RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES` | optional | `6442450944` (6 GB) | AC-3 daily bandwidth quota; per `token_fingerprint` | `src/routes/v1/index.js` L119 |

**8 variables in total (`RESULT_*` plus `MAX_CONCURRENT_RESULT_STREAMS`). All 8 are optional; when unset, the fallback default inside the source code applies.**
#### 16.2.2 其他 task-scheduler 讀的 env非 `/result` 專屬、列出避免後續混淆)
| Canonical env name | Required? | Default | Purpose | Source code reads |
|--------------------|-----------|---------|---------|-------------------|
| `CONVERTER_API_KEY` | optional (warn-only) | `''` (empty string) | Pre-shared key that authenticates visionA → converter calls to the external API; when unset, `apiKeyMiddleware` returns 503 `service_unavailable` for every request | `src/config.js` L167 + `src/auth/apiKeyMiddleware.js` L234 |
| `TRUST_PROXY` | optional | `'loopback'` | Express `app.set('trust proxy', ...)`; affects `req.ip` and the audit log `source_ip` (accepted values: boolean / integer / string keyword / CIDR) | `src/config.js` L145 + `src/app.js` |
| `PORT` | optional | `4000` | HTTP listen port | server entry (not via config.js) |
| `NODE_ENV` | optional | `'development'` | Affects behaviour such as forcing HTTPS on the FAA URL | `src/config.js` L212 |
| `LOG_LEVEL` | optional | `'info'` | Log level | server entry (not via config.js) |
| `REDIS_URL` | optional | `'redis://localhost:6379'` | Redis connection string | `src/redis.js` L24 |
| `FRONTEND_URL` | optional | `'http://localhost:3000'` | CORS origin | `src/app.js` L59 |
| `JOB_DATA_DIR` | optional | `'/data/jobs'` | Volume path shared by local storage and the worker | `src/services/jobService.js` L69, `src/storage/local.js` L23 |
| `STORAGE_BACKEND` | optional | `'local'` | `'local'` / `'minio'` | `src/app.js` L170, `src/storage/minio.js` L39, `src/routes/v1/jobs.js` L945 |
| `MINIO_ENDPOINT_URL` | conditional (required when `STORAGE_BACKEND=minio`) | `'http://192.168.0.130:9000'` | MinIO endpoint | `src/storage/minio.js` L40 |
| `MINIO_BUCKET` | conditional | `'convertet-working-space'` | MinIO bucket name | `src/storage/minio.js` L41 |
| `MINIO_ACCESS_KEY` | conditional | `'convuser'` | MinIO access key | `src/storage/minio.js` L42 |
| `MINIO_SECRET_KEY` | conditional | `''` (empty string) | MinIO secret key | `src/storage/minio.js` L43 |
| `MINIO_REGION` | conditional | `'us-east-1'` | MinIO region | `src/storage/minio.js` L44 |
| `MINIO_LIFECYCLE_DAYS` | optional | TBD (set by the bucket lifecycle policy) | Bucket lifecycle in days; cleans up orphaned objects | env.example §6 (not read by task-scheduler; used by the init script) |
| `MEMBER_CENTER_TOKEN_URL` | **required** | (none) | converter → FAA OAuth client token endpoint (fail-fast when missing) | `src/config.js` L112 |
| `KNERON_CONVERTER_CLIENT_ID` | **required** | (none) | converter OAuth client_id (fail-fast when missing) | `src/config.js` L116 |
| `KNERON_CONVERTER_CLIENT_SECRET` | **required** | (none) | converter OAuth client_secret (fail-fast when missing) | `src/config.js` L117 |
| `FILE_ACCESS_AGENT_BASE_URL` | **required** | (none) | FAA base URL (https enforced in prod) | `src/config.js` L197 |
| `FILE_ACCESS_AGENT_AUDIENCE` | **required** | (none) | FAA OAuth audience | `src/config.js` L198 |
| `PROMOTE_TIMEOUT_MS` | optional | `300000` (300 s) | FAA PUT timeout | `src/config.js` L220 + `src/app.js` L125 |
| `OAUTH_TOKEN_REFRESH_SKEW_MS` | optional | `60000` (60 s) | How many ms before `expiresAt` the OAuth token is proactively refreshed | `src/config.js` L225 |
| `OAUTH_TOKEN_TIMEOUT_MS` | optional | `10000` (10 s) | OAuth token endpoint timeout | `src/config.js` L228 |
| `MULTIPART_MODEL_MAX_BYTES` | optional | `524288000` (500 MB) | multer size limit for the model file | `src/config.js` L243 |
| `MULTIPART_REF_IMAGE_MAX_BYTES` | optional | `10485760` (10 MB) | Size limit for a single ref_image | `src/config.js` L252 |
| `MULTIPART_REF_IMAGES_MAX_COUNT` | optional | `100` | Maximum number of ref_images | `src/config.js` L261 |
| `MAX_CONCURRENT_UPLOADS` | optional | `5` | Maximum number of uploads in progress at once; beyond the limit the service returns 503 + `Retry-After` | `src/config.js` L285 |
| `UPLOAD_RETRY_AFTER_SECONDS` | optional | `30` | `Retry-After` seconds returned when the upload limit is exceeded | `src/config.js` L291 |
| `API_V1_RATE_LIMIT_WINDOW_MS` | optional | `300000` (5 min) | per-clientId rate limit window (clientId is currently hard-coded to `'visionA-service'`) | env.example §15 (middleware default, overridable via deps) |
| `API_V1_RATE_LIMIT_MAX` | optional | `300` | per-clientId rate limit max | env.example §15 |
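
The three requirement levels in the table (required with fail-fast, warn-only, and the prod-only HTTPS rule) can be sketched as a startup check. This is an illustration under stated assumptions, not the logic in `src/config.js`: `checkEnv` and `assertEnv` are hypothetical names, and treating `NODE_ENV === 'production'` as "prod" is an assumption.

```javascript
// Required variables, copied from the table above (fail-fast when missing).
const REQUIRED_ENV = [
  'MEMBER_CENTER_TOKEN_URL',
  'KNERON_CONVERTER_CLIENT_ID',
  'KNERON_CONVERTER_CLIENT_SECRET',
  'FILE_ACCESS_AGENT_BASE_URL',
  'FILE_ACCESS_AGENT_AUDIENCE',
];

// Hypothetical helper: classify problems without throwing, so callers can report them.
function checkEnv(env = process.env) {
  const isBlank = (value) => value === undefined || String(value).trim() === '';
  const missing = REQUIRED_ENV.filter((name) => isBlank(env[name]));

  // Assumption: "prod" means NODE_ENV === 'production'.
  const invalid = [];
  const faaUrl = env.FILE_ACCESS_AGENT_BASE_URL;
  if (env.NODE_ENV === 'production' && !isBlank(faaUrl) && !String(faaUrl).startsWith('https://')) {
    invalid.push('FILE_ACCESS_AGENT_BASE_URL must use https in production');
  }

  // Warn-only: the process still starts, but every request is rejected with 503.
  const warnings = [];
  if (isBlank(env.CONVERTER_API_KEY)) {
    warnings.push('CONVERTER_API_KEY is unset: all API requests will get 503 service_unavailable');
  }
  return { missing, invalid, warnings };
}

// Hypothetical fail-fast wrapper for use at process start.
function assertEnv(env = process.env) {
  const { missing, invalid, warnings } = checkEnv(env);
  warnings.forEach((message) => console.warn(message));
  const problems = [...missing.map((name) => `missing ${name}`), ...invalid];
  if (problems.length > 0) {
    throw new Error(`Invalid environment: ${problems.join('; ')}`);
  }
}
```

Running such a check before deploy would surface an unset `CONVERTER_API_KEY` as a warning rather than as 503 responses discovered after rollout.
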

### 16.3 Naming conventions (mandatory for any env var added later)

- **Scope prefix**: use `RESULT_*` for the `/result` endpoint, `UPLOAD_*` / `MULTIPART_*` for uploads, `OAUTH_*` / `MEMBER_CENTER_*` / `KNERON_CONVERTER_*` for OAuth, `FILE_ACCESS_AGENT_*` for FAA, `MINIO_*` for MinIO, and `CONVERTER_API_KEY` for API key authentication
- **Unit suffix**: time = `_MS` / `_SECONDS`; size = `_BYTES`; count = `_PER_<time unit>` / `_MAX_COUNT` (do not mix them)
- **Rate limit naming**: `_PER_<period>` is the maximum number of requests per unit of time; `_WINDOW_MS` is the sliding window size; the two always appear as a pair (`_PER_10S` + `_BURST_WINDOW_MS=10000` is not a violation, because the unit comes from the window)
- **Bandwidth quota naming**: `_QUOTA_PER_<period>_BYTES`, with the period written as a single word (`HOUR` / `DAY`, not `1H` / `24H`)
- **Concurrent cap naming**: `MAX_CONCURRENT_<scope>` / `<scope>_CONCURRENT_MAX` (this file already has `MAX_CONCURRENT_UPLOADS` + `MAX_CONCURRENT_RESULT_STREAMS`; they will not change)
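
The unit-suffix and pairing rules above lend themselves to a small lint. The following is a sketch only: `unitKind` and `windowNameFor` are hypothetical helpers, and the regular expressions are one possible reading of the rules, not an existing check in the repo.

```javascript
// Unit suffixes from the rules above. Order matters: `_PER_HOUR_BYTES` must be
// classified as bytes, so the `_BYTES` pattern is tried first.
const UNIT_SUFFIXES = [
  { kind: 'bytes', pattern: /_BYTES$/ },
  { kind: 'time', pattern: /_(MS|SECONDS)$/ },
  { kind: 'count', pattern: /(_PER_[A-Z0-9]+|_MAX_COUNT)$/ },
];

// Returns 'bytes' | 'time' | 'count', or null when the name carries no unit suffix.
function unitKind(name) {
  const hit = UNIT_SUFFIXES.find(({ pattern }) => pattern.test(name));
  return hit ? hit.kind : null;
}

// For a rate limit name ending in `_PER_<period>`, returns the `_WINDOW_MS`
// sibling it must be paired with; returns null for any other name.
function windowNameFor(name) {
  const pattern = /_PER_[A-Z0-9]+$/;
  return pattern.test(name) ? name.replace(pattern, '_WINDOW_MS') : null;
}
```

For instance, `unitKind('RESULT_BANDWIDTH_HOURLY_QUOTA')` yields `null`, which is how a name with no unit suffix would be flagged, while `windowNameFor('RESULT_RATE_LIMIT_SUSTAINED_PER_MIN')` yields `RESULT_RATE_LIMIT_SUSTAINED_WINDOW_MS`.
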

### 16.4 Known non-standard names (accepted; do not rename)

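The following names do not fully match the §16.3 rules, but they are already spread across source code, tests, and docs, renaming them would be costly, and their meaning is clear, so they **stay unchanged**:

- `MAX_CONCURRENT_RESULT_STREAMS` (in theory it should be `RESULT_CONCURRENT_STREAM_MAX`, but `MAX_CONCURRENT_UPLOADS` already uses this pattern, so it is kept for consistency)
- `MAX_CONCURRENT_UPLOADS` (same as above)
- `KNERON_CONVERTER_CLIENT_ID` / `_SECRET` (in theory `OAUTH_CONVERTER_*`, but this is the converter's identity name within the OAuth system and has historical context)
- `MEMBER_CENTER_TOKEN_URL` (a service-name prefix rather than a scope prefix, but the meaning is clear)

Any env var added in the future must follow §16.3; no further exceptions will be added.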

### 16.5 Maintainer responsibilities for this section

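- **Architect**: decides on naming and maintains this table (Backend must clear any new env var against this table first; a PR that changes `env.example` updates this table in the same PR)
- **Backend**: when a PR adds or removes `process.env.*` read sites in source code, it must also update the "Source code reads" column of this table
- **DevOps**: treats this table as the sole authority for deployment / docker-compose env injection; deployment settings in which source code and this table disagree are not accepted
- **Orchestrator**: when assigning tasks to Backend / DevOps, env names must be quoted from this table (not written from memory)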
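
The Backend rule above (every `process.env.*` read must appear in the table) could be checked mechanically. The sketch below is a hypothetical helper pair, `extractEnvReads` and `diffAgainstTable`, not an existing script; it only recognises the literal `process.env.NAME` form and would miss bracket access or destructuring.

```javascript
// Hypothetical helper: collect every `process.env.NAME` read in a source string.
// Limitation: `process.env['NAME']` and `const { NAME } = process.env` are not detected.
function extractEnvReads(sourceText) {
  const names = new Set();
  const pattern = /process\.env\.([A-Z][A-Z0-9_]*)/g;
  for (const match of sourceText.matchAll(pattern)) {
    names.add(match[1]);
  }
  return names;
}

// Compare the documented names with what the source actually reads.
function diffAgainstTable(documentedNames, sourceText) {
  const read = extractEnvReads(sourceText);
  const documented = new Set(documentedNames);
  return {
    // Read by code but absent from the table: the table needs a new row.
    undocumented: [...read].filter((name) => !documented.has(name)).sort(),
    // Listed in the table but never read by this source text.
    unread: [...documented].filter((name) => !read.has(name)).sort(),
  };
}
```

A non-empty `undocumented` list in CI would indicate that a PR added a read site without updating this table.
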

### 16.6 Three-way alignment status (mismatches found during the 2026-05-18 deploy, and their resolution)

| Mismatch (at deploy time) | Name used in Orchestrator prompts | What the source code actually reads | env.example §17 | Decision |
|----------------------|--------------------------|-------------------|-----------------|------|
| Sustained rate limit | `RESULT_RATE_LIMIT_SUSTAINED_MAX` | `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` + `_WINDOW_MS` | `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` + `_WINDOW_MS` | source code wins (already aligned) |
| Bandwidth quota | `RESULT_BANDWIDTH_HOURLY_QUOTA` | `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` + `_PER_DAY_BYTES` | `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` + `_PER_DAY_BYTES` | source code wins (already aligned) |
| Concurrent cap | `RESULT_CONCURRENT_STREAM_MAX` | `MAX_CONCURRENT_RESULT_STREAMS` | `MAX_CONCURRENT_RESULT_STREAMS` | source code wins (already aligned) |
| Stream timeout | `RESULT_STREAM_TIMEOUT_MS` | `RESULT_STREAM_TIMEOUT_MS` | `RESULT_STREAM_TIMEOUT_MS` | all three already matched |
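
**Conclusion**: the source code and `env.example` §17 are already fully aligned (the only differences were in the prompt wording the Orchestrator used when assigning tasks). Backend and DevOps do not need to touch source code / docker-compose / env.example; the Orchestrator should simply follow this table in future assignments.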
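
To keep the non-canonical names from the table above out of future prompts or compose files, a lookup like the following could be used. It is a sketch: the map is transcribed from the three mismatched rows, and `findNonCanonical` is a hypothetical helper name.

```javascript
// Non-canonical names seen in past prompts, mapped to the canonical names
// that the source code actually reads (from the mismatch table above).
const NON_CANONICAL = {
  RESULT_RATE_LIMIT_SUSTAINED_MAX: [
    'RESULT_RATE_LIMIT_SUSTAINED_PER_MIN',
    'RESULT_RATE_LIMIT_SUSTAINED_WINDOW_MS',
  ],
  RESULT_BANDWIDTH_HOURLY_QUOTA: [
    'RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES',
    'RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES',
  ],
  RESULT_CONCURRENT_STREAM_MAX: ['MAX_CONCURRENT_RESULT_STREAMS'],
};

// Hypothetical helper: report each non-canonical name with its replacement(s).
function findNonCanonical(envNames) {
  return envNames
    .filter((name) => Object.prototype.hasOwnProperty.call(NON_CANONICAL, name))
    .map((name) => ({ name, useInstead: NON_CANONICAL[name] }));
}
```

Feeding it the keys of a compose `environment` block or the names quoted in a task prompt would list any name that the service silently ignores.
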