Commit d8a9517 missed a change in docker-compose.yml: the scheduler service
environment block did not pass through the new Phase 0.8b env vars. Even when
the stage .env set them, the container could not read them, and after deploy an
undefined CONVERTER_API_KEY would make the service start up returning 503 and
reject all requests.

docker-compose.yml:
- Add pass-through for 10 Phase 0.8b env vars (CONVERTER_API_KEY has no default
  and is fail-secure; the others use ${VAR:-default} and are fail-soft)
- Remove 9 obsolete OAuth resource-server env vars (MEMBER_CENTER_ISSUER / JWKS_URL /
  AUDIENCE / CONVERTER_TENANT_ID / SCOPE_* / JWKS_* / JWT_*)
- Keep the 8 env vars used by promote → FAA (MEMBER_CENTER_TOKEN_URL /
  KNERON_CONVERTER_CLIENT_ID/SECRET / FILE_ACCESS_AGENT_* /
  OAUTH_TOKEN_* / PROMOTE_TIMEOUT_MS)

docs/autoflow/04-architecture/api/api-result.md §16:
- Add an Env Naming Reference Table (30 canonical env names)
- Settle on source code as the single source of truth, with env.example aligned to it
- Confirm the naming spec for the 8 /result env vars and the other 22
- Keep a historical note: the Orchestrator previously used made-up abbreviated names
  (_MAX / _HOURLY_QUOTA / RESULT_CONCURRENT_STREAM_MAX), which caused naming confusion;
  §16 is the reference standard for future prompts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# API: `GET /api/v1/jobs/:id/result` (added in Phase 0.8b; design hardened for Phase B on 2026-05-17)

> **Status**: Added in Phase 0.8b, replacing the original Phase 2 delegated download token design. Before Phase B kickoff, a Security review hardened the mitigations for the streaming attack surface.
> **Companion docs**: visionA repo `adr-016-download-via-converter.md` v1.0, `design-doc.md` §3.3 / ADR-011.
> **Sources of the Phase B design hardening** (2026-05-17):
> - Security review: `.autoflow/07-delivery/security-design-review-phase-b-2026-05-17.md` (4 Major + 3 Minor + 2 Suggestion)
> - Scope adopted by the Architect: M1-M4 (rate limit + bandwidth quota + Range + stream timeout + concurrent cap) + m1-m3 (quote-escape + 429/503 status + audit log fields)
> - See §9 (rate limit + bandwidth quota), §10 (Range), §11 (audit log), §13.4a (filename defense-in-depth), §15 (streaming resource limits), §14 (acceptance criteria AC-1 to AC-12)

---

## 1. Purpose

visionA-backend uses this endpoint to fetch the NEF result file directly from the Converter Bucket (streaming proxy).

It replaces the original "visionA → obtain a delegated download token → FAA" path, which never worked end to end because MC did not implement the endpoint.

---

## 2. Request

```http
GET /api/v1/jobs/{id}/result HTTP/1.1
Host: converter.innovedus.com
Authorization: Bearer <CONVERTER_API_KEY>
X-Request-Id: <uuid>   (optional)
```

### 2.1 Path params

| Field | Type | Description |
|------|------|------|
| `id` | string (UUIDv4) | Job ID |

### 2.2 Query / Body

**None** (the streaming endpoint accepts no additional parameters; a `Range` header is ignored if received, see §10).

### 2.3 Auth + Rate Limit + Bandwidth Quota + Concurrent Cap

- `Authorization: Bearer <CONVERTER_API_KEY>` (API key middleware, see `auth.md` §1)
- **Rate limit** (see §9; revised after the 2026-05-17 Security review):
  - Burst: 5 req / 10 sec per `token_fingerprint`
  - Sustained: 20 req / min per `token_fingerprint`
- **Bandwidth quota** (see §9):
  - Hourly: 1 GB / hr per `token_fingerprint`
  - Daily: 6 GB / 24 hr per `token_fingerprint`
- **Concurrent stream cap** (see §15): max 10 simultaneous streams (per instance)
- **Stream timeout** (see §15): 5 minutes (the connection is destroyed on timeout)

---

## 3. Response 200 (success)

```http
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: <size of the NEF binary>
Content-Disposition: attachment; filename="<source_filename_stem>_<chip>.nef"
X-Request-Id: <uuid>

```

### 3.1 Headers

| Header | Rule |
|--------|------|
| `Content-Type` | `application/octet-stream` (or the `contentType` returned by MinIO HEAD; defaults to octet-stream) |
| `Content-Length` | Size of the NEF object in bytes (taken from MinIO HEAD). **Required**; visionA uses it to decide its timeout |
| `Content-Disposition` | `attachment; filename="<filename>"`; for the filename rule see §3.2 |
| `X-Request-Id` | Reuses the ID set by the request_id middleware |

### 3.2 Filename rule

**Format**: `<source_filename_stem>_<chip>.nef`

| Input | Result |
|------|------|
| `source_filename = yolov5s.onnx`, `platform = '720'` | `yolov5s_720.nef` |
| `source_filename = model.pt`, `platform = '530'` | `model_530.nef` |
| `source_filename` missing (extreme case) | `job_<job_id>.nef` (fallback) |
| `platform` missing (extreme case) | `job_<job_id>.nef` (fallback) |

**Note**: `job.platform` is the numeric string accepted by the createJob validator (for example `'720'` / `'530'` / `'520'`, with no `KL` prefix). `buildFilename` normalizes it with `.toLowerCase()`; this is a no-op for purely numeric strings, and the normalization is kept so that a future platform name containing letters is handled the same way.

**Implementation logic**:

```javascript
function buildFilename(job) {
  const sourceFilename = job.source_filename || '';
  const platform = (job.platform || '').toLowerCase();
  // Strip a known model extension (case-insensitive) to get the stem.
  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
  if (stem && platform) {
    return `${stem}_${platform}.nef`;
  }
  // Fallback when either the stem or the platform is missing.
  return `job_${job.job_id || 'unknown'}.nef`;
}
```

**Edge cases**:
- `source_filename` containing special characters: it has already been sanitized by `sanitizeFilename`, so it is not sanitized a second time
- `platform` letter case: normalized to lower case (aligned with the visionA `defaultDownloadFilename` convention)

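One way to sanity-check the rule table is to run the function against a few job records. The sketch below repeats `buildFilename` from §3.2 so that it runs standalone; the job records, including the `KL530` platform and the `.bin` file, are made-up inputs for illustration and not values the createJob validator is known to accept.

```javascript
// Copied from §3.2 so the example is self-contained.
function buildFilename(job) {
  const sourceFilename = job.source_filename || '';
  const platform = (job.platform || '').toLowerCase();
  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
  if (stem && platform) {
    return `${stem}_${platform}.nef`;
  }
  return `job_${job.job_id || 'unknown'}.nef`;
}

// Hypothetical job records: the four table rows plus two extra edge cases.
const samples = [
  { job_id: 'j-1', source_filename: 'yolov5s.onnx', platform: '720' }, // yolov5s_720.nef
  { job_id: 'j-2', source_filename: 'model.pt', platform: '530' },     // model_530.nef
  { job_id: 'j-3', platform: '720' },                                  // job_j-3.nef
  { job_id: 'j-4', source_filename: 'model.pt' },                      // job_j-4.nef
  { job_id: 'j-5', source_filename: 'model.PTH', platform: 'KL530' },  // model_kl530.nef
  { job_id: 'j-6', source_filename: 'weights.bin', platform: '720' },  // weights.bin_720.nef
];

for (const job of samples) {
  console.log(buildFilename(job));
}
```

The last sample shows that an extension outside the known list is kept as part of the stem, which may or may not be the desired behavior for new input formats.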
### 3.3 Body

NEF binary stream (Node Stream pipe).

**Do not buffer the whole file.** A NEF is typically under 50 MB but can reach several hundred MB in extreme cases, so buffering risks OOM.

---

## 4. Response 4xx / 5xx

Unified format:

```json
{
  "error": {
    "code": "<error_code>",
    "message": "<zh-TW message>",
    "details": { /* optional */ },
    "request_id": "<uuid>"
  }
}
```

### 4.1 Failure scenarios

| HTTP | error.code | Scenario | Example message |
|------|-----------|------|---------|
| 401 | `invalid_token` | API key missing / malformed / mismatched | API key 驗證失敗 (API key verification failed) |
| 404 | `job_not_found` | Job ID does not exist | Job {jobId} not found |
| 404 | `result_not_found` | Job is completed but `result_object_keys` contains no NEF | Job {jobId} completed but no NEF result available |
| 409 | `job_not_completed` | Job is not completed yet (still running / failed) | Job {jobId} is {status}; result only available after completion |
| 410 | `result_expired` | The object in the converter MinIO has expired and been cleaned up (after the 7-day `expires_at`) | Job {jobId} result expired at {expires_at}; re-convert to get a fresh result |
| 422 | `invalid_request` | Malformed path param | job id is required |
| **429** | **`rate_limit_exceeded`** | **req/min or burst limit exceeded** (**required by Security**) | **請求頻率過高,請稍後再試** (request rate too high, retry later; includes `limit_type: 'burst' \| 'sustained'`) |
| **429** | **`bandwidth_quota_exceeded`** | **1 hr / 24 hr bandwidth quota exceeded** (**required by Security**) | **下載額度已用完,請稍後再試** (download quota used up, retry later; includes `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'`) |
| 502 | `storage_unavailable` | MinIO unreachable / `getObjectStream` throws | 無法讀取結果檔,請稍後重試 (unable to read the result file, retry later) |
| **503** | **`service_busy`** | **Concurrent stream cap reached** (**required by Security**) | **伺服器忙碌中,請稍後再試** (server busy, retry later; includes `limit_type: 'concurrent'`, `Retry-After: 30`) |
| **503** | **`stream_timeout`** | **Response stream timed out (5 minutes)** (**required by Security**) | **下載逾時,請重試** (download timed out, please retry) |
| 503 | `service_unavailable` | API key not configured / other transient errors | API key not configured |

### 4.2 Status code selection logic

```
if (API key invalid)                → 401
if (job not in Redis)               → 404 job_not_found
if (job.status !== 'completed')     → 409 job_not_completed
if (job.expires_at < now)           → 410 result_expired
if (no nefKey extractable)          → 404 result_not_found
if (minio.getObjectStream throw)
  - if MinIO not found error        → 410 result_expired
  - else                            → 502 storage_unavailable
```

### 4.3 Why the order matters

**Check status before expires_at**: if the job is still running, 409 is more precise than 410 (the resource still exists; it just has not finished).

**Check nefKey extractability last**: 404 `result_not_found` is the special case "job completed but no NEF". It should not happen (NEF is the final stage, so a completed job always has one), but the check is kept as a safeguard.

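The ordering in §4.2 and §4.3 can be expressed as a small pure function, which makes the precedence easy to unit-test. This is a minimal sketch under stated assumptions: `selectResultError` is a hypothetical helper name, it covers only the checks that depend on the job record (auth and MinIO errors are left out), and it uses the uppercase internal status `'COMPLETED'` that the handler in §6.2 compares against.

```javascript
// Hypothetical helper: returns the error the endpoint should answer with,
// or null when the handler may go on to open the MinIO stream.
function selectResultError(job, now = new Date()) {
  if (!job) {
    return { status: 404, code: 'job_not_found' };
  }
  // Status comes before expiry: a running job gets 409 even if expires_at is stale.
  if (job.status !== 'COMPLETED') {
    return { status: 409, code: 'job_not_completed' };
  }
  if (job.expires_at && new Date(job.expires_at) < now) {
    return { status: 410, code: 'result_expired' };
  }
  // Checked last: "completed but no NEF" should not happen, kept as a safeguard.
  const nefKey = job.result_object_keys && job.result_object_keys.nef;
  if (!nefKey) {
    return { status: 404, code: 'result_not_found' };
  }
  return null;
}

// Usage: a job that is still running and also past its expiry is reported as 409.
const now = new Date('2026-05-17T00:00:00Z');
console.log(selectResultError({ status: 'NEF', expires_at: '2026-05-01T00:00:00Z' }, now));
```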
### 4.4 Handling stream interruption

If the MinIO stream fails after streaming has started (headers already sent):

- **The status code cannot be changed** (headers are already out)
- The only available action: `res.destroy(streamErr)` + log ERROR; the client sees `ECONNRESET`
- The client (visionA) should implement retry logic

When the client closes the connection (`req.on('close')`):

- Call `result.stream.destroy()` to release the MinIO connection
- Log INFO (this is not an error)

---

## 5. Relationship to existing endpoints

### 5.1 Job lifecycle mapping

```
created
   ↓ (processed by the Worker)
running (stage = onnx → bie → nef)
   ↓
COMPLETED + result_object_keys.nef is set
   ↓ ───→ GET /jobs/:id/result → 200 + NEF stream
   ↓
(7 days later, expires_at has passed)
   ↓
expired (the NEF has been removed from MinIO by lifecycle; the job record may still be in Redis)
   ↓ ───→ GET /jobs/:id/result → 410 result_expired
```

### 5.2 Relationship to `/promote`

`/result` and `/promote` are **independent**:

- `/promote`: moves the NEF from the Converter Bucket to the FAA NAS Bucket (long-term storage)
- `/result`: streams from the Converter Bucket to the caller

→ visionA can call both (after promote the file is in the NAS; result gives the user an immediate download).

The NEF expires and is removed from the Converter Bucket after 7 days, while the FAA NAS Bucket keeps it permanently (managed by the FAA-side lifecycle). **After expiry `/result` returns 410 and the client should re-convert** (it should not fall back to fetching from FAA, which leads back to the delegated download token dead end).

### 5.3 Relationship to the `/download-tokens` endpoint reserved for Phase 2

`POST /api/v1/jobs/:id/download-tokens` is reserved for Phase 2 (it returns 501). **There is no conflict**:

- `/download-tokens`: a short-TTL token for future browser-direct downloads from the converter
- `/result`: for the visionA backend stream proxy

The two serve different purposes and can coexist. Phase 0.8b does not enable `/download-tokens`.

---

## 6. Implementation details

### 6.1 NEF object key resolution (dual path)

Mirrors `getJobOutputKey` from the promote flow:

```javascript
function extractNefObjectKey(job) {
  // New format
  if (job.result_object_keys
      && typeof job.result_object_keys === 'object'
      && typeof job.result_object_keys.nef === 'string'
      && job.result_object_keys.nef.length > 0) {
    return job.result_object_keys.nef;
  }
  // Legacy format (backward compatible)
  if (job.output
      && typeof job.output === 'object'
      && typeof job.output.nef_path === 'string'
      && job.output.nef_path.length > 0) {
    return job.output.nef_path;
  }
  return null;
}
```

### 6.2 Streaming flow

```javascript
router.get('/', async (req, res, next) => {
  try {
    const jobId = req.params.id;
    if (!jobId) return next(new ApiError(400, 'invalid_request', 'job id is required'));

    // 1. Fetch the job record
    const job = await jobService.getJob(jobId);
    if (!job) return next(new ApiError(404, 'job_not_found', `Job ${jobId} not found`));

    // 2. Check status
    if (job.status !== 'COMPLETED') {  // the internal status is uppercase
      return next(new ApiError(409, 'job_not_completed',
        `Job ${jobId} is ${job.status}; result only available after completion`));
    }

    // 3. Check expires_at
    if (job.expires_at && new Date(job.expires_at) < new Date()) {
      return next(new ApiError(410, 'result_expired',
        `Job ${jobId} result expired at ${job.expires_at}`));
    }

    // 4. Resolve the NEF object key
    const nefKey = extractNefObjectKey(job);
    if (!nefKey) {
      return next(new ApiError(404, 'result_not_found',
        `Job ${jobId} completed but no NEF result available`));
    }

    // 5. Open the stream from MinIO
    let result;
    try {
      result = await minioStorage.getObjectStream(nefKey);
    } catch (err) {
      logEvent({ level: 'ERROR', action: 'result.minio_failed', /* ... */ });
      return next(new ApiError(502, 'storage_unavailable', /* ... */));
    }
    if (!result) {
      return next(new ApiError(410, 'result_expired',
        `Job ${jobId} NEF object not found in storage (likely expired)`));
    }

    // 6. Set headers
    res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
    if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength));
    res.setHeader('Content-Disposition', `attachment; filename="${buildFilename(job)}"`);

    // 7. Pipe the stream
    result.stream.on('error', (streamErr) => {
      logEvent({ level: 'ERROR', action: 'result.stream_error', /* ... */ });
      if (!res.destroyed) res.destroy(streamErr);
    });
    req.on('close', () => {
      // Release the MinIO connection when the client goes away.
      if (result.stream && typeof result.stream.destroy === 'function') {
        result.stream.destroy();
      }
    });
    result.stream.pipe(res);
  } catch (err) {
    return next(err);
  }
});
```

### 6.3 Why mergeParams

The router is mounted at `/jobs/:id/result` and the handler uses the `/` path, so `mergeParams: true` is needed for the handler to read `:id`:

```javascript
const router = express.Router({ mergeParams: true });
router.get('/', handler);
// in createV1Router:
router.use('/jobs/:id/result', requireApiKey(), perClientLimiter, createResultRouter({ ... }));
```

### 6.4 Log rules

| Scenario | level | action |
|------|-------|--------|
| Happy path (200) | INFO | `result.success` (with job_id, size_bytes, duration_ms) |
| 404 / 409 / 410 | INFO | `result.not_available` (with reason) |
| 502 MinIO failure | ERROR | `result.minio_failed` (with error_name, error_code; the MinIO endpoint is not logged) |
| Stream interrupted (headers already sent) | ERROR | `result.stream_error` |
| Client disconnected | INFO | `result.client_closed` |

---

## 7. Test scope (Backend implements, Testing verifies)

### 7.1 Integration tests (required)

- ✅ Happy path (200): completed job + NEF present + not expired → the full NEF binary is streamed, with correct Content-Type / Content-Length / Content-Disposition
- ❌ 401 (missing API key)
- ❌ 401 (wrong API key)
- ❌ 404 `job_not_found` (job ID does not exist)
- ❌ 404 `result_not_found` (completed but no NEF)
- ❌ 409 `job_not_completed` (status = ONNX / BIE / NEF / FAILED)
- ❌ 410 `result_expired` (expires_at is in the past)
- ❌ 410 `result_expired` (MinIO `getObjectStream` returns null)
- ❌ 502 `storage_unavailable` (MinIO throws)
- ❌ 503 `service_unavailable` (CONVERTER_API_KEY not set; this lives in the middleware layer, so every endpoint hits it)

### 7.2 Unit tests

- `extractNefObjectKey`: new format, legacy format, missing → null
- `buildFilename`: standard case, missing source_filename, missing platform, extension variants (.onnx / .pt / .tflite)
- Stream error handling (mock stream emits error)
- Client close handling (mock req emits close)

### 7.3 Stress / edge tests (optional)

- Large-file stream (200 MB NEF): confirm memory stays bounded
- Many concurrent streams (10 clients downloading at once): confirm the Scheduler stays up
- Slow client (client reads slowly): confirm the stream does not pile up an unbounded buffer

---

## 8. Curl examples

```bash
# Happy path
curl -i \
  -H "Authorization: Bearer $CONVERTER_API_KEY" \
  https://converter.innovedus.com/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/result \
  -o output.nef

# Expected:
# HTTP/1.1 200 OK
# Content-Type: application/octet-stream
# Content-Length: 12345678
# Content-Disposition: attachment; filename="yolov5s_720.nef"
```

```bash
# Expired case
curl -i \
  -H "Authorization: Bearer $CONVERTER_API_KEY" \
  https://converter.innovedus.com/api/v1/jobs/expired-job-id/result

# Expected:
# HTTP/1.1 410 Gone
# Content-Type: application/json; charset=utf-8
# {"error":{"code":"result_expired","message":"...","request_id":"..."}}
```

---

## 9. Rate Limit + Bandwidth Quota (Phase B design, **revised after the 2026-05-17 Security review**)

> **Important change**: The original single-tier 60 req/min design was rejected by the Security review (`.autoflow/07-delivery/security-design-review-phase-b-2026-05-17.md` §1 Q3 / Major M4).
> **New design**: a two-tier request-based limit plus a two-tier bandwidth quota. Rationale: request count cannot tell large files from small ones, and the core attack surface of `/result` is **bandwidth**, not request count.

### 9.1 Why `/result` needs its own rate limit + bandwidth quota

| | `/jobs` write endpoint (existing) | `/result` download endpoint (Phase B) |
|--------------|----------------------|---------------------------|
| Existing quota | 300 req / 5 min per client_id | none |
| Workload cost | CPU (multer parse) + MinIO write | MinIO read + sustained streaming (can exceed 100 MB per request) |
| Blast radius (attacker obtains the key) | Occupies the worker queue / fills MinIO | Traffic amplifier: 1 jobID = a 100 MB+ download; bandwidth is exhausted quickly |
| Limiting axis | Mainly req count | **Both req count and bandwidth** (the attack surface is bandwidth, not count) |

Security quantitative analysis (review §1 Q3):

| Design | Normal user (P95 ≈ 120 req/min) | Attacker (100 MB per req) |
|------|-------------------------------|------------------------|
| Original 60 req/min | **Too strict** (rejects legitimate retry bursts) | **Too loose** (6 GB/min = 800 Mbps, 8.6 TB/day = $770/day cloud egress) |
| New two-tier + bandwidth quota | Sufficient (20 req/min sustained + 5 req/10s burst covers the retry pattern) | The 1 GB/hr ceiling directly blocks bandwidth attacks |

Once `/result` is open it acts as a "traffic amplifier": an attacker holding the key does not care about the number of requests, only about how many bytes each request can pull. **Limiting count without limiting bandwidth gives no real protection.**

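The attacker column of the quantitative analysis can be reproduced with a few lines of arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), which is what makes 60 req/min × 100 MB come out to exactly 800 Mbps; `egressAtRequestRate` is a hypothetical helper, not part of the codebase.

```javascript
const MB = 1e6;
const GB = 1e9;
const TB = 1e12;

// Hypothetical helper: converts a request rate and a per-request size into egress figures.
function egressAtRequestRate({ reqPerMin, bytesPerReq }) {
  const bytesPerMin = reqPerMin * bytesPerReq;
  return {
    gbPerMin: bytesPerMin / GB,
    mbps: (bytesPerMin * 8) / 60 / 1e6,      // bits per second, in megabits
    tbPerDay: (bytesPerMin * 60 * 24) / TB,
  };
}

// The rejected single-tier design: 60 req/min, each pulling a 100 MB NEF.
const singleTier = egressAtRequestRate({ reqPerMin: 60, bytesPerReq: 100 * MB });
console.log(singleTier); // gbPerMin 6, mbps 800, tbPerDay 8.64
```

Plugging in the sustained tier alone (20 req/min) still gives 2 GB/min, which is why the request limits are paired with a byte-based quota.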
### 9.2 Summary of limit axes

| Limit axis | Value | Purpose | bucket key |
|--------|------|------|-----------|
| **Burst rate** | 5 req / 10 sec | Blocks short bursts | `token_fingerprint` |
| **Sustained rate** | 20 req / min | Covers visionA P95 normal load (120 req/min ÷ 10 callers ≈ 12 req/min/key, leaving about 1.7× headroom); blocks sustained mass requests | `token_fingerprint` |
| **Bandwidth hourly** | 1 GB / hr | Blocks bulk NEF downloads (an attacker who saturates it gets 24 GB/day, a controllable cost) | `token_fingerprint` |
| **Bandwidth daily** | 6 GB / 24 hr | Blocks an attacker who pulls "exactly 1 GB every hour" to evade the hourly limit | `token_fingerprint` |

**The bucket key is `token_fingerprint` (SHA-256, already implemented in A.7)**:

- `clientId` is not used: under the current 1:1 trust every caller is `'visionA-service'`, so the buckets flatten out and cannot tell callers apart
- Under Phase 0.8b 1:1 trust, `token_fingerprint` is effectively the caller id; once Phase 2 introduces per-caller credentials it aligns automatically
- Forensics: audit logs already record `token_fingerprint`, so rate-limit statistics and forensics share a key and can be cross-correlated

### 9.3 Why a two-tier request limit

A single tier cannot accommodate visionA's exponential-backoff retry pattern:

| Retry interval | Requests within a 10 sec window | Requests within a 1 min window |
|-----------|-----------------------|----------------------|
| 1s / 5s / 15s (visionA `ConverterClient.GetResult` default) | 1-2 | 3-4 |

- **Burst tier (5 req / 10s)**: allows retry bursts (legitimate retries are not rejected)
- **Sustained tier (20 req / min)**: blocks sustained high-frequency requests (an attacker who hammers steadily rather than in bursts)
- Both tiers are **in effect at the same time**: hitting either one returns 429

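To make the interaction of the two tiers concrete, here is a minimal sketch of a two-tier limiter built from two fixed-window counters. It is an illustration only and not the project's `createPerClientRateLimiter`: `createWindowLimiter` and `createTwoTierLimiter` are hypothetical names, the fixed-window algorithm is an assumption (the real factory may count differently), and the clock is passed in so the behavior can be checked deterministically.

```javascript
// Hypothetical fixed-window counter keyed by an arbitrary string.
function createWindowLimiter({ windowMs, max }) {
  const buckets = new Map(); // key -> { count, resetAt }
  return function tryConsume(key, now) {
    let bucket = buckets.get(key);
    if (!bucket || now >= bucket.resetAt) {
      bucket = { count: 0, resetAt: now + windowMs };
      buckets.set(key, bucket);
    }
    if (bucket.count >= max) return false;
    bucket.count += 1;
    return true;
  };
}

// Both tiers are checked on every request; the first one that rejects decides limit_type.
function createTwoTierLimiter() {
  const burst = createWindowLimiter({ windowMs: 10_000, max: 5 });
  const sustained = createWindowLimiter({ windowMs: 60_000, max: 20 });
  return function check(key, now = Date.now()) {
    if (!burst(key, now)) return { allowed: false, limit_type: 'burst' };
    if (!sustained(key, now)) return { allowed: false, limit_type: 'sustained' };
    return { allowed: true };
  };
}

// Usage: the 1s / 5s / 15s retry pattern passes, a sixth request in one burst does not.
const check = createTwoTierLimiter();
for (const t of [0, 1_000, 6_000, 21_000]) console.log(t, check('fp-retry', t).allowed);
for (let i = 0; i < 6; i++) console.log(i, check('fp-burst', 0));
```

One simplification to be aware of: a request rejected by the sustained tier has already consumed a burst slot in this sketch.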
### 9.4 Why the bandwidth quota is required

Request count alone cannot distinguish attack patterns:

| Scenario | Blocked by 20 req / min? | Actual bandwidth |
|------|-------------------|--------------|
| Normal user fetching one 100 MB NEF | ❌ Not blocked (legitimate) | 100 MB (legitimate) |
| Attacker at 20 req/min × 6 hr × 100 MB | ❌ Not blocked (stays right at the limit) | **720 GB / 6 hr ≈ $65 cloud egress / event** |

With the 1 GB/hr bandwidth quota added:

| Scenario | Blocked by the bandwidth quota? |
|------|----------------------|
| Normal user fetching one 100 MB NEF | ❌ Not blocked (legitimate within 10 NEFs/hr) |
| Attacker hitting 20 req/min with large files | ✅ 429 `bandwidth_quota_exceeded` after the 10th-11th NEF |
| Attacker pulling exactly 1 GB per hour to evade the hourly limit | ✅ The daily quota triggers after the 6th hour |

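The "10th-11th NEF" and "6th hour" figures follow directly from the quota sizes. The sketch below is a worked calculation under two assumptions: binary units (1 GiB = 1024 MiB, matching the `1024 * 1024 * 1024` defaults in §9.7) and a pre-check of the form `used + size > limit`, as in §9.8. `downloadsBeforeBlock` is a hypothetical helper.

```javascript
const MiB = 1024 * 1024;
const GiB = 1024 * MiB;

// Hypothetical helper: how many sequential downloads of `fileBytes` are admitted
// before the pre-check `used + fileBytes > limitBytes` rejects the next one.
function downloadsBeforeBlock(limitBytes, fileBytes) {
  if (fileBytes <= 0) throw new RangeError('fileBytes must be positive');
  let used = 0;
  let admitted = 0;
  while (used + fileBytes <= limitBytes) {
    used += fileBytes;
    admitted += 1;
  }
  return admitted;
}

// Hourly tier: 100 MiB files against a 1 GiB quota.
console.log(downloadsBeforeBlock(1 * GiB, 100 * MiB)); // 10, so the 11th request gets 429

// Daily tier: an attacker pulling exactly 1 GiB per hour against a 6 GiB quota.
console.log(downloadsBeforeBlock(6 * GiB, 1 * GiB)); // 6, so the 7th hour is blocked
```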
### 9.5 Design: reuse + new middleware

**Request-based rate limit (reuses the existing factory)**:

- Reuse the `createPerClientRateLimiter` factory from `src/middleware/perClientRateLimit.js`
- Create **2 independent limiter instances** (burst + sustained), both keyed by `token_fingerprint` (**a new `keyGenerator` must be injected**; the original factory uses `req.auth.clientId`)
- The factory interface does not change; it only gains a `keyGenerator` option as an injection point

**Bandwidth quota (new middleware)**:

- New file `src/middleware/resultBandwidthQuota.js` (it does not reuse perClientRateLimit because the semantics differ)
- In-memory counter (Phase 1 / Phase B: single-instance deployment, so a Map / plain object is enough)
- Two phases: pre-check + post-stream increment (see the implementation skeleton in §9.8)
- Must switch to Redis before the Phase 2 multi-instance deployment (backlog item #8)

### 9.6 Status code + response

**Request-based limit hit**:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
RateLimit-Limit: 20
RateLimit-Remaining: 0
RateLimit-Reset: 1700000000
Content-Type: application/json

```

**Bandwidth quota hit**:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 3600
Content-Type: application/json

```

**Why 429 rather than 503**:

- 429 (RFC 6585) = the client exceeded its request rate / quota; the client should slow down and back off exponentially
- 503 = the server is temporarily unavailable; the client should retry as-is without slowing down
- The semantics differ; visionA's retry logic must branch on the code

### 9.7 Wiring order + implementation skeleton

```javascript
// src/routes/v1/index.js
const resultBurstLimiter = createPerClientRateLimiter({
  windowMs: 10 * 1000,  // 10 sec
  max: 5,               // 5 req / 10s
  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown',  // new key generator
  errorDetails: { limit_type: 'burst' },
});
const resultSustainedLimiter = createPerClientRateLimiter({
  windowMs: 60 * 1000,  // 1 min
  max: 20,              // 20 req / min
  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown',
  errorDetails: { limit_type: 'sustained' },
});
const resultBandwidthQuota = createResultBandwidthQuota({
  hourlyLimitBytes: Number(process.env.RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES) || 1 * 1024 * 1024 * 1024,
  dailyLimitBytes: Number(process.env.RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES) || 6 * 1024 * 1024 * 1024,
  keyGenerator: (req) => req.auth?.tokenFingerprint || 'unknown',
});

router.use('/jobs/:id/result',
  requireApiKey(),          // 1. auth first; unauthenticated requests get 401
  resultBurstLimiter,       // 2. burst tier
  resultSustainedLimiter,   // 3. sustained tier
  resultBandwidthQuota,     // 4. bandwidth pre-check + post-stream increment
  resultStreamSemaphore,    // 5. concurrent stream cap (see §15)
  createResultRouter({ ... }));
```

**Ordering principle**: auth comes first (so that unauthenticated traffic does not consume quota slots); the request-based limits come before bandwidth (a request-limit check is cheaper than the bandwidth pre-check).

### 9.8 Bandwidth quota implementation skeleton

```javascript
// src/middleware/resultBandwidthQuota.js (new file)
function createResultBandwidthQuota({ hourlyLimitBytes, dailyLimitBytes, keyGenerator }) {
  // In-memory counters (switch to Redis in Phase 2)
  // Shape: Map<key, { hourlyBytes, hourlyResetAt, dailyBytes, dailyResetAt }>
  const counters = new Map();

  function getOrCreate(key) {
    const now = Date.now();
    let c = counters.get(key);
    if (!c) {
      c = { hourlyBytes: 0, hourlyResetAt: now + 3600_000,
            dailyBytes: 0, dailyResetAt: now + 86_400_000 };
      counters.set(key, c);
    }
    // Window reset
    if (now >= c.hourlyResetAt) { c.hourlyBytes = 0; c.hourlyResetAt = now + 3600_000; }
    if (now >= c.dailyResetAt)  { c.dailyBytes = 0;  c.dailyResetAt = now + 86_400_000; }
    return c;
  }

  return function middleware(req, res, next) {
    const key = keyGenerator(req);
    const c = getOrCreate(key);

    // Pre-check: estimate with Content-Length (from MinIO HEAD, stored in req.estimatedResultSize).
    // If the size is unknown at pre-check time, estimate conservatively with the maximum NEF size (e.g. 500 MB).
    // Note: the quota is actually charged when the stream ends; the pre-check only prevents
    // a single download from blowing past the quota in one go.
    const estSize = req.estimatedResultSize || 0;
    if (c.hourlyBytes + estSize > hourlyLimitBytes) {
      const retryAfterSec = Math.ceil((c.hourlyResetAt - Date.now()) / 1000);
      logAudit({ level: 'WARN', action: 'result.bandwidth_quota_exceeded',
                 limit_type: 'bandwidth_hourly', retry_after_seconds: retryAfterSec,
                 /* + the five A.7 fields + the four /result fields */ });
      res.setHeader('Retry-After', retryAfterSec);
      return next(new ApiError(429, 'bandwidth_quota_exceeded',
        '下載額度已用完,請稍後再試', { limit_type: 'bandwidth_hourly', retry_after_seconds: retryAfterSec }));
    }
    if (c.dailyBytes + estSize > dailyLimitBytes) {
      const retryAfterSec = Math.ceil((c.dailyResetAt - Date.now()) / 1000);
      // Same as above, with limit_type: 'bandwidth_daily'
      // ...
    }

    // Charge the bytes actually streamed once the response ends. 'close' fires after a normal
    // finish as well as after an abort, so interrupted and partial transfers are counted too.
    res.once('close', () => {
      const bytesStreamed = res._bytesStreamed || 0;  // accumulated by the handler in stream.on('data')
      c.hourlyBytes += bytesStreamed;
      c.dailyBytes += bytesStreamed;
    });

    next();
  };
}
```

**Why two phases (pre-check + post-stream)**:

- The pre-check prevents a one-shot overshoot: if 950 MB is already used, a further 200 MB request is rejected outright and no bandwidth is wasted
- The post-stream increment is the ground truth: only the bytes actually streamed (including interrupted and partial transfers) are charged
- In the worst case (an attacker fires several requests at once that each just pass the pre-check), the combination lets through at most N × max_size extra (N = concurrent stream cap = 10, see §15)

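The worst-case bound in the last bullet can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of the real middleware: `simulateConcurrentOvershoot` is a hypothetical function, every concurrent request is assumed to run its pre-check before any stream finishes, and the 500 MiB file size is the "maximum NEF size" example used in the skeleton's comments.

```javascript
const MiB = 1024 * 1024;
const GiB = 1024 * MiB;

// Hypothetical simulation: `concurrent` requests all pre-check against the same
// counter value, then all of them finish streaming and are charged afterwards.
function simulateConcurrentOvershoot({ limitBytes, usedBytes, fileBytes, concurrent }) {
  let admitted = 0;
  for (let i = 0; i < concurrent; i++) {
    // The counter has not moved yet, so each request sees the same usedBytes.
    if (usedBytes + fileBytes <= limitBytes) admitted += 1;
  }
  const finalUsed = usedBytes + admitted * fileBytes; // post-stream increments
  return { admitted, overshootBytes: Math.max(0, finalUsed - limitBytes) };
}

const worstCase = simulateConcurrentOvershoot({
  limitBytes: 1 * GiB,
  usedBytes: 0,
  fileBytes: 500 * MiB,
  concurrent: 10, // the concurrent stream cap from §15
});
console.log(worstCase.admitted);             // 10
console.log(worstCase.overshootBytes / MiB); // 3976, below the 10 × 500 MiB bound
```

The overshoot stays bounded because the concurrent stream cap limits how many requests can be between pre-check and charge at the same time.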
### 9.9 Limitations under multi-instance deployment

The store is currently in-memory (a per-process counter). Phase 1 / 0.8b deploys a single instance, so this is acceptable.

**Required before the Phase 2 multi-instance deployment** (raised to HIGH, see security.md backlog item #8):

- Switch to a Redis store (the perClientRateLimit factory already has an `opts.store` injection point; the bandwidth quota uses a Redis `INCRBY` + `EXPIRE` counter)
- Otherwise the quota is loosened by a factor equal to the instance count:
  - 2 instances × 20 req/min = 40 req/min effective quota
  - 2 instances × 1 GB/hr = 2 GB/hr effective bandwidth quota
- All four axes (burst / sustained / hourly / daily) are affected at once; switching only some of them is not acceptable

**Audit log for the Redis switch**: during the switch, record a `service.rate_limit_store_switched` event (with from / to / timestamp) for forensics.

### 9.10 Relationship to the Q4 audit log

Every limit hit must write an audit log entry (see the event list in §11):

- `result.rate_limited` (with `limit_type: 'burst' | 'sustained'`, `token_fingerprint`, `retry_after_seconds`)
- `result.bandwidth_quota_exceeded` (with `limit_type: 'bandwidth_hourly' | 'bandwidth_daily'`, `token_fingerprint`, cumulative bytes, retry_after_seconds)

Forensic use: clustering limit hits by fingerprint identifies attack patterns / abusive keys.

---

## 10. Range Header / Partial Download Protection (**strengthened after the 2026-05-17 Security review**)

> **Three items required by Security** (review §1 Q2 / Major M3):
> 1. The server **must** add an `Accept-Ranges: none` header to the response (stated explicitly, not omitted)
> 2. When a Range header is received, the server **silently ignores it and returns 200 with the full body** (no 416, no 206)
> 3. When a Range header is received, the audit log event `result.range_attempted` **must be written** (for forensics, INFO level)

### 10.1 Attack surface analysis

An HTTP `Range` request (RFC 7233) lets a client fetch a specific byte range of a file. For a large-file streaming endpoint such as `/result` this is a known attack vector:

| Attack vector | Description | Risk to this system |
|---------|------|--------------|
| **Existence probing** | The attacker sends `Range: bytes=0-0` to probe whether a file exists (fetching a tiny range to tell 200 apart from 410/404) | The 410 / 404 distinction already lets an attacker enumerate jobIDs. However, Phase 0.8b already accepts "holding the key = able to download any jobID" (security.md §Trust Boundary), so the marginal risk of existence probing is close to 0 |
| **Range request DoS** | The attacker sends many small Range requests (1 byte each); each one triggers MinIO read overhead and amplifies server load | A real risk, but the §9 rate limits (5 req / 10 s burst, 20 req / min sustained) cap what a single client can send |
| **Overlapping range exhaustion** | Multiple ranges in a single request (`Range: bytes=0-100, 200-300, ...`); Range handling in servers has a CVE history (Apache CVE-2011-3192 / Nginx CVE-2017-7529, etc.) | If Range were implemented, multipart/byteranges responses would need careful handling, which increases the attack surface |
| **Slow Range pattern** | The attacker deliberately uses slow Range connections to hold the MinIO connection pool for a long time | Mitigated by the TCP layer + Node Stream backpressure, but multiple Range connections amplify it |

### 10.2 Design options

| Option | Description | Assessment |
|------|------|------|
| **A. No Range support (recommended)** | **Silently ignore** a received Range header and return 200 + the full stream | Simple, smallest attack surface |
| B. Support Range + protections | Implement single-range parsing, reject multi-range, add a minimum chunk size, rate-limit the Range count | visionA has no clear need for it; about 200 extra lines of code + tests; higher maintenance cost |
| C. Support Range + explicitly reject multi-range | Multi-range received → 416 Range Not Satisfiable | Partial mitigation; a single-range parser is still required |

### 10.3 Recommendation: Option A (no Range support)

**Rationale**:

1. **visionA does not need Range**:
   - `docs/autoflow/04-architecture/conversion.md` v0.6.1 §2.3: ConverterClient.GetResult is a one-shot download, not segmented
   - The visionA backend streams the NEF to the browser as soon as it receives it; there is no resume / seek requirement
2. **NEF size is within a reasonable range for a single-request stream**: typically < 100 MB, at the extreme < 500 MB; Node Stream + HTTP/1.1 chunked encoding handle this reliably
3. **The existing `minio.getObjectStream` is expected to return the full stream**: implementing Range would require passing byteRangeStart / byteRangeEnd to the MinIO client, which enlarges the API surface
4. **Smallest attack surface**: no Range header parsing, no multipart/byteranges responses, no CVE-history-aware parser

### 10.4 Implementation details

**Behavior when a Range header is received**:

```javascript
// src/routes/v1/result.js
router.get('/', async (req, res, next) => {
  // ... existing steps 1-5: fetch job / check status / expires / nefKey / MinIO stream

  // 6. Set headers
  res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
  if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength));
  res.setHeader('Content-Disposition', `attachment; filename="${buildFilename(job)}"`);

  // Important: Range requests are explicitly unsupported.
  //
  // Design: a received Range header is silently ignored; respond 200 + the full stream.
  // No 416: avoids letting an attacker probe via the 416 (Range understood) vs 200 (not) difference.
  // Never set 'Accept-Ranges: bytes': avoids hinting that the client may retry with Range.
  res.setHeader('Accept-Ranges', 'none');  // RFC 7233 §2.3: explicitly marks Range as unsupported

  // 7. Pipe the stream (existing)
  result.stream.pipe(res);
});
```

**Why not return 416**:

- 416 (Range Not Satisfiable) means "the Range is syntactically valid but does not fit the file's extent"
- If the client sends a Range header and we answer 416, the attacker learns that the server **can parse Range** (and merely refused this one)
- If the client sends a Range header and we silently ignore it and return 200 with the full stream, the attacker cannot tell whether the server understands Range
- The latter is safer (feature detection fails) and fully compatible with well-behaved clients (they handle a plain 200 correctly even when Range is not honored)

**Why set `Accept-Ranges: none` rather than omit the header**:

- RFC 7233 §2.3 lets a server send `Accept-Ranges: none` to advise clients not to attempt range requests, which states the support status **explicitly**
- `Accept-Ranges: none` tells the client clearly "do not try Range"
- If the header is omitted, a client may still try Range speculatively (by default HTTP clients assume it might be supported)

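The "ignore Range, always answer 200" behavior is easy to capture as a pure function that can be unit-tested without Express. The following is a minimal sketch; `buildResultResponseMeta` and its parameter names are hypothetical and only mirror the header logic shown in the handler above.

```javascript
// Hypothetical helper: derives status and headers for a /result response.
// The incoming Range header is only observed (for the audit log), never parsed.
function buildResultResponseMeta({ requestHeaders, contentType, contentLength, filename }) {
  const rangeAttempted = Boolean(requestHeaders && requestHeaders.range);
  const headers = {
    'Content-Type': contentType || 'application/octet-stream',
    'Content-Disposition': `attachment; filename="${filename}"`,
    'Accept-Ranges': 'none', // always present, never 'bytes'
  };
  if (contentLength) headers['Content-Length'] = String(contentLength);
  // Status is 200 regardless of Range: no 206, no 416, no Content-Range header.
  return { status: 200, headers, rangeAttempted };
}

// Usage: a probing request and a normal request get the same status and headers.
const probe = buildResultResponseMeta({
  requestHeaders: { range: 'bytes=0-0' },
  contentLength: 12345678,
  filename: 'yolov5s_720.nef',
});
console.log(probe.status, probe.headers['Accept-Ranges'], probe.rangeAttempted);
```

Returning `rangeAttempted` separately keeps the audit decision (§10.6) out of the header logic.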
### 10.5 Contract changes for visionA

**Notes added to the API spec**:
- A Range header sent by visionA never yields 206 Partial Content; the server always returns 200 with the full stream
- If visionA later has a real resume / seek requirement, this must be re-evaluated (Phase 2 candidate)

**To be documented in §2.2 Query / Body**: "None; a Range header is ignored if received"

### 10.6 Monitoring recommendations + mandatory audit event

When a Range header is received, the handler **must** write a separate audit event `result.range_attempted` (not merely add a boolean flag to `result.requested`):

```javascript
// After entering the handler, before any Range handling:
if (req.headers && req.headers.range) {
  logAudit({
    level: 'INFO',  // Not WARN: attacker probing is expected; this is a forensic baseline and should not trigger alerts
    action: 'result.range_attempted',
    // The five A.7 fields
    source_ip: req.ip,
    token_fingerprint: req.auth?.tokenFingerprint,
    request_id: req.requestId,
    http_method: 'GET',
    http_path: req.originalUrl,
    // /result-specific
    job_id: req.params.id,
    // Event-specific
    range_header_received: String(req.headers.range).slice(0, 100),  // truncated to 100 chars to limit log injection
  });
}
```

**Why INFO rather than WARN**:

- A Range header by itself is **not an attack** (it is standard HTTP/1.1 and many clients send it automatically)
- But a Range header on `/result` is an **anomalous signal** (visionA never sends one) → worth recording, not worth alerting on
- WARN is reserved for "real anomalies" (rate limited / stream timeout / minio failed)

**Normal vs anomalous patterns**:

- Normal: the `range_header_received` field almost never appears (visionA does not send Range)
- Anomalous: a sudden burst of `result.range_attempted` from the same token_fingerprint → an attacker may be probing Range support / trying different byte ranges

**Anomaly detection candidates** (Phase 2):

- alert: more than 10 `result.range_attempted` events within 1 hour for the same fingerprint → trigger manual review
- alert: many different Range values from the same fingerprint within a short time → trigger a forensic snapshot

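The first alert candidate (more than 10 `result.range_attempted` events per fingerprint within 1 hour) could be prototyped over parsed audit log lines as follows. This is only a sketch: `findRangeProbeSuspects` is a hypothetical function, and the `timestamp` field (ISO 8601 string) is an assumption about the log schema, since this section does not say how event time is recorded.

```javascript
// Hypothetical detector over already-parsed audit events.
// Assumes each event carries `action`, `token_fingerprint`, and an ISO `timestamp`.
function findRangeProbeSuspects(events, { now, windowMs = 3_600_000, threshold = 10 }) {
  const counts = new Map(); // token_fingerprint -> number of events inside the window
  for (const event of events) {
    if (event.action !== 'result.range_attempted') continue;
    const t = Date.parse(event.timestamp);
    if (Number.isNaN(t) || t > now || now - t >= windowMs) continue;
    counts.set(event.token_fingerprint, (counts.get(event.token_fingerprint) || 0) + 1);
  }
  // "More than `threshold`" matches the "> 10 within 1 hour" rule.
  return [...counts].filter(([, n]) => n > threshold).map(([fingerprint]) => fingerprint);
}

// Usage with made-up events: fp-a probes 11 times, fp-b exactly 10 times.
const now = Date.parse('2026-05-17T12:00:00Z');
const makeEvents = (fp, n) => Array.from({ length: n }, (_, i) => ({
  action: 'result.range_attempted',
  token_fingerprint: fp,
  timestamp: new Date(now - (i + 1) * 60_000).toISOString(),
}));
const events = [...makeEvents('fp-a', 11), ...makeEvents('fp-b', 10)];
console.log(findRangeProbeSuspects(events, { now })); // [ 'fp-a' ]
```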
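**Note: the original `result.requested` has been superseded by the new event list** (see §11; it is replaced by explicit terminal-state events such as `result.streamed` / `result.stream_error`, and there is no longer a `result.requested` entry event).

---
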
## 11. Audit Log (Phase B follows the A.7 pattern, **extended after the 2026-05-17 Security review**)

> **Required by Security** (review §1 Q4 / Minor m3):
> - Add 3 events (4 at the implementation level): `result.rate_limited`, `result.range_attempted`, `result.stream_timeout`, `result.bandwidth_quota_exceeded`
> - Every `result.*` event **must** include the five A.7 fields + the four /result-specific fields
> - Write 100% of events (no sampling): traffic is low (P95 < 1000 req/day), and bandwidth quota forensics needs complete data

### 11.1 Design principles

Aligned with the A.7 audit log pattern in `apiKeyMiddleware.js`:
- Structured JSON (stdout)
- Use `console.log` / `console.error` throughout (consistent with the existing audit log infra)
- Token contents are never written; the fingerprint is handled by the `requireApiKey` middleware, and the handler reads it from `req.auth.tokenFingerprint` and writes it into every audit event
- **Every event must include the five A.7 fields** (no omissions) + **every `/result` event must include the 4 endpoint-specific fields**

### 11.2 The five A.7 fields (required in every event)

| Field | Source | Why required |
|------|------|---------|
| `source_ip` | `req.ip` (trust proxy is configured) | Forensics: cluster attacker IPs |
| `token_fingerprint` | `req.auth.tokenFingerprint` (SHA-256, implemented in A.7) | Forensics: cluster attacks that use the same key + matches the rate limit / bandwidth quota bucket key |
| `request_id` | `req.requestId` (set by middleware) | Cross-event tracing (links `auth.api_key.authenticated` ↔ `result.*`) |
| `http_method` | `'GET'` (fixed) | A.7 alignment (written even though it is a fixed value, for consistent log analysis) |
| `http_path` | `req.originalUrl` | A.7 alignment; forensics to confirm the endpoint |

### 11.3 `/result` 特有四欄位(按事件類型必含或可選)
|
||
|
||
| 欄位 | 何時必含 | 何時可選 / 不適用 |
|
||
|------|--------|----------------|
|
||
| `job_id` | 所有事件(從 `req.params.id` 取)| — |
|
||
| `size_bytes` | `result.streamed`(成功)、`result.stream_error`、`result.client_closed`、`result.stream_timeout`(已 stream 多少)| 4xx 終態事件不適用(還沒開始 stream)|
|
||
| `duration_ms` | 所有「終態」事件(streamed / stream_error / client_closed / stream_timeout / not_*)| `range_attempted` 不適用(不是終態)|
|
||
| `stream_completed` | `result.streamed`(true)、`result.stream_error`(false)、`result.client_closed`(false)、`result.stream_timeout`(false)| 4xx 終態事件不適用 |
|
||
|
||
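To keep the five A.7 fields and `job_id` from being forgotten on any event, they can be assembled in one place. The following is a minimal sketch under stated assumptions: `buildResultAuditEvent` is a hypothetical helper name, and the `service` / `timestamp` keys simply mirror the sample log lines in §11.6.

```javascript
// Hypothetical helper (not in the source): builds one audit event with the
// five A.7 fields and job_id, then merges event-specific fields on top.
function buildResultAuditEvent(req, { level, action, ...extra }) {
  return {
    service: 'task-scheduler',
    timestamp: new Date().toISOString(),
    level,
    action,
    // A.7 five fields (§11.2)
    source_ip: req.ip,
    token_fingerprint: req.auth?.tokenFingerprint,
    request_id: req.requestId,
    http_method: 'GET',
    http_path: req.originalUrl,
    // /result-specific field that applies to every event (§11.3)
    job_id: req.params.id,
    // size_bytes / duration_ms / stream_completed and event-specific fields
    ...extra,
  };
}
```

A handler would then write `console.log(JSON.stringify(buildResultAuditEvent(req, { level: 'INFO', action: 'result.streamed', ... })))`, so only the per-event fields vary at each call site.
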
### 11.4 Event list (12 events, covering Security review Q4 plus the Architect's original design)

| Action | Level | Trigger | Required fields (beyond the A.7 five and the /result four) |
|--------|-------|---------|--------------------------------|
| `result.streamed` | INFO | Stream fully delivered (`stream.on('end')` and bytes = content_length) | `content_length` |
| `result.stream_error` | ERROR | Stream failed midway (MinIO disconnect / network) | `error_type`, `error_message` (truncated to 100 chars) |
| `result.client_closed` | INFO | Client disconnected on its own (`req.on('close')` with bytes < content_length) | n/a |
| `result.stream_timeout` | **WARN** | The 5-minute response stream timeout fired (**required by Security**) | `timeout_ms`, `bytes_streamed_at_timeout` |
| `result.not_found` | WARN | 404 `job_not_found` / `result_not_found` | `reason: 'job_not_found' \| 'no_nef_key'` |
| `result.not_completed` | WARN | 409 `job_not_completed` | `current_status` |
| `result.expired` | WARN | 410 `result_expired` | `expires_at`, `expired_by_ms` (now - expires_at) |
| `result.storage_unavailable` | ERROR | 502 `storage_unavailable` (MinIO unreachable / throws) | `error_name`, `error_code` (must **not** include the MinIO endpoint URL) |
| `result.rate_limited` | **WARN** | 429 rate limit hit, or 503 concurrent stream cap hit per §15.3 (**required by Security**) | `limit_type: 'burst' \| 'sustained' \| 'concurrent'`, `retry_after_seconds` |
| `result.bandwidth_quota_exceeded` | **WARN** | 429 bandwidth quota hit (**required by Security**) | `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'`, `bytes_used_in_window`, `retry_after_seconds` |
| `result.range_attempted` | **INFO** | Request carries a Range header (**required by Security**, forensic baseline) | `range_header_received` (sanitized, truncated to 100 chars) |
| `result.filename_assertion_failed` | ERROR | `buildFilename` assertion failed (**defense-in-depth**, see §13) | `expected_pattern`, `actual_filename` (sanitized and truncated) |

**Why `result.requested` was removed**:

- The original design used `result.requested` as an entry event and added `range_header_present: boolean` to signal Range detection
- New design:
  - `result.range_attempted` becomes a standalone forensic event (a clearer anomaly signal)
  - `auth.api_key.authenticated` (already written by A.7) covers the "caller arrived" record, so `result.requested` is redundant
  - Every request has exactly one **terminal event** (one of streamed / stream_error / client_closed / stream_timeout / not_* / rate_limited / bandwidth_quota_exceeded); `result.range_attempted` is an additional, non-terminal event (§11.3). Joining on `request_id` with `auth.api_key.authenticated` gives the full trace
  - Lower log volume (1 terminal event per request instead of 2 events, entry plus terminal)

### 11.5 `error_type` classification (stream interruption)

| `error_type` | Trigger | Level exception |
|-------------|------|-----------|
| `minio_disconnect` | MinIO stream emits an error / socket reset | n/a |
| `client_abort` | Client disconnects first (distinct from `client_closed`: `client_closed` is a req close, `stream_error` is a stream error emit) | n/a |
| `network` | Other network-layer errors (DNS / TLS) | n/a |
| `partial_stream` | Race condition where `streamCompleted=false` yet `res.on('finish')` fires: most likely the underlying socket flushes its buffered chunks and then emits `finish` after `res.destroy()` (a side effect of a client aborting the download or a slow network drain), or abnormal backpressure | **INFO** (overrides the ERROR in §11.4; this is expected client-side behaviour, not an attack signal) |
| `unknown` | Catch-all | n/a |

**Level exception rule**: §11.4 sets `result.stream_error` to ERROR by default, but the `error_type = partial_stream` race condition is expected client behaviour (neither a server fault nor an attack signal), so it is downgraded to INFO to avoid polluting the ERROR alert pipeline. Implemented in `apps/task-scheduler/src/routes/v1/result.js` (inside the `res.on('finish')` handler).

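The level exception rule can be expressed as a small pure function. This is an illustrative sketch only: `levelForStreamError` is a hypothetical name, and mapping any unrecognised value to `unknown` is an assumption based on the catch-all row above.

```javascript
// error_type values from the §11.5 table.
const KNOWN_ERROR_TYPES = new Set([
  'minio_disconnect', 'client_abort', 'network', 'partial_stream', 'unknown',
]);

// Hypothetical helper: normalises error_type and applies the level exception.
function levelForStreamError(errorType) {
  const normalized = KNOWN_ERROR_TYPES.has(errorType) ? errorType : 'unknown';
  // Only partial_stream is downgraded; every other type keeps the §11.4 default.
  const level = normalized === 'partial_stream' ? 'INFO' : 'ERROR';
  return { error_type: normalized, level };
}
```

Keeping the rule in one function means the ERROR alert pipeline and the handler cannot drift apart on which `error_type` is considered benign.
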
### 11.6 Audit log examples

**Happy path (successful stream)**:

```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"INFO","action":"auth.api_key.authenticated","request_id":"abc-123","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result"}
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:48Z","level":"INFO","action":"result.streamed","request_id":"abc-123","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","size_bytes":52428800,"duration_ms":3210,"stream_completed":true,"content_length":52428800}
```

**Rate limit hit**:

```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.rate_limited","request_id":"def-456","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":2,"limit_type":"burst","retry_after_seconds":10}
```

**Bandwidth quota hit**:

```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.bandwidth_quota_exceeded","request_id":"ghi-789","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":3,"limit_type":"bandwidth_hourly","bytes_used_in_window":1073741824,"retry_after_seconds":2847}
```

**Range probing**:

```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"INFO","action":"result.range_attempted","request_id":"jkl-012","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","range_header_received":"bytes=0-7"}
```

**Stream timeout**:

```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.stream_timeout","request_id":"mno-345","source_ip":"10.0.1.5","token_fingerprint":"sha256:b8e2...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","size_bytes":1024,"duration_ms":300000,"stream_completed":false,"timeout_ms":300000,"bytes_streamed_at_timeout":1024}
```

**Expired**:

```json
{"service":"task-scheduler","timestamp":"2026-05-17T01:23:45Z","level":"WARN","action":"result.expired","request_id":"pqr-678","source_ip":"10.0.1.5","token_fingerprint":"sha256:a3f9...","http_method":"GET","http_path":"/api/v1/jobs/job-xyz/result","job_id":"job-xyz","duration_ms":15,"expires_at":"2026-05-10T00:00:00Z","expired_by_ms":609825000}
```

### 11.7 What is never logged

Aligned with A.7 and the principles in `auth.md` §1.8:

- ❌ NEF binary content (not a single byte)
- ❌ The raw token (the fingerprint is already handled by the `requireApiKey` middleware)
- ❌ The full MinIO endpoint URL (avoids leaking infra topology)
- ❌ The full `Authorization` header value
- ❌ Stack traces (a truncated message is enough)
- ❌ More than 100 characters of the raw Range header (truncate and mark with `...`)

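The truncation rule in the last bullet could look like the sketch below. `truncateForLog` is a hypothetical helper; stripping control characters is an extra assumption on top of the 100-character cut, added because the section elsewhere mentions log injection.

```javascript
// Hypothetical helper: makes a client-supplied header value safe to log.
function truncateForLog(value, maxLen = 100) {
  // Drop ASCII control characters (including CR/LF) so the value
  // cannot break out of a single log line.
  const cleaned = String(value).replace(/[\u0000-\u001f\u007f]/g, '');
  // Keep at most maxLen characters and mark the cut with "...".
  return cleaned.length > maxLen ? `${cleaned.slice(0, maxLen)}...` : cleaned;
}
```

The same helper would fit `error_message` and `actual_filename`, which §11.4 also describes as truncated.
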
### 11.8 Sampling strategy

**Write 100% of events, no sampling**:

- Traffic is low (P95 < 1000 req/day, Phase B estimate)
- Bandwidth quota forensics need complete data (every byte counts toward the quota counter, so a missing log line is a forensic gap)
- Anomalous events (rate_limited / range_attempted / stream_timeout / bandwidth_quota_exceeded) are always written at 100% so they can be clustered

**If traffic later grows beyond 100k/day** (Phase 2 candidate):

- Consider sampling `result.streamed` down to 10%
- But **keep 100% of 4xx/5xx events and all anomalous events**

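If the Phase 2 sampling option is ever adopted, the rule "sample only `result.streamed`, keep everything else" is small enough to sketch. The function name, the option names, and the injectable `random` source are all assumptions made for illustration.

```javascript
// Hypothetical Phase 2 helper: decides whether an audit event is written.
// With the default rate of 1, every event is written (current Phase B behaviour).
function shouldWriteAuditEvent(event, { streamedSampleRate = 1, random = Math.random } = {}) {
  // Errors and anomalous events are never sampled away.
  if (event.action !== 'result.streamed') return true;
  // Math.random() returns a value in [0, 1), so a rate of 1 always passes.
  return random() < streamedSampleRate;
}
```

Passing `random` as a parameter keeps the decision deterministic in tests without touching the default behaviour.
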
### 11.9 Relationship to forensics

**Cross-event tracing**: join on `request_id` to link `auth.api_key.authenticated` (already written by the middleware) → `result.*` (the terminal event written by the handler).

**New: cross-fingerprint tracing**: use `token_fingerprint` to cluster all events from the same caller:

- 1 fingerprint + many `result.rate_limited` → identifies an abuser or a misconfigured caller
- 1 fingerprint + many `result.range_attempted` → identifies a Range probing attempt
- 1 fingerprint + many `result.bandwidth_quota_exceeded` → identifies a mass-download attempt
- 1 fingerprint + many `result.stream_timeout` → identifies a slowloris attack

---

## 12. Security Trade-off of Distinguishing 404 vs. 410

### 12.1 The problem

§4.1 defines the following "not found" cases:

| HTTP | error.code | Case |
|------|-----------|------|
| 404 | `job_not_found` | The jobID does not exist |
| 404 | `result_not_found` | The job completed but has no NEF |
| 410 | `result_expired` | The NEF has expired and been cleaned up |

**Security concern**: distinguishing 404 from 410 lets an attacker detect whether a jobID ever existed:

- Send jobID X → get 404 `job_not_found` → X **never existed**
- Send jobID Y → get 410 `result_expired` → Y **existed, but its NEF has expired**

→ An attacker can enumerate the jobID space, tell "unused" from "used-but-expired", and collect victim activity patterns.

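The distinction under discussion can be pictured as a lookup classifier. This is a sketch under loud assumptions: the job field names (`status`, `nef_key`, `expires_at`), the `'COMPLETED'` status value, and the order of the checks are hypothetical; only the HTTP codes and `error.code` values come from this document.

```javascript
// Hypothetical classifier: maps a job record to the response defined in §4.1.
function classifyResultLookup(job, nowMs = Date.now()) {
  if (!job) return { status: 404, code: 'job_not_found' };
  // 'COMPLETED' is an assumed status name, not confirmed by the source.
  if (job.status !== 'COMPLETED') return { status: 409, code: 'job_not_completed' };
  if (job.expires_at && Date.parse(job.expires_at) <= nowMs) {
    return { status: 410, code: 'result_expired' };
  }
  if (!job.nef_key) return { status: 404, code: 'result_not_found' };
  return { status: 200, code: 'ok' };
}
```

Collapsing the 410 branch into 404 is the one-line change the rest of this section weighs against its UX cost.
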
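### 12.2 Assessment: the risk to this system is close to 0

**Precondition**: Phase 0.8b has already accepted the risk model in `security.md §Trust Boundary`: **an attacker who obtains CONVERTER_API_KEY can download the NEF of any jobID** (per-job auth is Phase 2 candidate #12).

Under that precondition:

| Attacker capability | Marginal risk from distinguishing 404/410 |
|-------------|-------------------------------|
| Has the key and knows a valid jobID | Downloads the NEF directly; the 404/410 distinction adds no capability |
| Has the key but knows no valid jobID | jobIDs are UUIDv4 (128 bits, 122 of them random), so brute-force enumeration is infeasible (the 404/410 distinction does not help the attacker) |
| Has the key plus a leaked / guessable list of jobIDs | The distinction does reveal which IDs once existed, **but the attacker can already download directly**, so knowing whether a result has expired is worth very little |

→ **Under the current trust model, the marginal risk of distinguishing 404/410 is close to 0**.

### 12.3 The other side of the trade-off: UX / debugging value

Benefits of keeping 404 and 410 distinct:

- The visionA client can tell "the user supplied a wrong jobID" (404, prompt the user to double-check) from "the job has expired" (410, prompt the user to convert again); the precision of the resulting UX messages differs substantially
- TODO-v2 §4.1 is already fixed and the visionA backend's ConverterClient.GetResult already implements the corresponding mapping; **reverting to a unified 404 would break the visionA-side contract**
- Debug-friendly: seeing 410 vs. 404 in the logs immediately shows whether it was a lifecycle cleanup or a wrong jobID

### 12.4 Decision: **keep the TODO-v2 §4.1 spec unchanged** (404 and 410 stay separate)

**Rationale**:

1. Under the current trust model the risk added by the distinction is marginal; the attack surface is already accepted in §Trust Boundary
2. The UX / debugging value is high, and the visionA-side contract is already fixed
3. Phase 2 candidate #12 (per-job auth) is the real fix; once per-job auth lands, an attacker cannot download jobs they do not own, and the 404/410 question goes away on its own

### 12.5 Sync with Phase 2 candidates

`security.md` Phase 2 candidate #12, "`/result` per-job authorization", is marked **MEDIUM** priority (already raised in A.7 follow-up §4). This decision **stays in effect** until #12 is done; afterwards, unifying the error response as 404 can be considered (it removes the jobID enumeration surface and forms defense-in-depth together with #12).

### 12.6 Documentation items

- The TODO-v2 §4.1 spec is **unchanged**
- This trade-off goes into the `security.md` change history (the Architect adds a one-line entry at the next security.md update)
- The `result.not_found` / `result.expired` distinction in the audit log (§11) follows this spec

---
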
## 13. Investigation of the `source_filename` Write Point + Backend Acceptance Criteria

### 13.1 Findings

`buildFilename(job)` (§3.2) reads `job.source_filename`. **Grep result**:

```bash
grep -rn "source_filename" apps/task-scheduler/src
# Result: 0 matches
```

**Current state**: the createJob handler in `src/routes/v1/jobs.js` **never writes** a `source_filename` field into `jobRecord` (lines 721-774, where jobRecord is built). No write point exists anywhere along the Worker / legacy Web UI / API v1 chain.

**Related fields that are written**:

- `jobRecord.input.filename` (line 741): the **already sanitized** `safeFilename` (e.g. `model.onnx` → `model.onnx`, with special characters stripped)
- `safeModelFilename` comes from `sanitizeFilename(modelFile.originalname || 'model')` (validators/createJob.js:240)

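### 13.2 Design options

| Option | Description | Assessment |
|------|------|------|
| **A. Backend adds `source_filename` in the createJob handler** | jobRecord gains `source_filename: modelFile.originalname \|\| null` (**raw, unsanitized**) | Conflicts with §3.2's statement that "sourceFilename is already sanitized and is not sanitized again"; storing the raw originalname carries XSS / log injection risk |
| **B. Backend writes the sanitized stem** (recommended) | jobRecord gains `source_filename: input.safeFilename` (same value as `input.filename`, an already sanitized safe string) | Matches the §3.2 assumption; no security risk; redundant but semantically clear |
| C. `buildFilename` reads `job.input.filename` instead | Use the existing field directly, without adding `source_filename` | Smallest change; downside: `input.filename` means "safe name of the input file", not "source filename for the output", and a future switch to another source (e.g. metadata) would scatter changes across several places |
| D. Backend makes `buildFilename` degrade through a fallback | If `job.source_filename` is missing, read `job.input.filename`; fall back to `job_<jobId>.nef` only if both are missing | Good fault tolerance, but the implicit dependency makes debugging harder |
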
### 13.3 Recommendation: Option B, keeping D's fault tolerance

**Acceptance criteria for Backend task B1**:

#### B1.1 createJob handler writes `source_filename`

**File**: `apps/task-scheduler/src/routes/v1/jobs.js` lines 721-774 (jobRecord construction)

**Change**: add `source_filename` **above** the `input` object (at the top level of jobRecord):

```javascript
const jobRecord = {
  job_id: jobId,
  status: 'ONNX',
  // ... existing fields

  // New in Phase B: used by GET /jobs/:id/result to build the download filename.
  // The source is the already sanitized safeFilename (same value as input.filename;
  // redundant but semantically clear).
  // Why the raw modelFile.originalname is not stored:
  // - originalname may contain XSS payloads / control characters / path traversal patterns
  // - even though the browser never renders the Content-Disposition header,
  //   the value may still be echoed in logs / error messages
  // - the sanitized version is defense-in-depth
  source_filename: input.safeFilename,

  input: {
    filename: input.safeFilename,
    // ... existing
  },
  // ... other existing fields
};
```

#### B1.2 Acceptance criteria checklist

- [ ] A write point for `jobRecord.source_filename` exists (around line ~740, after `input.safeFilename` is obtained)
- [ ] The written value **must** be the sanitized string (`input.safeFilename`, not `modelFile.originalname`)
- [ ] The write happens before `claimActiveAndCreate` (during jobRecord construction, not as a later update)
- [ ] For existing jobs (without a `source_filename` field), the `buildFilename` fallback still works on read (§3.2 fallback logic)
- [ ] Unit tests cover:
  - Reading `source_filename` after it is written (happy path)
  - `buildFilename` fallback when `source_filename` is an empty string
  - `buildFilename` fallback when `source_filename` is undefined (backward compatibility, existing jobs)

#### B1.3 Confirming the `buildFilename` fault-tolerance logic

The §3.2 fallback logic is kept:

```javascript
function buildFilename(job) {
  const sourceFilename = job.source_filename || '';
  const platform = (job.platform || '').toLowerCase();
  // Strip a known model extension to get the stem.
  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
  if (stem && platform) {
    return `${stem}_${platform}.nef`;
  }
  // Fallback when either part is missing.
  return `job_${job.job_id || 'unknown'}.nef`;
}
```

**Backward compatibility**: an existing job without `source_filename` falls back to `job_<jobId>.nef` (no crash, and nothing beyond the jobID is leaked).

#### B1.4 Investigating the platform field (handled together)

`buildFilename` also reads `job.platform`. Grep under `apps/task-scheduler/src/`:

```bash
grep -rn "platform" apps/task-scheduler/src/routes/v1/jobs.js | grep -v "//"
```

**Backend B1 must verify**: that a write point for `job.platform` exists in the createJob handler (via `parameters.platform` or a top-level `job.platform`). The Architect did not verify this line by line within the grep range this time; Backend should confirm it as part of task B1. If it is also missing, apply the same acceptance criteria.

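### 13.4 Security considerations

- **Do not store the raw `originalname`**: the original filename may contain XSS payloads / control characters / RTL overrides / overlong strings
- **The sanitized version is already enforced**: `safeFilename` goes through `sanitizeFilename` (character whitelist, truncation to 200 characters, leading-dot removal; see `security.md §Input Validation`)
- **Content-Disposition header injection**: the filename is written into `Content-Disposition: attachment; filename="..."`, and an unescaped `"` could break the header; the sanitized version already forbids `"` and `\`, so it is safe
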
### 13.4a Defense-in-Depth: Content-Disposition Header Construction (**required by the 2026-05-17 Security review**)

> **Required by Security** (review §1 Q6 / Minor m1): even though `sanitizeFilename` already blocks `"` and `\`, the `Content-Disposition` header must still apply **explicit** quote-escaping, an RFC 5987 fallback, and a buildFilename assertion. This is defense-in-depth: it guards against bugs introduced by a later sanitizer upgrade.

#### (1) Belt-and-suspenders quote-escape

Even though the sanitizer should already block quotes and backslashes, the value **must still be explicitly escaped** when writing `Content-Disposition`:

```javascript
// Before setHeader:
const filename = buildFilename(job);
const escapedFilename = filename.replace(/[\\"]/g, '\\$&'); // prefix every \ and " with a backslash
```

**Why this is needed**:

- The sanitizer and setHeader live in different files; a future sanitizer upgrade may loosen the allowed characters (e.g. to permit Chinese), and without escaping on the setHeader side the header injection risk returns
- Defense-in-depth principle: each layer is responsible for the safety of its own boundary and does not rely on upstream

#### (2) RFC 5987 `filename*` fallback (reserved for future Unicode support)

In Phase B `sanitizeFilename` restricts names to ASCII alphanumerics plus `._-`, so non-ASCII cannot occur; the header construction nevertheless **reserves a hook for the `filename*` extended syntax**:

```javascript
res.setHeader('Content-Disposition',
  `attachment; filename="${escapedFilename}"; filename*=UTF-8''${encodeURIComponent(filename)}`);
```

**Why reserve it**:

- If Phase 2 loosens the sanitizer to allow Unicode (e.g. the Chinese filename `模型_kl720.nef`), the `filename*` parameter (RFC 5987) is required for browsers to display it correctly
- The hook is zero-cost: when Unicode is opened up in Phase 2, the header construction code needs no change
- No side effects in the current ASCII-only case: `filename` and `filename*` carry the same value; browsers prefer `filename*` when supported and otherwise fall back to `filename`

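One caveat for the Unicode case: `encodeURIComponent` leaves `'`, `(`, `)` and `*` unescaped, which RFC 5987 does not allow in an extended value, and Node's `setHeader` rejects header values containing characters outside Latin-1. The sketch below shows one way to cover both; `encodeRFC5987Value` and `buildContentDisposition` are hypothetical names, and replacing non-ASCII characters with `_` in the legacy `filename` parameter is an assumption, not something the source specifies.

```javascript
// Percent-encode a value for the RFC 5987 `filename*` parameter.
function encodeRFC5987Value(str) {
  // encodeURIComponent leaves ' ( ) * as-is, so encode those explicitly.
  return encodeURIComponent(str).replace(
    /['()*]/g,
    (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`,
  );
}

// Hypothetical helper: builds the full Content-Disposition value.
function buildContentDisposition(filename) {
  // Legacy quoted parameter: printable ASCII only, with \ and " backslash-escaped.
  const asciiFallback = filename
    .replace(/[^\x20-\x7e]/g, '_')
    .replace(/[\\"]/g, '\\$&');
  return `attachment; filename="${asciiFallback}"; filename*=UTF-8''${encodeRFC5987Value(filename)}`;
}
```

For today's ASCII-only whitelist both parameters carry the same value, so this behaves like the snippet above.
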
#### (3) buildFilename assertion (guards against bugs from a sanitizer upgrade)

Add a sanitization re-check assertion at the end of `buildFilename`:

```javascript
function buildFilename(job) {
  const sourceFilename = job.source_filename || '';
  const platform = (job.platform || '').toLowerCase();
  const stem = sourceFilename.replace(/\.(onnx|tflite|pb|h5|pt|pth)$/i, '');
  const candidate = (stem && platform)
    ? `${stem}_${platform}.nef`
    : `job_${job.job_id || 'unknown'}.nef`;

  // Defense-in-depth assertion:
  // make sure the buildFilename result still matches the whitelist
  // (catches bugs introduced by a sanitizer upgrade).
  if (!/^[A-Za-z0-9._-]+$/.test(candidate)) {
    // Should never happen; fail-secure: log, then fall back to the always-safe jobID-only name.
    logAudit({
      level: 'ERROR',
      action: 'result.filename_assertion_failed',
      // A.7 five fields + /result four fields
      // ...
      expected_pattern: '^[A-Za-z0-9._-]+$',
      actual_filename: candidate.slice(0, 100), // truncate to 100 chars to avoid log injection
    });
    return `job_${job.job_id || 'unknown'}.nef`; // always-safe fallback (a UUIDv4 always matches the whitelist)
  }

  return candidate;
}
```

**Why fail-secure rather than throw**:

- `/result` must return the NEF to visionA; throwing on an assertion failure would turn the whole request into a 502 and hurt legitimate users
- Falling back to `job_<jobId>.nef` (a UUIDv4 always matches the whitelist) lets the stream complete while the audit log records the anomaly
- The expected frequency is 0 (a triggered assertion means the upstream sanitizer has a bug that needs an immediate fix); the audit log is the alerting entry point

#### (4) Required Backend assertion test

Backend must add this unit test:

```javascript
describe('buildFilename assertion', () => {
  it('returns sanitized result when input is clean', () => {
    expect(buildFilename({ source_filename: 'model.onnx', platform: 'kl720', job_id: 'uuid' }))
      .toBe('model_kl720.nef');
  });

  it('falls back to job_<id>.nef when source_filename has invalid chars', () => {
    // Simulate an upstream sanitizer bug: a stem containing `"` (should not happen; defense-in-depth)
    const result = buildFilename({ source_filename: 'evil".onnx', platform: 'kl720', job_id: 'safe-uuid' });
    expect(result).toBe('job_safe-uuid.nef');
    // + assert that the audit log contains result.filename_assertion_failed
  });

  it('falls back when buildFilename result somehow contains invalid char', () => {
    // Hypothetical: platform contains an illegal character (should not happen)
    const result = buildFilename({ source_filename: 'model', platform: 'kl 720', job_id: 'safe-uuid' });
    expect(result).toBe('job_safe-uuid.nef');
  });
});
```

#### (5) Set the `Accept-Ranges: none` header alongside

Already specified in §10; set it together with Content-Disposition before the stream pipe in §6.2:

```javascript
// Before stream.pipe(res):
res.setHeader('Content-Type', result.contentType || 'application/octet-stream');
if (result.contentLength) res.setHeader('Content-Length', String(result.contentLength));
res.setHeader('Content-Disposition',
  `attachment; filename="${escapedFilename}"; filename*=UTF-8''${encodeURIComponent(filename)}`);
res.setHeader('Accept-Ranges', 'none'); // explicitly declare that Range is not supported
```

### 13.5 Impact on existing jobs

`source_filename` is a new field and **needs no migration**:

- Existing jobs (created before Phase B) have `source_filename === undefined` → `buildFilename` falls back to `job_<jobId>.nef`
- Jobs created after Phase B all have `source_filename` → normal stem-based filename
- Both kinds of job coexist for a transition period of about 7 days (it ends once the existing jobs expire and are cleaned up)

---

## 15. Streaming Resource Limits (**required by the 2026-05-17 Security review**)

> **Required by Security** (review §1 Q1 / Major M1 + M2): mitigations for the attack surface specific to a streaming endpoint. Mandatory at Phase B launch; cannot be deferred.

### 15.1 Attack surface and limits overview

| Attack surface | Limit | Default | Behaviour when triggered |
|--------|------|-------|---------|
| **Slowloris (slow reads hogging a connection)** | Stream response timeout | **5 minutes** (300_000 ms) | destroy res + destroy stream + log `result.stream_timeout` |
| **Connection exhaustion (many streams opened at once)** | Concurrent stream cap (per instance) | **10 concurrent streams** | 503 `service_busy` + `Retry-After: 30` + log `result.rate_limited` (limit_type: concurrent) |
| **Bandwidth abuse** | (see §9 bandwidth quota) | 1 GB/hr + 6 GB/24hr | 429 `bandwidth_quota_exceeded` (see §9) |
| **Disk I/O DoS** | (mitigated as a side effect of the concurrent stream cap) | n/a | n/a |

### 15.2 Stream Response Timeout (M1)

#### Why it is required

Node http server defaults:

- `server.timeout = 0` (**no limit**)
- `server.headersTimeout = 60_000ms` (covers header reception only)
- `server.requestTimeout = 300_000ms` (Node 18+, covers receiving the complete request)
- **There is no response write timeout**: an attacker who connects and then reads extremely slowly (1 byte/30s) can hold a Node socket plus a MinIO upstream connection for hours

#### Design

`res.setTimeout(STREAM_TIMEOUT_MS)`, default 5 minutes (300_000 ms).

**Rationale for 5 minutes** (quantified):

- Maximum NEF size: 500 MB (a reasonable upper bound; most are < 100 MB in practice)
- Minimum throughput tolerated by 5 min: 500 MB / 5 min = **100 MB/min ≈ 1.7 MB/s ≈ 13.3 Mbps**
- A legitimate client on a mid-range network (10 Mbps) moves about 375 MB in 5 min, which comfortably covers the typical < 100 MB NEF (about 80 s); only the 500 MB worst case needs ≥ 13.3 Mbps
- An attacker pulling a 500 MB file at < 13.3 Mbps is cut off by the timeout within 5 min
- For a client whose network is legitimately slow (e.g. 3G mobile), the 5-minute cap is still considered sufficient for typical small NEFs

**Overridable via env**: `RESULT_STREAM_TIMEOUT_MS` (default 300_000). It can be raised in Phase 2 if ultra-large (GB-scale) NEFs are supported.

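The throughput arithmetic behind the 5-minute figure can be checked with two one-line functions. The function names are illustrative, and the numbers use decimal units (1 MB = 8 Mb) as in the rationale above.

```javascript
// Minimum sustained throughput (Mbps) needed to finish `sizeMB` within `timeoutMs`.
function minThroughputMbps(sizeMB, timeoutMs) {
  return (sizeMB * 8) / (timeoutMs / 1000);
}

// Largest transfer (MB) a link of `linkMbps` can complete within `timeoutMs`.
function maxTransferMB(linkMbps, timeoutMs) {
  return (linkMbps / 8) * (timeoutMs / 1000);
}

console.log(minThroughputMbps(500, 300_000).toFixed(1)); // 13.3
console.log(maxTransferMB(10, 300_000)); // 375
```

Re-running these with a different `RESULT_STREAM_TIMEOUT_MS` gives the new minimum link speed for the largest supported NEF.
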
#### Implementation skeleton

```javascript
// After setHeader, before stream.pipe(res):
const STREAM_TIMEOUT_MS = Number(process.env.RESULT_STREAM_TIMEOUT_MS) || 300_000;
let bytesStreamed = 0;
const streamStartAt = Date.now();

// Accumulate streamed bytes (for the audit log and the bandwidth quota increment)
result.stream.on('data', (chunk) => {
  bytesStreamed += chunk.length;
});

// Arm the timeout
res.setTimeout(STREAM_TIMEOUT_MS, () => {
  logAudit({
    level: 'WARN',
    action: 'result.stream_timeout',
    // A.7 five fields
    source_ip: req.ip,
    token_fingerprint: req.auth?.tokenFingerprint,
    request_id: req.requestId,
    http_method: 'GET',
    http_path: req.originalUrl,
    // /result four fields
    job_id: jobId,
    size_bytes: bytesStreamed,
    duration_ms: Date.now() - streamStartAt,
    stream_completed: false,
    // event-specific
    timeout_ms: STREAM_TIMEOUT_MS,
    bytes_streamed_at_timeout: bytesStreamed,
  });
  // Tear down both ends: the upstream MinIO stream first, then the response
  if (result && result.stream && typeof result.stream.destroy === 'function') {
    result.stream.destroy();
  }
  if (!res.destroyed) res.destroy(new Error('Response stream timeout'));
});

// Also set req.setTimeout to the same value (covers both ends)
req.setTimeout(STREAM_TIMEOUT_MS);

// Then stream.pipe(res)
result.stream.pipe(res);
```

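One point worth verifying during implementation: in Node.js, `res.setTimeout` sets the underlying socket's timeout, which fires after a period of socket inactivity rather than after a fixed total duration. A client that keeps reading a trickle of bytes may therefore never trip it. If a hard 5-minute ceiling is the intent, an absolute deadline timer could sit alongside it. The sketch below is an assumption-laden illustration: `armStreamDeadline` and the injectable `timers` parameter are hypothetical names introduced here.

```javascript
// Hypothetical helper: enforces an absolute deadline on one response,
// independent of socket activity.
function armStreamDeadline(res, upstream, deadlineMs, onTimeout, timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle),
}) {
  const handle = timers.set(() => {
    onTimeout(); // e.g. write the result.stream_timeout audit event
    if (upstream && typeof upstream.destroy === 'function') upstream.destroy();
    if (!res.destroyed) res.destroy(new Error('Response stream deadline exceeded'));
  }, deadlineMs);

  // Cancel the timer as soon as the response ends for any reason.
  const cancel = () => timers.clear(handle);
  res.once('finish', cancel);
  res.once('close', cancel);
  return cancel;
}
```

The slow-reader integration test (one byte every few seconds) is a good way to tell which of the two behaviours the deployed handler actually has.
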
### 15.3 Concurrent Stream Cap (M2)

#### Why it is required, and why it is newly written rather than reusing `uploadConcurrency.js`

**Express + Node impose no per-process connection limit by default**:

- Node's `server.maxConnections` is unset by default (no limit)
- An attacker with a valid key can open 1000 `/result` connections at once, instantly exhausting the fd table (typically 1024-65536) and the MinIO upstream connections

**Why `uploadConcurrency.js` is not reused**:

| `uploadConcurrency.js` | `resultStreamConcurrency.js` (new) |
|----------------------|------------------------------|
| Enforces "the same job_id cannot be uploaded twice" (per-job key) | Enforces "at most N streams across the whole server" (global counter) |
| Semantics: mutual exclusion (one upload per job) | Semantics: capacity (the server serves at most N downloads at once) |
| 409 conflict when triggered | 503 service_busy + Retry-After when triggered |

The two have **different semantics** and should not share one middleware, but the **implementation structure can serve as a reference** (lock acquire / release, `res.once('close')` cleanup).

#### Design

| Parameter | Value | Rationale |
|------|---|------|
| `maxConcurrent` | **10** | Balance: 2× headroom over normal load (P95 < 5 concurrent streams, quantified in §9.1); the blast radius stays bounded (an attacker holding 10 slow connections under the M1 5-minute timeout occupies at most 10 × 5 min = 50 connection-minutes of fds and MinIO connections) |
| `retryAfterSeconds` | 30 | Short retry interval (some slots should free up within 30s after an attacker hits the wall) |

**Multi-instance scaling**: there is currently a single instance, so 10 is the absolute cap; with multiple instances in Phase 2 the cap either scales with the instance count (10 × N) or is kept global by switching to a Redis-based distributed semaphore.

#### Implementation skeleton

```javascript
// src/middleware/resultStreamConcurrency.js (new file)
// logAudit and ApiError are assumed to come from the project's existing modules.
function createResultStreamConcurrencyLimiter({ maxConcurrent, retryAfterSeconds }) {
  let activeStreams = 0;

  return {
    middleware(req, res, next) {
      if (activeStreams >= maxConcurrent) {
        logAudit({
          level: 'WARN',
          action: 'result.rate_limited',
          // A.7 five fields
          source_ip: req.ip,
          token_fingerprint: req.auth?.tokenFingerprint,
          request_id: req.requestId,
          http_method: 'GET',
          http_path: req.originalUrl,
          // /result four fields (only job_id applies; streaming has not started)
          job_id: req.params.id,
          // event-specific
          limit_type: 'concurrent',
          retry_after_seconds: retryAfterSeconds,
          active_streams_at_reject: activeStreams,
        });
        res.setHeader('Retry-After', retryAfterSeconds);
        return next(new ApiError(503, 'service_busy',
          '伺服器忙碌中,請稍後再試', // "Server is busy, please try again later"
          { limit_type: 'concurrent', retry_after_seconds: retryAfterSeconds }));
      }

      // Acquire a slot
      activeStreams++;
      let released = false;
      const release = () => {
        if (released) return; // idempotent: finish and close may both fire
        released = true;
        activeStreams--;
      };

      // Release the slot on response finish / close / error, whichever comes first
      res.once('finish', release);
      res.once('close', release);
      res.once('error', release);

      next();
    },

    // Read-only view of internal state for health checks / monitoring (never write to it directly)
    getActiveCount: () => activeStreams,
  };
}
```

**Limitations in multi-instance deployments**:

- The counter is currently in-memory (per process), which is acceptable for a single-instance deployment
- Required before a Phase 2 multi-instance deployment (already raised to HIGH, see security.md candidate #8): switch to a Redis-based distributed semaphore
- Without the switch: N instances × 10 gives an effective cap of N×10, and the blast radius grows N-fold

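The slot accounting at the heart of this middleware can be exercised without Express by driving plain `EventEmitter` objects in place of `res`. The sketch below is a reduced stand-in, not the middleware itself: `createSlotCounter` is a hypothetical name and the audit logging and 503 response are left out.

```javascript
const { EventEmitter } = require('node:events');

// Reduced stand-in for the limiter: acquire/release bookkeeping only.
function createSlotCounter(maxConcurrent) {
  let active = 0;
  return {
    tryAcquire(res) {
      if (active >= maxConcurrent) return false; // caller would answer 503
      active++;
      let released = false;
      const release = () => {
        if (released) return; // finish and close both fire on a normal response
        released = true;
        active--;
      };
      res.once('finish', release);
      res.once('close', release);
      res.once('error', release);
      return true;
    },
    getActiveCount: () => active,
  };
}

const slots = createSlotCounter(2);
const resA = new EventEmitter();
const resB = new EventEmitter();
slots.tryAcquire(resA);
slots.tryAcquire(resB);
const thirdAccepted = slots.tryAcquire(new EventEmitter());
resA.emit('finish');
resA.emit('close'); // second event must not release a second slot
console.log(thirdAccepted, slots.getActiveCount()); // false 1
```

The idempotent `release` is the detail that keeps the counter from drifting negative when both `finish` and `close` fire for the same response.
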
### 15.4 How the limits work together

The four mitigations form defense-in-depth:

| Attack vector | M1 stream timeout | M2 concurrent cap | §9 rate limit | §9 bandwidth quota |
|-----------|------------------|------------------|--------------|-----------------|
| **Slowloris** (10 slow connections × 6 hr) | ✅ each one cut off at 5 min | ✅ the cap of 10 blocks the 11th | n/a | n/a |
| **Connection exhaustion** (1000 connections that never read) | ✅ cut off at 5 min | ✅ the 11th gets an immediate 503 | n/a | n/a |
| **Mass download** (20 req/min × 100MB) | n/a | n/a | ✅ 20 req/min sustained cap | ✅ hits the 1 GB/hr wall |
| **Bandwidth abuse** (few requests, large files) | n/a | n/a | ❌ not blocked | ✅ hits the 1 GB/hr wall |
| **Burst attack** (10 req/sec spike) | n/a | n/a | ✅ 5 req/10s burst cap | n/a |

**Attack surface not covered**:

- **HTTP/2 stream multiplexing**: Phase 1 is still HTTP/1.1, so nothing is blocked for now. When H2 is adopted in Phase 2, `SETTINGS_MAX_CONCURRENT_STREAMS = 100` must be set explicitly (per TCP connection; this is the `maxConcurrentStreams` setting in Node's `http2` module)
- **Compression bomb**: NEF is already binary and the app does not gzip octet-stream responses (helmet itself performs no compression); confirm that nginx / the reverse proxy also does not gzip `Content-Type: application/octet-stream`
- **MinIO socketTimeout alignment**: Phase 2 candidate #16 (new, see security.md)

---

## 14. Consolidated Phase B Acceptance Criteria for Backend (**rewritten after the 2026-05-17 Security review**)

> **Important change**: the original B1-B9 acceptance criteria had to be extended because of the 4 Major + 3 Minor findings from the Security review. The new list is numbered **AC-1 through AC-12** (aligned with Security review §3 / 12 acceptance criteria) and is the single source of truth for the Backend implementer.
> The reviewer uses this as a checklist; any missing item → the PR is not accepted.

### 14.1 Middleware chain (AC-1 to AC-4)

Order: `requireApiKey → resultBurstLimiter → resultSustainedLimiter → resultBandwidthQuota → resultStreamSemaphore → handler` (quota / semaphore must come after auth so that unauthenticated traffic cannot occupy slots)

| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-1** | `/result` applies the `requireApiKey()` middleware | §2.3 | Wired in v1/index.js, consistent with jobs/promote; on failure → 401 + active socket.destroy() |
| **AC-2** | `/result` applies a two-tier rate limit | §9.2, §9.3, §9.5 | **burst**: 5 req / 10 sec + **sustained**: 20 req / min; the bucket key is `token_fingerprint` (not clientId); over limit → 429 `rate_limit_exceeded` + `Retry-After` + audit log `result.rate_limited` (with `limit_type: 'burst' \| 'sustained'`) |
| **AC-3** | `/result` applies a bandwidth quota | §9.4, §9.8 | **hourly**: 1 GB / hr + **daily**: 6 GB / 24hr per `token_fingerprint`; in-memory counter (Redis in Phase 2); pre-check + post-stream increment; over limit → 429 `bandwidth_quota_exceeded` + `Retry-After` + audit log (with `limit_type: 'bandwidth_hourly' \| 'bandwidth_daily'` and cumulative bytes) |
| **AC-4** | `/result` applies a concurrent stream cap | §15.3 | `MAX_CONCURRENT_RESULT_STREAMS = 10` (overridable via env); **newly written** `src/middleware/resultStreamConcurrency.js` (does **not** reuse `uploadConcurrency.js`; different semantics); release on `res.once('finish' / 'close' / 'error')`; over limit → 503 `service_busy` + `Retry-After: 30` + audit log `result.rate_limited` (with `limit_type: 'concurrent'`) |

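The two-tier limit in AC-2 can be sketched as an in-memory sliding-window check keyed by `token_fingerprint`. This is illustrative only: §9 (outside this excerpt) defines the real algorithm, so the sliding-window choice, the function name, and the decision not to count rejected requests are all assumptions.

```javascript
// Hypothetical sliding-window limiter; assumes sustained.windowMs >= burst.windowMs.
function createTwoTierLimiter({
  burst = { limit: 5, windowMs: 10_000 },
  sustained = { limit: 20, windowMs: 60_000 },
} = {}) {
  const hits = new Map(); // token_fingerprint -> timestamps (ms) of accepted requests

  return function check(tokenFingerprint, nowMs = Date.now()) {
    // Prune anything older than the longest window.
    const recent = (hits.get(tokenFingerprint) || [])
      .filter((t) => t > nowMs - sustained.windowMs);
    hits.set(tokenFingerprint, recent);

    for (const [limitType, tier] of [['burst', burst], ['sustained', sustained]]) {
      const inWindow = recent.filter((t) => t > nowMs - tier.windowMs);
      if (inWindow.length >= tier.limit) {
        // A slot frees up when the oldest hit in this window expires.
        const retryAfterSeconds = Math.ceil((inWindow[0] + tier.windowMs - nowMs) / 1000);
        return { allowed: false, limit_type: limitType, retry_after_seconds: retryAfterSeconds };
      }
    }
    recent.push(nowMs);
    return { allowed: true };
  };
}
```

A rejected result maps directly onto the 429 response and the `result.rate_limited` audit fields (`limit_type`, `retry_after_seconds`).
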
### 14.2 Range header handling (AC-5, AC-6)

| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-5** | Range header silently ignored, with explicit `Accept-Ranges: none` | §10.4 | The response headers must include `Accept-Ranges: none` (not omitted, not set to `bytes`); Range is not parsed; no 416 is returned; the MinIO request is not sliced |
| **AC-6** | Range header is written to the audit log as `result.range_attempted` | §10.6, §11.4 | When the request carries a Range header, an audit event must be written (INFO level, not WARN); it includes `range_header_received` (sanitized, truncated to 100 chars) + the A.7 five fields + `job_id` |

### 14.3 Streaming timeout / connection safety (AC-7, AC-8)

| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-7** | Stream response timeout of 5 minutes | §15.2 | `res.setTimeout(STREAM_TIMEOUT_MS)` (default 300_000 ms, overridable via env `RESULT_STREAM_TIMEOUT_MS`); also call `req.setTimeout` so both ends are covered; on timeout → `res.destroy()` + `result.stream.destroy()` + audit log `result.stream_timeout` |
| **AC-8** | Cleanup on stream end / interruption / client close | §4.4, §6.2 | `stream.on('error')`: destroy + log `result.stream_error` (with `stream_completed: false`, bytes < contentLength); `req.on('close')`: destroy stream + log `result.client_closed` (with `stream_completed: false`); `stream.on('end')` with bytes = contentLength → log `result.streamed` (with `stream_completed: true`) |

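The three outcomes in AC-8 can be captured as a pure decision function that the stream event handlers call. This is a sketch: `classifyStreamEnd` and the `cause` values are hypothetical, and treating an `end` with a byte mismatch as `result.stream_error` is an assumption the source does not spell out.

```javascript
// Hypothetical helper mirroring AC-8: picks the terminal audit event from
// how the stream ended and how many bytes were delivered.
function classifyStreamEnd({ cause, bytesStreamed, contentLength }) {
  if (cause === 'end' && bytesStreamed === contentLength) {
    return { action: 'result.streamed', level: 'INFO', stream_completed: true };
  }
  if (cause === 'client_close') {
    return { action: 'result.client_closed', level: 'INFO', stream_completed: false };
  }
  // Stream errors, and an 'end' that delivered fewer bytes than announced.
  return { action: 'result.stream_error', level: 'ERROR', stream_completed: false };
}
```

Keeping the decision separate from the event wiring makes it straightforward to assert that each request produces exactly one terminal event.
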
### 14.4 Audit log completeness (AC-9, AC-10)

| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-9** | Every `result.*` event carries the A.7 five fields + the /result four fields | §11.2, §11.3 | **A.7 five fields**: `source_ip`, `token_fingerprint`, `request_id`, `http_method`, `http_path` (required on every event); **/result four fields**: `job_id` (all events) + `size_bytes` / `duration_ms` / `stream_completed` (required by event type) |
| **AC-10** | All 12 audit events implemented | §11.4 | `result.streamed` / `result.stream_error` / `result.client_closed` / `result.stream_timeout` / `result.not_found` / `result.not_completed` / `result.expired` / `result.storage_unavailable` / `result.rate_limited` / `result.bandwidth_quota_exceeded` / `result.range_attempted` / `result.filename_assertion_failed`: **12 in total** (the 4 required by the Security review, `rate_limited` / `bandwidth_quota_exceeded` / `range_attempted` / `stream_timeout`, + 7 from the Architect's original design + 1 filename assertion) |

### 14.5 Filename / response headers (AC-11, AC-12)

| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **AC-11** | `Content-Disposition` defense-in-depth | §13.4a | (1) quote-escape: `filename.replace(/[\\"]/g, '\\$&')`; (2) RFC 5987 fallback: `filename*=UTF-8''${encodeURIComponent(filename)}`; (3) buildFilename assertion: `/^[A-Za-z0-9._-]+$/.test(candidate)`; on assertion failure → fail-secure fallback to `job_<jobId>.nef` + audit log `result.filename_assertion_failed`; (4) unit tests cover the assertion |
| **AC-12** | The response never sets `Accept-Ranges: bytes` | §10.4, §14.2 AC-5 | The response header must be an explicit `Accept-Ranges: none`; it must not be omitted and must not be set to `bytes` (deliberately repeats AC-5 for emphasis) |

### 14.6 Sub-acceptance: the existing `source_filename` write point (B1)

Carried over from the original B1; not a Security addition, but Backend still has to do it:

| # | Item | Section | Acceptance criteria |
|---|------|------|-------------------|
| **B1** | Write `source_filename` in the createJob handler | §13.3 | `jobRecord.source_filename = input.safeFilename` (line ~740) + unit tests covering the happy path / fallback |
| **B1.5** | Confirm the `job.platform` write point | §13.3 (B1.4) | Confirm by grep; add it if missing |

### 14.7 Integration Test Scenarios (6 required additions + the original happy-path test)

**5 required by Security review §3.6 + the original happy-path series**:

| # | Scenario | Related AC | Expected behavior |
|---|------|---------|---------|
| **IT-1** | Happy path: completed job + NEF present + not expired | n/a | 200 + full stream + correct Content-Type/Length/Disposition + `Accept-Ranges: none` + audit `result.streamed` (stream_completed: true) |
| **IT-2** | Rate limit burst: fire 6 req within 10 sec | AC-2 | The 6th returns 429 + `limit_type: 'burst'` + `Retry-After` + audit log |
| **IT-3** | Rate limit sustained: steady 21 req within 1 min | AC-2 | The 21st returns 429 + `limit_type: 'sustained'` + audit log |
| **IT-4** | Bandwidth quota hourly: cumulative downloads exceed 1 GB | AC-3 | Once over the limit, returns 429 `bandwidth_quota_exceeded` + `limit_type: 'bandwidth_hourly'` + audit log |
| **IT-5** | Range header probing: request carries `Range: bytes=0-7` | AC-5, AC-6 | Still returns **200 with the full body** (not 206, not 416) + `Accept-Ranges: none` + audit log `result.range_attempted` with `range_header_received: 'bytes=0-7'` |
| **IT-6** | Stream timeout: mock a slow-reading client (reads 1 byte every 5 s) | AC-7 | After 5 min the server destroys the connection + audit log `result.stream_timeout` with `timeout_ms: 300000` |
| **IT-7** | Concurrent stream cap: open 11 streams at once (mock slow stream) | AC-4 | The 11th immediately returns 503 `service_busy` + `Retry-After: 30` + audit log `result.rate_limited` (concurrent) |
| **IT-8** | Audit log forensics: cross-event traceability | AC-9, AC-10 | A single `request_id` links `auth.api_key.authenticated` → exactly one terminal `result.*` event; events carry the five A.7 fields + the four /result fields |
| **IT-9** | Filename assertion fallback: simulate an upstream sanitize bug that passes in a stem containing `"` | AC-11 | Returns 200 + filename is `job_<jobId>.nef` (not `evil".onnx_kl720.nef`) + audit log `result.filename_assertion_failed` |

Existing tests (kept):

- ❌ 401 (missing API key / wrong API key)
- ❌ 404 `job_not_found` (job ID does not exist)
- ❌ 404 `result_not_found` (completed but no NEF)
- ❌ 409 `job_not_completed` (status = ONNX / BIE / NEF / FAILED)
- ❌ 410 `result_expired` (`expires_at` in the past / MinIO `getObjectStream` returns null)
- ❌ 502 `storage_unavailable` (MinIO throws)
- ❌ 503 `service_unavailable` (`CONVERTER_API_KEY` not set)

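To make the IT-2 / IT-3 expectations concrete, here is a minimal in-memory two-tier limiter. It is a sketch under stated assumptions, not the project's middleware: `createTwoTierLimiter` is a hypothetical name, it uses a sliding log of timestamps per key, the clock is injectable so tests need no real waiting, and rejected requests are assumed not to count toward either window.

```javascript
// Hypothetical two-tier (burst + sustained) limiter keyed by e.g. token_fingerprint.
function createTwoTierLimiter({
  burstMax = 5,
  burstWindowMs = 10000,
  sustainedMax = 20,
  sustainedWindowMs = 60000,
  now = Date.now,
} = {}) {
  const hits = new Map(); // key -> timestamps of accepted requests

  return function check(key) {
    const t = now();
    // Drop timestamps that fell out of the longer (sustained) window.
    const recent = (hits.get(key) || []).filter((ts) => t - ts < sustainedWindowMs);
    const inBurst = recent.filter((ts) => t - ts < burstWindowMs).length;

    let limitType = null;
    if (inBurst >= burstMax) limitType = 'burst';
    else if (recent.length >= sustainedMax) limitType = 'sustained';

    if (limitType === null) recent.push(t); // only accepted requests are recorded
    hits.set(key, recent);

    return limitType === null
      ? { allowed: true }
      : { allowed: false, status: 429, limit_type: limitType };
  };
}
```

Under these defaults, six calls at the same instant trip `burst` on the sixth, and 21 calls spaced 2.9 s apart trip `sustained` on the 21st without ever tripping `burst`.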
### 14.8 Out of Phase B Scope (explicitly not doing)

| # | Item | Reason | Phase 2 candidate |
|---|------|------|------------------|
| ❌ | Per-job authorization (check whether the caller created the job) | With the current 1:1 trust + hard-coded client_id, a per-job check would still pass; per-caller credentials must come first | #12 (MEDIUM) |
| ❌ | Range request support (206 Partial Content) | No need on the visionA side; it enlarges the attack surface | Re-evaluate if a real need arises |
| ❌ | HMAC user_id signature | ADR-015 already decided against it | n/a |
| ❌ | Multiple caller credentials (separate API key per service) | Do it in Phase 2 once a second caller actually exists | #13 (LOW) |
| ❌ | Redis store for multi-instance rate limit / bandwidth quota / concurrent cap | Currently single instance; acceptable | #8 (**HIGH**, escalated) |
| ❌ | Align MinIO socketTimeout with the stream timeout | Evaluate in Phase 2 | Candidate #16 |
| ❌ | Return a uniform 404 for all 4xx (do not reveal lifecycle) | Must launch together with #12 per-job auth | Candidate #15 |

### 14.9 Environment Variables (Backend + DevOps in sync, new in Phase B)

| Env | Default | Purpose |
|-----|-------|------|
| `RESULT_STREAM_TIMEOUT_MS` | 300000 (5 min) | Stream response timeout (AC-7) |
| `MAX_CONCURRENT_RESULT_STREAMS` | 10 | Concurrent stream cap (AC-4) |
| `RESULT_RATE_LIMIT_BURST_PER_10S` | 5 | Burst rate limit (AC-2) |
| `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` | 20 | Sustained rate limit (AC-2) |
| `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` | 1073741824 (1 GB) | Hourly bandwidth quota (AC-3) |
| `RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES` | 6442450944 (6 GB) | Daily bandwidth quota (AC-3) |

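As an illustration of how `MAX_CONCURRENT_RESULT_STREAMS` could gate requests (AC-4 / IT-7), the sketch below uses an Express-style `(req, res, next)` signature with a per-instance counter. `createStreamCap` is a hypothetical name, and the JSON body shape `{ error: 'service_busy' }` is an assumption; only the 503 status, the `service_busy` code, and `Retry-After: 30` come from this document.

```javascript
// Hypothetical per-instance concurrent stream cap.
function createStreamCap(max = 10, retryAfterSeconds = 30) {
  let active = 0;

  return function streamCap(req, res, next) {
    if (active >= max) {
      res.statusCode = 503;
      res.setHeader('Retry-After', String(retryAfterSeconds));
      res.end(JSON.stringify({ error: 'service_busy' })); // body shape is assumed
      return;
    }

    active += 1;
    let released = false;
    const release = () => {
      if (released) return; // 'finish' and 'close' may both fire
      released = true;
      active -= 1;
    };
    res.on('finish', release);
    res.on('close', release);
    next();
  };
}
```

Releasing on both `finish` and `close` means a client that disconnects mid-stream still frees its slot.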
### 14.10 What the Reviewer Must Verify After Phase B Completes

| # | Check | Method |
|---|--------|------|
| R1 | The `source_filename` write point exists and holds a sanitized string | grep `apps/task-scheduler/src/routes/v1/jobs.js` |
| R2 | No token or NEF binary appears in any log statement | grep `console.log\|console.error` + manual review |
| R3 | Two-tier rate limit (burst + sustained) + bandwidth quota + concurrent cap: all four middleware are mounted on `/result` and wired in the correct order | Inspect `router.use('/jobs/:id/result', ...)` in v1/index.js |
| R4 | Range header handling: not parsed, no 416 returned, `Accept-Ranges: none` set, audit log `result.range_attempted` written | Inspect the header section of the handler; grep `416` returns no hits |
| R5 | Stream response timeout of 5 min + audit log | grep `res.setTimeout` + grep `result.stream_timeout` |
| R6 | Concurrent stream cap = 10 + a newly written middleware (not reusing uploadConcurrency) | grep `resultStreamConcurrency` |
| R7 | All 12 audit log actions are written + each carries the five A.7 fields + the four /result fields | grep `action: 'result\.` yields at least 12 distinct action matches + spot-check 3 events for field completeness |
| R8 | Content-Disposition quote-escape + RFC 5987 + buildFilename assertion | Inspect setHeader / buildFilename + grep `result.filename_assertion_failed` |
| R9 | The bandwidth quota bucket key uses `token_fingerprint` (not clientId) | Inspect the injected keyGenerator |
| R10 | OpenAPI spec updated with the new status codes such as 429 / 503 | Check `docs/openapi.yaml` |
| R11 | 6 new integration tests (IT-2 through IT-7) all written + the original 4xx test series kept | grep the test files |
| R12 | 6 new envs documented in README / deploy doc | Check README + `.env.example` |

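The Range behavior that R4 asks the reviewer to confirm can be pictured with this small sketch. `applyRangePolicy` and the `audit` callback are hypothetical; the point is only that the `Range` value is recorded but never parsed, and the status stays 200.

```javascript
// Hypothetical helper: advertise no range support, record probing, never parse.
function applyRangePolicy(req, res, audit) {
  res.setHeader('Accept-Ranges', 'none');
  const range = req.headers.range; // Node lower-cases incoming header names
  if (range !== undefined) {
    audit({ action: 'result.range_attempted', range_header_received: range });
  }
  return 200; // always the full body: not 206, not 416
}
```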
---

## 16. Env Naming Reference Table (**finalized by Architect 2026-05-18 to prevent three-way naming drift**)

### 16.1 Why this section exists

During the Phase 0.8b Phase B deploy, DevOps found that the Orchestrator prompt, the source code, and `env.example` §17 disagreed on env names (for example, the Orchestrator wrote `RESULT_RATE_LIMIT_SUSTAINED_MAX`, while the source code reads `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` plus an additional `_WINDOW_MS`). This section provides the **authoritative canonical name list**. From here on, every deployment setting, docker-compose env injection, `.env*` template generation, Orchestrator task assignment, and cross-document reference follows this section.

**Decision principle**: the names the source code actually reads win (changing code is riskier than changing deployment). This table has been cross-checked against `apps/task-scheduler/src/routes/v1/index.js` L116-169, `apps/task-scheduler/src/routes/v1/result.js` L60, `apps/task-scheduler/src/config.js` L112-296, `apps/task-scheduler/src/services/jobService.js` L69, `apps/task-scheduler/src/storage/{local,minio}.js`, `apps/task-scheduler/src/auth/apiKeyMiddleware.js`, and `apps/task-scheduler/env.example`.

### 16.2 Canonical name list (every env that task-scheduler reads)

#### 16.2.1 `/result` endpoint only (new in Phase 0.8b Phase B; within this spec's scope)

| Canonical env name | Required? | Default | Purpose | Source code reads |
|--------------------|-----------|---------|---------|-------------------|
| `RESULT_STREAM_TIMEOUT_MS` | optional | `300000` (5 min) | AC-7 stream response timeout (§14.3 / §15.2) | `src/routes/v1/result.js` L60 (read lazily by `getStreamTimeoutMs`) |
| `MAX_CONCURRENT_RESULT_STREAMS` | optional | `10` | AC-4 concurrent stream cap (§14.1 / §15.3); per-instance counter; over the cap → 503 + `Retry-After: 30` | `src/routes/v1/index.js` L128 (parsed by `parseEnvInt`, then passed to `createResultStreamConcurrencyLimiter`) |
| `RESULT_RATE_LIMIT_BURST_PER_10S` | optional | `5` | AC-2 burst rate limit `max` (§9.2 / §14.1); per `token_fingerprint` | `src/routes/v1/index.js` L160 |
| `RESULT_RATE_LIMIT_BURST_WINDOW_MS` | optional | `10000` (10 s) | AC-2 burst rate limit `windowMs`; paired with `_PER_10S` | `src/routes/v1/index.js` L159 |
| `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` | optional | `20` | AC-2 sustained rate limit `max` (§9.2 / §14.1); per `token_fingerprint` | `src/routes/v1/index.js` L169 |
| `RESULT_RATE_LIMIT_SUSTAINED_WINDOW_MS` | optional | `60000` (1 min) | AC-2 sustained rate limit `windowMs`; paired with `_PER_MIN` | `src/routes/v1/index.js` L167 |
| `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` | optional | `1073741824` (1 GB) | AC-3 hourly bandwidth quota (§9.4 / §14.1); per `token_fingerprint` | `src/routes/v1/index.js` L116 |
| `RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES` | optional | `6442450944` (6 GB) | AC-3 daily bandwidth quota; per `token_fingerprint` | `src/routes/v1/index.js` L119 |

**8 `RESULT_*` / `MAX_CONCURRENT_RESULT_STREAMS` variables in total. All 8 are optional (when unset, the fallback default in the source code applies).**

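The "optional with fallback default" behavior of these 8 variables can be sketched as below. This is not the project's `parseEnvInt`; `readResultEnv` is a hypothetical stand-in, and falling back to the default on a malformed or non-positive value is an assumption about the desired fail-soft behavior.

```javascript
// Defaults copied from the §16.2.1 table.
const RESULT_ENV_DEFAULTS = Object.freeze({
  RESULT_STREAM_TIMEOUT_MS: 300000,
  MAX_CONCURRENT_RESULT_STREAMS: 10,
  RESULT_RATE_LIMIT_BURST_PER_10S: 5,
  RESULT_RATE_LIMIT_BURST_WINDOW_MS: 10000,
  RESULT_RATE_LIMIT_SUSTAINED_PER_MIN: 20,
  RESULT_RATE_LIMIT_SUSTAINED_WINDOW_MS: 60000,
  RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES: 1073741824,
  RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES: 6442450944,
});

// Hypothetical reader: unset, malformed, or non-positive values use the default.
function readResultEnv(env = process.env) {
  const out = {};
  for (const [name, fallback] of Object.entries(RESULT_ENV_DEFAULTS)) {
    const raw = env[name];
    const text = typeof raw === 'string' ? raw.trim() : '';
    const parsed = /^\d+$/.test(text) ? Number(text) : NaN;
    out[name] = Number.isSafeInteger(parsed) && parsed > 0 ? parsed : fallback;
  }
  return out;
}
```

Calling `readResultEnv({})` yields exactly the defaults in the table, which is a convenient assertion for a config unit test.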
#### 16.2.2 Other envs read by task-scheduler (not `/result`-specific; listed to avoid later confusion)

| Canonical env name | Required? | Default | Purpose | Source code reads |
|--------------------|-----------|---------|---------|-------------------|
| `CONVERTER_API_KEY` | optional (warn-only) | `''` (empty string) | Pre-shared key that authenticates visionA → converter on the external API; when unset, `apiKeyMiddleware` always returns 503 `service_unavailable` | `src/config.js` L167 + `src/auth/apiKeyMiddleware.js` L234 |
| `TRUST_PROXY` | optional | `'loopback'` | Express `app.set('trust proxy', ...)`; affects `req.ip` and the audit log `source_ip` (accepted values: boolean / integer / string keyword / CIDR) | `src/config.js` L145 + `src/app.js` |
| `PORT` | optional | `4000` | HTTP listen port | server entry (bypasses config.js) |
| `NODE_ENV` | optional | `'development'` | Affects behavior such as forcing HTTPS on the FAA URL | `src/config.js` L212 |
| `LOG_LEVEL` | optional | `'info'` | Log level | server entry (bypasses config.js) |
| `REDIS_URL` | optional | `'redis://localhost:6379'` | Redis connection string | `src/redis.js` L24 |
| `FRONTEND_URL` | optional | `'http://localhost:3000'` | CORS origin | `src/app.js` L59 |
| `JOB_DATA_DIR` | optional | `'/data/jobs'` | Volume path shared by local storage and the worker | `src/services/jobService.js` L69, `src/storage/local.js` L23 |
| `STORAGE_BACKEND` | optional | `'local'` | `'local'` / `'minio'` | `src/app.js` L170, `src/storage/minio.js` L39, `src/routes/v1/jobs.js` L945 |
| `MINIO_ENDPOINT_URL` | conditional (required when `STORAGE_BACKEND=minio`) | `'http://192.168.0.130:9000'` | MinIO endpoint | `src/storage/minio.js` L40 |
| `MINIO_BUCKET` | conditional | `'convertet-working-space'` | MinIO bucket name | `src/storage/minio.js` L41 |
| `MINIO_ACCESS_KEY` | conditional | `'convuser'` | MinIO access key | `src/storage/minio.js` L42 |
| `MINIO_SECRET_KEY` | conditional | `''` (empty string) | MinIO secret key | `src/storage/minio.js` L43 |
| `MINIO_REGION` | conditional | `'us-east-1'` | MinIO region | `src/storage/minio.js` L44 |
| `MINIO_LIFECYCLE_DAYS` | optional | (TBD, set by the bucket lifecycle policy) | Bucket lifecycle in days; orphan cleanup | env.example §6 (not read by task-scheduler; used by the init script) |
| `MEMBER_CENTER_TOKEN_URL` | **required** | (none) | converter → FAA OAuth client token endpoint (missing → fail-fast) | `src/config.js` L112 |
| `KNERON_CONVERTER_CLIENT_ID` | **required** | (none) | converter OAuth client_id (missing → fail-fast) | `src/config.js` L116 |
| `KNERON_CONVERTER_CLIENT_SECRET` | **required** | (none) | converter OAuth client_secret (missing → fail-fast) | `src/config.js` L117 |
| `FILE_ACCESS_AGENT_BASE_URL` | **required** | (none) | FAA base URL; https is enforced in prod | `src/config.js` L197 |
| `FILE_ACCESS_AGENT_AUDIENCE` | **required** | (none) | FAA OAuth audience | `src/config.js` L198 |
| `PROMOTE_TIMEOUT_MS` | optional | `300000` (300 s) | FAA PUT timeout | `src/config.js` L220 + `src/app.js` L125 |
| `OAUTH_TOKEN_REFRESH_SKEW_MS` | optional | `60000` (60 s) | How many ms before `expiresAt` the OAuth token is proactively refreshed | `src/config.js` L225 |
| `OAUTH_TOKEN_TIMEOUT_MS` | optional | `10000` (10 s) | OAuth token endpoint timeout | `src/config.js` L228 |
| `MULTIPART_MODEL_MAX_BYTES` | optional | `524288000` (500 MB) | multer model file size limit | `src/config.js` L243 |
| `MULTIPART_REF_IMAGE_MAX_BYTES` | optional | `10485760` (10 MB) | Size limit for a single ref_image | `src/config.js` L252 |
| `MULTIPART_REF_IMAGES_MAX_COUNT` | optional | `100` | Maximum number of ref_images | `src/config.js` L261 |
| `MAX_CONCURRENT_UPLOADS` | optional | `5` | Maximum number of uploads in progress at once; over the cap → 503 + `Retry-After` | `src/config.js` L285 |
| `UPLOAD_RETRY_AFTER_SECONDS` | optional | `30` | `Retry-After` seconds returned when the upload cap is exceeded | `src/config.js` L291 |
| `API_V1_RATE_LIMIT_WINDOW_MS` | optional | `300000` (5 min) | Per-clientId rate limit window; clientId is currently hard-coded to `'visionA-service'` | env.example §15 (the middleware default can be overridden via deps) |
| `API_V1_RATE_LIMIT_MAX` | optional | `300` | Per-clientId rate limit max | env.example §15 |

### 16.3 Naming conventions (mandatory for any env added later)

- **Scope prefix**: `RESULT_*` for the `/result` endpoint; `UPLOAD_*` / `MULTIPART_*` for upload; `OAUTH_*` / `MEMBER_CENTER_*` / `KNERON_CONVERTER_*` for OAuth; `FILE_ACCESS_AGENT_*` for FAA; `MINIO_*` for MinIO; `CONVERTER_API_KEY` for API key authentication
- **Unit suffix**: time = `_MS` / `_SECONDS`; size = `_BYTES`; count = `_PER_<time unit>` / `_MAX_COUNT` (no mixing)
- **Rate limit naming**: `_PER_<period>` is the maximum count per unit of time; `_WINDOW_MS` is the sliding window size; the two appear as a pair (`_PER_10S` + `_BURST_WINDOW_MS=10000` is not a violation, because the unit comes from the window)
- **Bandwidth quota naming**: `_QUOTA_PER_<period>_BYTES`, with the period written as a word (`HOUR` / `DAY`, not `1H` / `24H`)
- **Concurrent cap naming**: `MAX_CONCURRENT_<scope>` or `<scope>_CONCURRENT_MAX` (`MAX_CONCURRENT_UPLOADS` + `MAX_CONCURRENT_RESULT_STREAMS` already exist in this document and will not change)

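### 16.4 Known non-standard names (accepted; do not rename)

The following names do not fully match the §16.3 rules, but they are already spread across source code + tests + docs, renaming is costly, and their meaning is clear, so they **stay as they are**:

- `MAX_CONCURRENT_RESULT_STREAMS` (in theory it should be `RESULT_CONCURRENT_STREAM_MAX`, but `MAX_CONCURRENT_UPLOADS` already uses this pattern, so it stays consistent)
- `MAX_CONCURRENT_UPLOADS` (same as above)
- `KNERON_CONVERTER_CLIENT_ID` / `_SECRET` (in theory it should be `OAUTH_CONVERTER_*`, but this is the converter's identity name within the OAuth system and has historical context)
- `MEMBER_CENTER_TOKEN_URL` (service-name prefix rather than scope prefix, but the meaning is unambiguous)

Any env added in the future must follow §16.3; no further exceptions will be added.
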
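### 16.5 Maintainer responsibilities for this section

- **Architect**: finalizes names + maintains this table; Backend must clear any new env against this table first (a PR that changes `env.example` updates this table in the same PR)
- **Backend**: when adding or removing a `process.env.*` read in source code, the PR must also update the "Source code reads" column of this table
- **DevOps**: deployment / docker-compose env injection treats this table as the sole authority; deployment settings where source code and this table disagree are not accepted
- **Orchestrator**: when assigning tasks to Backend / DevOps, env names must be quoted from this table (not written from memory)
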
### 16.6 Three-way alignment status (inconsistencies found at the 2026-05-18 deploy + resolution status)

| Inconsistency (at deploy time) | Previously used in Orchestrator prompt | Source code actually reads | env.example §17 | Decision |
|----------------------|--------------------------|-------------------|-----------------|------|
| Sustained rate limit | `RESULT_RATE_LIMIT_SUSTAINED_MAX` | `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` + `_WINDOW_MS` | `RESULT_RATE_LIMIT_SUSTAINED_PER_MIN` + `_WINDOW_MS` | Source code wins (aligned) |
| Bandwidth quota | `RESULT_BANDWIDTH_HOURLY_QUOTA` | `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` + `_PER_DAY_BYTES` | `RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES` + `_PER_DAY_BYTES` | Source code wins (aligned) |
| Concurrent cap | `RESULT_CONCURRENT_STREAM_MAX` | `MAX_CONCURRENT_RESULT_STREAMS` | `MAX_CONCURRENT_RESULT_STREAMS` | Source code wins (aligned) |
| Stream timeout | `RESULT_STREAM_TIMEOUT_MS` | `RESULT_STREAM_TIMEOUT_MS` | `RESULT_STREAM_TIMEOUT_MS` | All three already matched |

**Conclusion**: the source code and `env.example` §17 are already fully aligned (the only differences were in the prompt descriptions the Orchestrator used when assigning tasks). Backend and DevOps do not need to touch source code / docker-compose / env.example; going forward, the Orchestrator only needs to follow this table when assigning tasks.
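One way to catch this kind of drift before deploy is a small text scan against the canonical list. The sketch below is purely illustrative: `findNonCanonicalResultEnvs` is a hypothetical helper, it only covers the 8 `/result` names from §16.2.1, and it assumes the input is plain text such as a prompt, a compose file, or an `.env` template.

```javascript
// Canonical `/result` env names from §16.2.1.
const CANONICAL_RESULT_ENVS = new Set([
  'RESULT_STREAM_TIMEOUT_MS',
  'MAX_CONCURRENT_RESULT_STREAMS',
  'RESULT_RATE_LIMIT_BURST_PER_10S',
  'RESULT_RATE_LIMIT_BURST_WINDOW_MS',
  'RESULT_RATE_LIMIT_SUSTAINED_PER_MIN',
  'RESULT_RATE_LIMIT_SUSTAINED_WINDOW_MS',
  'RESULT_BANDWIDTH_QUOTA_PER_HOUR_BYTES',
  'RESULT_BANDWIDTH_QUOTA_PER_DAY_BYTES',
]);

// Hypothetical check: list RESULT_* style names in `text` that are not canonical.
function findNonCanonicalResultEnvs(text) {
  const pattern = /\b(?:RESULT_[A-Z0-9_]+|MAX_CONCURRENT_RESULT_STREAMS)\b/g;
  const tokens = text.match(pattern) || [];
  return [...new Set(tokens)].filter((name) => !CANONICAL_RESULT_ENVS.has(name));
}
```

Run against the prompt wording from the table above, it would flag `RESULT_RATE_LIMIT_SUSTAINED_MAX`, `RESULT_BANDWIDTH_HOURLY_QUOTA`, and `RESULT_CONCURRENT_STREAM_MAX` while letting `RESULT_STREAM_TIMEOUT_MS` through.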