Auth pillar: switch from an OAuth 2.0 resource server to a pre-shared API key (visionA ↔ converter 1:1 internal trust). Add a streaming endpoint, GET /api/v1/jobs/:id/result, so the visionA backend can relay NEF downloads.

Phase A (auth switch):
- Add apiKeyMiddleware (constant-time compare, tokenFingerprint, 4 audit events)
- Remove the OAuth middleware and JWKS (keep oauthClient for promote → FAA)
- Mount requireApiKey on 4 endpoints
- Add the TRUST_PROXY env and the Express trust proxy setting (forensic source_ip)

Phase B (/result endpoint):
- Streaming NEF download with a 5 min timeout and a concurrency cap of 10
- Two-tier rate limit (burst 5/10s + sustained 20/min)
- Bandwidth quota (1 GB/hr + 6 GB/24hr) by token_fingerprint
- Range header silently ignored + Accept-Ranges: none
- filename quote-escape + RFC 5987 fallback + sanitize
- 8 /result audit events (full forensic coverage)

Design evolution record: docs/TODO-visionA-integration-v2.md (5/2 OAuth → 5/16 API key → 5/16 download via converter; corresponds to visionA repo ADR-015/016)

Tests: 597 → 666 (+69), 29 suites all pass
Security: APPROVE WITH CONDITIONS (single-instance deployment, 6 new env vars, 24hr monitoring)
npm audit: 3 vuln → 0 (transitive AWS SDK xml chain)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Auth Design (Phase 0.8b)

> **Scope**: external API auth for visionA → converter; promote auth for converter → FAA.
>
> **Status**: rewritten in Phase 0.8b. visionA → converter now uses an API key; converter → FAA still uses OAuth client_credentials.
>
> **Related docs**: `design-doc.md` §3.2 / §3.3, and the Trust Boundary chapter of `security.md`.
>
> **Design evolution**: visionA repo `adr-015-server-to-server-api-key.md` v2.1 (why an API key is used).

---

## 1. visionA → Converter: API key middleware

### 1.1 Design overview

- **Header**: `Authorization: Bearer <CONVERTER_API_KEY>` (reuses the existing Bearer format)
- **Comparison**: constant-time compare with `crypto.timingSafeEqual` (guards against timing attacks)
- **Length**: 64 hex chars (`openssl rand -hex 32`)
- **Failure behavior**: 401 `invalid_token` plus an active `socket.destroy()` (carries over the M2 behavior of the OAuth middleware)
- **req.auth shape**: set to fixed values once the check passes (no scope check)

### 1.2 Middleware interface

```javascript
// src/auth/apiKeyMiddleware.js
function requireApiKey(deps = {}) {
  // deps.expectedApiKey can be injected for tests; production lazy-loads it from config
  return function apiKeyMiddleware(req, res, next) { /* ... */ };
}
```

**Usage** (replaces the existing `requireAuth(scope)`):

```javascript
const { requireApiKey } = require('../../auth/apiKeyMiddleware');

// Replaces requireAuth(config.converter.scopeWrite)
router.post('/jobs', requireApiKey(), perClientLimiter, handler);
router.get('/jobs', requireApiKey(), perClientLimiter, handler);
router.get('/jobs/:id', requireApiKey(), perClientLimiter, handler);
router.post('/jobs/:id/promote', requireApiKey(), perClientLimiter, handler);
router.get('/jobs/:id/result', requireApiKey(), perClientLimiter, handler);
```

### 1.3 req.auth shape (after a successful check)

```javascript
req.auth = {
  sub: 'visionA-service',
  clientId: 'visionA-service',
  tenantId: null,
  scopes: ['converter:job.write', 'converter:job.read'],  // implicit full access
  raw: { authType: 'api_key' },
};
```

**Why this design**:

- A fixed `clientId` lets the existing per-client rate limiter and log infrastructure work unchanged.
- The two `scopes` values are a compatibility placeholder; downstream handlers no longer check them (the middleware no longer performs a scope check).
- `raw.authType: 'api_key'` classifies requests for logs and metrics; if OAuth is ever added back, this field distinguishes the two.

### 1.4 Failure scenarios

| Scenario | HTTP | error.code | Message |
|----------|------|------------|---------|
| Missing Authorization header | 401 | `invalid_token` | Missing or malformed Authorization header (must be `Bearer <token>`) |
| Authorization is not in Bearer format | 401 | `invalid_token` | Same as above |
| Token is an empty string | 401 | `invalid_token` | Same as above |
| Token does not match `CONVERTER_API_KEY` (constant-time compare) | 401 | `invalid_token` | API key verification failed |
| `CONVERTER_API_KEY` env not set (fail-fast) | 503 | `service_unavailable` | API key not configured |
| Any unexpected exception | 401 | `invalid_token` | API key verification failed (catch-all, so a 5xx cannot leak internal details) |

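The decision order in the table can be sketched as one pure function. This is a minimal illustration, not the actual middleware: `checkApiKey` and its return shape are hypothetical names, and it assumes the scheme is matched as the exact prefix `Bearer `.

```javascript
const crypto = require('crypto');

// Hypothetical helper: maps an Authorization header value to the outcome
// listed in the table. Returns { ok: true } or { ok: false, status, code }.
function checkApiKey(authorizationHeader, expectedApiKey) {
  if (!expectedApiKey) {
    // Key not configured: reject everything with 503.
    return { ok: false, status: 503, code: 'service_unavailable' };
  }
  try {
    const match = /^Bearer (.+)$/.exec(authorizationHeader || '');
    if (!match) {
      // Missing header, non-Bearer scheme, or empty token.
      return { ok: false, status: 401, code: 'invalid_token' };
    }
    const received = Buffer.from(match[1], 'utf8');
    const expected = Buffer.from(expectedApiKey, 'utf8');
    // Length guard first, because timingSafeEqual throws on unequal lengths.
    const same = received.length === expected.length &&
      crypto.timingSafeEqual(received, expected);
    return same ? { ok: true } : { ok: false, status: 401, code: 'invalid_token' };
  } catch (err) {
    // Catch-all: answer 401 rather than surfacing a 5xx.
    return { ok: false, status: 401, code: 'invalid_token' };
  }
}
```

Keeping the decision separate from the Express plumbing makes each row of the table a one-line unit test.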
### 1.5 Constant-time compare implementation

```javascript
const crypto = require('crypto');

function constantTimeEquals(a, b) {
  if (typeof a !== 'string' || typeof b !== 'string') return false;
  const bufA = Buffer.from(a, 'utf8');
  const bufB = Buffer.from(b, 'utf8');
  // Compare lengths first: timingSafeEqual throws when the lengths differ.
  if (bufA.length !== bufB.length) return false;
  return crypto.timingSafeEqual(bufA, bufB);
}
```

**Notes**:

- The length check must come first (`timingSafeEqual` throws a `RangeError` when the lengths differ).
- The length itself is not a secret (key length is public information; this project fixes it at 64 chars).
- Compare the full byte sequence; never truncate.

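A short usage sketch of the helper, repeated here so the snippet runs on its own. It shows the two properties the notes rely on: a length mismatch returns `false` through the guard, and calling `crypto.timingSafeEqual` directly on buffers of unequal length throws.

```javascript
const crypto = require('crypto');

function constantTimeEquals(a, b) {
  if (typeof a !== 'string' || typeof b !== 'string') return false;
  const bufA = Buffer.from(a, 'utf8');
  const bufB = Buffer.from(b, 'utf8');
  if (bufA.length !== bufB.length) return false;
  return crypto.timingSafeEqual(bufA, bufB);
}

const key = crypto.randomBytes(32).toString('hex');      // 64 hex chars
console.log(constantTimeEquals(key, key));               // true
console.log(constantTimeEquals(key, key.slice(0, 63)));  // false (length differs)

// Without the length guard, timingSafeEqual throws instead of returning false.
let threwOnLengthMismatch = false;
try {
  crypto.timingSafeEqual(Buffer.from('abc'), Buffer.from('abcd'));
} catch (err) {
  threwOnLengthMismatch = true;
}
console.log(threwOnLengthMismatch);                      // true
```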
### 1.6 Destroy-socket behavior (carried over from M2)

Aligned with `sendAuthError` in the existing `auth/middleware.js`:

1. Set the `Connection: close` header.
2. Write the response with `res.status(401).json({ error: {...} })`.
3. Use `res.once('finish', () => req.socket.destroy())` to close the connection once the response has been written.

**Why**: after a 401 the client may still be uploading a 500 MB body, and Node keeps reading that data into the socket buffer. Destroying the socket keeps this case from exhausting memory or bandwidth.

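The three steps can be sketched as follows. This is an assumption-laden illustration: the body of `sendApiKeyError` is a guess at the shape described above, and `makeFakes` is a hypothetical stand-in for Express `req` / `res` so the snippet runs without Express. Because the fake emits `finish` synchronously, the listener is registered before the response is written; with a real response the order of steps 2 and 3 does not matter.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch of sendApiKeyError; the real helper may differ.
function sendApiKeyError(req, res, status, code, message) {
  res.setHeader('Connection', 'close');                   // step 1
  res.once('finish', () => req.socket.destroy());         // step 3 (registered early)
  res.status(status).json({ error: { code, message } });  // step 2
}

// Minimal stand-ins for Express req/res (test doubles, not the Express API).
function makeFakes() {
  const res = new EventEmitter();
  res.headers = {};
  res.setHeader = (name, value) => { res.headers[name] = value; };
  res.status = (status) => { res.statusCode = status; return res; };
  res.json = (body) => { res.body = body; res.emit('finish'); return res; };
  const req = { socket: { destroyed: false, destroy() { this.destroyed = true; } } };
  return { req, res };
}

const { req, res } = makeFakes();
sendApiKeyError(req, res, 401, 'invalid_token', 'API key verification failed');
console.log(res.statusCode, res.headers.Connection, req.socket.destroyed);
// prints: 401 close true
```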
### 1.7 Fail-fast behavior (CONVERTER_API_KEY not set)

```javascript
if (!expected) {  // covers undefined, null, and the empty string
  // Log the misconfiguration (never print the key)
  console.error(JSON.stringify({
    level: 'ERROR',
    action: 'auth.api_key.not_configured',
    message: 'CONVERTER_API_KEY env not set; rejecting all requests',
  }));
  // Reject with 503; never silently allow
  return sendApiKeyError(req, res, 503, 'service_unavailable', 'API key not configured');
}
```

**Why not throw / process.exit**:

- Throwing at startup is undesirable: the legacy Web UI path runs in the same process and should stay usable.
- The external API must still be blocked: rejecting (a 403 or a 503; this middleware returns 503) is safer than silently allowing.

### 1.8 Logging rules

| Scenario | Log level | Fields |
|----------|-----------|--------|
| API key set at startup | INFO | `action: 'config.api_key_enabled'`, `api_key_length` (never the key itself) |
| API key not set at startup | WARN | `action: 'config.api_key_not_set'` |
| Middleware receives a request but the API key is not configured | ERROR | `action: 'auth.api_key.not_configured'` |
| Middleware verification fails | (individual failures are not logged, to avoid log injection; they are counted in metrics only) | - |
| Middleware verification succeeds | (not logged; the downstream handler logs the request) | - |
| Middleware catch-all exception | ERROR | `action: 'auth.api_key.unexpected_error'`, `error_message` truncated to 100 chars |

**Never log**:

- API key contents (neither the expected nor the received value)
- The full Authorization header
- Token / secret strings

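As a rough illustration of the rules above, the helpers below build two of the structured entries. The function names are hypothetical; only the `action` values and the `api_key_length` / `error_message` fields come from the table. Note that the key's length is logged but the key itself never enters the entry.

```javascript
// Hypothetical helper: startup log entry, recording only the key length.
function startupLogEntry(apiKey) {
  return apiKey
    ? { level: 'INFO', action: 'config.api_key_enabled', api_key_length: apiKey.length }
    : { level: 'WARN', action: 'config.api_key_not_set' };
}

// Hypothetical helper: catch-all entry with the message truncated to 100 chars.
function unexpectedErrorLogEntry(err) {
  const message = err && err.message ? String(err.message) : 'unknown error';
  return {
    level: 'ERROR',
    action: 'auth.api_key.unexpected_error',
    error_message: message.slice(0, 100),
  };
}

console.log(JSON.stringify(startupLogEntry('a'.repeat(64))));
console.log(JSON.stringify(startupLogEntry(undefined)));
console.log(JSON.stringify(unexpectedErrorLogEntry(new Error('x'.repeat(500)))));
```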
---

## 2. Converter → FAA: OAuth client_credentials (kept unchanged)

### 2.1 Scope

In the promote flow (`POST /api/v1/jobs/:id/promote`), Converter obtains a `files:upload.write` token under its own identity and PUTs the result file to FAA. **Phase 0.8b leaves this completely untouched.**

For detailed client behavior see the existing `apps/task-scheduler/src/auth/oauthClient.js` (**kept**); this section records only the architectural decisions.

### 2.2 Configuration

| Environment variable | Purpose | Phase 0.8b status |
|----------------------|---------|-------------------|
| `MEMBER_CENTER_TOKEN_URL` | Member Center (MC) token endpoint | **Kept** |
| `KNERON_CONVERTER_CLIENT_ID` | Client ID of Converter acting as a client | **Kept** |
| `KNERON_CONVERTER_CLIENT_SECRET` | Converter client secret | **Kept** |
| `FILE_ACCESS_AGENT_AUDIENCE` | FAA audience (used when requesting a token) | **Kept** |
| `FILE_ACCESS_AGENT_BASE_URL` | FAA API base URL | **Kept** |
| `PROMOTE_TIMEOUT_MS` | FAA PUT timeout | **Kept** |
| `OAUTH_TOKEN_REFRESH_SKEW_MS` | How many ms before `expiresAt` a cached token is proactively refreshed | **Kept** |
| `OAUTH_TOKEN_TIMEOUT_MS` | Network timeout for the token request | **Kept** |

### 2.3 Client behavior (unchanged)

- `grant_type=client_credentials`
- `Authorization: Basic base64(client_id:client_secret)` (RFC 6749 §2.3.1)
- `scope=files:upload.write`, `audience=<FAA aud>`
- Token cache: per scope; a cached token counts as valid while the time remaining until `expiresAt` exceeds `refreshSkewMs` (default 60 s)
- In-flight Promise dedup (concurrent requests for the same scope issue only one token request)
- AbortController timeout (default 10 s)
- Error classes: `OAuthClientError` (4xx, not retried) / `OAuthServerError` (5xx, retryable) / `OAuthTimeoutError` (network / timeout, retryable)
- FAA returns 401 → `invalidate(scope)` and retry once; still 401 → 503 `auth_service_unavailable`

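The cache-validity rule and the in-flight dedup can be sketched as below. This is not the code in `oauthClient.js`: `createTokenCache`, the injected `fetchToken` callback, and its `{ token, expiresInSec }` return shape are assumptions made for illustration. Only `getServiceToken`, `invalidate`, and the 60 s default skew come from this document.

```javascript
// Hypothetical sketch; the real oauthClient.js may be structured differently.
function createTokenCache({ fetchToken, refreshSkewMs = 60000, now = Date.now }) {
  const cache = new Map();     // scope -> { token, expiresAt }
  const inFlight = new Map();  // scope -> Promise<string>

  function getServiceToken(scope) {
    const hit = cache.get(scope);
    // Valid only while the remaining lifetime exceeds the refresh skew.
    if (hit && hit.expiresAt - now() > refreshSkewMs) {
      return Promise.resolve(hit.token);
    }
    // Concurrent callers for the same scope share one pending request.
    if (inFlight.has(scope)) return inFlight.get(scope);

    const pending = Promise.resolve()
      .then(() => fetchToken(scope))
      .then(({ token, expiresInSec }) => {
        cache.set(scope, { token, expiresAt: now() + expiresInSec * 1000 });
        return token;
      })
      .finally(() => inFlight.delete(scope));
    inFlight.set(scope, pending);
    return pending;
  }

  // Called after FAA answers 401, so the next call fetches a fresh token.
  function invalidate(scope) { cache.delete(scope); }

  return { getServiceToken, invalidate };
}

// Usage with a fake token endpoint that counts how often it is called.
let calls = 0;
const tokenCache = createTokenCache({
  fetchToken: async () => { calls += 1; return { token: 'tok-' + calls, expiresInSec: 3600 }; },
});
const first = tokenCache.getServiceToken('files:upload.write');
const second = tokenCache.getServiceToken('files:upload.write');
console.log(first === second);  // true: both callers share the in-flight promise
```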
---

## 3. Removal list (removed in Phase 0.8b)

| File / module | Action |
|---------------|--------|
| `src/auth/middleware.js` (OAuth resource server) | **Remove** |
| `src/auth/jwks.js` | **Remove** |
| `src/auth/middleware.test.js` | **Remove** |
| `src/auth/jwks.test.js` | **Remove** |
| `src/auth/oauthClient.js` | **Keep** (used by promote) |
| `src/auth/oauthClient.test.js` | **Keep** |
| In `src/config.js`: `MEMBER_CENTER_ISSUER` / `MEMBER_CENTER_JWKS_URL` / `KNERON_CONVERTER_AUDIENCE` / `JWKS_*` / `JWT_CLOCK_TOLERANCE_SEC` | **Remove** |
| In `src/config.js`: `MEMBER_CENTER_TOKEN_URL` / `KNERON_CONVERTER_CLIENT_*` / `FILE_ACCESS_AGENT_*` / `PROMOTE_TIMEOUT_MS` / `OAUTH_*` | **Keep** |
| New in `src/config.js`: `CONVERTER_API_KEY` | **Add** |
| `.env.example`: remove the OAuth resource server section, add a `CONVERTER_API_KEY=` placeholder | **Change** |
| `README.md` auth chapter (OAuth → API key) | **Change** |
| `docs/openapi.yaml` security scheme (OAuth → bearer / api_key) | **Change** |

### 3.1 Unit tests removed

- JWT expired / bad signature / wrong aud / wrong iss / unknown kid / insufficient scope / tenant_mismatch
- JWKS cache hit / miss / cooldown / algorithm pinning
- The 401 / 403 checks in the existing `routes/v1/jobs.test.js` → rewritten to test the API key 401

### 3.2 Unit tests added

- API key middleware:
  - Happy path (correct key → `next()` is called and `req.auth` is correct)
  - Missing Authorization header → 401
  - Authorization not in Bearer format → 401
  - Empty token → 401
  - Token mismatch → 401 (verifies constant-time behavior: a token with a differing prefix must still go through the full comparison)
  - API key not configured (env missing) → 503
  - Destroy-socket behavior (the socket is actually closed after the response is written)

---

## 4. CONVERTER_API_KEY management

### 4.1 Generation

```bash
openssl rand -hex 32
# Outputs 64 hex chars (32 random bytes = 256 bits of entropy, well above NIST's 112-bit minimum security strength)
```

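For environments without `openssl`, a key of the same shape can be produced from Node. This is a sketch under assumptions: `generateApiKey` and `isWellFormedApiKey` are hypothetical names, and the lowercase-hex format check is one possible startup validation rather than something this design prescribes.

```javascript
const crypto = require('crypto');

// 32 random bytes, hex-encoded: the same shape as `openssl rand -hex 32`.
function generateApiKey() {
  return crypto.randomBytes(32).toString('hex');
}

// Hypothetical startup check: exactly 64 lowercase hex chars.
function isWellFormedApiKey(key) {
  return typeof key === 'string' && /^[0-9a-f]{64}$/.test(key);
}

const newKey = generateApiKey();
console.log(newKey.length, isWellFormedApiKey(newKey));  // 64 true
```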
### 4.2 Deployment location

| Environment | Location |
|-------------|----------|
| dev | `apps/task-scheduler/.env` (gitignored) |
| stage | docker-compose env / k8s secret |
| prod | docker secret / k8s secret / cloud secrets manager |

### 4.3 Keeping both sides in sync

- visionA `.env.stage`: `VISIONA_CONVERTER_API_KEY=<same string>`
- converter `.env`: `CONVERTER_API_KEY=<same string>`
- **Both sides must hold exactly the same string.**

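### 4.4 Rotation procedure

1. Stop the deployment on both sides (or accept a short window of 401s).
2. Generate a new key with `openssl rand -hex 32`.
3. Update `.env` on both sides.
4. Redeploy converter first (so it accepts the new key).
5. Redeploy visionA second (so it calls with the new key).
6. Verify against an authenticated endpoint, since `/health` requires no auth and cannot confirm the key: `curl -H "Authorization: Bearer <NEW_KEY>" https://converter.../api/v1/jobs`

**Minimal-downtime approach** (< 1 minute): temporarily let converter accept both the old and the new key (extend the middleware to compare against an array), switch visionA to the new key, then remove the old key. Phase 0.8b does not implement this optimization and accepts the short window of 401s.
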
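Phase 0.8b does not implement the array compare, so the following is only a sketch of what that extension might look like. `matchesAnyKey` is a hypothetical name; it checks the received token against every accepted key without exiting early, so response timing does not reveal which slot matched.

```javascript
const crypto = require('crypto');

function constantTimeEquals(a, b) {
  if (typeof a !== 'string' || typeof b !== 'string') return false;
  const bufA = Buffer.from(a, 'utf8');
  const bufB = Buffer.from(b, 'utf8');
  if (bufA.length !== bufB.length) return false;
  return crypto.timingSafeEqual(bufA, bufB);
}

// Hypothetical: accept a token that equals any key in the list.
function matchesAnyKey(received, acceptedKeys) {
  let matched = false;
  for (const key of acceptedKeys) {
    // No early return: every key is compared on every call.
    if (constantTimeEquals(received, key)) matched = true;
  }
  return matched;
}

const oldKey = crypto.randomBytes(32).toString('hex');
const nextKey = crypto.randomBytes(32).toString('hex');
console.log(matchesAnyKey(oldKey, [oldKey, nextKey]));   // true during the overlap
console.log(matchesAnyKey(oldKey, [nextKey]));           // false once the old key is removed
```

During the overlap window the accepted list holds both keys; removing the old entry completes the rotation without a redeploy-order dependency.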
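### 4.5 Leak response

- Rotate the key on both sides immediately.
- Review the audit log for suspicious requests made before the rotation (trace by `request_id` + `user_id`).
- Report it if there is anomalous activity (for example, one `client_id` with 100+ distinct `user_id` values within a short period).
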
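The anomaly rule in the last bullet could be checked with a small scan over audit entries. Everything here beyond the "100+ distinct `user_id`" figure is an assumption: the entry shape `{ client_id, user_id, timestamp }`, the function name, and the 10-minute window standing in for "a short period". Fixed windows are used for brevity, so a burst that straddles a window boundary may be undercounted.

```javascript
// Hypothetical audit entry shape: { client_id, user_id, timestamp } (timestamp in ms).
// windowMs and threshold are assumed defaults, not values fixed by this design.
function findAnomalousClients(entries, { windowMs = 10 * 60 * 1000, threshold = 100 } = {}) {
  const buckets = new Map();  // "client_id|window index" -> Set of user_id
  for (const { client_id, user_id, timestamp } of entries) {
    const bucketKey = `${client_id}|${Math.floor(timestamp / windowMs)}`;
    if (!buckets.has(bucketKey)) buckets.set(bucketKey, new Set());
    buckets.get(bucketKey).add(user_id);
  }
  const flagged = new Set();
  for (const [bucketKey, users] of buckets) {
    if (users.size >= threshold) {
      flagged.add(bucketKey.slice(0, bucketKey.lastIndexOf('|')));
    }
  }
  return [...flagged];
}

// 100 distinct users from one client inside a single window trips the rule.
const sample = Array.from({ length: 100 }, (_, i) => ({
  client_id: 'visionA-service', user_id: 'user-' + i, timestamp: i * 1000,
}));
console.log(findAnomalousClients(sample));  // [ 'visionA-service' ]
```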
---

## 5. Relationship to the existing promote flow

```
visionA-backend → converter:
  POST /api/v1/jobs/:id/promote
  Authorization: Bearer <CONVERTER_API_KEY>     ← API key
       ↓
converter requireApiKey() middleware passes
       ↓
converter promote handler:
  1. Read the job from Redis
  2. status === 'COMPLETED' ?
  3. for each target:
     a. minio.headObject(sourceKey)
     b. oauthClient.getServiceToken('files:upload.write')   ← OAuth client (kept)
     c. faaClient.putFile(targetKey, stream, ...)
       ↓
Respond 200 + { promoted: [...] }
```

→ **The API key applies only at converter's external-facing layer**; internally, converter still talks to FAA with OAuth client_credentials.

---

## 6. Security checklist (Phase 0.8b)

- [x] Constant-time compare with `crypto.timingSafeEqual`
- [x] Compare lengths first to avoid a throw
- [x] Never log key contents (expected or received)
- [x] Fail-fast: never silently allow when the env var is not set
- [x] Destroy-socket behavior aligned with the existing OAuth middleware
- [x] req.auth shape matches what downstream handlers expect
- [x] OAuth client (promote) code left completely untouched
- [x] Secrets stay out of git (`.env` is already in .gitignore, but the Sec C1 history rewrite is still pending)
- [x] Logs are structured and contain no secrets
- [ ] **Backend acceptance at implementation time**: tests cover all of the scenarios above
- [ ] **Reviewer acceptance**: grep for `CONVERTER_API_KEY` and confirm that no log statement prints its value