jim800121chen d8a9517c9d feat(task-scheduler): Phase 0.8b — API key auth + /result endpoint
Switches the auth pillar from an OAuth 2.0 resource server to a pre-shared API key
(visionA ↔ converter 1:1 internal trust). Adds a streaming GET /api/v1/jobs/:id/result
endpoint so the visionA backend can relay NEF downloads.

Phase A (auth switch):
- Add apiKeyMiddleware (constant-time compare, tokenFingerprint, 4 audit events)
- Remove the OAuth middleware and JWKS (keep oauthClient for promote → FAA)
- Switch 4 endpoints to requireApiKey
- Add the TRUST_PROXY env and the Express trust proxy setting (forensic source_ip)

Phase B (/result endpoint):
- Streaming NEF download with a 5 min timeout and a concurrency cap of 10
- Two-tier rate limit (burst 5/10s + sustained 20/min)
- Bandwidth quota (1 GB/hr + 6 GB/24hr) by token_fingerprint
- Range header silently ignored + Accept-Ranges: none
- Filename quote-escape + RFC 5987 fallback + sanitize
- 8 /result audit events (complete forensic trail)

Design evolution log: docs/TODO-visionA-integration-v2.md (5/2 OAuth → 5/16 API key
→ 5/16 download via converter; see ADR-015/016 in the visionA repo)

Tests: 597 → 666 (+69), 29 suites all pass
Security: APPROVE WITH CONDITIONS (single-instance deployment, 6 new env vars, 24hr monitoring)
npm audit: 3 vuln → 0 (transitive AWS SDK xml chain)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:47:28 +08:00


# Infra Design
> **Status**: Phase 1 complete. Phase 0.8b only changes env vars; the Nginx / docker-compose structure is unchanged.
>
> **Companion docs**: `design-doc.md` §7, `auth.md` §4 (CONVERTER_API_KEY management).
---
## 1. Nginx dual-vhost routing
Keeps the Phase 1 design (**unchanged in Phase 0.8b**):
- **public vhost** (443, internet-facing): proxies only `/api/v1/*` and `/health`
- **internal vhost** (internal IP, port 80): proxies `/jobs/*`, `/queues/stats`, and the Web UI
```
┌───────────────────────────────────────────────────────────────────┐
│ Nginx (single process)                                            │
│                                                                   │
│ ┌───────────────────────────┐   ┌───────────────────────────────┐ │
│ │ server {                  │   │ server {                      │ │
│ │   listen 443 ssl;         │   │   listen 10.0.0.1:80;         │ │
│ │   server_name             │   │   server_name                 │ │
│ │     converter....com;     │   │     converter-internal...;    │ │
│ │                           │   │                               │ │
│ │   location /api/v1/ {}    │   │   location /jobs {}           │ │
│ │   location = /health {}   │   │   location /queues/stats {}   │ │
│ │   location / {            │   │   location / {                │ │
│ │     return 404;           │   │     proxy_pass web:3000;      │ │
│ │   }                       │   │   }                           │ │
│ │ }                         │   │ }                             │ │
│ │ (public vhost)            │   │ (internal vhost, internal IP) │ │
│ └─────────────┬─────────────┘   └───────────────┬───────────────┘ │
└───────────────┼─────────────────────────────────┼─────────────────┘
                │                                 │
                ▼                                 ▼
     ┌─────────────────────────────────────────────────────────┐
     │ Task Scheduler (:4000)                                  │
     │ - /api/v1/*      (API key protected; public vhost only) │
     │ - /jobs/*        (no auth; internal vhost only)         │
     │ - /jobs/*/events (SSE)                                  │
     │ - /health, /queues/stats                                │
     └─────────────────────────────────────────────────────────┘
```
---
## 2. Full Nginx configuration (unchanged)
```nginx
# /etc/nginx/conf.d/converter.conf
upstream scheduler_upstream {
    server scheduler:4000;
    keepalive 32;
}

# Public vhost
server {
    listen 443 ssl http2;
    server_name converter.innovedus.com;
    ssl_certificate /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    location /api/v1/ {
        proxy_pass http://scheduler_upstream;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_request_buffering off;  # stream large files
        proxy_read_timeout 300s;
        client_max_body_size 600M;    # slightly above the 500MB multipart limit
    }

    location = /health {
        proxy_pass http://scheduler_upstream;
    }

    location / {
        return 404 '{"error":{"code":"not_found","message":"Not found"}}';
        default_type application/json;
    }
}

# Internal vhost
server {
    listen 10.0.0.1:80;
    server_name converter-internal.innovedus.com;

    location /jobs {
        proxy_pass http://scheduler_upstream;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_buffering off;  # required for SSE
    }

    location /queues/stats {
        proxy_pass http://scheduler_upstream;
    }

    location / {
        proxy_pass http://web:3000;
    }
}
```
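The routing rules in the two `server` blocks can be written down as a small lookup, which may be handy as a review checklist. The sketch below is a simplified model, not Nginx's real matching algorithm: it only handles exact (`=`) and prefix locations, and the names `routeRequest`, `Vhost`, and `Target` are hypothetical.

```typescript
// Simplified model of the two vhosts in converter.conf (illustration only).
type Vhost = "public" | "internal";
type Target = "scheduler" | "web" | "404";

interface LocationRule {
  path: string;
  exact: boolean; // true for `location = /path`
  target: Target;
}

const RULES: Record<Vhost, LocationRule[]> = {
  public: [
    { path: "/api/v1/", exact: false, target: "scheduler" },
    { path: "/health", exact: true, target: "scheduler" },
    { path: "/", exact: false, target: "404" },
  ],
  internal: [
    { path: "/jobs", exact: false, target: "scheduler" },
    { path: "/queues/stats", exact: false, target: "scheduler" },
    { path: "/", exact: false, target: "web" },
  ],
};

// Exact matches win; otherwise the longest matching prefix wins.
function routeRequest(vhost: Vhost, requestPath: string): Target {
  const rules = RULES[vhost];
  const exact = rules.find((r) => r.exact && r.path === requestPath);
  if (exact) return exact.target;
  let best: LocationRule | undefined;
  for (const r of rules) {
    if (r.exact || !requestPath.startsWith(r.path)) continue;
    if (!best || r.path.length > best.path.length) best = r;
  }
  return best ? best.target : "404";
}

console.log(routeRequest("public", "/api/v1/jobs"));       // scheduler
console.log(routeRequest("public", "/jobs/123"));          // 404
console.log(routeRequest("internal", "/jobs/123/events")); // scheduler
```

The second call shows the property the design relies on: the unauthenticated `/jobs/*` routes are not reachable through the public vhost.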
---
## 3. docker-compose.yml environment variable changes
### 3.1 Removed in Phase 0.8b
```yaml
# External API auth no longer goes through OAuth
- MEMBER_CENTER_ISSUER
- MEMBER_CENTER_JWKS_URL
- KNERON_CONVERTER_AUDIENCE
- JWKS_CACHE_MAX_AGE_MS
- JWKS_COOLDOWN_MS
- JWT_CLOCK_TOLERANCE_SEC
```
### 3.2 Added in Phase 0.8b
```yaml
- CONVERTER_API_KEY=${CONVERTER_API_KEY} # 64 hex chars from `openssl rand -hex 32`
```
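Because the key has a fixed shape (64 hex characters), the service could reject a malformed value at startup instead of failing on the first request. A minimal sketch, assuming lowercase hex as produced by `openssl rand -hex 32`; `assertValidApiKey` is a hypothetical helper, not a function from the codebase.

```typescript
// Hypothetical startup check: fail fast when CONVERTER_API_KEY is malformed.
const API_KEY_PATTERN = /^[0-9a-f]{64}$/;

function assertValidApiKey(value: string | undefined): string {
  if (value === undefined || value === "") {
    throw new Error("CONVERTER_API_KEY is not set");
  }
  if (!API_KEY_PATTERN.test(value)) {
    // Never include the key itself in the error message or in logs.
    throw new Error("CONVERTER_API_KEY must be 64 lowercase hex characters");
  }
  return value;
}

// In the service this would receive process.env.CONVERTER_API_KEY;
// a literal is used here so the sketch runs standalone.
const checkedKey = assertValidApiKey("ab".repeat(32));
console.log(checkedKey.length); // 64
```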
### 3.3 Kept as-is (needed by promote)
```yaml
- MEMBER_CENTER_TOKEN_URL=${MEMBER_CENTER_TOKEN_URL}
- KNERON_CONVERTER_CLIENT_ID=${KNERON_CONVERTER_CLIENT_ID}
- KNERON_CONVERTER_CLIENT_SECRET=${KNERON_CONVERTER_CLIENT_SECRET}
- FILE_ACCESS_AGENT_BASE_URL=${FILE_ACCESS_AGENT_BASE_URL}
- FILE_ACCESS_AGENT_AUDIENCE=${FILE_ACCESS_AGENT_AUDIENCE}
- PROMOTE_TIMEOUT_MS=${PROMOTE_TIMEOUT_MS:-300000}
- OAUTH_TOKEN_REFRESH_SKEW_MS=${OAUTH_TOKEN_REFRESH_SKEW_MS:-60000}
- OAUTH_TOKEN_TIMEOUT_MS=${OAUTH_TOKEN_TIMEOUT_MS:-10000}
```
### 3.4 Existing (unchanged)
```yaml
- PORT=4000
- NODE_ENV=${NODE_ENV:-development}
- REDIS_URL=${REDIS_URL}
- STORAGE_BACKEND=minio
- MINIO_*
- CONVERTER_TENANT_ID=${CONVERTER_TENANT_ID:-} # still kept in Phase 0.8b (the promote flow may still use it)
- API_V1_RATE_LIMIT_WINDOW_MS=${API_V1_RATE_LIMIT_WINDOW_MS:-300000}
- API_V1_RATE_LIMIT_MAX=${API_V1_RATE_LIMIT_MAX:-300}
- MULTIPART_MODEL_MAX_BYTES=${MULTIPART_MODEL_MAX_BYTES:-524288000}
- MULTIPART_REF_IMAGE_MAX_BYTES=${MULTIPART_REF_IMAGE_MAX_BYTES:-10485760}
- MULTIPART_REF_IMAGES_MAX_COUNT=${MULTIPART_REF_IMAGES_MAX_COUNT:-100}
- MAX_CONCURRENT_UPLOADS=${MAX_CONCURRENT_UPLOADS:-5}
- UPLOAD_RETRY_AFTER_SECONDS=${UPLOAD_RETRY_AFTER_SECONDS:-30}
```
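The `${VAR:-default}` form used above substitutes the default when the variable is unset or empty. If the service also wants to guard against a non-numeric value slipping through, it could apply the same rule when reading the tunables. A minimal sketch; `envInt` is a hypothetical helper and the environment is passed in as a plain object so the example runs standalone.

```typescript
// Mirrors `${VAR:-default}`: fall back when the variable is unset or empty.
function envInt(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return parsed;
}

const sampleEnv: Record<string, string | undefined> = {
  MAX_CONCURRENT_UPLOADS: "8",
  UPLOAD_RETRY_AFTER_SECONDS: "",
};
console.log(envInt(sampleEnv, "MAX_CONCURRENT_UPLOADS", 5));             // 8
console.log(envInt(sampleEnv, "UPLOAD_RETRY_AFTER_SECONDS", 30));        // 30
console.log(envInt(sampleEnv, "MULTIPART_MODEL_MAX_BYTES", 524288000));  // 524288000
```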
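### 3.5 Why these were removed
| Env | Why it was removed | Phase 1 purpose |
|-----|--------------------|-----------------|
| `MEMBER_CENTER_ISSUER` | An API key has no issuer to verify | OAuth resource server verified the `iss` claim |
| `MEMBER_CENTER_JWKS_URL` | An API key needs no JWKS | OAuth JWT signature verification |
| `KNERON_CONVERTER_AUDIENCE` | An API key has no `aud` to verify | OAuth check that the token was issued for this service |
| `JWKS_*` | The JWKS cache no longer exists | Internal JWKS cache parameters |
| `JWT_CLOCK_TOLERANCE_SEC` | JWT verification no longer exists | Clock tolerance for the JWT `exp` check |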
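
Since these variables no longer have any effect, a deployment that still sets them is probably carrying a stale `.env`. A minimal sketch of a check that lists such leftovers; the variable names come from section 3.1, while `findStaleEnvVars` itself is a hypothetical helper.

```typescript
// Names taken from section 3.1; the helper itself is hypothetical.
const REMOVED_ENV_VARS = [
  "MEMBER_CENTER_ISSUER",
  "MEMBER_CENTER_JWKS_URL",
  "KNERON_CONVERTER_AUDIENCE",
  "JWKS_CACHE_MAX_AGE_MS",
  "JWKS_COOLDOWN_MS",
  "JWT_CLOCK_TOLERANCE_SEC",
];

// Returns the removed variables that are still set to a non-empty value.
function findStaleEnvVars(env: Record<string, string | undefined>): string[] {
  return REMOVED_ENV_VARS.filter((name) => (env[name] ?? "") !== "");
}

const leftovers = findStaleEnvVars({
  MEMBER_CENTER_ISSUER: "https://auth.example.test",
  CONVERTER_API_KEY: "set",
});
console.log(leftovers); // [ 'MEMBER_CENTER_ISSUER' ]
```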
---
## 4. `.env.example` changes
### 4.1 Removed sections (OAuth resource server)
```bash
# === OAuth (Member Center) === ← remove the whole section
MEMBER_CENTER_ISSUER=...
MEMBER_CENTER_JWKS_URL=...
# === Converter identity (Resource Server) === ← remove the whole section
KNERON_CONVERTER_AUDIENCE=...
# === JWKS cache === ← remove the whole section
JWKS_CACHE_MAX_AGE_MS=600000
JWKS_COOLDOWN_MS=30000
JWT_CLOCK_TOLERANCE_SEC=60
```
### 4.2 Added section
```bash
# === Phase 0.8b: API key for visionA → converter ===
# Generate 64 hex chars with `openssl rand -hex 32`
# Both sides must match: same value as VISIONA_CONVERTER_API_KEY in visionA's `.env.stage`
# Never put this in git / logs / Slack
CONVERTER_API_KEY=
```
### 4.3 Kept sections (unchanged; used by promote)
```bash
# === Member Center token endpoint (for converter → FAA promote) ===
MEMBER_CENTER_TOKEN_URL=https://auth.innovedus.com/oauth/token
# === Converter identity (OAuth client, for promote) ===
KNERON_CONVERTER_CLIENT_ID=kneron_converter
KNERON_CONVERTER_CLIENT_SECRET=change-me
# === File Access Agent ===
FILE_ACCESS_AGENT_BASE_URL=https://files.nas.internal
FILE_ACCESS_AGENT_AUDIENCE=file_access_api
# === Promote / OAuth Client tunables ===
PROMOTE_TIMEOUT_MS=300000
OAUTH_TOKEN_REFRESH_SKEW_MS=60000
OAUTH_TOKEN_TIMEOUT_MS=10000
# === Rate Limit ===
API_V1_RATE_LIMIT_WINDOW_MS=300000
API_V1_RATE_LIMIT_MAX=300
# === Multipart upload ===
MULTIPART_MODEL_MAX_BYTES=524288000
MULTIPART_REF_IMAGE_MAX_BYTES=10485760
MULTIPART_REF_IMAGES_MAX_COUNT=100
MAX_CONCURRENT_UPLOADS=5
UPLOAD_RETRY_AFTER_SECONDS=30
```
---
## 5. Deployment order (Phase 0.8b)
**Important**: deploying in the wrong order takes stage down entirely. The correct order:
```
Step 1: Finish and deploy the converter side first
  - Remove the OAuth middleware, add the API key middleware
  - Add the /result endpoint
  - Set the CONVERTER_API_KEY env
  - From this point the converter only accepts the API key externally (OAuth is removed)
  - The existing visionA stage still uses OAuth → it will hit 401
  ⚠️ Complete this step before visionA stage gets OAuth working (visionA's OAuth integration has not passed yet, so it already returns 401)
Step 2: Verify that the new converter endpoints work
  - curl GET /api/v1/jobs/<a completed job>/result with Bearer <CONVERTER_API_KEY>
  - Expect 200 + NEF binary stream
  - curl POST /api/v1/jobs with the same key
  - Expect 201 + job_id
Step 3: Deploy the visionA backend (already ready, commit 9e29ebf)
  - Align the VISIONA_CONVERTER_API_KEY env with CONVERTER_API_KEY
  - visionA calls the converter with the API key and uses the new GetResult endpoint
Step 4: e2e verification
  - User upload → init → poll → promote → download
  - All green = done
```
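The two checks in Step 2 can be captured as data so that the same expectations are reused by a script or a CI job. The sketch below only builds the request descriptions and does not send anything; `buildSmokeChecks` and `SmokeCheck` are hypothetical names, the host is a placeholder, and the multipart body that `POST /api/v1/jobs` needs is deliberately left out.

```typescript
// Describes one Step 2 verification request and the status it should return.
interface SmokeCheck {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  expectStatus: number;
}

function buildSmokeChecks(
  baseUrl: string,
  apiKey: string,
  completedJobId: string,
): SmokeCheck[] {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const headers = { Authorization: `Bearer ${apiKey}` };
  return [
    {
      method: "GET",
      url: `${root}/api/v1/jobs/${encodeURIComponent(completedJobId)}/result`,
      headers,
      expectStatus: 200, // NEF binary stream
    },
    {
      method: "POST",
      url: `${root}/api/v1/jobs`,
      headers,
      expectStatus: 201, // response carries job_id; request body not modeled here
    },
  ];
}

for (const check of buildSmokeChecks("https://converter.example.test", "<CONVERTER_API_KEY>", "job-123")) {
  console.log(check.method, check.url, check.expectStatus);
}
```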
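### 5.1 Note: stage status as of 5/9
Phase 1 OAuth never worked end to end on stage (the MC scope was never registered). The Phase 0.8b switch is therefore a **net positive** for real e2e (never worked → starts working). Stage cannot regress from "OAuth passed" to "401 after switching to the API key".

---
## 6. Security configuration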
### 6.1 CONVERTER_API_KEY
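See `auth.md` §4 for details.
Key points:
- One key per environment (dev / stage / prod)
- 64 hex chars (`openssl rand -hex 32`)
- Same value on both sides (visionA + converter)
- Never committed to git
- Rotation: manually sync `.env` on both sides, then redeploy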
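
One way to confirm that both sides hold the same key without pasting it anywhere is to compare a short fingerprint. The sketch below is an assumption for illustration: `keyFingerprint` takes the first 12 hex characters of SHA-256 over the key, which is not necessarily how the codebase's `tokenFingerprint` is computed, and `generateApiKey` is simply a Node equivalent of `openssl rand -hex 32`.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Same shape as `openssl rand -hex 32`: 32 random bytes as 64 hex chars.
function generateApiKey(): string {
  return randomBytes(32).toString("hex");
}

// Hypothetical fingerprint: first 12 hex chars of SHA-256(key).
// Safe to share in a ticket or chat, unlike the key itself.
function keyFingerprint(apiKey: string): string {
  return createHash("sha256").update(apiKey, "utf8").digest("hex").slice(0, 12);
}

const newKey = generateApiKey();
const converterSide = keyFingerprint(newKey);
const visionASide = keyFingerprint(newKey);
console.log(converterSide === visionASide); // true when both .env files hold the same key
```

During a rotation, each side would print only the fingerprint after redeploy; a mismatch means one `.env` was not updated.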
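### 6.2 Sec C1 deferred (existing risk, unchanged)
`.env` was once committed to git history (found in the 5/2 health check). It has since been added to `.gitignore`, but it can still be recovered from history.
**During Phase 0.8b**:
- When adding `CONVERTER_API_KEY`, make sure it does **not go into git**
- Once Phase 1 is ready, do one git history rewrite and rotate every secret (including the new CONVERTER_API_KEY, the existing OAuth client_secret, MinIO credentials, etc.)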
---
## 7. CI/CD impact
**No CI changes needed**:
- The existing GitHub Actions configuration is unchanged
- Add the new `CONVERTER_API_KEY` to the stage / prod secrets manager (Vault / k8s secret / docker secret)
- dev uses `.env` (gitignored)
---
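## 8. Reserved for Phase 2
- Multi-instance deployment: the rate limiter must move from process-local memory to a Redis store
- Multiple callers: consider bringing back the OAuth resource server (API key + OAuth side by side)
- Automatic secret rotation through a secrets manager (integrate HashiCorp Vault / AWS Secrets Manager)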
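
The first item is easier to act on if the limiter talks to a counter store through an interface, so the in-memory map can later be swapped for Redis. The sketch below assumes a fixed-window limiter, which may differ from the current implementation; all names are hypothetical. The interface is synchronous to keep the example short, whereas a Redis-backed store would need an asynchronous variant.

```typescript
// Counter store abstraction: the in-memory version below is process-local.
interface CounterStore {
  // Increments the counter for `key` and returns the new count. The counter
  // resets once `windowMs` has elapsed since the first hit of the window.
  increment(key: string, windowMs: number, nowMs: number): number;
}

class MemoryCounterStore implements CounterStore {
  private readonly counters = new Map<string, { count: number; resetAt: number }>();

  increment(key: string, windowMs: number, nowMs: number): number {
    const entry = this.counters.get(key);
    if (!entry || nowMs >= entry.resetAt) {
      this.counters.set(key, { count: 1, resetAt: nowMs + windowMs });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

class FixedWindowLimiter {
  private readonly store: CounterStore;
  private readonly windowMs: number;
  private readonly max: number;

  constructor(store: CounterStore, windowMs: number, max: number) {
    this.store = store;
    this.windowMs = windowMs;
    this.max = max;
  }

  // True while the caller is within `max` hits for the current window.
  allow(key: string, nowMs: number): boolean {
    return this.store.increment(key, this.windowMs, nowMs) <= this.max;
  }
}

// Defaults from API_V1_RATE_LIMIT_WINDOW_MS / API_V1_RATE_LIMIT_MAX.
const limiter = new FixedWindowLimiter(new MemoryCounterStore(), 300000, 300);
console.log(limiter.allow("fingerprint-abc", Date.now())); // true
```

With several instances behind Nginx, each process would keep its own `MemoryCounterStore`, so the effective limit multiplies by the instance count; a shared store removes that gap.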