jim800121chen 4d381c0b50 feat(task-scheduler): Phase 1 — modularize server + add OAuth/JWKS + /api/v1/* routes
Refactor server.js (647 → 99 lines) into 30+ modules under src/:
- auth/: JWKS validation, JWT middleware, OAuth client_credentials
- routes/v1/: jobs (POST/GET/:id) + promote with input validation
- routes/legacy.js: existing /jobs multipart path (backward compatible)
- services/: jobService, healthService, sseService, statusMapper,
  doneListener
- middleware/: requestId, errorHandler, perClientRateLimit,
  uploadConcurrency, upload (multer + storage)
- redis/: Lua scripts for atomic claim/release_active_job
- storage/: local + minio adapters; fileAccessAgent/: PUT promote client
- config.js: env var validation with fail-fast

Phase 1 features (T1–T11):
- T1 Auth middleware + JWKS (Member Center OAuth2 resource server)
- T2 OAuth client (Member Center client_credentials, Basic auth)
- T3 /api/v1/* router skeleton
- T4 server.js refactor (legacy endpoints fully preserved, real-Redis
  regression verified — existing worker consumer group untouched)
- T5 POST /api/v1/jobs (multipart, OWASP-audited, 2 Critical / 6 Major
  fixed; Risk-A/B documented as accepted)
- T6 GET /api/v1/jobs + GET /:id (cursor pagination, ETag, IDOR-safe)
- T7 POST /jobs/:id/promote (FAA PUT with own service token, 300s
  timeout, fail-fast on missing FAA URL)
- T8 /health upgrade (healthy/degraded/unhealthy + 30s background cache)
- T9 stage_timings (release_active_job in terminal states)
- T10 env + Docker integration (MULTIPART_* + concurrency limiter)
- T11 README (498 lines) + OpenAPI 3.0 spec (1588 lines)

Tests: 630 pass across 29 suites. Updated Dockerfile + .dockerignore +
docker-compose.yml env passthrough (no hardcoded secrets, fail-fast on
missing required vars).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:55:05 +08:00

###############################################################################
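# Task Scheduler environment variable template (Phase 1 full version, consolidated in T10)
#
# Variables fall into three categories (in display order):
# 1. Required (production must set real values): a missing value fails fast (process exits with code 1)
# 2. Optional (sensible defaults): if unset, the in-code default is used
# 3. Development placeholders: use the RFC 2606 `.invalid` TLD so nothing can accidentally connect to a real service
#
# Deployment rules:
# - Never commit `.env` (it is already in .gitignore; cleanup of historical commits is pending under D7)
# - In production, use a secret manager (Vault / AWS Secrets Manager) instead of setting environment variables directly
# - Any value containing `REPLACE-ME` or a `.invalid` TLD must be replaced before deployment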
###############################################################################
# =============================================================================
# 1. Basic application settings
# =============================================================================
# Listening port (required, but has a sensible default)
PORT=4000
# Node environment (development / staging / production)
# - In production, FILE_ACCESS_AGENT_BASE_URL must use HTTPS
NODE_ENV=development
# Log level (debug / info / warn / error)
LOG_LEVEL=info
# =============================================================================
# 2. Redis (required)
# =============================================================================
# - If unset, the default is used, but real deployments must point to an actual Redis instance
# - With a password: redis://:password@host:6379
REDIS_URL=redis://localhost:6379
# =============================================================================
# 3. Job data directory (used by local storage)
# =============================================================================
# - With STORAGE_BACKEND=local, this directory is the volume shared by the worker and the scheduler
# - With STORAGE_BACKEND=minio, this directory is still used for temporary files (e.g. health checks)
JOB_DATA_DIR=/data/jobs
# =============================================================================
# 4. CORS (required)
# =============================================================================
FRONTEND_URL=http://localhost:3000
# =============================================================================
# 5. Storage backend (required)
# =============================================================================
# - "local": uses the shared JOB_DATA_DIR volume (single-host development / docker-compose)
# - "minio": uses MinIO / S3-compatible storage (recommended for production; POST /api/v1/jobs requires minio)
STORAGE_BACKEND=local
# =============================================================================
# 6. MinIO / S3 settings
# =============================================================================
# - Required when STORAGE_BACKEND=minio
# - May be left empty when STORAGE_BACKEND=local
# - Note: in production, do not put real secrets here; use a secret manager instead
MINIO_ENDPOINT_URL=http://192.168.0.130:9000
MINIO_BUCKET=convertet-working-space
MINIO_ACCESS_KEY=convuser
MINIO_SECRET_KEY=REPLACE-ME-IN-PRODUCTION
MINIO_REGION=us-east-1
# Bucket lifecycle (days): objects are deleted automatically N days after upload, so orphaned objects do not accumulate
MINIO_LIFECYCLE_DAYS=7
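# Example (hypothetical endpoint host): switching to the MinIO backend. With
# STORAGE_BACKEND=minio the MINIO_* variables above become required, and this is
# the backend POST /api/v1/jobs needs. Keep MINIO_SECRET_KEY in the secret manager.
#   STORAGE_BACKEND=minio
#   MINIO_ENDPOINT_URL=https://minio.example.invalid
#   MINIO_BUCKET=convertet-working-space
#   MINIO_REGION=us-east-1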
# =============================================================================
# 7. OAuth / Member Center (required)
# =============================================================================
#
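# ⚠️ The `*.invalid` hostnames below use the RFC 2606 reserved TLD, which DNS never resolves.
# For local development running the legacy /jobs flow (which does not need OAuth), they can be copied as-is;
# before a production deployment, replace them with the real Member Center URLs, or token validation and token acquisition will fail at DNS resolution.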
#
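# All three URLs usually come from the same Member Center service:
# - ISSUER: the expected value of the JWT `iss` claim
# - JWKS_URL: where public keys are fetched for JWT signature verification
# - TOKEN_URL: where the Converter obtains its own token (client_credentials grant)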
MEMBER_CENTER_ISSUER=https://auth.example.invalid
MEMBER_CENTER_JWKS_URL=https://auth.example.invalid/.well-known/jwks
MEMBER_CENTER_TOKEN_URL=https://auth.example.invalid/oauth/token
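# Pre-deployment check (a suggested manual step, not something the service runs):
# once the placeholders are replaced and this file's values are loaded into the
# shell, the JWKS URL should return a JSON object with a "keys" array (RFC 7517):
#   curl -fsS "$MEMBER_CENTER_JWKS_URL"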
# =============================================================================
# 8. Converter identity (required)
# =============================================================================
#
# The Converter acts as both:
# - Resource Server: accepts tokens from visionA-backend, whose audience must be KNERON_CONVERTER_AUDIENCE
# - OAuth Client: obtains its own token from Member Center to call the File Access Agent, identifying itself with client_id / secret
KNERON_CONVERTER_AUDIENCE=kneron_converter_api
KNERON_CONVERTER_CLIENT_ID=kneron_converter_dev
KNERON_CONVERTER_CLIENT_SECRET=REPLACE-ME-IN-PRODUCTION
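# For reference, a client_credentials token request (RFC 6749 §4.4) using HTTP Basic
# client authentication (§2.3.1) has the shape below. Any extra parameters the
# Converter may send (e.g. audience or scope) are not shown and not assumed here.
#   POST <MEMBER_CENTER_TOKEN_URL>
#   Authorization: Basic base64(<KNERON_CONVERTER_CLIENT_ID>:<KNERON_CONVERTER_CLIENT_SECRET>)
#   Content-Type: application/x-www-form-urlencoded
#
#   grant_type=client_credentials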
# Set this for tenant isolation; an empty string means the tenant claim is not checked
CONVERTER_TENANT_ID=
# =============================================================================
# 9. Scope names (optional; defaults follow TDD §8)
# =============================================================================
# Usually no change is needed, unless Member Center uses different names
# CONVERTER_SCOPE_WRITE=converter:job.write
# CONVERTER_SCOPE_READ=converter:job.read
# =============================================================================
# 10. File Access Agent (required)
# =============================================================================
#
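# On promote, the Converter streams the job output to FAA with a PUT.
# - The URL must be a valid http(s) URL; NODE_ENV=production enforces https
# - Local development can use a placeholder (`.invalid` TLD); flows other than promote are unaffected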
FILE_ACCESS_AGENT_BASE_URL=https://files.example.invalid
FILE_ACCESS_AGENT_AUDIENCE=file_access_api
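# Example: with NODE_ENV=production, a plain-http value such as
#   FILE_ACCESS_AGENT_BASE_URL=http://files.example.invalid
# breaks the https rule above and is expected to fail config validation at
# startup (assumed to fail fast like the other required variables).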
# =============================================================================
# 11. Promote behavior (optional)
# =============================================================================
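# Per-file PUT timeout in milliseconds. Default 300000 (300s), which covers a worst case of 500MB at 5MB/s (about 100s of transfer) with headroom.
# Lower it if deployed files are generally small; raise it for GB-scale files.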
# PROMOTE_TIMEOUT_MS=300000
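# Sizing example (assumed file size and throughput): the timeout should exceed
# size / throughput. The default gives 500MB / 5MB/s = 100s about 3× headroom.
# Keeping that ratio for a hypothetical 2000MB file at 5MB/s: 2000 / 5 = 400s, × 3 = 1200s.
#   PROMOTE_TIMEOUT_MS=1200000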
# =============================================================================
# 12. JWKS / JWT behavior (optional)
# =============================================================================
# Defaults follow TDD §5.1.
# JWKS_CACHE_MAX_AGE_MS=600000 # JWKS cache lifetime (10 minutes)
# JWKS_COOLDOWN_MS=30000 # cooldown after repeated misses for the same kid (30 seconds)
# JWT_CLOCK_TOLERANCE_SEC=60 # allowed clock skew (seconds)
# =============================================================================
# 13. OAuth client token cache (optional)
# =============================================================================
# OAUTH_TOKEN_REFRESH_SKEW_MS=60000 # refresh the token proactively once this many ms remain before expiresAt
# OAUTH_TOKEN_TIMEOUT_MS=10000 # timeout for fetching a token (10s)
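# Worked example (hypothetical token lifetime): a token issued at t=0 with
# expires_in=3600 has expiresAt = t+3600s. With the default 60000ms skew it is
# due for refresh from t=3540s, so calls in its last minute use a fresh token
# instead of one that could expire in flight.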
# =============================================================================
# 14. Multipart upload limits (optional; added in T10 to fix D5)
# =============================================================================
#
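# Why env vars:
# Memory quotas vary widely between deployments (dev container 2GB / prod 16GB), so a fixed 500MB limit is not flexible enough.
# These values can be tuned without changing the source code.
#
# All three values must be > 0; invalid values fail fast.
# MULTIPART_MODEL_MAX_BYTES=524288000 # 500MB (model file limit)
# MULTIPART_REF_IMAGE_MAX_BYTES=10485760 # 10MB (limit per ref_image)
# MULTIPART_REF_IMAGES_MAX_COUNT=100 # maximum number of ref_images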
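# The byte values above use binary units (1MB = 1024 × 1024 = 1048576 bytes). For
# example, raising the model limit to 1GB on a larger host (hypothetical setting):
#   MULTIPART_MODEL_MAX_BYTES=1073741824 # 1024 × 1024 × 1024
# With the defaults, one request at every limit carries
# 524288000 + 100 × 10485760 = 1572864000 bytes (1500MB) of file data.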
# =============================================================================
# 15. Upload concurrency (optional; added in T10 to fix D5)
# =============================================================================
#
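# Why this is needed:
# multer memoryStorage loads the entire multipart body into a buffer, so each concurrent upload
# consumes heap roughly equal to the model size. 5 concurrent × 500MB ≈ 2.5GB of heap, which is risky in a 4GB container.
# A per-process counter limits how many requests can be in multipart parsing + handler processing at the same time.
#
# Above the limit, the server responds immediately with 503 service_busy and a Retry-After header (no queueing), so the client backs off on its own.
# MAX_CONCURRENT_UPLOADS=5 # at most 5 uploads in progress at once
# UPLOAD_RETRY_AFTER_SECONDS=30 # Retry-After value (seconds) in the 503 response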
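# Sizing sketch using the model-size-per-upload estimate above: peak upload heap
# ≈ MAX_CONCURRENT_UPLOADS × MULTIPART_MODEL_MAX_BYTES. For the 2GB dev container
# mentioned in section 14, a hypothetical setting of
#   MAX_CONCURRENT_UPLOADS=2 # 2 × 500MB ≈ 1GB
# leaves room for the rest of the process. A rejected client receives a response
# along these lines (body omitted):
#   HTTP/1.1 503 Service Unavailable
#   Retry-After: 30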
# =============================================================================
# 16. Per-client_id rate limit (optional; since T3)
# =============================================================================
# Applies to /api/v1/*: each client_id may make at most MAX requests per window.
# Default: 300 requests per 5 minutes (per TDD §1.1).
# API_V1_RATE_LIMIT_WINDOW_MS=300000
# API_V1_RATE_LIMIT_MAX=300
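# The default averages 300 / 300s = 1 request/s per client_id, but a client may
# spend its whole quota in a burst early in a window. A shorter window with the
# same average rate (hypothetical values) limits any one window to 60 requests:
#   API_V1_RATE_LIMIT_WINDOW_MS=60000
#   API_V1_RATE_LIMIT_MAX=60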