Three layered optimizations targeting Gemini-style 5MB base64 payloads where
RSS could balloon to tens of GB under concurrent load:
1. Byte-based param override (relay/common/override.go)
- Switch legacy/operations hot paths from common.Marshal round-trips and
map[string]any conversions to gjson/sjson on []byte directly.
- Avoids cloning 5MB strings during each Set/Delete operation.
2. strings.Builder for Gemini response markdown (relay/channel/gemini/relay-gemini.go)
- Replace string concatenation + strings.Join when assembling
"" content for inline image responses.
- Pre-allocates capacity from inline_data byte sizes.
3. Outbound BodyStorage + streaming Decoder (this commit's core)
- New relay/common/outbound_body.go helper wraps marshaled upstream bodies
in common.BodyStorage, allowing disk-cache mode to offload jsonData to
a temp file while waiting for upstream TTFB. The original []byte can
then be GC'd, removing ~5MB/req of heap residency during the longest
window of a request.
- All 7 relay handlers (gemini/claude/responses/embedding/image/compatible/
rerank) plus chat_completions_via_responses adopt the helper with
defer closer.Close() and explicit jsonData = nil.
- relay/common/relay_info.go: new UpstreamRequestBodySize so
relay/channel/api_request.go can populate req.ContentLength (net/http
cannot infer it once the body is a type-erased io.Reader).
- common/gin.go UnmarshalBodyReusable: when storage is disk-backed and
content-type is JSON, decode via DecodeJson(storage) instead of
storage.Bytes()+Unmarshal, removing one transient 5MB copy per request.
Memory mode and form/multipart paths are unchanged.
When proxying through another new-api instance, the upstream
X-Oneapi-Request-Id was overwriting the local one in client responses.
This adds a new `upstream_request_id` field to the logs table, captures
the upstream ID during relay, and excludes it from the headers copied back
to the client. The frontend gains search/filter and detail display support.
This commit introduces a major architectural refactoring to improve quota management, centralize logging, and streamline the relay handling logic.
Key changes:
- **Pre-consume Quota:** Implements a new mechanism to check and reserve user quota *before* making the request to the upstream provider. This ensures more accurate quota deduction and prevents users from exceeding their limits due to concurrent requests.
- **Unified Relay Handlers:** Refactors the relay logic to use generic handlers (e.g., `ChatHandler`, `ImageHandler`) instead of provider-specific implementations. This significantly reduces code duplication and simplifies adding new channels.
- **Centralized Logger:** A new dedicated `logger` package is introduced, and all system logging calls are migrated to use it, moving this responsibility out of the `common` package.
- **Code Reorganization:** DTOs are generalized (e.g., `dalle.go` -> `openai_image.go`) and utility code is moved to more appropriate packages (e.g., `common/http.go` -> `service/http.go`) for better code structure.
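The pre-consume idea can be sketched as a reserve-then-settle pattern. This is a minimal in-memory illustration under assumptions: `quotaAccount`, `PreConsume`, and `Settle` are hypothetical names, and the real implementation presumably persists quota in a database or cache rather than behind a mutex.

```go
package main

import (
	"errors"
	"sync"
)

var ErrInsufficientQuota = errors.New("insufficient quota")

// quotaAccount is a hypothetical in-memory quota holder.
type quotaAccount struct {
	mu        sync.Mutex
	remaining int64
}

// PreConsume reserves an estimated amount before the upstream call.
// Check and deduction happen under one lock, so concurrent requests
// cannot all pass the check against the same balance.
func (a *quotaAccount) PreConsume(estimate int64) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if estimate > a.remaining {
		return ErrInsufficientQuota
	}
	a.remaining -= estimate
	return nil
}

// Settle reconciles a reservation with actual usage once the response is
// known: it refunds the surplus, or charges the shortfall (which may push
// the balance negative in this simplified sketch).
func (a *quotaAccount) Settle(reserved, actual int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.remaining += reserved - actual
}
```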
- Add new context keys for user-related information
- Modify user cache and authentication middleware to populate context
- Refactor quota and notification services to use context-based user data
- Remove redundant database queries by leveraging context information
- Update various components to use new context-based user retrieval methods