Three layered optimizations targeting Gemini-style 5MB base64 payloads where
RSS could balloon to tens of GB under concurrent load:
1. Byte-based param override (relay/common/override.go)
- Switch legacy/operations hot paths from common.Marshal round-trips and
map[string]any conversions to gjson/sjson on []byte directly.
- Avoids cloning 5MB strings during each Set/Delete operation.
2. strings.Builder for Gemini response markdown (relay/channel/gemini/relay-gemini.go)
- Replace string concatenation + strings.Join when assembling
"" content for inline image responses.
- Pre-allocates capacity from inline_data byte sizes.
3. Outbound BodyStorage + streaming Decoder (this commit's core)
- New relay/common/outbound_body.go helper wraps marshaled upstream bodies
in common.BodyStorage, allowing disk-cache mode to offload jsonData to
a temp file while waiting for upstream TTFB. The original []byte can
then be GC'd, removing ~5MB/req of heap residency during the longest
window of a request.
- All 7 relay handlers (gemini/claude/responses/embedding/image/compatible/
rerank) plus chat_completions_via_responses adopt the helper with
defer closer.Close() and explicit jsonData = nil.
- relay/common/relay_info.go: new UpstreamRequestBodySize so
relay/channel/api_request.go can populate req.ContentLength (lost when
body becomes a type-erased io.Reader).
- common/gin.go UnmarshalBodyReusable: when storage is disk-backed and
content-type is JSON, decode via DecodeJson(storage) instead of
storage.Bytes()+Unmarshal, removing one transient 5MB copy per request.
memory mode and form/multipart paths unchanged.
Add detection, MIME type mapping, and dimension parsing for HEIC/HEIF
images via ISOBMFF ftyp brand inspection and ispe box parsing. Update
Gemini relay to accept these formats and refactor getImageConfig to
properly retry decoders using buffered data.
- Add StreamStatus type (relay/common) to track stream end reason
(done/timeout/client_gone/scanner_error/eof/panic/ping_fail) and
accumulate soft errors during streaming via sync.Once + sync.Mutex.
- Add StreamResult (relay/helper) as the callback interface: adapters
call sr.Error() for soft errors, sr.Stop() for fatal, sr.Done() for
normal completion. No early-return problem — multiple errors per chunk
are naturally supported.
- Refactor StreamScannerHandler callback from func(string) bool to
func(string, *StreamResult). All 9 channel adapters updated.
- Write stream_status into log other JSON field (admin-only) with
status ok/error, end_reason, error_count, and error messages.
- Frontend: display stream status in log detail expansion for admins.
- Introduced a new rule for the Billing Expression System, emphasizing the importance of reading `pkg/billingexpr/expr.md` for dynamic billing.
- Updated the billing expression logic to support new variables and improved handling of image and audio tokens.
- Enhanced the tiered billing functionality with versioning support for expressions and refined quota calculations.
- Added tests to validate the new billing expression features and ensure correctness in pricing calculations.
Use the native Gemini Models API (/v1beta/models) instead of the OpenAI-compatible
path when listing models for Gemini channels, improving compatibility with
third-party Gemini-format providers that don't implement OpenAI routes.
- Add paginated model listing with timeout and optional proxy support
- Select an enabled key for multi-key Gemini channels
- Introduced new OpenAI text models in `common/model.go`.
- Added `IsOpenAITextModel` function to check for OpenAI text models.
- Refactored token estimation methods across various channels to use estimated prompt tokens instead of direct prompt token counts.
- Updated related functions and structures to accommodate the new token estimation approach, enhancing overall token management.
Previously thoughtSignature was only attached to messages with function
calls. This change extends the feature to also attach thoughtSignature
to the first text part of assistant/model messages when no tool_calls
are present, ensuring compatibility with Gemini thinking models in
regular conversation scenarios.