Three layered optimizations targeting Gemini-style 5MB base64 payloads where
RSS could balloon to tens of GB under concurrent load:
1. Byte-based param override (relay/common/override.go)
- Switch legacy/operations hot paths from common.Marshal round-trips and
map[string]any conversions to gjson/sjson on []byte directly.
- Avoids cloning 5MB strings during each Set/Delete operation.
2. strings.Builder for Gemini response markdown (relay/channel/gemini/relay-gemini.go)
- Replace string concatenation + strings.Join when assembling
the markdown image content for inline image responses.
- Pre-allocates capacity from inline_data byte sizes.
3. Outbound BodyStorage + streaming Decoder (this commit's core)
- New relay/common/outbound_body.go helper wraps marshaled upstream bodies
in common.BodyStorage, allowing disk-cache mode to offload jsonData to
a temp file while waiting for upstream TTFB. The original []byte can
then be GC'd, removing ~5MB/req of heap residency during the longest
window of a request.
- All 7 relay handlers (gemini/claude/responses/embedding/image/compatible/
rerank) plus chat_completions_via_responses adopt the helper with
defer closer.Close() and explicit jsonData = nil.
- relay/common/relay_info.go: new UpstreamRequestBodySize so
relay/channel/api_request.go can populate req.ContentLength (lost when
body becomes a type-erased io.Reader).
- common/gin.go UnmarshalBodyReusable: when storage is disk-backed and
content-type is JSON, decode via DecodeJson(storage) instead of
storage.Bytes()+Unmarshal, removing one transient 5MB copy per request.
Memory mode and form/multipart paths are unchanged.
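
The pre-sizing idea from item 2 can be sketched roughly as below. This is a minimal, standalone illustration and not the code in relay-gemini.go: the inlineImage type, the buildImageMarkdown function, and the exact "![image](data:...)" output shape are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// inlineImage is a hypothetical stand-in for one Gemini inline_data part.
type inlineImage struct {
	MimeType string
	Data     string // base64-encoded image bytes
}

// buildImageMarkdown renders each part as a markdown data-URI image, one per
// line. The "![image](data:...)" shape is an assumption for illustration.
func buildImageMarkdown(parts []inlineImage) string {
	const prefix, mid, suffix = "![image](data:", ";base64,", ")"

	// First pass: compute an upper bound on the output size so the builder
	// allocates its buffer once instead of growing repeatedly.
	size := 0
	for _, p := range parts {
		size += len(prefix) + len(p.MimeType) + len(mid) + len(p.Data) + len(suffix) + 1
	}

	var b strings.Builder
	b.Grow(size)
	for i, p := range parts {
		if i > 0 {
			b.WriteByte('\n')
		}
		b.WriteString(prefix)
		b.WriteString(p.MimeType)
		b.WriteString(mid)
		b.WriteString(p.Data)
		b.WriteString(suffix)
	}
	return b.String()
}

func main() {
	out := buildImageMarkdown([]inlineImage{
		{MimeType: "image/png", Data: "QUJD"},
		{MimeType: "image/jpeg", Data: "REVG"},
	})
	fmt.Println(out)
}
```

Compared with building one string per image and joining them, this writes each large base64 payload into the final buffer exactly once.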
32 lines
1.3 KiB
Go
package common

import (
	"io"

	"github.com/QuantumNous/new-api/common"
)

// NewOutboundJSONBody wraps the already-marshaled upstream request body into a
// BodyStorage. When disk cache is enabled and the payload exceeds the configured
// threshold, the data is written to a temp file and the original []byte can be
// GC'd, significantly reducing the heap residency while waiting for the
// upstream provider to respond (the dominant cost for large base64 payloads).
//
// In memory mode the underlying memoryStorage reuses the same backing array,
// so this is equivalent to bytes.NewReader(data) in terms of memory usage.
//
// The caller MUST invoke closer.Close() once the upstream call has finished
// (typically via defer) to release the disk file / memory accounting.
//
// The returned reader is wrapped with common.ReaderOnly to prevent the HTTP
// transport from prematurely closing the underlying BodyStorage. The returned
// size is meant to be propagated to http.Request.ContentLength because the
// type-erased io.Reader prevents net/http from auto-detecting it.
func NewOutboundJSONBody(data []byte) (body io.Reader, size int64, closer io.Closer, err error) {
	storage, err := common.CreateBodyStorage(data)
	if err != nil {
		return nil, 0, nil, err
	}
	return common.ReaderOnly(storage), storage.Size(), storage, nil
}