Create request usage entry when total_tokens is unset #3696
Usage.add() only recorded a per-request entry in request_usage_entries when total_tokens > 0. A request reporting input/output tokens but leaving total_tokens at 0 (some providers pass total_tokens through without populating it, e.g. via the Chat Completions / LiteLLM adapters) therefore aggregated the top-level counters but created no entry. That contradicts the field's documented invariant that an entry is created when the usage "has non-zero tokens". Key the entry off any non-zero token count.
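As a rough illustration of the behavior described above, here is a minimal Python sketch. The `Usage` and `RequestUsage` classes below are simplified stand-ins written for this example, not the SDK's actual definitions, and the `requests == 1` guard is an assumption about how a single-request usage is recognized.

```python
from dataclasses import dataclass, field


@dataclass
class RequestUsage:
    """Per-request token counts (hypothetical stand-in for illustration)."""

    input_tokens: int
    output_tokens: int
    total_tokens: int


@dataclass
class Usage:
    """Aggregate usage (hypothetical stand-in for illustration)."""

    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    request_usage_entries: list[RequestUsage] = field(default_factory=list)

    def add(self, other: "Usage") -> None:
        # The top-level counters are always aggregated.
        self.requests += other.requests
        self.input_tokens += other.input_tokens
        self.output_tokens += other.output_tokens
        self.total_tokens += other.total_tokens

        # Before the change, the condition was only `other.total_tokens > 0`.
        # The proposed condition accepts any non-zero token count.
        has_tokens = (
            other.total_tokens > 0
            or other.input_tokens > 0
            or other.output_tokens > 0
        )
        # Assumption: a per-request entry is recorded only for single requests.
        if other.requests == 1 and has_tokens:
            self.request_usage_entries.append(
                RequestUsage(
                    input_tokens=other.input_tokens,
                    output_tokens=other.output_tokens,
                    total_tokens=other.total_tokens,
                )
            )


total = Usage()
# The provider reported input/output tokens but left total_tokens at 0.
total.add(Usage(requests=1, input_tokens=100, output_tokens=200, total_tokens=0))
# An all-zero request still creates no entry.
total.add(Usage(requests=1))
```

With the any-non-zero condition, the first call records one entry and the all-zero call records none, so the entry list no longer drops a request that the aggregate counters already include.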

Can you share the repro steps with real APIs? If this doesn't actually happen, we'd like to hold off adding the logic.

Fair question. I don't have a repro against the OpenAI API itself; OpenAI always returns It comes from the Chat Completions and LiteLLM adapters, which copy I ran into it through that inconsistency rather than a specific public provider, so I understand the hesitation. If you'd rather not add provider-defensive logic here, I can close it, or narrow it to just the

If the situation arises with popular providers' models, we are happy to have some logic like you suggested. Unless we get sufficient information about the real use cases, I'd like to hold off considering this.

Makes sense, happy to hold off until there's a concrete provider trace to justify it. For the record, in case it comes up later: the path I had in mind is LiteLLM's non-streaming Gemini/Vertex chat usage builder, which sets Notably LiteLLM already added exactly this I don't want to land a change on a path no provider is observed to hit, so I'm fine closing this. Happy to leave it open purely as a reference if you'd prefer, and I'll reopen with an actual payload if I ever catch one in the wild, your call.

Thanks for your prompt reply. Let me close this one for now. Thanks again for your interest.

Summary
Usage.add() only recorded a per-request entry in request_usage_entries when total_tokens > 0. A single request that reports input_tokens/output_tokens but leaves total_tokens at 0 still aggregated the top-level counters (requests, input_tokens, output_tokens) but created no request_usage_entries entry.
This contradicts the field's own documented invariant:
A request with input_tokens=100, total_tokens=0 has non-zero tokens, so an entry should be created. This is reachable in practice: total_tokens is passed through from the provider in the Chat Completions / LiteLLM adapters (e.g. chatcmpl_stream_handler.py builds total_tokens=usage.total_tokens or 0), so a provider that omits the total yields input_tokens>0, total_tokens=0. The result is a request_usage_entries list whose count and token sums disagree with the aggregate totals, making per-request cost/context-window accounting silently incomplete for that request.
The fix keys the entry off any non-zero token count (total_tokens, input_tokens, or output_tokens).
Test plan
Added test_usage_add_preserves_single_request_when_total_tokens_unset: a single request with input_tokens=100, output_tokens=200, total_tokens=0 now creates one RequestUsage entry. It fails before this change (request_usage_entries is empty) and passes after. The existing test_usage_add_ignores_zero_token_requests (all-zero request → no entry) still passes. make format, make lint, mypy, and the tests/test_usage.py suite pass locally.
Issue number
N/A
Checks
make lint and make format