Create request usage entry when total_tokens is unset #3696

Closed
aditya-786 wants to merge 1 commit into openai:main from aditya-786:fix/usage-entry-zero-total

Conversation

@aditya-786

Summary

Usage.add() only recorded a per-request entry in request_usage_entries when total_tokens > 0. A single request that reports input_tokens/output_tokens but leaves total_tokens at 0 was still folded into the top-level counters (requests, input_tokens, output_tokens) but produced no entry in request_usage_entries.

This contradicts the field's own documented invariant:

Each call to add() automatically creates an entry in this list if the added usage represents a new request (i.e., has non-zero tokens).

A request with input_tokens=100, total_tokens=0 has non-zero tokens, so an entry should be created. This is reachable in practice: total_tokens is passed through from the provider in the Chat Completions / LiteLLM adapters (e.g. chatcmpl_stream_handler.py builds total_tokens=usage.total_tokens or 0), so a provider that omits the total yields input_tokens>0, total_tokens=0. The result is a request_usage_entries list whose count and token sums disagree with the aggregate totals, making per-request cost/context-window accounting silently incomplete for that request.

The fix keys the entry off any non-zero token count (total_tokens, input_tokens, or output_tokens).
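
To make the guard change concrete, here is a minimal sketch using simplified stand-in classes. `UsageSketch`, `RequestUsageSketch`, and the `any_nonzero` flag are hypothetical names for illustration only, and the `requests == 1` condition is an assumption based on the "single request" wording above; the real Usage class has more fields and behavior.

```python
from dataclasses import dataclass, field


@dataclass
class RequestUsageSketch:
    """Hypothetical stand-in for a per-request usage record."""

    input_tokens: int
    output_tokens: int
    total_tokens: int


@dataclass
class UsageSketch:
    """Simplified stand-in for Usage; not the SDK's actual class."""

    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    request_usage_entries: list[RequestUsageSketch] = field(default_factory=list)

    def add(self, other: "UsageSketch", *, any_nonzero: bool = True) -> None:
        # The aggregate counters are folded in regardless of the guard.
        self.requests += other.requests
        self.input_tokens += other.input_tokens
        self.output_tokens += other.output_tokens
        self.total_tokens += other.total_tokens

        if any_nonzero:
            # Proposed guard: any non-zero token count marks a real request.
            has_tokens = (
                other.total_tokens > 0
                or other.input_tokens > 0
                or other.output_tokens > 0
            )
        else:
            # Old guard: only total_tokens is consulted.
            has_tokens = other.total_tokens > 0

        # Assumption: an entry is recorded only for a single request.
        if other.requests == 1 and has_tokens:
            self.request_usage_entries.append(
                RequestUsageSketch(
                    input_tokens=other.input_tokens,
                    output_tokens=other.output_tokens,
                    total_tokens=other.total_tokens,
                )
            )
```

With the old guard (`any_nonzero=False`), a request carrying input_tokens=100, output_tokens=200, total_tokens=0 raises the aggregates but leaves request_usage_entries empty; with the proposed guard it yields one entry, while an all-zero request still yields none.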

Test plan

Added test_usage_add_preserves_single_request_when_total_tokens_unset: a single request with input_tokens=100, output_tokens=200, total_tokens=0 now creates one RequestUsage entry. It fails before this change (request_usage_entries is empty) and passes after. The existing test_usage_add_ignores_zero_token_requests (all-zero request → no entry) still passes. make format, make lint, mypy, and the tests/test_usage.py suite pass locally.

Issue number

N/A

Checks

  • I've added new tests (if relevant)
  • I've added/updated the relevant documentation
  • I've run make lint and make format
  • I've made sure tests pass

Usage.add() only recorded a per-request entry in request_usage_entries
when total_tokens > 0, so a request reporting input/output tokens but
leaving total_tokens at 0 (some providers pass total_tokens through
without populating it, e.g. via the Chat Completions / LiteLLM adapters)
aggregated the top-level counters but created no entry, contradicting the
field's documented invariant that an entry is created when the usage
"has non-zero tokens". Key the entry off any non-zero token count.
@seratch

seratch commented Jun 29, 2026

Member

Can you share the repro steps with real APIs? If this doesn't actually happen, we'd like to hold off adding the logic.

@aditya-786

Author

Fair question. I don't have a repro against the OpenAI API itself; OpenAI always returns total_tokens, so it never triggers there.

It comes from the Chat Completions and LiteLLM adapters, which copy total_tokens straight from the provider's usage (openai_chatcompletions.py, litellm_model.py) instead of deriving it from input+output. When a provider reports prompt/completion tokens but leaves total_tokens at 0 or omits it, the resulting Usage has input_tokens/output_tokens > 0 and total_tokens == 0. add() still folds those into the aggregate, but the old other.total_tokens > 0 guard skips the per-request entry, so request_usage_entries ends up inconsistent with the totals it's meant to break down.
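
As a rough illustration of how the zero total arises, the sketch below mimics the `usage.total_tokens or 0` pass-through quoted in the summary. `ProviderUsage` and `to_usage_kwargs` are hypothetical names, and the dictionary shape is an assumption; the actual adapter code may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderUsage:
    """Hypothetical shape of a provider's usage payload."""

    prompt_tokens: int
    completion_tokens: int
    total_tokens: Optional[int] = None  # None models a provider omitting the total


def to_usage_kwargs(usage: ProviderUsage) -> dict[str, int]:
    """Copy the provider's counts through without deriving a total."""
    return {
        "requests": 1,
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        # Pass-through: a missing (None) or zero total becomes 0.
        "total_tokens": usage.total_tokens or 0,
    }


kwargs = to_usage_kwargs(ProviderUsage(prompt_tokens=100, completion_tokens=200))
# The old guard looks only at total_tokens, so this request gets no entry.
passes_old_guard = kwargs["total_tokens"] > 0
```

Here `kwargs` carries input_tokens=100 and output_tokens=200 alongside total_tokens=0, which is exactly the combination the old guard skips.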

I ran into it through that inconsistency rather than a specific public provider, so I understand the hesitation. If you'd rather not add provider-defensive logic here, I can close it, or narrow it to just the request_usage_entries path. Your call.

@seratch

seratch commented Jun 29, 2026

Member

If the situation arises with popular providers' models, we are happy to have some logic like you suggested. Unless we get sufficient information about the real use cases, I'd like to hold off considering this.

@aditya-786

Author

Makes sense, happy to hold off until there's a concrete provider trace to justify it.

For the record, in case it comes up later: the path I had in mind is LiteLLM's non-streaming Gemini/Vertex chat usage builder, which sets total_tokens=usage_metadata.get("totalTokenCount", 0) with a hard 0 default and no fallback to prompt + completion (vertex_and_google_ai_studio_gemini.py), and that Usage is attached to the ModelResponse verbatim — LiteLLM's Usage model doesn't backfill the total either. So if Google's usageMetadata ever carries promptTokenCount/candidatesTokenCount but omits totalTokenCount, you'd get non-zero input/output with total_tokens=0, and the old total_tokens > 0 guard drops the per-request entry while still folding it into the aggregate.

Notably LiteLLM already added exactly this ... or (prompt + completion) fallback on the Vertex batch path (BerriAI/litellm#27912) but not on the chat path, so the inconsistency is real at the code level — I just don't have a captured live response showing the omitted field, which is the part you'd reasonably want before changing behavior here.
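
The difference between the two paths can be sketched as follows. Both helper names are hypothetical and this is not LiteLLM's actual code; only the usageMetadata field names and the `.get("totalTokenCount", 0)` default are taken from the description above.

```python
from typing import Any, Mapping


def total_with_hard_default(usage_metadata: Mapping[str, Any]) -> int:
    """Chat-path style as described: a missing total silently becomes 0."""
    return usage_metadata.get("totalTokenCount", 0)


def total_with_fallback(usage_metadata: Mapping[str, Any]) -> int:
    """Batch-path style as described: fall back to prompt + completion."""
    prompt = usage_metadata.get("promptTokenCount", 0)
    completion = usage_metadata.get("candidatesTokenCount", 0)
    # A missing or zero total is replaced by the sum of the parts.
    return usage_metadata.get("totalTokenCount") or (prompt + completion)


# Hypothetical payload that omits totalTokenCount.
metadata = {"promptTokenCount": 100, "candidatesTokenCount": 200}
hard_default_total = total_with_hard_default(metadata)
fallback_total = total_with_fallback(metadata)
```

For this payload the hard default reports 0 while the fallback reports 300; when totalTokenCount is present, both return it unchanged.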

I don't want to land a change on a path no provider is observed to hit, so I'm fine closing this. Happy to leave it open purely as a reference if you'd prefer, and I'll reopen with an actual payload if I ever catch one in the wild — your call.

@seratch

seratch commented Jun 29, 2026

Member

Thanks for your prompt reply. Let me close this one for now. Thanks again for your interest.

@seratch seratch closed this Jun 29, 2026