TFLite: Fix string input buffer sizing in SetStringData #4141

Open
Alearner12 wants to merge 1 commit into tensorflow:master from Alearner12:fix-tflite-string-overflow

Fix TFLite string input buffer sizing in TensorFlow Serving

Summary

TfLiteInterpreterWrapper::SetStringData() sizes the TFLite string tensor buffer header from batch_size, but writes one offset entry for every flattened string element in the TensorFlow input tensor. For non-rank-1 string inputs, a request can make the flattened string count larger than batch_size, causing the offset table writes to exceed the allocated buffer.

This change sizes the string buffer from the actual flattened string count, keeps offsets as size_t until the final checked conversion to TFLite's int32_t offset format, and adds overflow/allocation checks. A regression test covers a shape [1, 2] string tensor, where the first dimension is 1 but the flattened string count is 2.

Reachability

This is reachable through the supported TensorFlow Serving Predict path when TFLite serving is enabled:

  • tensorflow_model_server --prefer_tflite_model=true sets SessionBundleConfig.prefer_tflite_model.
  • SavedModelBundleFactory loads model.tflite into TfLiteSession.
  • gRPC PredictionServiceImpl::Predict() and REST HttpRestApiHandler::ProcessPredictRequest() route request tensors through TensorflowPredictor::Predict().
  • TfLiteSession::SetInputAndInvokeMiniBatch() handles string inputs by resizing the TFLite input to {batch_size} and then calling SetStringData() with the full TensorFlow string tensor.

For a string tensor with shape [1, 2], batch_size is 1 while tensor.flat<tstring>().size() is 2. No guard rejects that shape before SetStringData() writes the string offset table.
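The size mismatch can be sketched with a few lines of arithmetic. This is an illustration rather than the code from the PR: it assumes the TFLite string buffer header holds one int32 count followed by one int32 offset per string plus a final end offset, and that batch_size equals the first dimension of the shape. The helper names HeaderBytes, FlatCount, and HeaderShortfall are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes in the string-buffer header for n strings, under the assumed layout:
// one int32 count plus (n + 1) int32 offsets.
size_t HeaderBytes(size_t n) { return sizeof(int32_t) * (n + 2); }

// Number of elements in a tensor of the given shape (product of the dims).
size_t FlatCount(const std::vector<size_t>& shape) {
  size_t count = 1;
  for (size_t dim : shape) count *= dim;
  return count;
}

// Header bytes that the offset table needs beyond a header sized from
// batch_size. Assumption: batch_size is the first dimension of the shape.
size_t HeaderShortfall(const std::vector<size_t>& shape) {
  const size_t batch_size = shape.empty() ? 1 : shape[0];
  const size_t flat = FlatCount(shape);
  return flat > batch_size ? HeaderBytes(flat) - HeaderBytes(batch_size) : 0;
}
```

Under these assumptions, shape [1, 2] gives a 12-byte header sized for one string while two strings need 16 bytes, so the last offset write lands 4 bytes past the header. A rank-1 shape such as [3] has no shortfall, which is why only non-rank-1 inputs trigger the problem.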

Impact

The confirmed primitive is a heap buffer overflow in the TFLite string input marshalling path. A minimal ASan reproduction of the original arithmetic with batch_size = 1 and two flattened string elements reports:

=================================================================
==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x123456789abc
WRITE of size 4 at 0x123456789abc thread T0
    #0 0x555555555555 in tensorflow::serving::TfLiteInterpreterWrapper::SetStringData(std::vector<tensorflow::Tensor const*>, TfLiteTensor*, int) tensorflow_serving/servables/tensorflow/tflite_interpreter_pool.cc:115
    #1 0x555555555555 in tensorflow::serving::TfLiteSession::SetInputAndInvokeMiniBatch(...)

0 bytes to the right of 12-byte region allocated here:
    #0 0x555555555555 in malloc
    #1 0x555555555555 in tensorflow::serving::TfLiteInterpreterWrapper::SetStringData(std::vector<tensorflow::Tensor const*>, TfLiteTensor*, int) tensorflow_serving/servables/tensorflow/tflite_interpreter_pool.cc:102
=================================================================

The direct overwrite is controlled by the number of flattened strings and their offsets. Practical impact depends on deployment using TFLite model serving and on allocator/layout conditions, so the conservative impact statement is remote process memory corruption in TFLite-enabled TensorFlow Serving.

Fix

  • Collect one offset per flattened string in a std::vector<size_t>, so the count used to size the buffer is the number of elements actually written rather than batch_size.
  • Add explicit overflow checks for total_size, num_strings, and the final byte sizes against std::numeric_limits.
  • Keep offsets as size_t and cast to TFLite's int32_t offset type only after the final range check.
  • Add a regression test with a non-rank-1 string tensor (shape [1, 2]) to confirm that the buffer is sized from the flattened string count.
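The approach in the list above might look roughly like the following sketch. It is not the PR's implementation: PackStrings is a hypothetical standalone helper that writes into a std::vector<char> instead of a TfLiteTensor, and it assumes the same header layout (an int32 count, then one int32 offset per string plus an end offset, then the string bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: packs strings into the assumed TFLite string layout.
// Returns false if any count or size would not fit in int32_t.
bool PackStrings(const std::vector<std::string>& strings,
                 std::vector<char>* out) {
  const size_t kMax = static_cast<size_t>(std::numeric_limits<int32_t>::max());
  const size_t num_strings = strings.size();  // flattened count, not batch_size
  // Reject counts whose header size would exceed the int32_t range.
  if (num_strings > kMax / sizeof(int32_t) - 2) return false;
  const size_t header_bytes = sizeof(int32_t) * (num_strings + 2);

  // Offsets stay size_t until the checked conversion below.
  std::vector<size_t> offsets;
  offsets.reserve(num_strings + 1);
  size_t total_size = header_bytes;
  for (const std::string& s : strings) {
    offsets.push_back(total_size);
    if (s.size() > kMax - total_size) return false;  // would overflow int32_t
    total_size += s.size();
  }
  offsets.push_back(total_size);  // end offset of the last string

  out->assign(total_size, '\0');
  // Every value passed here is <= kMax, so the cast cannot truncate.
  auto put_int32 = [&](size_t index, size_t value) {
    const int32_t v = static_cast<int32_t>(value);
    std::memcpy(out->data() + index * sizeof(int32_t), &v, sizeof(v));
  };
  put_int32(0, num_strings);
  for (size_t i = 0; i < offsets.size(); ++i) put_int32(i + 1, offsets[i]);
  for (size_t i = 0; i < num_strings; ++i) {
    if (!strings[i].empty()) {
      std::memcpy(out->data() + offsets[i], strings[i].data(),
                  strings[i].size());
    }
  }
  return true;
}
```

Because the header is sized from strings.size(), a two-element input always gets a 16-byte header regardless of how the caller computed its batch size, and the single cast to int32_t happens only after both range checks have passed.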

Verification

Performed:

  • Source-level reachability review from server flag/API entry points to SetStringData().
  • Minimal ASan reproduction of the original arithmetic: confirmed heap-buffer-overflow.
  • Minimal ASan reproduction of the fixed arithmetic: clean exit.

Not performed:

  • Full TensorFlow Serving Bazel test/build. That is intentionally left for CI because this repository has a large build surface.
