Bruno Arine

ONNX Models Segmentation Fault

I’m using YOLOX for inference, and I’m getting inexplicable ONNX Runtime errors about 15% of the time.

2025-01-30 14:30:46,845 E [78] azmlinfsrv - Encountered Exception: Traceback (most recent call last):
  File "/opt/miniconda/envs/amlenv/lib/python3.9/site-packages/azureml_inference_server_http/server/user_script.py", line 132, in invoke_run
    run_output = self._wrapped_user_run(**run_parameters, request_headers=dict(request.headers))
  File "/opt/miniconda/envs/amlenv/lib/python3.9/site-packages/azureml_inference_server_http/server/user_script.py", line 156, in <lambda>
    self._wrapped_user_run = lambda request_headers, **kwargs: self._user_run(**kwargs)
  File "/var/azureml-app/endpoint/score.py", line 88, in run
    overlay, boxes, labels, scores = inference_fun(data)
  File "/var/azureml-app/endpoint/yolox_inference.py", line 483, in detect
    output = self.session.run(None, ort_inputs)
  File "/opt/miniconda/envs/amlenv/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'/backbone/C3_n3/conv3/conv/Conv' Status Message: /onnxruntime_src/onnxruntime/core/common/safeint.h:17 static void SafeIntExceptionHandler<onnxruntime::OnnxRuntimeException>::SafeIntOnOverflow()

Just in case, I checked that the onnx and onnxruntime versions match between my training component environment and the inference environment. They do. And re-running the exact same request (same image) doesn’t make the error deterministic either: it still fails about 15% of the time, even though the payload never changes!

The culprit was a Conv node in the loaded ONNX model. The fix was to disable cuDNN’s exhaustive search for the best convolution algorithm on the CUDA execution provider:

import onnxruntime

if "CUDAExecutionProvider" in onnxruntime.get_available_providers():
    # "DEFAULT" uses cuDNN's default convolution algorithm instead of
    # benchmarking every candidate ("EXHAUSTIVE"), which was the source
    # of the intermittent SafeInt overflow
    providers = [("CUDAExecutionProvider", {"cudnn_conv_algo_search": "DEFAULT"})]
else:
    providers = ["CPUExecutionProvider"]

session = onnxruntime.InferenceSession(onnx_file, providers=providers)
