Using the Flux Model

Flux is a low-latency streaming speech-to-text model, similar in some respects to Nova-3 streaming but purpose-built for voice agents. This article explains how to ensure that Flux is available in your self-hosted Deepgram environment and covers some potential pitfalls when setting it up.

Requirements

Please familiarize yourself with these general requirements before attempting to deploy Flux to your self-hosted Deepgram instances.

  • The Flux model must be hosted on a separate instance from other Deepgram speech-to-text (STT) and text-to-speech (TTS) models.
  • Your Deepgram API and Engine TOML configuration files must explicitly enable Flux.
  • Flux does not require, and is not compatible with, any other models (e.g. diarizer, entity detector).
  • You must use Deepgram container images from October 2025 or later (release-251015).
  • The Flux model file must be added to your engine models directory (see the sketch after this list).
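As a rough illustration of the first and last requirements, the sketch below runs a dedicated Flux Engine container with the Flux model file mounted into its models directory, using an image tag from the Flux release. This is a hedged example only: the image name and tag, the --gpus flag, the host paths, and the -v serve /engine.toml command arguments are assumptions, so reuse the invocation, volumes, and tags from your existing deployment.

# Hypothetical sketch: a dedicated Engine container just for Flux,
# separate from the instances serving your other STT/TTS models.
# Image tag, host paths, and command arguments are assumptions; adapt to your deployment.
docker run -d \
  --gpus all \
  -v /path/to/flux/models:/models:ro \
  -v /path/to/engine.toml:/engine.toml:ro \
  quay.io/deepgram/self-hosted-engine:release-251015 \
  -v serve /engine.toml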

Flux Model File

For Deepgram self-hosted setups, there is a single model file that you’ll need for Flux. You can request this from your Deepgram account representative or Deepgram Support.

Enable Flux in Deepgram Self-Hosted Deployment

Flux requires a couple of configuration changes in your self-hosted Deepgram deployment.

In your Deepgram Engine configuration, make sure that Flux is enabled.

Deepgram Engine Configuration
[flux]
enabled = true

In your Deepgram API configuration, make sure that the /v2/listen endpoint is enabled. This endpoint is new for Flux. Earlier Deepgram Speech-to-Text (STT) models (including Nova-3 and Nova-2) are served via the /v1/listen endpoint.

Deepgram API Configuration
[features]
listen_v2 = true

Deepgram Self-Hosted Logs

The following log entries may be useful in identifying Flux behaviors.

Ensure Flux Model is Loaded

To ensure that the Flux model is being loaded by your Deepgram self-hosted instance, you can check the engine container logs.

Use the appropriate tool to find your engine container, and obtain the logs for that container.

For example:

# 🐳 Docker: Find container ID or name, and get logs
docker ps
docker logs <containerIdOrName>
# ⛴️ Kubernetes: Find the Engine Pod and get logs for it
kubectl --namespace dg-self-hosted get pod
kubectl --namespace dg-self-hosted logs engine-12345

During startup of the engine container, look for a log entry similar to the following, which indicates that the model loaded successfully.

INFO impeller::charmer::output_processor: flux-subprocess/21 sttreaming.inference_process [<string>:62]: Model loaded successfully, ready to accept requests.
INFO impeller::flux::prewarm: Finished prewarming Flux model
INFO impeller: Starting instance. instance=55c02da4-e79f-4bf4-bf73-cd9dbfb21a18
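If the engine logs are verbose, you can filter them for Flux-related entries. For example, with Docker (the same pattern works on the output of kubectl logs):

# Filter the Engine container logs for Flux-related entries
docker logs <containerIdOrName> 2>&1 | grep -i "flux"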

Potential Issues

Flux Present but Disabled

If the Flux model file is in your engine models directory but the feature is disabled in the engine.toml configuration file, you will still see the following message in your engine logs, provided you are running a container image that supports Flux.

INFO load_model{path=/models/flux-general-en.90eea7fc.dg}: impeller::model_suppliers::autoload: Inserting model key=AsrKey { name: "flux-general-en", version: "2025-09-10.68227", languages: List([Language("en"), Language("en-au"), Language("en-ca"), Language("en-gb"), Language("en-in"), Language("en-nz"), Language("en-us")]), aliases: {}, tags: [], uuid: 90eea7fc-ff94-47a9-81f4-57b693aa500f, formatted: false, mode: TurnTaking, architecture: Some(Flux) }

Flux Model File Missing

If you see the errors below, that indicates the Flux model file is missing from your models directory. Please ask your Deepgram account representative or support team for assistance in obtaining the Flux model file.

ERROR impeller: Engine was configured to run the Flux model, but we failed to load it. err=Failure(Can't find flux-general-en model)
thread 'main-rt-5' panicked at /build/src/lib.rs:1119:17:
Error loading Flux model: failure: Can't find flux-general-en model
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Error: task 20 panicked with message "Error loading Flux model: failure: Can't find flux-general-en model"
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: driver shutting down
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0x70242061e170 in /libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7024205c16dc in /libtorch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3cc (0x70249564208c in /libtorch/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDAKernelLaunchRegistry::CUDAKernelLaunchRegistry() + 0xae (0x702495640dbe in /libtorch/lib/libc10_cuda.so)
frame #4: c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref() + 0x4c (0x702495640f4c in /libtorch/lib/libc10_cuda.so)
frame #5: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x75 (0x702495641d35 in /libtorch/lib/libc10_cuda.so)
frame #6: <unknown function> + 0x753c6 (0x7024956513c6 in /libtorch/lib/libc10_cuda.so)
frame #7: <unknown function> + 0x7591c (0x70249565191c in /libtorch/lib/libc10_cuda.so)
frame #8: c10::cuda::getStreamFromPool(bool, signed char) + 0x13 (0x702495652c43 in /libtorch/lib/libc10_cuda.so)
frame #9: <unknown function> + 0x3573dab (0x576927e72dab in /bin/impeller)
frame #10: <unknown function> + 0x1939036 (0x576926238036 in /bin/impeller)
frame #11: <unknown function> + 0x1961017 (0x576926260017 in /bin/impeller)
frame #12: <unknown function> + 0x3297caf (0x576927b96caf in /bin/impeller)
frame #13: <unknown function> + 0x8a19a (0x70249c2d519a in /usr/lib64/libc.so.6)
frame #14: clone + 0x44 (0x70249c359534 in /usr/lib64/libc.so.6)
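Before requesting a new copy of the model, it can be worth confirming whether the file is actually present in the models directory mounted into the Engine container. A quick check (the /models path is an assumption based on the log entries above; substitute the models directory used by your deployment):

# List the models directory inside the Engine container and look for the Flux model file
docker exec <containerIdOrName> ls -lh /models | grep -i flux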

Flux Not Enabled in Engine

If the Flux model is failing to load, you may see this warning in the engine (also known as impeller) logs. It suggests that Flux has not been enabled in your engine.toml configuration, even though the Flux model file is present in your models directory.

INFO load_model{path=/models/flux-general-en.90eea7fc.dg}: impeller::model_suppliers::autoload: new
WARN load_model{path=/models/flux-general-en.90eea7fc.dg}: impeller::model_suppliers::autoload: Failed to load model err=Manifest(TomlError { message: "unknown variant `turn-taking`, expected one of `all`, `batch`, `streaming`", raw: Some("architecture = \"flux\"\nformatted = false\ngeneration = \"alpha\"\nlanguages = [\"en\", \"en-AU\", \"en-CA\", \"en-GB\", \"en-IN\", \"en-NZ\", \"en-US\"]\nmode = \"turn-taking\"\nmultilingual = false\nname = \"flux-general-en\"\nuuid = \"90eea7fc-ff94-47a9-81f4-57b693aa500f\"\nversion = \"2025-09-10.68227\"\n"), keys: ["mode"], span: Some(141..154) })
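To confirm whether this is the case, check that your engine configuration contains an enabled [flux] block, for example (the path to engine.toml will vary by deployment). If this prints nothing, or prints enabled = false, add the [flux] block shown earlier in this article.

# Show the [flux] block (and the line that follows it) from the Engine configuration
grep -A 1 '^\[flux\]' /path/to/engine.toml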

Older Deepgram Container Images

If you see the following error in your logs, you may be running an older Deepgram Engine container image. Make sure that you are using an Engine container image from the release that introduced Flux (release-251015) or later.

Error: TOML parse error at line 57, column 1
   |
57 | [flux]
   | ^^^^^^
missing field `socket_path`
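To confirm which image a running container was started from, you can inspect it. A Docker example follows; for Kubernetes, kubectl describe pod shows the image of each container in the pod.

# Print the image a running container was created from
docker inspect --format '{{.Config.Image}}' <containerIdOrName>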

Access Flux Endpoint

The Flux model is accessed via the WebSocket protocol only, using the ws://<ipOrHostname>/v2/listen URL. This URL path is exposed by the Deepgram API server container, just like the other Deepgram APIs.
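As a quick connectivity check, you can open a WebSocket connection to the endpoint from the command line. The sketch below assumes the websocat utility is installed; the query parameters shown (model, encoding, sample_rate) are assumptions based on typical Deepgram streaming requests, so consult the developer documentation for the exact parameters Flux expects.

# Open a WebSocket connection to the self-hosted Flux endpoint (query parameters are assumptions)
websocat "ws://<ipOrHostname>/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000"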

Once you’ve verified that Flux is installed and loaded by the Deepgram self-hosted services, please follow the developer documentation.