Using the Flux Model
Flux is a purpose-built, low-latency streaming speech-to-text model tailored for voice agent use cases. This article describes how to ensure Flux is present in your self-hosted Deepgram environment, the configuration steps, and key considerations unique to Flux.
Requirements
Please familiarize yourself with these general requirements before attempting to deploy Flux to your self-hosted Deepgram instances.
- The Flux model must be hosted on a separate instance from other Deepgram speech-to-text (STT) and text-to-speech (TTS) models.
- Your Deepgram API and Engine TOML files must explicitly enable Flux.
- Flux neither requires nor is compatible with any other models (e.g. diarizer, entity detector).
- You must use Deepgram container images from October 2025 or later (release-251015).
- The Flux model file must be added to your Engine models directory.
Flux GPU Resource Allocation
Flux must run in isolation from other Deepgram models. Flux requires a fixed amount of GPU memory per concurrent stream, and this memory is allocated at Engine startup. By default, Flux allocates all available GPU memory to Flux streams.
Do not enable Flux in the Engine configuration file unless you intend to use it.
An Engine running Flux is not designed to serve any other Deepgram traffic, including other STT models (such as Nova-3), TTS, or supplementary models. Attempting to do so will result in resource exhaustion and request failures due to lack of GPU memory.
For example, if you enable Flux, and then submit a Nova-3 request on the same GPU, you will encounter out-of-memory errors such as:
Provision separate infrastructure for Flux, and ensure no other Deepgram models are present on the same Engine.
Flux Model File
For Deepgram self-hosted setups, there is a single model file that you’ll need for Flux. You can request this from your Deepgram account representative or Deepgram Support.
Enable Flux in Deepgram Self-Hosted Deployment
Flux requires a couple of configuration changes in your self-hosted Deepgram deployment.
In your Deepgram Engine configuration, make sure that Flux is enabled.
Do not add or enable [flux] unless this server is dedicated exclusively to Flux.
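Purely as an illustrative sketch, a dedicated Flux Engine configuration might include a [flux] section along these lines. The key names shown are assumptions, not the authoritative schema; confirm them against the sample Engine TOML shipped with your release.

```toml
# engine.toml -- illustrative sketch only; the key names below are
# assumptions, so verify against the sample TOML for your release.
[flux]
# Enable Flux on this Engine. Only set this on a server dedicated to Flux.
enabled = true
```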
In your Deepgram API configuration, make sure that the /v2/listen endpoint is enabled. This endpoint is new for Flux.
Earlier Deepgram Speech-to-Text (STT) models (including Nova-3 and Nova-2) are served via the /v1/listen endpoint.
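As another hedged sketch, enabling the new endpoint in the API configuration might look like the following. The section and key names are assumptions; check the sample API TOML for your release for the actual setting.

```toml
# api.toml -- illustrative sketch only; section and key names are
# assumptions, not the authoritative Deepgram API schema.
[features]
# Expose the new /v2/listen WebSocket endpoint used by Flux
endpoint_v2_listen = true
```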
Deepgram Self-Hosted Logs
The following log entries may be useful in identifying Flux behaviors.
Ensure Flux Model is Loaded
To ensure that the Flux model is being loaded by your Deepgram self-hosted instance, you can check the engine container logs.
Use the appropriate tool to find your engine container, and obtain the logs for that container.
For example:
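With Docker, that might look like the commands below; the container name is an assumption, so substitute your own. On Kubernetes, kubectl logs serves the same purpose.

```shell
# List running containers to find the Engine container
# (the name filter "engine" is an assumption -- adjust to your naming).
docker ps --filter "name=engine"

# Follow the Engine container logs
# (replace "deepgram-engine" with your actual container name).
docker logs -f deepgram-engine
```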
During engine container startup, look for a log entry similar to the following, which indicates a successful model load.
Potential Issues
Flux Present but Disabled
If the Flux model file is present in your engine models directory but the feature is disabled in the engine.toml configuration file, you will still see this message in your engine logs, provided you are running a container image that supports Flux.
Flux Model File Missing
If you see the errors below, that indicates the Flux model file is missing from your models directory.
Please ask your Deepgram account representative or support team for assistance in obtaining the Flux model file.
Flux Not Enabled in Engine
If the Flux model is failing to load, you may see this warning in the Engine (also known as "impeller") logs.
This indicates that the Flux model file is present in your models directory, but Flux has not been enabled in your Engine.toml configuration.
Older Deepgram Container Images
If you see the following error in your logs, you may be running an outdated Deepgram engine container image.
Make sure that you are using an engine container image released with Flux support (release-251015) or later.
Access Flux Endpoint
The Flux model is accessed over the WebSocket protocol only, using the ws://<ipOrHostname>/v2/listen URL.
This URL path is exposed by the Deepgram API server container, like the other Deepgram API endpoints.
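As a quick connectivity check, you can open a WebSocket to the endpoint from the command line. The tool choice here is an assumption (any WebSocket client works), and the required query parameters are deliberately omitted; consult the Flux developer documentation for them.

```shell
# Smoke-test the Flux endpoint with websocat (a third-party WebSocket
# client). Replace <ipOrHostname> with your API server's address.
# Required query parameters (e.g. the model name) are omitted here --
# see the Flux developer documentation for the full request format.
websocat "ws://<ipOrHostname>/v2/listen"
```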
Once you’ve verified that Flux is installed and loaded by the Deepgram self-hosted services, please follow the developer documentation.