Using Lower-Level Websockets with the Streaming API
Learn how to implement using lower-level websockets with Deepgram’s Streaming API.
Deepgram’s Streaming API unlocks many use cases, ranging from captioning to note-taking and much more. If you aren’t able to use our Deepgram SDKs for your streaming needs, this guide provides a Reference Implementation for you.
Most users will not need this Reference Implementation because Deepgram provides SDKs that already implement the Streaming API. This is an optional guide to help individuals interested in building and maintaining their own SDK specific to the Deepgram Streaming API.
For additional reference, see our Deepgram SDKs, which include the WebSocket-based Streaming API:
The Deepgram SDKs should address most needs; however, if you find limitations or issues in any of the above SDKs, we encourage you to report issues, bugs, or ideas for new features in the open source repositories. Our SDK projects are open to code contributions as well.
If you still need to implement your own SDK, this guide will enable you to do that.
It is highly recommended that you familiarize yourself with the WebSocket protocol defined by RFC-6455. If you are not yet familiar with IETF RFCs, they are highly detailed specifications of how something works and behaves; in this case, RFC-6455 describes how to implement WebSockets. You will need to understand this document to interact with the Deepgram Streaming API.
Once you understand the WebSocket protocol, familiarize yourself with the capabilities of the WebSocket library available in the language you chose to implement your SDK in.
Refer to the language-specific implementations of RFC-6455 (i.e., the WebSocket protocol):
These are just some of the available implementations in those languages; they are simply the most popular ones in those language-specific communities.
Additionally, you will need to understand how to build applications that are multi-threaded, access the internet, and do so securely via TLS. These are essential components of building your SDK.
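For the TLS part, Python’s standard `ssl` module can produce a client context that verifies the server’s certificate chain and hostname. The sketch below is one way to build it; the TLS 1.2 floor is our own conservative choice, not a documented Deepgram requirement.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context for a wss:// connection.

    ssl.create_default_context(SERVER_AUTH) loads the system CA certificates,
    enables hostname checking, and requires a valid certificate chain.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Assumption: refuse anything older than TLS 1.2 (our choice, not Deepgram's).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Pass the resulting context to whatever WebSocket library you use rather than disabling verification. Turning off certificate checks to "make it work" is a common source of security bugs.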
The goal of your SDK should minimally be:
It’s essential that you encapsulate your WebSocket connection in a class or similar representation. This reduces undesirable, tight coupling between your WebSocket code and your application’s code, which the industry often describes as avoiding “spaghetti code”. If you have WebSocket code in your func main(), or you need to import the above WebSocket libraries there, that is undesirable unless your application is trivially small.
To implement the WebSocket client correctly, you must follow the WebSocket protocol defined in RFC-6455; in particular, refer to Section 4.1, Client Requirements.
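Two of the Section 4.1 client requirements concern the opening handshake. The client sends a `Sec-WebSocket-Key` that is a base64-encoded 16-byte random nonce. The server must echo a `Sec-WebSocket-Accept` value derived from that key, and the client must validate it. Your WebSocket library normally does this for you. The sketch below only illustrates what it is checking. The helper names are hypothetical, and the headers dict is assumed to use lower-cased names.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_client_key() -> str:
    """Sec-WebSocket-Key: base64 of a 16-byte random nonce (RFC 6455, 4.1)."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(client_key: str) -> str:
    """base64(SHA-1(key + GUID)): what the server must return for client_key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def validate_handshake(status: int, headers: dict, client_key: str) -> None:
    """Client-side checks on the server's handshake response.

    headers maps lower-cased header names to values (hypothetical shape).
    Raises ValueError if the connection must be treated as failed.
    """
    if status != 101:
        # Non-101 (e.g. 401 for a bad API key) means no WebSocket was opened.
        raise ValueError(f"expected HTTP 101, got {status}")
    if headers.get("upgrade", "").lower() != "websocket":
        raise ValueError("missing or invalid Upgrade header")
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        raise ValueError("missing or invalid Connection header")
    if headers.get("sec-websocket-accept") != expected_accept(client_key):
        raise ValueError("Sec-WebSocket-Accept does not match the key we sent")
```

RFC-6455 itself uses the key `dGhlIHNhbXBsZSBub25jZQ==` as its example, for which the accept value is `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. That pair makes a handy unit test.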
First, declare a WebSocket class of some sort specific to your implementation language:
NOTE: Depending on your programming language of choice, you might need to support both the async/await and the threaded concurrency models. These concepts occur in languages like JavaScript, Python, and others. You can implement one or both based on your users’ needs.
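As a minimal threaded-model sketch in Python, the class can start as an abstract interface so that application code never touches the WebSocket library directly. The names `connect`, `send_binary`, `send_message`, and `finish`, the constructor parameters, and the 5-second default are hypothetical placeholders, not Deepgram’s SDK API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class AbstractStreamingClient(ABC):
    """Hypothetical interface: names and defaults are placeholders.

    Concrete subclasses wrap a real WebSocket library; application code
    depends only on this interface.
    """

    def __init__(self, url: str, api_key: str,
                 keepalive_interval: float = 5.0,  # placeholder value
                 on_message: Optional[Callable[[str], None]] = None) -> None:
        self.url = url
        self.api_key = api_key
        self.keepalive_interval = keepalive_interval
        self.on_message = on_message or (lambda _msg: None)

    @abstractmethod
    def connect(self) -> bool:
        """Open the connection and start the background threads."""

    @abstractmethod
    def send_binary(self, data: bytes) -> None:
        """SendBinary(): send audio as a binary frame."""

    @abstractmethod
    def send_message(self, text: str) -> None:
        """SendMessage(): send a control message as a text frame."""

    @abstractmethod
    def finish(self) -> None:
        """Gracefully close the stream and release resources."""
```

An async/await variant would mirror the same methods as `async def` coroutines. Keeping the two variants behind matching interfaces lets users switch models without rewriting their application code.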
You will then need to implement the following class methods.
This function should open the WebSocket connection using the URL and API Key, and configure the Keepalive Interval.
The Receive/Process Message Thread should:
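A receive thread can be sketched as a loop that blocks on the next frame, hands each message to a callback, and exits when the connection closes or fails. In this sketch, `recv` is a hypothetical blocking call supplied by your WebSocket library wrapper; it is assumed to return `None` once the connection is closed.

```python
import threading
from typing import Callable, Optional, Union

Frame = Union[str, bytes]


class ReceiveThread(threading.Thread):
    """Reads frames until the connection closes, passing each to on_message.

    recv is hypothetical: a blocking read from your WebSocket wrapper that
    returns None after the connection has been closed.
    """

    def __init__(self, recv: Callable[[], Optional[Frame]],
                 on_message: Callable[[str], None],
                 on_error: Callable[[Exception], None]) -> None:
        super().__init__(name="deepgram-receive", daemon=True)
        self._recv = recv
        self._on_message = on_message
        self._on_error = on_error

    def run(self) -> None:
        while True:
            try:
                frame = self._recv()
            except Exception as exc:  # network failure, protocol error, ...
                self._on_error(exc)
                return
            if frame is None:  # connection closed: leave the loop
                return
            if isinstance(frame, bytes):
                # Responses are expected to be JSON text; decode defensively.
                frame = frame.decode("utf-8")
            self._on_message(frame)
```

Reporting errors through a callback, instead of letting them escape `run()`, means your SDK decides how to surface failures. An uncaught exception would otherwise kill the thread silently.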
The SendBinary() function should:
The SendMessage() function should:
KeepAlive or CloseStream messages are examples of these types of messages. If you need more information on the difference between text and binary frames, please refer to RFC-6455.
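At the protocol level, the difference between SendBinary() and SendMessage() is the frame opcode: 0x2 for binary data such as audio, and 0x1 for UTF-8 text such as JSON control messages. Every client-to-server frame must also be masked. Your WebSocket library will build frames for you. This sketch of RFC-6455 Section 5.2 framing (single, unfragmented frames only) just shows what each call puts on the wire. `send_binary_frame` and `send_message_frame` are hypothetical helpers that return the bytes instead of sending them.

```python
import os
import struct
from typing import Optional

OPCODE_TEXT = 0x1    # UTF-8 text frame (e.g. a JSON control message)
OPCODE_BINARY = 0x2  # binary frame (e.g. raw audio)


def encode_client_frame(opcode: int, payload: bytes,
                        mask_key: Optional[bytes] = None) -> bytes:
    """Encode one unfragmented, masked client-to-server frame."""
    if mask_key is None:
        mask_key = os.urandom(4)  # clients must mask every frame they send
    if len(mask_key) != 4:
        raise ValueError("mask key must be 4 bytes")
    header = bytearray([0x80 | opcode])  # FIN=1, RSV1-3=0, opcode
    length = len(payload)
    if length < 126:
        header.append(0x80 | length)            # MASK bit + 7-bit length
    elif length < (1 << 16):
        header.append(0x80 | 126)
        header += struct.pack("!H", length)     # 16-bit extended length
    else:
        header.append(0x80 | 127)
        header += struct.pack("!Q", length)     # 64-bit extended length
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


def send_binary_frame(audio: bytes) -> bytes:
    """Bytes a SendBinary() call would write for one audio chunk."""
    return encode_client_frame(OPCODE_BINARY, audio)


def send_message_frame(text: str) -> bytes:
    """Bytes a SendMessage() call would write for one JSON message."""
    return encode_client_frame(OPCODE_TEXT, text.encode("utf-8"))
```

Because the KeepAlive thread and the audio-sending code may call these concurrently, a real client should serialize writes to the socket, for example with a lock around the send call.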
This thread is optional, provided that audio data is constantly streaming through the WebSocket; otherwise, it should:
Send a KeepAlive message at every Keepalive Interval to maintain the connection.
Notice that this thread is independent of the Receive/Process Message Thread above.
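A KeepAlive thread can be sketched around `threading.Event.wait`, which sleeps for the interval but returns immediately once the thread is asked to stop. This assumes the KeepAlive message is the JSON text `{"type": "KeepAlive"}` sent as a text frame. Check Deepgram’s KeepAlive guidelines for the recommended interval; here it is simply a parameter. `send_message` stands in for your SDK’s SendMessage() and is hypothetical.

```python
import json
import threading
from typing import Callable

# Assumed KeepAlive payload, sent as a text frame via SendMessage().
KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


class KeepAliveThread(threading.Thread):
    """Sends KEEPALIVE_MESSAGE every `interval` seconds until stop() is called."""

    def __init__(self, send_message: Callable[[str], None], interval: float) -> None:
        super().__init__(name="deepgram-keepalive", daemon=True)
        self._send_message = send_message  # hypothetical SendMessage() hook
        self._interval = interval
        self._stop_event = threading.Event()

    def run(self) -> None:
        # wait() returns False on timeout (time to send) and True once
        # stop() sets the event, so shutdown never waits a full interval.
        while not self._stop_event.wait(self._interval):
            self._send_message(KEEPALIVE_MESSAGE)

    def stop(self) -> None:
        self._stop_event.set()
```

Using an event instead of `time.sleep()` keeps shutdown responsive. It also keeps this thread fully independent of the receive thread, as the text above requires.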
This function should:
Now that you have a basic client, you must handle the Deepgram API specifics. Refer to the Deepgram Streaming API documentation for more information.
When establishing a connection, you must pass the required parameters defined by the Deepgram Query Parameters.
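Query parameters are appended to the connection URL, and the API key is sent in a request header during the handshake. The sketch below assumes the endpoint `wss://api.deepgram.com/v1/listen` and an `Authorization: Token <API key>` header; verify both against the current API reference. The parameter names in the usage example (`encoding`, `sample_rate`, `channels`, `interim_results`) are Deepgram query parameters, but which ones are required depends on your audio.

```python
from urllib.parse import urlencode

# Assumed Streaming API endpoint; confirm against the current API reference.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(params: dict) -> str:
    """Append query parameters, rendering Python booleans as true/false."""
    normalized = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in params.items()
    }
    if not normalized:
        return DEEPGRAM_LISTEN_URL
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(normalized)}"


def build_auth_headers(api_key: str) -> dict:
    """Handshake header carrying the API key (assumed Token scheme)."""
    return {"Authorization": f"Token {api_key}"}
```

For example, `build_listen_url({"encoding": "linear16", "sample_rate": 16000, "channels": 1, "interim_results": True})` yields `wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1&interim_results=true`.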
If the connection succeeds, you should start receiving transcription messages (which may contain empty transcripts) in the Response Schema defined below.
For convenience, you should unmarshal (deserialize) these JSON representations into usable objects/classes so that your users have an easier time using your SDK.
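Python dataclasses are one lightweight way to give users typed objects instead of raw dictionaries. This sketch models only a subset of a `Results` message. The field names (`type`, `is_final`, `speech_final`, `start`, `duration`, and `channel.alternatives[]` with `transcript`, `confidence`, and `words`) are assumed from Deepgram’s Response Schema and should be checked against it. Unknown fields are ignored.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    word: str
    start: float
    end: float
    confidence: float


@dataclass
class Alternative:
    transcript: str
    confidence: float
    words: List[Word] = field(default_factory=list)


@dataclass
class TranscriptResult:
    is_final: bool
    speech_final: bool
    start: float
    duration: float
    alternatives: List[Alternative]

    @property
    def transcript(self) -> str:
        """Top alternative's transcript; "" when nothing was recognized."""
        return self.alternatives[0].transcript if self.alternatives else ""


def parse_results(raw: str) -> TranscriptResult:
    """Unmarshal one Results message (assumed field names, partial schema)."""
    msg = json.loads(raw)
    if msg.get("type") != "Results":
        raise ValueError(f"not a Results message: {msg.get('type')!r}")
    alternatives = [
        Alternative(
            transcript=alt.get("transcript", ""),
            confidence=alt.get("confidence", 0.0),
            words=[Word(w["word"], w["start"], w["end"], w["confidence"])
                   for w in alt.get("words", [])],
        )
        for alt in msg.get("channel", {}).get("alternatives", [])
    ]
    return TranscriptResult(
        is_final=msg.get("is_final", False),
        speech_final=msg.get("speech_final", False),
        start=msg.get("start", 0.0),
        duration=msg.get("duration", 0.0),
        alternatives=alternatives,
    )
```

Defaulting missing fields, rather than failing, keeps the parser tolerant of empty transcripts and of fields that are absent from some messages.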
If you do implement the KeepAlive message, you will need to follow the guidelines defined here.
When you are ready to close your WebSocket client, you will need to follow the shutdown guidelines defined here.
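One plausible shutdown order is: stop sending KeepAlives, send a CloseStream message, give the receive thread time to deliver the final results, then close the socket. This sketch assumes CloseStream is the JSON text `{"type": "CloseStream"}`. It also assumes, per the shutdown guidelines, that the server flushes any remaining results and then closes the connection, which ends the receive thread. All callables are hypothetical hooks into your own client class.

```python
import json
import threading
from typing import Callable, Optional

# Assumed CloseStream payload, sent as a text frame.
CLOSE_STREAM_MESSAGE = json.dumps({"type": "CloseStream"})


def graceful_shutdown(send_message: Callable[[str], None],
                      receive_thread: threading.Thread,
                      close_socket: Callable[[], None],
                      stop_keepalive: Optional[Callable[[], None]] = None,
                      timeout: float = 5.0) -> bool:
    """Close a streaming session without dropping the last transcripts.

    Returns True if the receive thread finished (server closed the stream)
    within `timeout` seconds. All hooks are hypothetical.
    """
    if stop_keepalive is not None:
        stop_keepalive()                   # no KeepAlives once we are closing
    send_message(CLOSE_STREAM_MESSAGE)     # ask the server to flush and finish
    receive_thread.join(timeout)           # let the final results arrive
    finished = not receive_thread.is_alive()
    close_socket()                         # always release the connection
    return finished
```

Returning whether the server closed in time lets your SDK log or report a timeout. The socket is still released either way.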
You must be able to handle any protocol-level errors defined in RFC-6455 as well as application-level errors (i.e., error messages from Deepgram); to do so, follow the guidelines defined here.
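The two error levels can be sketched separately. Protocol-level closes carry a numeric code from RFC-6455 Section 7.4.1, and application-level messages arrive as JSON that can be routed by a `type` field. The handler names and the decision to report, rather than raise, unknown or malformed messages are our own assumptions. Check Deepgram’s error-handling guidelines for the codes and messages it actually sends.

```python
import json
from typing import Callable, Dict

# A subset of the close codes defined in RFC 6455, Section 7.4.1.
CLOSE_CODES: Dict[int, str] = {
    1000: "normal closure",
    1001: "going away",
    1002: "protocol error",
    1003: "unsupported data",
    1006: "abnormal closure (no close frame received)",
    1007: "invalid payload data",
    1008: "policy violation",
    1009: "message too big",
    1011: "internal error",
}


def describe_close(code: int, reason: str = "") -> str:
    """Turn a protocol-level close into a message users can act on."""
    label = CLOSE_CODES.get(code, "unknown close code")
    suffix = f": {reason}" if reason else ""
    return f"{code} ({label}){suffix}"


def dispatch(raw: str, handlers: Dict[str, Callable[[dict], None]],
             on_error: Callable[[Exception], None]) -> None:
    """Route an application-level JSON message by its `type` field.

    Malformed JSON or unknown types are reported, not raised, so one bad
    message cannot kill the receive thread.
    """
    try:
        msg = json.loads(raw)
        handler = handlers[msg["type"]]
    except (ValueError, KeyError, TypeError) as exc:
        on_error(exc)
        return
    handler(msg)
```

Note that 1006 is never sent on the wire. A library reports it locally when the connection drops without a close frame, which is often the first sign of a network problem.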
Here are some common implementation mistakes.
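One reason Deepgram terminates a connection is an `encoding` parameter that does not match the audio actually being sent. If you read a WAV file and stream its raw samples, a quick sanity check is to compare the WAV header with the parameters you plan to pass. This sketch uses Python’s standard `wave` module. It assumes 16-bit PCM corresponds to `encoding=linear16`; `check_wav_matches` is a hypothetical helper.

```python
import io
import wave
from typing import List


def check_wav_matches(wav_bytes: bytes, sample_rate: int, channels: int) -> List[str]:
    """Compare a WAV header with the query parameters you intend to send.

    Returns human-readable mismatches; an empty list means they agree.
    Assumes the stream is 16-bit PCM (encoding=linear16).
    """
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(f"audio is {wav.getsampwidth() * 8}-bit, not 16-bit PCM (linear16)")
        if wav.getframerate() != sample_rate:
            problems.append(f"sample_rate={sample_rate} but audio is {wav.getframerate()} Hz")
        if wav.getnchannels() != channels:
            problems.append(f"channels={channels} but audio has {wav.getnchannels()}")
    return problems
```

Running a check like this before connecting turns an abrupt server-side disconnect into a clear client-side error message.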
There are usually a few reasons why the Deepgram Platform will terminate the connection:
For example, the encoding parameter does not match the encoding of the audio stream.
There are usually a few reasons why the Deepgram Platform will terminate the connection (similar to the above):
There are usually a few reasons why the Deepgram Platform delays sending transcription messages:
Use ping and traceroute (or tracert on Windows) to map the network path from source to destination.
By adopting object-oriented programming (OOP), the pseudo-code above provides a clear structure for implementing the SDK across different programming languages that support OOP paradigms. This structure facilitates better abstraction, encapsulation, and modularity, making the SDK more adaptable to future changes in the Deepgram API or the underlying WebSocket protocol.
As you implement and refine your SDK, remember that the essence of good software design lies in solving the problem at hand and crafting a solution that’s maintainable, extensible, and easy to use.