Overview
The WebSocket API enables real-time bidirectional communication for streaming synthesis and interactive dialogue. Connect once and send multiple synthesis requests without HTTP overhead.
Connecting
Open a WebSocket connection to the streaming endpoint. Authenticate by passing your API key as a query parameter or in the first message.
/v1/ws/stream  AUTH
WebSocket endpoint for real-time streaming synthesis.
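As a minimal sketch of the query-parameter option, the connection URL can be built so that the key is percent-encoded rather than concatenated by hand. The helper name `buildStreamUrl` is hypothetical; the host, path, and `token` parameter are taken from the example URL in this section.

```javascript
// Hypothetical helper (not part of the API): builds the streaming URL with
// the API key passed as the `token` query parameter.
function buildStreamUrl(apiKey, base = 'wss://originneural.ai/v1/ws/stream') {
  const url = new URL(base);
  // searchParams.set percent-encodes the value, so unusual characters
  // in a key cannot break the query string.
  url.searchParams.set('token', apiKey);
  return url.toString();
}

// Usage, in any runtime with a global WebSocket:
// const ws = new WebSocket(buildStreamUrl('origin_sk_your_key'));
```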
Open a WebSocket connection
const ws = new WebSocket(
  'wss://originneural.ai/v1/ws/stream?token=origin_sk_your_key'
);
ws.onopen = () => {
  console.log('Connected');
};
ws.onmessage = (event) => {
  // Handle incoming audio chunks or status messages
  const data = JSON.parse(event.data);
  if (data.type === 'audio') {
    // data.audio is base64-encoded PCM audio
    playAudioChunk(data.audio);
  }
};
ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};
Message Types
The WebSocket protocol uses JSON messages with a type field. The message types are listed below by direction: client to server first, then server to client.
- Client → Server: "synthesize" — Start synthesis with text and voice_id.
- Client → Server: "stop" — Cancel the current synthesis.
- Client → Server: "ping" — Keep-alive ping.
- Server → Client: "audio" — Base64-encoded PCM audio chunk.
- Server → Client: "done" — Synthesis complete.
- Server → Client: "error" — Error message with code and description.
- Server → Client: "pong" — Response to ping.
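The message types above could be handled along these lines. This is a sketch, not the official client: `synthesize`, `stop`, `ping`, `dispatch`, and `decodePcm16` are hypothetical names, and the decoder assumes that "PCM" means 16-bit signed little-endian samples, which this page does not state.

```javascript
// Builders for the three client-to-server message types.
const synthesize = (text, voiceId) =>
  JSON.stringify({ type: 'synthesize', text, voice_id: voiceId });
const stop = () => JSON.stringify({ type: 'stop' });
const ping = () => JSON.stringify({ type: 'ping' });

// Routes one raw server message to the handler registered for its type.
// Returns the handled type, or null if the message was not a JSON object
// or no handler was registered for its type.
function dispatch(raw, handlers) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (msg === null || typeof msg !== 'object') return null;
  const known = Object.prototype.hasOwnProperty.call(handlers, msg.type);
  if (!known || typeof handlers[msg.type] !== 'function') return null;
  handlers[msg.type](msg);
  return msg.type;
}

// ASSUMPTION: audio chunks are 16-bit signed little-endian PCM.
// Converts a base64 chunk to floats in the range [-1, 1).
function decodePcm16(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Usage:
// ws.onmessage = (event) => dispatch(event.data, {
//   audio: (m) => playAudioChunk(m.audio),
//   done: () => console.log('Synthesis complete'),
//   error: (m) => console.error('Server error:', m),
//   pong: () => {},
// });
```

Keeping one handler per type makes it harder to silently drop a "done" or "error" message than a chain of if statements that only checks for "audio".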
Message format examples
{
  "type": "synthesize",
  "text": "Hello world",
  "voice_id": "default",
  "engine": "kokoro",
  "speed": 1.0
}
Dialogue Sessions
The Echo engine (moshi) supports interactive dialogue sessions. Open a dialogue WebSocket and send text turns; the engine maintains conversational context across turns for as long as the connection stays open.
- Dialogue sessions use the Echo engine (moshi) exclusively.
- Context is maintained for the duration of the WebSocket connection.
- Max session duration is 10 minutes. Reconnect to start a new session.
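Because context lives only as long as the connection and a session is capped at 10 minutes, a client may want to track session age and reconnect before the limit. The sketch below is one way to do that. `DialogueSessionClock` and `turnMessage` are hypothetical names, and the turn message uses the same fields as the dialogue example in this section.

```javascript
const MAX_SESSION_MS = 10 * 60 * 1000; // documented 10-minute session limit

// Hypothetical helper: tracks the age of one dialogue connection.
// `now` is injectable so the logic can be exercised without waiting.
class DialogueSessionClock {
  constructor(now = Date.now) {
    this.now = now;
    this.startedAt = now();
  }
  // Milliseconds left before the session limit, never negative.
  remainingMs() {
    return Math.max(0, MAX_SESSION_MS - (this.now() - this.startedAt));
  }
  expired() {
    return this.remainingMs() === 0;
  }
}

// Builds a dialogue turn message with type, text, and voice_id fields.
function turnMessage(text, voiceId = 'default') {
  return JSON.stringify({ type: 'turn', text, voice_id: voiceId });
}

// Usage:
// const clock = new DialogueSessionClock();
// if (clock.expired()) { /* close and open a new dialogue socket */ }
// else ws.send(turnMessage('Tell me about voice synthesis.'));
```

Note that a new connection starts a new session, so earlier conversational context is not carried over.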
/v1/ws/dialogue  AUTH
WebSocket endpoint for interactive voice dialogue.
Dialogue session
const ws = new WebSocket(
  'wss://originneural.ai/v1/ws/dialogue?token=origin_sk_your_key'
);
ws.onopen = () => {
  // Start a dialogue turn
  ws.send(JSON.stringify({
    type: 'turn',
    text: 'Tell me about voice synthesis.',
    voice_id: 'default',
  }));
};
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'response_audio') {
    playAudioChunk(data.audio);
  } else if (data.type === 'response_text') {
    console.log('Response:', data.text);
  }
};
Best Practices
Follow these best practices for reliable WebSocket connections:
- Send ping messages every 30 seconds to keep the connection alive.
- Implement reconnection logic with exponential backoff.
- Buffer audio chunks client-side before playback to avoid gaps.
- Close the connection gracefully when done (ws.close(1000)).
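The practices above can be sketched as three small helpers. None of these names come from the API: `backoffDelay`, `startKeepAlive`, and `ChunkBuffer` are hypothetical, and the backoff base, backoff cap, and buffering threshold are illustrative values rather than documented limits.

```javascript
// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
// baseMs and maxMs are illustrative defaults, not documented limits.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Sends a "ping" message every intervalMs while the socket is open.
// Returns a function that stops the timer.
function startKeepAlive(ws, intervalMs = 30000) {
  const timer = setInterval(() => {
    if (ws.readyState === 1) { // 1 is WebSocket.OPEN
      ws.send(JSON.stringify({ type: 'ping' }));
    }
  }, intervalMs);
  return () => clearInterval(timer);
}

// Holds chunks back until `minChunks` have arrived, then passes every
// chunk to `play` in arrival order.
class ChunkBuffer {
  constructor(minChunks, play) {
    this.minChunks = minChunks;
    this.play = play;
    this.queue = [];
    this.started = false;
  }
  push(chunk) {
    this.queue.push(chunk);
    if (!this.started && this.queue.length >= this.minChunks) {
      this.started = true;
    }
    if (this.started) this.flush();
  }
  // Releases everything queued, for example when "done" arrives
  // before the threshold was reached.
  flush() {
    while (this.queue.length > 0) this.play(this.queue.shift());
  }
}

// Usage:
// const stopPings = startKeepAlive(ws);
// ws.onclose = () => {
//   stopPings();
//   setTimeout(reconnect, backoffDelay(attempt++)); // reconnect is your own function
// };
// When finished: stopPings(); ws.close(1000);
```

A larger `minChunks` gives smoother playback at the cost of added latency before the first audio is heard.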