coqui

package
v0.1.2 Latest
Published: Mar 19, 2026 License: GPL-3.0 Imports: 16 Imported by: 0

Documentation

Overview

Package coqui provides a TTS provider backed by a locally running Coqui server. It connects to either a Coqui XTTS v2 server or a standard Coqui TTS server via the server's REST API, and it implements the tts.Provider interface.

Two API modes are supported:

  • APIModeStandard (default): targets the standard Coqui TTS server (ghcr.io/coqui-ai/tts-cpu). Synthesis is performed via GET /api/tts with URL query parameters; voice catalogue is retrieved from GET /details.

  • APIModeXTTS: targets the Coqui XTTS v2 API server. Synthesis is performed via POST /tts_to_audio/ with a JSON body; voice catalogue is retrieved from GET /studio_speakers; voice cloning is available via POST /clone_speaker.
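The difference between the two request shapes can be sketched as follows. This is a minimal illustration, not the package's internal code: the helper names (standardTTSURL, xttsBody, xttsRequest) are hypothetical, and the query parameter names (text, speaker_id, language_id) and JSON field names (text, speaker_wav, language) are assumptions about the respective servers that should be checked against the server version in use.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// standardTTSURL builds a GET /api/tts URL for the standard server.
// The query parameter names are assumptions, not taken from this package.
func standardTTSURL(base, text, speaker, lang string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/tts"
	q := url.Values{}
	q.Set("text", text)
	if speaker != "" {
		q.Set("speaker_id", speaker)
	}
	if lang != "" {
		q.Set("language_id", lang)
	}
	u.RawQuery = q.Encode() // Encode sorts keys and escapes values
	return u.String(), nil
}

// xttsRequest is a hypothetical JSON body for POST /tts_to_audio/.
type xttsRequest struct {
	Text       string `json:"text"`
	SpeakerWav string `json:"speaker_wav"`
	Language   string `json:"language"`
}

// xttsBody marshals the assumed request body for the XTTS v2 server.
func xttsBody(text, speakerWav, lang string) ([]byte, error) {
	return json.Marshal(xttsRequest{Text: text, SpeakerWav: speakerWav, Language: lang})
}

func main() {
	u, err := standardTTSURL("http://localhost:5002", "Hello there.", "", "en")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)

	body, err := xttsBody("Hello there.", "female_01", "en")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```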

Because both servers operate in batch mode (one HTTP call per utterance rather than a streaming socket), SynthesizeStream accumulates incoming text fragments into complete sentences and then dispatches concurrent HTTP requests with a small lookahead buffer to minimise perceived latency.
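The accumulation step might look roughly like the sketch below. It assumes the splitting rule documented for SynthesizeStream (a '.', '!' or '?' followed by whitespace ends a sentence, and whatever remains is flushed at end of input). The sentenceBuffer type and its methods are hypothetical names introduced for illustration; the package's real implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sentenceBuffer accumulates text fragments and yields complete sentences.
// It is a hypothetical helper, not part of the coqui package.
type sentenceBuffer struct {
	pending string
}

func isTerminator(r rune) bool {
	return r == '.' || r == '!' || r == '?'
}

// Feed appends a fragment and returns every sentence completed so far.
// A sentence is complete once a terminator is followed by whitespace.
func (b *sentenceBuffer) Feed(fragment string) []string {
	b.pending += fragment
	runes := []rune(b.pending)
	var out []string
	start := 0
	for i := 0; i+1 < len(runes); i++ {
		if isTerminator(runes[i]) && unicode.IsSpace(runes[i+1]) {
			if s := strings.TrimSpace(string(runes[start : i+1])); s != "" {
				out = append(out, s)
			}
			start = i + 1
		}
	}
	b.pending = string(runes[start:])
	return out
}

// Flush returns whatever text is left when the input ends.
func (b *sentenceBuffer) Flush() string {
	s := strings.TrimSpace(b.pending)
	b.pending = ""
	return s
}

func main() {
	b := &sentenceBuffer{}
	for _, frag := range []string{"Hello wor", "ld. How are", " you? Fine"} {
		for _, s := range b.Feed(frag) {
			fmt.Printf("sentence: %q\n", s)
		}
	}
	if rest := b.Flush(); rest != "" {
		fmt.Printf("final:    %q\n", rest)
	}
}
```

Buffering until a terminator is followed by whitespace avoids cutting a sentence at an ellipsis or at a fragment boundary that happens to end in a period.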

Typical usage (standard server):

p, err := coqui.New("http://localhost:5002",
    coqui.WithLanguage("en"),
    coqui.WithTimeout(15*time.Second),
    // APIModeStandard is the default; this line is optional:
    coqui.WithAPIMode(coqui.APIModeStandard),
)
if err != nil {
    log.Fatal(err) // New returns an error, e.g. for an empty serverURL
}
audio, err := p.SynthesizeStream(ctx, textCh, voiceProfile)

Typical usage (XTTS v2 server):

p, err := coqui.New("http://localhost:8002",
    coqui.WithLanguage("en"),
    coqui.WithAPIMode(coqui.APIModeXTTS),
)
if err != nil {
    log.Fatal(err) // New returns an error, e.g. for an empty serverURL
}
audio, err := p.SynthesizeStream(ctx, textCh, voiceProfile)

Types

type APIMode

type APIMode string

APIMode selects which Coqui server API the provider will target.

const (
	// APIModeXTTS targets the Coqui XTTS v2 API server (/tts_to_audio/).
	// It supports voice cloning via /clone_speaker and voice listing via
	// /studio_speakers.
	APIModeXTTS APIMode = "xtts"

	// APIModeStandard targets the standard Coqui TTS server (/api/tts).
	// This is the default mode. Voice listing is performed via /details.
	// Voice cloning is not supported in this mode.
	APIModeStandard APIMode = "standard"
)

type Option

type Option func(*Provider)

Option is a functional option for configuring a Coqui Provider.

func WithAPIMode

func WithAPIMode(mode APIMode) Option

WithAPIMode sets the server API mode. Use APIModeStandard (default) for the standard Coqui TTS Docker image (ghcr.io/coqui-ai/tts-cpu) or APIModeXTTS for the XTTS v2 API server.

func WithLanguage

func WithLanguage(lang string) Option

WithLanguage sets the BCP-47 language code sent to the TTS server (e.g., "en", "de", "fr"). Defaults to "en" if not set.

func WithTimeout

func WithTimeout(d time.Duration) Option

WithTimeout sets the per-request HTTP timeout for calls to the TTS server. Defaults to 30 s if not set.

type Provider

type Provider struct {
	// contains filtered or unexported fields
}

Provider implements tts.Provider backed by a locally running Coqui TTS server. It is safe for concurrent use; multiple SynthesizeStream calls may run in parallel.

func New

func New(serverURL string, opts ...Option) (*Provider, error)

New creates a new Coqui Provider that targets the TTS server at serverURL (e.g., "http://localhost:5002"). serverURL must be non-empty. Functional options may override the language, per-request timeout, and API mode. The default API mode is APIModeStandard.

func (*Provider) CloneVoice

func (p *Provider) CloneVoice(ctx context.Context, samples [][]byte) (*tts.VoiceProfile, error)

CloneVoice creates a new speaker voice by uploading WAV audio samples to the XTTS server via POST /clone_speaker. Each element of samples must be a valid WAV-encoded audio file.

Voice cloning is only supported in APIModeXTTS. In APIModeStandard, this method always returns an error.

Returns a VoiceProfile for the cloned voice or an error if the request fails. A nil or empty samples slice returns an error rather than sending an empty request.
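Callers may want to check samples before calling CloneVoice. The sketch below is a hypothetical pre-flight check, not something the package is documented to perform: it rejects an empty slice and verifies only the RIFF/WAVE magic bytes, which is a cheap sanity test rather than full WAV validation. The function names are introduced for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// looksLikeWAV reports whether b starts with a RIFF container header
// whose form type is WAVE. It does not validate the rest of the file.
func looksLikeWAV(b []byte) bool {
	return len(b) >= 12 &&
		bytes.Equal(b[0:4], []byte("RIFF")) &&
		bytes.Equal(b[8:12], []byte("WAVE"))
}

// validateSamples is a hypothetical caller-side check run before upload.
func validateSamples(samples [][]byte) error {
	if len(samples) == 0 {
		return errors.New("no samples provided")
	}
	for i, s := range samples {
		if !looksLikeWAV(s) {
			return fmt.Errorf("sample %d is not a RIFF/WAVE file", i)
		}
	}
	return nil
}

func main() {
	good := []byte("RIFF\x00\x00\x00\x00WAVEfmt ")
	bad := []byte("ID3 not a wav")

	fmt.Println(validateSamples(nil))
	fmt.Println(validateSamples([][]byte{good}))
	fmt.Println(validateSamples([][]byte{good, bad}))
}
```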

func (*Provider) ListVoices

func (p *Provider) ListVoices(ctx context.Context) ([]tts.VoiceProfile, error)

ListVoices retrieves the list of available voices from the Coqui server.

In APIModeXTTS, it calls GET /studio_speakers and maps each entry to a VoiceProfile. In APIModeStandard, it calls GET /details and returns one VoiceProfile per speaker for multi-speaker models, or a single VoiceProfile (identified by model name) for single-speaker models.

func (*Provider) SynthesizeStream

func (p *Provider) SynthesizeStream(ctx context.Context, text <-chan string, voice tts.VoiceProfile) (<-chan []byte, error)

SynthesizeStream consumes text fragments from the text channel, accumulates them into complete sentences (split on '.', '!', or '?' followed by whitespace or end of input), and for each sentence issues an HTTP synthesis request to the Coqui server. WAV responses are stripped of their file headers and the raw PCM is emitted on the returned channel in the original sentence order.

Up to sentenceLookaheadBuf HTTP requests may be in flight concurrently to hide network and server latency while preserving output ordering.

The returned channel is closed when all text has been synthesised or when ctx is cancelled. The caller must drain the channel to prevent goroutine leaks.
