package coqui

v0.1.2 | Published: Mar 19, 2026 | License: GPL-3.0 | Imports: 16 | Imported by: 0

Repository

github.com/MrWong99/glyphoxa

Documentation ¶

Overview ¶

Package coqui provides a tts.Provider implementation backed by a locally running Coqui TTS server. It talks to the server's REST API and supports both the standard Coqui TTS server and the Coqui XTTS v2 server.

Two API modes are supported:

APIModeStandard (default): targets the standard Coqui TTS server (ghcr.io/coqui-ai/tts-cpu). Synthesis is performed via GET /api/tts with URL query parameters; voice catalogue is retrieved from GET /details.
APIModeXTTS: targets the Coqui XTTS v2 API server. Synthesis is performed via POST /tts_to_audio/ with a JSON body; voice catalogue is retrieved from GET /studio_speakers; voice cloning is available via POST /clone_speaker.
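To make the difference between the two modes concrete, here is a minimal sketch of how the two synthesis requests might be built with net/http. The endpoint paths come from the description above. The query parameter names (text, speaker_id, language_id) and the JSON field names (text, speaker_wav, language) are assumptions about the servers' wire formats, not something this package documents, and both helper functions are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// buildStandardRequest sketches the GET /api/tts call used in standard mode.
// ASSUMPTION: the query parameter names are illustrative, not confirmed.
func buildStandardRequest(base, text, speaker, lang string) (*http.Request, error) {
	q := url.Values{}
	q.Set("text", text)
	if speaker != "" {
		q.Set("speaker_id", speaker)
	}
	if lang != "" {
		q.Set("language_id", lang)
	}
	return http.NewRequest(http.MethodGet, base+"/api/tts?"+q.Encode(), nil)
}

// xttsBody is a hypothetical JSON payload for POST /tts_to_audio/.
// ASSUMPTION: the field names are illustrative, not confirmed.
type xttsBody struct {
	Text       string `json:"text"`
	SpeakerWav string `json:"speaker_wav"`
	Language   string `json:"language"`
}

// buildXTTSRequest sketches the POST /tts_to_audio/ call used in XTTS mode.
func buildXTTSRequest(base, text, speaker, lang string) (*http.Request, error) {
	payload, err := json.Marshal(xttsBody{Text: text, SpeakerWav: speaker, Language: lang})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/tts_to_audio/", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	std, err := buildStandardRequest("http://localhost:5002", "Hello there.", "", "en")
	if err != nil {
		panic(err)
	}
	fmt.Println(std.Method, std.URL.String())

	xtts, err := buildXTTSRequest("http://localhost:8002", "Hello there.", "narrator", "en")
	if err != nil {
		panic(err)
	}
	fmt.Println(xtts.Method, xtts.URL.String())
}
```

The requests are only constructed here, not sent, so the sketch runs without a server.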

Because both servers operate in batch mode (one HTTP call per utterance rather than a streaming socket), SynthesizeStream accumulates incoming text fragments into complete sentences and then dispatches concurrent HTTP requests with a small lookahead buffer to minimise perceived latency.
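The sentence accumulation step can be sketched as follows. This is not the package's actual code: splitSentences, sentenceEnd, and collect are hypothetical names, and the splitting rule ('.', '!' or '?' followed by whitespace or end of input) is taken from the SynthesizeStream documentation further down.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// sentenceEnd returns the index just past the first '.', '!' or '?' that is
// followed by whitespace, or -1 if s does not yet hold a complete sentence.
func sentenceEnd(s string) int {
	for i, r := range s {
		if r != '.' && r != '!' && r != '?' {
			continue
		}
		next, _ := utf8.DecodeRuneInString(s[i+1:])
		if i+1 < len(s) && unicode.IsSpace(next) {
			return i + 1
		}
	}
	return -1
}

// splitSentences buffers incoming fragments and emits each complete sentence.
// Whatever remains when the input channel closes (EOF) is flushed as a final
// sentence.
func splitSentences(fragments <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		pending := ""
		for frag := range fragments {
			pending += frag
			for end := sentenceEnd(pending); end >= 0; end = sentenceEnd(pending) {
				if s := strings.TrimSpace(pending[:end]); s != "" {
					out <- s
				}
				pending = pending[end:]
			}
		}
		if s := strings.TrimSpace(pending); s != "" {
			out <- s
		}
	}()
	return out
}

// collect is a small demo driver that feeds fragments and gathers sentences.
func collect(frags ...string) []string {
	in := make(chan string, len(frags))
	for _, f := range frags {
		in <- f
	}
	close(in)
	var got []string
	for s := range splitSentences(in) {
		got = append(got, s)
	}
	return got
}

func main() {
	for _, s := range collect("Hello wor", "ld. How are", " you? Fine") {
		fmt.Printf("%q\n", s)
	}
}
```

Note that a terminator at the very end of a fragment is held back until the next fragment or EOF arrives, since only then is it known whether whitespace follows. A decimal such as "3.14" is not split, because the '.' is followed by a digit.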

Typical usage (standard server):

p, err := coqui.New("http://localhost:5002",
    coqui.WithLanguage("en"),
    coqui.WithTimeout(15*time.Second),
    // APIModeStandard is the default; this line is optional:
    coqui.WithAPIMode(coqui.APIModeStandard),
)
if err != nil {
    // handle the error
}
audio, err := p.SynthesizeStream(ctx, textCh, voiceProfile)

Typical usage (XTTS v2 server):

p, err := coqui.New("http://localhost:8002",
    coqui.WithLanguage("en"),
    coqui.WithAPIMode(coqui.APIModeXTTS),
)
if err != nil {
    // handle the error
}
audio, err := p.SynthesizeStream(ctx, textCh, voiceProfile)

Index ¶

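type APIMode
type Option
- func WithAPIMode(mode APIMode) Option
- func WithLanguage(lang string) Option
- func WithTimeout(d time.Duration) Option
type Provider
- func New(serverURL string, opts ...Option) (*Provider, error)
- func (p *Provider) CloneVoice(ctx context.Context, samples [][]byte) (*tts.VoiceProfile, error)
- func (p *Provider) ListVoices(ctx context.Context) ([]tts.VoiceProfile, error)
- func (p *Provider) SynthesizeStream(ctx context.Context, text <-chan string, voice tts.VoiceProfile) (<-chan []byte, error)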

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type APIMode ¶

type APIMode string

APIMode selects which Coqui server API the provider will target.

const (
	// APIModeXTTS targets the Coqui XTTS v2 API server (/tts_to_audio/).
	// It supports voice cloning via /clone_speaker and voice listing via
	// /studio_speakers.
	APIModeXTTS APIMode = "xtts"

	// APIModeStandard targets the standard Coqui TTS server (/api/tts).
	// This is the default mode. Voice listing is performed via /details.
	// Voice cloning is not supported in this mode.
	APIModeStandard APIMode = "standard"
)

type Option ¶

type Option func(*Provider)

Option is a functional option for configuring a Coqui Provider.

func WithAPIMode ¶

func WithAPIMode(mode APIMode) Option

WithAPIMode sets the server API mode. Use APIModeStandard (default) for the standard Coqui TTS Docker image (ghcr.io/coqui-ai/tts-cpu) or APIModeXTTS for the XTTS v2 API server.

func WithLanguage ¶

func WithLanguage(lang string) Option

WithLanguage sets the BCP-47 language code sent to the TTS server (e.g., "en", "de", "fr"). Defaults to "en" if not set.

func WithTimeout ¶

func WithTimeout(d time.Duration) Option

WithTimeout sets the per-request HTTP timeout for calls to the TTS server. Defaults to 30 s if not set.
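The options above follow Go's functional options pattern. The sketch below shows how such options are typically applied on top of defaults. The settings type and the lowercase helper names are hypothetical stand-ins for the package's unexported internals; only the default values ("en", 30 s, standard mode) are taken from the documentation.

```go
package main

import (
	"fmt"
	"time"
)

// settings is a hypothetical stand-in for the configurable Provider fields.
type settings struct {
	language string
	timeout  time.Duration
	mode     string
}

// option mutates settings, mirroring the shape of coqui.Option.
type option func(*settings)

func withLanguage(lang string) option       { return func(s *settings) { s.language = lang } }
func withTimeout(d time.Duration) option    { return func(s *settings) { s.timeout = d } }
func withMode(mode string) option           { return func(s *settings) { s.mode = mode } }

// newSettings starts from the documented defaults and applies each option in
// order, so later options override earlier ones.
func newSettings(opts ...option) *settings {
	s := &settings{language: "en", timeout: 30 * time.Second, mode: "standard"}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", *newSettings())
	fmt.Printf("%+v\n", *newSettings(withLanguage("de"), withTimeout(15*time.Second), withMode("xtts")))
}
```

One consequence of this design is that callers only name the settings they want to change, and new options can be added later without breaking existing calls to New.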

type Provider ¶

type Provider struct {
	// contains filtered or unexported fields
}

Provider implements tts.Provider backed by a locally-running Coqui TTS server. It is safe for concurrent use; multiple SynthesizeStream calls may run in parallel.

func New ¶

func New(serverURL string, opts ...Option) (*Provider, error)

New creates a new Coqui Provider that targets the TTS server at serverURL (e.g., "http://localhost:5002"). serverURL must be non-empty. Functional options may override the language, per-request timeout, and API mode. The default API mode is APIModeStandard.

func (*Provider) CloneVoice ¶

func (p *Provider) CloneVoice(ctx context.Context, samples [][]byte) (*tts.VoiceProfile, error)

CloneVoice creates a new speaker voice by uploading WAV audio samples to the XTTS server via POST /clone_speaker. Each element of samples must be a valid WAV-encoded audio file.

Voice cloning is only supported in APIModeXTTS. In APIModeStandard, this method always returns an error.

Returns a VoiceProfile for the cloned voice or an error if the request fails. A nil or empty samples slice returns an error rather than sending an empty request.
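Since each sample must be WAV-encoded and an empty slice is rejected, a caller may want to check samples before calling CloneVoice. The following is a minimal pre-flight sketch under that assumption. looksLikeWAV and validateSamples are hypothetical helpers, not part of this package, and the check only inspects the RIFF/WAVE magic bytes rather than fully parsing the file.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// looksLikeWAV reports whether b starts with a RIFF container header whose
// form type is WAVE. It does not validate the rest of the file.
func looksLikeWAV(b []byte) bool {
	return len(b) >= 12 &&
		bytes.Equal(b[0:4], []byte("RIFF")) &&
		bytes.Equal(b[8:12], []byte("WAVE"))
}

// validateSamples rejects an empty sample set and any sample that does not
// look like a WAV file, so obviously bad input never reaches the server.
func validateSamples(samples [][]byte) error {
	if len(samples) == 0 {
		return errors.New("no samples provided")
	}
	for i, s := range samples {
		if !looksLikeWAV(s) {
			return fmt.Errorf("sample %d is not a RIFF/WAVE file", i)
		}
	}
	return nil
}

func main() {
	good := []byte("RIFF\x00\x00\x00\x00WAVEfmt ")
	fmt.Println(validateSamples([][]byte{good}))
	fmt.Println(validateSamples([][]byte{good, []byte("not audio")}))
	fmt.Println(validateSamples(nil))
}
```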

func (*Provider) ListVoices ¶

func (p *Provider) ListVoices(ctx context.Context) ([]tts.VoiceProfile, error)

ListVoices retrieves the list of available voices from the Coqui server.

In APIModeXTTS, it calls GET /studio_speakers and maps each entry to a VoiceProfile. In APIModeStandard, it calls GET /details and returns one VoiceProfile per speaker for multi-speaker models, or a single VoiceProfile (identified by model name) for single-speaker models.

func (*Provider) SynthesizeStream ¶

func (p *Provider) SynthesizeStream(ctx context.Context, text <-chan string, voice tts.VoiceProfile) (<-chan []byte, error)

SynthesizeStream consumes text fragments from the text channel, accumulates them into complete sentences (split on '.', '!', '?' followed by whitespace or EOF), and for each sentence issues an HTTP synthesis request to the Coqui server. WAV responses are stripped of their file headers and the raw PCM is emitted on the returned channel in the original sentence order.
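Stripping a WAV header is more than dropping a fixed 44 bytes, because a RIFF file may carry extra chunks before the audio. A minimal sketch of one way to do it, walking the chunk list until the "data" chunk is found, is shown below. stripWAVHeader is a hypothetical helper; the package's own implementation may differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// stripWAVHeader returns the raw PCM payload of a RIFF/WAVE file by walking
// its chunks until the "data" chunk is found.
func stripWAVHeader(wav []byte) ([]byte, error) {
	if len(wav) < 12 || string(wav[0:4]) != "RIFF" || string(wav[8:12]) != "WAVE" {
		return nil, errors.New("not a RIFF/WAVE file")
	}
	off := 12 // first chunk starts right after the 12-byte RIFF header
	for off+8 <= len(wav) {
		id := string(wav[off : off+4])
		size := int(binary.LittleEndian.Uint32(wav[off+4 : off+8]))
		body := off + 8
		if size < 0 || size > len(wav)-body {
			// Declared size overruns the buffer. For the data chunk, assume a
			// placeholder size and take everything that follows.
			if id == "data" {
				return wav[body:], nil
			}
			return nil, errors.New("truncated chunk: " + id)
		}
		if id == "data" {
			return wav[body : body+size], nil
		}
		off = body + size + size%2 // chunks are padded to an even length
	}
	return nil, errors.New("no data chunk found")
}

func main() {
	wav := []byte("RIFF\x24\x00\x00\x00WAVEfmt \x10\x00\x00\x00" +
		"0123456789abcdef" + // 16 bytes standing in for the format fields
		"data\x04\x00\x00\x00\x01\x02\x03\x04")
	pcm, err := stripWAVHeader(wav)
	fmt.Println(pcm, err)
}
```

Tolerating an oversized data chunk is a design choice for this sketch: some servers that generate audio on the fly write a placeholder length, and treating the rest of the body as PCM keeps playback working in that case.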

Up to sentenceLookaheadBuf HTTP requests may be in-flight concurrently to hide network/server latency while preserving output ordering.
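One common way to run a bounded number of requests concurrently while keeping output in input order is a queue of per-request result slots. The sketch below illustrates that technique; it is an assumption about how such a pipeline could be built, not the package's actual code. orderedPipeline, synthFunc, and synthesizeAll are hypothetical names, and the lookahead parameter plays the role of sentenceLookaheadBuf.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// synthFunc stands in for one HTTP synthesis call (hypothetical signature).
type synthFunc func(ctx context.Context, sentence string) ([]byte, error)

type result struct {
	audio []byte
	err   error
}

// orderedPipeline runs at most lookahead synth calls at once and emits their
// results in input order.
func orderedPipeline(ctx context.Context, sentences <-chan string, lookahead int, synth synthFunc) <-chan []byte {
	if lookahead < 1 {
		lookahead = 1
	}
	out := make(chan []byte)
	// Queue of per-sentence result slots. The emitter holds one slot while
	// lookahead-1 more wait in the queue, so at most lookahead calls run.
	slots := make(chan chan result, lookahead-1)

	go func() { // dispatcher: starts one worker per sentence
		defer close(slots)
		for {
			var s string
			var ok bool
			select {
			case s, ok = <-sentences:
				if !ok {
					return
				}
			case <-ctx.Done():
				return
			}
			slot := make(chan result, 1) // buffered so the worker never blocks
			select {
			case slots <- slot:
			case <-ctx.Done():
				return
			}
			go func(text string) {
				audio, err := synth(ctx, text)
				slot <- result{audio, err}
			}(s)
		}
	}()

	go func() { // emitter: waits on slots strictly in queue order
		defer close(out)
		for slot := range slots {
			r := <-slot
			if r.err != nil {
				continue // sketch only: drop failed sentences
			}
			select {
			case out <- r.audio:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// synthesizeAll is a demo driver. Its fake synth sleeps longer for earlier
// sentences, so completions arrive out of order.
func synthesizeAll(texts []string, lookahead int) []string {
	in := make(chan string, len(texts))
	order := make(map[string]int, len(texts))
	for i, t := range texts {
		in <- t
		order[t] = i
	}
	close(in)
	fake := func(_ context.Context, s string) ([]byte, error) {
		time.Sleep(time.Duration(len(texts)-order[s]) * 5 * time.Millisecond)
		return []byte(s), nil
	}
	var got []string
	for audio := range orderedPipeline(context.Background(), in, lookahead, fake) {
		got = append(got, string(audio))
	}
	return got
}

func main() {
	fmt.Println(synthesizeAll([]string{"One.", "Two.", "Three."}, 2))
}
```

Because the emitter always blocks on the oldest outstanding slot, a slow first sentence delays everything behind it, but later sentences are already being synthesised while it waits.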

The returned channel is closed when all text has been synthesised or when ctx is cancelled. The caller must drain the channel to prevent goroutine leaks.
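A consumer that may stop early therefore needs to cancel the context and keep reading until the channel closes. The sketch below shows one way to do that. consume, play, and errPlayback are hypothetical names, and the audio channel stands in for the one returned by SynthesizeStream.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errPlayback is a hypothetical error returned by the audio sink.
var errPlayback = errors.New("playback failed")

// consume forwards PCM chunks to play. On the first failure it cancels the
// producer's context, then keeps reading and discarding chunks until the
// channel is closed so that no producer goroutine is left blocked on a send.
func consume(cancel context.CancelFunc, audio <-chan []byte, play func([]byte) error) error {
	var firstErr error
	for chunk := range audio {
		if firstErr != nil {
			continue // already failed: drain without playing
		}
		if err := play(chunk); err != nil {
			firstErr = err
			cancel() // ask the producer to stop early
		}
	}
	return firstErr
}

func main() {
	// In real use, ctx would be the context passed to SynthesizeStream.
	_, cancel := context.WithCancel(context.Background())
	defer cancel()

	audio := make(chan []byte, 3)
	audio <- []byte{1}
	audio <- []byte{2}
	audio <- []byte{3}
	close(audio)

	played := 0
	err := consume(cancel, audio, func(b []byte) error {
		played++
		if played == 2 {
			return errPlayback
		}
		return nil
	})
	fmt.Println(played, err)
}
```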

Source Files ¶

coqui.go
