suffixarr

package module
v0.0.0-...-af364bf Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 11, 2025 License: MIT Imports: 7 Imported by: 0

README

docs

Suffix Array Implementation in Go

This project provides an efficient implementation of a suffix array using the SA-IS algorithm (Suffix Array Induced Sorting) in Go. It supports both small (≤256 characters) and arbitrary alphabets, optimized for performance and memory usage. The implementation is designed for string processing tasks such as pattern matching, substring search, and text indexing.

Features

  • SA-IS Algorithm: Linear-time construction of suffix arrays for small and arbitrary alphabets.
  • Small Alphabet Support: Optimized for texts with up to 256 unique characters (e.g., ASCII).
  • Arbitrary Alphabet Support: Handles large or arbitrary alphabets using a map-based bucketing approach.
  • Prefix Search: Includes methods to find all occurrences of a prefix, with results in lexicographical or text order.
  • Memory Efficiency: Reuses arrays and minimizes allocations during construction.

Installation

  1. Install the package using:

    go get github.com/nkamenev/suffixarr
    
  2. Ensure the Go environment is set up (Go 1.18 or later recommended).

  3. Import the package in your Go code:

    import "github.com/nkamenev/suffixarr"
    

Usage

Creating a Suffix Array
package main

import (
	"fmt"
	"github.com/nkamenev/suffixarr"
)

func main() {
	// Example text: "banana" as int32 slice (ASCII values)
	text := []int32("banana")
	sa := suffixarr.New(text)
}
Finding Prefix Occurrences

The Lookup method returns indices of suffixes starting with a given prefix in lexicographical order. The LookupTextOrd method returns the same indices sorted by their position in the text.

// Find all occurrences of prefix "a"
prefix := []int32{'a'}
indices := sa.Lookup(prefix)
fmt.Println("Occurrences of 'a' (lexicographical order):", indices)

// Find occurrences in text order
textOrdered := sa.LookupTextOrd(prefix)
fmt.Println("Occurrences of 'a' (text order):", textOrdered)
Example Output

For text "banana":

  • Suffix array: [5 3 1 0 4 2] (indices of suffixes ["a", "ana", "anana", "banana", "na", "nana"]).
  • Lookup("a"): [5 3 1] (suffixes "a", "ana", "anana").
  • LookupTextOrd("a"): [1 3 5] (same indices sorted by text position).

Algorithm Details

The implementation uses the SA-IS algorithm, which constructs a suffix array in O(n) time for a text of length n. Key features:

  • Small Alphabets: Uses array-based bucketing for efficiency.
  • Arbitrary Alphabets: Employs a map-based approach with probabilistic counting to handle large alphabets.
  • LMS Substrings: Leverages Left-Most S-type (LMS) substrings for recursive construction.
  • Induced Sorting: Efficiently sorts suffixes by inducing L-type and S-type suffixes from LMS positions.

Performance

  • Time Complexity: O(n) for suffix array construction, O(m + log n) for prefix lookup (where m is the prefix length).
  • Space Complexity: O(n) for the suffix array and auxiliary data structures.
  • Optimization: Minimizes memory allocations by reusing arrays and supports large texts efficiently.

Testing

To run tests, use:

go test -v

Contributing

Contributions are welcome! Please submit issues or pull requests for bug fixes, optimizations, or additional features. Ensure code follows Go conventions.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Documentation

Overview

Copyright (c) 2025 Nikita Kamenev Licensed under the MIT License. See LICENSE file in the project root for details.

Copyright (c) 2025 Nikita Kamenev Licensed under the MIT License. See LICENSE file in the project root for details.

Copyright (c) 2025 Nikita Kamenev Licensed under the MIT License. See LICENSE file in the project root for details.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GSA

type GSA struct {
	// contains filtered or unexported fields
}

GSA represents a generalized suffix array for multiple strings.

func NewGSA

func NewGSA(src []string) *GSA

NewGSA creates a generalized suffix array from strings.

func NewGSA_32

func NewGSA_32(src [][]int32) *GSA

NewGSA_32 creates a generalized suffix array from int32 slices.

func (*GSA) LookupPrefix

func (gsa *GSA) LookupPrefix(prefix []int32) []Index

LookupPrefix finds prefix occurrences in the generalized suffix array, sorted by text position.

func (*GSA) LookupSuffix

func (gsa *GSA) LookupSuffix(suf []int32) []Index

LookupSuffix finds suffix occurrences in the generalized suffix array, sorted by text position.

func (*GSA) LookupTextOrder

func (gsa *GSA) LookupTextOrder(prefix []int32) []Index

LookupTextOrder finds prefix occurrences in the generalized suffix array, sorted by text position.

type Index

type Index struct {
	String     int32
	Occurences []int32
}

Index holds a string's occurrences in the generalized suffix array.

type SuffixArray

type SuffixArray struct {
	// contains filtered or unexported fields
}

SuffixArray holds a text and its suffix array.

func New

func New(text []int32) *SuffixArray

New creates a suffix array for the given text.

func (*SuffixArray) Lookup

func (sa *SuffixArray) Lookup(prefix []int32) []int32

Lookup finds suffixes starting with the given prefix.

func (*SuffixArray) LookupPrefix

func (sa *SuffixArray) LookupPrefix(prefix []int32) int

LookupPrefix checks if the text starts with the given prefix. For an empty prefix, returns -1 as it precedes the first character. Returns 0 if matched, -2 otherwise.

func (*SuffixArray) LookupSuffix

func (sa *SuffixArray) LookupSuffix(suffix []int32) int

LookupSuffix finds the exact suffix in the text. For an empty suffix, returns len(sa) as it occurs at the end of the string. Otherwise, returns the starting index or -1 if not found.

func (*SuffixArray) LookupTextOrder

func (sa *SuffixArray) LookupTextOrder(prefix []int32) []int32

LookupTextOrder finds suffixes starting with the prefix, sorted by text position.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL