json-canon

v0.3.1
Published: Mar 5, 2026 License: Apache-2.0

json-canon produces byte-deterministic JSON. Same input, same bytes, across Go versions, architectures, and kernel versions. It implements RFC 8785 (JSON Canonicalization Scheme) with a strict parser that rejects ambiguous input, a hand-written Burger-Dybvig number formatter validated against 286,362 oracle test vectors, and a stable CLI contract under SemVer. Zero external dependencies. Linux only.

Go API

package main

import (
	"errors"
	"fmt"
	"log"

	"github.com/lattice-substrate/json-canon/jcs"
	"github.com/lattice-substrate/json-canon/jcserr"
	"github.com/lattice-substrate/json-canon/jcstoken"
)

func main() {
	input := []byte(`{"b": 2, "a": 1, "c": 3.0}`)

	v, err := jcstoken.Parse(input)
	if err != nil {
		var je *jcserr.Error
		if errors.As(err, &je) {
			fmt.Printf("failure class: %s\n", je.Class)
		}
		log.Fatal(err)
	}

	canonical, err := jcs.Serialize(v)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(string(canonical))
	// Output: {"a":1,"b":2,"c":3}
}

For a single call that parses and serializes in one step:

canonical, err := jcs.Canonicalize(input)

CLI

# Canonicalize
echo '{"b":2,"a":1}' | jcs-canon canonicalize -
# Output: {"a":1,"b":2}

# Verify canonical form
jcs-canon verify document.json

Install

Library:

go get github.com/lattice-substrate/json-canon

CLI (build from source):

CGO_ENABLED=0 go build -trimpath -buildvcs=false \
  -ldflags="-s -w -buildid= -X main.version=v0.0.0-dev" \
  -o jcs-canon ./cmd/jcs-canon

Release binaries are available on the releases page. Verification instructions: CONTRIBUTING.md.

Design

json-canon owns every stage of the pipeline from input bytes to canonical output. Nothing is delegated to encoding/json or strconv.FormatFloat.

Parser. A strict RFC 8259 parser that rejects what encoding/json silently accepts: lone surrogates, duplicate keys after escape decoding, noncharacters, lexical negative zero, overflow, underflow. Seven independent resource bounds (input size, nesting depth, total values, object members, array elements, string bytes, number token length), all fail-fast, all configurable. If the parser returns a value, that value has exactly one canonical representation.

Number formatting. Burger-Dybvig algorithm with ECMA-262 even-digit tie-breaking, implemented from scratch in pure math/big arithmetic. strconv.FormatFloat is a high-quality formatter, but it is not an ECMA-262 conformance contract. Its output policy can change across Go versions and doesn't match RFC 8785's required formatting rules at key exponent boundaries. The implementation is validated against 286,362 oracle test vectors (54,445 boundary cases + 231,917 stress vectors) with pinned SHA-256 checksums on the test data itself.
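The exponent-boundary mismatch can be seen with a few values. The sketch below formats each one with strconv.FormatFloat in its shortest 'g' mode and compares the result with the string that ECMA-262 Number::toString produces. The choice of 'g' mode and the helper name goShortest are assumptions made for this illustration; it does not use jcsfloat.

```go
package main

import (
	"fmt"
	"strconv"
)

// goShortest formats f with strconv's shortest round-trip 'g' mode.
// Hypothetical helper for this comparison only.
func goShortest(f float64) string {
	return strconv.FormatFloat(f, 'g', -1, 64)
}

func main() {
	// want is the ECMA-262 Number::toString result that RFC 8785 requires.
	cases := []struct {
		f    float64
		want string
	}{
		{1e-7, "1e-7"},                  // strconv pads the exponent: 1e-07
		{0.000001, "0.000001"},          // strconv switches to exponent form early
		{1e20, "100000000000000000000"}, // ECMA-262 stays in decimal form below 1e21
		{1e21, "1e+21"},                 // the two agree here
	}
	for _, c := range cases {
		got := goShortest(c.f)
		fmt.Printf("strconv=%-8s ecma262=%-22s match=%v\n", got, c.want, got == c.want)
	}
}
```

The digits agree in every case; only the layout differs, which is still enough to change the output bytes.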

Key sorting. UTF-16 code-unit order, not UTF-8 byte order. The two orderings disagree when a supplementary-plane character (above U+FFFF) is compared with a BMP character in the range U+E000 to U+FFFF. For example, U+10000 is encoded in UTF-16 as the surrogate pair D800 DC00, so it sorts before U+E000 in UTF-16 code-unit order but after it in UTF-8 byte order. Most implementations get this wrong because the bug only surfaces with emoji, CJK Extension B, or historic script keys. Rare in testing, not rare in production. RFC 8785 §3.2.3 mandates UTF-16 code-unit order because JCS is defined for interoperability with ECMAScript string semantics.
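The disagreement between the two orderings can be reproduced with the standard library alone. This is a minimal sketch, not json-canon's sorter: compareUTF16 is a hypothetical helper that assumes valid UTF-8 input and requires Go 1.21 or later for the slices package.

```go
package main

import (
	"fmt"
	"slices"
	"unicode/utf16"
)

// compareUTF16 orders two strings by their UTF-16 code units.
// Hypothetical helper; assumes both strings are valid UTF-8.
func compareUTF16(a, b string) int {
	return slices.Compare(utf16.Encode([]rune(a)), utf16.Encode([]rune(b)))
}

func main() {
	keys := []string{"\uE000", "\U00010000"}

	// Go's built-in string comparison is UTF-8 byte order,
	// so U+E000 (EE 80 80) comes before U+10000 (F0 90 80 80).
	byBytes := slices.Clone(keys)
	slices.Sort(byBytes)

	// In UTF-16, U+10000 is D800 DC00, which comes before E000.
	byUnits := slices.Clone(keys)
	slices.SortFunc(byUnits, compareUTF16)

	fmt.Printf("UTF-8 byte order:       %+q\n", byBytes)
	fmt.Printf("UTF-16 code-unit order: %+q\n", byUnits)
}
```

Converting every key to UTF-16 on each comparison is wasteful; a production sorter would likely precompute or compare incrementally.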

Error taxonomy. 13 failure classes mapped to 3 exit codes (0, 2, 10). Classified by root cause, not error origin. A missing file path is CLI_USAGE (the invocation is wrong), not INTERNAL_IO (infrastructure broke). Exit code 2 means "fix your input or invocation." Exit code 10 means "investigate the environment." The class name is the stable contract; the surrounding message text is not. Machines should switch on the class, not parse the message.

Determinism evidence. Unit tests prove correctness. They do not prove determinism across environments. An offline replay harness runs the tool across Linux distributions (Debian, Ubuntu, Alpine, Fedora, Rocky, openSUSE) in both container and VM execution modes, capturing SHA-256 digests of all output. Releases are gated on byte-identical digests across the full matrix. The evidence bundle (checksummed, machine-readable, committed alongside the source) makes the determinism claim auditable, not just asserted.
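The core of such a gate is small: hash each environment's output and block the release if any digest differs. The sketch below illustrates that idea only. It is not the replay harness, and the environment labels, function names, and in-memory map are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// digest returns the hex-encoded SHA-256 of one environment's output bytes.
func digest(out []byte) string {
	sum := sha256.Sum256(out)
	return hex.EncodeToString(sum[:])
}

// gate returns the sorted names of environments whose output digest differs
// from the reference environment's digest. An empty result means the matrix
// is byte-identical.
func gate(reference string, outputs map[string][]byte) []string {
	want := digest(outputs[reference])
	var drift []string
	for env, out := range outputs {
		if digest(out) != want {
			drift = append(drift, env)
		}
	}
	sort.Strings(drift)
	return drift
}

func main() {
	// Illustrative labels; real evidence would be read from recorded runs.
	outputs := map[string][]byte{
		"debian/container": []byte(`{"a":1,"b":2}`),
		"alpine/container": []byte(`{"a":1,"b":2}`),
		"fedora/vm":        []byte(`{"a":1,"b":2}`),
	}
	if drift := gate("debian/container", outputs); len(drift) > 0 {
		fmt.Println("release blocked, digests differ on:", drift)
		return
	}
	fmt.Println("all digests identical")
}
```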

Engineering Articles

Detailed technical articles covering the engineering behind json-canon:

  1. Shortest Round-Trip: Implementing IEEE 754 to Decimal Conversion in Go - Burger-Dybvig algorithm, multiprecision arithmetic, ECMA-262 formatting
  2. A Strict RFC 8259 JSON Parser - single-pass parser design, surrogate validation, bounds enforcement
  3. The Small Decisions That Infrastructure Depends On - UTF-16 sort order, failure taxonomy, ABI contracts
  4. Proving Determinism: Evidence-Based Release Engineering - offline replay harness, SHA-256 evidence chains, release gating
  5. IEEE 754 Compliance Does Not Mean Platform Independence - FMA instructions, Go compiler fusion policy, platform-independent digit generation

CLI Reference

jcs-canon canonicalize [--quiet] [file|-]
jcs-canon verify [--quiet] [file|-]
jcs-canon --help
jcs-canon --version

Exit Codes

Code  Meaning          Example Causes
0     Success          Canonical output produced, or document verified
2     Input rejection  Parse error, policy violation, non-canonical, invalid usage
10    Internal error   I/O write failure, unexpected state

Full taxonomy: FAILURE_TAXONOMY.md.
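A caller that shells out to the CLI can branch on the exit code with os/exec. The sketch below assumes a jcs-canon binary is on PATH and uses the documented verify subcommand and --quiet flag; the helper names describeExit and runVerify are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// describeExit maps the documented exit codes to the action they call for.
func describeExit(code int) string {
	switch code {
	case 0:
		return "success"
	case 2:
		return "input rejection: fix the input or invocation"
	case 10:
		return "internal error: investigate the environment"
	default:
		return "undocumented exit code"
	}
}

// runVerify runs "jcs-canon verify --quiet <path>" and returns its exit code.
// A non-nil error means the process could not be started at all
// (for example, the binary is not on PATH).
func runVerify(path string) (int, error) {
	err := exec.Command("jcs-canon", "verify", "--quiet", path).Run()
	var exitErr *exec.ExitError
	switch {
	case err == nil:
		return 0, nil
	case errors.As(err, &exitErr):
		return exitErr.ExitCode(), nil
	default:
		return -1, err
	}
}

func main() {
	code, err := runVerify("document.json")
	if err != nil {
		fmt.Println("could not run jcs-canon:", err)
		return
	}
	fmt.Println(describeExit(code))
}
```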

Stability

The CLI ABI (command names, flag semantics, exit codes, failure classes, stdout/stderr contracts, canonical output bytes) is governed by strict SemVer. Patch releases change no behavior. Minor releases are additive only. Any breaking change requires a major version bump. The machine-readable contract is abi_manifest.json; the full specification is ABI.md.

Documentation

Normative References

Spec                    Scope
RFC 8785                JSON Canonicalization Scheme
RFC 8259                JSON grammar and data model
RFC 7493                I-JSON: duplicate keys, surrogates, noncharacters
RFC 3629                UTF-8 encoding validity
ECMA-262 §6.1.6.1.20    Number::toString
IEEE 754-2008           Binary64 double-precision semantics

License

See LICENSE.

Directories

Path                          Synopsis
cmd/jcs-canon                 Command jcs-canon canonicalizes and verifies JSON using RFC 8785 JCS.
cmd/jcs-gate                  Command jcs-gate runs the repository's required verification gates in order.
cmd/jcs-offline-replay        Command jcs-offline-replay prepares, runs, and verifies offline replay evidence.
cmd/jcs-offline-worker        Command jcs-offline-worker executes replay vectors from an offline bundle.
jcs                           Package jcs implements RFC 8785 JSON Canonicalization Scheme serialization.
jcserr                        Package jcserr defines the failure taxonomy for json-canon.
jcsfloat                      Package jcsfloat implements the ECMAScript Number::toString algorithm for IEEE 754 double-precision floating-point values, as required by RFC 8785.
jcstoken                      Package jcstoken provides a strict JSON tokenizer/parser for RFC 8785 JCS.
offline/replay                Package replay implements offline replay orchestration contracts and artifacts.
offline/runtime/container     Package container provides node adapters for container-backed offline replay lanes.
offline/runtime/executil      Package executil provides command execution helpers for offline runtime adapters.
offline/runtime/libvirt       Package libvirt provides node adapters for VM/libvirt-backed replay lanes.
