Subtick — Nonce correctness model

1 · The trap

A naïve "demo wallet" assigns the next nonce, sends the transaction, and increments a local counter. If anything between the API and the executor silently drops that transaction (mempool eviction under back-pressure, future-nonce TTL expiry, transient executor stall — none of which raise a rejection), the chain account stays put while the local counter races ahead. Every later transaction is assigned a future nonce; the chain holds them as out-of-order and eventually drops them too.
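
The failure mode described above can be reproduced in a few lines. This is a toy model, not Subtick code: `Chain`, `NaiveWallet`, and `run_naive` are hypothetical names, and the chain is simplified so that any out-of-order nonce is lost without a rejection.

```rust
/// Toy chain account: a transaction commits only if its nonce equals the
/// account's current nonce. Anything else is lost without a rejection.
struct Chain {
    nonce: u64,
}

impl Chain {
    /// Returns true if the transaction committed.
    fn submit(&mut self, tx_nonce: u64, silently_dropped: bool) -> bool {
        if silently_dropped || tx_nonce != self.nonce {
            return false; // dropped or out-of-order: the caller is not told
        }
        self.nonce += 1;
        true
    }
}

/// Naive wallet: bumps its local counter whether or not the chain moved.
struct NaiveWallet {
    next_nonce: u64,
}

impl NaiveWallet {
    fn send(&mut self, chain: &mut Chain, silently_dropped: bool) -> bool {
        let nonce = self.next_nonce;
        self.next_nonce += 1; // races ahead unconditionally
        chain.submit(nonce, silently_dropped)
    }
}

/// Sends `total` transactions and silently drops the one at index `drop_at`.
/// Returns (chain nonce, wallet's local counter, number committed).
fn run_naive(total: u64, drop_at: u64) -> (u64, u64, u64) {
    let mut chain = Chain { nonce: 0 };
    let mut wallet = NaiveWallet { next_nonce: 0 };
    let mut committed = 0;
    for i in 0..total {
        if wallet.send(&mut chain, i == drop_at) {
            committed += 1;
        }
    }
    (chain.nonce, wallet.next_nonce, committed)
}
```

Calling `run_naive(10, 3)` leaves the chain at nonce 3 while the wallet's counter reaches 10: one dropped transaction strands every transaction after it.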

The visible symptom is a wallet that returns accepted: true forever while chain.nonce never moves. We hit this on Subtick's own demo (repo), reproduced it deterministically, and rebuilt the gate to make it structurally impossible.

2 · The model

The wallet operates on three rules.

Rule A — Chain is the source of truth

Local nonce state is a cache, never an authority. Every assignment reads chain.nonce first; if the chain has moved, the local last_committed is lifted. Local state is allowed to lead the chain (in-flight transactions), never to contradict it.

Rule B — Contiguous-window assignment

The set of currently in-flight nonces is required to form a contiguous range [last_committed, last_committed + N) where N = in_flight.len(). The next nonce we hand out is always last_committed + N. Holes are forbidden by construction — we never assign past a missing nonce.

Rule C — Freeze on first timeout, rebase on next request

If the commit wait times out (the chain didn't move past our assigned nonce within the deadline), the wallet enters a frozen state immediately. This is a circuit-breaker, not a flag we check later. The very next request reads chain truth, clears the in-flight set, and resumes assignment from chain.nonce. The abandoned slot is reused by this recovery request, which is what closes the chain's nonce gap.

Invariant maintained at all times: max_assigned − last_committed ≤ K, where K is the configured concurrency permit count. No holes; no future-nonce pile-up; no silent drift.
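
The three rules can be modelled without any async machinery. The sketch below is a synchronous approximation, assuming the caller passes in the current chain nonce; `NonceModel` and its method names are illustrative and are not taken from the Subtick source.

```rust
use std::collections::BTreeSet;

/// Synchronous model of the wallet's nonce state (Rules A, B, C).
#[derive(Default)]
struct NonceModel {
    last_committed: u64,
    in_flight: BTreeSet<u64>,
    frozen: bool,
}

impl NonceModel {
    /// Rule A: lift local state to the chain's nonce, never lower it.
    fn sync_chain_truth(&mut self, chain_nonce: u64) {
        if chain_nonce > self.last_committed {
            self.last_committed = chain_nonce;
            // Nonces below the chain nonce are committed; forget them.
            self.in_flight.retain(|&n| n >= chain_nonce);
        }
    }

    /// Sync (Rule A), rebase if frozen (Rule C), then assign (Rule B).
    fn assign(&mut self, chain_nonce: u64) -> u64 {
        self.sync_chain_truth(chain_nonce);
        if self.frozen {
            // Rule C: abandon in-flight slots and resume from chain truth.
            self.in_flight.clear();
            self.last_committed = chain_nonce;
            self.frozen = false;
        }
        // Rule B: the next nonce is the end of the contiguous window.
        let assigned = self.last_committed + self.in_flight.len() as u64;
        self.in_flight.insert(assigned);
        assigned
    }

    /// A commit of `nonce` implies every lower nonce committed as well.
    fn on_commit(&mut self, nonce: u64) {
        self.in_flight.retain(|&n| n > nonce);
        self.last_committed = self.last_committed.max(nonce + 1);
    }

    /// Rule C: the first timeout trips the circuit-breaker.
    fn on_timeout(&mut self) {
        self.frozen = true;
    }
}
```

In this model a timed-out nonce is handed out again by the next `assign` call, which is the slot reuse that Rule C describes.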

3 · The gate, in code

The gate has three concrete components, all in subtick/src/api/mod.rs:

A tokio Semaphore with K permits — bounds concurrency. Default K=1 for single-sender demos.
A briefly-held tokio Mutex<NonceState> — atomic assign + recovery. Never held across an .await boundary that involves I/O.
An event-driven commit wait — handlers subscribe to the validator's BatchExecuted broadcast before mempool insert, then tokio::select! between event arrivals and a hard deadline. No 5 ms polling loops.

// Sketch — full source in subtick/src/api/mod.rs
struct NonceState {
    last_committed:    u64,
    in_flight:         BTreeMap<u64, Instant>,
    frozen:            bool,
    initialized:       bool,
}

// Acquire-and-assign protocol:
async fn acquire_demo_slot(...) -> Result<Slot, Error> {
    let permit = wallet.permits.acquire_owned().await?;
    let mut g = wallet.nonce.lock().await;

    sync_chain_truth(&mut g, ...);    // Rule A
    if g.frozen { rebase(&mut g); }   // Rule C

    let assigned = g.last_committed + g.in_flight.len() as u64;
    g.in_flight.insert(assigned, Instant::now());  // Rule B
    Ok(Slot { assigned, permit, ... })
}

// On the way out — finalize based on commit poll outcome:
match executed_ms {
    Some(_) => { g.in_flight.remove(&assigned); g.last_committed = assigned + 1; }
    None    => { g.frozen = true; }                            // circuit-breaker
}
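
The event-driven commit wait can be approximated with the standard library alone. The sketch below uses `std::sync::mpsc` and `recv_timeout` as a stand-in for the tokio broadcast channel and `tokio::select!` named above; the `BatchExecuted` field and `wait_for_commit` are assumptions made for illustration. As in the real gate, the receiver must exist before the transaction is submitted so that no event can be missed.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Hypothetical event shape: the sender's chain nonce after a batch executed.
struct BatchExecuted {
    sender_nonce_after: u64,
}

/// Blocks until an event shows the chain moved past `assigned`, or until the
/// deadline passes. Returns the elapsed time on commit, `None` on timeout.
fn wait_for_commit(
    events: &Receiver<BatchExecuted>,
    assigned: u64,
    deadline: Duration,
) -> Option<Duration> {
    let start = Instant::now();
    loop {
        // `checked_sub` yields None once the deadline is exhausted.
        let remaining = deadline.checked_sub(start.elapsed())?;
        match events.recv_timeout(remaining) {
            Ok(ev) if ev.sender_nonce_after > assigned => return Some(start.elapsed()),
            Ok(_) => continue,     // a batch that did not include our nonce
            Err(_) => return None, // timed out, or the event source went away
        }
    }
}
```

A `None` result corresponds to the `None => { g.frozen = true; }` branch in the sketch above.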

4 · Numbers (public testnet, K=1, WS event-driven)

Reproducible: 200 sequential and 50 parallel POST /demo/transfer against https://subtick.dev, with and without 10% server-side drop injection. Methodology is in the project repo's web/_bench.mjs + web/_canary_inject.mjs.

Clean baseline (no faults)

scenario	p50	p95	p99	success	drift at end
sequential ×200	109 ms	137 ms	140 ms	100.0%	0
parallel ×50 (K=1)	2977 ms	5445 ms	5623 ms	100.0%	0

The parallel-50 latency is not chain latency — it's queue time through the gate. K=1 serializes all transactions from the shared demo sender; throughput converges to the single-sender chain commit rate (~7-8 tx/s).
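
A back-of-envelope check of the queueing claim, assuming every commit takes about the same time and using the 5969 ms single-sender elapsed time reported for the same parallel ×50 burst in §6. The function names are illustrative.

```rust
/// Per-transaction service time implied by a burst of `n` requests that
/// finished in `elapsed_ms` through a strictly serialized (K=1) gate.
fn implied_service_ms(n: u32, elapsed_ms: f64) -> f64 {
    elapsed_ms / f64::from(n)
}

/// Estimated completion time (ms) of the request at 1-based `position`
/// in the queue, when each commit takes `service_ms`.
fn queue_latency_ms(position: u32, service_ms: f64) -> f64 {
    f64::from(position) * service_ms
}
```

With 50 requests in 5969 ms the implied service time is about 119 ms, so the 25th request would complete at roughly 2984 ms. That is close to the measured p50 of 2977 ms, which supports reading the parallel latency as queue time.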

Fault tolerance — 10% silent-drop injection

scenario	p50	success	injected drops	timeouts	reconciles
sequential ×200	110 ms	90.0%	20	20	20

Self-heal is exact and 1:1. Every injected drop produces exactly one timeout, one reconcile event, and the gate clears the gap before the next request lands. Success rate matches 1 − injection_rate precisely; the chain commits exactly the un-dropped transactions; drift ends at 0 every time.

What we test as a hard blocker: drift = 0 at end-of-run, reconcile == timeout, and alice.nonce_delta == metric_committed. Any deviation halts the build.
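
The three hard-blocker conditions translate into a small end-of-run check. This is a sketch under assumptions: the `RunResult` struct is hypothetical, and drift is taken here to mean the gap between the wallet's `last_committed` and the chain nonce at the end of the run, which the text does not define precisely.

```rust
/// Values sampled at the end of a bench run (hypothetical shape).
struct RunResult {
    wallet_last_committed: u64, // from the wallet state endpoint
    chain_nonce: u64,           // sender's nonce as reported by the chain
    tx_committed_total: u64,
    tx_timeout_total: u64,
    reconcile_total: u64,
    alice_nonce_delta: u64,     // chain nonce after the run minus before
}

/// Returns Err with a reason if any hard-blocker condition fails.
fn check_run(r: &RunResult) -> Result<(), String> {
    // Assumed definition of drift: local view vs chain view at end of run.
    let drift = r.wallet_last_committed.abs_diff(r.chain_nonce);
    if drift != 0 {
        return Err(format!("drift = {drift}, expected 0"));
    }
    if r.reconcile_total != r.tx_timeout_total {
        return Err(format!(
            "reconcile ({}) != timeout ({})",
            r.reconcile_total, r.tx_timeout_total
        ));
    }
    if r.alice_nonce_delta != r.tx_committed_total {
        return Err(format!(
            "nonce delta ({}) != committed ({})",
            r.alice_nonce_delta, r.tx_committed_total
        ));
    }
    Ok(())
}
```

A harness could map the `Err` case to a non-zero exit code so that any deviation halts the build.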

5 · Telemetry

The wallet's state is exposed as plain JSON for ops + dev tools:

// GET https://subtick.dev/demo/state.wallet
{
  "last_committed":           42,
  "in_flight_current":        0,
  "oldest_in_flight_age_ms":  0,
  "frozen":                  false,
  "max_in_flight":           1
}

// GET https://subtick.dev/demo/metrics
{
  "tx_assigned_total":       42,
  "tx_committed_total":      42,
  "tx_timeout_total":        0,
  "tx_mempool_reject_total": 0,
  "tx_injected_drop_total":  0,
  "reconcile_total":         0,
  "in_flight_current":       0,
  "oldest_in_flight_age_ms":  0,
  "frozen":                  false,
  "max_in_flight":            1,
  "inject_fail_pct":          0
}

Two hard alarms wire to these without any framework: frozen=true lasting more than ~10 s with no committed advance, or oldest_in_flight_age_ms exceeding the commit deadline. Either signal indicates the chain itself has stalled, not the gate.
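
Both alarms reduce to a small stateful check over successive samples. The sketch below assumes a monitor that polls the telemetry periodically; `WalletSample`, `AlarmTracker`, and the threshold parameters are hypothetical, and the commit deadline is passed in because its value is not stated here.

```rust
#[derive(Debug, PartialEq)]
enum Alarm {
    FrozenStall,     // frozen too long with no committed advance
    InFlightOverdue, // oldest in-flight tx is older than the commit deadline
}

/// One poll of the telemetry, with the time it was taken (monotonic ms).
struct WalletSample {
    at_ms: u64,
    frozen: bool,
    tx_committed_total: u64,
    oldest_in_flight_age_ms: u64,
}

#[derive(Default)]
struct AlarmTracker {
    /// (timestamp, committed total) at the start of the current frozen
    /// streak, or at the last committed advance within it.
    frozen_anchor: Option<(u64, u64)>,
}

impl AlarmTracker {
    fn observe(
        &mut self,
        s: &WalletSample,
        frozen_limit_ms: u64,
        commit_deadline_ms: u64,
    ) -> Vec<Alarm> {
        let mut alarms = Vec::new();
        if s.frozen {
            match self.frozen_anchor {
                Some((since, committed)) if committed == s.tx_committed_total => {
                    if s.at_ms.saturating_sub(since) > frozen_limit_ms {
                        alarms.push(Alarm::FrozenStall);
                    }
                }
                // First frozen sample, or committed advanced: restart the clock.
                _ => self.frozen_anchor = Some((s.at_ms, s.tx_committed_total)),
            }
        } else {
            self.frozen_anchor = None;
        }
        if s.oldest_in_flight_age_ms > commit_deadline_ms {
            alarms.push(Alarm::InFlightOverdue);
        }
        alarms
    }
}
```

Passing `frozen_limit_ms = 10_000` matches the roughly 10 s threshold mentioned above.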

6 · Multi-sender execution (pool of 4) — shipped

Single-sender K=1 was both safe and optimal because the chain enforces strict per-account nonce ordering: any K>1 from one account just queued behind itself. Throughput was therefore capped at the single-sender chain commit rate, ~7-8 tx/s.

The unlock was structural: distribute requests across a pool of pre-funded sender accounts, each with its own independent gate. Different accounts have independent nonce sequences, so the chain can commit them in parallel batches. Per-account ordering still holds, all three rules from §2 (and the window invariant) still hold per sender, and throughput rises with the pool size.

Numbers (parallel ×50, anonymous round-robin allocator)

config	p50	p99	elapsed	throughput
single sender (K=1)	2977 ms	5623 ms	5969 ms	7.9 req/s
pool of 4 (K=1/sender)	802 ms	1894 ms	2118 ms	23.6 req/s

≈3× throughput · 3.7× lower p50 · 3× lower p99. The gate's per-sender invariants are unchanged — every sender enforces its own contiguous-window assignment, its own circuit-breaker freeze on first timeout, its own self-heal on the next request. There is no shared mutable state between senders besides the round-robin counter, which only assigns the slot.

K-per-sender saturation

At pool size 4, raising K above 1 gives no additional throughput under burst load — the chain's commit rate per account is the ceiling, and queueing K transactions on the same account just shifts latency from the queue to the wait. The structural response is the same as before: more independent accounts (a bigger pool, or per-visitor ephemeral keypairs in a future batch). Pool=4, K=1 is the sweet spot today.

Wire format

Three demo endpoints (/demo/transfer, /pixel/place, /auction/bid) accept an optional x-session-id header. Same payloads as before. With the header present the request sticks to one sender via SHA-256(session_id) % pool_size; absent, it round-robins across the pool. No body changes — existing clients continue to work and just load-balance.
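
The sender selection can be sketched as follows. One substitution to note: the scheme above hashes with SHA-256, which is not in the Rust standard library, so this sketch uses FNV-1a as a stand-in to stay self-contained. `SenderPool` and `pick` are hypothetical names, and how the real digest is reduced modulo the pool size is not specified here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in hash (64-bit FNV-1a). The scheme in the text uses SHA-256;
/// substitute a SHA-256 implementation for real use.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

struct SenderPool {
    pool_size: usize,
    next: AtomicUsize, // round-robin counter, the only shared state
}

impl SenderPool {
    fn new(pool_size: usize) -> Self {
        assert!(pool_size > 0, "pool must contain at least one sender");
        SenderPool { pool_size, next: AtomicUsize::new(0) }
    }

    /// With a session id the request sticks to one sender; without one it
    /// round-robins across the pool.
    fn pick(&self, session_id: Option<&str>) -> usize {
        match session_id {
            Some(id) => (fnv1a64(id.as_bytes()) % self.pool_size as u64) as usize,
            None => self.next.fetch_add(1, Ordering::Relaxed) % self.pool_size,
        }
    }
}
```

Sticky requests do not advance the round-robin counter, so anonymous traffic keeps cycling evenly through the pool.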

Verify it live

Two endpoints surface the multi-sender state in real time:

// per-sender breakdown:
GET https://subtick.dev/demo/state
{
  "alice": { ... },                // legacy: pool[0]
  "bob":   { ... },                // shared recipient
  "pool": [
    { "address": "f0f2…7244", "chain_nonce": 142, "in_flight_current": 0, "frozen": false, ... },
    { "address": "8ba9…b5a7", "chain_nonce": 138, ... },
    { "address": "3e73…f777", ... },
    { "address": "e0aa…7c23", ... }
  ],
  "pool_size": 4,
  "sessions_active": 0
}

// aggregate counters + per-sender selection counts:
GET https://subtick.dev/demo/metrics

What still holds, per sender: drift = 0 at end-of-burst, reconcile == timeout on every fault, the contiguous-window invariant. The pool didn't relax correctness — it parallelised it.

Cross-shard recipient propagation (production fix)

Senders in the pool are spread deterministically across the executor's 4 shards (by shard_of(pubkey) = pubkey[0] % 4). The shared recipient (Bob, on a single shard) almost always lives on a different shard than the sender. Before this batch, the executor's per-shard commit thread credited the recipient on its own ShadowState regardless of shard_of(recipient) — landing the credit as a phantom entry on the sender's shard instead of the recipient's. The visible symptom: ~30% of cross-shard transfers reflected in Bob's balance, the other ~70% scattered invisibly.

The fix is a per-shard inbox: NUM_SHARDS bounded crossbeam channels, one inbox per shard. When the sender's commit thread sees shard_of(recipient) ≠ self.shard_id, it buffers the credit and try_sends it to the recipient shard's inbox after releasing its inner lock. Each shard's commit thread drains its inbox at the top of every cycle, before processing its next batch group. Eventual consistency: the recipient credit trails the sender debit by ≤ one commit cycle on the recipient's shard.
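
A single-threaded model of the inbox mechanism, for illustration only. It uses the standard library's bounded `sync_channel` as a stand-in for the crossbeam channels named above, omits locking and the sender debit, and the `Shard`, `Credit`, and `make_shards` names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

const NUM_SHARDS: usize = 4;

/// Shard assignment as stated in the text: first pubkey byte modulo 4.
fn shard_of(pubkey: &[u8; 32]) -> usize {
    pubkey[0] as usize % NUM_SHARDS
}

struct Credit {
    recipient: [u8; 32],
    amount: u64,
}

struct Shard {
    id: usize,
    balances: HashMap<[u8; 32], u64>,
    inbox: Receiver<Credit>,
    peers: Vec<SyncSender<Credit>>, // one handle per shard inbox
    dropped: u64,                   // models cross_shard_dropped
}

impl Shard {
    /// Top of every commit cycle: apply credits sent by other shards.
    fn drain_inbox(&mut self) -> usize {
        let mut applied = 0;
        for c in self.inbox.try_iter() {
            *self.balances.entry(c.recipient).or_insert(0) += c.amount;
            applied += 1;
        }
        applied
    }

    /// Credit locally if the recipient lives here, otherwise forward the
    /// credit to the recipient shard's inbox without blocking.
    fn credit(&mut self, recipient: [u8; 32], amount: u64) {
        let target = shard_of(&recipient);
        if target == self.id {
            *self.balances.entry(recipient).or_insert(0) += amount;
        } else if self.peers[target].try_send(Credit { recipient, amount }).is_err() {
            self.dropped += 1; // inbox full or receiving shard gone
        }
    }
}

fn make_shards(inbox_capacity: usize) -> Vec<Shard> {
    let (senders, receivers): (Vec<_>, Vec<_>) = (0..NUM_SHARDS)
        .map(|_| sync_channel::<Credit>(inbox_capacity))
        .unzip();
    receivers
        .into_iter()
        .enumerate()
        .map(|(id, inbox)| Shard {
            id,
            balances: HashMap::new(),
            inbox,
            peers: senders.clone(),
            dropped: 0,
        })
        .collect()
}
```

In this model a cross-shard credit never appears on the sender's shard, and it becomes visible on the recipient's shard only after that shard's next `drain_inbox`, which mirrors the one-cycle lag described above.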

Numbers (post-fix, pool=4, isolated 30-tx delta)

signal	before fix	after fix
cross-shard delivery	~30% (3 of 10)	100% (Bob Δ = 30 × 100 exact)
cross_shard_dropped	n/a	0
FROZEN events under burst	occasional	0
drift (assigned vs committed)	visible	0

Three new aggregate counters surface the propagation health on /demo/metrics: cross_shard_sent, cross_shard_applied, cross_shard_dropped. In steady state the first two track within ~1–2 (the drain cadence), and the third stays at 0.

Auto-fallback guard. A systemd-timed watchdog polls /demo/metrics every 60 s. If cross_shard_dropped > 0, frozen = true, or (assigned − committed − in_flight) > 0 at any tick, it forces SUBTICK_DEMO_POOL_SIZE=1 and restarts the service before the issue can compound. With pool=1 already at the safe baseline, the signal is logged without action. Restart cooldown: 5 minutes.
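
The watchdog's per-tick decision can be written as a pure function. This sketch makes two assumptions: the log-only case applies when the pool is already at the fallback size of 1, and a tick that falls inside the restart cooldown is also only logged. `MetricsSnapshot`, `Action`, and `decide` are hypothetical names; the field names follow the /demo/metrics keys.

```rust
#[derive(Debug, PartialEq)]
enum Action {
    Healthy,
    LogOnly,
    FallbackAndRestart, // force SUBTICK_DEMO_POOL_SIZE=1 and restart
}

struct MetricsSnapshot {
    cross_shard_dropped: u64,
    frozen: bool,
    tx_assigned_total: u64,
    tx_committed_total: u64,
    in_flight_current: u64,
}

const RESTART_COOLDOWN_MS: u64 = 5 * 60 * 1000;

/// Decide what one watchdog tick should do. `ms_since_restart` is None if
/// the watchdog has never restarted the service.
fn decide(m: &MetricsSnapshot, pool_size: usize, ms_since_restart: Option<u64>) -> Action {
    // Signed arithmetic so a transiently stale counter cannot underflow.
    let unaccounted = m.tx_assigned_total as i128
        - m.tx_committed_total as i128
        - m.in_flight_current as i128;
    let unhealthy = m.cross_shard_dropped > 0 || m.frozen || unaccounted > 0;
    if !unhealthy {
        return Action::Healthy;
    }
    let cooling_down = ms_since_restart.map_or(false, |ms| ms < RESTART_COOLDOWN_MS);
    if pool_size <= 1 || cooling_down {
        Action::LogOnly
    } else {
        Action::FallbackAndRestart
    }
}
```

Keeping the decision separate from the polling and the restart command makes the policy testable without a running service.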

What's next (not in this batch)

Browser-side signing (per-visitor ephemeral keypairs, server in relay-only mode) is the next architectural step. The wallet abstraction that sits behind the pool today — SessionWallet — is already the unit that swaps out: the handlers, the gate logic, and the correctness invariants do not change when the backend goes from "pre-funded shared pool" to "per-session ephemeral keypair".

7 · Why this matters for builders

If you're integrating Subtick — or building anything that talks to a real-time chain over HTTP — a wallet that diverges silently is indistinguishable from a working one until your customer screenshots a frozen UI. The bug we hit and fixed here is generic to any "server-managed nonce" pattern: token-bucket pacers, payment retry loops, atomic counter wallets. The protocol above (chain truth · contiguous window · circuit-breaker freeze · event-driven detection) ports to all of them.

Source for the demo wallet — including the failure injection, the recovery test, and the bench harness — is in the public repo under subtick/src/api/mod.rs and web/_bench.mjs.
