Sharding Data Layer

The Sharding Data Layer is the engine at the core of Untrace's architecture. It takes any data blob, splits it into cryptographically independent fragments, and distributes those fragments across the decentralized node network — ensuring that no single node, operator, or jurisdiction ever holds enough to reconstruct the original.


Why Sharding?

Traditional storage — cloud, on-premise, or even most blockchains — keeps data whole in one location. That is the root of every data breach: a single target worth attacking.

Untrace's sharding layer eliminates the target. When data is sharded:

  • No single node holds enough information for a breach to matter — fragments are individually meaningless
  • No subpoena is sufficient — no single jurisdiction controls enough shards
  • No single point of failure exists — the network tolerates node failures without data loss
  • No administrator can be compelled — there is no administrator with access to the whole

The Sharding Pipeline

[ Raw Data Blob ]
        ↓
[ AES-256-GCM Encryption ]
  → Outputs: (encrypted_blob, symmetric_key K)
        ↓
[ Shamir's Secret Sharing applied to K ]
  → Outputs: N key shares (k₁, k₂, ..., kₙ)
        ↓
[ Encrypted blob split into N data segments ]
  → Outputs: N segments (s₁, s₂, ..., sₙ)
        ↓
[ Each node receives: (kᵢ, sᵢ) ]
        ↓
[ On-chain: Pedersen commitment to all (kᵢ, sᵢ) pairs ]
        ↓
[ Reconstruction: Collect K-of-N pairs → Lagrange interpolation → decrypt ]

The encryption and sharding happen entirely client-side. The Untrace network sees only encrypted, individually useless fragments.
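
As a concrete illustration, the sketch below walks through the same pipeline in Python: AES-256-GCM encryption via the cryptography package, a toy Shamir split of the key over the Mersenne prime 2^521 - 1, and a simple fixed-size segmentation of the ciphertext. The helper names and the segmentation scheme are assumptions for illustration, not the Untrace client SDK.

import os
import secrets
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PRIME = 2**521 - 1  # Mersenne prime, comfortably larger than a 256-bit key K

def shamir_split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    # Random degree-(k-1) polynomial with the secret as the constant term;
    # any k evaluation points recover it via Lagrange interpolation.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    return [
        (x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
        for x in range(1, n + 1)
    ]

def shard_blob(blob: bytes, k: int, n: int):
    key = AESGCM.generate_key(bit_length=256)                 # symmetric key K
    nonce = os.urandom(12)
    encrypted = nonce + AESGCM(key).encrypt(nonce, blob, None)
    key_shares = shamir_split(int.from_bytes(key, "big"), k, n)   # k_1 .. k_n
    seg_len = -(-len(encrypted) // n)                         # ceil(len / n)
    segments = [encrypted[i * seg_len:(i + 1) * seg_len] for i in range(n)]  # s_1 .. s_n
    return list(zip(key_shares, segments))                    # one (k_i, s_i) pair per node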


Shard Distribution

Node Selection

When a vault is created, the protocol selects N nodes for shard assignment using a deterministic but unpredictable selection algorithm seeded by the vault ID and current epoch:

node_set = select_nodes(
  vault_id,
  epoch,
  n = threshold_config.total_shards,
  constraints = {
    max_per_operator: 1,         // At most one shard per operator
    geo_diversity: true,         // Different countries required
    as_diversity: true,          // Different autonomous systems required
    jurisdiction_diversity: true // Different legal jurisdictions required
  }
)

This selection is verifiable on-chain — anyone can confirm that distribution followed the protocol rules.
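
One way such a selection could be computed, and independently re-checked, is sketched below. The node fields, the SHA-256 seed construction, and the greedy diversity filter are assumptions for illustration rather than the protocol's actual on-chain algorithm; jurisdictional diversity is folded into the country field here for brevity.

import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    node_id: str
    operator: str
    country: str
    asn: int       # autonomous system number

def select_nodes(vault_id: str, epoch: int, candidates: list[Node], n: int) -> list[Node]:
    # Rank every candidate by a hash of (vault_id, epoch, node_id): deterministic
    # and re-checkable by anyone, but unpredictable before the epoch is known.
    def rank(node: Node) -> bytes:
        return hashlib.sha256(f"{vault_id}|{epoch}|{node.node_id}".encode()).digest()

    selected: list[Node] = []
    used_ops, used_countries, used_asns = set(), set(), set()
    for node in sorted(candidates, key=rank):
        if node.operator in used_ops or node.country in used_countries or node.asn in used_asns:
            continue  # enforce operator, geographic, and AS diversity
        selected.append(node)
        used_ops.add(node.operator)
        used_countries.add(node.country)
        used_asns.add(node.asn)
        if len(selected) == n:
            return selected
    raise RuntimeError("not enough diverse candidates for the requested shard count")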

Threshold Configuration

| Sensitivity Level | K (threshold) | N (total shards) | Tolerates       |
| ----------------- | ------------- | ---------------- | --------------- |
| Standard          | 3             | 5                | 2 node failures |
| High              | 5             | 9                | 4 node failures |
| Maximum           | 7             | 13               | 6 node failures |

Enterprises with specific compliance requirements can configure custom (K, N) parameters.
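
For illustration, a custom configuration and the fault tolerance it implies (N - K failed shards can be absorbed) might look like the following; the type and field names are assumptions, not the real configuration API.

from dataclasses import dataclass

@dataclass(frozen=True)
class ThresholdConfig:
    threshold: int      # K: shares required to reconstruct
    total_shards: int   # N: shares distributed

    @property
    def tolerated_failures(self) -> int:
        return self.total_shards - self.threshold

custom = ThresholdConfig(threshold=4, total_shards=7)
print(custom.tolerated_failures)   # 3 node failures tolerated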


Shard Integrity

Each shard pair (kᵢ, sᵢ) is protected by two integrity layers:

1. MAC at the shard level. Each shard is authenticated with a Message Authentication Code (MAC). Any tampering with a shard is detected during reconstruction — the corrupted shard is rejected and a replacement is requested from backup nodes (a minimal check is sketched after this list).

2. On-chain commitment. A Pedersen commitment to all shards is anchored at vault creation time. At reconstruction, the client verifies that retrieved shards match the on-chain commitment before decrypting. No substitution attack is possible.
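
A minimal sketch of the layer-1 check, assuming an HMAC-SHA-256 construction over the shard pair; the layer-2 commitment verification is only indicated by a comment, since its parameters are defined by the protocol contracts.

import hmac
import hashlib

def shard_mac(mac_key: bytes, key_share: bytes, segment: bytes) -> bytes:
    return hmac.new(mac_key, key_share + segment, hashlib.sha256).digest()

def verify_shard(mac_key: bytes, key_share: bytes, segment: bytes, tag: bytes) -> bool:
    # Constant-time comparison; a failed check means the shard is rejected
    # and a replacement is requested from a backup node.
    return hmac.compare_digest(shard_mac(mac_key, key_share, segment), tag)

# A real client would additionally check the retrieved shards against the
# Pedersen commitment anchored on-chain before decrypting.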


Shard Replication and Availability

Beyond the primary N nodes, the protocol maintains backup replicas to guarantee availability:

  • Each shard has M backup nodes (default M = 2) that hold encrypted copies
  • Backup nodes activate automatically if a primary node fails storage proofs
  • The vault owner is never required to manage replication manually
  • Replication targets maintain the same geographic and jurisdictional diversity constraints

The network is designed to survive the simultaneous failure of any N−K nodes plus their backup replicas without data loss.
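
A back-of-the-envelope recoverability check under this model is sketched below: a shard is lost only if its primary and all M backups fail, and the vault survives as long as at least K shards remain. Parameter names are illustrative.

def vault_recoverable(k: int, n: int, m: int, failures_per_shard: list[int]) -> bool:
    # failures_per_shard[i] counts failed holders of shard i (primary + up to M backups).
    lost_shards = sum(1 for failed in failures_per_shard if failed >= 1 + m)
    return (n - lost_shards) >= k

# Standard profile (K=3, N=5, M=2): two shards lose every holder, three are intact.
print(vault_recoverable(k=3, n=5, m=2, failures_per_shard=[3, 3, 0, 0, 0]))  # True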


Shard Lifecycle

Vault Write
    → Shards created, distributed, committed on-chain
    → Nodes begin submitting Proof of Spacetime (PoSt) every epoch

Vault Update
    → New shard generation created for the same vault ID
    → Previous generation retained (versioning) or pruned per policy
    → New on-chain commitment anchored

Vault Delete
    → Deletion instruction signed by owner DID
    → Broadcast to all shard nodes
    → Nodes destroy shards and submit destruction proof
    → On-chain commitment marked as deleted

Deletion is cryptographically enforced — nodes that fail to destroy shards after a deletion order have their stake slashed.
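
For illustration, a deletion order could be assembled and signed as follows. The message layout and the use of an Ed25519 key for the owner DID are assumptions; the actual wire format is defined by the protocol.

import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def build_delete_order(owner_key: Ed25519PrivateKey, vault_id: str) -> dict:
    # Hypothetical payload layout, signed with the vault owner's DID key.
    payload = {
        "action": "vault.delete",
        "vault_id": vault_id,
        "issued_at": int(time.time()),
    }
    message = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload, "signature": owner_key.sign(message).hex()}

# Each shard node verifies the signature against the owner DID's public key,
# destroys its shard, and submits a destruction proof; the commitment is then
# marked as deleted on-chain.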


Comparison With Alternative Approaches

| Approach               | Breach Possible                 | Single Point of Failure | Privacy by Default |
| ---------------------- | ------------------------------- | ----------------------- | ------------------ |
| Centralized cloud      | Yes                             | Yes                     | No                 |
| Encrypted cloud        | Yes (key theft)                 | Yes                     | No                 |
| IPFS / Filecoin        | Yes (content addressed, public) | Partial                 | No                 |
| Untrace Sharding Layer | No                              | None                    | Yes                |

IPFS and Filecoin store whole data objects — sharding in those systems is about redundancy, not privacy. Untrace sharding is different: each fragment is individually encrypted and cryptographically meaningless without the others, and access to the reconstruction key is ZK-gated.


Further Reading