# Writing a LAVS-001 decoder in any language

| | |
|---|---|
| **Status** | Published reference |
| **Suite version** | 1.0 |
| **Targets** | LAVS-001.0, LAVS-001.1, LAVS-001.2 |
| **Spec source** | [github.com/guitargnarr/lavs-format/blob/main/spec/LAVS-001.md](https://github.com/guitargnarr/lavs-format/blob/main/spec/LAVS-001.md) |
| **License** | CC BY 4.0 |

This is the practical walk-through for implementing a LAVS-001 decoder from the spec. It assumes you have read, or have at hand, the [LAVS-001 specification](https://github.com/guitargnarr/lavs-format/blob/main/spec/LAVS-001.md) and the property table in [`spec/property-table.toml`](https://github.com/guitargnarr/lavs-format/blob/main/spec/property-table.toml). It steps through the format byte by byte using the canonical `cube.lavs` reference edition, so you can check your parser against known-correct bytes at every step.

The reference implementation lives in TypeScript at [github.com/guitargnarr/lavs-format](https://github.com/guitargnarr/lavs-format). The format itself is language-agnostic, and the core walk-through below is written in pseudocode rather than any one language. When you have finished implementing your decoder, run its output through the [conformance test suite](/conformance.md) to verify spec correctness.

---

## 1. The format at a glance

A `.lavs` file is a single contiguous byte sequence with five regions in order:

```
┌────────────────────────────────────────────────────────────┐
│  Header                                                    │  bytes 0 .. h
│    LAVS magic (4 bytes)                                    │
│    Spec major (LEB128 unsigned)                            │
│    Spec minor (LEB128 unsigned)                            │
│    Runtime ABI version (LEB128 unsigned)                   │
│    File ID (LEB128 unsigned)                               │
│    Edition block offset (uint32 LE — FIXED WIDTH)          │
├────────────────────────────────────────────────────────────┤
│  Property dictionary                                       │  bytes h .. d
│    LEB128 list of property IDs, terminated by 0x00         │
│    Packed 2-bit wire-type tags  (⌈N/4⌉ bytes)              │
├────────────────────────────────────────────────────────────┤
│  Scene stream                                              │  bytes d .. e
│    (Core object)*                                          │
│    Each object:                                            │
│      type ID (LEB128)                                      │
│      (property key, value)*                                │
│      sentinel 0x00                                         │
├────────────────────────────────────────────────────────────┤
│  Edition block                                             │  bytes e .. f
│    Edition object (type ID 100) + sentinel 0x00            │
│    (optional) signature length (LEB128) + signature bytes  │
├────────────────────────────────────────────────────────────┤
│  CRC32 of edition block                                    │  bytes f .. EOF
│    uint32 LE — final 4 bytes of file                       │
└────────────────────────────────────────────────────────────┘
```

The header's `edition block offset` is **fixed-width 4-byte little-endian**. This is the format's load-bearing decision: external tools can seek directly to the edition block to read metadata without parsing the scene stream. The header offset gives you `e`; the file length minus 4 gives you `f` (start of CRC32 footer).

---

## 2. Encoding primitives

You need to implement exactly five encoding primitives. Everything else in the format is built from these.

### 2.1 LEB128 unsigned

Variable-length encoding of unsigned integers. 7 data bits per byte; the high bit is a continuation flag.

```
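function decodeLEB128(bytes, offset):
    result = 0
    shift = 0
    bytesRead = 0
    loop:
        if offset + bytesRead >= length(bytes):
            error "truncated LEB128"                 # never read past end of file
        byte = bytes[offset + bytesRead]
        bytesRead = bytesRead + 1
        result = result | ((byte & 0x7F) << shift)   # low 7 bits carry data
        if (byte & 0x80) == 0:                       # high bit clear: last byte
            break
        shift = shift + 7
    return (result, bytesRead)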
```

LEB128 is the same encoding used in DWARF, WebAssembly, Protocol Buffers, and Rive.
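For readers who want something executable, here is the same loop as a minimal Python sketch (Python is chosen only for brevity). The name `decode_leb128` is hypothetical, and the 10-byte length cap is an assumption of this sketch, a guard against malformed input rather than a rule taken from the spec.

```python
def decode_leb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 value starting at data[offset].

    Returns (value, bytes_consumed). Raises ValueError on truncation.
    """
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at offset {offset}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift      # low 7 bits carry data
        if (byte & 0x80) == 0:                # high bit clear: last byte
            return result, pos - offset
        shift += 7
        if shift > 63:                        # assumed cap: at most 10 bytes
            raise ValueError(f"LEB128 longer than 10 bytes at offset {offset}")


# Byte sequences taken from the cube.lavs walk-through in this tutorial.
print(decode_leb128(bytes([0xC1, 0xAC, 0x85, 0xE2, 0x04]), 0))  # (1279350337, 5)
print(decode_leb128(bytes([0xD8, 0x04]), 0))                   # (600, 2)
```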

### 2.2 Fixed-width little-endian integers

`uint8` is one byte. `uint16` is two bytes, least significant byte first. `uint32` is four bytes, least significant byte first.

The only places a fixed-width `uint32` LE appears are the `edition block offset` in the header and the CRC32 footer at the end of the file. Every other integer field (type IDs, property IDs, lengths, varint values) uses LEB128.
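A minimal Python sketch of the uint32 LE read, using the standard `struct` module (`<I` means little-endian unsigned 32-bit). The helper name `read_uint32_le` is hypothetical; the two sample inputs are the `cube.lavs` edition block offset and CRC footer bytes quoted later in this tutorial.

```python
import struct


def read_uint32_le(data: bytes, offset: int) -> int:
    """Read a fixed-width 4-byte little-endian unsigned integer."""
    if offset + 4 > len(data):
        raise ValueError(f"need 4 bytes at offset {offset}, data is {len(data)} bytes")
    (value,) = struct.unpack_from("<I", data, offset)  # '<' little-endian, 'I' uint32
    return value


print(read_uint32_le(bytes([0x7E, 0x03, 0x00, 0x00]), 0))       # 894
print(hex(read_uint32_le(bytes([0x38, 0xFA, 0x9D, 0x95]), 0)))  # 0x959dfa38
```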

### 2.3 IEEE 754 binary32 (float32 LE)

Standard 32-bit floating point, little-endian: sign bit, 8 exponent bits, 23 mantissa bits, total 4 bytes.
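`struct` handles float32 the same way with the `<f` format. A hypothetical helper, sketched in Python:

```python
import struct


def read_float32_le(data: bytes, offset: int) -> float:
    """Read an IEEE 754 binary32 value stored little-endian."""
    if offset + 4 > len(data):
        raise ValueError(f"need 4 bytes at offset {offset}")
    (value,) = struct.unpack_from("<f", data, offset)  # widened to a Python float
    return value


print(read_float32_le(bytes([0x00, 0x00, 0x80, 0x3F]), 0))  # 1.0
```

Python widens the float32 to a 64-bit float on read, so a value such as a Material's `roughness = 0.4` comes back as the nearest float32, not exactly 0.4. Compare with a tolerance.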

### 2.4 RGBA color (color32)

Four bytes: Red, Green, Blue, Alpha. Each channel `uint8` in [0, 255]. Color values are sRGB; the runtime converts to linear space for rendering math.
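A sketch of reading a color32 and converting one channel to linear light. The spec text only says the runtime converts to linear space; the curve below is the standard sRGB transfer function from IEC 61966-2-1 and is an assumption about what the runtime does. Alpha is normally left unconverted. The sample bytes use the teal `#14B8A6` from `cube.lavs` with an assumed alpha of `0xFF`; function names are hypothetical.

```python
def read_color32(data: bytes, offset: int) -> tuple[int, int, int, int]:
    """Read four uint8 channels in R, G, B, A order."""
    if offset + 4 > len(data):
        raise ValueError(f"need 4 bytes at offset {offset}")
    r, g, b, a = data[offset:offset + 4]
    return r, g, b, a


def srgb_channel_to_linear(c8: int) -> float:
    """Standard sRGB transfer function (assumed) for one 8-bit color channel."""
    c = c8 / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


rgba = read_color32(bytes([0x14, 0xB8, 0xA6, 0xFF]), 0)
print(rgba)                                      # (20, 184, 166, 255)
print("#{:02X}{:02X}{:02X}".format(*rgba[:3]))   # #14B8A6
```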

### 2.5 UTF-8 string

LEB128 length prefix `N` followed by `N` bytes of UTF-8 text. No null terminator. **Note:** the `string` wire type is also used to carry raw binary payloads (vertex positions, normals, indices) — the decoder must NOT apply UTF-8 validation to property values on Buffer objects (property IDs 311, 312, 313, 314, 316).
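The Buffer exemption is easy to get wrong, so here is a Python sketch that decodes text strictly but returns Buffer payloads as bytes. `read_string_value` and `_leb128` are hypothetical names; the binary ID set is copied from the paragraph above.

```python
BINARY_PROPERTY_IDS = {311, 312, 313, 314, 316}  # Buffer payloads (section 2.5)


def _leb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode unsigned LEB128; returns (value, bytes_consumed)."""
    result, shift, pos = 0, 0, offset
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at offset {offset}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos - offset
        shift += 7


def read_string_value(data: bytes, offset: int, prop_id: int):
    """Read a length-prefixed 'string' wire value.

    Returns (value, bytes_consumed). Buffer payloads come back as bytes;
    every other property is decoded as strict UTF-8.
    """
    length, n = _leb128(data, offset)
    start, end = offset + n, offset + n + length
    if end > len(data):
        raise ValueError(f"string at offset {offset} runs past end of data")
    raw = bytes(data[start:end])
    if prop_id in BINARY_PROPERTY_IDS:
        return raw, end - offset              # raw binary: no UTF-8 validation
    return raw.decode("utf-8"), end - offset  # raises UnicodeDecodeError if invalid


print(read_string_value(b"\x09pbr-color", 0, 600))  # ('pbr-color', 10)
```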

---

## 3. Parsing the header

The header has six fields, in order:

1. **Magic** — 4 bytes, exactly `0x4C 0x41 0x56 0x53` (ASCII `LAVS`). Reject any file that does not start with this.
2. **Spec major** — LEB128 unsigned. A LAVS-001 reader **MUST** refuse files where this is not `1`.
3. **Spec minor** — LEB128 unsigned. `0` for v1.0, `1` for v1.1, `2` for v1.2, etc. A reader **MUST** accept any minor version of major 1, skipping unknown properties via the dictionary's tag table (see §4).
4. **Runtime ABI version** — LEB128 unsigned. Independent of spec version. A reader **SHOULD** accept files where ABI version ≤ its own.
5. **File ID** — LEB128 unsigned. Random per-export identifier. Not for cryptographic identity (use the signature in the edition block for that).
6. **Edition block offset** — **uint32 little-endian** (NOT LEB128). Absolute byte offset from byte 0 to the start of the edition block.

Notice that the header is variable-length: the LEB128 fields take 1+ bytes each depending on value size. For the canonical `cube.lavs` reference edition, the header is 16 bytes total:

```
offset  bytes               field                value
0       4C 41 56 53         Magic                "LAVS"
4       01                  Spec major           1
5       00                  Spec minor           0
6       01                  Runtime ABI          1
7       C1 AC 85 E2 04      File ID              0x4C415641 ('LAVA' as a 32-bit int = 1,279,350,337)
12      7E 03 00 00         Edition block offset 894
```

The `cube.lavs` File ID is `0x4C415641`, which requires 5 LEB128 bytes because the value exceeds 2^28. Your decoder must handle variable-length LEB128 correctly — don't assume 1 byte.

After reading the header, you have:
- `editionBlockOffset = 894` — the edition block starts at file byte 894
- The next byte (16, for cube.lavs) starts the property dictionary
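A minimal Python sketch of the header parse. It rejects bad magic and any spec major other than 1, as §3 requires, but does not compare the runtime ABI against a reader version, since that rule is a SHOULD and depends on your runtime. Function and field names are hypothetical.

```python
import struct


def _leb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode unsigned LEB128; returns (value, bytes_consumed)."""
    result, shift, pos = 0, 0, offset
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at offset {offset}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos - offset
        shift += 7


def parse_header(data: bytes) -> dict[str, int]:
    """Parse the LAVS-001 header; returns its fields plus the header end offset."""
    if data[:4] != b"LAVS":                      # 0x4C 0x41 0x56 0x53
        raise ValueError("not a LAVS file: bad magic")
    fields: dict[str, int] = {}
    pos = 4
    for name in ("spec_major", "spec_minor", "runtime_abi", "file_id"):
        fields[name], n = _leb128(data, pos)     # four variable-length fields
        pos += n
    if fields["spec_major"] != 1:
        raise ValueError(f"unsupported spec major version {fields['spec_major']}")
    if pos + 4 > len(data):
        raise ValueError("header truncated before the edition block offset")
    (fields["edition_block_offset"],) = struct.unpack_from("<I", data, pos)  # fixed width
    fields["header_end"] = pos + 4
    return fields


print(parse_header(bytes.fromhex("4C415653010001C1AC85E2047E030000")))
# {'spec_major': 1, 'spec_minor': 0, 'runtime_abi': 1, 'file_id': 1279350337,
#  'edition_block_offset': 894, 'header_end': 16}
```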

---

## 4. Parsing the property dictionary

The property dictionary tells you, for every property ID used anywhere in the file, what wire type its values use. This is the mechanism that enables forward compatibility: a v1.0 reader can skip a v1.2 property it does not recognize because the wire type tells it where the value ends (the first byte with a clear high bit for varint, the length prefix for string, and a fixed 4 bytes for float and color).

### 4.1 Property ID list

Starting at the end of the header, read LEB128 unsigned integers until you encounter the terminator byte `0x00`. Each integer is a property ID. The dictionary **MUST NOT** repeat IDs.

For `cube.lavs`, the property dictionary at offset 16 contains 24 unique property IDs:

```
100, 110, 111, 200, 202, 300, 301, 310, 311, 312, 315, 316, 510,
600, 700, 701, 702, 703, 704, 705, 706, 707, 708, 711
```

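The list is followed by the `0x00` terminator. For `cube.lavs` the encoded list plus terminator takes 46 bytes: IDs 100, 110, and 111 fit in one LEB128 byte each, the remaining 21 IDs (all between 128 and 16383) take two bytes each, and the terminator adds one (3 + 42 + 1 = 46).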
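A Python sketch that reads the ID list and, as a cross-check, re-encodes the `cube.lavs` IDs to confirm the 46-byte figure. `encode_leb128` and `read_property_ids` are illustrative names, not reference-implementation APIs. The duplicate check enforces the MUST NOT in §4.1.

```python
def _leb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode unsigned LEB128; returns (value, bytes_consumed)."""
    result, shift, pos = 0, 0, offset
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at offset {offset}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos - offset
        shift += 7


def encode_leb128(value: int) -> bytes:
    """Encode a non-negative int as unsigned LEB128 (used here for cross-checking)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more 7-bit groups follow
        else:
            out.append(byte)
            return bytes(out)


def read_property_ids(data: bytes, offset: int) -> tuple[list[int], int]:
    """Read property IDs up to the 0x00 terminator.

    Returns (ids, offset just past the terminator).
    """
    ids, seen, pos = [], set(), offset
    while True:
        if pos >= len(data):
            raise ValueError("property dictionary has no 0x00 terminator")
        if data[pos] == 0x00:
            return ids, pos + 1
        prop_id, n = _leb128(data, pos)
        pos += n
        if prop_id in seen:
            raise ValueError(f"duplicate property ID {prop_id}")
        seen.add(prop_id)
        ids.append(prop_id)


CUBE_IDS = [100, 110, 111, 200, 202, 300, 301, 310, 311, 312, 315, 316, 510,
            600, 700, 701, 702, 703, 704, 705, 706, 707, 708, 711]
encoded = b"".join(encode_leb128(i) for i in CUBE_IDS) + b"\x00"
print(len(encoded))           # 46
ids, end = read_property_ids(encoded, 0)
print(ids == CUBE_IDS, end)   # True 46
```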

### 4.2 Wire-type tag table

Immediately after the terminator, a packed bitfield of `N` two-bit entries gives the wire type for each property ID in the dictionary, in the order they appeared in §4.1. The bitfield is packed **least-significant bits first**.

Wire type tag values:

| Tag | Wire type | Encoding |
|---|---|---|
| `00` | varint | LEB128 unsigned |
| `01` | string | UTF-8 string (length-prefixed), also carries raw binary payloads |
| `10` | float | float32 little-endian |
| `11` | color | color32 (RGBA, 4 bytes) |

The tag table occupies `⌈N / 4⌉` bytes. For `cube.lavs` with 24 properties, that's 6 bytes:

```
offset 62: 0xE9 0x02 0x45 0x55 0x40 0x55
```

Unpacking the first tag byte `0xE9 = 0b11101001` (LSB first):

- bits 1-0: `01` → string (property ID 100 = `name`)
- bits 3-2: `10` → float (property ID 110 = `width`)
- bits 5-4: `10` → float (property ID 111 = `height`)
- bits 7-6: `11` → color (property ID 200 = `baseColor`)

After reading the tag table, your decoder has a map from property ID to wire type. Hold onto this for the scene stream walk.

For `cube.lavs`, the property dictionary spans offsets 16 to 67 (52 bytes total).
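The LSB-first packing means entry `i` lives in byte `i // 4` at bit shift `2 × (i mod 4)`. A Python sketch of the unpacking, checked against the `cube.lavs` tag bytes quoted in §4.2 (the function name and wire-type strings are hypothetical):

```python
WIRE_TYPES = {0b00: "varint", 0b01: "string", 0b10: "float", 0b11: "color"}


def unpack_wire_types(data: bytes, offset: int,
                      ids: list[int]) -> tuple[dict[int, str], int]:
    """Unpack ceil(N/4) bytes of 2-bit tags, least-significant bits first.

    Returns ({property_id: wire_type}, offset just past the tag table).
    """
    table_len = (len(ids) + 3) // 4            # ceil(N / 4)
    if offset + table_len > len(data):
        raise ValueError("truncated wire-type tag table")
    wire = {}
    for i, prop_id in enumerate(ids):
        byte = data[offset + i // 4]
        tag = (byte >> (2 * (i % 4))) & 0b11   # entry i sits at bit 2*(i mod 4)
        wire[prop_id] = WIRE_TYPES[tag]
    return wire, offset + table_len


CUBE_IDS = [100, 110, 111, 200, 202, 300, 301, 310, 311, 312, 315, 316, 510,
            600, 700, 701, 702, 703, 704, 705, 706, 707, 708, 711]
CUBE_TAGS = bytes([0xE9, 0x02, 0x45, 0x55, 0x40, 0x55])
wire, end = unpack_wire_types(CUBE_TAGS, 0, CUBE_IDS)
print(wire[100], wire[110], wire[111], wire[200])  # string float float color
```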

---

## 5. Parsing the scene stream

The scene stream runs from the end of the property dictionary up to and including byte `editionBlockOffset - 1`, the last byte before the edition block.

### 5.1 Object format

Each object in the scene stream consists of:

1. **Type ID** — LEB128 unsigned. Identifies the Core type (`Profile = 1`, `Artboard = 2`, `Scene = 3`, etc.).
2. **Property pairs** — zero or more `(propertyID as LEB128, value)` pairs, where the value is encoded per the wire type your dictionary lookup returned.
3. **Sentinel `0x00`** — a single zero byte terminating the object's properties.

So your parser loop is:

```
while pos < editionBlockOffset:
    (typeId, n) = decodeLEB128(bytes, pos); pos += n
    properties = {}
    while bytes[pos] != 0x00:
        (propId, n) = decodeLEB128(bytes, pos); pos += n
        if propId not in dictionary:
            error "property ID not declared in dictionary"   # corrupt file
        wireType = dictionary[propId]
        (value, n) = decodeValue(bytes, pos, wireType)
        properties[propId] = value
        pos += n
    pos += 1  # consume the sentinel
    sceneObjects.append((typeId, properties))
if pos != editionBlockOffset:
    error "scene stream overran the edition block"
```

The `decodeValue` function dispatches on wire type:

- varint → `decodeLEB128(bytes, pos)`
- string → read LEB128 length `L`, then read `L` raw bytes. Decode them as UTF-8 for text properties; keep them as raw binary for the Buffer payload properties listed in §2.5 (positions, normals, UVs, colors, indices)
- float → read 4 bytes, interpret as IEEE 754 LE float32
- color → read 4 bytes as `[R, G, B, A]`
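Putting §5.1 together, a Python sketch of `decodeValue` and the scene-stream loop. It returns string values as raw bytes and leaves UTF-8 decoding to the caller, per §2.5, and it checks bounds and undeclared property IDs. Names are hypothetical. The usage line feeds it the 14-byte Profile object from §5.3.

```python
import struct


def _leb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode unsigned LEB128; returns (value, bytes_consumed)."""
    result, shift, pos = 0, 0, offset
    while True:
        if pos >= len(data):
            raise ValueError(f"truncated LEB128 at offset {offset}")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos - offset
        shift += 7


def decode_value(data: bytes, pos: int, wire_type: str):
    """Decode one property value; returns (value, bytes_consumed)."""
    if wire_type == "varint":
        return _leb128(data, pos)
    if wire_type == "string":
        length, n = _leb128(data, pos)
        end = pos + n + length
        if end > len(data):
            raise ValueError(f"string value at offset {pos} runs past end of data")
        return bytes(data[pos + n:end]), end - pos   # caller applies UTF-8 if text
    if wire_type in ("float", "color"):
        if pos + 4 > len(data):
            raise ValueError(f"truncated 4-byte value at offset {pos}")
        if wire_type == "float":
            return struct.unpack_from("<f", data, pos)[0], 4
        return tuple(data[pos:pos + 4]), 4           # (R, G, B, A)
    raise ValueError(f"unknown wire type {wire_type!r}")


def read_scene_stream(data: bytes, pos: int, edition_block_offset: int,
                      dictionary: dict[int, str]) -> list[tuple[int, dict]]:
    """Read (type_id, {prop_id: value}) objects until the edition block."""
    objects = []
    while pos < edition_block_offset:
        type_id, n = _leb128(data, pos)
        pos += n
        props = {}
        while True:
            if pos >= edition_block_offset:
                raise ValueError("object has no sentinel before the edition block")
            if data[pos] == 0x00:
                pos += 1                             # consume the sentinel
                break
            prop_id, n = _leb128(data, pos)
            pos += n
            if prop_id not in dictionary:
                raise ValueError(f"property {prop_id} is not declared in the dictionary")
            props[prop_id], n = decode_value(data, pos, dictionary[prop_id])
            pos += n
        objects.append((type_id, props))
    if pos != edition_block_offset:
        raise ValueError("scene stream overran the edition block offset")
    return objects


profile = bytes([0x01, 0xD8, 0x04, 0x09]) + b"pbr-color" + b"\x00"  # 14 bytes
print(read_scene_stream(profile, 0, len(profile), {600: "string"}))
# [(1, {600: b'pbr-color'})]
```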

### 5.2 Tree reconstruction by document order

LAVS-001 does not encode parent-child relationships via indices. The scene tree is reconstructed by maintaining a container stack as you read the stream:

- An **Artboard** (type 2) pushes itself as root.
- A **Scene** (type 3) pushes as a child of the topmost Artboard.
- A **Node** (type 4) attaches as a child of the topmost container (Scene or Node).
- Other types (Mesh, Light, Camera) attach to the topmost Node.

References between objects (e.g., `Mesh.bufferRef`) use **document-order indices**, counted from 1. Index 0 means "no reference".

For `cube.lavs`, the scene stream has 8 objects in document order:

| Doc index | Type | Notes |
|---|---|---|
| 1 | Profile | features = "pbr-color" |
| 2 | Artboard | name = "Cube", width = 1.0, height = 1.0 |
| 3 | Scene | name = "Default" |
| 4 | Node | name = "CubeRoot" |
| 5 | Buffer | vertexCount = 24, positions + normals + indices |
| 6 | Material | baseColor = #14B8A6 (teal), roughness = 0.4 |
| 7 | Mesh | bufferRef = 5, materialRef = 6 |
| 8 | PalettePin | paletteName = "teal" |
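Resolving a document-order reference is an index shift plus a check for 0. A Python sketch over the `cube.lavs` type IDs listed in the appendix; it assumes indices count scene-stream objects only, which matches the table above. `resolve_ref` is a hypothetical name.

```python
CUBE_SCENE_TYPES = [1, 2, 3, 4, 6, 7, 5, 17]  # type IDs for doc indices 1..8 (appendix)


def resolve_ref(objects: list, ref: int):
    """Return the object at 1-based document-order index `ref`, or None for 0."""
    if ref == 0:
        return None                             # 0 means "no reference"
    if not 1 <= ref <= len(objects):
        raise ValueError(f"reference {ref} is outside 1..{len(objects)}")
    return objects[ref - 1]


print(resolve_ref(CUBE_SCENE_TYPES, 5))   # 6 (Buffer)
print(resolve_ref(CUBE_SCENE_TYPES, 6))   # 7 (Material)
print(resolve_ref(CUBE_SCENE_TYPES, 0))   # None
```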

### 5.3 Worked example — the first scene object

The first scene object in `cube.lavs` starts at offset 68 (immediately after the property dictionary). Decoding byte by byte:

- offset 68: `0x01` → typeId = 1 → Profile
- offset 69: `0xD8 0x04` → propId = 600 (LEB128: `0xD8 & 0x7F` = 88, plus 4 × 128 = 600)
  - wire type from dictionary lookup: string (tag `01`)
- offset 71: `0x09` → string length = 9
- offset 72-80: 9 UTF-8 bytes — `pbr-color`
- offset 81: `0x00` → sentinel, object complete

The Profile object decodes to `{ type: 1, props: { 600: "pbr-color" } }`. It took 14 bytes total.

Continue this loop for the remaining 7 objects until your cursor reaches offset 894 (the edition block offset).

---

## 6. Parsing the edition block — fast-path read

This is the format's killer feature: you do **not** need to parse the scene stream to read edition metadata.

Steps:

1. Read the header. You have `editionBlockOffset = 894`.
2. Compute `CRCFooterStart = fileSize - 4`.
3. Read the 4 bytes at `CRCFooterStart` as a uint32 little-endian → `expectedCRC`.
4. Compute CRC32 over `bytes[editionBlockOffset .. CRCFooterStart - 1]` using the standard polynomial (see §7). Compare to `expectedCRC`.
5. Starting at `editionBlockOffset`, parse a single Core object:
   - Read typeId as LEB128. **MUST** be `100` (Edition).
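   - Read property pairs until sentinel `0x00`, decoding each value with the wire type from the property dictionary (§4). The dictionary sits directly after the header, so this still needs no scene-stream parsing.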
6. If the Edition has `signatureAlgorithm` (property 709) declared, the signature blob follows: LEB128 length + bytes.
7. Map the property IDs to friendly names and return the edition metadata.
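A Python sketch of the fast path. Step 5 needs wire types, which come from the property dictionary; it sits right after the header, so reading it costs a few dozen bytes and still skips the scene stream entirely. For brevity the sketch omits step 6 (signature blob) and step 7 (name mapping), returns properties keyed by numeric ID, and relies on Python's `IndexError` rather than friendly messages if a file is truncated. `read_edition_fast` is a hypothetical name.

```python
import struct
import zlib


def _leb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode unsigned LEB128; returns (value, bytes_consumed)."""
    result, shift, pos = 0, 0, offset
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos - offset
        shift += 7


def read_edition_fast(data: bytes) -> tuple[dict[int, object], bool]:
    """Return ({property_id: value}, crc_ok) without walking the scene stream."""
    if data[:4] != b"LAVS":
        raise ValueError("bad magic")
    pos = 4
    for _ in range(4):                    # spec major, spec minor, runtime ABI, file ID
        pos += _leb128(data, pos)[1]
    (edition_off,) = struct.unpack_from("<I", data, pos)
    pos += 4

    # Property dictionary: IDs until 0x00, then ceil(N/4) bytes of 2-bit tags.
    ids = []
    while data[pos] != 0x00:
        prop_id, n = _leb128(data, pos)
        ids.append(prop_id)
        pos += n
    pos += 1
    tag_len = (len(ids) + 3) // 4
    tags = data[pos:pos + tag_len]
    wire = {pid: (tags[i // 4] >> (2 * (i % 4))) & 0b11 for i, pid in enumerate(ids)}

    # The CRC32 footer covers edition_off up to (not including) the last 4 bytes.
    crc_start = len(data) - 4
    if not pos + tag_len <= edition_off < crc_start:
        raise ValueError(f"edition block offset {edition_off} is out of range")
    (expected_crc,) = struct.unpack_from("<I", data, crc_start)
    crc_ok = zlib.crc32(data[edition_off:crc_start]) == expected_crc

    # Exactly one Core object, which must be an Edition (type 100).
    pos = edition_off
    type_id, n = _leb128(data, pos)
    pos += n
    if type_id != 100:
        raise ValueError(f"expected Edition (type 100), found type {type_id}")
    props = {}
    while data[pos] != 0x00:
        prop_id, n = _leb128(data, pos)
        pos += n
        tag = wire[prop_id]               # KeyError means an undeclared property
        if tag == 0b00:                   # varint
            props[prop_id], n = _leb128(data, pos)
        elif tag == 0b01:                 # string (text on an Edition)
            length, m = _leb128(data, pos)
            props[prop_id] = data[pos + m:pos + m + length].decode("utf-8")
            n = m + length
        elif tag == 0b10:                 # float32 LE
            props[prop_id], n = struct.unpack_from("<f", data, pos)[0], 4
        else:                             # color32 RGBA
            props[prop_id], n = tuple(data[pos:pos + 4]), 4
        pos += n
    return props, crc_ok
```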

For `cube.lavs`:

- editionBlockOffset = 894
- CRC footer at 1170 (4 bytes: `0x38 0xFA 0x9D 0x95` = CRC32 value `0x959DFA38`)
- Edition block bytes: from 894 to 1169 (276 bytes total)
- First byte at 894 is `0x64`, which is LEB128 for value 100 = Edition. ✓

The Edition object's properties decode to:

```
{
  title: "Cube Edition I",
  author: "Matthew Scott · Project Lavos LLC",
  editionNumber: 1,
  editionTotal: 0,  // open edition
  year: 2026,
  license: "CC-BY-4.0",
  description: "The LAVS-001 canonical worked example. ...",
  palettePin: "teal",
  runtimeVersionMin: "0.1.0",
  provenance: "lavs-encoder://lavs-format@<sha>/build-cube.ts?size=1"
}
```

No signatureAlgorithm is declared, so no signature blob follows. The edition block ends at byte 1169; the CRC32 footer is at 1170-1173.

---

## 7. CRC32 footer

The final 4 bytes of every `.lavs` file are a uint32 little-endian CRC32 of the edition block (from `editionBlockOffset` to the byte immediately preceding the CRC).

The CRC uses the standard CRC-32 polynomial:

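- Polynomial: `0x04C11DB7` (reversed: `0xEDB88320`)
- Input and output reflected (bits processed least-significant first)
- Initial value: `0xFFFFFFFF`
- Final XOR: `0xFFFFFFFF`

These are the same parameters used by PNG, ZIP, and gzip, and they are what zlib's `crc32()` computes. Where a language's standard library ships a CRC-32 routine, it typically uses these parameters: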

```
# Python (stdlib)
import zlib
crc = zlib.crc32(edition_block_bytes) & 0xFFFFFFFF

# Go (stdlib)
import "hash/crc32"
crc := crc32.ChecksumIEEE(editionBlockBytes)
```

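If your library's CRC-32 function uses different parameters (rare, but some libraries and the SSE4.2 `crc32` instruction compute CRC-32C with the Castagnoli polynomial instead), switch to the IEEE variant or use a known-good implementation. A quick self-check: the CRC-32 of the nine ASCII bytes `123456789` must be `0xCBF43926`.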

The CRC scope is the **edition block only**, not the entire file. This is intentional: external verifiers that only care about edition metadata can verify CRC + parse Edition object with no scene-stream work. Scene-stream integrity is covered by the signature (if signed), not the CRC.
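If your language has no CRC-32 in its standard library, the bitwise form is short. A Python sketch using the reversed polynomial `0xEDB88320`; it is slow (eight shift steps per byte), and a production decoder would normally use a 256-entry lookup table instead. The usage line checks it against the standard test vector; `crc32_ieee` is a hypothetical name.

```python
import zlib


def crc32_ieee(data: bytes) -> int:
    """Bitwise CRC-32: reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte                       # fold in the next byte, LSB first
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


print(hex(crc32_ieee(b"123456789")))                             # 0xcbf43926
print(crc32_ieee(b"edition bytes") == zlib.crc32(b"edition bytes"))  # True
```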

---

## 8. Putting it together

A complete LAVS-001 decoder in any language has roughly this shape:

1. Read file → byte buffer.
2. Verify magic.
3. Read header fields (4 LEB128s + 1 uint32 LE).
4. Read property dictionary → map of property ID to wire type.
5. Loop reading scene objects until `editionBlockOffset` reached.
6. Read edition block (one Core object) → edition metadata.
7. If `signatureAlgorithm` present, read signature blob.
8. Compute CRC32 over edition block bytes; compare to footer.
9. Return structured result: header info + scene objects + edition metadata + CRC validity.

A read-only decoder is approximately 200-300 lines in most languages. The TypeScript reference is in `src/format/decoder.ts`. The official Python decoder will be published at `github.com/guitargnarr/lavs-format-py` once stabilized.

---

## 9. Verifying your decoder

Once you have a working decoder:

1. Decode the [six reference editions](https://github.com/guitargnarr/lavs-format/tree/main/examples) and compare your output to the TypeScript reference's `npm run verify` output.
2. Run your decoder against an intentionally corrupted file (truncate the last 100 bytes) — your decoder should fail with a clear error, not silently produce garbage.
3. Flip one byte inside the edition block of a known-good file — your decoder should detect the CRC mismatch.
4. Read the [conformance test suite](/conformance.md) and run your decoder through the 18 checks it specifies. If your decoder reaches the same conclusions on each check as the reference does on each file, you're conformant.
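Checks 2 and 3 are easy to automate. A Python sketch with two hypothetical helpers: `edition_crc_ok` recomputes the footer comparison, and `flip_byte` produces a corrupted copy. Because CRC-32 detects every error burst of 32 bits or fewer, flipping any single byte inside the edition block is guaranteed to be caught. The synthetic file in the usage lines stands in for a real edition.

```python
import struct
import zlib


def edition_crc_ok(data: bytes, edition_block_offset: int) -> bool:
    """Recompute CRC32 over the edition block and compare it to the footer."""
    if len(data) < 4 or not 0 <= edition_block_offset < len(data) - 4:
        raise ValueError("file too short for its edition block offset and CRC footer")
    (expected,) = struct.unpack_from("<I", data, len(data) - 4)
    return zlib.crc32(data[edition_block_offset:len(data) - 4]) == expected


def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with every bit of one byte inverted."""
    corrupted = bytearray(data)
    corrupted[index] ^= 0xFF
    return bytes(corrupted)


edition = b"\x64example\x00"                       # stand-in edition block
good = b"\x00" * 10 + edition + struct.pack("<I", zlib.crc32(edition))
print(edition_crc_ok(good, 10))                    # True
print(edition_crc_ok(flip_byte(good, 12), 10))     # False
```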

---

## 10. How editions surface

The signed editions you decode will be published in two places:

- **`/editions` on projectlavos.com** — the manifesto-framed contextualization gateway, where buyers and visitors discover what editions are and why they exist.
- **`lavs.projectlavos.com`** — the standalone gallery where editions render live and edition metadata (signature verification, edition number, license) surfaces in detail.

If you are building a third-party viewer or marketplace tool, fetching the manifest from `https://lavos-pubkey.projectlavos.com/manifest.json` is the canonical way to enumerate editions. The signing public key for verification lives at `https://lavos-pubkey.projectlavos.com/pubkey.pem`; the revocation list at `https://lavos-pubkey.projectlavos.com/revoked.txt`. See [`rotation-policy.md`](/rotation-policy.md) for the full signing and verification protocol.

---

## 11. License

This tutorial is licensed CC BY 4.0 — the same license as the spec. You may copy, adapt, redistribute, and translate it; attribution to Project Lavos LLC is required.

The reference implementation (encoder, decoder, runtime, viewer, engine) at `github.com/guitargnarr/lavs-format` is proprietary; the spec it implements is open. Your independent decoder, if you write one, is yours — produce it under whatever license suits you.

---

If you find an ambiguity in this tutorial or in the spec, file an issue at [github.com/guitargnarr/lavs-format/issues](https://github.com/guitargnarr/lavs-format/issues). Conformance bugs in the reference implementation get triaged; gaps in this tutorial get filled.

---

## Appendix — Validating your implementation against the cube reference

Once you have a decoder that parses `cube.lavs`, the table below is a quick correctness check. Every value below has been verified against the canonical bytes by the Python reference implementation. Your decoder, given the same `cube.lavs` bytes, should produce the same values.

| Property | Expected value |
|---|---|
| File total bytes | `1174` |
| Magic header | `0x4C 0x41 0x56 0x53` (ASCII `LAVS`) |
| `specMajor` (LEB128) | `1` |
| `specMinor` (LEB128) | `0` |
| `runtimeAbi` (LEB128) | `1` |
| `fileId` (LEB128) | `0x4C415641` = `1,279,350,337` decimal (5 LEB128 bytes) |
| `editionBlockOffset` (uint32 LE) | `894` |
| Header end offset | `16` |
| Property dictionary IDs | `[100, 110, 111, 200, 202, 300, 301, 310, 311, 312, 315, 316, 510, 600, 700, 701, 702, 703, 704, 705, 706, 707, 708, 711]` |
| Property dictionary ID count | `24` |
| Tag table byte count | `6` (⌈24/4⌉) |
| Tag table first byte | `0xE9` |
| Tag byte 0 decodes to wire types | `(01, 10, 10, 11)` = `(string, float, float, color)` for properties `(100, 110, 111, 200)` |
| Property dictionary total size | `52` bytes (header end 16 → scene stream start 68) |
| Scene stream start offset | `68` |
| Scene object count | `8` |
| Scene object types | `[Profile(1), Artboard(2), Scene(3), Node(4), Buffer(6), Material(7), Mesh(5), PalettePin(17)]` |
| First object type | Profile (Type 1) |
| First object property 600 (features) | `"pbr-color"` |
| Edition block byte count | `276` |
| CRC32 footer offset | `1170` |
| Edition title | `"Cube Edition I"` |
| Edition author | `"Matthew Scott · Project Lavos LLC"` |
| Edition number | `1` |
| Edition total | `0` (open edition) |
| Edition year | `2026` |
| Edition license | `"CC-BY-4.0"` |
| Edition `signatureAlgorithm` | absent (this is a release candidate, not yet signed) |
| CRC32 of edition block | valid (matches the footer) |

If your decoder matches every value above, you have a correctly-implemented LAVS-001 reader for the cube reference edition. Extend the same protocol to the other canonical editions (mobius, torus-knot, voronoi, lsystem-tree, neural-mesh, cube-with-sky) by running them through your decoder and comparing scene object counts, edition metadata, and CRC validity against the TypeScript reference's `verify.ts` output.

The Python reference's [test_tutorial_alignment.py](https://github.com/guitargnarr/lavs-format-py/blob/main/tests/test_tutorial_alignment.py) (publication pending) is the operational version of this table — it asserts every value above against the Python decoder's output. If you write your decoder in a language with a test framework, replicating that test against your output is the recommended way to lock down correctness.
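A minimal Python sketch of such a check. The dictionary keys are hypothetical: shape your decoder's summary to match them, or rename them. The values are a subset copied from the table above; this is not a port of `test_tutorial_alignment.py`, whose contents are not reproduced here.

```python
CUBE_EXPECTED = {
    "file_size": 1174,
    "spec_major": 1,
    "spec_minor": 0,
    "runtime_abi": 1,
    "file_id": 0x4C415641,
    "edition_block_offset": 894,
    "header_end": 16,
    "property_id_count": 24,
    "scene_start": 68,
    "scene_types": [1, 2, 3, 4, 6, 7, 5, 17],
    "edition_block_size": 276,
    "crc_footer_offset": 1170,
    "edition_number": 1,
    "edition_total": 0,
    "edition_year": 2026,
    "crc_ok": True,
}


def compare_to_cube(actual: dict) -> list[str]:
    """Return human-readable mismatches between a decoder summary and the table."""
    problems = []
    for key, expected in CUBE_EXPECTED.items():
        if key not in actual:
            problems.append(f"{key}: missing from decoder output")
        elif actual[key] != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual[key]!r}")
    return problems


print(compare_to_cube(dict(CUBE_EXPECTED)))   # []
```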
