Host-capture · FG2 pack · PS1 replay
Method
How a 1992 Windows screensaver ends up running on a 1994 console without an emulator under it.
The problem
Sierra’s Johnny Castaway shipped in 1992 for Windows 3.1. It is a
screensaver. The engine is a small interpreter for two custom
bytecodes – ADS, which selects scenes, and TTM, which scripts the
per-scene drawing – backed by a flat resource pack
(RESOURCE.MAP / RESOURCE.001) of compressed bitmaps and screens.
On modern hardware the engine has been decoded, cleaned up, and made
portable; the upstream that this project descends from,
jno6809/jc_reborn, runs on top
of SDL2.
A PS1 port is not “swap SDL2 for PSn00bSDK.” Three things make a straight port impossible:
- No comparable graphics pipeline. The PS1 GPU does not draw into a CPU-addressable framebuffer. It pushes commands through an ordering table; sprites live in VRAM as 4-bit or 8-bit CLUT-indexed textures; “compositing a frame” means submitting SPRT and POLY_FT4 primitives in z-sorted order. Sierra’s TTM ops were authored for a flat 8-bit framebuffer with SAVE_ZONE / RESTORE_ZONE semantics, not for a tile-and-blit GPU.
- 2 MB of system RAM and no filesystem cache. The desktop build trusts the OS to keep recently-read resources warm. The PS1 has 2 MB total, a 2x CD drive (~300 KB/s, 150 ms cold seek), no disk cache, no virtual memory, and no syscall layer. Anything you want quickly has to already be in RAM. Anything large has to be paged off the disc deterministically.
- SDL2 is not portable to the PS1. PSn00bSDK is the modern open-source SDK; it gives you psxgpu, psxcd, psxspu, psxapi, psxgte, psxsio. There is no printf you can trust in a hot loop, no malloc worth leaning on for transients, no pthread, no clock_gettime. SDL’s design assumptions are not even close. The symbol-by-symbol mapping the host build hands off to the PS1 build is documented at /docs/api/.
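To make the CD constraint concrete, here is a rough budget calculation using only the figures quoted above (~300 KB/s at 2x, ~150 ms cold seek). It is a planning sketch, not a measurement: the constants are the text's approximations, KB is assumed to mean 1024 bytes, and `read_time_sec` is a hypothetical helper name.

```python
# Rough planning numbers taken from the text; real drives vary.
CD_2X_BYTES_PER_SEC = 300 * 1024   # ~300 KB/s sustained at 2x (KB assumed = 1024 bytes)
COLD_SEEK_SEC = 0.150              # ~150 ms cold seek
SECTOR_BYTES = 2048                # user-data bytes per CD-ROM sector


def read_time_sec(num_bytes: int, seeks: int = 1) -> float:
    """Estimate the time to fetch num_bytes, rounded up to whole sectors."""
    sectors = -(-num_bytes // SECTOR_BYTES)  # ceiling division
    transfer = sectors * SECTOR_BYTES / CD_2X_BYTES_PER_SEC
    return seeks * COLD_SEEK_SEC + transfer


if __name__ == "__main__":
    for size_kb in (16, 64, 256):
        t = read_time_sec(size_kb * 1024)
        print(f"{size_kb:4d} KB: {t * 1000:.0f} ms, about {t * 60:.1f} frames at 60 Hz")
```

Even a small read costs several displayed frames once a seek is involved, which is why anything needed quickly has to be resident before it is needed.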
The first prototype tried to brute-force these problems in the runtime – a faithful TTM/ADS interpreter on the PS1, replaying scenes from extracted resources, recovering disappearing actors with heuristics. It mostly worked. It also produced “Johnny disappears” bugs that moved around as fast as they could be fixed, because the runtime was trusting state that the desktop engine had built up across many scenes. That class of bug is what motivated the pivot.
The hybrid pipeline
desktop host ----> capture-host-scene.sh ----> high/low frames + frame-meta JSONs + sound-events.jsonl
                                                   |
                                                   v
                                export-scene-foreground-pilot.sh
                                                   |
                                                   v
                                 build-scene-foreground-pack.py
                                                   |
                                                   v   (FG2: pal4/indexed8 spans + sound-event table)
                                 generated/ps1/foreground/*.FG2 ----> CD image ----> PS1
                                                                                      |
                                                                                      v
                                                                     foreground_pilot.c (replay)
                                                                                      |
                                                                                      v
                                                                   sound_ps1.c soundPlay() on cue
Each stage:
- Host capture. The desktop build (the same code, with the SDL2 backend) is asked to play one ADS+tag scene under controlled boot state – tide, night, holiday, raft-stage. It writes a high-tide capture and a low-tide capture: a sequence of full PNG frames, one per displayed game frame, plus a frame-meta.json with timing data and a sound-events.jsonl containing every 0xC051 PLAY_SAMPLE op the TTM interpreter fired, with frame index, sound number, and pan / volume. This is run by scripts/capture-host-scene.sh.
- Pack compile. scripts/build-scene-foreground-pack.py turns a capture into one FG2 binary. Each output frame becomes either a base full-render (the first frame, or any frame the differ flagged as a forced base) or a diff-from-prior. Diffs are stored as runs of indexed-pixel spans: (y, x_start, length, indexed_bytes). Sound events fold into a per-frame event table.
- CD packaging. mkpsxiso consumes config/ps1/cd_layout.xml and lays out the disc image. Each routed scene contributes two pack entries (high-tide, low-tide) plus the small fixed payload – the executable, RESOURCE.MAP/.001 for static metadata lookup, the SCR/PSB/SND assets the runtime still consults, and the title raw.
- PS1 replay. src/foreground_pilot/foreground_pilot.c opens the matching pack for the selected scene+tide, decodes its header and per-frame index, and during the scene loop stamps each frame’s diff spans on top of the prior composite. Background, wave animation, and holiday overlays come from the PS1’s own narrow runtime; sound events fire through src/platform/ps1/sound_ps1.c on a per-pack event cursor with a fixed 3-frame delay so SPU key-on lines up with the visible trigger.
This is the right shape because everything that needs the desktop engine’s full state (scene continuity, replay state, the ADS selector) happens once at capture time on a 64-bit machine with gigabytes of RAM. Everything that needs to be cheap on the PS1 (memory access, sprite stamping, audio key-on) is one straight pass through a small, deterministic file.
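The per-pack event cursor mentioned in the replay stage can be modelled in a few lines. This is a sketch under stated assumptions, not the code in foreground_pilot.c: `SoundEvent` and `EventCursor` are hypothetical names, and the direction of the delay (fire three replay frames after the captured frame index) is an assumption; the text only says the delay is fixed at 3 frames.

```python
from dataclasses import dataclass

SOUND_DELAY_FRAMES = 3  # the fixed delay quoted in the text


@dataclass(frozen=True)
class SoundEvent:
    frame: int   # capture frame index at which PLAY_SAMPLE fired
    sound: int
    pan: int
    volume: int


class EventCursor:
    """Walks a frame-sorted event list once, front to back, never rewinding."""

    def __init__(self, events):
        self.events = sorted(events, key=lambda e: e.frame)
        self.pos = 0

    def due(self, replay_frame: int):
        """Return the events whose (capture frame + delay) has been reached."""
        fired = []
        while (self.pos < len(self.events)
               and self.events[self.pos].frame + SOUND_DELAY_FRAMES <= replay_frame):
            fired.append(self.events[self.pos])
            self.pos += 1
        return fired
```

Calling `due()` once per displayed frame keeps the cost proportional to the number of events that fire on that frame, with no per-frame search through the table.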
What’s in a pack
An FG2 pack – e.g. FISHING_1.FG2 or FISHIN_L1.FG2 for the
low-tide twin – is a flat little-endian binary. The relevant
research notes are in
docs/ps1/research/PACK_PAYLOAD_LAYOUT.md
and
docs/ps1/research/PACK_MANIFEST_SCHEMA.md.
Concretely, a pack contains:
- Header. Magic bytes, format version, frame count, base-frame count, palette size, indexed-bit-depth flag (pal4 for 4-bit / 16-color CLUT, indexed8 for 8-bit / 256-color), high-or-low tide marker, source-scene identifier (ADS family + tag), and the byte offsets of the entry table, palette, base-frame block, and diff block. Sector alignment is 2048 bytes so each block lands on a CD-ROM sector boundary – the loader can read what it needs without straddling sectors and forcing extra seeks.
- Palette. A single CLUT for the whole scene, packed as PS1 16-bit BGR-1555 entries. The host-side capture is constrained to a scene-stable palette so the pack does not have to re-upload CLUTs per frame.
- Entry table. One row per displayed frame. Each row is a fixed struct: frame index, source kind (base or diff), block offset, block length, and the frame’s intended display duration in 60ths-of-a-second ticks. The entry table is what the replay loop walks; the diff/base blocks are loaded lazily.
- Base frames. Full-render, indexed-pixel grids covering the scene’s authored compose region (not the whole 640x480 – only the rectangle the foreground actually touches). One base at the start of the scene; a small number of forced bases mid-scene where the differ found a discontinuity it didn’t want to encode.
- Diff frames. Run-length-encoded spans of changed indexed pixels, addressed against the rectangle the prior frame established. Each span: (row, x_start, run_length, bytes). A diff frame typically needs 80-95% fewer pixel writes than a full redraw because most of a Castaway frame is unchanged background ocean and unchanged island.
- Sound-event table. Per-frame list of (sound_number, pan, volume) triples lifted from the captured PLAY_SAMPLE events. This is what foreground_pilot.c uses to fire soundPlay() on cue.
- Frame-meta tail. Source frame timing in milliseconds, used at capture-validate time and preserved in the pack so a regtest can confirm the on-PS1 cadence still matches the host capture.
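The diff-frame idea is easy to demonstrate on toy data. The sketch below produces (row, x_start, run_length, bytes) spans from two indexed frames and stamps them back onto the prior frame. It illustrates the span shape described above and nothing more: it is not the differ in build-scene-foreground-pack.py, and `encode_diff` / `apply_diff` are hypothetical names.

```python
def encode_diff(prev, curr):
    """Return (row, x_start, run_length, bytes) spans where curr differs from prev.

    Frames are lists of equal-length bytearrays holding palette indices.
    """
    spans = []
    for y, (p_row, c_row) in enumerate(zip(prev, curr)):
        x, width = 0, len(c_row)
        while x < width:
            if p_row[x] == c_row[x]:
                x += 1
                continue
            start = x
            while x < width and p_row[x] != c_row[x]:
                x += 1
            spans.append((y, start, x - start, bytes(c_row[start:x])))
    return spans


def apply_diff(frame, spans):
    """Stamp spans onto frame in place, as the replay side would."""
    for y, x0, n, data in spans:
        frame[y][x0:x0 + n] = data
```

Because each span carries its own row and start column, replay is a straight pass of small copies with no decisions to make, which is the property the pack format is after.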
Each pack also has a small companion JSON sidecar (pack_index
on the host side) used by the regtest harness, but on the disc the
runtime only needs the binary. There is one pack per scene per tide,
so the routed disc image carries up to 126 packs (63 x 2). The
generated FG2 corpus is roughly 343 MB, which is why packs are
routed onto the CD selectively rather than all at once during
bring-up.
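Two of the layout details above, the 16-bit palette entries and the 2048-byte block alignment, reduce to a few lines of arithmetic. The sketch assumes the usual PS1 color layout (red in bits 0-4, green in bits 5-9, blue in bits 10-14, bit 15 as the mask/STP flag) and simple truncation from 8 to 5 bits per channel; the function names are hypothetical, and the real packer may quantize differently.

```python
import struct

SECTOR = 2048  # alignment quoted in the text


def align_up(offset: int, boundary: int = SECTOR) -> int:
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


def rgb888_to_bgr1555(r: int, g: int, b: int, stp: int = 0) -> int:
    """Truncate each channel to 5 bits; red lands in the low bits, blue in the high bits."""
    return (stp << 15) | ((b >> 3) << 10) | ((g >> 3) << 5) | (r >> 3)


def pack_clut(colors) -> bytes:
    """Serialize (r, g, b) tuples as little-endian 16-bit entries."""
    return b"".join(struct.pack("<H", rgb888_to_bgr1555(*c)) for c in colors)
```

A 16-entry pal4 CLUT packs to 32 bytes this way, and `align_up` gives the offset at which the next block would start if every block begins on a sector boundary.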
PS1 hardware constraints we hit
These are the gotchas that actually cost wall-clock days. Most of
them are in
docs/ps1/hardware-specs.md
or the dated worklogs under
docs/ps1/research/.
SPI pad polling needs tx_len=5, not 4. PSn00bSDK 0.24’s BIOS
pad driver (InitPAD / StartPAD) does not auto-poll under
DuckStation in the project’s runtime context. The fix was to lift
the SPI controller driver from spicyjpeg’s pads example and run
it directly: timer-2 plus SIO0 IRQ at 250 Hz, in src/platform/ps1/spi.c.
That driver, as published, sends a 4-byte poll TX. Under DuckStation
the controller bytes never make it back; the read returns
0xFFFF. The console only delivers button bytes when the full
5-byte poll sequence comes from the TX buffer. Bumping tx_len
from 4 to 5 made the controller work. This is documented in
docs/ps1/hardware-specs.md and pinned in the project’s working
notes. If you copy the spicyjpeg driver, change tx_len.
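For readers who have not met the pad protocol, this small model shows why the poll length matters. It assumes the commonly documented digital-pad exchange (TX 0x01 0x42 then padding bytes; RX 0xFF, ID 0x41, 0x5A, then two active-low button bytes). The `exchange` and `decode_buttons` helpers are hypothetical, and returning 0xFFFF for a short read is a modelling choice that mirrors the symptom described above, not the driver's actual code path.

```python
POLL_TX = bytes([0x01, 0x42, 0x00, 0x00, 0x00])  # address, poll command, three padding bytes


def exchange(tx: bytes, pad_reply: bytes) -> bytes:
    """Model the serial link as full duplex: one reply byte arrives per byte sent."""
    return pad_reply[:len(tx)]


def decode_buttons(rx: bytes) -> int:
    """Return the active-low 16-bit button word, or 0xFFFF (nothing pressed) if incomplete."""
    if len(rx) < 5 or rx[2] != 0x5A:
        return 0xFFFF
    return rx[3] | (rx[4] << 8)


if __name__ == "__main__":
    # Hypothetical reply with one button held: bit 14 of the word is cleared.
    reply = bytes([0xFF, 0x41, 0x5A, 0xFF, 0xBF])
    print(hex(decode_buttons(exchange(POLL_TX, reply))))
    print(hex(decode_buttons(exchange(POLL_TX[:4], reply))))
```

With only four bytes sent, the second button byte is never clocked in, so the decoder has nothing valid to report.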
FntFlush is empirically broken in the scene-runtime context.
The pause menu needed on-screen text. The PSn00bSDK font path
(FntLoad / FntPrint / FntFlush) accepts the calls without
error but produces no visible pixels in the running scene context
– primitives accumulate in the OT and never present. Rather than
chase the root cause through PSn00bSDK internals, the pause menu
ships a custom embedded 8x8 ASCII font, drawn with POLY_F4 glyph
quads on the same OT as the scene. Captions reuse that same font
atlas. New on-screen text should not regress to FntFlush.
VRAM corruption across scenes – grRestoreBgTiles wipes
currDirty. The dirty-rectangle bookkeeping in
src/graphics_ps1/graphics_ps1.c tracks per-frame dirty regions in currDirty
and prevDirty. On a normal frame, currDirty is the spans the
foreground touched this frame; prevDirty is what it touched last
frame and now needs background restoration. The pause menu opens
mid-scene, dims everything, and on resume needs the entire scene
to redraw cleanly. The first attempt called grRestoreBgTiles() on
resume, which uses currDirty to know what to restore – but
grRestoreBgTiles itself wipes currDirty as it goes. A full
redraw on resume needs both prevDirty and currDirty honored,
which is why the codebase now exposes grForceFullRedrawNextFrame()
to flag the next frame as a forced full background restore. This
hazard is recorded in the project’s working notes and surfaced
multiple times during pause-menu bring-up.
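The hazard is easier to see in a toy model. The class below is a Python stand-in for the bookkeeping described above, not a translation of graphics_ps1.c: the method names echo the C functions, but the data structures and the exact frame-rotation order are assumptions made for illustration.

```python
class DirtyTracker:
    """Toy model of currDirty / prevDirty bookkeeping."""

    def __init__(self):
        self.curr_dirty = []    # rects the foreground touched this frame
        self.prev_dirty = []    # rects touched last frame, awaiting background restore
        self.force_full = False

    def mark(self, rect):
        self.curr_dirty.append(rect)

    def restore_bg_tiles(self):
        """The hazard: reports what to restore and empties curr_dirty as it goes."""
        restored, self.curr_dirty = self.curr_dirty, []
        return restored

    def force_full_redraw_next_frame(self):
        self.force_full = True

    def begin_frame(self, full_rect):
        """Return the background rects to restore before drawing this frame."""
        if self.force_full:
            self.force_full = False
            self.prev_dirty, self.curr_dirty = [], []
            return [full_rect]
        to_restore = self.prev_dirty
        self.prev_dirty, self.curr_dirty = self.curr_dirty, []
        return to_restore
```

In this model a second call to `restore_bg_tiles()` returns nothing, so a resume path that relies on it loses track of what was on screen. Setting the flag sidesteps the bookkeeping entirely by restoring the whole rectangle once.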
SPU HLE vs hardware divergence under DuckStation.
SpuSetCommonMasterVolume is not honored by DuckStation’s HLE
audio path. The pause menu’s mute toggle had to be reimplemented
as a direct write to the SPU master-volume registers. This was
isolated during the v0.3.6-ps1 audio bring-up, alongside a
batch of VAG-encoder bugs (scripts/wav2vag.py): an inverted
shift exponent, swapped ADPCM nibble-pair order, missing 64-byte
SPU DMA alignment, and a wrong ADSR1 attack-rate orientation.
Audio behavior on real hardware is presumed correct but
unverified – one of the open items. See commit 355227fa for the
full bug list.
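Three of those encoder bugs concern byte layout, which a short sketch can pin down. It assumes the commonly documented SPU ADPCM block format (16 bytes per 28 samples; header byte with the shift in the low nibble and the filter in bits 4-6; a flags byte; then 14 data bytes with the first sample of each pair in the low nibble) and 64-byte DMA transfer units. It is not taken from scripts/wav2vag.py, and both function names are hypothetical.

```python
def pack_adpcm_block(nibbles, shift, filt, flags=0):
    """Pack 28 signed 4-bit samples (-8..7) into one 16-byte block."""
    if len(nibbles) != 28:
        raise ValueError("a block holds exactly 28 samples")
    if not (0 <= shift <= 12 and 0 <= filt <= 4):
        raise ValueError("shift must be 0..12 and filter 0..4")
    out = bytearray([(filt << 4) | shift, flags])
    for i in range(0, 28, 2):
        first, second = nibbles[i] & 0xF, nibbles[i + 1] & 0xF
        out.append(first | (second << 4))  # first sample goes in the LOW nibble
    return bytes(out)


def pad_for_spu_dma(data: bytes, boundary: int = 64) -> bytes:
    """Zero-pad so the upload length is a whole number of DMA units."""
    return data + bytes(-len(data) % boundary)
```

In that layout the header's shift is a right-shift applied on decode, so a larger stored value means a quieter block. Getting that direction or the nibble order backwards still produces a structurally valid file, which is what makes this class of bug slow to find.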
TTY printf is the only real debug surface, and it has a price.
For most of 2026-Q1 the project ran with debugMode=0 and
“visual debugging” – colored pixels via LoadImage, the
five-panel telemetry overlay, gated JCPERF summaries during
scene transitions only. Per-frame vprintf was outright
destabilizing scene playback (unbounded format buffers, hot-path
text I/O changing timing). As of 2026-04-25, bounded vprintf
plus DuckStation TTY/file logging restored gated printf()
breadcrumbs for setup/teardown – the JCSPI, JCPAD, JCPERF
prefixes downstream tools key off. printf() still must not be
called per frame; that’s why ps1_perf is level-gated
(OFF/SUMMARY/DETAIL/DEBUG).
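The two ideas in that paragraph, level gating and bounded formatting, combine naturally. The sketch below shows the pattern in Python. Only the four level names come from the text; `GatedLog`, the `MAX_LINE` cap of 96 characters, and the sink callback are assumptions for illustration.

```python
from enum import IntEnum


class PerfLevel(IntEnum):
    OFF = 0
    SUMMARY = 1
    DETAIL = 2
    DEBUG = 3


MAX_LINE = 96  # hypothetical cap standing in for a bounded format buffer


class GatedLog:
    def __init__(self, level: PerfLevel, sink):
        self.level = level
        self.sink = sink  # any callable taking one string, e.g. print

    def emit(self, level: PerfLevel, prefix: str, fmt: str, *args) -> bool:
        """Write one truncated line if the configured level allows it."""
        if self.level == PerfLevel.OFF or level > self.level:
            return False
        line = f"{prefix} " + (fmt % args if args else fmt)
        self.sink(line[:MAX_LINE])
        return True
```

A caller at SUMMARY level would emit something like `log.emit(PerfLevel.SUMMARY, "JCPERF", "frames=%d", n)` at a scene transition, while DETAIL and DEBUG calls cost only a comparison.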
Other gotchas worth flagging in passing:
- The PSn00bSDK 0.24 toolchain runs in Docker on linux/amd64 (config/ps1/Dockerfile.ps1). Native macOS toolchains were attempted and abandoned – missing cc1/cc1plus, and source builds need Linux. Docker was the cheapest path that worked.
- The 4-bit indexed sprite format (indexedPixels) cut sprite RAM to roughly a quarter of the original 15-bit direct-color path, which is what let multi-sprite scenes fit in 2 MB at all.
- Hash-based O(1) resource lookup replaced the original O(N) strcmp scan during the 2026-03 perf push. Worth ~15-25% of compositing time.
- BSS budget was held under ~57 KB through development; malloc is used for transients rather than static arrays precisely because static arrays push BSS into the danger zone.
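The roughly 4x figure for indexed sprites follows from the pixel sizes: a direct-color pixel occupies 16 bits, while a 4-bit index packs two pixels per byte. The sketch assumes the usual PS1 nibble order (left pixel in the low nibble); the function names and the 64x64 sprite size are illustrative only.

```python
def pack_4bpp(indices) -> bytes:
    """Pack palette indices (0..15) two per byte, left pixel in the low nibble."""
    indices = list(indices)
    if len(indices) % 2:
        indices.append(0)  # pad odd-length rows with index 0
    out = bytearray()
    for i in range(0, len(indices), 2):
        lo, hi = indices[i], indices[i + 1]
        if not (0 <= lo < 16 and 0 <= hi < 16):
            raise ValueError("4-bit indices must be in 0..15")
        out.append(lo | (hi << 4))
    return bytes(out)


def sprite_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Storage for one sprite, with each row rounded up to a whole byte."""
    return (width * bits_per_pixel + 7) // 8 * height


if __name__ == "__main__":
    direct = sprite_bytes(64, 64, 16)
    indexed = sprite_bytes(64, 64, 4)
    print(direct, indexed, direct // indexed)
```

The shared 16-entry CLUT adds 32 bytes per palette, which is negligible next to the pixel data.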
Why hybrid won
The PS1 does not have to be smart. The host build does the smart
work – runs the real engine, captures the real frames, encodes the
diffs, lays out the disc – and the PS1 just plays back. That is
why 63 scenes can fit on a single CD-ROM at all, why the
executable is around 208 KiB at v0.9.3-ps1 after
the dead ADS/TTM/FG1 paths were stripped (down from a much larger
pre-strip ELF), and why scene continuity bugs stopped being a
runtime concern: the
runtime no longer carries the state that those bugs lived in.
The cost is that every scene needs a verified host capture before
it joins the validated count. At v0.9.3-ps1 that
count is 63 / 63 —
every routed scene the original game had now plays pixel-perfect
on the PS1 with synced SFX across every applicable variant. The
path from the first signed-off scene to all 63 was the same
repeatable loop on every row: capture, pack, route, replay, sign
off. The hard work was the loop’s edges — multi-view foreground
stitches for the wide scenes, residual-cleanup pack fixes when a
few pixels missed, the backdrop-key guard that kept
story-loop walks
from running across stale islands. That is the property
the project was reaching for.
The second bar, the
performance battle card, is its
own ledger. It moved from +17.4% over target / 87.1% target
speed at the compact full-matrix baseline to
99.8% target speed at
v0.9.3-ps1 — closed without changing pixels, sound
event timing, scene identity, or long-run heap stability. The
reference manual
explains what each column means; the
retrospective
walks through which experiments landed (FGP3 packs, scene-local
prefetch relief, stream-window retuning, padded residual packs,
scoped read groups) and which did not (-O2, naive read-group
probes); the
v0.8.1 follow-on
documents the soak loop that catches what the per-commit matrix
doesn’t. Visual signoff and headless perf stay separate ledgers
because their failure modes are uncorrelated; mixing them is how
regressions ship.
Related pages
- Development workflow — the author’s per-scene runbook (capture, encode, replay, screenshot, validate); this method page is the why, that page is the what to type.
- File formats — the five formats this pipeline produces and consumes (FG2 pack payload, pack manifest, dirty-region template, transition prefetch schema, SDL compat lite).
- Hardware — the PS1 envelope (33.8688 MHz MIPS, 2 MB RAM, 1 MB VRAM, 512 KB SPU, 2× CD) every constraint above traces back to.
- Glossary — the technical vocabulary used throughout (ADS, TTM, FG2 pack, capture, replay, dirty-rect, FntFlush, FISHING 1 bar).
- History — the longer narrative version, dated, eras-and-milestones.
- Status — the component-level state at the current release.
- Lab: the pivot that almost didn’t happen — magazine retrospective on the choice between “looks similar” and pixel-perfect-with-host-capture that defined the rest of this method. The decision behind every section above.
- Lab: the 63-scene grind — magazine treatment of applying this method to every routed scene, one capture-encode-replay-validate loop at a time. This page is the recipe; that essay is what running it 63 times actually looked like, including the last-cluster hard cases.