Published on: 2024-07-26

Making games play themselves with Rust Part 2.5: Proof by Exhaustive Search

Introduction

This post represents a rabbit hole I went down while putting together the previous post. If you haven’t seen it, I would recommend reading that first to add context to what I am doing here. This post documents the comical amount of effort I went through to verify a hypothesis I had about how to easily discriminate between monster and treasure sprites. It covers some of what didn’t work as well as what did, so feel free to skip around to what interests you most.

What Almost Was

The goal was simple - given a 32x32 pixel tile, determine whether it contains a treasure sprite, a monster sprite, or empty. I immediately thought I could solve this by picking a pixel somewhere in the middle (that’s guaranteed to be covered by any monster or treasure sprite that could possibly be there) and checking if it was the color of the treasure sprite. If not, check it against the background color and if it didn’t match then it had to be a monster. I implemented this by picking a random pixel near the middle of the tile and it worked flawlessly parsing a couple hundred puzzles. I wrote a paragraph about it in the other blog post and moved on to the next section. I went for a bike ride, made dinner, then watched Jeopardy. It wasn’t until I was lying in bed later that evening that another thought came to mind.

How can I be sure that the pixel I selected will never have any false positives?

The treasure detection code is very simple…

match pixel {
    background_color => Tile::Empty,
    treasure_color => Tile::Treasure,
    _ => Tile::Monster,
}

… but it makes a very crucial assumption that none of the monsters have the same color as the treasure chest for the pixel that I am sampling. There are 18 different monsters, each with multiple animated frames. It’s possible in all the puzzles I parsed, there was one frame of one animation that I didn’t encounter which would be the failure case. As unlikely as this would be, the only way to be sure was with an exhaustive search. In this moment I had nerd-sniped myself and nothing less than perfection would be adequate.

What Didn’t Work

I figured the easiest way to build some confidence was to just parse a LOT more puzzles. I could have made some estimates about the number of animation frames in each animation and combined that with the rate at which I scrolled through puzzles to come up with an idea of how long I should search to give a statistically high degree of confidence that my search was complete. Instead, I captured screenshots of 25000 puzzles (surely that must be enough) and used a mask to separate the sprites from the background.

From left to right: The plate, mask, and the puzzle used to make the mask.

After cropping each tile to a 32x32 region, I hashed each image and removed all the duplicates. This left me with 66 sprites, of which 15 were partial sprites that were actually parts of larger sprites that overflowed into neighboring tiles.

The full set of sprites (some are animated so it looks like there are duplicates but there are not), including the treasure chest, and their respective masks. I've crossed out a handful of masks that show partial results from large monsters that overflow into neighboring tiles.

If you got out your abacus and counted how many monsters there are, you’ll have noticed I came up shy of the 18 I mentioned previously. This is because quickly scrolling through puzzles only works for randomly generated puzzles and some monsters are exclusive to the curated puzzles. I essentially just ignored this problem at the time assuming that if it worked I could find a way to capture the other sprites later.

The next part of the solution involved overlaying all the masks together and computing the intersection (which pixels are white in EVERY mask). I could overlay this “supermask” on the treasure sprite to identify pixels which were candidates for sampling in the parsing code.

From left to right: 'Supermask', treasure sprite, treasure-colored super mask.

This result was kinda cool, but I wasn’t satisfied yet. There were a couple problems with this solution:

I sampled 25000 puzzles to find all the sprites. While this is probably “good enough”, it’s not an exhaustive search and there is still a change I missed a frame somewhere
Like I mentioned above, this doesn’t include monsters exclusive to the curated puzzles
(minor) Many of the sprites have transparent shadows which should be discarded from the masks.

So while this was a fun attempt, I’ll have to get more serious if I want the real solution.

Datamining

If you’ve ever dug around in the files shipped with a game, you’ll find one of three things.

The game assets are stored in normal file formats that are easy to read.
The game assets are wrapped in a proprietary format which often will encode metadata useful only in the context of the game.
The game assets are wrapped in proprietary formats and intentionally obfuscated specifically to prevent people like me from doing things like this.

Fortunately Last Call BBS isn’t in category #3, but unforunately it is still #2, with proprietary encoding for the textures. I searched around online to see if anyone had figured out the format and luckily stumblede upon a reddit post describing everything I need.

If you have the game installed through Steam, the assets for Dungeons and Diagrams are stored in Steam/steamapps/common/Last Call BBS/Content/Packed/textures/tokyo. The .tex files contain the width and height of the sprite in pixels, and the image data which is stored in LZ4-compressed RGBA format. There is also some other miscellaneous data that is not needed.

Offset	Size (Bytes)	Description
8	4	Width
12	4	Height
72	4	Payload size
76	The rest	Image data

I used the byteorder crate which is useful for reading and writing binary data. It makes it easy to for example read 8 bytes into a u64 with control over the endianness. I also used the lz4_flex crate to decompress the image data. I wrote a function which takes a path to a .tex file and another path for where to write the .png file.


use std::{
    fs::File,
    io::{Cursor, Read, Seek, SeekFrom},
    path::{Path, PathBuf},
};

use byteorder::{LittleEndian, ReadBytesExt};
use xcap::image;

fn convert_tex(src_path: &Path, dest_path: PathBuf) {
    let path = PathBuf::from(src_path);

    // Storage for the file and compressed image data
    let mut buffer = Vec::new();
    let mut compressed = Vec::new();

    File::open(path).unwrap().read_to_end(&mut buffer).unwrap();

    // Cursor is used for file navigation, skip first 8 bytes
    let mut rdr = Cursor::new(buffer);
    rdr.seek(SeekFrom::Current(8)).unwrap();

    // Read width and height, then skip more metadata
    let width = rdr.read_u32::<LittleEndian>().unwrap();
    let height = rdr.read_u32::<LittleEndian>().unwrap();
    rdr.seek(SeekFrom::Current(56)).unwrap();

    // Read payload size, then read the payload
    let payload_size = rdr.read_u32::<LittleEndian>().unwrap();
    compressed.resize(payload_size as usize, 0);
    rdr.read_exact(&mut compressed).unwrap();

    // Decompress the texture, flip it vertically, and save it
    let texture = lz4_flex::decompress(&compressed, 1000000).unwrap();
    let img = image::RgbaImage::from_raw(width, height, texture).unwrap();
    let img = image::imageops::flip_vertical(&img);

    img.save(dest_path).unwrap();
}

I also encountered some .array.tex files which store animated sprites. The format is pretty similar to .tex files but with an extra field describing the number of frames present in the file. I ran this function recursively through the entire Last Call BBS assets folder to produce a copy of the directory structure but with all the .tex’s replaced with .png files. Check out the spoils!

Tile plates, for both the unsolved (left) and solve (right) states

All the sprites, including all animation frames. I composed them into a single video here so I didn't have to upload 18 separate gifs!

Walls (in both valid/invalid state), treasure, path marker, and number sprites.

In total, there are 66 unique monster sprites spread across 18 monsters. Because things can never be easy, all of the game assets are cropped and so don’t align nicely with the 33x33 pixel grid that the game board uses. I had to perform image registration so when I combine all the sprite masks they overlap correctly.

I first attempted to do the alignment using the imagemagick suite of tools, particularly composite for image overlay and compare for quantifying image similary. I needed some references images to compare against, so I manually captured one per monster. Then by sliding the sprite across the board image and finding the offset with the best similary score I could compute the correct offset for each monster with a simple bash script. A gif is worth a thousand words here.

A small part of the alignment process, with the alignment highlighted.

#!/usr/bin/bash
cwd=$(pwd)
target="dnd.png"
best=1000

# Recall the "tokyo" directory from earlier that houses all the sprites.
# Loop through each monster
for monster in $(ls tokyo)
do
    echo "checking $monster"
    for frame in $(ls $cwd/tokyo/$monster)
    do
        echo "  frame: $frame"
    done

    # Loop an offset value for the sprite. I've chosen bounds that roughly align with the 
    # game board to avoid searching in areas where the sprite couldn't possibly be.
    for y in $(seq 140 420)
    do
        for x in $(seq 25 300)
        do
            # Use 'composite' to overlay the sprite on the main board screenshot
            magick composite $cwd/tokyo/$monster/0.png -geometry +$x+$y $target tmp/over.png
            # Compute a match score using 'mean squared error' metric. It outputs to stderr
            # for some reason hence the extra pipe wrangling. 'cut' pulls out the relevant
            # data from the output for doing numerical comparison
            score=$(magick compare dnd.png tmp/over.png -metric MSE tmp/dif.png 2>&1 > /dev/null | cut -f 1 -d '.')
            if [[ $score -lt $best ]]
            then
                best=$score
                echo "best: $x $y $score"
                cp tmp/dif.png tmp/best.png
            fi
        done
    done
done

Here’s the output of the script after about 30 seconds. The 3 numbers are the x coordinate, y coordinate, and the match score (lower is better). You can see the match score decreasing over time.

checking bear
best: 25 140 26
best: 25 141 25
best: 25 143 24
best: 25 144 23
best: 25 145 22
best: 25 146 21
best: 25 148 20
best: 25 152 19
best: 25 155 18
best: 25 170 17
best: 25 346 16
best: 26 350 15

It appeared to be working, but it was painfully slow. It took around 30 seconds to scan a single column of pixels out of ~275. This meant that in the worst case, it would take upwards of two hours to find a match. Even worse, this is only a single frame out of the 66 total monster frames, and I would have to check each frame against each reference puzzle, which comes out to 1188 searches. At up to two hours each it would have a worst case completion of approximately 100 days (feel free to laugh). Bash was a mistake. I had to go back to Rust.

No hate to imagemagick here - I love their tools and used them to make several of the animations in both this post and the last. If you’re wondering why it was so slow, compare creates a new image and writes it to disk (not optional) for every comparison. It is designed for creating “diff” images.

For the Rust implementation, I started with a simple function to compute the squared error between two pixels, ignoring any pixels that are not fully opaque. The pixel from the reference image is called the haystack and the pixel from the extracted sprite is needle.

use std::{ffi::{OsStr, OsString}, fs, path::Path};
use xcap::image::{self, DynamicImage, GenericImageView};

// Compute the squared error between two 4-byte pixels. Ignore
// pixels that are not fully opaque.
fn compare_pixels(haystack: [u8; 4], needle: [u8; 4]) -> u64 {
    if needle[3] < 255 {
        0
    } else {
        let mut sum = 0u64;
        for i in 0..3 {
            let e = haystack[i].abs_diff(needle[i]) as u64;
            sum += e * e;
        }
        sum
    }
}

Next, a function that handled sliding our needle across the haystack. I computed bounds for the sliding which prevent the needle from being partially outside the bounds of the haystack.

// Compute sum of square error between two images at every offset, return position of lowest score
fn compare_images(haystack: &DynamicImage, needle: &DynamicImage) -> (u64, (u32, u32)) {
    let nw = needle.width();
    let nh = needle.height();

    // Keep track of best score and the position it was achieved
    let mut best_score: u64 = u64::MAX;
    let mut best_pos: (u32, u32) = (0, 0);
    // Slide the needle image over the haystack
    for x in 0..haystack.width() - nw {
        for y in 0..haystack.height() - nh {
            // Create a sub-view of the haystack
            let view = haystack.view(x, y, nw, nh);

            // Copmute sum of squared errors
            let sse = view
                .pixels()
                .map(|(_, _, p)| p)
                .zip(needle.pixels().map(|(_, _, p)| p))
                .fold(0, |acc, (p1, p2)| acc + compare_pixels(p1.0, p2.0));

            // Check if we found a better match
            if sse < best_score {
                // Perfect match, return early
                if sse == 0 {
                    return (sse, (x, y));
                }
                best_score = sse;
                best_pos = (x, y);
            }
        }
    }
    return (best_score, best_pos);
}

And lastly, a high-level function that iterated over the directories and printed out some results.

pub fn find_monster_offsets() {
    // Iterate over each monster image folder
    for dir in fs::read_dir("tokyo").unwrap() {
        let dir = dir.unwrap();
        let monster = dir.file_name();

        print!("{: <12}", monster.to_string_lossy());

        // Open reference image
        let board_path = Path::new("monster_refs").join(Path::new(&monster).with_extension("png"));
        if let Ok(board) = image::open(board_path) {
            let mut best_score = u64::MAX;
            let mut best_pos = (0, 0);
            let mut best_frame = OsString::new();

            // Iterate over each frame to find best match
            for frame in fs::read_dir(dir.path()).unwrap() {
                let frame = frame.unwrap();
                let monster_img = image::open(frame.path()).unwrap();
                let (score, pos) = compare_images(&board, &monster_img);
                if score < best_score {
                    best_score = score;
                    best_pos = pos;
                    best_frame = frame.file_name();
                }
            }
            println!(
                "{: <6} {best_pos:?} {best_score}",
                best_frame.to_string_lossy()
            );
        } else {
            println!("--  skipped  --");
            continue;
        }
    }
}

Running find_monster_offsets() immediately proved to be several orders of magnitude faster than the bash script (unsurprisingly). It took 45 seconds to complete, and if I needed to run it more than once I could probably optimize it substantially. For each monster, it told me which frame of the animation was found in the screenshot, and the offset from the top left corner of the board to the monster. The score is the sum of squared errors mentioned earlier, which was 0 (perfect match) in all cases.

monster   frame  position    score
ogre      1.png  (211, 267)  0
slime     0.png  ( 51, 378)  0
golem     0.png  ( 81, 399)  0
insectoid 0.png  ( 49, 364)  0
lich      3.png  (145, 304)  0
lookseer  0.png  (283, 339)  0
king      1.png  ( 78, 298)  0
bear      0.png  ( 44, 270)  0
goblin    0.png  ( 48, 193)  0
minotaur  0.png  ( 46, 395)  0
demon     0.png  (211, 257)  0
goat      0.png  ( 48, 234)  0
skeleton  0.png  (279, 172)  0
kobold    0.png  ( 45, 291)  0
cultist   0.png  ( 53, 366)  0
squid     0.png  (246, 402)  0
imp       0.png  ( 50, 202)  0
chest     0.png  (284, 309)  0

I manually verified a couple of these as a sanity check, and they seemed to be correct. To visualize what these numbers even mean, consider the demon sprite. It was found with an offset of (211,257) which is the distance from the top left of the image to the top left of the blue rectangle.

The blue rectangle represents the bounding box of the full sprite. It touches a total of 6 tiles.

If you consider which tile that monster is “on”, it’s the 4th row and 6th column. Annoyingly, none of the corners of the sprite were within that tile, so I used the center of the sprite to identify the tile it was actually on. Based on that, the offset from the top left corner of the monster’s tile to the top left corner of the sprite’s bounding box was (-3, -17). Small sprites that fit within a single tile would have positive offsets.

Computing the corrected offset was pretty straightforward:

const TILES_OFFSET: (u32, u32) = (49, 175);
const TILE_STRIDE: u32 = 33;
fn correct_offset(size: (u32, u32), offset: (u32, u32)) -> (i32, i32) {
    let center_x = size.0 / 2;
    let center_y = size.1 / 2;

    // Normalize coordinates so first tile is at (0, 0), and use the
    // center of the sprite rather than its corner to keep it positive.
    let offset_x = offset.0 + center_x - TILES_OFFSET.0;
    let offset_y = offset.1 + center_y - TILES_OFFSET.1;

    // Modulo shifts the value to something within the top-left tile.
    let offset_x = offset_x % TILE_STRIDE;
    let offset_y = offset_y % TILE_STRIDE;

    // Subtract the centerpoint to get the corrected offset
    (center_x as i32 - offset_x as i32, center_y as i32 - offset_y as i32)
}

The corrected output looked like this. The result for demon matches what I calculated by hand which was promising.

ogre      1.png  ( -3,  -7) 0
slime     0.png  (  2,   5) 0
golem     0.png  ( -1,  -7) 0
insectoid 0.png  (  0,  -9) 0
lich      3.png  ( -3,  -3) 0
lookseer  0.png  (  3,  -1) 0
king      1.png  ( -4,  -9) 0
bear      0.png  ( -5,  -4) 0
goblin    0.png  ( -1, -15) 0
minotaur  0.png  ( -3, -11) 0
demon     0.png  ( -3, -17) 0
goat      0.png  ( -1,  -7) 0
skeleton  0.png  ( -1,  -3) 0
kobold    0.png  ( -4, -16) 0
cultist   0.png  (  4,  -7) 0
squid     0.png  ( -1,  -4) 0
imp       0.png  (  1,  -6) 0
chest     0.png  (  4,   2) 0

Now I could finally combine all the masks and compute their intersection. I returned to imagemagicks convert tool to simultaneously resize all the sprites to a common size and offset the sprites so they are properly registered with one another. Massaging the previous output slightly so each line reads like cultist,+4-7 let me easily consume it with a bash script.

out_dir=script_output/resize

# Mask of current sprite
mask=$out_dir/mask.png

# Cumulative combined mask
supermask=$out_dir/supermask.png

mkdir -p $out_dir

# Start with a 32x32 white canvas
convert -size 32x32 canvas:white -alpha on $supermask

for line in $offsets
do
    # Parse monster name and x/y offsets
    monster=$(echo $line | cut -d ',' -f 1)
    offset_x=$(echo $line | cut -d ',' -f 2)
    offset_y=$(echo $line | cut -d ',' -f 3)
    echo "$monster"

    for frame in $(ls tokyo/$monster)
    do
        echo "  $frame"
        # Resize, apply offset, cull transparent pixels, save mask
        magick convert \
            -background none -extent 32x32+$offset_x+$offset_y \
            -channel alpha -threshold 99% +channel \
            -alpha extract -alpha on tokyo/$monster/$frame $mask

        # AND the mask with the supermask
        magick composite $mask $supermask -compose dst-in $supermask
    done
done

I had so much fun playing around with Imagemagick that I spent a few hours writing another script specifically to generate this gif of the mask creation.

Each successive mask subtracts a few pixels until we are left with a just a handful.

Taking the final mask and multiplying it with the treasure chest revealed a similar but still noticeably different result than the previous attempt:

From left to right: 'Supermask', treasure sprite, treasure-colored super mask.

At this point I was confident that I hadn’t missed anything so all that was left was to choose one of the remaining pixels arbitrarily as my monster discriminant. I chose the one at coordinate (16, 12), as an omage to the wonderful song 1612 by Vulfpeck, featuring Antuan Stanley. I’ve highlighted it in blue to make it more apparent.

Lastly, to put things in perspective, here’s an image of the board with the 64 monster sampling points highlighted. I used 2x2 highlight pixels so it’s easier to see, but when parsing I am only examining a single pixel per tile to determine its contents.

The 64 pixels we need

Conclusion

I hope you enjoyed this detour. I may do some more posts like this in the future for niche things I find interesting that I know probably don’t appeal to everyone. It was a lot of work for what boiled down to the computation of two small numbers. I could have just stuck with my gut from the start, but it lacked the satisfaction of rigorous proof. Stay tuned for part 3 of the series where I attempt to finally solve a puzzle!