Published on: 2024-06-24

Making games play themselves with Rust Part 1: Screen Interaction

Introduction

Dungeons and Diagrams (henceforth referred to as DnD) is a minigame in Last Call BBS, the (unfortunately) final puzzle game released by Zachtronics before closing their studio to pursue other ventures. DnD is a logic game where the objective is to fill out an 8x8 grid with walls such that the remaining open cells form a single contiguous region, subject to a few other simple constraints. The game includes 63 curated puzzles of increasing difficulty for the player to solve, as well as an “infinite” mode that has 100 million randomly seeded puzzles of varying difficulty.

The Z5 Powerlance by Sawayama is a fictional retro PC. Once booted, the player can access various games, including Dungeons and Diagrams, shown here.

With some effort (the difficulty gets very high!) I manually solved the 63 curated puzzles and approximately 170 of the seeded puzzles. As an avid Factorio enjoyer, I couldn’t help imagining how fun it would be to automate this game and watch it play itself. I am writing this blog series to scratch that itch, as well as to provide something of a reference for anyone else who’s interested in game automation. There’s a lot to cover, so I will split it into a few separate posts, and I’ll try to keep it interesting with lots of images. This post will cover the setup required to programmatically view pixels from the game and perform mouse and keyboard input.

I am writing my solver in Rust, running on Ubuntu 24.04. If you are on Windows or Mac, worry not: I hope to keep my implementation as platform-agnostic as possible. I am assuming some familiarity with Rust here, but if you don’t know Rust I hope the explanations and visuals will be interesting enough on their own.

Objectives

Before we get into the weeds, I’d like to set out my objectives for both this and subsequent posts. In this post, I would like to produce a program which can achieve the following:

Find a window on the desktop given its title, in this case “Last Call BBS”.
Capture the framebuffer for that window, and peek into the pixel values.
Move the mouse to specific coordinates, and perform mouse input (left and right click).
Normalize these coordinates so everything is relative to the DnD sub-window.
(Bonus) Perform keyboard input. This isn’t needed for the solver, but it will help me with some testing and validation later.

For future posts, I will cover (at least) the following:

Parsing the game state
Solving the game with backtracking
Solving the game with FACTS and LOGIC
Generating my own puzzles (including much larger grid sizes)

The Rules

This game-within-a-game is reaching dangerous levels of niche, so it’d probably be best to kick things off with a brief introduction to the game’s rules and mechanics. The game provides the following tutorial (it is nicely animated in game, but I’ve snagged screenshots for your convenience):

The tutorial walks you through the logic required for a simple puzzle.

If the rules aren’t totally clear yet, I will elaborate on them in a future post where I implement the solver. For now just know that the player has to place the walls in the grid such that the above constraints are met. Each puzzle has a unique solution which can be derived through repeated application of the rules above. The game also encourages the player to use path markers (the red gems) to help keep track and assist in further deduction. Now, let’s get started!

Capturing the Screen

Before we get into parsing the game state and simulating the player, we need a way to capture the pixels of the game and send user input back in response. There are many Rust crates for these tasks, but I have chosen xcap for window queries and screen capture, and enigo (which required installing libxdo as well) for user input. Both libraries claim to be cross-platform, so they should work on Windows and Mac as well.

Now, if you looked closely at the image in the introduction, you may have noticed that it has a screen distortion effect applied, which would make it difficult to reliably detect the colors and positions of image features. It also has 2x pixel scaling, so there is a lot of redundant data in the image. Luckily, both of these can be fixed by changing the game’s graphics settings.

One of my favourite settings dialogs of all time.

Here I have changed to windowed mode with the smallest resolution, and disabled pixel stretching and the screen effect.

Screen minus the distortion effect.

The result is a game window with 1:1 pixel sizes and no distortion. The downside is that it makes it a tiny window on my monitor, requiring me to sit pretty close to the screen if I am playing it manually. Luckily, if all goes to plan, the game will soon be playing itself!

Now we can begin with a simple app to find a window by name and move the mouse to its center. For fun, let’s also send a click, use a keystroke (F1) to open and close the tutorial, and take a screenshot.

use std::{thread, time::Duration};

use enigo::{Button, Coordinate, Direction, Enigo, Key, Keyboard, Mouse, Settings};
use xcap::Window;

const GAME_TITLE: &str = "Last Call BBS";

fn main() {
    // Get a list of all the open windows
    let windows = Window::all().unwrap();

    // Search for our specific window 
    let window = windows
        .iter()
        .find(|win| win.title() == GAME_TITLE)
        .unwrap();

    // Initialize enigo input simulator
    let mut enigo = Enigo::new(&Settings::default()).unwrap();

    // Move the mouse
    enigo
        .move_mouse(
            window.x() + window.width() as i32 / 2,
            window.y() + window.height() as i32 / 2,
            Coordinate::Abs,
        )
        .unwrap();

    // Click
    enigo.button(Button::Left, Direction::Click).unwrap();

    // Open and close the tutorial (key() returns a Result, so handle it too)
    enigo.key(Key::F1, Direction::Click).unwrap();
    thread::sleep(Duration::from_millis(100));
    enigo.key(Key::F1, Direction::Click).unwrap();
    thread::sleep(Duration::from_millis(100));

    // Take a screenshot
    let img = window.capture_image().unwrap();
    img.save("game.png").unwrap();
}

This results in the image below: we placed a wall! I had to add the thread::sleep calls, otherwise the screenshot would be taken before the game had processed the input. The tutorial window also opens and closes as expected, but I didn’t capture it in the image.

Screen after placing a single wall tile.

This works as expected, with one small caveat: it seems to require the mouse to already be above the Last Call BBS window, or else the movement and click don’t register. I suspect this is a bug in enigo, but for now I’ll live with the limitation as it’s pretty minor.

Error Handling

Let’s remove those unsightly unwrap calls and put some proper error handling in place. Almost every action I perform here is fallible, so I’m going to wrap the failures in a custom InitError describing the things that could go wrong during program initialization. At the same time, I’m going to move my existing functionality into a new struct, DungeonCrawler, that will store some persistent data for later use. Lastly, it will be important to know where the Dungeons and Diagrams sub-window is located relative to the main game window, so I’ll add a function to locate that offset.

#[derive(Debug)]
enum InitError {
    BBSNotFound,                            // Game is not running
    SearchError(SearchError),               // DnD sub-window location error
    XCapError(xcap::XCapError),             // xcap error wrapper
    EnigoError(enigo::NewConError),         // enigo error wrapper
    ImageError(xcap::image::ImageError),    // image save error wrapper
    MouseCaptureError(enigo::InputError),   // initial click in new() failed
}

#[derive(Debug)]
enum SearchError {
    NotFound,                 // DnD sub-window couldn't be located
    MultipleResults(usize),   // Multiple matches were found for the pattern
    OutOfBounds,              // Window found, but it is partially out of bounds
}

This may not be the perfect way to do error handling in Rust. I am no expert, but I have found this to be a pretty concise and ergonomic way of working with them (in tandem with the ? operator and map_err). All errors will eventually bubble up to main() where the program will print the error and exit gracefully. Now on to the DungeonCrawler implementation.
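One common alternative to repeated map_err calls is to implement From for each wrapped error type and let ? do the conversion. Here is a minimal std-only sketch. The cut-down InitError and the parse_offset helper are hypothetical, and std’s ParseIntError stands in for a library error such as xcap::XCapError. For the real enum, you would write impl From<xcap::XCapError> for InitError, and so on for each variant:

```rust
use std::fmt;
use std::num::ParseIntError;

// Hypothetical, cut-down InitError. ParseIntError stands in for a library
// error such as xcap::XCapError.
#[derive(Debug)]
enum InitError {
    BadNumber(ParseIntError),
}

// With this impl, `?` converts a ParseIntError into an InitError automatically.
impl From<ParseIntError> for InitError {
    fn from(e: ParseIntError) -> Self {
        InitError::BadNumber(e)
    }
}

// Display gives main() a readable message to print instead of Debug output.
impl fmt::Display for InitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InitError::BadNumber(e) => write!(f, "could not parse number: {e}"),
        }
    }
}

impl std::error::Error for InitError {}

// Hypothetical helper: no map_err needed, because `?` calls From::from for us.
fn parse_offset(s: &str) -> Result<u32, InitError> {
    let value: u32 = s.trim().parse()?;
    Ok(value)
}
```

The trade-off is one small impl per variant in exchange for less noise at every call site.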

Parsing Preparation

This is pretty similar to the test application we saw earlier. The notable additions are the call to find_dnd_offset, which looks for the DnD window, and a call to img.view() to save a cropped screenshot containing only the DnD window (GAME_CROP, not shown here, holds the width and height of that crop). Saving the images to disk isn’t strictly necessary, but it may be useful for debugging in the future, so it doesn’t hurt. At the end of initialization, I also call click(0, 0), which forces the window to capture the mouse and works around the caveat I mentioned earlier.

struct DungeonCrawler {
    enigo: Enigo,
    window: Window,
    offset: (u32, u32),
}

impl DungeonCrawler {
    fn new() -> Result<Self, InitError> {
        // Find 'Last Call BBS' window (taking ownership avoids the clone)
        let window = Window::all()
            .map_err(|e| InitError::XCapError(e))?
            .into_iter()
            .find(|win| win.title() == GAME_TITLE)
            .ok_or(InitError::BBSNotFound)?;

        // Capture the screen
        let img = window
            .capture_image()
            .map_err(|e| InitError::XCapError(e))?;
        img.save("game.png").map_err(|e| InitError::ImageError(e))?;

        // Locate DnD subwindow and save a cropped copy of it.
        // view() comes from the GenericImageView trait (use xcap::image::GenericImageView);
        // GAME_CROP is the (width, height) of the DnD area, defined elsewhere.
        let dnd_offset = find_dnd_offset(&img).map_err(|e| InitError::SearchError(e))?;
        img.view(dnd_offset.0, dnd_offset.1, GAME_CROP.0, GAME_CROP.1)
            .to_image()
            .save("dnd.png")
            .map_err(|e| InitError::ImageError(e))?;

        let mut dc = Self {
            enigo: Enigo::new(&Settings::default()).map_err(|e| InitError::EnigoError(e))?,
            offset: (
                window.x() as u32 + dnd_offset.0,
                window.y() as u32 + dnd_offset.1,
            ),
            window,
        };

        // Force a click to capture the mouse in the application
        dc.click(0, 0)
            .map_err(|e| InitError::MouseCaptureError(e))?;

        Ok(dc)
    }
}
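For anyone curious what the crop saved to dnd.png amounts to, here is a minimal std-only sketch of copying a rectangle out of a tightly packed, row-major RGBA buffer (the layout xcap’s RgbaImage uses). This is not how the image crate implements view() and to_image(); crop_rgba is a hypothetical helper for illustration:

```rust
// Hypothetical helper: copy a w x h rectangle starting at (x, y) out of a
// tightly packed RGBA buffer that is `width` x `height` pixels.
// Returns None if the buffer size is wrong or the rectangle doesn't fit.
fn crop_rgba(buf: &[u8], width: u32, height: u32, x: u32, y: u32, w: u32, h: u32) -> Option<Vec<u8>> {
    // Reject buffers that aren't exactly width * height RGBA pixels
    if buf.len() != width as usize * height as usize * 4 {
        return None;
    }
    // Reject rectangles that extend past the right or bottom edge
    if x.checked_add(w)? > width || y.checked_add(h)? > height {
        return None;
    }
    let mut out = Vec::with_capacity(w as usize * h as usize * 4);
    for row in y..y + h {
        // Byte index of the rectangle's left edge on this row
        let start = (row as usize * width as usize + x as usize) * 4;
        out.extend_from_slice(&buf[start..start + w as usize * 4]);
    }
    Some(out)
}
```

Each row of the crop is one contiguous slice of the source buffer, so the copy is just one extend_from_slice per row.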

To find the location of the DnD window, I spent a bit of time looking for a small pattern of pixels that is unique across the entire rest of the screen. I wanted the search to be fast, so I kept the pattern short: 3 pixels, or 12 bytes. The pattern is located in the top-left corner of the X button that closes the DnD window.

Enhance!

In code, I also have to include the alpha channel bytes, hence the extra 255s. The exact pattern can be seen in the snippet below:

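// Needs a nightly toolchain: array_windows requires #![feature(array_windows)]
// at the crate root. RgbaImage is xcap::image::RgbaImage, returned by capture_image().
fn find_dnd_offset(image: &RgbaImage) -> Result<(u32, u32), SearchError> {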
    // Pattern of image bytes to uniquely locate the DnD subwindow. The chosen pattern
    // exists at 299,0 relative to the top left corner of the subwindow.
    const PATTERN_LEN: usize = 12;
    const PATTERN: [u8; PATTERN_LEN] = [69, 52, 56, 255, 237, 169, 135, 255, 181, 147, 131, 255];
    const PATTERN_OFFSET: (u32, u32) = (299, 0);

    // Iterate over sliding window of 12 bytes, considering only every 4th window (pixel alignment)
    let matches = image
        .array_windows::<PATTERN_LEN>()
        .step_by(4)
        .enumerate()
        .filter_map(|(i, &chunk)| {
            if chunk == PATTERN {
                // Given the pixel index (windows are counted after step_by), calculate
                // x and y offsets. Unsigned wrapping subtraction here simplifies the bounds check later
                Some((
                    (i as u32 % image.width()).wrapping_sub(PATTERN_OFFSET.0),
                    (i as u32 / image.width()).wrapping_sub(PATTERN_OFFSET.1),
                ))
            } else {
                None
            }
        })
        .collect::<Vec<(u32, u32)>>();

    use SearchError::*;
    match matches.len() {
        0 => Err(NotFound),
        1 => {
            let (x, y) = matches[0];
            if x > 625 || y > 80 {
                Err(OutOfBounds)
            } else {
                Ok(matches[0])
            }
        }
        n => Err(MultipleResults(n)),
    }
}

This code works by iterating over the image as a sequence of bytes. array_windows::<PATTERN_LEN> (currently a nightly-only slice method) gives me an iterator over a sliding window of 12 bytes, and step_by(4) keeps only every 4th window, aligning them with pixel boundaries. I enumerate these windows, so each index is a pixel index from which I can compute the X/Y coordinates, and keep only the windows whose 12 bytes match my pattern. The result is collected into a simple list of (x, y) tuples.
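Since array_windows requires nightly Rust, here is a minimal sketch of the same search on stable, using slice::windows instead, plus a helper that builds the 12-byte RGBA pattern from the three RGB colors. It works on a raw RGBA byte slice rather than an xcap image, and both function names are hypothetical:

```rust
// Expand RGB triples into RGBA bytes with an opaque alpha (255),
// matching the RGBA layout of the captured image.
fn rgba_pattern(pixels: &[[u8; 3]]) -> Vec<u8> {
    pixels.iter().flat_map(|&[r, g, b]| [r, g, b, 255]).collect()
}

// Stable-Rust version of the sliding-window search: slice::windows instead of
// the nightly-only array_windows. Returns the (x, y) pixel coordinate of every match.
fn find_pattern(buf: &[u8], width: u32, pattern: &[u8]) -> Vec<(u32, u32)> {
    buf.windows(pattern.len())
        .step_by(4) // one window per pixel boundary
        .enumerate() // after step_by, the index counts pixels, not bytes
        .filter(|(_, window)| *window == pattern)
        .map(|(i, _)| (i as u32 % width, i as u32 / width))
        .collect()
}
```

Like the original, this doesn’t prevent a match from straddling the end of one row and the start of the next; adding a check that i % width <= width - 3 inside the filter would rule that out for a 3-pixel pattern.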

As I was experimenting with patterns, it was not uncommon to find multiple matches, so I explicitly handle that case and consider it to be an error. Additionally, it is possible to locate the window offset but calculate that it must not be fully visible on screen, so I also handle OutOfBounds as another failure case.
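To make the wrapping_sub trick concrete: a match to the left of (or above) PATTERN_OFFSET would give a negative window origin, which as a u32 wraps around to a value near u32::MAX, so the single x > 625 || y > 80 comparison rejects it along with genuinely out-of-range positions. A small sketch comparing it with an equivalent checked_sub version; the 625/80 limits are copied from the post (I’m assuming they are the largest origin at which the DnD area still fits inside the game window), and MAX_OFFSET and both function names are hypothetical:

```rust
const PATTERN_OFFSET: (u32, u32) = (299, 0);
// Hypothetical name for the limits hard-coded in find_dnd_offset.
const MAX_OFFSET: (u32, u32) = (625, 80);

// Mirrors the original: wrapping_sub turns an underflow into a huge value,
// so one upper-bound comparison covers both "negative" and "too large".
fn window_origin(match_x: u32, match_y: u32) -> Option<(u32, u32)> {
    let x = match_x.wrapping_sub(PATTERN_OFFSET.0);
    let y = match_y.wrapping_sub(PATTERN_OFFSET.1);
    if x > MAX_OFFSET.0 || y > MAX_OFFSET.1 {
        None
    } else {
        Some((x, y))
    }
}

// Equivalent formulation with checked_sub, which makes the underflow explicit.
fn window_origin_checked(match_x: u32, match_y: u32) -> Option<(u32, u32)> {
    let x = match_x.checked_sub(PATTERN_OFFSET.0)?;
    let y = match_y.checked_sub(PATTERN_OFFSET.1)?;
    (x <= MAX_OFFSET.0 && y <= MAX_OFFSET.1).then_some((x, y))
}
```

Both versions give the same answers; checked_sub just states the intent more directly.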

Finally, I can rewrite main to create an instance of DungeonCrawler, which should compute the global offset from the top-left corner of my monitor to the top-left corner of the DnD window:

fn main() {
    let dc = match DungeonCrawler::new() {
        Ok(d) => d,
        Err(e) => {
            eprintln!("Error: {:#?}", e);
            std::process::exit(1)
        }
    };

    println!("offset: {:?}", dc.offset);
}

Sure enough, running this produces the following output:

offset: (284, 126)

We can verify this manually in the following image. I’ve painted a single red pixel in the top-left corner of the DnD window, with some helpful arrows because a single pixel may be hard to spot. It actually sits slightly outside the visible window because the top-left corner is beveled. I used GIMP to check the pixel coordinate, but feel free to count it yourself!

The window 'anchor' point.

From this point on, I can use this offset to normalize my coordinates to the DnD window. I don’t need to care about where the game client is located on my monitor or where the DnD window is located within the game client.
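In code, that normalization is just two additions: the window position plus the DnD offset gives the global offset, and the global offset plus a DnD-relative point gives the screen point. A minimal sketch with hypothetical helper names; it keeps everything in i32, which (unlike casting window.x() to u32) stays correct if the window sits at a negative coordinate, as can happen on some multi-monitor setups:

```rust
// Hypothetical: combine the game window's screen position with the DnD
// sub-window offset found by find_dnd_offset.
fn global_offset(window_pos: (i32, i32), dnd_offset: (u32, u32)) -> (i32, i32) {
    (window_pos.0 + dnd_offset.0 as i32, window_pos.1 + dnd_offset.1 as i32)
}

// Hypothetical: convert a point relative to the DnD window into absolute
// screen coordinates, ready for enigo's move_mouse with Coordinate::Abs.
fn to_screen(offset: (i32, i32), local: (u32, u32)) -> (i32, i32) {
    (offset.0 + local.0 as i32, offset.1 + local.1 as i32)
}
```

With the offset of (284, 126) from above, the DnD-relative point (66, 191) maps to screen position (350, 317).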

2 Fast 2 Furious

With the DnD window offset calculated, I’ll add a helper function, click(), that adds the offset to any coordinates I pass in, so I can plan out where clicks are sent more easily. I’ve also defined a constant, CLICK_DELAY, to let a bit of time pass between inputs. Sending input to a game too quickly often results in dropped events or stuttering as the game tries to keep up. I’ll test a few values for this delay and try to find something that is as fast as possible without dropping any events. Here’s click():

// time (ms) to wait after a click before sending the next one
const CLICK_DELAY: u64 = 15;

impl DungeonCrawler {
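    // new() is the constructor shown above, omitted here for brevity

    // InputResult is enigo::InputResult, i.e. Result<T, enigo::InputError>
    fn click(&mut self, x: u32, y: u32) -> InputResult<()> {
        // Translate DnD-relative coordinates into absolute screen coordinates
        let cx = (x + self.offset.0) as i32;
        let cy = (y + self.offset.1) as i32;
        self.enigo.move_mouse(cx, cy, Coordinate::Abs)?;
        // Split the delay between "after move" and "after click"; dividing the
        // Duration (not the u64) keeps the half millisecond of an odd delay
        thread::sleep(Duration::from_millis(CLICK_DELAY) / 2);
        self.enigo.button(Button::Left, Direction::Click)?;
        thread::sleep(Duration::from_millis(CLICK_DELAY) / 2);
        Ok(())
    }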
}

You may notice I am using CLICK_DELAY to sleep twice per click. Sending a click instantaneously after moving the mouse seems prone to failure, so I’ll pre-emptively avoid any funny business there. Somewhat arbitrarily, I’ve chosen a delay of 15ms, which over 64 squares comes to 960ms, pretty close to exactly 1 second for a single pass.
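A quick check of that timing arithmetic with std::time::Duration. One subtlety worth knowing when splitting a delay in two: halving a u64 millisecond count truncates (15 / 2 == 7, so two sleeps add up to 14ms), while dividing a Duration keeps the half millisecond. The helper names are hypothetical:

```rust
use std::time::Duration;

const CLICK_DELAY: u64 = 15;

// Halving the integer first truncates: 15 / 2 == 7 ms per sleep.
fn half_delay_truncated() -> Duration {
    Duration::from_millis(CLICK_DELAY / 2)
}

// Halving the Duration keeps sub-millisecond precision: 7.5 ms per sleep.
fn half_delay_exact() -> Duration {
    Duration::from_millis(CLICK_DELAY) / 2
}

// Total time for one pass of `cells` clicks at `per_click` each.
fn pass_time(cells: u32, per_click: Duration) -> Duration {
    per_click * cells
}
```

Over a 64-cell pass, the truncated version comes to 896ms instead of 960ms.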

To put this function through its paces, I’ll write a simple loop that tries to place a wall in every cell of the board, followed by a second pass to clear them. The loop runs eight passes in total, filling and clearing the board four times, while I watch to see if any clicks are missed.

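// added to main() (dc must now be declared with let mut, since click() takes &mut self)

    // 8 passes: each click toggles a cell, so every second pass clears the board again
    for _ in 0..8 {
        for i in 0..8 {
            for j in 0..8 {
                // Cells are 33px apart; (66, 191) is the click point of the
                // first cell, relative to the DnD window
                let cx = i * 33 + 66;
                let cy = j * 33 + 191;
                dc.click(cx, cy).unwrap();
            }
        }
    }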

And now for the results. Drumroll please!

I guess you didn't drumroll hard enough.

Unfortunately, we can see after each pass of clicks there are several walls/holes that are missed. Luckily, backing it off to 20ms per click makes it much more reliable. I still see a click dropped occasionally, even with higher delays like 50-100ms, but I will ignore that for now until I start solving puzzles at scale. In a real puzzle solution, usually less than half the cells will be walls and need clicking, so I should still be able to input solutions in under a second.
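To sanity-check the under-a-second estimate: at 20ms per click, entry time is just the wall count times the delay, so anything up to 49 walls fits under a second (32 walls take 640ms), while a full 64-cell pass takes 1280ms. A minimal sketch, assuming a hypothetical solution grid of booleans (true = wall), an empty starting board, and one click per wall:

```rust
use std::time::Duration;

// Hypothetical solution representation: true = wall, false = open cell.
fn walls_to_place(solution: &[[bool; 8]; 8]) -> u32 {
    solution.iter().flatten().filter(|&&wall| wall).count() as u32
}

// Time to enter a solution, assuming one click per wall on an empty board.
fn entry_time(solution: &[[bool; 8]; 8], per_click: Duration) -> Duration {
    per_click * walls_to_place(solution)
}
```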

This brings us to the end of the post. I’ll continue next time with some text parsing and sprite detection. In the meantime, please enjoy this video of the autoclicker successfully iterating over the cells in random order.

Speedclicking, properly.
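The post doesn’t show how the random cell order in that video was produced. One way to do it is a Fisher-Yates shuffle of the 64 (column, row) cells. The sketch below is std-only, so it uses a tiny xorshift generator in place of a crate like rand; XorShift64, shuffled_cells and cell_point are all hypothetical names, and cell_point reuses the 33px spacing and (66, 191) origin from the test loop above:

```rust
// Tiny xorshift64 PRNG so the sketch needs no external crates.
// Fine for shuffling test input, not for anything security-related.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// Fisher-Yates shuffle of all 64 (column, row) cells.
fn shuffled_cells(seed: u64) -> Vec<(u32, u32)> {
    let mut cells: Vec<(u32, u32)> = (0..8).flat_map(|i| (0..8).map(move |j| (i, j))).collect();
    let mut rng = XorShift64(seed.max(1)); // xorshift must not start at 0
    for i in (1..cells.len()).rev() {
        // Pick j uniformly-ish from 0..=i and swap it into position i
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        cells.swap(i, j);
    }
    cells
}

// Hypothetical: click point of a cell relative to the DnD window.
fn cell_point((i, j): (u32, u32)) -> (u32, u32) {
    (i * 33 + 66, j * 33 + 191)
}
```

Each resulting point would then be passed to dc.click(x, y) in the shuffled order.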