Nov 16, 2023

Building a point-and-click adventure game using GPT, part II

The next step in my generative gaming project: graphics.

DALL-E 3

(Want to jump to the finish? Here’s the newest game.)

As of the time of this writing, OpenAI has just opened broad access to its powerful DALL-E 3 image generation service.

(And dropped the cost of GPT-4!)

I'm Andrew McGill, a product builder who turns delightful ideas into real things.

I used to make stuff at The Atlantic and POLITICO. Now I build things with people like you.

Let’s go

This means that instead of manually asking ChatGPT for a cool pixel-art image, and getting them one at a time (like the title image for this blog post)...

...I can write a script to roll out dozens of images at once.

A three-by-three-grid of different pixel art scenes. — DALL-E 3 via API.

Growing up, I loved Maniac Mansion, a point-and-click adventure from LucasArts. Honestly, it was little too difficult for an eight-year-old. But I spent hours exploring that creaky old mansion — a perfect example of how games can transport you to a new world.

So! The time is right to make a point-and-click adventure game of my own.

How I built it

Unlike my last game, this game can't use a live OpenAI connection. For one, generating images is expensive — each image costs around 8 cents (though I could economize a bit probably). DALL-E 3 is also pretty slow.

That means I had to build a completely static game — something closed-ended that could be baked out ahead of time.

It’s not truly the generative game I’m looking for, where everything is possible and every experience is unique. But… this is an experiment! And it’s still cool!

So how do I structure it?

The point-and-click games of my youth had a few common mechanics:

Rooms. Whether you’re exploring a dungeon or the depths of space, you’re still moving from one setting to another.
Actions. Some rooms should have stuff for you to do or take.
Obstacles and goals. This can’t be a frictionless romp — you need a thing you’re pursuing, and some hardship along the way.

Of course, there are a million other things to consider — non-player characters, events, etc.

But with this in mind, I wrote a script using GPT-4 and DALL-E to do the following:

1. Generate a game premise based off a prompt.

My first prompt was simply “You’re in a haunted house.” GPT-4 spun this up:

In the dead of night, you find yourself standing before the crumbling gates of the notorious Ravenwood Manor, a once elegant estate now whispered to be haunted by the spirits of its former inhabitants. Your goal is to uncover the secrets of the manor and put the troubled spirits to rest.

Spooooky!

2. Build rooms.

I specified the number of rooms and asked GPT-4 to generate and connect them. That resulted in a network like this:

A flowchart showing the network of rooms in the game. — Don't get lost!

3. Add actions to the rooms.

I asked GPT-4 to throw in some verbs where it made sense — “examine bookcase,” “pull lever.”

Some actions required the player to do other things first — you can’t pull the secret lever until you search the bookcase, for instance.

Sometimes these actions open rooms the player couldn’t access before, like a hidden chamber.

4. Build the images.

I wrote a basic prompt defining the pixel-art style I wanted. Otherwise, I let DALL-E interpret the room descriptions as it wanted.

So this description:

The unkempt Garden is filled with the ghostly silhouettes of dead trees and tangled underbrush. Moonlight eerily illuminates a path leading to a decrepit gazebo. The Kitchen door lies to the north, and a moss-covered path leads west to the Study.

Became this:

A Gazebo matching the above description. — Gothic!

5. Use GPT-4’s new vision capabilities to identify the clickable portions of the room.

I submitted the DALL-E image along with the list of exits and actions, and asked GPT-4 to identify where everything was in the picture so I could drop interaction points

6. Drop the game data and images into a front-end.

I coded a simple front-end that reads the game data, renders the images, places click targets, etc.

So.... how’d it turn out?

Well… it turned out okay.

I got games!

A grand staircase in a decrepit foyer. — Spoooooky.

And they work — you can click around and try stuff and eventually win. Here’s a few to try:

But … they’re not completely fun. They're monotonous and confusing — you wander around and eventually win. (And GPT-4 Vision is really bad at placing the click targets.)

Making it better

I gotta confess: I got a little obsessed with this project 😬

It goes against the spirit of shipping quickly, but I really wanted these games to be more fun.

I found that they could be… if you’re willing to be a lot more specific in the prompt and do a bit of editing afterward.

Introducing “A Trip to the Fantasy Marketplace.”

A bustling pixel-art marketplace. — Get yer magic amulets here!

Here’s the prompt:

You are a young child in a fantasy adventure (King Arthur setting) who has been sent to the market to get three items:

A bunch of eggs

Some rope

A gift for your grandmother.

You cannot go home until you get these three things.

You might encounter a man selling magic beans. If you buy them, you lose the game.

You can steal the eggs, but if you steal the rope, you're caught and lose the game.

You can also buy things if you find coins or other goods to trade.

When you have all the things, you can return to the start of the game and complete a "Leave the market" action.

This actually feels like a real game. There are goals. There are obstacles. There’s a bit of narrative.

Now, this isn’t how GPT-4 actually generated the game. There was no magic bean merchant, and I didn’t get an option to steal the rope.

But there was a lovable urchin and a shopkeeper willing to sell you a locket. With some minor editing (clearing up some weird conditions, linking two quests together) I was able to turn this into a pretty fun little game.

Final thoughts

Looking back, I went into this project with two secret rules for myself:

I couldn’t edit the game files after GPT-4 made it;
I could only offer a simple narrative prompt, like “You’re on an abandoned space station.”

These are both kind of dumb!

If artificial intelligence gets 75% of the way there, but needs a bit of editing afterwards (”Hm, why is there a secret passage leading from the bathroom to the bedroom?”) – well, just edit it.

And of course AI is going to have a hard time pulling together a cohesive plot from just a one-sentence setting description.

It’s just another reminder that AI performs best with guidance from humans. (For now, at least.)

Let me know what you think!

Best of the blog

Read it all

Whatever you’re looking to build,
I’d love to chat. Drop me a line.