(Want to jump to the finish? Here’s the newest game.)
As of the time of this writing, OpenAI has just opened broad access to its powerful DALL-E 3 image generation service.
(And dropped the cost of GPT-4!)
This means that instead of manually asking ChatGPT for a cool pixel-art image, and getting them one at a time (like the title image for this blog post)...
...I can write a script to roll out dozens of images at once.
Growing up, I loved Maniac Mansion, a point-and-click adventure from LucasArts. Honestly, it was a little too difficult for an eight-year-old. But I spent hours exploring that creaky old mansion, a perfect example of how games can transport you to a new world.
So! The time is right to make a point-and-click adventure game of my own.
Unlike my last game, this game can't use a live OpenAI connection. For one, generating images is expensive: each image costs around 8 cents (though I could probably economize a bit). DALL-E 3 is also pretty slow.
That means I had to build a completely static game — something closed-ended that could be baked out ahead of time.
It’s not truly the generative game I’m looking for, where everything is possible and every experience is unique. But… this is an experiment! And it’s still cool!
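To put the cost argument in rough numbers, here's a back-of-the-envelope sketch. The 8 cents comes from above; the room counts, retry counts, and player counts are made-up assumptions, not figures from my actual games.

```python
CENTS_PER_IMAGE = 8  # approximate per-image price mentioned above


def bake_cost_cents(rooms: int, attempts_per_room: int = 1) -> int:
    """Image cost, in cents, of baking one static game ahead of time."""
    return rooms * attempts_per_room * CENTS_PER_IMAGE


def live_cost_cents(players: int, rooms_visited: int) -> int:
    """Image cost, in cents, if every room visit generated a fresh image."""
    return players * rooms_visited * CENTS_PER_IMAGE


if __name__ == "__main__":
    # Hypothetical numbers: a 10-room game with two image attempts per room,
    # versus 1,000 players each visiting 10 rooms with live generation.
    print(bake_cost_cents(rooms=10, attempts_per_room=2))
    print(live_cost_cents(players=1000, rooms_visited=10))
```

The baked cost is paid once per game, while the live cost grows with every player, which is the whole reason for going static.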
So how do I structure it?
The point-and-click games of my youth had a few common mechanics:
Of course, there are a million other things to consider — non-player characters, events, etc.
But with this in mind, I wrote a script using GPT-4 and DALL-E to do the following:
My first prompt was simply “You’re in a haunted house.” GPT-4 spun this up:
In the dead of night, you find yourself standing before the crumbling gates of the notorious Ravenwood Manor, a once elegant estate now whispered to be haunted by the spirits of its former inhabitants. Your goal is to uncover the secrets of the manor and put the troubled spirits to rest.
I specified the number of rooms and asked GPT-4 to generate and connect them. That resulted in a network like this:
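As a rough illustration in code (the data layout and the Foyer room are my assumptions here, not the actual generated output), a network like that can be stored as a map of exits and sanity-checked before any images get generated:

```python
from collections import deque

# Hypothetical room network: each room maps an exit direction to a neighbor.
ROOMS = {
    "Foyer": {"north": "Study"},
    "Study": {"south": "Foyer", "east": "Garden"},
    "Garden": {"north": "Kitchen", "west": "Study"},
    "Kitchen": {"south": "Garden"},
}


def reachable(rooms, start):
    """Return the set of rooms reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in rooms.get(current, {}).values():
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def one_way_exits(rooms):
    """List (source, destination) pairs where the destination has no exit back."""
    return [
        (src, dst)
        for src, exits in rooms.items()
        for dst in exits.values()
        if src not in rooms.get(dst, {}).values()
    ]


if __name__ == "__main__":
    print(reachable(ROOMS, "Foyer") == set(ROOMS))  # every room reachable?
    print(one_way_exits(ROOMS))  # exits with no way back
```

A check like this catches orphaned rooms and one-way doors, which are easy for a language model to produce when it invents a map in one pass.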
I asked GPT-4 to throw in some verbs where it made sense — “examine bookcase,” “pull lever.”
Some actions required the player to do other things first — you can’t pull the secret lever until you search the bookcase, for instance.
Sometimes these actions open rooms the player couldn’t access before, like a hidden chamber.
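Here's a minimal sketch of how prerequisites and unlocks like these could be modeled. The class and field names are illustrative assumptions, not the actual format my script produced.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Action:
    name: str
    room: str
    requires: frozenset = frozenset()   # actions that must be done first
    unlocks_room: Optional[str] = None  # room opened by this action, if any


@dataclass
class GameState:
    done: set = field(default_factory=set)        # names of completed actions
    open_rooms: set = field(default_factory=set)  # rooms the player can enter


def available(action, state, current_room):
    """An action is available in its own room once its prerequisites are done."""
    return (
        action.room == current_room
        and action.name not in state.done
        and action.requires <= state.done
    )


def perform(action, state, current_room):
    """Apply the action if allowed; return True when it succeeded."""
    if not available(action, state, current_room):
        return False
    state.done.add(action.name)
    if action.unlocks_room is not None:
        state.open_rooms.add(action.unlocks_room)
    return True


# Example from the text: the lever only works after the bookcase is searched.
SEARCH_BOOKCASE = Action("search bookcase", room="Study")
PULL_LEVER = Action(
    "pull lever",
    room="Study",
    requires=frozenset({"search bookcase"}),
    unlocks_room="Hidden Chamber",
)
```

Keeping prerequisites as plain sets of action names means the whole game can be serialized to static data and checked with a subset test at click time.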
I wrote a basic prompt defining the pixel-art style I wanted. Otherwise, I let DALL-E interpret the room descriptions as it wanted.
So this description:
The unkempt Garden is filled with the ghostly silhouettes of dead trees and tangled underbrush. Moonlight eerily illuminates a path leading to a decrepit gazebo. The Kitchen door lies to the north, and a moss-covered path leads west to the Study.
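Combining a fixed style prompt with a room description like the one above might look roughly like this. The style wording is a placeholder I made up for this sketch, and the API call assumes the official `openai` Python package (v1 or later) with an API key in the environment.

```python
# Hypothetical style prefix; the exact wording of the real prompt isn't shown.
PIXEL_ART_STYLE = (
    "16-bit pixel art, retro point-and-click adventure game background, no text."
)


def build_image_prompt(style: str, room_description: str) -> str:
    """Prepend the shared style to one room description, collapsing whitespace."""
    description = " ".join(room_description.split())
    return f"{style} Scene: {description}"


def generate_room_image(prompt: str) -> str:
    """Request one DALL-E 3 image and return its URL (makes a paid API call)."""
    from openai import OpenAI  # imported lazily so the prompt builder runs alone

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(
        model="dall-e-3", prompt=prompt, size="1024x1024", n=1
    )
    return response.data[0].url


if __name__ == "__main__":
    garden = "The unkempt Garden is filled with the ghostly silhouettes of dead trees."
    print(build_image_prompt(PIXEL_ART_STYLE, garden))
```

Reusing one style prefix for every room is what keeps a batch of images looking like they belong to the same game.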
I submitted the DALL-E image along with the list of exits and actions, and asked GPT-4 to identify where everything was in the picture so I could drop interaction points.
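One way to handle the model's answer is sketched below. It assumes (my assumption, not a documented format) that the model is asked to reply with a JSON list of labels and normalized x/y positions between 0 and 1, which then get clamped and converted to pixels.

```python
import json


def parse_hotspots(raw_json, width, height, expected_labels):
    """Turn a JSON reply into pixel coordinates for each expected label.

    Unknown labels are ignored, out-of-range positions are clamped to the
    image, and labels the model skipped are spread along the bottom edge.
    """
    placed = {}
    for item in json.loads(raw_json):
        label = item.get("label")
        if label not in expected_labels:
            continue  # the model invented something we did not ask about
        x = min(max(float(item["x"]), 0.0), 1.0)
        y = min(max(float(item["y"]), 0.0), 1.0)
        placed[label] = (int(x * (width - 1)), int(y * (height - 1)))

    # Fallback so every exit and action stays clickable somewhere.
    missing = [label for label in expected_labels if label not in placed]
    for i, label in enumerate(missing):
        placed[label] = ((i + 1) * width // (len(missing) + 1), height - 1)
    return placed


if __name__ == "__main__":
    reply = '[{"label": "go north", "x": 0.5, "y": 0.1}]'
    print(parse_hotspots(reply, 1024, 1024, ["go north", "examine gazebo"]))
```

The fallback matters more than it looks: a missing click target can make a baked game unwinnable, so every action should end up somewhere on screen.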
I coded a simple front-end that reads the game data, renders the images, places click targets, etc.
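The core of a front-end like that is small. Here's a Python sketch of the click-handling logic only (a real browser front-end would be JavaScript, and the JSON layout, file name, and 40-pixel radius are assumptions for illustration):

```python
import json
from math import hypot

# Hypothetical baked game data for a single room.
GAME_JSON = """
{
  "rooms": {
    "Garden": {
      "image": "garden.png",
      "targets": [
        {"action": "go north", "x": 512, "y": 80},
        {"action": "go west", "x": 60, "y": 512},
        {"action": "examine gazebo", "x": 700, "y": 600}
      ]
    }
  }
}
"""


def target_at(room, click_x, click_y, radius=40):
    """Return the nearest click target within `radius` pixels, or None."""
    best, best_distance = None, radius
    for target in room["targets"]:
        distance = hypot(target["x"] - click_x, target["y"] - click_y)
        if distance <= best_distance:
            best, best_distance = target, distance
    return best


if __name__ == "__main__":
    garden = json.loads(GAME_JSON)["rooms"]["Garden"]
    print(target_at(garden, 500, 100))  # lands near the "go north" target
    print(target_at(garden, 0, 1000))   # nothing close enough
```

Picking the nearest target within a radius, rather than the first match, avoids ambiguity when two interaction points sit close together.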
Well… it turned out okay.
I got games!
And they work: you can click around and try stuff and eventually win. Here are a few to try:
But … they’re not completely fun. They're monotonous and confusing — you wander around and eventually win. (And GPT-4 Vision is really bad at placing the click targets.)
I gotta confess: I got a little obsessed with this project 😬
It goes against the spirit of shipping quickly, but I really wanted these games to be more fun.
I found that they could be… if you’re willing to be a lot more specific in the prompt and do a bit of editing afterward.
Introducing “A Trip to the Fantasy Marketplace.”
Here’s the prompt:
You are a young child in a fantasy adventure (King Arthur setting) who has been sent to the market to get three items:
- A bunch of eggs
- Some rope
- A gift for your grandmother.
You cannot go home until you get these three things.
You might encounter a man selling magic beans. If you buy them, you lose the game.
You can steal the eggs, but if you steal the rope, you're caught and lose the game.
You can also buy things if you find coins or other goods to trade.
When you have all the things, you can return to the start of the game and complete a "Leave the market" action.
This actually feels like a real game. There are goals. There are obstacles. There’s a bit of narrative.
Now, this isn’t how GPT-4 actually generated the game. There was no magic bean merchant, and I didn’t get an option to steal the rope.
But there was a lovable urchin and a shopkeeper willing to sell you a locket. With some minor editing (clearing up some weird conditions, linking two quests together) I was able to turn this into a pretty fun little game.
Looking back, I went into this project with two secret rules for myself:
These are both kind of dumb!
If artificial intelligence gets 75% of the way there but needs a bit of editing afterward (“Hm, why is there a secret passage leading from the bathroom to the bedroom?”), well, just edit it.
And of course AI is going to have a hard time pulling together a cohesive plot from just a one-sentence setting description.
It’s just another reminder that AI performs best with guidance from humans. (For now, at least.)