In this article I'll explain how to implement map scrolling. Concepts covered: world coordinates, screen coordinates, camera position, coordinate transforms.
Coordinates#
Let's start with a simple world and build up to the above demo. Here's a 600x160 world on a 600x160 screen:
This world fits on the screen. Coordinates are straightforward. If you want to draw something that's at position (350, 120) in the world, you draw it at position (350, 120) on the screen.
But what if the entire world doesn't fit on the screen? We will see only part of the world:
Which part? Let's draw the screen from x =
While playing with this you'll notice that moving the screen to the right causes the screen contents to move left. This may seem weird at first. The same effect shows up when you move the scroll bar in a document down and the page contents move up.
The world coordinates tell us an object's position in the world, and the screen coordinates tell us where that object will be drawn on screen.
Scrolling#
We convert a position in the world x=
This is called a transform. We can think of it like this:
While playing with the subtracted value, you'll notice that subtracting more causes the screen to go to the right. This may seem counterintuitive at first but remember from the previous section that moving the screen to the right causes the screen contents to move left, and the transform is affecting the screen contents, not the screen itself.
In code, it looks like
screen_x = world_x - offset_x;
Transforms can also be run in reverse. This is how we can convert a mouse position in screen coordinates back into a world position:
Converting mouse positions to game world coordinates is a common question on Stack Overflow, especially for isometric views. Thinking in terms of transforms allows us to solve this problem.
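Here's a minimal sketch of the forward transform and its inverse in Python. The function names and the sample numbers are illustrative assumptions, not values from the demo:

```python
def world_to_screen(world_x, offset_x):
    """Translate a world position into a screen position."""
    return world_x - offset_x


def screen_to_world(screen_x, offset_x):
    """Undo world_to_screen: translate a screen position back into the world."""
    return screen_x + offset_x


# Hypothetical values: the screen's left edge sits at x=200 in the world,
# and the mouse is 150 pixels from the left edge of the screen.
offset_x = 200
mouse_screen_x = 150
mouse_world_x = screen_to_world(mouse_screen_x, offset_x)  # 350
```

Running the result back through world_to_screen gives the original mouse position, which is a handy sanity check when writing inverse transforms.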
Cameras#
Now that we know how to scroll the map, let's use it in a game setting by keeping the player sprite in the center of the screen.
How do we implement this? Let's call the center of the screen screen_center_x, and use it to calculate offset_x. Then we can use offset_x to calculate the screen position:
# screen_center_x is screen_width / 2
offset_x = player_x - screen_center_x;
screen_x = world_x - offset_x;
Putting these two lines together, we can rearrange the code as:
screen_x = world_x - (player_x - screen_center_x);
It turns out to be more useful to express it this way:
screen_x = (world_x - player_x) + screen_center_x;
I find it easier to reason about a camera pointing at the center of the screen.
How do we implement this?
screen_x = (world_x - camera_x) + screen_center_x;
That's the same equation from earlier, but with camera_x instead of player_x. The camera position is in world coordinates. It's a position like any other position in the world!
# screen_center_x is screen_width / 2
offset_x = camera_x - screen_center_x;
screen_x = world_x - offset_x;
A different way to express this is to use another coordinate system, called the view. It expresses what the camera can see. First we convert from world coordinates to view coordinates:
view_x = world_x - camera_x;
Then we convert from view coordinates (center at 0) to screen coordinates (left at 0).
screen_x = view_x + screen_center_x;
I do it this way because I make fewer errors when I break things down into simpler steps. Compare:
# Do everything in one step
screen_x = world_x - camera_x + screen_center_x;

# Two separate steps
view_x = world_x - camera_x;
screen_x = view_x + screen_center_x;
It's the same calculation, but I find the two-step version easier to write, think about, debug, and generalize to more effects (screen shake, zoom, etc.).
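The two-step version might look like this as runnable Python. This is a sketch that assumes the 600-pixel-wide screen from the first example. The function names are made up for illustration:

```python
SCREEN_WIDTH = 600                  # assumed: the screen width used earlier
SCREEN_CENTER_X = SCREEN_WIDTH / 2


def world_to_view(world_x, camera_x):
    """View coordinates are centered on the camera (the camera is at view_x = 0)."""
    return world_x - camera_x


def view_to_screen(view_x):
    """Screen coordinates have 0 at the left edge, so shift by half the width."""
    return view_x + SCREEN_CENTER_X


def world_to_screen(world_x, camera_x):
    """Chain the two steps: world -> view -> screen."""
    return view_to_screen(world_to_view(world_x, camera_x))
```

Whatever the camera points at lands at the center of the screen: world_to_screen(500, camera_x=500) is 300.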
Let's try some examples of using the camera. Suppose we want to center the player on the screen. How would you do this? Thinking in terms of the camera, we point the camera at the player:
camera_x = player_x;
That's it! "Point the camera at the player" turns into camera_x = player_x. This is where the extra step of placing the camera at the center of the screen instead of the left side pays off.
You may have noticed that at the left and right sides of the map, the player stays in the center, but there's a blank space past the map. Let's fix this by restricting the camera:
camera_x = player_x;
camera_x = clamp(camera_x, 300, 1300);
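The clamp function isn't defined in the snippet above, so here is one possible Python sketch of it, along with a hypothetical helper that derives the camera limits. The 1600-unit world width is an assumption chosen so that a 600-pixel screen yields the 300..1300 range:

```python
def clamp(value, low, high):
    """Restrict value to the range [low, high]."""
    return max(low, min(value, high))


def camera_bounds(world_width, screen_width):
    """Range of camera_x for which the screen stays inside the world."""
    half = screen_width / 2
    return half, world_width - half


# Assumed sizes: a 1600-wide world and a 600-wide screen.
low, high = camera_bounds(world_width=1600, screen_width=600)

player_x = 50                      # player near the left edge of the map
camera_x = clamp(player_x, low, high)
```

With the player near the left edge, the camera stops at 300 and the player drifts left of center instead of the screen showing blank space.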
How about a camera that keeps the player within
camera_x = clamp(camera_x, player_x - space_ahead, player_x + space_ahead);
camera_x = clamp(camera_x, 300, 1300);
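The first clamp above only moves the camera when the player gets too far from it. A small Python sketch of that idea, with a hypothetical `window` parameter standing in for the distance and illustrative numbers:

```python
def clamp(value, low, high):
    """Restrict value to the range [low, high]."""
    return max(low, min(value, high))


def follow_with_window(camera_x, player_x, window):
    """Move the camera only as far as needed to keep the player
    within `window` units of it."""
    return clamp(camera_x, player_x - window, player_x + window)


camera_x = 500
unchanged = follow_with_window(camera_x, player_x=520, window=50)  # player still inside
dragged = follow_with_window(camera_x, player_x=600, window=50)    # player too far right
```

In the first call the camera stays at 500. In the second it is pulled to 550 so the player is exactly 50 units ahead of it.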
How about a camera that has
direction = player_x > previous_x ? 1 : -1;
target_x = player_x + space_ahead * direction;
camera_x = clamp(target_x, camera_x - step_size, camera_x + step_size);
camera_x = clamp(camera_x, 300, 1300);
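As a per-frame update function, the same logic could be sketched in Python like this. The function name, the parameter names, and the default limits of 300 and 1300 are assumptions for illustration:

```python
def clamp(value, low, high):
    """Restrict value to the range [low, high]."""
    return max(low, min(value, high))


def update_camera(camera_x, player_x, previous_x, space_ahead, step_size,
                  min_x=300, max_x=1300):
    """One frame of a look-ahead camera that moves at most step_size per frame."""
    # Aim at a point ahead of the player, in the direction of travel.
    direction = 1 if player_x > previous_x else -1
    target_x = player_x + space_ahead * direction
    # Approach the target, but no faster than step_size per frame.
    camera_x = clamp(target_x, camera_x - step_size, camera_x + step_size)
    # Keep the camera inside the map.
    return clamp(camera_x, min_x, max_x)


new_camera_x = update_camera(camera_x=500, player_x=510, previous_x=505,
                             space_ahead=100, step_size=5)
```

Here the target is 610 but the camera only advances from 500 to 505 this frame. As in the snippet above, a player who hasn't moved counts as facing left. A real game would probably remember the last nonzero direction instead.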
There are lots of things that are easier to implement once we start thinking about camera positions instead of offsets.
Transforms#
TODO: Using <input> boxes for the numbers is nice for accessibility but makes the experience worse – in particular, I can't include units or other annotations :-(
To convert world coordinates to screen coordinates, we use a transform. In this example, we're scrolling horizontally, so the y position doesn't change, and I will omit y.
It's just a subtraction! Why introduce new terminology like "transform"? It's because we can reuse the same ideas across many different types of transforms. We're programmers. We like to build reusable abstractions.
Here's a useful idea we can reuse: (most) transforms can be inverted. For example, to figure out what object a mouse click belongs to, we can turn screen coordinates (mouse click) back into world coordinates (objects in the world):
world_x = screen_x + offset_x;
{ NEED A DEMO HERE }
When we later look at other types of effects (zoom, screen shake, etc.), we'll be able to invert their transforms too. Math is full of these types of reusable abstractions. In this case, transforms are a type of function, and functions can have inverses. Transforms can also be represented by matrices, and matrices can have inverses too.
Each type of transform has a name. Adding and subtracting is called translate. Let's express the above in terms of translate:
screen_x = world_x + (-offset_x);
I'm only handling x here and will show y later.
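One way to make "translate" a reusable abstraction with a built-in inverse is a small value class. This Python sketch is only an illustration. The Translate class and the offset of 200 are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translate:
    """A one-dimensional translate transform: adds a fixed amount."""
    amount: float

    def apply(self, x):
        return x + self.amount

    def inverse(self):
        """The transform that undoes this one."""
        return Translate(-self.amount)


offset_x = 200                          # hypothetical scroll offset
world_to_screen = Translate(-offset_x)  # screen_x = world_x + (-offset_x)
screen_to_world = world_to_screen.inverse()
```

world_to_screen.apply(350) gives 150, and screen_to_world.apply(150) gives back 350. Other transform types could offer the same apply/inverse pair.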
Operations#
{ how about cameras as an example of chaining? }
The conversion from world to screen is an example of a translate transform. It performs addition or subtraction. A scale transform performs multiplication or division. Scale is how we turn grid coordinates into pixel coordinates. Let's go back to that first example before scrolling:
It's made up of tiles with grid coordinates:
Suppose we want to draw the tile at (5, 3). Multiply by the tile size to get the pixel position:
world_x = grid_col * size;
world_y = grid_row * size;
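A Python sketch of the scale transform and its inverse follows. The 16-pixel tile size is an assumption, and the function names are made up for illustration:

```python
TILE_SIZE = 16   # assumed tile size in pixels


def grid_to_world(col, row, size=TILE_SIZE):
    """Top-left pixel of the tile at (col, row)."""
    return col * size, row * size


def world_to_grid(world_x, world_y, size=TILE_SIZE):
    """Tile containing the given pixel; floor division undoes the scale."""
    return int(world_x // size), int(world_y // size)


tile_origin = grid_to_world(5, 3)        # (80, 48)
tile_under_point = world_to_grid(95, 50)  # (5, 3)
```

The inverse rounds down because every pixel inside a tile, not only its top-left corner, should map back to that tile.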
TODO: show both grid and world coordinates
Zoom - not sure about this#
By thinking in terms of the camera instead of a scroll offset, we can make other effects too, such as zoom:
screen_x = (world_x - camera_x) * zoom + screen_center_x;
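Here is a rough Python sketch of the zoom formula and its inverse, assuming a 600-pixel-wide screen. The function names are illustrative. The inverse undoes the steps in the opposite order:

```python
SCREEN_CENTER_X = 300   # half of an assumed 600-pixel-wide screen


def world_to_screen(world_x, camera_x, zoom):
    view_x = world_x - camera_x              # translate: world -> view
    return view_x * zoom + SCREEN_CENTER_X   # scale, then translate to screen


def screen_to_world(screen_x, camera_x, zoom):
    view_x = (screen_x - SCREEN_CENTER_X) / zoom   # undo translate, then undo scale
    return view_x + camera_x                       # undo the camera translate


screen_x = world_to_screen(550, camera_x=500, zoom=2)    # 400
world_x = screen_to_world(screen_x, camera_x=500, zoom=2)  # 550.0
```

Because the scale is applied in view coordinates, the point the camera looks at stays at the center of the screen at any zoom level.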
Rotation - remove this section
rotate =
Chaining transforms#
Transforms can be combined in sequence. Let's combine the grid transform with the scrolling transform:
screen_x = grid_col * size - camera_x + screen_center_x
screen_y = grid_row * size - camera_y + screen_center_y
There are several steps here that have to be performed in order. I find it more useful to think of a chain of steps:
{visualization of chain grid → scale(16) → translate(-camera) → translate(screen_center) → screen}
Each step can be implemented and tested separately.
world_x = grid_col * size
world_y = grid_row * size
view_x = world_x - camera_x
view_y = world_y - camera_y
screen_x = view_x + screen_center_x
screen_y = view_y + screen_center_y
For this three-transform chain it doesn't matter much either way. As we add more transforms, it's nice to keep the steps simple, modular, and reusable.
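One possible way to express the chain directly in code is to treat each step as a function and compose them. This Python sketch handles only x. The helper names and the numbers are illustrative assumptions:

```python
from functools import reduce


def scale(factor):
    """A step that multiplies by factor."""
    return lambda x: x * factor


def translate(amount):
    """A step that adds amount."""
    return lambda x: x + amount


def chain(*steps):
    """Compose steps left to right: chain(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda value, step: step(value), steps, x)


size, camera_x, screen_center_x = 16, 500, 300   # illustrative values
grid_to_screen_x = chain(
    scale(size),                 # grid  -> world
    translate(-camera_x),        # world -> view
    translate(screen_center_x),  # view  -> screen
)
```

grid_to_screen_x(30) evaluates 30 * 16 - 500 + 300 = 280. Adding another effect means adding another step to the list.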
Invert transforms#
So far we've transformed world or grid coordinates into screen coordinates. Sometimes we want to go the other way: which position in the world corresponds to a location on the screen? A common example is figuring out what the player clicked on. The click is in screen coordinates. We want to figure out which object in the world the click corresponds to.
Invert the chain. Go through it in reverse and undo each step.
It's not only the mouse. Also determine bounds of what's visible. Reinforce idea: reverse the chain.
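Reversing the chain can be sketched by pairing each step with its undo and walking the list backwards. This Python example handles only x, and its helper names and numbers are illustrative assumptions:

```python
def scale(factor):
    """Return (apply, undo) functions for a scale step."""
    return (lambda x: x * factor, lambda x: x / factor)


def translate(amount):
    """Return (apply, undo) functions for a translate step."""
    return (lambda x: x + amount, lambda x: x - amount)


def forward(steps, x):
    for apply, _ in steps:
        x = apply(x)
    return x


def backward(steps, x):
    # Walk the chain in reverse order, undoing each step.
    for _, undo in reversed(steps):
        x = undo(x)
    return x


size, camera_x, screen_width = 16, 500, 600   # illustrative values
steps = [scale(size), translate(-camera_x), translate(screen_width / 2)]

clicked_col = backward(steps, 280)         # grid column under a click at screen x=280
left_col = backward(steps, 0)              # grid position at the screen's left edge
right_col = backward(steps, screen_width)  # grid position at the screen's right edge
```

The same backward function answers both questions: the click maps to column 30, and the visible range runs from grid position 12.5 to 50.0. Rounding those bounds outward gives the columns that need drawing.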
Demo
- Put bounds on camera.
- Make camera lag behind player.
- What else can we do now that we think of the camera as separate from player? Camera shake. Camera vertical bumpiness when walking over stone.
- Demo (now that you understand everything).
Appendix#
- Appendix: Chaining allows more operations to be mixed in. Zoom. Rotate. More translate. Skew. Demos without going into details?
- Appendix: Concept - function composition.
- Appendix: This all sets us up to introduce matrices. I don't plan to go into detail on this page though; that's a topic for another page!
Consider the other things on https://docs.google.com/document/d/1iNSQIyNpVGHeak6isbP6AHdHD50gs8MNXF1GCf08efg/pub[1] and https://www.gamedeveloper.com/design/scroll-back-the-theory-and-practice-of-cameras-in-side-scrollers[2]