Coordinate systems: scrolling

Apr–May 2019

In this article I'll explain how to implement map scrolling. Concepts covered: world coordinates, screen coordinates, camera position, coordinate transforms.

(insert demo here)

Coordinates#

Let's start with a simple world and build up to the above demo. Here's a 600x160 world on a 600x160 screen:

This world fits on the screen. Coordinates are straightforward. If you want to draw something that's at position (350, 120) in the world, you draw it at position (350, 120) on the screen.

But what if the entire world doesn't fit on the screen? We will see only part of the world:

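Which part? Let's draw the screen covering the world from x = offset_x to x = offset_x + screen_width, where offset_x is the world position of the screen's left edge. Try adjusting the position. You can see how the visible part of the map changes. This is what's visible on screen: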

While playing with this you'll notice that moving the screen to the right causes the screen contents to move left. This may seem weird at first. The same effect shows up when you move the scroll bar in a document down and the page contents move up.

The world coordinates tell us an object's position in the world, and the screen coordinates tell us where that object will be drawn on screen.

Scrolling#

We convert a position in the world, world_x, to a position on screen, screen_x, by subtracting an offset, offset_x:

This is called a transform. We can think of it like this:

start with: world x
↓ subtract offset x
result: screen x

While playing with the subtracted value, you'll notice that subtracting more causes the screen to go to the right. This may seem counterintuitive at first, but remember from the previous section that moving the screen to the right causes the screen contents to move left, and the transform is affecting the screen contents, not the screen itself.

In code, it looks like

screen_x = world_x - offset_x;
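
Here's the same subtraction as a small Python sketch. The function name world_to_screen_x and the sample numbers are mine, picked only for illustration; the 350 reuses the position from the first example.

```python
def world_to_screen_x(world_x, offset_x):
    """Translate a world x position into a screen x position."""
    return world_x - offset_x


# With no scrolling, world and screen coordinates agree.
print(world_to_screen_x(350, 0))    # 350
# Scrolling the screen 100 pixels right moves the contents 100 pixels left.
print(world_to_screen_x(350, 100))  # 250
```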

Transforms can also be run in reverse. This is how we can convert a mouse position in screen coordinates back into a world position:

result: world x
↑ add camera x
↑ subtract screen center x
start with: screen x

Converting mouse positions to game world coordinates is a common question on Stack Overflow, especially for isometric views. Thinking in terms of transforms allows us to solve this problem.
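
As a rough sketch, here is that reverse chain in Python, next to the forward direction so the round trip can be checked. The camera and screen center are explained in the Cameras section; the 600-pixel screen comes from the first example, and the camera and mouse positions are made-up values.

```python
SCREEN_WIDTH = 600                    # screen size from the first example
SCREEN_CENTER_X = SCREEN_WIDTH // 2


def world_to_screen_x(world_x, camera_x):
    return (world_x - camera_x) + SCREEN_CENTER_X


def screen_to_world_x(screen_x, camera_x):
    # Undo the steps in reverse order: first the screen center, then the camera.
    return (screen_x - SCREEN_CENTER_X) + camera_x


camera_x = 800    # hypothetical camera position
mouse_x = 450     # hypothetical mouse position on screen

world_x = screen_to_world_x(mouse_x, camera_x)
print(world_x)                               # 950
print(world_to_screen_x(world_x, camera_x))  # 450, back where we started
```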

Cameras#

Now that we know how to scroll the map, let's use it in a game setting by keeping the player sprite in the center of the screen.

(interactive demo: adjust player_x)

How do we implement this? Let's call the center of the screen screen_center_x, and use it to calculate offset_x. Then we can use offset_x to calculate the screen position:

# screen_center_x is screen_width / 2
offset_x = player_x - screen_center_x;
screen_x = world_x - offset_x;

Putting these two lines together, we can rearrange the code as:

screen_x = world_x - (player_x - screen_center_x);

It turns out to be more useful to express it this way:

screen_x = (world_x - player_x) + screen_center_x;

I find it easier to reason about a camera pointing at the center of the screen.

(interactive demo: adjust the camera position x)

How do we implement this?

screen_x = (world_x - camera_x) + screen_center_x;

That's the same equation from earlier, but with camera_x instead of player_x. The camera position is in world coordinates. It's a position like any other position in the world!

start with: world x
↓ subtract camera x
↓ add screen center x
result: screen x

# screen_center_x is screen_width / 2
offset_x = camera_x - screen_center_x;
screen_x = world_x - offset_x;

A different way to express this is to use another coordinate system, called the view. It expresses what the camera can see. First we convert from world coordinates to view coordinates:

start with: world x
↓ subtract camera x
view x
↓ add screen center x
result: screen x

view_x = world_x - camera_x;

Then we convert from view coordinates (center at 0) to screen coordinates (left at 0).

screen_x = view_x + screen_center_x;

I do it this way because I make fewer errors when I break things down into simpler steps. Compare:

# Do everything in one step
screen_x = world_x - camera_x + screen_center_x;

# Two separate steps
view_x = world_x - camera_x;
screen_x = view_x + screen_center_x;

It's the same calculation, but I find the two-step version easier to write, think about, debug, and generalize to more effects (screen shake, zoom, etc.).
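
One way to keep the two steps separate in real code is to give each step its own small function. This is only a sketch: the function names are mine, and SCREEN_CENTER_X assumes the 600-pixel-wide screen from the first example.

```python
SCREEN_CENTER_X = 300  # assumed: half of a 600-pixel-wide screen


def world_to_view_x(world_x, camera_x):
    """View coordinates are relative to the camera: 0 is where it points."""
    return world_x - camera_x


def view_to_screen_x(view_x):
    """Screen coordinates have 0 at the left edge of the screen."""
    return view_x + SCREEN_CENTER_X


def world_to_screen_x(world_x, camera_x):
    return view_to_screen_x(world_to_view_x(world_x, camera_x))


# An object exactly at the camera position lands in the center of the screen.
print(world_to_screen_x(world_x=800, camera_x=800))  # 300
# An object 50 pixels left of the camera lands 50 pixels left of center.
print(world_to_screen_x(world_x=750, camera_x=800))  # 250
```

An effect like screen shake could then be added as one more small step between the two, without touching either existing function.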

Let's try some examples of using the camera. Suppose we want to center the player on the screen. How would you do this? Thinking in terms of the camera, we point the camera at the player:

camera_x = player_x;

(interactive demo: adjust player_x)

That's it! "Point the camera at the player" turns into camera_x = player_x. This is where the extra step of placing the camera at the center of the screen, instead of at its left edge, pays off.

You may have noticed that at the left and right sides of the map, the player stays in the center, but there's a blank space past the map. Let's fix this by restricting the camera:

camera_x = player_x;
camera_x = clamp(camera_x, 300, 1300);
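
The snippet above doesn't define clamp or say where 300 and 1300 come from. They look like half the screen width and the world width minus half the screen width. That matches the 600-pixel screen if the world is 1600 pixels wide, which is my assumption; the text doesn't state it. A sketch under that assumption:

```python
def clamp(value, lo, hi):
    """Restrict value to the range [lo, hi]."""
    return max(lo, min(value, hi))


SCREEN_WIDTH = 600    # from the first example
WORLD_WIDTH = 1600    # assumption: chosen so the limits come out to 300 and 1300

MIN_CAMERA_X = SCREEN_WIDTH // 2                 # 300
MAX_CAMERA_X = WORLD_WIDTH - SCREEN_WIDTH // 2   # 1300


def follow_player(player_x):
    """Point the camera at the player, but never past the ends of the map."""
    return clamp(player_x, MIN_CAMERA_X, MAX_CAMERA_X)


print(follow_player(100))    # 300  (camera stops near the left end)
print(follow_player(800))    # 800  (player is centered)
print(follow_player(1550))   # 1300 (camera stops near the right end)
```

Computing the limits from the screen and world sizes means they stay correct if either size changes.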

(interactive demo: adjust player_x)

How about a camera that keeps the player within space_ahead pixels of the center? Try moving the player left and right to see this behavior:

camera_x = clamp(camera_x, player_x - space_ahead, player_x + space_ahead);
camera_x = clamp(camera_x, 300, 1300);
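
To see what this does over several frames, here's a small simulation. The starting camera position, the list of player positions, and the 50-pixel margin are made-up values for illustration.

```python
def clamp(value, lo, hi):
    return max(lo, min(value, hi))


def update_camera(camera_x, player_x, space_ahead=50):
    # Only move the camera when the player is more than space_ahead
    # pixels away from it.
    camera_x = clamp(camera_x, player_x - space_ahead, player_x + space_ahead)
    # Keep the camera from showing blank space past the ends of the map.
    return clamp(camera_x, 300, 1300)


camera_x = 800
for player_x in [800, 830, 860, 900, 870]:
    camera_x = update_camera(camera_x, player_x)
    print(player_x, camera_x)
# 800 800
# 830 800   player moved, but is still within 50 pixels of the camera
# 860 810   camera gets dragged along
# 900 850
# 870 850   player turned around; camera waits again
```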

(interactive demo: adjust player_x)

How about a camera that shows space_ahead extra pixels in the direction the player is walking, but adds them gradually, step_size pixels per step? Try moving the player left and right to see this behavior:

direction = player_x > previous_x ? 1 : -1;
target_x = player_x + space_ahead * direction;
camera_x = clamp(target_x, camera_x - step_size, camera_x + step_size);
camera_x = clamp(camera_x, 300, 1300);
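
Here's a runnable sketch of the same update. The default values for space_ahead and step_size are made up. As in the snippet above, a player who is standing still counts as facing left, which a real game would probably want to handle separately.

```python
def clamp(value, lo, hi):
    return max(lo, min(value, hi))


def update_camera(camera_x, player_x, previous_x, space_ahead=100, step_size=5):
    direction = 1 if player_x > previous_x else -1
    target_x = player_x + space_ahead * direction
    # Move toward the target, but by at most step_size pixels per update.
    camera_x = clamp(target_x, camera_x - step_size, camera_x + step_size)
    return clamp(camera_x, 300, 1300)


camera_x = 800
camera_x = update_camera(camera_x, player_x=810, previous_x=800)
print(camera_x)  # 805: heading right toward a target of 910
camera_x = update_camera(camera_x, player_x=820, previous_x=810)
print(camera_x)  # 810
camera_x = update_camera(camera_x, player_x=810, previous_x=820)
print(camera_x)  # 805: player turned left, so the camera heads back
```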

(interactive demo: adjust player_x)

There are lots of things that are easier to implement once we start thinking about camera positions instead of offsets.

Transforms#

TODO: Using <input> boxes for the numbers is nice for accessibility but makes the experience worse – in particular, I can't include units or other annotations :-(

To convert world coordinates to screen coordinates, we use a transform. In this example, we're scrolling horizontally, so the y position doesn't change, and I will omit y.

It's just a subtraction! Why introduce new terminology like "transform"? It's because we can reuse the same ideas across many different types of transforms. We're programmers. We like to build reusable abstractions.

Here's a useful idea we can reuse: (most) transforms can be inverted. For example, to figure out what object a mouse click belongs to, we can turn screen coordinates (mouse click) back into world coordinates (objects in the world):

start with: screen x
↓ add offset x
result: world x

world_x = screen_x + offset_x;
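
As a sketch of how the inverse gets used for mouse clicks: convert the click to world coordinates, then look for an object at that world position. The objects list and its field names are hypothetical.

```python
def screen_to_world_x(screen_x, offset_x):
    return screen_x + offset_x


# Hypothetical objects, each covering [left, right) in world coordinates.
objects = [
    {"name": "tree", "left": 340, "right": 360},
    {"name": "rock", "left": 700, "right": 730},
]


def object_at_screen_x(screen_x, offset_x):
    """Return the name of the object under a click, or None."""
    world_x = screen_to_world_x(screen_x, offset_x)
    for obj in objects:
        if obj["left"] <= world_x < obj["right"]:
            return obj["name"]
    return None


print(object_at_screen_x(250, 100))  # tree (250 + 100 = 350)
print(object_at_screen_x(250, 0))    # None
```

The objects never need to know about the screen; only the click gets converted.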

{ NEED A DEMO HERE }

When we later look at other types of effects (zoom, screen shake, etc.), we'll be able to invert their transforms too. Math is full of these types of reusable abstractions. In this case, transforms are a type of function, and functions can have inverses. Transforms can also be represented by matrices, and matrices can have inverses too.

Each type of transform has a name. Adding and subtracting is called translate. Let's express the above in terms of translate:

{x: world_x, y: _} → translate(-offset_x, _) → {x: screen_x, y: _}

screen_x = world_x + (-offset_x);

I'm only handling x here and will show y later.

Operations#

{ how about cameras as an example of chaining? }

The conversion from world to screen is an example of a translate transform. It performs addition or subtraction. A scale transform performs multiplication or division. Scale is how we turn grid coordinates into pixel coordinates. Let's go back to that first example before scrolling:

It's made up of tiles with grid coordinates:

TODO: label the tiles, make this interactive

Suppose we want to draw the tile at (5, 3). Multiply by the tile size to get the pixel position:

world_x = grid_col * size;
world_y = grid_row * size;
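
A worked example as a Python sketch. I'm assuming (5, 3) means column 5, row 3, and a tile size of 16, borrowed from the scale(16) step in the chaining notes later on. The inverse uses floor division to find the tile containing a pixel.

```python
TILE_SIZE = 16  # assumed tile size


def grid_to_world(grid_col, grid_row, size=TILE_SIZE):
    """Top-left pixel of a tile, in world coordinates."""
    return (grid_col * size, grid_row * size)


def world_to_grid(world_x, world_y, size=TILE_SIZE):
    """Tile containing a world position."""
    return (world_x // size, world_y // size)


print(grid_to_world(5, 3))    # (80, 48)
print(world_to_grid(85, 50))  # (5, 3)
```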

TODO: show both grid and world coordinates

Zoom - not sure about this

By thinking in terms of the camera instead of a scroll offset, we can make other effects too, such as zoom:

screen_x = (world_x - camera_x) * zoom + screen_center_x;
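
Here's a sketch of the zoom formula together with its inverse, which undoes the steps in reverse order: subtract the screen center, divide by zoom, add the camera. The sample values are made up, and the 300 assumes a 600-pixel-wide screen.

```python
SCREEN_CENTER_X = 300  # assumed: half of a 600-pixel-wide screen


def world_to_screen_x(world_x, camera_x, zoom):
    return (world_x - camera_x) * zoom + SCREEN_CENTER_X


def screen_to_world_x(screen_x, camera_x, zoom):
    return (screen_x - SCREEN_CENTER_X) / zoom + camera_x


# At zoom 2, an object 50 pixels right of the camera is drawn 100 pixels
# right of the screen center.
print(world_to_screen_x(850, camera_x=800, zoom=2))  # 400
print(screen_to_world_x(400, camera_x=800, zoom=2))  # 850.0
```

Because the zoom is applied in view coordinates, the point the camera is looking at stays at the center of the screen at any zoom level.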

Rotation - remove this section

rotate = { MOVE TO A DIFFERENT PAGE }

Chaining transforms

Transforms can be combined in sequence. Let's combine the grid transform with the scrolling transform:

screen_x = grid_col * size - camera_x + screen_center_x
screen_y = grid_row * size - camera_y + screen_center_y

There are several steps here that have to be performed in order. I find it more useful to think of a chain of steps:

{visualization of chain grid → scale(16) → translate(-camera) → translate(screen_center) → screen}

Each step can be implemented and tested separately.

world_x  = grid_col * size
world_y  = grid_row * size
view_x   = world_x  - camera_x
view_y   = world_y  - camera_y
screen_x = view_x   + screen_center_x
screen_y = view_y   + screen_center_y

For this three-transform chain it seems like it doesn't matter much either way. When adding more transforms it's nice to keep the steps simple and modular and reusable.
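
One possible way to make the steps reusable is to treat each transform as a function and build the chain by composing them. This is a sketch, not the only way to do it; scale, translate, and chain are names I made up, and the sample values are arbitrary.

```python
from functools import reduce


def scale(factor):
    return lambda x: x * factor


def translate(amount):
    return lambda x: x + amount


def chain(*steps):
    """Combine steps into one function that applies them left to right."""
    return lambda x: reduce(lambda value, step: step(value), steps, x)


size, camera_x, screen_center_x = 16, 100, 300  # sample values

grid_to_screen_x = chain(
    scale(size),                 # grid  -> world
    translate(-camera_x),        # world -> view
    translate(screen_center_x),  # view  -> screen
)

print(grid_to_screen_x(5))  # 5 * 16 - 100 + 300 = 280
```

Adding another effect means adding one more entry to the chain.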

Invert transforms

So far we've transformed world or grid coordinates into screen coordinates. Sometimes we want to go the other way: which position in the world corresponds to a location on the screen? A common example is figuring out what the player clicked on. The click is in screen coordinates. We want to figure out which object in the world the click corresponds to.

Invert the chain. Go through it in reverse and undo each step.

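This isn't only useful for the mouse. We can also use it to determine the bounds of what's visible, by converting the edges of the screen back into world coordinates. The idea is the same: reverse the chain.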
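
Here's a sketch of reversing the grid → world → view → screen chain for both uses: finding the clicked tile, and finding the range of tile columns that are on screen. All the numbers are sample values I picked.

```python
size = 16              # tile size
camera_x = 400         # camera position in world coordinates
screen_width = 600
screen_center_x = screen_width // 2


def screen_to_world_x(screen_x):
    view_x = screen_x - screen_center_x  # undo translate(screen_center)
    return view_x + camera_x             # undo translate(-camera)


def screen_to_grid_col(screen_x):
    return screen_to_world_x(screen_x) // size  # undo scale(size)


# Which tile column was clicked?
print(screen_to_grid_col(450))  # (450 - 300 + 400) // 16 = 34

# Which tile columns are visible? Invert the left and right screen edges.
first_col = screen_to_grid_col(0)
last_col = screen_to_grid_col(screen_width - 1)
print(first_col, last_col)  # 6 43
```

Drawing only the columns from first_col to last_col avoids looping over tiles that can't be seen.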

Demo

Put bounds on camera.
Make camera lag behind player.
What else can we do now that we think of the camera as separate from player? Camera shake. Camera vertical bumpiness when walking over stone.
Demo (now that you understand everything).

Appendix#

Appendix: Chaining allows more operations to be mixed in. Zoom. Rotate. More translate. Skew. Demos without going into details?
Appendix: Concept - function composition.
Appendix: This all sets us up to introduce matrices. I don't plan to go into detail on this page though; that's a topic for another page!

Consider the other things on https://docs.google.com/document/d/1iNSQIyNpVGHeak6isbP6AHdHD50gs8MNXF1GCf08efg/pub and https://www.gamedeveloper.com/design/scroll-back-the-theory-and-practice-of-cameras-in-side-scrollers
