Rollback Networking in Dragon Saddle Melee

Part 3: Gotchas

This is the third part in a series of posts about the networking architecture for Dragon Saddle Melee.

Rollback Architecture Gotchas

Non-Deterministic Physics

Dragon Saddle Melee uses a port of Box2D under the hood to handle the physics of objects moving around and bouncing off each other. The story behind that decision is for another article, but it did cause some issues with the networking architecture.

The biggest issue is that Box2D is non-deterministic: two computers running the same physics simulation will not produce identical results. They will be very similar, but not identical. Over time, these small differences add up to a big discrepancy in object state, especially once objects begin colliding, and completely break the game. The exact source of the non-determinism varies by physics engine (there are deterministic physics engines out there, too), but it usually involves a combination of floating-point math not producing identical results on different machines and some level of randomness in the order in which interactions between objects are solved.
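As a small illustration of the floating-point part of that, plain Python (not Box2D code) shows that changing only the order of additions changes the result. A solver that accumulates the same impulses in a different order can therefore end up with slightly different numbers.

```python
# Floating-point addition is not associative: the same three impulses
# summed in a different order give slightly different totals.
impulses = [0.1, 0.2, 0.3]

left_to_right = (impulses[0] + impulses[1]) + impulses[2]
right_to_left = impulses[0] + (impulses[1] + impulses[2])

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

A last-bit difference is harmless once, but each physics step feeds its output back in as the next step's input, so these differences can compound, particularly through collisions.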

If our physics engine were totally deterministic, we could optimize our network transmissions to exclude physics state entirely. As long as every machine started from the same base state and always applied the same inputs, a deterministic physics engine would produce identical results everywhere. Since DSM doesn’t have that advantage, we also have to transmit the physics state.

However, transmitting state along with commands isn’t as big a problem as it might seem. First, the object state in DSM compresses fairly well. Even basic optimizations, such as packing fields and flags together and quantizing values, make a big difference. Second, we don’t have to send every object’s state in every bundle. Our simulations stay close enough between machines that we can trickle definitive states out a few objects at a time.
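A minimal sketch of both ideas, assuming a hypothetical object with a 2D position, a velocity, and two boolean flags. The 11-byte record layout, the `POS_SCALE`/`VEL_SCALE` resolutions, and the round-robin `trickle_ids` helper are illustrative assumptions, not DSM's actual wire format.

```python
import struct
from dataclasses import dataclass

POS_SCALE = 100  # hypothetical: positions stored to 0.01 world units
VEL_SCALE = 100  # hypothetical: velocities stored to 0.01 units per frame

# Little-endian record: uint16 id, four int16 quantized values, one flag byte = 11 bytes.
RECORD = struct.Struct("<HhhhhB")


@dataclass
class ObjectState:
    obj_id: int
    x: float
    y: float
    vx: float
    vy: float
    on_fire: bool  # hypothetical flags
    stunned: bool


def quantize(value: float, scale: int) -> int:
    """Round to the nearest step and clamp into the int16 range."""
    return max(-32768, min(32767, round(value * scale)))


def pack_state(s: ObjectState) -> bytes:
    flags = int(s.on_fire) | (int(s.stunned) << 1)  # two booleans share one byte
    return RECORD.pack(
        s.obj_id,
        quantize(s.x, POS_SCALE), quantize(s.y, POS_SCALE),
        quantize(s.vx, VEL_SCALE), quantize(s.vy, VEL_SCALE),
        flags,
    )


def unpack_state(data: bytes) -> ObjectState:
    obj_id, qx, qy, qvx, qvy, flags = RECORD.unpack(data)
    return ObjectState(
        obj_id,
        qx / POS_SCALE, qy / POS_SCALE,
        qvx / VEL_SCALE, qvy / VEL_SCALE,
        bool(flags & 1), bool(flags & 2),
    )


def trickle_ids(all_ids: list[int], frame: int, objects_per_bundle: int) -> list[int]:
    """Round-robin: each frame's bundle carries definitive state for a different slice of objects."""
    if not all_ids:
        return []
    ordered = sorted(all_ids)
    count = min(objects_per_bundle, len(ordered))
    start = (frame * objects_per_bundle) % len(ordered)
    return [ordered[(start + i) % len(ordered)] for i in range(count)]


state = ObjectState(7, x=12.5, y=-3.25, vx=0.5, vy=0.0, on_fire=True, stunned=False)
packed = pack_state(state)
print(len(packed))                    # 11
print(unpack_state(packed) == state)  # True
print([trickle_ids([1, 2, 3, 4, 5], frame=f, objects_per_bundle=2) for f in range(3)])
# [[1, 2], [3, 4], [5, 1]]
```

The scale factors are placeholders; suitable values depend on the size of the game world and how much rounding error is noticeable.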

Finally, accepting that our physics engine isn’t deterministic takes the pressure off the rest of the game logic to be deterministic. This is a big win in terms of reducing complexity for the developer. If we were attempting a system that never transmits state, our game logic would have to be 100%, absolutely no exceptions, deterministic. It’s easy to mess that up. For example, every time we iterate through a list of objects to process logic that might affect other objects, it must be done in a consistent order.
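As a small illustration of the ordering hazard, consider a hypothetical rule where two dragons reach the same pickup on the same frame and whichever is processed first claims it. `award_pickup` and the sort-by-ID policy are assumptions for this sketch; the point is that a fixed processing order makes the outcome independent of how each machine happened to store its objects.

```python
from typing import Optional


def award_pickup(candidates: list[int], sort_first: bool) -> Optional[int]:
    """Return the ID of the object that claims a contested pickup.

    The first object processed wins, so the result depends on iteration order.
    """
    order = sorted(candidates) if sort_first else candidates
    winner = None
    for obj_id in order:
        if winner is None:
            winner = obj_id  # first object processed claims the pickup
        # (a real game would also apply effects to the other objects here)
    return winner


# The same two dragons, stored in a different order on two machines.
machine_a = [42, 17]
machine_b = [17, 42]

print(award_pickup(machine_a, sort_first=False), award_pickup(machine_b, sort_first=False))  # 42 17
print(award_pickup(machine_a, sort_first=True), award_pickup(machine_b, sort_first=True))    # 17 17
```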

Slow Physics Calculations

The other physics-related issue DSM encountered involves all the recalculation required to roll back the state and play it forward again each time definitive information arrives at a client from the server. At first it may seem like there is plenty of headroom in the physics calculations. DSM could be chugging along at 200 frames per second, but then under bad network conditions the client is running with a 250ms ping and a 6-frame command queue (550ms total latency). That means for every definitive frame a client gets from the server, it has to recalculate about 11 frames of physics. That really adds up, especially when you’re already running the physics engine in an odd way that isn’t especially fast (a topic for another article).
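The numbers above can be worked through directly. The ping, queue length, and total latency come from the text; the 50 ms frame length is inferred from them ((550 - 250) / 6 = 50 ms, i.e. 20 simulation frames per second). Treating a definitive state as arriving for every frame, and treating the 200 frames per second as the client's physics-step budget, are assumptions for this sketch; `resim_cost` is a hypothetical helper.

```python
import math


def resim_cost(ping_ms: float, queue_frames: int, frame_ms: float, budget_steps_per_s: float):
    """Estimate rollback resimulation load.

    Assumes every simulation frame receives one definitive state, and each one
    forces the client to replay from that frame up to its current frame.
    """
    total_latency_ms = ping_ms + queue_frames * frame_ms
    frames_to_replay = math.ceil(total_latency_ms / frame_ms)
    sim_frames_per_s = 1000 / frame_ms
    steps_per_s = frames_to_replay * sim_frames_per_s
    return total_latency_ms, frames_to_replay, steps_per_s, steps_per_s <= budget_steps_per_s


print(resim_cost(ping_ms=250, queue_frames=6, frame_ms=50, budget_steps_per_s=200))
# (550, 11, 220.0, False)
```

Under those assumptions, resimulation alone would need about 220 physics steps per second, more than the 200 the client was managing, before counting the new frame it is actually trying to advance.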

To make matters worse, physics engines have a lot of optimizations to speed up collision detection and spatial queries. Many of these optimizations rely on the objects’ state changing smoothly from one frame to the next. The physics engine is built around the assumption that object state is only changed by the engine itself, based on forces and collisions. Most engines store positions in spatial partitioning data structures and maintain a list of objects that are currently colliding.

Unfortunately, rolling back breaks many of those optimizations. When the client receives an old state and needs to reset all objects to that earlier frame, it has to force every object’s position and velocity back to the old values from outside the physics engine’s control. That invalidates all those fancy data structures and optimizations. In fact, it can even cause the engine to go slightly berserk. In one issue DSM encountered, forcing an object that was colliding with another object (at a later frame) to a different position did not invalidate the cached record of that ongoing collision. As a result, the collision from the later frame was mistakenly applied at the earlier frame being resimulated.
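The cached-contact problem can be reproduced with a toy world. This is not Box2D's API: `ToyWorld`, its `contacts` cache, and `restore` are hypothetical stand-ins for an engine that detects contacts at the end of one step and responds to them at the start of the next. Forcing positions back without touching that cache leaks a later frame's collision into the resimulated frame; rebuilding the cache from the restored positions is one possible guard, shown here as an assumption rather than DSM's actual fix.

```python
import copy
from dataclasses import dataclass


@dataclass
class Body:
    x: float            # 1D position
    vx: float           # velocity, in units per step
    radius: float = 1.0


class ToyWorld:
    """Hypothetical engine that caches contacts detected at the end of each step."""

    def __init__(self, bodies: dict[int, Body]):
        self.frame = 0
        self.bodies = bodies
        self.contacts: set[tuple[int, int]] = set()  # pairs found touching last step
        self.hits: list[tuple[int, int, int]] = []   # (frame, a, b) responses applied

    def _detect(self) -> set[tuple[int, int]]:
        ids = sorted(self.bodies)
        return {
            (a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
            if abs(self.bodies[a].x - self.bodies[b].x)
            < self.bodies[a].radius + self.bodies[b].radius
        }

    def step(self) -> None:
        self.frame += 1
        # The engine trusts its cache: respond to every pair it remembers touching.
        for a, b in sorted(self.contacts):
            self.hits.append((self.frame, a, b))
        for body in self.bodies.values():
            body.x += body.vx
        self.contacts = self._detect()

    def restore(self, frame: int, snapshot: dict[int, Body], rebuild_cache: bool) -> None:
        """Force state from outside the engine, as a rollback does."""
        self.frame = frame
        self.bodies = copy.deepcopy(snapshot)
        if rebuild_cache:
            # Derived state must be rebuilt to match the restored positions.
            self.contacts = self._detect()


def replay_after_rollback(rebuild_cache: bool) -> list[tuple[int, int, int]]:
    world = ToyWorld({1: Body(x=0.0, vx=1.0), 2: Body(x=3.0, vx=0.0)})
    snapshot = copy.deepcopy(world.bodies)  # state at frame 0: bodies 3 units apart
    for _ in range(2):
        world.step()                        # by frame 2 the bodies are touching
    world.restore(0, snapshot, rebuild_cache)
    world.step()                            # resimulate frame 1
    return world.hits


print(replay_after_rollback(rebuild_cache=False))  # [(1, 1, 2)]: frame 2's contact leaks into frame 1
print(replay_after_rollback(rebuild_cache=True))   # []
```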

In short, physics engines generally aren’t designed to be rewound and played back over and over. DSM could have gotten a big win in terms of performance from a custom physics engine or a specially modified one, but that was beyond the scope of this solo-dev project.

Object Creation and Destruction

The fundamental architecture, in which each new frame state $S_{n+1}$ is purely a function of the previous frame state $S_n$ and the current set of commands $C_{n+1}$, seems simple, but there are many corner cases where bugs can easily sneak into the game.

$$S_{n+1} = f(S_n, C_{n+1})$$
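A minimal sketch of how that update rule supports rollback, assuming a hypothetical pure `step(state, command)` function and a client that keeps one snapshot per frame. When a definitive state for an older frame arrives, the client overwrites its snapshot for that frame and replays the commands it already has up to the present. The toy integer `x` stands in for the whole game state, and `RollbackClient` is an illustrative name, not DSM's class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    frame: int
    x: int   # toy stand-in for the whole game state


@dataclass(frozen=True)
class Command:
    dx: int  # toy stand-in for one frame's player inputs


def step(state: State, command: Command) -> State:
    """S_{n+1} = f(S_n, C_{n+1}): a pure function of the previous state and commands."""
    return State(frame=state.frame + 1, x=state.x + command.dx)


class RollbackClient:
    def __init__(self, initial: State):
        self.states: dict[int, State] = {initial.frame: initial}
        self.commands: dict[int, Command] = {}  # C_n, keyed by the frame it produces
        self.current = initial.frame

    def advance(self, command: Command) -> None:
        """Predict the next frame using the commands known so far."""
        n = self.current + 1
        self.commands[n] = command
        self.states[n] = step(self.states[n - 1], command)
        self.current = n

    def on_definitive(self, state: State) -> None:
        """Overwrite an old frame with the server's state, then replay forward."""
        self.states[state.frame] = state
        for n in range(state.frame + 1, self.current + 1):
            self.states[n] = step(self.states[n - 1], self.commands[n])


client = RollbackClient(State(frame=0, x=0))
for _ in range(3):
    client.advance(Command(dx=1))
print(client.states[3])                    # State(frame=3, x=3)
client.on_definitive(State(frame=1, x=5))  # server says frame 1 really had x=5
print(client.states[3])                    # State(frame=3, x=7)
```

Because `step` only reads its arguments, replaying is just calling it again; any state kept outside the snapshot would escape the rollback.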

The biggest bug-generating corner cases in DSM involved object creation and destruction. For example, in DSM most projectiles are owned by another object, so that when a projectile hits someone, the game can attribute the kill to the owning player. But weird things happen when a projectile is fired and then the owning player disconnects. A naive implementation would just delete that player from the current frame state. Then, when rolling back, we take all the objects in the current frame and reset them to their state at the earlier frame, except now the disconnected player is missing from the restored state. Even worse, we have a projectile owned by an object that isn’t even stored in the current frame’s state.

The solution is to keep, within each frame’s data structure, a complete copy of the entire game state that doesn’t rely on any global shared data structure. For example, in DSM each object is referenced by an ID and has a type. Lots of the game logic involves checking an object’s type to determine how fast it moves, etc. It is tempting to keep a global table that maps IDs to types: when an object is created, add it to the table; when it is deleted, remove it. But when we roll back, we would also have to roll back the global table and undo all those creations and deletions. Plus, we don’t want to continually transmit a definitive version of this ID-to-type table from the server. The solution is to store an independent copy of this table in every frame’s data structure.
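A minimal sketch of that idea, assuming hypothetical `GameObject` and `FrameState` types, a `type_of` lookup, and an `owner_id` field on projectiles. Each frame starts as a full, independent copy of the previous one and carries its own ID-to-type table, so rolling back to a frame from before a disconnect restores the owner along with its projectile, with no global table to undo.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GameObject:
    obj_id: int
    obj_type: str                   # e.g. "dragon", "fireball"
    owner_id: Optional[int] = None  # projectiles remember who fired them


@dataclass
class FrameState:
    frame: int
    objects: dict[int, GameObject] = field(default_factory=dict)
    types: dict[int, str] = field(default_factory=dict)  # this frame's own ID-to-type table

    def add(self, obj: GameObject) -> None:
        self.objects[obj.obj_id] = obj
        self.types[obj.obj_id] = obj.obj_type

    def remove(self, obj_id: int) -> None:
        del self.objects[obj_id]
        del self.types[obj_id]

    def type_of(self, obj_id: int) -> Optional[str]:
        return self.types.get(obj_id)


def next_frame(prev: FrameState) -> FrameState:
    """Start the next frame from a full, independent copy of the previous one."""
    nxt = copy.deepcopy(prev)
    nxt.frame += 1
    return nxt


history: dict[int, FrameState] = {}

f1 = FrameState(frame=1)
f1.add(GameObject(10, "dragon"))
f1.add(GameObject(11, "fireball", owner_id=10))  # dragon 10 fires
history[1] = f1

f2 = next_frame(f1)
f2.remove(10)                                    # player 10 disconnects on frame 2
history[2] = f2

# Rolling back to frame 1 restores that frame's own copy, owner included.
restored = copy.deepcopy(history[1])
fireball = restored.objects[11]
print(restored.type_of(fireball.owner_id))  # dragon
print(f2.type_of(10))                        # None
```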

Next Up

That wraps up the third post in this series. In the next (and final) post, I’ll talk about using Unity to implement the game client’s graphical view of the Dragon Saddle Melee simulation.

Chris

Developer of Dragon Saddle Melee

Co-Founder of Main Tank Software

February 28th, 2022