Before we begin this journey into the inner workings of networking in games, it's important to define some terms, and get some background on the Internet and how it works. This is of inestimable help later when trying to explain why certain things are done the way they are when coding for the net, plus, if you're anything like me, it's just plain interesting.
I'm going to go over some history of the Internet, with some simple examples of how it works, without getting too technical. This is not intended to be a programming reference document, more an enlightenment of what others are talking about when they talk about latency, pings, TCP/IP and so on. I will avoid those areas of the net that aren't directly related to games, since there is no reason to bore the pants off anyone more than is strictly necessary. This is not a 'how to' document, but more a FYI type of thing. There is nothing in this about making your home system better over the Internet, but more an explanation of why so many Internet games have networking troubles, and where they come from.
[size="5"] So let's begin
The Internet as everyone knows it came about from a much smaller network called ARPANET that the Pentagon created a) because America was on a science kick in the 60's and wanted to get a head start in burgeoning industries and b) because the Pentagon wanted to use and keep tabs on the expensive mainframes it was funding at places like MIT and UCLA without having to use multiple remote terminals.
To cut a long story short, the Pentagon put out a contract to tender that would link multiple mainframes together, for use in real time. This would mean that one man at one terminal should be able to access multiple machines, share data and run programs on different machines.
From this contract, the concepts of packet switching and routers were born. Now everyone bandies those words 'routers' and 'packet switching' around, but what do they actually mean? Well, first up, let's dispel one common misconception. Many people use the phone system as an example when discussing the Internet. "It's like the phone system," they say. "You have an IP address that's like a telephone number." Well, not really. A better example would be the post office. Imagine that when you send a file from one computer to another it's like a letter being sent. It first goes to your post office, where it is examined, and it's decided if it's intended for someone that has an address that post office serves. If it's not, then it's forwarded to another post office for examination again. Eventually it will arrive at a post office that says, "Oh, I know where the post office that this letter is intended for is located" and it's forwarded directly to the correct one, which then sends it on to the intended recipient. Long winded but you get the idea. Well, a router is effectively a post office. It sorts files that come in and decides what to do with them and where to send them. This is very different from the phone system where you end up with a direct link between you and who you are calling. With routers, there is no direct link.
Incidentally, there is a common myth that states that the original ARPANET and, by extension, the Internet, was designed to withstand a nuclear war, so that if one machine was taken out, then others would still be able to communicate, since there was no one route that everyone depended on. Having researched this, I found no actual proof that this was ever an original requirement. The network would be able to withstand losing a large portion of its connecting machines, but that would appear to be more of a side benefit than an original requirement.
Anyway, back to packet switching. When you send a file over the Internet, you don't actually send the whole thing in one big chunk. It's broken up into small packets - like postcards if you want to continue the post office simile - and each one is transmitted one after the other. The beauty of this is that the routers can handle many, many of these little packets, without ever having to know what's in them (or indeed, the order they are transmitted in). So your packets get mixed in with someone else's, and the data stream gets maximum efficiency. All your machine has to do is create the little packets, number them so they get re-assembled on the other end in the correct order, and send them out to the router. Of course they need an address too. That's where IPs come in. An IP is a unique address for your machine on the Internet. It's made up of four numbers, each between 0 and 255, separated by dots. For instance 204.57.198.32. All those www.whatever.com names are actually converted into IP addresses when packets are exchanged with another machine on the net. Sometimes these addresses are specific and constant on one machine; more often than not they are dynamically allocated by the host system. Every time you log onto your service provider, they send you an IP address they have free from a range that's been allocated to them. For instance your ISP may have the range 204.198.32.0 to 204.198.32.255, which gives them 256 possible IP addresses. 256 people can all be using the system at once, but no more than that. When you log in, the system looks to see what IPs are free, and sends you one. That way more than 256 people can be on the books for this host, but only 256 can use it at once.
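To make the postcard idea a bit more concrete, here is a minimal Python sketch of chopping a message into numbered packets and putting them back in order at the other end. The function names and the tiny 8-byte payload size are made up for illustration; real IP packets carry far more data each and use a proper binary header rather than a Python tuple.

```python
import random

def split_into_packets(data: bytes, payload_size: int = 8):
    """Break data into (sequence_number, chunk) pairs."""
    return [
        (seq, data[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(data), payload_size))
    ]

def reassemble(packets):
    """Sort by sequence number and glue the chunks back together."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"The quick brown fox jumps over the lazy dog"
packets = split_into_packets(message)
random.shuffle(packets)          # the network may deliver in any order
restored = reassemble(packets)   # same bytes as `message`
```

Because every packet carries its own number, the routers in between never need to care what order they forward things in.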
The alternative to this would be the phone system approach, which would mean creating dedicated routers that would reserve an entire line for you to send data to and from the other computer, but that line would sit unused most of the time, especially if you are doing stuff like typing in real time. You may think you are a fast typist, but in the time between a message going to and from your machine to another, the network could have transmitted War and Peace several times. A good simile that I heard used would be "like reserving the entire Interstate road system to drive a car from Washington DC to LA". You would never dream of doing that; instead you share it with other car drivers. Just like on the Internet. Maybe that dumb 'super highway' label thing has some merit after all.
I'm sure you can see how sharing lines with others, and breaking messages into small packets, is the most efficient use of network time and data streams. The same system is in use today as was originally designed for the ARPANET way back when. Why? Cos it works real well.
Where do ISPs come into this? Let's think of it this way. The routers are machines that sit attached to mainframes and stuff that we are treating as big post offices. An ISP (Internet Service Provider) is one step removed from that - like the postman himself. They are attached to a machine that often has a router (not always), but they also have a ton of modems attached to them. Your little PC at home uses its modem to call up the modem attached to the ISP's machine, which then accepts your packets and then forwards them, in bulk and mixed in with everyone else's, to the Internet with a capital I.
Cable modems, ISDN and DSL are pretty much the same thing, except that the modem-to-modem part is removed, and faster bandwidth communication devices are used instead. In fact DSL is basically just a faster modem with a better phone line anyway.
One of the worst things about all this is that there is nothing that you, as the user, can do about this. The Internet was designed to be robust, in real time, but not instant. It's a shame, but Quake wasn't on their minds at design time.
UDP and TCP are higher layers that accept the packet of data from you, the coder or you, the game and decide what to do with it. The difference between UDP and TCP is that TCP guarantees delivery of the packets, in order, and UDP doesn't. UDP is effectively an access way to talk directly to IP, whereas TCP is an interface between you and IP. Complicated, but you should get the drift. It's like having a secretary between you and your mail. With UDP you would type up your letters yourself, put them in an envelope etc. With TCP you would just dictate the letter, give it to her and let her do all the work and follow up to be sure the letter arrived.
You can see TCP/IP in action right this second if you want. If you're in Windows, open up an MS-DOS prompt and type PING 205.229.73.43 and press return. What you've just done is sent a message to the machine that runs this website and said "are you there?" And it's replied, "Yes, I am." The values you see there are the times taken for the packets of info to make the round trip - from you to them and back again. This is called Ping time, or Latency. Latency is one of those weird phrases that means different things to different people. We here at Raven treat it as an average. Ping is the round trip for one packet; latency is the average round trip over the last 30 or so packets. As a rule of thumb, those hosts that you are trying to get to that have the fewest routers to go through are the ones that will have the lowest ping. Usually these are the closest to you in physical location, but not always. If you want to see the route you have to go through to get to a particular host, type tracert 205.229.73.43 at the MS-DOS prompt. This returns all the routers your packet hit on the way to the host.
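The ping versus latency distinction above is easy to sketch in code. This is only an illustration of the "average over the last 30 or so packets" idea; the class and method names are mine, not anything from a real engine.

```python
from collections import deque

class LatencyTracker:
    """Keeps the last `window` round-trip times and averages them."""

    def __init__(self, window: int = 30):
        # A deque with maxlen silently discards the oldest sample.
        self.samples = deque(maxlen=window)

    def record_ping(self, round_trip_ms: float) -> None:
        self.samples.append(round_trip_ms)

    def latency(self) -> float:
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

tracker = LatencyTracker(window=3)
for rtt in (100.0, 120.0, 500.0, 110.0):
    tracker.record_ping(rtt)
average = tracker.latency()      # averages only the last 3 pings
```

Averaging like this stops one freak 500ms ping from making the whole connection look terrible, while still showing a trend if things really are getting worse.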
However, all this wonderful work-done-for-you comes at a cost. In order to be sure that packets that are sent via the Internet get there ok, TCP expects an Acknowledgement (an ACK in net parlance) to be sent back from the destination for every packet it sends. If it doesn't get an ACK within a certain time, then it holds up sending any new packets, re-sends the one that was lost, and will continue to do so until the destination responds. We've all seen this in action when you've gone to a web page, and half way through the download it stops for a bit and then restarts. Chances are (assuming it's not an ISP problem) a packet has been lost somewhere, and TCP is demanding it gets resent before any more come down the pipe.
Secondly, packet bloating. You have to be careful only to transmit that data that is required; otherwise you are just sending data for the sake of it. The larger the packet you give to the UDP system, the more you are asking the network to handle. This has a big impact in client/server setups when your packet gets to the server, since YOU are only transmitting one packet, but the SERVER is receiving many such packets. This also impacts modem bandwidth. If you are running a 28.8 and getting a pretty good sustained throughput, you need to be sure that you are not allowing the packets to exceed what it's possible to push through the modem. Too big = packets getting shunted into a buffer while the modem struggles with what it's got to send, and eventually the buffer overflows and you end up at a crawl, assuming the game hasn't already puked.
Third, packet frequency. Are you expecting packets to be sent faster than the communications infrastructure can really handle? You may be running at 60 frames per second, but you can bet that the Internet will have trouble sustaining that kind of packet rate.
Fourth is handling out of order packets (assuming you are using UDP) and dropped packets entirely. This is more involved and requires you to be cleverer than you might think. However, if you don't handle it right, you end up with missing events, missing entities, missing effects, and sometimes, completely FUBAR'd games.
Lastly, there is the aspect of online client cheating to consider. With CPL and other frag fests offering cash to winners, this is more important to consider than it used to be. So ok, we've seen the mess that is the Internet, and all the pitfalls, what can we do about them as game developers?
Rats, I knew someone was going to ask that. I thought I was done, check please. But noooo, more stuff to have to type up. Oh well.
Well, the first thing we should do is define the difference between client/server type games and peer to peer games.
Peer to peer involves two or more games talking to each other, each running the game itself and only exchanging input data. This reduces network traffic to a minimum, but brings several other problems to the table, like coping with lost traffic. This is far more important when more than one game is running, since contention occurs over who is correct and who is not. Variance in game play can get very sticky in these situations, as each game must stay synchronized with the others. Additionally, each game must wait for the input from the others before it can simulate the next frame - remember playing DOOM and it would lock up momentarily?
Client/Server involves one machine running the game and dictating to all the clients what the state of play is and what they should be displaying. Effectively the clients become pretty much dumb terminals transmitting the user input to the server, and letting it handle almost everything. They draw the scene the server tells them to display, and play the sounds the server tells them to play. Actually, it's not quite as bad as this, as the server does on occasion tend to offload functionality onto the client, but that's the basic idea.
What I'm going to discuss has more to do with Client/Server type setups than peer to peer since almost all online type games have some degree of Client/Server architecture to them - every game has to have one client that 'hosts' the game and is considered 'correct' in the case of world event contention between peers. (Unless they don't, in which case, you'd just get an "out of synch" error and quit.)
Ok, now on to our problem list - the TCP/IP selection is a no brainer - we don't have to discuss that anymore.
Packet bloating. This one can be tricky. Obviously a max packet size in the code is in order here to stop modem buffer overloading. We here at Raven are actually implementing a floating max packet size, for those people who are running over a local network, or that have large bandwidth available to them. When you hit a packet that breaks your buffer size, the secret is to split the data into two smaller chunks - only send in the first packet what is really necessary to be there that instant. Data like entity movements and so on. Stuff like chat messages can wait till the next packet, since no one is going to miss that being one packet late. Still, tough decisions need to be made as to what's important and what isn't, and sometimes this can make the game feel a little sluggish and un-responsive. This is where the floating packet size can be helpful, since it should remove that feeling from those with large bandwidth or running local games. Not the best solution, but one that's worth a try.
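Here is a rough sketch of the "send what matters now, let the rest wait" idea. The priority numbers, payload strings and `build_packet` name are all invented for this example, and it is not how any particular game does it. Note that a single payload bigger than `max_size` would wait forever in this version, so a real implementation needs a plan for that too.

```python
def build_packet(pending, max_size):
    """Fill one packet by priority; return (packet, leftovers).

    `pending` is a list of (priority, payload) pairs, where a lower
    number is more urgent. Anything that does not fit is handed
    back so it can go out in the next packet.
    """
    packet, leftovers, used = [], [], 0
    for priority, payload in sorted(pending, key=lambda item: item[0]):
        if used + len(payload) <= max_size:
            packet.append(payload)
            used += len(payload)
        else:
            leftovers.append((priority, payload))
    return packet, leftovers

pending = [
    (2, b"chat: gg everyone"),      # can wait a frame
    (0, b"move:ent7:x10y20"),       # needed this instant
    (0, b"move:ent9:x55y02"),
]
# A modem-sized budget: the chat line gets bumped to the next packet.
packet, leftovers = build_packet(pending, max_size=40)
# A LAN-sized budget: everything fits in one go.
lan_packet, lan_leftovers = build_packet(pending, max_size=1400)
```

The floating max packet size is then just a matter of picking `max_size` per client based on what its connection can take.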
Other stuff that's worth thinking about includes tokenizing text messages. If your server is sending a lot of preset text messages, it makes more sense to have these pre-loaded on the client, and just send them a text string reference number rather than the whole string. This reduces our message traffic considerably. The same trick can be played with sending down filenames when the server asks the client to load something. For instance you can break down the file into path names, and then filenames. If you are asking for a bunch of sound files to be loaded, then only send the path once, and from then on, refer to the path as a token in the string. For instance we'll ask the client to load "sound/weapons/death.wav". Once the client receives this string, it will store away the path as a token, and the next time we want a sound, we send "%1pain.wav" and the client knows by the %1 to go away and use that path it got first time to load this sound. Little things, but they all help.
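The path token trick might look something like the sketch below. The class names are hypothetical, and to keep the "%1pain.wav" format unambiguous I've assumed at most nine tokens (a single digit) and that these messages arrive in order and are never lost, otherwise the two sides would disagree about what %1 means.

```python
import posixpath

class PathTokenizer:
    """Server side: swap a directory it has sent before for %N."""

    def __init__(self):
        self.tokens = {}            # directory -> token number (1..9)

    def encode(self, filename: str) -> str:
        directory, name = posixpath.split(filename)
        if directory in self.tokens:
            return f"%{self.tokens[directory]}{name}"
        if directory and len(self.tokens) < 9:
            self.tokens[directory] = len(self.tokens) + 1
        return filename             # first time: send the full path

class PathDetokenizer:
    """Client side: learns directories in the same order as the server."""

    def __init__(self):
        self.paths = []             # index 0 holds the path for %1

    def decode(self, wire: str) -> str:
        if wire.startswith("%"):
            directory = self.paths[int(wire[1]) - 1]
            return posixpath.join(directory, wire[2:])
        directory = posixpath.dirname(wire)
        if directory and directory not in self.paths and len(self.paths) < 9:
            self.paths.append(directory)
        return wire

server, client = PathTokenizer(), PathDetokenizer()
first = server.encode("sound/weapons/death.wav")    # sent in full
second = server.encode("sound/weapons/pain.wav")    # sent as a token
loaded = [client.decode(first), client.decode(second)]
```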
Something else worth considering is reducing the complexity of floating point data. Traditional floating point is 32 bits long - 4 bytes. The question is, do you really need that degree of accuracy? Reducing 32 bits to 16 of floating point is not out of the question; many games do this, but I'll bet you haven't noticed. While we are on that subject, being very sure of the size of the data you need to transmit is also a necessity here. If you are sending a value of between 0 and 170, do you really need a long word to do it? It would fit in a byte, and you've just saved 3 bytes. Obvious when you think about it, but you'd be surprised at how much it gets forgotten about when you are just getting the game working.
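As a small illustration of the size savings, here is one way to squeeze a coordinate into 16 bits. I've used fixed-point quantization (multiply by a scale, store as a signed short), which is just one common approach and not necessarily what any given game does; the scale of 8 (1/8th of a unit of precision) and the function names are assumptions for the example.

```python
import struct

def pack_coord(value: float, scale: float = 8.0) -> bytes:
    """Store a coordinate as a signed 16-bit fixed-point number."""
    quantized = max(-32768, min(32767, round(value * scale)))
    return struct.pack("<h", quantized)

def unpack_coord(data: bytes, scale: float = 8.0) -> float:
    """Reverse pack_coord, accurate to within half a step (1/16 here)."""
    return struct.unpack("<h", data)[0] / scale

full = struct.pack("<f", 1234.56)        # 32-bit float: 4 bytes
small = pack_coord(1234.56)              # fixed point: 2 bytes
restored = unpack_coord(small)           # close to, not exactly, 1234.56

health_long = struct.pack("<I", 170)     # long word: 4 bytes
health_byte = struct.pack("<B", 170)     # one byte holds 0..255
```

The price is range and precision: with a scale of 8, coordinates outside roughly -4096 to +4095 get clamped, so pick the scale to suit the size of your world.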
Only sending objects that have relevance to the scene you are displaying is helpful. Remember, the client is dumb, and doesn't need to know about what's out of the view or hearing threshold. Who cares? They aren't being rendered or heard, so what difference does it make? The server knows about them, and it's running the game, not you. This sucks of course if you are out in the open, or in a space sim, since everything is visible, but that's a game design decision that you make based on your technical abilities.
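A very crude version of this filtering is sketched below, using plain distance as a stand-in for "can be seen or heard". That is my simplification; a real server would more likely use whatever visibility information the level format gives it. The function name and data layout are made up for the example.

```python
import math

def relevant_entities(player_pos, entities, max_range):
    """Return only the entities within max_range of the player.

    `entities` maps an entity id to its (x, y) position.
    """
    return {
        ent_id: pos
        for ent_id, pos in entities.items()
        if math.dist(player_pos, pos) <= max_range
    }

world = {1: (3.0, 4.0), 2: (30.0, 40.0)}
to_send = relevant_entities((0.0, 0.0), world, max_range=10.0)
```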
Further to that, offload special effects. Remember the client is pretty dumb, but it's smart enough to do clever effects for you. There is no reason for the server to be sending all the info on effects to the client, wasting both server time and network space. It's enough that the server says "an explosion happens here" and the client does the rest, superimposing that effect on the main display. We did this in Heretic II, which was the major reason it ran so well on the lower end machines.
Of course the biggest thing you can do to help packet size is to delta-compress info. Without giving away all of our (game developers' that is, not just Raven) technical secrets, the idea here is to only transmit data that has changed from one frame to the next. Simply keep a copy of what you sent last time, and on an object-by-object basis, compare what you want to send this frame with what you sent last, and only transmit that which has changed. Of course this doesn't work when you have a new object to transmit, since it all has to go across. But then if you figure out how often this happens, it comes out to somewhere between 5% and 10% of the time. That's some saving.
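In its simplest form, delta compression is just a field-by-field comparison. The sketch below uses Python dictionaries for an object's state, which is purely for readability; a real game would be comparing packed binary fields, and the field names here are invented.

```python
_MISSING = object()   # marks "this field was not sent last time"

def make_delta(previous: dict, current: dict) -> dict:
    """Return only the fields of `current` that differ from `previous`."""
    return {
        key: value
        for key, value in current.items()
        if previous.get(key, _MISSING) != value
    }

def apply_delta(previous: dict, delta: dict) -> dict:
    """Client side: rebuild the full state from the last one plus a delta."""
    updated = dict(previous)
    updated.update(delta)
    return updated

last_sent = {"x": 100, "y": 50, "angle": 90, "frame": 3}
this_frame = {"x": 104, "y": 50, "angle": 90, "frame": 4}

delta = make_delta(last_sent, this_frame)      # only x and frame changed
rebuilt = apply_delta(last_sent, delta)        # matches this_frame
new_object = make_delta({}, this_frame)        # nothing to compare: send it all
```

Notice that the delta only makes sense if the client really does hold the state it was computed against, which is why lost and out of order packets need careful handling.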
If you want to, you can implement some compression schemes on the resulting packet to make it even smaller, but in these cases the trade off of time to compress on the server and decompress on the client can be worse than having a slightly larger packet.
Frequency - control over this is a must. Quake actually has a server that runs at 10 frames per second, transmitting data over the net at that rate. Actually, it does transmit faster than that when it's doing stuff like downloading client requested files, or responding to server info requests, but during game time, the client expects data at a 10fps rate. It runs at 10fps a) because of the amount of data it is processing for each client, and b) because this is a nice easy network packet rate to sustain.
There, that one was easy.
Out of order and missing packets. The trick here is to only treat one symptom and ignore the other. If you number your game packets (when we talk about packets here, I mean game server frame packets, i.e. the packet that contains a complete frame update from the server) as they go out to the client, the client can know if it gets an out of order packet. The simplest solution is to dump it, and treat it as a missed packet entirely. Doing this is a must if you are dealing with deltaed packets, since the delta values in the packet refer to the frame that came before.
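The "dump anything that arrives late" rule is only a few lines of code. This sketch uses a hypothetical `FrameReceiver` class and ignores sequence number wrap-around, which a real game with a fixed-size counter would have to deal with.

```python
class FrameReceiver:
    """Accepts server frames only if they are newer than the last one."""

    def __init__(self):
        self.last_sequence = -1
        self.dropped = 0

    def accept(self, sequence: int) -> bool:
        """Return True if this frame should be processed."""
        if sequence <= self.last_sequence:
            self.dropped += 1       # late or duplicate: treat as lost
            return False
        self.last_sequence = sequence
        return True

receiver = FrameReceiver()
arrivals = [1, 2, 4, 3, 5]          # frame 3 turns up after frame 4
processed = [seq for seq in arrivals if receiver.accept(seq)]
```

By throwing the late frame away, out of order packets and missing packets become the same problem, so you only have to solve one of them.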
If you keep a copy of the last packet you received from the server on the client, you can compare the latest one you got to it and see if an object has been dropped. At that point, you can either just dump the object immediately, or store it off into a list and check a few packets down to be sure it's still gone, and then dump it. The beauty here is that you never actually have to send a 'remove' function to the client from the server, since by omission from the game packet from the server, the object is gone. Even if you have some dropped packets, it doesn't matter since eventually you will get one and that object will still be missing in the latest packet, and thus it will get deleted. Cool eh?
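Removal by omission, with the "check a few packets down" safety margin, could be sketched like this. The `ClientWorld` class and the `grace_frames` setting are my own names, and for simplicity each frame here is assumed to list every object the client should currently have.

```python
class ClientWorld:
    """Drops objects the server has stopped mentioning."""

    def __init__(self, grace_frames: int = 2):
        # An object is removed once it has been absent for MORE than
        # grace_frames consecutive frames (0 means remove immediately).
        self.grace_frames = grace_frames
        self.entities = {}          # id -> state
        self.missing_for = {}       # id -> consecutive frames absent

    def apply_frame(self, frame: dict) -> None:
        for ent_id, state in frame.items():
            self.entities[ent_id] = state
            self.missing_for.pop(ent_id, None)
        for ent_id in list(self.entities):
            if ent_id in frame:
                continue
            count = self.missing_for.get(ent_id, 0) + 1
            if count > self.grace_frames:
                del self.entities[ent_id]
                self.missing_for.pop(ent_id, None)
            else:
                self.missing_for[ent_id] = count

world = ClientWorld(grace_frames=1)
world.apply_frame({1: "player", 2: "rocket"})
world.apply_frame({1: "player"})    # rocket absent once: kept for now
still_there = 2 in world.entities
world.apply_frame({1: "player"})    # absent twice: dropped
gone = 2 not in world.entities
```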
Now we'll take a moment and talk about client prediction. And what a clever but nasty beast this is. Both when the client misses a packet from the server and in the time between normal game packets (think about it - the server may only be running at 10fps, but that doesn't mean you want the client side representation to), the client needs to be doing something to make it look like it IS still getting data. So we predict the world and events in it. Since we know what's going on with the client's player - after all, we are right there at the input point, right? - we can predict what he is going to do. If he fires a weapon, we can show it on screen, since that's what we know he's going to do. We can also predict - to a lesser degree - what the other players are doing, at least enough to complete any animations they may be in, or, if they are falling, to keep applying gravity to them, and so on. Now of course this only works for a time measured in seconds, but usually that's enough for the packet system to come back on line and start receiving stuff from the server again, at which time the client can correct itself for any events that it predicted wrong. At best, it's totally on target, and you will never have known that you were missing data. In the middle case, the client is a bit out, so it starts correcting via a smoothing operation; that way no one 'snaps' really obviously to a new location. And at worst, you are dead via an attack you didn't even see, since it occurred while you were missing packets. However, there is no way around this situation so it's something that has to be lived with, and it's better than jerky motion and snapping updates.
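A bare-bones sketch of the two halves of this, predicting forward and then smoothing back to what the server says, is below. The function names, the gravity value and the 25% blend factor are all assumptions for illustration; real prediction code replays actual player input and game physics rather than just extending a velocity.

```python
def predict_position(pos, velocity, dt, gravity=-9.8, falling=False):
    """Guess where something will be dt seconds from now.

    Positions and velocities are (x, y) tuples with y pointing up.
    Returns the new position and the (possibly changed) velocity.
    """
    vx, vy = velocity
    if falling:
        vy += gravity * dt          # keep applying gravity while we wait
    return (pos[0] + vx * dt, pos[1] + vy * dt), (vx, vy)

def smooth_correction(predicted, authoritative, blend=0.25):
    """Move part of the way toward the server's position each frame."""
    return tuple(p + (a - p) * blend for p, a in zip(predicted, authoritative))

# No packet this frame: carry the other player forward ourselves.
guess, _ = predict_position((0.0, 0.0), (10.0, 0.0), dt=0.1)

# The server finally says they are really at x=2.0: ease toward it.
corrected = smooth_correction(guess, (2.0, 0.0))
```

Calling `smooth_correction` every client frame closes the gap gradually, which is what stops players from visibly snapping to a new spot.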
However, what do you do if you miss a baseline packet, i.e. one that has a new object in it that wasn't there before? You've missed all the information that came down initially, but you will be getting updates from that point on. Well, to be honest - that's the trick isn't it? I've given away most of the tricks of the trade already, but some must remain. I'll give you a clue though; it is possible to fix a situation like this.
In every type of game there are some packet types that WILL require a guaranteed delivery. So be prepared to create some kind of structure to cope with this, because UDP doesn't. But be sure you don't use it too much or you will end up back with the same problems that TCP/IP has.
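One possible shape for such a structure is sketched here: tag the important messages with an id, remember them until the other side acknowledges them, and resend any that go unanswered for too long. The `ReliableChannel` class, its method names and the 0.2 second timeout are assumptions of mine, and the actual socket sending is left out so the logic can be seen on its own.

```python
class ReliableChannel:
    """Tracks unacknowledged messages and decides when to resend them."""

    def __init__(self, resend_after: float = 0.2):
        self.resend_after = resend_after
        self.next_id = 0
        self.unacked = {}           # message id -> (payload, last send time)

    def send(self, payload: bytes, now: float):
        """Register a must-arrive message; returns what to put on the wire."""
        msg_id = self.next_id
        self.next_id += 1
        self.unacked[msg_id] = (payload, now)
        return msg_id, payload

    def on_ack(self, msg_id: int) -> None:
        """The other side confirmed this message, so forget it."""
        self.unacked.pop(msg_id, None)

    def due_for_resend(self, now: float):
        """Return messages that have waited too long and restart their timers."""
        due = []
        for msg_id, (payload, sent_at) in list(self.unacked.items()):
            if now - sent_at >= self.resend_after:
                self.unacked[msg_id] = (payload, now)
                due.append((msg_id, payload))
        return due

channel = ReliableChannel()
channel.send(b"level change", now=0.0)   # gets id 0
channel.send(b"you died", now=0.0)       # gets id 1
channel.on_ack(0)                        # only the first one was confirmed
early = channel.due_for_resend(now=0.1)
late = channel.due_for_resend(now=0.25)
```

The important difference from TCP is that ordinary game frames keep flowing while a reliable message is waiting on its ACK; only the messages that truly need it pay the price.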
Online Cheating. There are a few ways to try to deal with this, but be warned, what's man-made is man-hackable. This is not so much a big deal at big frag fests since all the matches there are moderated, but it can have an impact on those that qualify for these fests. And of course, it just plain sucks to be playing on a server where someone is unbeatable because they are cheating. Cheating can occur in many ways: modifying the client to never display walls in the game, adding lights or white skins to other players, displaying a local map (if you want to get really ambitious), modifying your aim so it's always dead on other players, or simply firing a weapon at an opponent with deadly aim the moment they are in sight. All of these are hacks to the client end of the game, and when done properly, are pretty un-observable back at the server. There is some stuff you can do, such as checking the accuracy of each player and dumping those that go over a certain scale. You may lose some really good players that way, but it's unlikely that anyone can get over an 80% hit rate all the time. All the checks of the client in the world can really be gotten around since the result of the check has to be returned to the server at some point, and if it's intercepted there and replaced with what the server expects, the server is fooled. Using the result to decrypt the data that comes from the server is possible, but again, it's done on the client side and with enough patience and a good dis-assembler it can be gotten around. Client integrity is the key here, and keeping it is the aim. The Quake 1 & 3 solution, that of a virtual machine, where instead of the client loading the game up the client 'builds' or 'compiles' the game it's going to run via instructions from the server, is a good start, since re-writing someone else's compiler is beyond all but the very best of hackers. God knows, writing it in the first place is a nightmare I wouldn't want to contemplate.
But it is within the bounds of possibility. All the games developer can do is make it as difficult as he possibly can for the budding hacker and be content with that.
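The server-side accuracy check mentioned above is about the simplest of these measures to write. In this sketch the 80% threshold comes from the rule of thumb above, while the minimum shot count (so that someone who lands their first few shots isn't kicked) and the function name are my own additions.

```python
def suspicious_players(stats, max_hit_rate=0.80, min_shots=50):
    """Return the names of players whose hit rate looks too good.

    `stats` maps a player name to (shots_fired, shots_hit).
    """
    flagged = []
    for name, (fired, hit) in stats.items():
        if fired >= min_shots and hit / fired > max_hit_rate:
            flagged.append(name)
    return flagged

match_stats = {
    "alice": (200, 90),     # 45%: a decent human
    "aimbot": (200, 190),   # 95%: nobody is this good all the time
    "newbie": (5, 5),       # 100%, but far too few shots to judge
}
flagged = suspicious_players(match_stats)
```

A check like this runs entirely on the server, so unlike client-side checks there is nothing for the hacker to intercept, but it can only ever catch the blatant cases.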
[size="5"] Last thoughts for Developers
Peter Lincroft, who was involved with X-Wing vs. TIE Fighter, had an article in Game Developer Magazine and did a talk at GDC last year about his experiences with net gaming, and I'd like to reiterate some of his ideas here for completeness' sake.
When testing a game, make sure you find a really horrible ISP to do some real Internet testing. Most games get built and tested to start with on the internal LAN at the developer's offices. This isn't really a fair test, since LANs rarely drop packets and have great PINGs. Find a bad ISP and do some REAL testing. This really works wonders for you later.
Emulate Internet conditions. Stick some code into your code base that emulates packet dropping. Have it settable so you know at what point your game is going to break down - is it 10% drop out or 30%? These are things you should know so you can automatically drop someone from a game if this occurs. Something to bear in mind here is that the Internet doesn't typically just drop one packet. Usually they occur in batches, so don't just dump one at a time, do them several at a time.
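A settable, bursty packet dropper along these lines does not need much code. Everything here (the class name, the parameters, the default burst of 3) is a made-up example rather than anything from a shipping game. Note that `drop_chance` is the chance of a burst starting on any given packet, so the overall loss rate will be higher than that number.

```python
import random

class BurstyLossSimulator:
    """Decides which outgoing packets to 'lose', in batches."""

    def __init__(self, drop_chance: float, burst_length: int = 3, seed=None):
        self.drop_chance = drop_chance      # chance a burst starts
        self.burst_length = burst_length    # packets lost per burst
        self.remaining_in_burst = 0
        self.rng = random.Random(seed)      # seedable for repeatable tests

    def should_drop(self) -> bool:
        """Call once per packet; True means pretend it never arrived."""
        if self.remaining_in_burst > 0:
            self.remaining_in_burst -= 1
            return True
        if self.rng.random() < self.drop_chance:
            self.remaining_in_burst = self.burst_length - 1
            return True
        return False

simulator = BurstyLossSimulator(drop_chance=0.10, burst_length=3, seed=1)
outcomes = [simulator.should_drop() for _ in range(1000)]
loss_rate = sum(outcomes) / len(outcomes)
```

Hook `should_drop()` in just before your send or receive call, crank `drop_chance` up until the game falls over, and you have your breaking point.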
Remember your server is going to be sending out far more data than each client has to worry about. If client messages are around the 2k mark a second, and there are 10 clients, then the server is banging out 10 x 2k a second, which is 20k a second. Be sure that the communications infrastructure you are using is capable of supporting this.
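It is worth actually doing this arithmetic for your own game. The little sketch below works through the 2k-per-client example, and also the flip side for a 28.8 modem at a 10 packets per second rate. The function names are mine, and the modem figure ignores protocol headers, framing and compression, so treat it as an optimistic ceiling.

```python
def server_bytes_per_second(client_bytes_per_second: int, clients: int) -> int:
    """Total outgoing data if every client gets its own stream."""
    return client_bytes_per_second * clients

def max_packet_bytes(link_bits_per_second: int, packets_per_second: int) -> int:
    """Largest packet a link can sustain at a given packet rate."""
    return (link_bits_per_second // 8) // packets_per_second

server_load = server_bytes_per_second(2 * 1024, 10)   # 20480 bytes a second
modem_budget = max_packet_bytes(28_800, 10)           # 360 bytes per packet
```

In other words, a server on a modem is not going to host ten players at 2k each, and a modem client at 10 packets a second has only a few hundred bytes per packet to play with.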
Well there you have it, some ramblings and thoughts on Networking 101 for Games. I've probably made some mistakes, but the gist of it should be sound. Have fun out there, and be amazed it works at all.