Network latency experiment results

Hi All,

This post is purely informational, for those who care or are curious about networking things; there’s nothing in it anyone needs to know.


Over the past few weeks we’ve been conducting some networking experiments to make sure we’re making the right choice when it comes to networking. I figured the results might be of interest to some of you, so here they are:

What we are testing

The online-go.com servers are located in the Google Cloud Platform (GCP) US-East 1 data center. We also use Cloudflare to mitigate denial-of-service attacks and to reduce bandwidth costs.

When it comes to the “real time” features of the site, such as sending and receiving moves in a game, we have the choice of three different “routes” to get those packets to the data center and back to the browser.

  • Via the GCP “standard” network, which is essentially completely publicly routed
  • Via the GCP “premium” network, in which packets from your browser are routed over dedicated Google-operated lines
  • Via the Cloudflare network, in which packets from your browser get routed through the Cloudflare network and are delivered directly to the data center via a peering agreement between Cloudflare and Google.

These options are all priced differently, depending on total bandwidth used, destination country, and whatnot, but roughly speaking GCP standard is $0.085/GB, GCP premium is $0.12/GB, and Cloudflare is $0.04/GB.
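To make those per-GB differences concrete, here is a small TypeScript sketch that plugs in the approximate rates. The function names are hypothetical, and the 10,000 GB monthly volume is a made-up illustration, not OGS's actual traffic; real bills also depend on volume tiers and destination.

```typescript
// Approximate per-GB prices quoted in the post (USD). Real pricing varies
// with total volume and destination country; these are rough figures only.
const PRICE_PER_GB = {
  gcpStandard: 0.085,
  gcpPremium: 0.12,
  cloudflare: 0.04,
};

type Route = keyof typeof PRICE_PER_GB;

// Cost of sending `gigabytes` of traffic over one route.
function costFor(route: Route, gigabytes: number): number {
  return PRICE_PER_GB[route] * gigabytes;
}

// Price of `route` as a fraction of `baseline` (0.5 means half the price).
function priceRatio(route: Route, baseline: Route): number {
  return PRICE_PER_GB[route] / PRICE_PER_GB[baseline];
}

const monthlyGB = 10_000; // hypothetical volume, not a real OGS figure
const routes: Route[] = ["gcpStandard", "gcpPremium", "cloudflare"];
for (const route of routes) {
  console.log(`${route}: $${costFor(route, monthlyGB).toFixed(2)}`);
}
console.log(priceRatio("cloudflare", "gcpPremium").toFixed(2)); // "0.33"
console.log(priceRatio("cloudflare", "gcpStandard").toFixed(2)); // "0.47"
```

At these rates Cloudflare comes out at about a third of the premium price and a little under half of the standard price, whatever the total volume.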

Cost-wise it’s an easy choice to use Cloudflare, and we have been doing so for many years now. However, at some point I began to question whether one of the other options might provide a better experience and be worth the extra cost. To find out, I had every client that connected to OGS establish three WebSocket connections, one along each route. Every 10 seconds, each client sent a “ping” along all three routes at the same time and reported the latency it observed.
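A measurement loop like the one described can be sketched roughly as follows. This is not the actual OGS client code: the route names, the JSON ping/pong message shape, and the `LatencyProbe` class are all hypothetical. The transport is kept abstract (a `send` callback plus an `onMessage` hook) so the timing logic can be exercised without real sockets.

```typescript
// Hypothetical route names; the real OGS endpoints are not given in the post.
type Route = "gcp-standard" | "gcp-premium" | "cloudflare";
const ROUTES: Route[] = ["gcp-standard", "gcp-premium", "cloudflare"];

// Hypothetical wire format: {"type":"ping","id":N} out, {"type":"pong","id":N} back.
interface ProbeMessage {
  type: string;
  id: number;
}

class LatencyProbe {
  private nextId = 0;
  // Send time of each outstanding ping, keyed by "route:id".
  private pending = new Map<string, number>();
  // Observed round-trip times in milliseconds, per route.
  readonly samples: Record<Route, number[]> = {
    "gcp-standard": [],
    "gcp-premium": [],
    cloudflare: [],
  };
  private readonly send: (route: Route, payload: string) => void;
  private readonly now: () => number;

  constructor(send: (route: Route, payload: string) => void, now: () => number = () => Date.now()) {
    this.send = send;
    this.now = now;
  }

  // Fire one ping on every route at (nearly) the same moment.
  pingAll(): void {
    const id = this.nextId++;
    const sentAt = this.now();
    for (const route of ROUTES) {
      this.pending.set(`${route}:${id}`, sentAt);
      const msg: ProbeMessage = { type: "ping", id };
      this.send(route, JSON.stringify(msg));
    }
  }

  // Feed every message received on `route` through here (assumes JSON).
  // Returns the RTT for a matching pong, or undefined for anything else.
  onMessage(route: Route, raw: string): number | undefined {
    const msg = JSON.parse(raw) as ProbeMessage;
    if (msg.type !== "pong") return undefined;
    const key = `${route}:${msg.id}`;
    const sentAt = this.pending.get(key);
    if (sentAt === undefined) return undefined;
    this.pending.delete(key);
    const rtt = this.now() - sentAt;
    this.samples[route].push(rtt);
    return rtt;
  }
}

// Demo with a fake clock and a fake server that answers every ping.
let fakeNow = 0;
const outbox: Array<[Route, string]> = [];
const probe = new LatencyProbe((route, payload) => outbox.push([route, payload]), () => fakeNow);
probe.pingAll();
fakeNow = 42; // pretend the pongs arrive 42 ms later
for (const [route, payload] of outbox) {
  const { id } = JSON.parse(payload) as ProbeMessage;
  probe.onMessage(route, JSON.stringify({ type: "pong", id }));
}
console.log(probe.samples); // one 42 ms sample per route
```

In a browser, `send` would look up the `WebSocket` for the given route and call its `send` method, each socket's `onmessage` handler would pass `event.data` to `probe.onMessage(route, ...)`, and `setInterval(() => probe.pingAll(), 10_000)` would fire a ping along all three routes every 10 seconds. `performance.now()` is a better clock than `Date.now()` there because it is monotonic, and a real client would also expire pings that never get answered so `pending` doesn't grow without bound.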

Results

At a glance, all three networks were pretty similar. Below are maps of the average latency over the past 7 days as seen from each country. Green indicates latencies under 150ms, yellow 150-250ms, orange 250-500ms, and red 500ms or more. Inspecting the minor variations between the maps showed that whenever a country was one color on one map and a different color on another, the latencies were generally within about 10ms of each other; the country just happened to sit near a color boundary. There were no “clear winners” that I saw, regardless of country.
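For anyone curious how a map like that gets colored, here is a rough TypeScript sketch of the per-country averaging and the color bucketing. The function names are hypothetical, and how exact boundary values (150, 250, 500ms) are classified is an assumption, since the legend doesn't say. The last two lines show the “on the border” effect: two routes only a few milliseconds apart can land in different colors.

```typescript
type Color = "green" | "yellow" | "orange" | "red";

// Boundaries from the map legend. Which color an exact boundary value
// (e.g. 250 ms) gets is an assumption; here it goes to the slower bucket.
const THRESHOLDS_MS = [150, 250, 500];

function colorFor(latencyMs: number): Color {
  if (latencyMs < 150) return "green";
  if (latencyMs < 250) return "yellow";
  if (latencyMs < 500) return "orange";
  return "red";
}

// True when a latency sits within `marginMs` of a color boundary, where two
// routes only a few ms apart can end up with different colors.
function nearBoundary(latencyMs: number, marginMs = 10): boolean {
  return THRESHOLDS_MS.some((t) => Math.abs(latencyMs - t) <= marginMs);
}

interface Sample {
  country: string;
  latencyMs: number;
}

// Mean latency per country over all samples (e.g. a 7-day window).
function averageByCountry(samples: Sample[]): Map<string, number> {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const s of samples) {
    const acc = totals.get(s.country) || { sum: 0, count: 0 };
    acc.sum += s.latencyMs;
    acc.count += 1;
    totals.set(s.country, acc);
  }
  const means = new Map<string, number>();
  totals.forEach(({ sum, count }, country) => means.set(country, sum / count));
  return means;
}

// Two routes about 7 ms apart straddle the 250 ms line:
console.log(colorFor(245), colorFor(252)); // yellow orange
console.log(nearBoundary(245), nearBoundary(252)); // true true
```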

I also inspected the 20th percentile, median, and 95th percentile latencies for a few representative countries.
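The post doesn't say exactly how those percentiles were computed. One common approach, sketched here in TypeScript with hypothetical function names, sorts the samples and linearly interpolates between neighboring ranks (the same convention NumPy uses by default). Looking at the 20th and 95th percentiles alongside the median shows both the typical case and the slow tail, which an average alone can hide.

```typescript
// Percentile with linear interpolation between the closest ranks (the same
// convention as NumPy's default). p is in [0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of an empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

interface LatencySummary {
  p20: number;
  median: number;
  p95: number;
}

function summarize(latenciesMs: number[]): LatencySummary {
  return {
    p20: percentile(latenciesMs, 20),
    median: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
  };
}

// Made-up samples for one country and route, not real OGS data.
console.log(summarize([120, 95, 101, 98, 340, 99, 104, 97, 110, 100]));
```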

The conclusion I drew from all of this was that it didn’t matter much. All of the times were within a few milliseconds of each other, though both Cloudflare and GCP premium did tend to shave a few milliseconds off the standard network’s times, particularly across the Pacific. Given that going through Cloudflare costs roughly 1/3 as much as the GCP premium network and roughly 1/2 as much as the standard network, Cloudflare remains the obvious choice for our WebSocket traffic (as well as all of our other traffic).

Anyways, I guess I was somewhat surprised that they were all so close, although given the entities involved I suppose I shouldn’t have expected anything different. However, I’m happy to have conducted the experiment, so we know for sure that using Cloudflare for all of our traffic is a good choice.


P.S. The other thing I attempted to measure was connection drops and reconnect events per route. However, it turns out that naively recording whether a client is reconnecting or connecting for the first time is useless for this. Every time a laptop is put to sleep, a mobile browser is put into the background, or in some cases a tab is left in the background, the client gets disconnected and then reconnects when the device wakes up or the browser or tab is brought back to the foreground. So the only conclusion I can really draw is that if there was a difference in connection stability between the three routes, it was lost in the noise of reconnects caused at the device level. Anecdotally, monitoring my own device (which never sleeps), I did see the standard connection reset a few times, but overall all three were quite stable. I have no reason to believe there’s a substantial difference between Cloudflare and GCP premium in terms of stability, but my hunch is that both offer a little more stability than the standard network.

Even though this has no material effect on us, it is delightful to read. Thank you for sharing, and certainly don’t hesitate to share things you’re working on.

interesting read.
surprising there is so little activity in North America compared to the rest of the world, and that’s where the servers are located.
good to see people from all over the globe though.
so happy that the error submitting bug is getting resolved, that was super annoying.
latency makes or breaks the experience of playing, and ogs is a pretty good experience compared to other servers.

Definitely a good read. It is always good to have a look into the kitchen.
I hope you keep giving us these little glimpses, from time to time, into what actually makes OGS tick.

Where did you read this? The two graphics show latency per country, so if there are few dots, it’s only because there are few countries (not because of low traffic).

Yep the majority of our players are either in the US or EU.

That chart might be a bit confusing because the size of the circles corresponds to the latency, not number of samples like one might intuit.

:rofl: that makes perfect sense now,

it looked like there was a single OGS user in the USA, somewhere in North Texas :laughing:

Here’s one scaled by samples, which roughly correlates to relative number of active players. If you sum up all of the countries in Europe, it’s about the same as the US, it’s just kinda hard to see in this since it registers each country in the EU separately.

But why does Sweden have 2 dots…? xD

At least no country has fewer than 0 OGS users; that’s a positive thing ^___^

That’s Åland :slight_smile:

Ahh, makes sense. Indeed it’s better to have 2 dots for Finland than for Sweden :wink:

Tho I still don’t believe that over 10% of Ålanders are actively using OGS; I guess it’s their weird server farms or something ^^

Hah, no, that’s probably somewhere between 1 and 10 players. Those are ping sample counts (pings were sent every 10 seconds while the player(s) were connected) over the course of a single day a couple of weeks ago, not unique players.
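Because each sample is one ping sent every 10 seconds, a sample count converts directly into connection time, which helps explain how a handful of players can still add up to a noticeable number of samples. A minimal TypeScript sketch with hypothetical names; whether the per-country counts are per route or summed across all three routes isn't stated, so that is left as an explicit assumption in the code.

```typescript
// Pings go out every 10 seconds while a client is connected (from the post).
const PING_INTERVAL_S = 10;

// Convert a ping sample count into total client-hours of connection time.
// Assumes each sample is one ping on one route; if the count sums all three
// routes, divide the result by 3.
function connectedHours(sampleCount: number, intervalS = PING_INTERVAL_S): number {
  return (sampleCount * intervalS) / 3600;
}

// One client connected for a full day sends this many pings per route:
console.log((24 * 3600) / PING_INTERVAL_S); // 8640
console.log(connectedHours(8640)); // 24
```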
