|Server Performance Improvements are Live|
April 13, 2018
I spent quite a bit of time this week on server performance.
The old database engine, the amazingly fast and compact KISSDB, was not designed for an ever-growing data set where the newest data are accessed more than the oldest.
As players continue exploring new areas of the map, the data from older areas becomes less relevant, but that is the data that is the fastest to access in KISSDB. In fact, we were constantly wading through that old data to get to the latest stuff, which essentially ended up at the end of the list in KISSDB's append-only data structure. It got slower and slower as the data got bigger and bigger.
This drop in performance is expected when a hash table fills up, and thus KISSDB documentation recommends a table that's "large enough" for the expected data.
But the expected data in this case is unbounded. We cannot pick an appropriate size, because the data will keep growing, and we don't want performance to degrade as that happens.
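To make the slowdown concrete, here is a toy in-memory model of an append-only chain of fixed-size hash table pages. It is only a sketch of the access pattern described above, not KISSDB's actual code or file format. The class name `PagedTable`, the `slots_per_page` parameter, and the probe counter are all made up for illustration.

```python
class PagedTable:
    """Toy model: a chain of fixed-size hash table pages, oldest page first.

    Hypothetical illustration only; not KISSDB's real implementation.
    """

    def __init__(self, slots_per_page):
        self.slots_per_page = slots_per_page
        self.pages = []  # each page is a list holding (key, value) or None

    def _slot(self, key):
        return hash(key) % self.slots_per_page

    def put(self, key, value):
        slot = self._slot(key)
        # Walk the pages from oldest to newest looking for a usable slot.
        for page in self.pages:
            entry = page[slot]
            if entry is None or entry[0] == key:
                page[slot] = (key, value)
                return
        # Every existing page is occupied at this slot: append a new page.
        page = [None] * self.slots_per_page
        page[slot] = (key, value)
        self.pages.append(page)

    def get(self, key):
        """Return (value, pages_probed); value is None if the key is absent."""
        slot = self._slot(key)
        for probes, page in enumerate(self.pages, start=1):
            entry = page[slot]
            if entry is None:
                return None, probes
            if entry[0] == key:
                return entry[1], probes
        return None, len(self.pages)


table = PagedTable(slots_per_page=4)
for k in range(40):          # keys arrive oldest to newest
    table.put(k, k * k)

oldest = table.get(0)        # found in the very first page
newest = table.get(39)       # found only after walking every page
```

With only 4 slots per page and 40 keys, the table grows to 10 pages. The oldest key is found after 1 probe, while the newest key needs 10, and that gap keeps widening as more data is appended.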
A stack-based hash table is much better suited for this usage pattern. The latest and most important stuff can remain at the top for fast access. So I wrote a new database engine from scratch on Monday and Tuesday. It helped a lot.
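The idea of keeping the latest data on top can be sketched with a hash table whose buckets are stacks. This is a minimal in-memory illustration, not the new engine itself, which is disk based. It assumes that both newly written and recently read entries move to the top of their bucket, which is my reading of "remain at the top" rather than a documented detail. `StackTable` and `num_buckets` are hypothetical names.

```python
class StackTable:
    """Toy model: each hash bucket is a stack with the newest entry on top.

    Hypothetical illustration only; the real engine works on disk.
    """

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        # Remove any older copy of this key, then push the entry on top.
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                break
        bucket.insert(0, (key, value))

    def get(self, key):
        """Return (value, entries_probed); value is None if the key is absent."""
        bucket = self._bucket(key)
        for i, (k, v) in enumerate(bucket):
            if k == key:
                # Move the entry that was just read to the top of the stack.
                bucket.insert(0, bucket.pop(i))
                return v, i + 1
        return None, len(bucket)


table = StackTable(num_buckets=4)
for k in range(40):              # keys arrive oldest to newest
    table.put(k, k * k)

newest = table.get(39)           # sits on top of its bucket
old_first = table.get(0)         # buried under 9 newer entries
old_again = table.get(0)         # now on top after the first read
```

Here the newest key is found in 1 probe no matter how much old data sits below it. An old key costs 10 probes the first time it is read, then 1 probe afterwards, because reading it moved it back to the top.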
The stack-based implementation that I came up with (thanks Chard for all the thoughtful discussion along the way) is 7x faster on average and even uses a bit less disk space (6% less). But more interestingly, it's entirely disk based, using almost no RAM. 13,000x less RAM than KISSDB on a test data set, in fact. KISSDB holds part of the data structure in RAM for performance, and that RAM usage grows as the data grows, but the stack is so much faster for accessing recent data that it doesn't matter: we can do it all via disk accesses.
The stack database actually has a flat RAM profile regardless of how big the data grows, and CPU usage on recently-used data is flat as well, regardless of how big the entire data set (including old, less-used data) gets.
The impact on server CPU usage is quite remarkable, as can be seen in this before-and-after graph (with the same 40 players on server1 the whole time). The new database went live at the 10:00 mark:
I also did some live profiling with Valgrind and found a few more hotspots that could benefit from RAM-based caching of procedurally-generated map data. And since the database now uses almost no RAM, we have RAM to spare for this kind of caching.
Where the server RAM usage used to grow to 300 MiB or more as the map data grew, it now sits steady at only 17 MiB. Yes, that's 17 MiB of RAM total for hosting 40 active players.
What does this mean? First of all, it means the servers are finally lag-free, assuming that you're not experiencing true network lag.
Second, it means we can finally have more players on each server. I'll be upping the number gradually over time and keeping an eye on lag and performance. I expect we can easily get at least 80 on each server, and maybe quite a few more than that.