One Hour One Life Forums

a multiplayer game of parenting and civilization building


#1 2018-07-23 19:19:30

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,804

Welp, found the actual source of lag

I was being a little stupid with the way I was profiling last week, so I was essentially feeling the toe of the elephant.  Some logging had shown that MOVE message processing was a bit slow, so I optimized that a ton (and profiled only that part of the code).  Afterwards, server1 was still slow, though.

After that, I noticed that server3, 4, etc. were all doing just fine, but that server1 and 2 were getting slow.  These had the largest map databases.

Doing a full profile of server1 today, with and without a big map database, showed that the large databases are indeed the main bottleneck.  DB_GET operations are 66x slower with the "full" server1 db as compared with an empty database running on the same hardware, for a test case where I'm walking around an unexplored part of the map solo.

The database has extensive caching in place, and it's also a stack, so "getting full" shouldn't slow it down in general.... except for the case of a NULL result.  I.e., "a map value for (x,y) does not exist in the database."

The database is a hash table.  (x,y) is run through a hash function that turns it into a bin number.  Each bin holds a stack of data items, with the most-recently-used items on top of the stack.   There are about 1.6 x 10^19 (a 1 followed by 19 zeros) possible map cells, so a whole bunch of (x,y) values map to the same bin in the hash table---a hash collision.
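
As a rough illustration of that bin mapping (a hypothetical sketch in C; STACKDB's actual hash function and file layout differ):

    #include <stdint.h>

    #define NUM_BINS 80000

    /* Pack the two signed 32-bit coordinates into one 64-bit key, mix it,
       and reduce it to a bin number.  Many different (x,y) pairs land in
       the same bin, which is the hash collision described above. */
    static uint32_t binForCell( int32_t x, int32_t y ) {
        uint64_t key = ( (uint64_t)(uint32_t)x << 32 ) | (uint32_t)y;

        /* simple multiplicative mixing, just for illustration */
        key ^= key >> 33;
        key *= 0xff51afd7ed558ccdULL;
        key ^= key >> 33;

        return (uint32_t)( key % NUM_BINS );
        }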

The old KISSDB was not a stack, but instead put newer items at the end of the list in the bin, meaning that fetching them had to walk through the full list for that bin each time.  I rewrote it as STACKDB, which keeps a stack in each bin instead.  This resulted in a dramatic speed-up, because recently-seen data was the fastest to access.

HOWEVER, none of this changes the speed of determining the NULL result.  If no value for (x,y) exists in the database, you still have to walk through the whole stack in the bin to find that out.  The stack doesn't solve this problem.  There's one little mini-optimization in there, where the most recent NULL result is remembered at the top of the stack, and there's some RAM caching in front of the database to speed up recently-accessed values in general (including NULL results).  BUT if we're exploring new parts of the map, none of this helps, because those cells have not been recently accessed.
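
To make the cost of a miss concrete, here is a minimal in-memory sketch of that per-bin stack (hypothetical types, reusing binForCell and NUM_BINS from the sketch above; the real STACKDB walks records in the disk file, but the shape of the walk is the same):

    /* Each bin holds a singly linked stack, most-recently-used record on top. */
    typedef struct Record {
        int32_t x, y;
        int value;
        struct Record *next;
        } Record;

    static Record *bins[ NUM_BINS ];

    /* Returns the record for (x,y), or NULL if it isn't in the database.
       A hit can return early, and hot records sit near the top of the stack,
       but a miss is only known after walking every record in the bin. */
    static Record *dbGet( int32_t x, int32_t y ) {
        Record *r = bins[ binForCell( x, y ) ];
        while( r != NULL ) {
            if( r->x == x && r->y == y ) {
                return r;
                }
            r = r->next;
            }
        return NULL;
        }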

The hash tables currently have 80,000 bins, but on server1 the map database contains something like 12,000,000 entries.  That's roughly 152 entries per bin---a lot of data to wade through just to find out that what you're looking for isn't there.

One solution is to make the hash table bigger.  Making it 2x bigger should speed things up by a factor of 2; 20x bigger should speed things up by a factor of 20.  The overhead for the map hash table on disk is 24 bytes per bin, or about 1.9 MB currently.  Taking it up to 38 MB (20x bigger) might not be such a bad idea.  Even 2x or 4x bigger would help a lot.
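
For reference, the overhead arithmetic works out to:

    80,000 bins x 24 bytes/bin            =  1,920,000 bytes  (about 1.9 MB)
    20x:  1,600,000 bins x 24 bytes/bin   = 38,400,000 bytes  (about 38 MB)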


But in general, the problem will still remain:  as the database gets bigger, access times for the NULL result will get slower.  A 2x bigger database will make these operations 2x slower, etc.

Offline

#2 2018-07-23 19:32:15

Chard
Moderator
Registered: 2018-03-04
Posts: 125

Re: Welp, found the actual source of lag

If we're exploring a new part of the map, could we perhaps load a nearby neighbourhood of cells and put them in the DB on the assumption that they'll likely be needed soon, if not immediately? It sounds like that might reduce the number of NULL lookups.

Edit: And would this outweigh the cost of pushing things onto the top of other bins that maybe don't end up getting used?

Offline

#3 2018-07-23 19:58:19

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,804

Re: Welp, found the actual source of lag

The problem here is that the database is insert-only (no delete), so we don't necessarily want to store whole swaths of the map in the database.  People walk around a lot.  Currently, the database only stores sparse results for the cells they actually touch or change.  In general, I think that a smaller database is better.  I think the sparse, cherry-picked storage of info about cells is the right choice here.

But you're right that the side effect of this is that MOST cells aren't in the database, and we still need to ask the database a question about them (is anything man-made here?) before we go ahead and proc-gen them as-needed.

It's interesting that so many people talk about hash tables being O(1) average case, when in fact they are actually O(n) average case, if they are of fixed size.  Here we have O( n / 80000 ), which is still O(n).  Linear growth in execution time as n grows.

It seems like some adjunct data structure for indexing would help a lot here, maybe even just to deal with the NULL result, which is so common.

Or maybe a hash table is ill-suited to this application, and the other data structure could replace it entirely.

Offline

#4 2018-07-23 20:23:27

Chard
Moderator
Registered: 2018-03-04
Posts: 125

Re: Welp, found the actual source of lag

If we've explored N tiles, then the boundary is proportional to sqrt(N), so we end up expanding the database size by 5 k sqrt(N) tiles or so. If exploration were perfectly circular, then k = 2 sqrt(pi). That's the best case, and we get some of that, since villages are somewhat round. The worst case would be a long, thin rectangle; there's probably some of that going on too, as people run in straight lines. In the very worst case we'd about double the number of tiles in the database.

Offline

#5 2018-07-23 20:27:36

sc0rp
Member
Registered: 2018-05-25
Posts: 740

Re: Welp, found the actual source of lag

jasonrohrer wrote:

Doing a full profile of server1 today, with and without a big map database, showed that the large databases are indeed the main bottleneck.  DB_GET operations are 66x slower with the "full" server1 db as compared with an empty database running on the same hardware, for a test case where I'm walking around an unexplored part of the map solo.

So the things I suggested blindly would have fixed the problem.  This one solves it completely:

1. Get rid of the long random-access chains in the db.  Linear probing, with doubling of the table size when the load gets high, should work fine.  As a side effect, it will make it possible to run read requests in parallel (necessary for point 3).

And this one would reduce lookups hundredfold:

4. Exploit spatial locality of requests.  Instead of having multiple records per tile (biome, items, subitems, etc.), clump all the information about the tiles and items in an 8x8 or 16x16 block into one record.  For some small write amplification, it would give you a two-to-three order of magnitude speed-up on reading map data (see the sketch below).

I think this one will also be necessary later on to get to sub-ms responses:

3. Run a separate thread that prefetches map data for players.  You send 32x30 tiles, so the prefetch thread should pre-cache the map in all directions, something like 96x90.  When a player moves in any direction, the tile and item data will already be in RAM.  Special-case Eve spawns to precache all the data before she's born.
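
A hypothetical sketch of the block-key idea from suggestion 4 (the names and the 16x16 size are just for illustration): every tile in the same 16x16 block maps to the same database key, so one record read covers the whole neighbourhood.

    #include <stdint.h>

    #define BLOCK_SIZE 16

    typedef struct {
        int32_t blockX, blockY;
        } BlockKey;

    /* Floor division, so that negative coordinates group correctly
       (plain / truncates toward zero in C). */
    static int32_t floorDiv( int32_t a, int32_t b ) {
        int32_t q = a / b;
        if( a % b != 0 && ( a < 0 ) != ( b < 0 ) ) {
            q--;
            }
        return q;
        }

    /* All 256 tiles of a 16x16 block share one key, and thus one record. */
    static BlockKey blockKeyForCell( int32_t x, int32_t y ) {
        BlockKey k;
        k.blockX = floorDiv( x, BLOCK_SIZE );
        k.blockY = floorDiv( y, BLOCK_SIZE );
        return k;
        }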

jasonrohrer wrote:

But in general, the problem will still remain:  as the database gets bigger, access times for the NULL result will get slower.  A 2x bigger database will make these operations 2x slower, etc.

That's what you need:
Linear Hashing with Overflow-Handling by Linear Probing
http://www.cs.tau.ac.il/~shanir/advance … larson.pdf

Note that if you keep the load <50% (and for a real-time app I would recommend doing so), a significantly simpler algorithm will do.  You can get rid of the multiple passes/sweeps.  They are only needed if you want to save space and keep the load at 80-85%, but that costs a few extra lookups for each get and insert, so it's a bad tradeoff for a real-time app.  Disk space is cheap.

jasonrohrer wrote:

It's interesting that so many people talk about hash tables being O(1) average case, when in fact they are actually O(n) average case, if they are of fixed size.  Here we have O( n / 80000 ), which is still O(n).  Linear growth in execution time as n grows.

Hash tables are O(1) whp (with high probability).  At this point you have a bunch of linked lists, not a hash table.  Hash tables work fine with load up to 50%-85% (depending on how you handle collisions).  You are running yours at a load of 15200%, so it has degenerated into linked lists.

Last edited by sc0rp (2018-07-23 20:38:50)

Offline

#6 2018-07-23 20:45:17

Chard
Moderator
Registered: 2018-03-04
Posts: 125

Re: Welp, found the actual source of lag

sc0rp wrote:

3. Run a separate thread that prefetches map data for players.  You send 32x30 tiles, so the prefetch thread should pre-cache the map in all directions, something like 96x90.  When a player moves in any direction, the tile and item data will already be in RAM.  Special-case Eve spawns to precache all the data before she's born.

The servers are currently single-core, I believe, so this would need extra fixing.

sc0rp wrote:

Hash tables are O(1) whp.  At this point you have a bunch of linked lists, not a hash table.  Hash tables work fine with load up to 50%-85% (depending on how you handle collisions).  You are running yours at a load of 15200%, so it has degenerated into linked lists.

O(1) for small N sounds like a bit of a cheat to me. Although we are only discussing compute complexity here not space complexity.

Offline

#7 2018-07-23 20:50:30

sc0rp
Member
Registered: 2018-05-25
Posts: 740

Re: Welp, found the actual source of lag

Chard wrote:
sc0rp wrote:

3. Run a separate thread that prefetches map data for players.  You send 32x30 tiles, so the prefetch thread should pre-cache the map in all directions, something like 96x90.  When a player moves in any direction, the tile and item data will already be in RAM.  Special-case Eve spawns to precache all the data before she's born.

The servers are currently single-core, I believe, so this would need extra fixing.

Only a mutex or two around the player list and map cache.  The thread is only needed to get around blocking I/O, so it's not CPU-bound.

Chard wrote:
sc0rp wrote:

Hash tables are O(1) whp.  At this point you have a bunch of linked lists, not a hash table.  Hash tables work fine with load up to 50%-85% (depending on how you handle collisions).  You are running yours at a load of 15200%, so it has degenerated into linked lists.

O(1) for small N sounds like a bit of a cheat to me.

What do you mean by small N?

Chard wrote:

Although we are only discussing compute complexity here not space complexity.

Compute for a proper hash table is O(1) whp; space is O(N) (in practice 2N is enough).

Last edited by sc0rp (2018-07-23 20:51:49)

Offline

#8 2018-07-23 20:56:00

sc0rp
Member
Registered: 2018-05-25
Posts: 740

Re: Welp, found the actual source of lag

BTW, over the weekend I wrote most of an implementation of Linear Hashing (KISSDB API compatible).  If you're interested, I may contribute it once I have unit tests with good coverage.  It does keep gets (including misses) and inserts O(1), no matter how much data you put in.

Offline

#9 2018-07-23 21:03:08

Chard
Moderator
Registered: 2018-03-04
Posts: 125

Re: Welp, found the actual source of lag

sc0rp wrote:

What do you mean by small N?

It was a joke. There was no N in O(1). Saying that the hash tables are O(1) if you keep them small is kinda like saying that they are O(1) if N is small. The logical extension of that solves a whole bunch of complexity problems, of course. O(1) factorisation of 1-digit numbers? People don't get my sense of humour. Anyway, as I followed up, the key issue is one of space/compute tradeoffs and their relative complexities, which I think is well understood and which Jason mentioned as a workaround in his original post.

Offline

#10 2018-07-23 21:08:01

Kzapp
New User
Registered: 2018-07-23
Posts: 3

Re: Welp, found the actual source of lag

The typical answer to this problem for an in-memory hash table is to grow the table and re-hash all the elements when it gets too full.  Could you do something similar during the on-startup sweep that removes stale map entries?  If you automatically grew it to 2-4x the number of actual entries, that should keep it reasonably sparse until it's restarted again, without requiring huge amounts of overhead on quieter servers.
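
A minimal in-memory sketch of that grow-and-rehash step (hypothetical names and layout, not the game's actual startup code): pick a bin count a few times larger than the live entry count and push every surviving record into its new bin.

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct GrowRecord {
        int32_t x, y;
        struct GrowRecord *next;
        } GrowRecord;

    /* same kind of coordinate hash as before, but parameterized by bin count */
    static uint32_t binFor( int32_t x, int32_t y, uint32_t numBins ) {
        uint64_t key = ( (uint64_t)(uint32_t)x << 32 ) | (uint32_t)y;
        key ^= key >> 33;
        key *= 0xff51afd7ed558ccdULL;
        key ^= key >> 33;
        return (uint32_t)( key % numBins );
        }

    /* Rebuild with roughly 4 bins per live entry so the table stays sparse
       until the next restart.  Order within a bin is not preserved here. */
    static GrowRecord **rehashAll( GrowRecord **oldBins, uint32_t oldNumBins,
                                   uint32_t liveEntryCount,
                                   uint32_t *newNumBinsOut ) {
        uint32_t newNumBins = 4 * liveEntryCount;
        GrowRecord **newBins = calloc( newNumBins, sizeof( GrowRecord * ) );

        for( uint32_t b = 0; b < oldNumBins; b++ ) {
            GrowRecord *r = oldBins[ b ];
            while( r != NULL ) {
                GrowRecord *next = r->next;
                uint32_t nb = binFor( r->x, r->y, newNumBins );
                r->next = newBins[ nb ];
                newBins[ nb ] = r;
                r = next;
                }
            }
        *newNumBinsOut = newNumBins;
        return newBins;
        }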

Offline

#11 2018-07-23 21:14:35

sc0rp
Member
Registered: 2018-05-25
Posts: 740

Re: Welp, found the actual source of lag

Chard wrote:

It was a joke. There was no N in O(1). Saying that the hash tables are O(1) if you keep them small is kinda like saying that they are O(1) if N is small.

If you keep the LOAD small (as in 50%-85%).  You cannot go over 100% anyway without some additional data structure tacked on, as you need to store the items somewhere.  There is no limit to the size of a hash table, and you can scale it easily up and down with inserts and deletes.  There is also no additional space required for collisions if you use the right method for handling them (e.g. double hashing, linear probing).
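
For illustration, here is what collision handling with linear probing looks like in an open-addressing table (a hypothetical slot layout, ignoring deletions/tombstones): collisions spill into the next slots of the same array, so there are no chains and no extra allocations, and at <50% load the expected number of probes stays small.

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        bool used;
        int32_t x, y;
        int value;
        } Slot;

    /* Start at the home slot and scan forward until the key is found or an
       empty slot proves it is absent. */
    static bool probeGet( const Slot *table, uint32_t numSlots,
                          uint32_t homeSlot, int32_t x, int32_t y,
                          int *valueOut ) {
        for( uint32_t i = 0; i < numSlots; i++ ) {
            const Slot *s = &table[ ( homeSlot + i ) % numSlots ];
            if( ! s->used ) {
                return false;      /* empty slot: key definitely not present */
                }
            if( s->x == x && s->y == y ) {
                *valueOut = s->value;
                return true;
                }
            }
        return false;              /* table completely full and key not found */
        }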

Chard wrote:

Anyway, as I followed up, the key issue is one of space/compute tradeoffs and their relative complexities, which I think is well understood and which Jason mentioned as a workaround in his original post.

There are no such tradeoffs if you choose the right algorithm.  Scale the hash table itself; don't append long collision chains to it - that degrades into a linked list.  With the right approach you have O(1) time whp and O(N) space, no matter how big your dataset is.

Last edited by sc0rp (2018-07-23 21:15:05)

Offline

#12 2018-07-23 21:21:51

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,804

Re: Welp, found the actual source of lag

sc0rp, none of the profiling has yet demonstrated that random access of spots on the disk is the main bottleneck.  I.e., that cache misses are the main problem.

Instead, we are marching through 152 (on average) entries of a linked list to determine that a value is not present.  That's a problem whether the data access is sequential or random.

That said, the basic linear hashing algorithm described here does sound like an elegant way to handle an ever-growing data set:

https://en.wikipedia.org/wiki/Linear_hashing

Offline

#13 2018-07-23 21:22:43

Chard
Moderator
Registered: 2018-03-04
Posts: 125

Re: Welp, found the actual source of lag

sc0rp wrote:
Chard wrote:

It was a joke. There was no N in O(1). Saying that the hash tables are O(1) if you keep them small is kinda like saying that they are O(1) if N is small.

If you keep the LOAD small (as in 50%-85%).  You cannot go over 100% anyway without some additional data structure tacked on, as you need to store the items somewhere.  There is no limit to the size of a hash table, and you can scale it easily up and down with inserts and deletes.  There is also no additional space required for collisions if you use the right method for handling them (e.g. double hashing, linear probing).

Chard wrote:

Anyway, as I followed up, the key issue is one of space/compute tradeoffs and their relative complexities, which I think is well understood and which Jason mentioned as a workaround in his original post.

There are no such tradeoffs if you choose the right algorithm.  Scale the hash table itself; don't append long collision chains to it - that degrades into a linked list.  With the right approach you have O(1) time whp and O(N) space, no matter how big your dataset is.

It is clear to me that we aren't communicating well here. I'm not a computer scientist or an algorithm specialist. I'm a mathematician and I was making a joke about an ambiguity in the choice of language surrounding complexity. Clearly it didn't land, so I'm dropping this. smile

Offline

#14 2018-07-23 21:40:19

sc0rp
Member
Registered: 2018-05-25
Posts: 740

Re: Welp, found the actual source of lag

jasonrohrer wrote:

sc0rp, none of the profiling has yet demonstrated that random access of spots on the disk is the main bottleneck.  I.e., that cache misses are the main problem.

Did you clear the disk cache between profiler runs?  It may not be a problem now with a warm cache, but it will be if the db size grows larger than RAM.  You'll hit a very steep performance cliff when the WDS gets larger than RAM.

And later on, if you want to optimize responses to below 1 ms, you will hit this bottleneck even with a few I/O requests per user request.  But that's a problem for future Jason wink

jasonrohrer wrote:

Instead, we are marching through 152 (on average) entries of a linked list to determine that a value is not present.  That's a problem whether the data access is sequential or random.

That said, the basic linear hashing algorithm described here does sound like an elegant way to handle an ever-growing data set:

https://en.wikipedia.org/wiki/Linear_hashing

That's mostly what I implemented, with two important improvements from the paper I quoted - linear probing instead of chains, and extending the hash table on overflow in the last bucket instead of wrapping around.

Last edited by sc0rp (2018-07-23 21:48:27)

Offline

#15 2018-07-23 22:08:21

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,804

Re: Welp, found the actual source of lag

Using the naming conventions from this article:

https://en.wikipedia.org/wiki/Linear_hashing

So, instead of linear probing or chains to handle collisions, why not always handle any collision by adding a bucket at the end and rehashing the bucket pointed to by S?  Then try inserting again; if there's still a collision, add another bucket, and so on?

Otherwise, you need some other metric to determine when to add a bucket (how long is the list in this bucket, or how many failed linear probing steps did I execute?).

Seems like a collision is a good enough signal that a new bucket is needed, and the new bucket operation is cheap enough that we don't need to do it sparingly (it's not like we're doubling the table at the first sign of a collision, we're just adding one new bucket).

Offline

#16 2018-07-23 22:09:51

sc0rp
Member
Registered: 2018-05-25
Posts: 740

Re: Welp, found the actual source of lag

Chard wrote:

It is clear to me that we aren't communicating well here. I'm not a computer scientist or an algorithm specialist.

I didn't know that.

Chard wrote:

I'm a mathematician and I was making a joke about an ambiguity in the choice of language surrounding complexity. Clearly it didn't land, so I'm dropping this. smile

It did land.  But I thought you were a programmer, and your assumptions about this data structure were so off that I felt more inclined to correct them than to go along with the joke.

Offline

#17 2018-07-23 22:27:43

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,804

Re: Welp, found the actual source of lag

I think Chard's point is that it's kinda dumb to get excited about O(1) average case for hash tables, which makes them look better than O(log n) trees and such, because they can't really be compared to dynamic data structures.  In this application, a tree would have been way better, because log of any imaginable number is way less than 152.

I mean, yeah, if you pre-allocate an array that is 2x as large as your entire data set, you get O(1) performance with a hash table...  but...  at the end of the day, you have an array.  It's really O(n) under the hood.

Linear hashing may deal with some of these issues, but it's a pretty obscure algorithm, relatively speaking.

There's a prevailing "when in doubt, hash table" sentiment out there....

Offline

#18 2018-07-23 23:02:48

sc0rp
Member
Registered: 2018-05-25
Posts: 740

Re: Welp, found the actual source of lag

jasonrohrer wrote:

I think Chard's point is that it's kinda dumb to get excited about O(1) average case for hash tables, which makes them look better than O(log n) trees and such, because they can't really be compared to dynamic data structures.  In this application, a tree would have been way better, because log of any imaginable number is way less than 152.

They are always faster and more space-efficient than trees.  You use trees only if you want in-order iteration.

The wrong assumption you're making is that you should keep the hash table size static.  In practice you do that only if you know in advance how big your dataset will be.  For every other use case, if the load gets too high (say >50%), you double the size and rehash; if it gets too low (say <25%), you halve the size and rehash.

jasonrohrer wrote:

I mean, yeah, if you pre-allocate an array that is 2x as large as your entire data set, you get O(1) performance with a hash table...  but...  at the end of the day, you have an array.  It's really O(n) under the hood.

Linear hashing may deal with some of these issues, but it's a pretty obscure algorithm, relatively speaking.

It deals with all the problems there.  And it's pretty simple, once you know what it's trying to achieve.  Usually, if the load gets too high, you just allocate a new table twice as big and rehash everything.  This still gives you amortized O(1), so it's fine for most purposes.  It's not fine when you have a real-time application, because then you have this huge spike from time to time.

Linear hashing de-amortizes it by extending the hash table one bucket at a time instead of all at once.  It takes the first unsplit bucket and moves half of its elements to a new bucket at the end.  Lookups then need to take this into account.  You have three zones: the first zone at the beginning holds already-split buckets, the middle one holds not-yet-split buckets, and at the end you have a section of split buckets the same size as the first.  So to get an item, you figure out which section your bucket is in, and use the new hash for the first and third sections and the old hash for the middle one.  That's all.
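
A small sketch of that address rule (hypothetical names; "level" counts the doubling rounds completed so far and "split" is the next bucket to be split this round):

    #include <stdint.h>

    /* Buckets [0, split) have already been split this round, buckets
       [split, nOld) have not, and [nOld, nOld + split) are the new buckets
       created so far. */
    static uint64_t bucketForKey( uint64_t hash, uint32_t level,
                                  uint64_t split, uint64_t initialBuckets ) {
        uint64_t nOld = initialBuckets << level;   /* bucket count at the
                                                      start of this round */
        uint64_t b = hash % nOld;                  /* old hash function, h_level */
        if( b < split ) {
            /* this bucket was already split, so use the finer hash, h_level+1 */
            b = hash % ( nOld * 2 );
            }
        return b;
        }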

jasonrohrer wrote:

There's a prevailing "when in doubt, hash table" sentiment out there....

Because they are the best if you don't have additional requirements like ordered iteration.  The heresy here is keeping their size static no matter the load and expecting good performance.  If you go over the recommended load (50-85%, depending on collision handling), it degenerates from a nearly perfect O(1) structure into a totally useless O(N) one.

Last edited by sc0rp (2018-07-23 23:20:59)

Offline

#19 2018-07-23 23:46:18

Uncle Gus
Moderator
Registered: 2018-02-28
Posts: 567

Re: Welp, found the actual source of lag

Nerds.

Offline

#20 2018-07-24 01:02:59

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,804

Re: Welp, found the actual source of lag

But the standard "it's too full, rehash everything" approach to preventing a hash table from overloading is exactly the O(n) worst case operation that we're seeking to avoid.  The balanced trees and other dynamic data structures will never have an insert operation that goes over O(log n).

Here's a worst case O(1) time data structure for you:

1.  Tell me your largest key K.
2.  I allocate an array A with K elements.
3.  Insert(k,d) => A[k] = d
4.  Lookup(k) => return A[k]

A hash table is like the randomized, shrunken cousin of this.  By knowing all about your dataset ahead of time, sheer miracles are possible!

Offline

#21 2018-07-24 01:32:28

sc0rp
Member
Registered: 2018-05-25
Posts: 740

Re: Welp, found the actual source of lag

jasonrohrer wrote:

But the standard "it's too full, rehash everything" approach to preventing a hash table from overloading is exactly the O(n) worst case operation that we're seeking to avoid.

It's O(1) amortized cost.  If you have 10 items and you need to double, you need to rehash all of them, but then you can insert 10 more elements.  Then to double you need to rehash 20, but after that you can insert 20 more.  Then to double you need to rehash 40, but after that you can insert 40 more.  And so on.  So the amortized cost is O(1).  Most applications don't care whether it's amortized or not.  It matters a lot for real-time apps, though, and there you need something like Linear Hashing to de-amortize it - to spread the cost evenly over all inserts.
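
Written out for a table that starts at 10 and doubles up to n items, the total rehash work is:

    10 + 20 + 40 + ... + n/2 + n  <  2n

so n inserts cost O(n) rehash work in total, i.e. O(1) amortized per insert.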

jasonrohrer wrote:

The balanced trees and other dynamic data structures will never have an insert operation that goes over O(log n).

Linear hashing is a dynamic data structure, and it keeps all inserts O(1) whp.  You can't beat it with a tree unless you need other operations that a tree supports and a hash does not.  A tree needs both more lookups and more overhead space.

jasonrohrer wrote:

Here's a worst case O(1) time data structure for you:

1.  Tell me your largest key K.
2.  I allocate an array A with K elements.
3.  Insert(k,d) => A[k] = d
4.  Lookup(k) => return A[k]

A hash table is like the randomized, shrunken cousin of this.  By knowing all about your dataset ahead of time, sheer miracles are possible!

If you know your keys in advance, there are methods to generate a perfect hash - N elements go uniquely into exactly N bins.  Key size then doesn't matter at all.  You get a perfect O(1) lookup with the smallest space possible.  That array can be seen as the limiting case of a perfect hash, where you want to keep every possible key.

Offline

#22 2018-07-24 02:13:34

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,804

Re: Welp, found the actual source of lag

sc0rp, hoping to hear your thoughts on this, from above:

So, instead of linear probing or chains to handle collisions, why not always handle any collision by adding a bucket at the end and rehashing the bucket pointed to by S?  Then try inserting again; if there's still a collision, add another bucket, and so on?

Otherwise, you need some other metric to determine when to add a bucket (how long is the list in this bucket, or how many failed linear probing steps did I execute?).

Seems like a collision is a good enough signal that a new bucket is needed, and the new bucket operation is cheap enough that we don't need to do it sparingly (it's not like we're doubling the table at the first sign of a collision, we're just adding one new bucket).

I really appreciate your insight here.

Also, on the offer of implementing this for me, I do really appreciate that as well, but I also feel like I really need to implement it myself to understand how it works on a deep level.

I used KISSDB as a black box for a long time before I hit its limitations.  Rewriting it myself was really eye-opening.  As a programmer, it's what I live for.

HOWEVER, I also did spend some time today looking around for simple, single C-file databases, just to see how they were doing things.  Besides KISSDB, there's really nothing else out there, especially not something with a permissive license.  Most of the other options are bloated, heavy-weight libraries.

So, if you finish your implementation, I'd highly recommend putting it on GitHub so that other people can use it.

It would be really interesting to see how it performs on various test data sets when compared to KISSDB and other similar embedded databases like Kyoto or LMDB.

And why not use LMDB?  Looking at this header, my eyes glaze over:

http://www.openldap.org/devel/gitweb.cg … 34;hb=HEAD

Offline

#23 2018-07-24 02:20:47

YAHG
Member
Registered: 2018-04-06
Posts: 1,347

Re: Welp, found the actual source of lag

Uncle Gus wrote:

Nerds.

Amen


"be prepared and one person cant kill all city, if he can, then you deserve it"  -pein
https://kazetsukai.github.io/onetech/#
https://onehouronelife.com/forums/viewtopic.php?id=1438

Offline

#24 2018-07-24 06:25:24

forestglade
Member
Registered: 2018-06-08
Posts: 204

Re: Welp, found the actual source of lag

Well, that was some light night-time reading.

Offline

#25 2018-07-24 11:33:57

wondible
Member
Registered: 2018-04-19
Posts: 855

Re: Welp, found the actual source of lag

This jogged a memory of some posts I've seen

"A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set"."

https://en.wikipedia.org/wiki/Bloom_filter
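
A tiny sketch of how that could front the map db (hypothetical sizes and hash mixing, not a tuned design): a bit array answers "definitely not in the database" without touching the db at all, and only "possibly present" queries fall through to the real lookup.  Since the map db is insert-only, the filter never has to handle deletes.

    #include <stdint.h>

    #define FILTER_BITS ( 1UL << 27 )   /* 2^27 bits = 16 MB, just for example */

    static uint8_t filter[ FILTER_BITS / 8 ];

    static void setBit( uint64_t i ) {
        i %= FILTER_BITS;
        filter[ i >> 3 ] |= (uint8_t)( 1 << ( i & 7 ) );
        }

    static int getBit( uint64_t i ) {
        i %= FILTER_BITS;
        return ( filter[ i >> 3 ] >> ( i & 7 ) ) & 1;
        }

    /* two cheap derived hashes per key; a real filter picks its hash count
       from the expected element count and target false-positive rate */
    static uint64_t mix( uint64_t k ) {
        k ^= k >> 33;
        k *= 0xff51afd7ed558ccdULL;
        k ^= k >> 33;
        return k;
        }

    static void bloomAdd( uint64_t key ) {
        uint64_t h = mix( key );
        setBit( h );
        setBit( mix( h ) );
        }

    /* 0 means the key is definitely not in the db; 1 means maybe,
       so do the real db get */
    static int bloomMightContain( uint64_t key ) {
        uint64_t h = mix( key );
        return getBit( h ) && getBit( mix( h ) );
        }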


https://onemap.wondible.com/ -- https://wondible.com/ohol-family-trees/ -- https://wondible.com/ohol-name-picker/
Custom client with  autorun, name completion, emotion keys, interaction keys, location slips, object search, camera pan, and more

Offline
