One Hour One Life Forums

a multiplayer game of parenting and civilization building


#1 Re: Main Forum » map chunk » 2018-09-26 18:18:54

FounderOne wrote:

Jason once posted a picture from a "satellite" view of the map. Could you generate such a pic with the information stored in this file?

Yep.  Of course only areas you've visited.

FounderOne wrote:

I quickly looked up gzip and tried to decode the file, but I guess it's just some text parts that are gzipped?

Yes.  The replay format is undocumented (AFAIK), but pretty simple to parse - it's a dump of server messages (which do have documentation), mouse movements, key presses, etc.  It's all ASCII text, but some larger messages are gzipped.

#2 Re: Main Forum » map chunk » 2018-09-26 14:22:23

Map chunks and map updates include item info as well - you can see the full data in replay files, not stdout.  Map chunks are gzip-compressed though, so a plain text editor won't be enough to view them.

#3 Re: News » Update: New low-latency database engine is live » 2018-07-29 09:21:19

Chimrod wrote:
James wrote:

A non-existent cell can never be at the top of the stack, which means that we need to walk to the bottom of the stack to find out for sure that it doesn't exist.

This kind of problem can be resolved by a tree. If you want to keep recently accessed elements at the top of the tree you can take a look at Splay Trees, which are designed for that.

It wasn't about "recently accessed elements", but about "never added elements", which are O(log n) in Splay Trees.  We ended up with linear hashing, which is O(1) for both.  And furthermore Splay Trees are not even O(log n), but O(log n) amortized - a single operation can degenerate into O(n), which is really bad for real-time apps.

#4 Re: Main Forum » Welp, found the actual source of lag » 2018-07-27 02:10:58

I find one analogy very useful and helpful in deciding how to optimize code.  Scale the times the processor sees up to a human scale.  It goes like this:

- Getting data from a register - recalling something quickly - 1 second.
- Getting data from L1 cache - finding and reading a number on the screen - a few seconds.
- Getting data from L2 cache - reading a few sentences - 15 seconds.
- Getting data from main memory - going to another room, finding the book and the right chapter, reading a page or two, coming back - 5 minutes.
- Getting data from an SSD - jumping into a car, driving to a nearby state, reading some document there and driving back home - 10 hours.
- Getting data from a spinning disk - flying to India, going on a long trip to many different places, talking to many people in search of some old manuscript, finally finding it, reading it and flying home - 1 year.
- Getting data from a user in Europe across the Internet - launching a space probe to Jupiter, taking a photo there and flying back - 20 years.

So far you've figured out that sending a novel to users one word at a time wouldn't work.  They'd get lost in the plot.  So when they send a request, you prepare a full page and send it.  But you still think that driving 10 hours to a nearby state and back to get one word at a time works great.  "It just takes a couple of months per user, no big deal.  Look, I have this awesome system: for every word I bring, I print a page with all the information needed to know where it came from: chapter, page, sentence and word number within the sentence.  I have lots and lots of stacks of papers like that.  And I also found out that very frequently I'm asking about a word far into a sentence, but the sentence is too short.  So I also print out papers that let me know that in chapter 7, on page 21, in sentence 15, there is no word number 17!  This system works amazingly well!"

I've been suggesting for some time that if you're driving to the other state, you shouldn't bring back just a single word every day.  Bring a xerox of the whole page.  In one day you'll do several months' worth of work.  But now I have an even better idea.  Bring the whole fricking book home!  Then when you need to read something, you just go to the other room.  -Fit the whole book in my home?  Impossible, look at all those stacks of paper.  I constantly run out of space and need to throw some of them out.  -Well, if you don't keep a single word per page, I'm sure it'll fit.  -But I have to amend this book every day!  What if a tornado hits my home?  All the work will be lost!  -Send a letter every day to the other state listing all the changes you've made.  Then if something goes wrong, the latest version of the book can be reconstructed.  -But this will generate an endless number of papers with changes.  I couldn't store all that!  -Then from time to time let someone apply all the changes and bind a new edition of the book.  Then you only need to store the recent changes.

No, this proposition is not a joke.  Keep the state in RAM.  Currently for every 21 bytes you write, you store only maybe 2 bytes of actual data.  That's like 90% overhead.  The keys carry some information too, but they are highly correlated - they hardly add more than a bit or two of entropy.  And half of the cells are empty.  So your complete state will be smaller than the table of fingerprints you prepare just to know that "nothing is there".  Open an append-only file and log every change that should persist across reboots.  On restart, parse the change log and rebuild the state.  Then write a snapshot of the state, so you don't always have to start from the beginning.  You can also do it incrementally.  Every 100k changes switch to a new file.  Let some background process go through the log entries and dump the resulting state - a shorter list of changes.  No database schemas to keep.  No migrations.  -What if I run out of RAM?  -Pay 5 bucks, double the RAM, reboot.
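
To make the idea concrete, here is a minimal sketch of the log-and-replay scheme - not the actual server code; ChangeRec, logChange and rebuildState are made-up names, and a real record would hold whatever a map change actually needs:

#include <stdio.h>
#include <stdint.h>

typedef struct { int32_t x, y, slot, itemID; } ChangeRec;   /* one persisted map change */

static FILE *changeLog;

/* append one change; fflush bounds how much can be lost on a crash */
void logChange(const ChangeRec *c) {
    fwrite(c, sizeof *c, 1, changeLog);
    fflush(changeLog);
}

/* on restart: replay the whole log into the in-RAM state */
void rebuildState(const char *path, void (*apply)(const ChangeRec *)) {
    FILE *f = fopen(path, "rb");
    if (!f) return;                          /* no log yet - start empty */
    ChangeRec c;
    while (fread(&c, sizeof c, 1, f) == 1)
        apply(&c);
    fclose(f);
}

Every 100k records you'd rotate changeLog to a fresh file and let a background process fold the old files into a snapshot, so restarts stay fast.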

#5 Re: Main Forum » Welp, found the actual source of lag » 2018-07-27 01:46:17

jasonrohrer wrote:

So.... the way that I'm doing this, to avoid touching the disk at all during the linear probing, is to keep a 16-bit hash fingerprint of each key that's in each slot in the table.  This fingerprint is derived from murmurhash2 with a different seed, and it XORs the 64-bit hash down into 16 bits.

16 bits is probably overkill here.  8 would do - less than a 1% chance of a false positive.  That's for a single lookup, not a whole chain.  But keeping chains short is necessary anyway.  With linear probing the probability of extending a chain grows linearly with its length.  So when the load gets high you won't end up with many long chains, you'll get a few huge ones.  That's why keeping the load below 50% is essential.
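
To make the numbers concrete, a sketch of the fingerprint check - fp8 and slotMayHoldKey are hypothetical names, not from the actual code:

#include <stdint.h>

/* fold a 64-bit hash down to 8 bits */
static inline uint8_t fp8(uint64_t h) {
    h ^= h >> 32;
    h ^= h >> 16;
    h ^= h >> 8;
    return (uint8_t)h;
}

/* a mismatch rules the slot out with no disk access; a match is only a "maybe"
   (~1/256 false-positive rate), so the full key on disk still has to be compared */
int slotMayHoldKey(uint8_t slotFingerprint, uint64_t keyHash) {
    return slotFingerprint == fp8(keyHash);
}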

jasonrohrer wrote:

As for why no bloom filter, it seemed a bit more complicated, and it wasn't clear how it could be resized as the number of elements grows.

Rebuild a bigger one during splits and query both until the old one can be dropped.

jasonrohrer wrote:

And it only checks for existence.  It doesn't help with linear probing, where we want a non-false-negative-test to see "is this key possibly here in this slot?"

In this case having a fingerprint is better.  But a Bloom filter DOES help with linear probing.  If you get "no" from the Bloom filter you don't need to do any probing at all - the key is nowhere to be found.  Chains or not.

jasonrohrer wrote:

All that said, I'm now wondering if the cache locality of linear probing helps anymore.  I mean, there's RAM cache locality, but I think this whole thing is IO bound on the final seek and read (or write) that happens once linear probing finds a match.

Though maybe no other method would be any better, since it's all IO-bound anyway, and the NULL-results are fast enough now that they don't really even matter... I guess some other method might make them slightly faster?

Other methods like double hashing keep chains shorter.  But they are incompatible with linear hashing - there is no way to extend the table in small increments.

jasonrohrer wrote:

The only place where my implementation lags behind KISS is in repeated lookup of batches of values that DO exist in the database.  KISS does this in a flash because all the values are in the same place on the disk (the test is looking up the most recently inserted values, which should all be together at the end of the file).  My code is randomly jumping around in a hash table that's the full size of the file to find these values.

I can't think of an obvious way to improve this, other than by storing 64-bit file locations in RAM like KISS does, and then having the file records be written sequentially in order of creation instead of in hash-table order.

Write big chunks of data - all the info about a 16x16 block of tiles in one record.  You'll get hundreds of reads' worth of data with just one read.
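
Something along these lines - the sizes and field names here are illustrative assumptions, not the real map format:

#include <stdint.h>

#define BLOCK 16

typedef struct { int16_t biome; int16_t itemID; } TileData;

typedef struct {
    int32_t blockX, blockY;             /* tileX >> 4, tileY >> 4 */
    TileData tiles[BLOCK * BLOCK];      /* 256 tiles fetched with a single read */
} BlockRecord;

/* find one tile inside an already-loaded block */
static inline TileData *tileInBlock(BlockRecord *b, int tileX, int tileY) {
    return &b->tiles[(tileY & (BLOCK - 1)) * BLOCK + (tileX & (BLOCK - 1))];
}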

jasonrohrer wrote:

Another idea is to keep the hash table of 64-bit file pointers on disk in a separate file.  Random access would only happen in that file, where sequential access would be more likely to happen in the separate "data block" file where the records are stored in order of creation.

Then you need some way to access it directly.  Otherwise you'll have a random read plus a sequential one - hardly better than just one random read.

#6 Re: Main Forum » Welp, found the actual source of lag » 2018-07-26 21:04:44

jasonrohrer wrote:

It seems to me that when we need to split a bucket and rehash it with the new, 2x modulus, we also need to consider the effects of previous linear probing inserts around that bucket.  Elements that should be "in" this bucket can also be further to the right, due to linear probing.

Even the element in this bucket might not really belong to this bucket.  It may belong further to the left, but ended up here due to linear probing.

Correct.

jasonrohrer wrote:

We also don't want to move elements out of this bucket and leave "holes" that will trip up future linear probing searches.  Linear probing assumes no removal.

We cannot do this blindly, but sometimes a split will create a hole that is legitimate.

jasonrohrer wrote:

So, it seems that all elements in the contiguous element segment to the left and right of this bucket (up to the first empty cell to the left or right) need to be rehashed.  They don't all need to be rehashed using the new 2x mod.  They need to be rehashed according to the latest placement rule based on the new split point.

You don't need to bother yourself with elements to the left - you've already handled them in previous splits.  And yes, you need to handle the whole island to the right, possibly moving some elements back.  They cannot be moved into an arbitrary bucket, though, only as far left as their primary bucket (the one they would hash into without collisions/overflows).  This process may create holes, and some elements may jump over ones that cannot move.

IMO the best way to handle it is to split all the buckets in an island in one go.  You need to analyze them anyway, figure out where they should go, write the ones that move, create some holes.  And on the next split you would be analyzing it all over again.  So better to just split all the buckets up to and including the trailing empty one in one go.
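
For reference, the placement rule itself is just the standard linear-hashing address calculation - a sketch with assumed variable names, not Jason's code:

#include <stdint.h>

/* nBase = number of buckets at the start of the current doubling round,
   split = how many of those buckets have already been split */
uint64_t primaryBucket(uint64_t hash, uint64_t nBase, uint64_t split) {
    uint64_t b = hash % nBase;
    if (b < split)                 /* this bucket was already split... */
        b = hash % (2 * nBase);    /* ...so address with the doubled modulus */
    return b;
}

Linear probing then places each element at its primary bucket, or as little to the right of it as the existing occupants allow.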

jasonrohrer wrote:

I've tried to come up with some way to rehash them one at a time, while leaving the rest in place.

You can, by marking elements as deleted, so linear probing knows not to stop there.  But it's completely counterproductive - you'll end up with longer and longer chains.  So you want to move elements back closer to their primary buckets and create holes that stop linear probing.

jasonrohrer wrote:

It also seems that, due to wrap-around issues with linear probing, the contiguous segment of cells starting with the first cell of the table always needs to be rehashed when the table is expanded.  Previous expansions could have moved an unbounded number of cells into the new bin at the end, and many of these could have wrapped around to the first bin via linear probing.  The latest expansion could leave an empty cell at the end of the table, creating a hole for future linear probes.

The best way to handle it is to avoid it altogether - never wrap around.  When a collision/overflow happens in the last bucket, just create a new one following it.  It should be marked as special, because it's not a normal split bucket yet (so you know where the next split should happen even if you quit the app and restart later).  On the next split, you need to take the overflowed elements into consideration, possibly creating the next overflow bucket(s).

#7 Re: Fixed Bugs » Anti spam plugin to stop spam on forum » 2018-07-26 06:57:04

jasonrohrer wrote:

Okay, along with the auto-generated, custom-coded math problems, I've also set it up so that new users cannot post links in their posts for the first 10 posts.

It doesn't work - this account has only two posts and embedded links:
https://onehouronelife.com/forums/viewt … 728#p25728

Also email and website links in the left column.

#8 Re: Main Forum » Welp, found the actual source of lag » 2018-07-26 05:12:25

jasonrohrer wrote:

Not really sure why sequential writes are faster than random-access writes, though I suppose similar caching principles apply in reverse.

Because you cannot write 21 bytes to disk.  You must write a whole sector, usually 4k.  That turns the write into a read-modify-write.  If the sector is not in cache, it first needs to be brought in.

Also, for sequential writes, the C library will buffer some data first, so it takes fewer syscalls to write the data - and syscalls are quite costly.  It may not need as many file-seek syscalls either.
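
A quick way to see the stdio buffering effect - a standalone sketch, not the engine's code:

#include <stdio.h>

int main(void) {
    FILE *f = fopen("records.dat", "ab");
    if (!f) return 1;
    setvbuf(f, NULL, _IOFBF, 1 << 20);       /* 1 MB user-space buffer */
    char rec[21] = {0};                      /* stand-in for a real record */
    for (int i = 0; i < 100000; i++)
        fwrite(rec, sizeof rec, 1, f);       /* no syscall until the buffer fills */
    fclose(f);                               /* flushes whatever is left */
    return 0;
}

With random access, every record needs its own seek and write, plus the kernel's read-modify-write of the containing sector, so the syscall and I/O counts are orders of magnitude higher.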

#9 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 20:48:46

jasonrohrer wrote:

The big difference that I'm seeing between the no-RAM, 100% disk linear probing implementation and the KISS implementation is that KISS is way faster to return the truly NULL result (if that slot in the hash table is actually empty), because it keeps the hash table in RAM.

So with 50% load, that fast path helps only half the time.

jasonrohrer wrote:

But oddly, if there is a non-NULL entry in the RAM hash table, it still has to go to disk to check the key for that entry.  Yet its still storing a uint64_t file pointer for each cell in RAM.

It stores one for every entry in the chain, to be precise.

jasonrohrer wrote:

So it seems that a simple 1-byte existence map in RAM (or maybe even smaller, a bitmap) would suffice for this purpose, assuming that the file structure is isomorphic to your hash table.

That doesn't buy much - only half of the reads are avoided.  If that turns out to be the main hotspot, build a Bloom filter.  At 10 bits/key it will let you skip 99% of null lookups.
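
A minimal Bloom filter sketch, assuming two independent 64-bit hashes of the key are already available (combining them like this is the usual double-hashing trick):

#include <stdint.h>

typedef struct { uint8_t *bits; uint64_t nbits; } Bloom;   /* nbits ~ 10 * number of keys */

#define BLOOM_K 7   /* number of probes */

void bloomAdd(Bloom *b, uint64_t h1, uint64_t h2) {
    for (int i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % b->nbits;
        b->bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }
}

int bloomMayContain(const Bloom *b, uint64_t h1, uint64_t h2) {
    for (int i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % b->nbits;
        if (!(b->bits[bit >> 3] & (1u << (bit & 7))))
            return 0;     /* definitely absent - skip the disk entirely */
    }
    return 1;             /* maybe present (~1% false positives at 10 bits/key) */
}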

Another option is to store information about what other data is available in some record you read anyway, e.g. alongside the biome data for a tile.

#10 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 19:07:41

Just a few hints.  Instead of 80k single-item buckets use e.g. 4k 20-item buckets - that will even out random spikes.  And it won't be slower, because you cannot read less than a block from disk anyway.

Prefetch 4-8 buckets on each read.  If a bucket is full, you'll already have the next one in RAM.  Unless you run massively parallel I/O, you won't see any slowdown.
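
On Linux the prefetch hint can be as simple as this - a sketch, with BUCKET_BYTES standing in for however big a bucket is on disk:

#include <fcntl.h>

#define BUCKET_BYTES 4096   /* assumed on-disk bucket size */

/* ask the kernel to pull the next few buckets into the page cache in the background */
void prefetchBuckets(int fd, off_t firstBucket, int count) {
    posix_fadvise(fd, firstBucket * BUCKET_BYTES,
                  (off_t)count * BUCKET_BYTES, POSIX_FADV_WILLNEED);
}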

#11 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 16:01:29

jasonrohrer wrote:

A chunk system would get a whole page of NULL results together.  BUT, it would also be storing all these known-to-be-empty cells, which I believe(d) would be space-prohibitive in the long run.

Real-time compression like LZ4 will keep it small.
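
For example, with liblz4 - just the two library calls; the chunk layout itself would be whatever the server already produces:

#include <lz4.h>

/* returns the compressed size, or 0 on failure */
int compressChunk(const char *chunk, int chunkSize, char *out, int outCapacity) {
    return LZ4_compress_default(chunk, out, chunkSize, outCapacity);
}

/* returns the decompressed size, or a negative value on failure */
int decompressChunk(const char *comp, int compSize, char *out, int outCapacity) {
    return LZ4_decompress_safe(comp, out, compSize, outCapacity);
}

LZ4 compresses at hundreds of MB/s per core, so the CPU cost is negligible next to a disk read.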

jasonrohrer wrote:

2.  Even densely-filled chunks of the map (villages) have an unbounded amount of information potential, due to containers. [...] An object with 30 slots that only has 2 items in it will only have two things stored in the database (along with the number 2, to tell how many contained items are there).

You can store it pretty much the same way it's transmitted to the client: "123,10:5:5".

jasonrohrer wrote:

3.  Early on, I settled on a database system with fixed size keys and data, and a binary format for those keys and data.  This simplifies a lot of things, and provides a nice abstraction on top of which you can build all sorts of things very easily.  An arbitrary-length blob storage system could be an alternative (where you key by x,y and then have some variable-length encoded representation of what's there).  But that's a lot more complicated to code, and probably more complicated to make fast as a database engine, and maybe less space efficient (depending on the encoding format of the blob).

The simplest way to create a variable-sized DB out of a fixed-size one is indirection.  Split it into two tables: "main" with small cells, and "blob" with big cells.  If the data is small, write it directly to main.  If it's too big, store in main the IDs of objects from the blob table.  If it's huge, make top-level blobs hold the IDs of all the blobs that carry the data.
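
A sketch of what the cells might look like - sizes and field names are made up for illustration:

#include <stdint.h>

#define MAIN_DATA_BYTES 15

typedef struct {
    uint8_t isInline;                       /* 1: the value fits right here */
    union {
        uint8_t  data[MAIN_DATA_BYTES];     /* small values stored inline */
        uint32_t blobIDs[3];                /* big values: IDs into the blob table */
    } u;
} MainCell;

For huge values, the blobs referenced from blobIDs would themselves hold lists of further blob IDs - one more level of the same indirection.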

jasonrohrer wrote:

4.  Technical debt!  I'm just one guy, the existing server is complicated, and overhauling it is a very expensive proposition.  The existing storage system is at least functional and bug-free.  The current problem is too many hash table collisions.  Much easier to solve that problem than to rewrite the entire storage system from scratch.

IMO it's comparable.  The only layer that needs to know about chunks is the cache above the map DB.  It can easily provide the illusion to the rest of the code that things come and go as single items, so no code changes above it are required.  Only if some performance bottleneck is identified would you make part of the server code aware of chunks to speed it up.

#14 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 11:37:13

I've run a few tests and I think I know what happened.  The modulo with a 32-bit hash gives a tiny bias (<1%), but the main thing is: if you put 15M items randomly into 30M bins, you get a distribution looking more or less like this:
Counter({0: 18196821, 1: 9096015, 2: 2275788, 3: 378792, 4: 47342, 5: 4811, 6: 409, 7: 20, 8: 2})
aka the birthday paradox.
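
That distribution is easy to reproduce with a quick simulation - a rough sketch, and rand() is plenty good enough for this purpose:

#include <stdio.h>
#include <stdlib.h>

#define BINS  30000000L
#define ITEMS 15000000L

int main(void) {
    unsigned char *bins = calloc(BINS, 1);
    long histogram[16] = {0};
    if (!bins) return 1;
    for (long i = 0; i < ITEMS; i++) {
        long long r = ((long long)rand() << 31) ^ rand();   /* widen past RAND_MAX */
        long b = (long)(r % BINS);
        if (bins[b] < 255) bins[b]++;
    }
    for (long b = 0; b < BINS; b++)
        histogram[bins[b] < 15 ? bins[b] : 15]++;
    for (int d = 0; d < 16; d++)
        if (histogram[d]) printf("%d items: %ld bins\n", d, histogram[d]);
    free(bins);
    return 0;
}

Any uniform hash over distinct keys behaves the same way - a handful of bins unavoidably end up 7-8 deep.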

So the funny thing is, djb2 got such good results precisely because it is so bad :)  In that particular case it worked more like a counter than a hash.

I wouldn't feel safe keeping djb2.  It turned out to be in our favor here, but it may just as easily degenerate into a massive number of collisions.

The solution is simple, but not trivial.  Use a smaller number of larger buckets to flatten it out.  If you use 30M 1-item buckets the expected max chain length is 8.  If you use 15M 2-item buckets, it's 5.  With 7.5M 4-item buckets it's 4.  With 1.5M 20-item buckets it's 2 (slight overflow - I got 29 items in the largest one).

Or maybe switch to StackDB with a large hash table.  Having 8 items in the longest chain shouldn't cause much of a problem.

#15 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 09:20:08

jasonrohrer wrote:

I tried a few other hash functions, including Murmur3 and Jenkins "one-at-a-time."  I'm seeing similar behavior with 5-6 collisions.

Does anything obvious jump out?  Am I using these incorrectly somehow?

This gives a non-uniform distribution - the larger the hash table size, the worse the bias:
uint32_t hash = KISSDB_hash(key,db->key_size) % (uint32_t)db->hash_table_size;

Use a power-of-2 size for a quick fix.  You can mix in the top hash bits as well.
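
Roughly like this - a sketch against the same KISSDB names as above, assuming hash_table_size is kept a power of two:

uint32_t hash = KISSDB_hash(key, db->key_size);
hash ^= hash >> 16;                                   /* fold the top bits into the low ones */
uint32_t bucket = hash & (uint32_t)(db->hash_table_size - 1);   /* mask instead of modulo */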

#16 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 02:59:49

jasonrohrer wrote:

Wonder what's going on with Murmur2?

Without debugging it I don't know.  I certainly would expect it to perform better.  Heck, djb2 is about as weak as you can go - it's just a bit shift and an add.  My first reaction to seeing something like that would be to change it to at least "on each round multiply by a prime and take the middle bits, where most of the mixing occurs".

Without debugging it further, maybe just switch to SipHash and see what happens.  If it also behaves badly, you have a serious problem worth investigating.

#17 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 02:38:24

jasonrohrer wrote:

Well, I've bought time already by using STACKDB and taking server1 and 2 out of circulation, and turning the cap down to 40 on the other servers.

The process of getting all the servers to rebuild their DBs with a larger table size, with or without switching back to KISS, is non-trivial.  I'd like to get a better solution (low RAM, constant time, cheap collisions) before I go through the process of writing the conversion code.

OK, so here is the deal.  I'll finish up my implementation of "linear hashing with overflow handling by linear probing".  I'll add all the tests to get good coverage.  From what I've seen so far I can promise "almost constant RAM usage, O(1) lookup time including misses, cheap collisions".  It's "almost" constant only because in rare cases it needs some transient RAM to deal with overflow islands during a split (several pages of data, less than a MB, so nothing large).  For plain inserts, lookups or misses it never needs additional memory, no matter whether it has to deal with 1k records, 1M records, or 1B records.

Then I'll explain all the details to you - how exactly it works - so if you encounter problems in the future, you can profile and fix it yourself.  I'll place all the code in the public domain.

If it's used and works well, after some time I'll extract a battle-hardened version of it into a separate library on GitHub, so other people can use it as well.

I'd prefer to go that route, instead of making a library on GitHub first, because then I'd have "a solution looking for a problem to solve", which is never good.

#18 Re: Main Forum » name urself before naming baby eves » 2018-07-25 02:13:10

Beware that there may be a bug.  If you hold a baby, name yourself, then name the baby, the baby will sometimes still be nameless.  I've hit this bug at least twice already, maybe more.  I even sent Jason a replay showing it, but there is no fix yet.

So the workaround for now is: name yourself, put down the baby, pick up the baby, name the baby.  This works fine.

#19 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 01:46:20

jasonrohrer wrote:

Does this mean Murmur2 has likely caused more collisions, just by random chance?

Looks like it.  I'm quite surprised - before seeing that, I'd have bet it would perform better.  Are you using the 32-bit version?

jasonrohrer wrote:

The problem with KISS is that collisions are so expensive in terms of resource consumption.

That's why I think it's only a stopgap solution, to buy some time for something better.

#20 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 01:29:00

Max bin depth = 3

I wonder how it got down from 4 to 3.  Is there some non-seeded randomness involved?

#21 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 01:27:48

That's pretty neat performance.  Just a small final touch - start with 40% load to give it some breathing room for inserts.  Beyond 50% load, longer chains start to form, so it's better to avoid that - the cost with KissDB is very high.

Now run it with the full game, to check whether there are other bottlenecks elsewhere.

#22 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 00:31:45

Random lookup for 500 batchs of 3000 (1500000/1500000 hits)
Random look used 0 bytes, took 3.932000 sec

That's ~400k lookups/s - pretty good.

Last-inserted lookup for 500 batchs of 2401 (1200500/1200500 hits)
Last-inserted lookup used 0 bytes, took 3.122000 sec

Same.

Random lookup for non-existing 500 repeating batchs of 3000 (0/1500000 hits)
Random look/miss used 0 bytes, took 2.265000 sec

650k null lookups/s - when served from RAM & cache.

Random lookup for non-existing 1500000 non-repeating values (0/1500000 hits)
Random look/miss used 0 bytes, took 13.899000 sec

100k null lookups/s - when it needs to touch the disk half the time.  That's pretty good too.  SSDs can handle ~100k IOPS, so we're using 50% of capacity.  Pushing for more would require issuing requests in parallel.

Inserts for previously non-existing 500 batchs of 3000
Inserts after miss used 240001024 bytes, took 10.301000 sec

150k inserts/s.  Again, we're hitting the disk.  It will be faster for small batches; here we see the steady state, when things need to be pushed to disk.

Anyway, most of the time here is spent on adding the 4th hash table.  We're going from 50% to 55% load.  I'd recommend keeping it under 50% all the time.  So after reducing memory usage, just start with ~40% load.

#23 Re: Main Forum » Welp, found the actual source of lag » 2018-07-25 00:08:49

jasonrohrer wrote:

Okay, it finally finished after 52 minutes:

Clearly, using KISSDB in this capacity is not going to fly.

Oh, it will.  We're very close.  It used 4 buckets with ~1GB of memory total, so it had to constantly swap about 150MB of data in and out.

2 things:

1. Change the size of the hash pointers from 64 to 32 bits - this will halve memory usage.  This line, and all the relevant ones that deal with the hash table and its I/O (a sketch of the changed line is below, after point 2):

db->hash_table_size_bytes = sizeof(uint64_t) * (hash_table_size + 1); /* [hash_table_size] == next table */

2. Change the hash algorithm to murmur or SipHash.  It may drop the worst-case chain from 4 buckets to 3 or 2.
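
For point 1, a sketch of the changed line - assuming file offsets stay below 4GB; every other place that reads, writes or indexes hash-table entries needs the matching uint32_t change too:

db->hash_table_size_bytes = sizeof(uint32_t) * (hash_table_size + 1); /* [hash_table_size] == next table */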

#24 Re: Main Forum » Stonehenge 1500 BC Colorised » 2018-07-24 21:55:00

PunkyFickle wrote:

I find it funny, that's what had me make this picture, but I would fix it if I was Jason, as it is too blatant and influential as a glitch to be kept. But of course, I am not Jason.

I agree we should stop talking about it - just do it and pretend it never happened ;)

#25 Re: Main Forum » Stonehenge 1500 BC Colorised » 2018-07-24 21:52:38

Cecil wrote:

It happens with graves with unattached headstones too. If you put an item on the grave and remove the headstone, the item teleports to an unoccupied tile. I think of it as a fun quirk of the game.

If we could choose the direction it goes, that would be great.  Put mutton meat in the grave, pick up the headstone, put the headstone back, repeat.  I think we need a grave in the middle of the grave pen ;)
