One Hour One Life Forums

a multiplayer game of parenting and civilization building


#1 2019-01-24 02:42:55

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,803

Results of profiling server1 with 166 active players

I profiled the server for 4 straight minutes just now, sampling the call stack 20 times per second, using this nifty new profiler that I just wrote:

https://github.com/jasonrohrer/wallClockProfiler

The result was 3840 stack samples (4 minutes at 20 samples per second would be 4800, so the timing is a little off and not hard-and-fast, but close).
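
For anyone curious about the basic idea, here is a rough sketch of wall-clock sampling with gdb. This is not the actual wallClockProfiler source (see the GitHub link above for the real tool, which is more efficient than re-attaching for every sample); all names here are just illustrative:

// Rough sketch of wall-clock sampling:  repeatedly attach gdb in batch
// mode to a target PID, grab one backtrace, and aggregate the frames in
// a post-processing step.  NOT the actual wallClockProfiler code.
#include <cstdio>
#include <cstdlib>
#include <string>
#include <unistd.h>

int main( int inNumArgs, char **inArgs ) {
    if( inNumArgs < 2 ) {
        printf( "Usage: %s pid [numSamples]\n", inArgs[0] );
        return 1;
        }
    int numSamples = ( inNumArgs > 2 ) ? atoi( inArgs[2] ) : 100;

    for( int i = 0; i < numSamples; i++ ) {
        // attach, grab one stack, detach;  the target is stopped only
        // briefly, and blocked system calls show up in the backtrace
        std::string com = "gdb --batch -p " + std::string( inArgs[1] ) +
            " -ex bt 2>/dev/null";
        FILE *gdbOut = popen( com.c_str(), "r" );
        if( gdbOut != NULL ) {
            char line[512];
            while( fgets( line, sizeof( line ), gdbOut ) != NULL ) {
                if( line[0] == '#' ) {
                    // one stack frame per line, e.g. "#3  main () at ..."
                    printf( "%s", line );
                    }
                }
            pclose( gdbOut );
            }
        usleep( 50000 );    // aim for roughly 20 samples per second
        }
    return 0;
    }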

Here's the raw profile data:

https://pastebin.com/Lus17sXB



In particular, this is the most telling part of the profile.  Specific lines in main that occupy a large portion of the execution time:

Partial stacks of depth [1] with more than one sample:

 32.214% ===================================== (1237 samples)
         1: main   (at server.cpp:7128)
            7128:|   readySock = sockPoll.wait( (int)( pollTimeout * 1000 ) );


 14.740% ===================================== (566 samples)
         1: main   (at server.cpp:12404)
            12404:|   recomputeHeatMap( players.getElement( r ) );


  6.979% ===================================== (268 samples)
         1: main   (at server.cpp:13297)
            13297:|   playerYD );


  4.844% ===================================== (186 samples)
         1: main   (at server.cpp:7832)
            7832:|   int curOverID = getMapObject( curPos.x, curPos.y );


  4.740% ===================================== (182 samples)
         1: main   (at server.cpp:8049)
            8049:|   readSocketFull( nextPlayer->sock, nextPlayer->sockBuffer );


  3.099% ===================================== (119 samples)
         1: main   (at server.cpp:14290)
            14290:|   false, false );


  3.047% ===================================== (117 samples)
         1: main   (at server.cpp:8066)
            8066:|   nextPlayer->id, message );


  2.552% ===================================== (98 samples)
         1: main   (at server.cpp:13743)
            13743:|   false, false );


  2.292% ===================================== (88 samples)
         1: main   (at server.cpp:13601)
            13601:|   false, false );


  1.823% ===================================== (70 samples)
         1: main   (at server.cpp:12551)
            12551:|   stepMap( &mapChanges, &mapChangesPos );


  1.745% ===================================== (67 samples)
         1: main   (at server.cpp:14229)
            14229:|   playerListString );


  1.432% ===================================== (55 samples)
         1: main   (at server.cpp:12557)
            12557:|   getNewCurseTokenHolders( &newCurseTokenEmails );


  1.016% ===================================== (39 samples)
         1: main   (at server.cpp:13843)
            13843:|   false, false );

Also, note that 32% of the time is spent waiting on epoll.  This is normal and fine (it just means that the server is idle 32% of the time).


#2 2019-01-24 07:05:13

lionon
Member
Registered: 2018-11-19
Posts: 532

Re: Results of profiling server1 with 166 active players

Okay, this way of profiling gets you good wall-clock times, at least for a single-logical-thread app.

However, without summing up, I find this hard to read. It highlights single lines where a lot of time is spent, rather than functions or logical blocks.

I know it's not the brightest approach, but when I had to profile things like this, I just put manual counters into the code (switchable with the preprocessor) and summed them up, writing a report at the end. At least when also using external computation on the graphics card (CUDA), it was the only sensible approach.
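
Roughly what I mean, as a sketch with made-up names: scoped timers that compile away unless a preprocessor flag is set, summed per label, with a report printed at the end.

// Rough sketch of manual, preprocessor-switchable counters.
// All names here are made up for illustration.
#define MANUAL_PROFILING 1

#include <chrono>
#include <cstdio>
#include <map>
#include <string>

#if MANUAL_PROFILING

static std::map<std::string, double> profileTotals;

class ScopedTimer {
    public:
        ScopedTimer( const char *inLabel )
                : mLabel( inLabel ),
                  mStart( std::chrono::steady_clock::now() ) {
            }
        ~ScopedTimer() {
            std::chrono::duration<double> elapsed =
                std::chrono::steady_clock::now() - mStart;
            // accumulate wall-clock seconds under this label
            profileTotals[ mLabel ] += elapsed.count();
            }
    private:
        std::string mLabel;
        std::chrono::steady_clock::time_point mStart;
    };

#define PROFILE_BLOCK( label ) ScopedTimer localScopedTimer( label )

void printProfileReport() {
    for( auto &entry : profileTotals ) {
        printf( "%-30s %10.3f sec\n",
                entry.first.c_str(), entry.second );
        }
    }

#else
// compiles away to nothing when profiling is switched off
#define PROFILE_BLOCK( label )
void printProfileReport() {}
#endif

Usage would be wrapping a block of interest in { PROFILE_BLOCK( "recomputeHeatMap" ); ... } and calling printProfileReport() once at shutdown. Of course this only measures whatever you remembered to instrument.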


#3 2019-01-24 08:16:25

lionon
Member
Registered: 2018-11-19
Posts: 532

Re: Results of profiling server1 with 166 active players

PS: If most of that time is spent in kernel space waiting on I/O, one idea might also be to move more toward non-blocking I/O. While this may open a can of worms with race conditions, if done well it should considerably improve things.

I mean, there is no reason why map I/O should block the whole server thread so that it can't handle player updates for a far-away player at the same time. Properly chunking together things that are logically connected is not super trivial, though.
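
To illustrate the direction (very hand-wavy; none of these types or functions exist in the real server code): a dedicated worker thread could service map-chunk reads from a queue, so the main loop just hands off a request and picks the result up on a later step instead of blocking.

// Hand-wavy sketch of pushing blocking map I/O onto a worker thread
// (started once with std::thread( mapWorkerLoop )).  Nothing here is
// taken from the actual server code.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

struct ChunkRequest { int x, y; };
struct ChunkResult  { int x, y; /* decoded chunk data would go here */ };

static std::mutex queueLock;
static std::condition_variable queueCond;
static std::deque<ChunkRequest> pendingRequests;
static std::deque<ChunkResult> finishedResults;
static bool shuttingDown = false;

// hypothetical stand-in for the slow, blocking database read
ChunkResult loadChunkBlocking( ChunkRequest inR ) {
    return ChunkResult{ inR.x, inR.y };
    }

void mapWorkerLoop() {
    while( true ) {
        ChunkRequest r;
        {
        std::unique_lock<std::mutex> lock( queueLock );
        queueCond.wait( lock, [] {
            return shuttingDown || ! pendingRequests.empty();
            } );
        if( shuttingDown ) {
            return;
            }
        r = pendingRequests.front();
        pendingRequests.pop_front();
        }
        // the blocking read happens here, off the main thread
        ChunkResult res = loadChunkBlocking( r );
        {
        std::lock_guard<std::mutex> lock( queueLock );
        finishedResults.push_back( res );
        }
        }
    }

// main loop calls this instead of reading the chunk itself
void requestChunkAsync( int inX, int inY ) {
    {
    std::lock_guard<std::mutex> lock( queueLock );
    pendingRequests.push_back( ChunkRequest{ inX, inY } );
    }
    queueCond.notify_one();
    }

// main loop polls this once per step and sends any ready chunks out
bool popFinishedChunk( ChunkResult *outResult ) {
    std::lock_guard<std::mutex> lock( queueLock );
    if( finishedResults.empty() ) {
        return false;
        }
    *outResult = finishedResults.front();
    finishedResults.pop_front();
    return true;
    }

Keeping this consistent with the rest of the map state is exactly where the race-condition can of worms comes in, of course.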

For example, the game "Eve Online" did something like this: it chunked players in close proximity together onto a "grid", and I suppose each grid was one logical block on the server side. If you came into effective proximity with another player, your grids were joined. Two fleets could be on separate grids or joined grids. Theoretically it should be invisible to the player, though as far as I recall there were some arcane effects exploited by players, like extending their grid by making a chain of players, etc.

As for profiling with non-blocking I/O pseudo-threads, I don't see any reasonable approach other than manual clock counting at interesting places in the code.


#4 2019-01-24 16:10:17

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,803

Re: Results of profiling server1 with 166 active players

The full profile linked above shows a breakdown by function.  Here's a sample of it:

Functions with more than one sample:
 
 99.818% ===================================== (3833 samples)
         main
 
 
 32.214% ===================================== (1237 samples)
         epoll_wait
 
 
 32.214% ===================================== (1237 samples)
         SocketPoll::wait
 
 
 14.740% ===================================== (566 samples)
         recomputeHeatMap
 
 
 13.802% ===================================== (530 samples)
         getMapObject
 
 
 11.016% ===================================== (423 samples)
         Socket::send
 
 
  9.271% ===================================== (356 samples)
         LINEARDB3_getOrPut
 
 
  9.219% ===================================== (354 samples)
         __libc_send
 
 
  7.344% ===================================== (282 samples)
         LINEARDB3_get
 
 
  7.161% ===================================== (275 samples)
         sendMapChunkMessage
 
 
  7.057% ===================================== (271 samples)
         new_do_write
 
 
  7.057% ===================================== (271 samples)
         _IO_new_do_write
 
 
  7.005% ===================================== (269 samples)
         write
 
 
  7.005% ===================================== (269 samples)
         _IO_new_file_write
 
 
  6.849% ===================================== (263 samples)
         getChunkMessage
 
 
  6.797% ===================================== (261 samples)
         LINEARDB3_considerFingerprintBucket
 
 
  6.484% ===================================== (249 samples)
         getMapObjectRaw
 
 
  6.432% ===================================== (247 samples)
         checkDecayObject
 
 
  6.250% ===================================== (240 samples)
         FileLog::logStringV
 
 
  6.250% ===================================== (240 samples)
         PrintLog::logStringV
 
 
  6.250% ===================================== (240 samples)
         AppLog::infoF
 
 
  5.260% ===================================== (202 samples)
         dbGet
 
 
  4.844% ===================================== (186 samples)
         readSocketFull
 
 
  4.818% ===================================== (185 samples)
         Socket::receive
 
 
  4.583% ===================================== (176 samples)
         timed_read
 
 
  4.245% ===================================== (163 samples)
         select
 
 
  4.062% ===================================== (156 samples)
         _IO_puts
 
 
  4.036% ===================================== (155 samples)
         getMapFloor
 
 
  4.036% ===================================== (155 samples)
         _IO_new_file_overflow
 
 
  4.036% ===================================== (155 samples)
         getPTrans
 
 
  4.010% ===================================== (154 samples)
         dbFloorGet
 
 
  3.984% ===================================== (153 samples)
         getTrans
 
 
  3.906% ===================================== (150 samples)
         fseeko
 
 
  3.802% ===================================== (146 samples)
         _IO_new_file_seekoff
 
 
  3.776% ===================================== (145 samples)
         getMapBiomeIndex
 
 
  3.646% ===================================== (140 samples)
         computeMapBiomeIndex
 
 
  3.229% ===================================== (124 samples)
         read
 
 
  3.151% ===================================== (121 samples)
         operator
 
 
  2.969% ===================================== (114 samples)
         getXYFractal
 
 
  2.917% ===================================== (112 samples)
         File::readFileContents
 
 
  2.865% ===================================== (110 samples)
         _IO_vfprintf_internal
 
 
  2.630% ===================================== (101 samples)
         getXYRandomBN
 
 
  2.630% ===================================== (101 samples)
         HashTable<double>::lookupBin
 
 
  2.578% ===================================== (99 samples)
         SettingsManager::getFloatSetting
 
 
  2.448% ===================================== (94 samples)
         HashTable<double>::lookupPointer
 
 
  2.318% ===================================== (89 samples)
         getMapBiome
 
 
  2.188% ===================================== (84 samples)
         Time::getCurrentTime
 
 
  2.031% ===================================== (78 samples)
         dbGetCached
 
 
  1.979% ===================================== (76 samples)
         LINEARDB3_put
 
 
  1.979% ===================================== (76 samples)
         _IO_new_file_xsputn
 
 
  1.979% ===================================== (76 samples)
         __printf
 
 
  1.901% ===================================== (73 samples)
         SettingsManager::getStringSetting
 
 
  1.875% ===================================== (72 samples)
         SettingsManager::getSetting
 
 
  1.823% ===================================== (70 samples)
         stepMap
 
 
  1.745% ===================================== (67 samples)
         SettingsManager::getSettingContents
 
 
  1.719% ===================================== (66 samples)
         zipCompress
 
 
  1.589% ===================================== (61 samples)
         xxTweakedHash2D
 
 
  1.536% ===================================== (59 samples)
         __GI___libc_malloc
 
 
  1.458% ===================================== (56 samples)
         SettingsManager::getIntSetting
 
 
  1.458% ===================================== (56 samples)
         stepCurses
 
 
  1.432% ===================================== (55 samples)
         getNewCurseTokenHolders
 
 
  1.432% ===================================== (55 samples)
         SimpleVector<int>::getElementDirect
 
 
  1.276% ===================================== (49 samples)
         getFirstEmptyBucketIndex
 
 
  1.224% ===================================== (47 samples)
         dbPut
 
 
  1.224% ===================================== (47 samples)
         PrintLog::generateLogMessage
 
 
  1.224% ===================================== (47 samples)
         getBaseMap
 
 
  1.172% ===================================== (45 samples)
         biomeGetCached
 
 
  1.146% ===================================== (44 samples)
         __GI___xstat
 
 
  1.146% ===================================== (44 samples)
         __tz_convert
 
 
  1.068% ===================================== (41 samples)
         SimpleVector<double>::size
 
 
  1.016% ===================================== (39 samples)
         autoSprintf
 
 
  1.016% ===================================== (39 samples)
         Socket::setNoDelay


#5 2019-01-24 16:22:52

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,803

Re: Results of profiling server1 with 166 active players

As far as detecting time spent in kernel space goes, this method will do that just fine.  In fact, that's the whole point: where is the program actually spending its time?  If you interrupt it right now, where is it?  Is it blocked waiting for I/O?

Counting profilers can't do this, because they can't "count" while blocked on a system call.

However, there is one case that I might be missing: if the process is in an uninterruptible sleep state:

https://support.microfocus.com/kb/doc.php?id=7002725

GDB can't interrupt it in that case.  So the signal would be ignored for a bit, and then it would land... where?  I'm not sure.  I hope it would land somewhere in the system call after the uninterruptible sleep is over (so it would still pinpoint the I/O operation, at least).
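
One thing the profiler could do is peek at the target's state before sampling. On Linux, the state character in /proc/<pid>/stat is 'D' for uninterruptible sleep. A rough sketch of that check (not necessarily how wallClockProfiler handles it; just an illustration):

// Sketch of checking for uninterruptible sleep ('D') before sampling.
// Reads /proc/<pid>/stat, whose state field follows the parenthesized
// command name.
#include <cstdio>
#include <cstring>

// returns the state character ('R', 'S', 'D', ...), or '?' on failure
char getProcessState( int inPID ) {
    char path[64];
    snprintf( path, sizeof( path ), "/proc/%d/stat", inPID );

    FILE *f = fopen( path, "r" );
    if( f == NULL ) {
        return '?';
        }

    char buffer[1024];
    size_t numRead = fread( buffer, 1, sizeof( buffer ) - 1, f );
    fclose( f );
    buffer[ numRead ] = '\0';

    // the command name is wrapped in parens and may contain spaces,
    // so find the last ')' and take the first non-space char after it
    char *closeParen = strrchr( buffer, ')' );
    if( closeParen == NULL ) {
        return '?';
        }
    char *p = closeParen + 1;
    while( *p == ' ' ) {
        p++;
        }
    return *p;
    }

// before interrupting the target:
//     if( getProcessState( targetPID ) == 'D' ) {
//         // uninterruptible sleep:  skip or delay this sample
//         }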


#6 2019-01-24 16:27:35

jasonrohrer
Administrator
Registered: 2017-02-13
Posts: 4,803

Re: Results of profiling server1 with 166 active players

Okay, another thing I realized, after looking at some logs from the server1 stress test.

The server step loop runs each time new messages arrive from clients.

This means that it runs more often if there are more players connected, because messages arrive more often.

In one sample, when there were 100 players connected, the server step loop ran 65 times in a single second.  That's a lot of updating.

Yes, there are some "realtime" things that happen in response to client messages, and those should be processed right away.

But some other "regular maintenance" things, like recomputing heat maps, are ridiculous to run 65 times per second.  I'm computing heat maps for two players per step, round-robin, which at 65 steps per second means recomputing heat maps for 130 players every second (recomputing them for everyone more than once a second, since only 100 players are connected).  This is obviously a waste.

So, a number of these "regular maintenance" things should be triggered on a fixed timestep, regardless of how many players are connected and how fast messages are arriving.
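
Roughly like this (the interval value and names are just illustrative, not the actual server.cpp variables; Time::getCurrentTime() is the same call that already shows up in the profile above):

// Sketch of gating a maintenance task on a fixed timestep instead of
// running it on every server step.
// Assumed available:  double Time::getCurrentTime();  (used throughout
// the server already)
void recomputeHeatMapsForNextPlayers();   // hypothetical round-robin helper

static double lastHeatMapScanTime = 0;

void maybeRecomputeHeatMaps() {
    double heatMapScanInterval = 0.5;   // seconds between scans, illustrative

    double curTime = Time::getCurrentTime();

    if( curTime - lastHeatMapScanTime >= heatMapScanInterval ) {
        lastHeatMapScanTime = curTime;

        // only now do the round-robin work for this tick
        recomputeHeatMapsForNextPlayers();
        }
    }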

That will result in a dramatic reduction in CPU usage as the number of connected players grows.


#7 2019-01-24 17:00:58

Dodge
Member
Registered: 2018-08-27
Posts: 2,467

Re: Results of profiling server1 with 166 active players

"That will result in a dramatic reduction in CPU usage as the number of connected players grows."

Nice, 100+ player servers incoming. :D


#8 2019-01-24 17:50:20

pein
Member
Registered: 2018-03-31
Posts: 4,335

Re: Results of profiling server1 with 166 active players

Yeah, unnecessary realism hurting game balance.
Temperature running was OP, I get it, but this high a refresh rate on heat maps isn't that necessary.
Even if it were something like every 5 seconds, that wouldn't be much of a problem if it reduces lag a bit, and it still wouldn't be the same as temp running.

The thing is, most cities are desert cities, which have a higher chance of keeping more people alive, but that means heat can change a lot.
I guess it doesn't matter if it recalculates anyway.
Jungle won't fluctuate much, so heat could be calculated over a much larger area and it would be the same anyway.




#9 2019-01-24 17:53:21

GreatShawn
Member
Registered: 2018-09-08
Posts: 381

Re: Results of profiling server1 with 166 active players

I don't even know what this data means. Guess my past 10 years in China are all going to trash.


#10 2019-01-25 03:28:54

Gederian
Member
Registered: 2018-03-28
Posts: 164

Re: Results of profiling server1 with 166 active players

I don't think the heat map is the actual temperature of your character. It's everything around you. Unfortunate name.

