| 08-20-2011, 01:48 PM | #1 | ||||||||||||||||||||||||||
Okay, since people often claim how some code is super fast and other code is terribly slow I decided to set up a testing environment where I could test such vague statements and quantify them with solid numbers. Since stopwatch natives no longer work with the current patch, I had to resort to FPS tests when doing benchmarks. The problem with this is that it only tells you which of the two codes you are repeating X thousand times per second is faster, it doesn't tell you how much faster. The relationship between code execution cost and FPS is not linear, getting 20fps for one code and 40fps for another doesn't meant that the latter is twice as fast. I get around this problem by varying the number of times I repeat each code until they both result in the same FPS drop. At that point, I can simply compare the number of times I had to repeat each code to see how much longer one of them took to execute than the other. Thus, my testing environment looked like this: JASS:scope test globals private integer i endglobals private function Test takes nothing returns nothing // a few comment lines // to prevent inlining endfunction //! textmacro TenCalls call Test() call Test() call Test() call Test() call Test() call Test() call Test() call Test() call Test() call Test() //! endtextmacro private function Periodic takes nothing returns nothing set i=100 loop //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() // 100 calls //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() //! runtextmacro TenCalls() // 200 calls set i=i-1 exitwhen i==0 endloop //call BJDebugMsg("op limit not reached") // used to confirm that the code has not reached the op limit. endfunction private function Init takes nothing returns nothing call TimerStart(CreateTimer(), 0.01, true, function Periodic) endfunction endscope With a test environment like that, I would simply write the code I wanted to test, copy it ten times and put it in a textmacro, then run that textmacro a bunch of times in the periodic function. Then, I would keep changing the number of times I run the textmacro and/or repeat the loop and testing the map until I would get a specific FPS (20fps in my case). Once that was reached, I would count the number of times I ran the code and document that. Here are the results:
The above examples were all very theoretical, even when I tried to simulate practical uses I could only speculate about what users would do in their periodic triggers, so I decided to do a much more practical stress test, using an actual projectile library. I tested xemissile, a rather well optimized script designed specifically for simulation of WC3 missiles and nothing more. This is the simplest script I could think of that actually does something useful. A simple knockback system could perhaps be a bit less costly, but that's about it. I was only testing running costs, not allocation, so I simply created a bunch of very slow missiles when the map started and then watched the FPS as they moved towards their target: JASS:private function Init takes nothing returns nothing set i=105 set u=CreateUnit(Player(0), 'hfoo', 500.0,0.0,0.0) loop call xemissile.create(-1000,GetRandomReal(-500,500),0, 1000,0,0).launch(25,0.1) //call xehomingmissile.create(-1000,GetRandomReal(-500,500),0, u,0).launch(25,0.1) set i=i-1 exitwhen i==0 endloop endfunction That's all I have so far. You're welcome to make your own tests similar to these, but before you use the FC unit to measure processing costs please first repeat some of my tests and check if you get the same ratios between results in order to confirm that FC is a reliable way to measure this. |
| 09-19-2011, 02:13 PM | #2 |
I would like to request a "newer" benchmark of array reading against hash table reading. If the hash table turns out to be very slow, then I would also like to see a comparison between 2D arrays and hash tables. Furthermore, you could test if the read/write speed of a hash table depends on the amount of objects stored within it (it shouldn't, but still...). I would do those benchmarks myself, but I have no WC3 at the moment. |
| 09-20-2011, 02:31 PM | #3 | ||||||||||||||||||||||||||||||
I did the benchmarks you requested. Since you can't have a variable read on its own, I decided to test it as part of an if statement, since I have already done some such tests before. Note that I was comparing practical costs, so each array read also involves one variable read and each hashtable read involves three variable reads (one for the hashtable, two for the keys). My test cases thus pretty much represent what Table inlines and optimizes to. Here are my results: Table:
Comments:
Conclusions:
|
| 10-12-2011, 10:34 PM | #4 | ||||||||||||||||||||||||||||||||||||||||||||||
I did some more tests. I decided to test different methods of running code for units in an area. As it turns out, now that group enumerations with a null filter no longer leak, FirstOfGroup based loops perform the best.
During the course of testing, I occasionally repeated my reference test of 20000 function calls on a function with no arguments and a name 10 characters long. As I was testing different numbers of units, I noticed that the target FPS has changed, so the tests I did on the FirstOfGroup loops with different numbers of units were not as accurate. How big was the error? I ran some tests on an empty map and a map with 50 footmen. To get the same FPS, I needed to reduce the number of function calls by 3% when I was testing with 50 stationary units on the map. This isn't such a big deal, in my tests I had 1 to 20 units so my results are still fairly accurate. For fun, I also tested what happened when I ordered those 50 footmen to patrol to a nearby point. The reduction in the number of function calls needed to maintain the same FPS was 5%, so those 50 patrolling footmen cost me 1000 FC. |
| 10-13-2011, 09:10 PM | #5 | |||||||||||||||||||
I recently stumbled upon the following statement: Quote:
|
| 10-13-2011, 11:32 PM | #6 |
How hard is it to actually update the stopwatch natives? Seems like this patch is gonna be the current one for a while. |
| 10-16-2011, 04:11 AM | #7 | ||
Quote:
Thank you lol. I never understood the logic of that statement. In a logical sense, they would be equal in terms of speed, which apparently is so. Quote:
I think the source code/dl link of the stopwatch natives is down so it would probably be difficult to do unless someone gets in contact with SFilip. |
| 10-16-2011, 11:07 AM | #8 |
Anitarf, could you once again benchmark GroupEnum + filter (performing actions) with GroupEnum + FirstOfGroup enumeration but with differend circumstances? What I mean is moving those closer to real environtment when user uses multiple comparisons (for unit selection) and with higher unit count. Meaby even with unit-type mix. I'm just unsure of current results since those are 'perfect circumstances' which to be honest never happen. Jassfags usually, while writing spells/systems implement ton of selective stuff to consider all the options making thier work bugfree. Thanks in advance. |
| 10-16-2011, 05:22 PM | #9 | |
Quote:
|
| 10-16-2011, 07:28 PM | #10 |
Anitarf, I somewhat agree in case situation it's similar to the "N==0 and 0==N" issue, although I started to be unsure again after reading random responses. And if you scroll down a bit to Bribe's post, you can see that he is mentioning drop in speed in favour of FirstOfGroup to two times instead of three compared to GroupEnum + filter. What I thought is that, the more filter code we add (selection stuff) the smaller the difference between GroupEnum/FoG and GroupEnum/filter becomes. Although it's still ends up being two times faster. You know, those benchmarks have greatly changed the sight on some major things, especialy group enumeration methods. |
| 10-16-2011, 09:28 PM | #11 | |
Quote:
As for other comments in that thread, yes, I did only do these tests on one computer, it's the only one I have. I'm not sure what people who object to this expect me to do, go out and buy a new computer so I can do more WC3 benchmarks? I don't think so. When I first posted this thread I did invite others to replicate my tests on different hardware so if you feel my results are suspect then by all means, repeat my tests. If you don't then you don't get to complain. The comment about the op limit may be a valid concern, but there are other ways of avoiding it besides moving some code to enum filters. For example, in a missile system, you could run the periodic update function of every missile that does collision checks with an .evaluate instead of a regular function call. The cost of doing so will easily be offset by using the faster FirstOfGroup loop as long as there are enough units in range (which there should be since the whole justification for why the op limit might be a problem was that missile collision checks could be enumerating large amounts of units). |
| 10-26-2011, 11:00 PM | #12 | |||||||||
I ran some more tests to verify the feasibility of dummy unit recycling. First, I wrote a simple dummy unit recycling library that maintains multiple queues of dummies at different facing angles so that the dummies are suitable for use as projectiles: JASS:library xedummy requires xebasic globals private constant integer ANGLE_RESOLUTION = 12 private constant integer DUMMY_PRELOAD_COUNT = 5 // Not yet implemented. endglobals // END OF CALIBRATION SECTION // ================================================================ private struct recycleQueue extends array recycleQueue next recycleQueue prev real angle integer size xedummy first xedummy last static method onInit takes nothing returns nothing local integer i=0 loop exitwhen i==ANGLE_RESOLUTION set i=i+1 set recycleQueue(i).prev=recycleQueue(i-1) set recycleQueue(i).next=recycleQueue(i+1) set recycleQueue(i).angle=(i-0.5)*(360.0/ANGLE_RESOLUTION) endloop set recycleQueue(1).prev=recycleQueue(i) set recycleQueue(i).next=recycleQueue(1) endmethod static method get takes real angle returns recycleQueue return recycleQueue(R2I(angle/360.0*ANGLE_RESOLUTION)+1) endmethod endstruct // ================================================================ struct xedummy private static group g=CreateGroup() private unit u // ---------------------------------------------------------------- private xedummy next private method queueInsert takes recycleQueue q returns nothing call SetUnitFacing(.u, q.angle) if q.size==0 then set q.first=this else set q.last.next=this endif set q.last=this set .next=0 // Check adjacent queues to see if they have fewer dummies than this queue. // If they do, move the first dummy from this queue to an adjacent queue. // This operation is recursive so it will find the local minimum queue and // increase the size of that rather than a neighbouring larger queue. if q.size>q.next.size then set this=q.first set q.first=.next call .queueInsert(q.next) elseif q.size>q.prev.size then set this=q.first set q.first=.next call .queueInsert(q.prev) else set q.size=q.size+1 endif endmethod private static method queueRemove takes recycleQueue q returns xedummy // Check adjacent queues to see if they have more dummies than this queue. // If they do, move the first dummy from that queue to this one. // This operation is recursive so it will find the local maximum queue and // decrease the size of that rather than a neighbouring smaller queue. local xedummy this if q.size<q.next.size then set this=q.last set q.last=.queueRemove(q.next) set .next=q.last call SetUnitFacing(q.last.u, q.angle) elseif q.size<q.prev.size then set this=q.last set q.last=.queueRemove(q.prev) set .next=q.last call SetUnitFacing(q.last.u, q.angle) else set q.size=q.size-1 if q.size==0 then set q.last=0 endif endif set this=q.first set q.first=.next set .next=0 return this endmethod // ---------------------------------------------------------------- private static method create takes unit u returns xedummy local xedummy this if GetUnitTypeId(u)!=XE_DUMMY_UNITID then debug call BJDebugMsg("xedummy.release error: Method called on a unit of an incorrect type.") elseif IsUnitInGroup(u, .g) then debug call BJDebugMsg("xedummy.release error: Method called on an already released unit.") else set this=.allocate() set .u=u call GroupAddUnit(.g, u) call .queueInsert(recycleQueue.get(GetUnitFacing(u))) call ShowUnit(u, false) return this endif return 0 endmethod private method destroy takes nothing returns nothing call GroupRemoveUnit(.g, .u) call ShowUnit(.u, true) set .u=null call .deallocate() endmethod // ---------------------------------------------------------------- private static unit dummy static method new takes player p, real x, real y, real face returns unit local recycleQueue q local xedummy this loop exitwhen face>0.0 set face=face+360.0 endloop loop exitwhen face<360.0 set face=face-360.0 endloop set q=recycleQueue.get(face) if q.size==0 then set .dummy = CreateUnit(p, XE_DUMMY_UNITID, x,y,face) call UnitAddAbility(this.dummy,XE_HEIGHT_ENABLER) call UnitAddAbility(this.dummy,'Aloc') call UnitRemoveAbility(this.dummy,XE_HEIGHT_ENABLER) else set this=.queueRemove(q) set .dummy=.u call .destroy() call SetUnitX(.dummy, x) call SetUnitY(.dummy, y) call SetUnitFacing(.dummy, face) call SetUnitOwner(.dummy, p, true) endif return .dummy endmethod static method release takes unit u returns nothing call .create(u) endmethod endstruct endlibrary
|
| 10-30-2011, 12:35 PM | #13 | ||||||||||
Table:
Second statement is slower because it has implicit conversion done by compiler: if LoadInteger(H,N,N)>=0.0 then ==> if I2R(LoadInteger(H,N,N))>=0.0 then You should have used integer zero instead of real zero: if LoadInteger(H,N,N)>=0 then Quote:
|
| 10-31-2011, 05:29 AM | #14 |
excellent suggestion, you get right on it ![]() |
| 11-01-2011, 11:04 PM | #15 | |||||||||||||||
Following up on my previous post, I decided to make a comparison between the dummy recycling library I wrote above and MissileRecycler that was submitted recently by Bribe. Since Bribe uses timers and I don't I was expecting his library to be slightly slower, but not significantly. Before I could benchmark the libraries, though, I had some tweaking to do. Since MissileRecycler's units did not immediately become available for recycling, I had to preload enough of them for my stress test. To avoid creating too many units, I reduced the number of directions to 3, this is a useless number in practice but the speed of recycling shouldn't really change with a higher number so my results are valid. I preloaded 70 units per direction so I had 210 units on my map in total. During testing, I was surprised that my library was performing significantly worse than Bribe's. It turned out that the culprit was the ShowUnit native: I was hiding my units on the recycle queue and Bribe wasn't. Since there's really no need to hide the dummy units, I removed the ShowUnit calls from my library and repeated the test. Table:
|
