| 07-18-2006, 12:50 PM | #1 |
On wc3sear.ch sombody submitted this peace of code, and guessed this being faster then the natives.(I doubt this, but no matter) I opened this thread to ask if it does actually work for all values for x, and how to get to the algorithm. I know Pow is dumb and very slow, I just copied what was posted on wc3sear.ch. JASS://sin(x) = x- (x^3)/3+ (x^5)/5! + (x^7)/7 function TSin takes real x returns real return (x - Pow(x,3)/3 + Pow(x,5)/5 + Pow(x,7)/7) endfunction //cos(x) = 1- (x^2)/2+ (x^4)/4 - (x^6)/6 function TCos takes real x returns real return (1 - Pow(x,2)/2 + Pow(x,4)/4 - Pow(x,6)/6) endfunction Thx for reply. EDIT: I hope I corrected the formula right right now. |
| 07-18-2006, 01:19 PM | #2 |
3 pows + non-native function call sure are slower than native trig functions. Time to test. Should it take radians or degrees? |
| 07-18-2006, 01:27 PM | #3 |
Go for degrees. And I actually asked you if this method would be faster a while ago Vex, but nevermind. It's also a lot more imprecise, but I've no idea how much that will affect stuff ingame, as the sum of the infinite progession of those is actually equal to sin and cos. That's probably how the native's calculate it, but I don't know how many they take into account |
| 07-18-2006, 01:28 PM | #4 |
Tests show: Sin : 11.21 seconds TSin : 36.06 seconds 5000 times:function yyY takes nothing returns nothing local real x=-2*bj_PI local real L=-x local real y loop exitwhen x>L set y=(T)Sin(x) set x=x+0.01 endloop set udg_log=udg_log+1 endfunction Edit: Should I test in degrees instead then, ok I will? |
| 07-18-2006, 01:38 PM | #5 |
Even multiplying Sin's argument with bj_RADTODEG and not doing so on TSin, and using values from -360 to 360, TSin takes 21.28 seconds and Sin 8.53 seconds, would like to know who said that TSin was faster than Sin |
| 07-18-2006, 01:40 PM | #6 |
sounds like a stupid idea to start with anyway.. pow is sooo slow. |
| 07-18-2006, 01:50 PM | #7 |
Actually, those funcs there have nothing to do with sin and cos now that I look at them, they should be sin => x - pow(x,3)/6 + pow(x,5)/120 - pow(x,7)/5040 cos => 1 - pow(x,2)/2 + pow(x,4)/24 - pow(x,6)/720 And you could speed them up by swapping pow(x,3) with x*x*x and the like |
| 07-18-2006, 06:00 PM | #8 |
Yes there were locla var versions without pow too, but the primaray aspect was to get some one on hand, to explain me WHY it returns the same values as sin/cos and how to get, to expire or to "invent" such a algorithm. @Mezzer: Have look at the top, edited the func for easier reading. @Vexorian: I have no clue if it should take degrees or radians. |
| 07-18-2006, 06:21 PM | #9 |
Pow(x,3/6) is NOT (x*x*x)/6 Pow(x,3)/6 is. Oh and, no it doesnt work for all values of x, sin/cos is an polynomial (as it seems, also dont know if its the correct term for it, im rather sloppy on my terms :) ) and you would have to continue with x^9/9! etc. |
| 07-18-2006, 06:27 PM | #10 |
Hehe, I remember when those were submitted. Not only are they WAY SLOWER but they're wrong in a few different ways. The usual algorithm: 1. transform angle to range 0..pi/2 or at least 0..2pi 2. run a few iterations of an angle reduction identity like sin3x = 3sinx - 4sin^3x 3. terminate with a few terms of the taylor series or a polynomial that approximates the trig function close to zero. for sin, just 'x' can be good enough, for cos, just 1. You could study blizzard's routine as well. It's faked in integer arithmetic but the structure of the algorithm should be apparent. Code:
.text:6F395D70 game_sin proc near ; CODE XREF: sub_6F08B1A0+E2p .text:6F395D70 ; sub_6F1F19E0+17Ep ... .text:6F395D70 .text:6F395D70 var_4 = dword ptr -4 .text:6F395D70 .text:6F395D70 push ebp .text:6F395D71 mov ebp, esp .text:6F395D73 push ecx .text:6F395D74 push ebx .text:6F395D75 push esi .text:6F395D76 push edi .text:6F395D77 mov edi, ecx .text:6F395D79 push offset dword_6F874BD8 .text:6F395D7E lea ecx, [ebp+var_4] ; float * .text:6F395D81 call game_multiply .text:6F395D86 mov eax, [eax] .text:6F395D88 lea ecx, [ebp+var_4] .text:6F395D8B mov [ebp+var_4], eax .text:6F395D8E call sub_6F395060 .text:6F395D93 mov ecx, eax .text:6F395D95 and ecx, 0FFh .text:6F395D9B mov esi, ecx .text:6F395D9D shl esi, 8 .text:6F395DA0 or ecx, esi .text:6F395DA2 mov edx, eax .text:6F395DA4 sar eax, 12h .text:6F395DA7 mov esi, ecx .text:6F395DA9 and eax, 3 .text:6F395DAC sar edx, 8 .text:6F395DAF shl esi, 10h .text:6F395DB2 mov ebx, eax .text:6F395DB4 and edx, 3FFh .text:6F395DBA or ecx, esi .text:6F395DBC test bl, 1 .text:6F395DBF jz short loc_6F395DE2 .text:6F395DC1 lea eax, ds:0[edx*4] .text:6F395DC8 mov edx, offset unk_6F726E68 .text:6F395DCD sub edx, eax .text:6F395DCF mov esi, [edx] .text:6F395DD1 mov edx, offset unk_6F726E64 .text:6F395DD6 sub edx, eax .text:6F395DD8 mov eax, esi .text:6F395DDA sub eax, [edx] .text:6F395DDC mul ecx .text:6F395DDE neg edx .text:6F395DE0 jmp short loc_6F395DF4 .text:6F395DE2 ; --------------------------------------------------------------------------- .text:6F395DE2 .text:6F395DE2 loc_6F395DE2: ; CODE XREF: game_sin+4Fj .text:6F395DE2 mov esi, ds:dword_6F725E68[edx*4] .text:6F395DE9 mov eax, ds:dword_6F725E6C[edx*4] .text:6F395DF0 sub eax, esi .text:6F395DF2 mul ecx .text:6F395DF4 .text:6F395DF4 loc_6F395DF4: ; CODE XREF: game_sin+70j .text:6F395DF4 shl ebx, 1Eh .text:6F395DF7 add edx, esi .text:6F395DF9 sar ebx, 1Fh .text:6F395DFC xor edx, ebx .text:6F395DFE sub edx, ebx .text:6F395E00 lea ecx, [ebp+var_4] .text:6F395E03 call I2R .text:6F395E08 mov eax, [eax] .text:6F395E0A mov ecx, eax .text:6F395E0C and ecx, 7F800000h .text:6F395E12 neg ecx .text:6F395E14 sbb ecx, ecx .text:6F395E16 and ecx, 0F800000h .text:6F395E1C sub eax, ecx .text:6F395E1E mov [edi], eax .text:6F395E20 mov eax, edi .text:6F395E22 pop edi .text:6F395E23 pop esi .text:6F395E24 pop ebx .text:6F395E25 mov esp, ebp .text:6F395E27 pop ebp .text:6F395E28 retn .text:6F395E28 game_sin endp .text:6F395E28 .text:6F395060 sub_6F395060 proc near ; CODE XREF: sub_6F008FB0+143p .text:6F395060 ; sub_6F008FB0+285p ... .text:6F395060 push esi .text:6F395061 mov esi, [ecx] .text:6F395063 mov edx, esi .text:6F395065 shr edx, 17h .text:6F395068 and edx, 0FFh .text:6F39506E cmp edx, 7Fh .text:6F395071 jnb short loc_6F395077 .text:6F395073 xor eax, eax .text:6F395075 pop esi .text:6F395076 retn .text:6F395077 ; --------------------------------------------------------------------------- .text:6F395077 .text:6F395077 loc_6F395077: ; CODE XREF: sub_6F395060+11j .text:6F395077 mov eax, esi .text:6F395079 and eax, 7FFFFFh .text:6F39507E mov ecx, 96h .text:6F395083 or eax, 800000h .text:6F395088 sub ecx, edx .text:6F39508A js short loc_6F39509D .text:6F39508C shr eax, cl .text:6F39508E mov ecx, esi .text:6F395090 sar ecx, 1Fh .text:6F395093 pop esi .text:6F395094 mov edx, eax .text:6F395096 mov eax, ecx .text:6F395098 xor eax, edx .text:6F39509A sub eax, ecx .text:6F39509C retn .text:6F39509D ; --------------------------------------------------------------------------- .text:6F39509D .text:6F39509D loc_6F39509D: ; CODE XREF: sub_6F395060+2Aj .text:6F39509D neg ecx .text:6F39509F shl eax, cl .text:6F3950A1 mov ecx, esi .text:6F3950A3 sar ecx, 1Fh .text:6F3950A6 pop esi .text:6F3950A7 mov edx, eax .text:6F3950A9 mov eax, ecx .text:6F3950AB xor eax, edx .text:6F3950AD sub eax, ecx .text:6F3950AF retn .text:6F3950AF sub_6F395060 endp |
| 07-18-2006, 06:27 PM | #11 |
Ok, this seems to get even more usless, without any reachable aim ... Vex, plx close this thread or just delete it. Thx. |
| 07-18-2006, 06:42 PM | #12 |
You should change it in the code Bert since what you are doing there is taking x to the 0.5th power and the like and not to the 3rd power and the dividing by 6 as you should be. And infrane, you don't need to do the whole thing, the first few will suffice to a certain degree of error |
| 07-18-2006, 06:46 PM | #13 |
actually if you want to improve speed you could do what old school game developers have been doing over the time and use a look up table. Wait a second some one is calling, I am gonna edit this post soon with more info note to self: learn asm |
| 07-18-2006, 06:51 PM | #14 | |
Quote:
So if you want it for all x you must include the whole polynomial, which you cant so you must use either a radial/degree normalization. Or just use a table, indeed, which is alot faster. |
| 07-18-2006, 06:54 PM | #15 |
Ok, true enough. Still, if he means to use it, he need the actual values in there |
