Topic : The GFA-Basic Compendium
Author : GFA Systemtechnik GmbH
Version : GFABasic.HYP v2.98 (12/31/2023)
Subject : Documentation/Programming
Nodes : 899
Index Size : 28056
HCP-Version : 3
Compiled on : Atari
The compiler's optimization of multiplication has a special feature. The
Atari ST's processor does have multiplication instructions, but they take a
large number of clock cycles to execute and only operate on two-byte values.
The 3.0 compiler therefore has to handle the multiplication of a four-byte
variable by a constant without using the assembler instructions mulu and
muls.
The line
x%=y%*4
for example, can be simply translated into the code:
move.l -$8000(a5),d0
lsl.l #$2,d0
move.l d0,-$7ffc(a5)
since 4 is a power of 2 (2 squared), so a left shift by two bits suffices. Coding the statement:
x%=y%*21
is somewhat more involved, since 21 equals binary 10101 (16+4+1). To make the
code generated by the compiler easier to follow, the comments assume that
y%=100. The code generated is:
move.l -$8000(a5),d0 ;y%, that is 100, into register d0
move.l d0,d2 ;copy of y% into register d2
lsl.l #$2,d0 ;d0 times 4, d0 contains 400, d2=100
add.l d0,d2 ;add 400 to 100, therefore d0=400,
;d2=500
lsl.l #$2,d0 ;d0 times 4, therefore d0=1600,
;d2=500
add.l d2,d0 ;add d2 to d0: d0=2100, d2=500
move.l d0,-$7ffc(a5) ;writes result 2100 to x%
This code takes 6 bytes more than a muls #$15,d0 (which, however, does not
exist for four-byte values), but is faster, at 52 clock cycles of execution
time (not counting the first and the last move instruction).
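
The register steps of the listing above can be retraced outside the
compiler. The following is a minimal sketch in Python, not compiler output;
the function name times21_shift_add and the 32-bit mask are assumptions made
for illustration, with the mask standing in for the width of the 68000 data
registers.

```python
MASK32 = 0xFFFFFFFF  # assumption: model 32-bit data registers by masking


def times21_shift_add(y):
    """Multiply y by 21 with the same shift/add steps as the listing."""
    d0 = y & MASK32            # move.l y%,d0
    d2 = d0                    # move.l d0,d2
    d0 = (d0 << 2) & MASK32    # lsl.l #$2,d0  -> d0 = 4*y
    d2 = (d2 + d0) & MASK32    # add.l d0,d2   -> d2 = 5*y
    d0 = (d0 << 2) & MASK32    # lsl.l #$2,d0  -> d0 = 16*y
    d0 = (d0 + d2) & MASK32    # add.l d2,d0   -> d0 = 21*y
    return d0


print(times21_shift_add(100))  # prints 2100
```

Negative inputs come back as 32-bit two's complement bit patterns, which is
how a four-byte variable would hold them.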
By contrast, the statement
x&=y&*21
is coded using muls. For 100,000 multiplications by 21, the four-byte
statement is therefore a little faster:
Statement      Time
x%=y%*21       0.975
x&=y&*21       1.08
However, if the constant is not 21 but a value whose translation into the
assembler instructions add, sub, lsl and asl requires relatively lengthy
code, the four-byte variant not only generates considerably more code, but
also executes more slowly than the two-byte statement working with muls. For
example, multiplying by 21845 (binary 101010101010101) instead of 21 gives
the following execution times for 100,000 repetitions:
Statement      Time
x%=y%*21845    2.26
x&=y&*21845    1.34
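
Why 21845 is so much more expensive than 21 can be seen by counting the set
bits of the constant. The sketch below, in Python, uses the simplest possible
decomposition (one shifted copy of y per 1-bit); the helper names
shift_add_terms and multiply_by_constant are hypothetical, and the real
compiler may produce shorter sequences, for instance by also using sub.

```python
def shift_add_terms(constant):
    """Return the positions of the 1-bits of a positive constant."""
    return [bit for bit in range(constant.bit_length())
            if (constant >> bit) & 1]


def multiply_by_constant(y, constant):
    """Multiply y by a positive constant as a sum of shifted copies of y."""
    return sum(y << bit for bit in shift_add_terms(constant))


for c in (4, 21, 21845):
    terms = shift_add_terms(c)
    # One shifted term per 1-bit; joining n terms takes n-1 additions.
    print(c, bin(c), "terms:", len(terms), "additions:", len(terms) - 1)
```

Under this naive scheme, 4 needs a single shift, 21 needs three terms, and
21845 needs eight, which fits the growth in code size and run time described
above.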
So far we have only looked at multiplications of one variable by one
constant. When multiplying two integer variables, however, the fact that the
mul instructions only process two-byte values is also of great importance.
The following list shows the code generated for different integer
multiplications and their execution times for 25,000 multiplications.
move.l -$7ff4(a5),d0 ;x%=y%*z%, 7.685
move.l -$7ff8(a5),d1
bsr LMUL
move.l d0,-$7ffc(a5)
move.w -$7ff0(a5),d0 ;x%=y%*z&, 6.165
ext.l d0
move.l -$7ff8(a5),d1
bsr LMUL
move.l d0,-$7ffc(a5)
move.w -$7ff0(a5),d0 ;x%=y&*z&, 2.44
muls -$7fee(a5),d0
move.l d0,-$7ffc(a5)
move.w -$7ff0(a5),d0 ;x&=y&*z&, 2.315
muls -$7fee(a5),d0
move.w d0,-$7fec(a5)
The large speed difference between the first two and the last two examples
stems from the fact that in the last two, muls could be used instead of the
LMUL routine. The absolute timings also depend on the values found in y%, z%,
y&, and z&. The relative speeds of the different statements, however, remain
broadly the same.
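
The source does not show the LMUL routine itself. As an illustration of why
a four-byte multiplication costs so much more on a processor with only
16-bit multiply instructions, the following Python sketch builds the low 32
bits of a product from 16x16-bit partial products. This is a common
textbook technique and only an assumption about what such a routine might
do; the function name mul32_from_16bit_parts is hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_from_16bit_parts(a, b):
    """Low 32 bits of a*b, using only 16x16-bit multiplications."""
    a &= MASK32
    b &= MASK32
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16
    low = a_lo * b_lo                    # first 16x16 product
    cross = a_lo * b_hi + a_hi * b_lo    # two more 16x16 products
    # a_hi * b_hi only affects bits 32 and above, so it is dropped.
    return (low + ((cross & MASK16) << 16)) & MASK32


print(mul32_from_16bit_parts(70000, 50000))  # prints 3500000000
```

Even in this reduced form, three multiplications plus shifts and additions
are needed where the two-byte case gets by with a single muls.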
In the first two cases, where a four-byte variable is multiplied, the $*&
compiler option can also force the use of muls. Bear in mind, however, that
only two bytes of each four-byte value then enter into the calculation. A
more detailed description can be found in the chapter on compiler options
(integer multiplication).
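
The consequence of using only two bytes can be illustrated with a small
Python model. It assumes that the low-order 16 bits of each operand are taken
and treated as a signed word, which is how muls handles a register operand;
the text above does not state which two bytes the compiler uses, so this is
an assumption, and the names to_signed16 and muls_like are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed two-byte word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def muls_like(a, b):
    """Signed 16x16 -> 32-bit multiply on the low words of a and b."""
    return to_signed16(a) * to_signed16(b)


print(muls_like(300, 200))   # prints 60000: both operands fit in two bytes
print(muls_like(70000, 3))   # prints 13392, not 210000
```

As long as both operands lie in the two-byte range of -32768 to 32767 the
result is correct; beyond that range the upper bytes are silently ignored.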
The small differences within the pair of LMUL examples and within the pair
of muls examples stem only from the use of move.w versus move.l and from the
ext.l instruction. When multiplying integer variables, it is therefore
worthwhile to use two-byte rather than four-byte variables.