A Study of VDISYS

Topic       : The GFA-Basic Compendium
Author      : GFA Systemtechnik GmbH
Version     : GFABasic.HYP v2.98 (12/31/2023)
Subject     : Documentation/Programming

by Lonny Pursell
WWW: http://gfabasic.net/

Rev 3  3/12/2018

Is it possible to shorten the bindings and/or squeeze more speed out of them?

Let's face it, the interface to the VDI is clunky at best, but we are stuck with
it if we want to write clean GEM applications. Every little bit of speed gained
helps, especially when rendering text.

Like most beginners, perhaps, I found an archive of bindings on the internet.
They seemed to work, so I assumed that was the proper way to write them and
have been doing so ever since. I never really bothered to study the manual,
which is ironic because it actually does explain VDISYS in detail. Here's a
typical binding as I would normally write one. Let's call this the long form.

' example #1: long form
PROCEDURE pcircle(x&,y&,r&)
  CONTRL(0)=11  !opcode
  CONTRL(1)=3   !ptsin count
  CONTRL(3)=0   !intin count
  CONTRL(5)=4   !sub-opcode
  PTSIN(0)=x&
  PTSIN(1)=y&
  PTSIN(4)=r&
  VDISYS
RETURN

Here's the same binding written in what I call short form.

' example #2: short form
PROCEDURE pcircle(x&,y&,r&)
  PTSIN(0)=x&
  PTSIN(1)=y&
  PTSIN(4)=r&
  VDISYS 11,0,3,4  !opcode,intin count,ptsin count,sub-opcode
RETURN

Yeah, Frank flipped the two middle parameters: VDISYS takes the intin count
before the ptsin count, the reverse of their order in CONTRL(). ;o)

If you don't need the sub-opcode, just leave it out. If you need some extra
CONTRL() parameter beyond 5, which is rare, just add the appropriate
CONTRL(a)=b line.
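
As an illustration of that rare case, here is a rough sketch of a raster copy
binding (vro_cpyfm, opcode 109), which expects the two MFDB addresses in
CONTRL(7) through CONTRL(10). The procedure name and its parameters are made
up for this example, and it assumes the caller has already filled PTSIN(0) to
PTSIN(7) with the source and destination rectangles. Treat it as untested.

' sketch: short form plus extra CONTRL() entries (hypothetical binding)
PROCEDURE cpyfm(mode&,s_mfdb%,d_mfdb%)
  INTIN(0)=mode&             !writing mode
  LONG{CONTRL+14}=s_mfdb%    !CONTRL(7),CONTRL(8): source MFDB address
  LONG{CONTRL+18}=d_mfdb%    !CONTRL(9),CONTRL(10): destination MFDB address
  VDISYS 109,1,4             !opcode,intin count,ptsin count
RETURN

The two addresses are poked as longs because each one spans two CONTRL()
words. In real code the built-in BITBLT would normally be the better choice.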

As you can see in example #2, the CONTRL() assignments are gone and the values
are passed in one line of code. Is it really faster? I'm having trouble
figuring that out as the timer in ARAnyM is utter crap. I get varying results:
sometimes it's faster, sometimes it's slower. :-P

Now if we look deeper at the resulting assembler output, I can safely say it's
faster. CONTRL(a)=b produces a library call, thus a BSR and some addition just
to set one element. So that's four BSR calls and some addition (ADD.W) before
we even get to the VDISYS call.

The variant with the parameters in one line converts all four parameters to
MOVEQ #x,dx and inside the VDISYS call it does MOVE.W dx directly to where it
needs to go, without using any addition. You also get a smaller binary.

Another interesting study is PTSIN(a)=b. Although these built-in arrays make
the code far more readable, I'm not so sure it's the most efficient method if
you are a speed junkie.

PTSIN(4)=r&  versus  WORD{PTSIN+8}=r&

Just like CONTRL(), the line PTSIN(a)=b produces a BSR and some math
instructions. Here are two more examples using the old-school method, from
before Frank introduced the fancy built-in arrays:

' example #3: alt long form
PROCEDURE pcircle(x&,y&,r&)
  WORD{CONTRL}=11     !CONTRL(0): opcode
  WORD{CONTRL+2}=3    !CONTRL(1): ptsin count
  WORD{CONTRL+6}=0    !CONTRL(3): intin count
  WORD{CONTRL+10}=4   !CONTRL(5): sub-opcode
  WORD{PTSIN}=x&
  WORD{PTSIN+2}=y&
  WORD{PTSIN+8}=r&
  VDISYS
RETURN

' example #4: alt short form
PROCEDURE pcircle(x&,y&,r&)
  WORD{PTSIN}=x&
  WORD{PTSIN+2}=y&
  WORD{PTSIN+8}=r&
  VDISYS 11,0,3,4  !opcode,intin count,ptsin count,sub-opcode
RETURN

The interesting thing here is a study of the assembler output. I was expecting
to see some math, but instead the compiler does a cool optimization like so:

WORD{PTSIN+8}=r&  !becomes...

BSR      PTSIN          ;adr of ptsin array -> d0
MOVEA.L  d0,a0          ;save -> a0
MOVEA.L  -$8000(a5),a2  ;address of r&
MOVE.W   (a2),d0        ;value of r& -> d0
MOVE.W   d0,8(a0)       ;stuff it in the ptsin array at offset 8!

This produces less code than the long form, but slightly more than the short
form.

I fired up the TT030 in plain TOS. MiNT seems to cause inconsistent results,
perhaps something to do with task switching. Here are the results for all four
bindings using 1000 iterations:

example #1 - long form:      15.085
example #2 - short form:     15.075
example #3 - alt long form:  15.085
example #4 - alt short form: 15.075

From these results we can see that both versions of the short form end up
faster, though only by 0.010 out of roughly 15. There doesn't seem to be any
advantage to the old-school poking method.
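
A timing loop along these lines should be enough to repeat the test. It is
only a sketch, not the code behind the numbers above: the coordinates, the
variable names and the output format are assumptions. TIMER counts in ticks
of 1/200 second, hence the division.

' sketch: time 1000 calls of one binding
t%=TIMER
FOR i&=1 TO 1000
  pcircle(100,100,50)   !swap in the binding under test
NEXT i&
PRINT (TIMER-t%)/200;" seconds"

Run it several times under plain TOS and compare the spread before trusting
a difference as small as the one shown here.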

You will also notice none of my examples set CONTRL(6). This is because GFA's
startup code has already copied the value of V~H to CONTRL(6). This scheme
relies on the fact that the VDI won't alter CONTRL(6). If you add a line like
CONTRL(6)=V~H you are wasting CPU cycles. ;o)

One last point to remember: always use the built-in graphics and VDI commands
if at all possible and only use a binding if you absolutely have to. User
routines use the stack for parameter passing whereas the built-in commands use
registers, which is faster.

DEFTEXT, however, is an exception to this rule. Internally it calls
vqt_attributes() regardless of the parameters, even if some are omitted. This
is done to keep track of the attributes used inside its own fake windows,
which no one ever uses. If one is doing a lot of text colors and effects, the
internal call to vqt_attributes() is wasting CPU cycles. There's a new compiler
option ($D~) to speed up DEFTEXT. This option removes the needless call to
vqt_attributes().