Writing OS-Friendly Games
by Chris Green & Spencer Shanson
(c) Copyright 1992-93 Commodore-Amiga, Inc. All Rights Reserved
Many Amiga programmers believe that the only way to write whiz-bang, speed-of-light games
is to bypass the operating system and go straight to the metal. A better approach is to write
games that are OS-friendly. The combination of OS 3.0 and the AA chipset makes this more
possible than ever.
Reasons to use the OS for games:
- Why reinvent the wheel? Spend your time doing things that only you can do.
- Compatibility with future chipsets. For instance, some planned future
chipsets are not register-level compatible with AA.
- Easier adaptation to future hardware. For instance, it takes
less time to convert a 16 color ECS game which uses the OS into
a 256 color AGA game than it does to convert a hardware-banging
ECS game.
- RTG compatibility possible for some games.
- The OS automatically supports pre-ECS, ECS, AA, and future chips.
- Easier integration with other system components (CD-ROM, networks, serial ports, etc).
- Easy hard-disk install.
- Less code to write. The OS has routines for handling all screen positions
and scrolls, mouse movement, etc., which means less development time. You
can spend more time making the game more playable and less on getting the
hardware to work.
- More robustness. For instance, the OS floppy-disk code is far less
picky about drive parameters than 99% of custom floppy i/o code.
- Hides bugs and quirks of the chipset. The AA chip set has a few bugs which
the OS hides from you.
- The code runs out of ROM, which is faster than running the code out of
CHIP RAM.
- Multiple platforms. OS code will run on all Amiga-based machines,
whatever their flavour.
- Tools that help you debug your code (e.g. Mungwall and Enforcer) rely on
the OS being around.
- A properly written game can be promoted, and thus work on cheap VGA
monitors.
Things the OS can't currently do:
- Scroll individual scanlines of a viewport
- AA colour copperlist fades
- Dynamically update user copper lists
All these are planned to be addressed in future OS releases.
One of our goals is to make it possible to perform as many Amiga tricks
in normal Intuition screens as possible.
ECS-AA incompatibilities that the OS handles:
- Vertical counter behaves differently in programmable beam modes.
- No SuperHires color scrambling.
- Bitplane alignment problems.
Future envisioned chip changes that the OS will handle correctly:
- Chips with no fetch-mode selections. All selections automatic.
- Different DDFSTART/STOP calculations.
- Color loading differences
- Exact copper timings differences
- No SuperHires
- Multiple blitters
Game programming problems and solutions:
Q: What if the graphics rendering routines are much slower than my own
blitter code?
A: Use the blitter yourself. Call OwnBlitter(), do setup, call WaitBlit(),
poke the blitter registers, and then DisownBlitter() when all blits are
done.
Note: OwnBlitter() is only 3 instructions (counting RTS) when no-one
else is trying to use the blitter.
Q: What if input.device eats too many cycles?
A: Install a high priority input handler which chokes off all events. This
handler is also a convenient way to get keys and mouse events yourself.
Simply store the raw keypresses and mouse moves in your own variables.
Q: How do I change both bitmap pointers and colors in sync?
A: Use a user-copper list to cause a copper interrupt on line 0 of your
viewport. The copper interrupt handler will signal a high-priority
task which calls LoadRGB32 and ChangeVPBitMap (or ScrollVPort) to
cause the changes. This allows perfect 60 Hz animation on an A1200,
even while moving the mouse as fast as possible, and inserting floppy
disks.
Under 3.0, you can also do this in an exclusive screen. You can
tell if it was your screen which caused the copper interrupt by
checking the flag VP_HIDE in your ViewPort->Modes.
Q: I need to use the blitter in an interrupt driven manner instead of
polling it for completion. Aren't the QBlit routines too slow?
A: The QBlit/QBSBlit system was completely re-written for 3.0, and now
has quite low overhead.
Q: How do I determine elapsed time in my game?
A: A simple, low overhead way to determine elapsed time is to call ReadEClock.
This fills in a 64 bit timer value which counts EClocks, and returns how
many EClocks happen per second. If you use these results properly,
you can ensure that your game runs at the proper speed regardless of
CPU type, chip speed, or PAL/NTSC clocking.
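
The elapsed-time arithmetic described above can be sketched in portable C.
This is a minimal sketch under stated assumptions, not timer.device code:
struct EClockSample is a hypothetical stand-in for the 64 bit value that
ReadEClock fills in, and all helper names are invented for illustration. On
a real system the two samples and the ticks-per-second figure would both come
from ReadEClock.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 64 bit EClock value: the high and low
   32 bit halves of the counter, as filled in by ReadEClock. */
struct EClockSample {
    uint32_t hi;
    uint32_t lo;
};

/* Join the two halves into one 64 bit tick count. */
static uint64_t eclock_to_u64(struct EClockSample s)
{
    return ((uint64_t)s.hi << 32) | s.lo;
}

/* Ticks elapsed between two samples. Doing the subtraction on the joined
   64 bit value handles a carry out of the low half correctly. */
static uint64_t eclock_delta(struct EClockSample start, struct EClockSample end)
{
    return eclock_to_u64(end) - eclock_to_u64(start);
}

/* Convert a tick count to microseconds, given the ticks-per-second value
   returned alongside the sample. Intended for short intervals (frames,
   seconds); very large tick counts would overflow the multiplication. */
static uint64_t eclock_ticks_to_us(uint64_t ticks, uint32_t ticks_per_sec)
{
    return (ticks * 1000000ULL) / ticks_per_sec;
}

/* Scale a per-second quantity (e.g. pixels per second) by the elapsed
   interval, so movement is the same on any CPU or video standard. */
static uint64_t eclock_scale(uint32_t units_per_sec, uint64_t ticks,
                             uint32_t ticks_per_sec)
{
    return ((uint64_t)units_per_sec * ticks) / ticks_per_sec;
}
```

Scaling each movement by the measured interval, rather than by a count of
frames or loop iterations, is what keeps the game speed independent of the
machine it runs on.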
A1200 speed issues
The A1200 has a fairly large number of wait-states when accessing chip-ram.
ROM is zero wait-states. Due to the slow RAM speed, it may be better to use
calculations for some things that you might have used tables for on the A500.
Add-on RAM will probably be faster than chip-ram, so it is worth segmenting
your game so that parts of it can go into fast-ram if available.
For good performance, it is critical that you code your important loops
to execute entirely from the on-chip 256-byte cache. A straight line loop 258 bytes
long will execute far slower than a 254 byte one.
The '020 is a 32 bit chip. Longword accesses will be twice as fast when they
are aligned on a long-word boundary. Aligning the entry points of routines on
32 bit boundaries can help, also. You should also make sure that the stack is always
long-word aligned.
Write-accesses to chip-ram incur wait-states. However, other processor
instructions can execute while results are being written to memory:
        move.l  d0,(a0)+        ; store x coordinate
        move.l  d1,(a0)+        ; store y coordinate
        add.l   d2,d0           ; x+=deltax
        add.l   d3,d1           ; y+=deltay
will be slower than:
        move.l  d0,(a0)+        ; store x coordinate
        add.l   d2,d0           ; x+=deltax
        move.l  d1,(a0)+        ; store y coordinate
        add.l   d3,d1           ; y+=deltay
The 68020 adds a number of enhancements to the 68000 architecture, including
new addressing modes and instructions. Some of these are unconditional speedups, while
others only sometimes help.
Addressing modes
- Scaled Indexing
The 68000 addressing mode (disp,An,Dn) can have a scale
factor of 2,4,or 8 applied to the data register on the 68020. This is totally
free in terms of instruction length and execution time. An example is:
        68000                        68020
        -----                        -----
        add.w  d0,d0                 move.w (0,a1,d0.w*2),d1
        move.w (0,a1,d0.w),d1
- 16 bit offsets on An+Rn modes
The 68000 only supported 8 bit displacements
when using the sum of an address register and another register as a memory
address. The 68020 supports 16 bit displacements. This costs one extra cycle
when the instruction is not in cache, but is free if the instruction is in
cache. 32 bit displacements can also be used, but they cost 4 additional clock
cycles.
- Data registers can be used as addresses.
(d0) is 3 cycles slower than (a0),
and it only takes 2 cycles to move a data register to an address register,
but this can help in situations where there is not a free address register.
- Memory indirect addressing.
These instructions can help in some circumstances
when there is no free register to load a pointer into. Otherwise,
they lose.
New instructions
- Extended precision divide and multiply instructions
The 68020 can perform
32x32->32, 32x32->64 multiplication and 32/32 and 64/32 division. These
are significantly faster than the multi-precision operations which are
required on the 68000.
- EXTB. Sign extend byte to longword
Faster than the equivalent EXT.W EXT.L sequence on the 68000.
- CMPI and TST
Compare immediate and TST work in program-counter relative mode on the 68020.
- Bit field instructions
BFINS inserts a bitfield, and is faster than 2 MOVEs
plus an AND and an OR. This instruction can be used nicely in fill routines
or text plotting. BFEXTU/BFEXTS can extract and optionally sign-extend a bitfield
on an arbitrary boundary. BFFFO can find the highest order bit set in a field.
BFSET, BFCHG, and BFCLR can set, complement, or clear up to 32 bits at arbitrary
boundaries.
- Shift instructions
On the 020, all shift instructions execute in the same amount of time,
regardless of how many bits are shifted. Note that ASL and ASR are slower
than LSL and LSR. The break-even point on ADD Dn,Dn versus LSL is at
two shifts.
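
To make the semantics of some of the instructions above concrete, here is a
portable C model of the 32x32->64 multiply and of BFEXTU, BFINS and BFFFO
acting on a 32 bit register. It is only a sketch: the function names are
invented for illustration, and the bit field helpers assume that offset plus
width does not exceed 32 and that width is at least 1. As on the 68020, bit
field offsets count from the most significant bit (offset 0 is bit 31).

```c
#include <stdint.h>

/* 32x32->64 multiply built from 16x16 partial products, which is the
   kind of multi-precision work a 68000 has to do. */
static uint64_t mul32x32_68000_style(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;
    uint64_t lo_lo = (uint64_t)a_lo * b_lo;
    uint64_t lo_hi = (uint64_t)a_lo * b_hi;
    uint64_t hi_lo = (uint64_t)a_hi * b_lo;
    uint64_t hi_hi = (uint64_t)a_hi * b_hi;
    return lo_lo + ((lo_hi + hi_lo) << 16) + (hi_hi << 32);
}

/* The same result as a single widening multiply, which is the operation
   the 68020's 32x32->64 multiply provides directly. */
static uint64_t mul32x32_68020_style(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Mask with the low 'width' bits set (width 1..32). */
static uint32_t bf_mask(unsigned width)
{
    return (width >= 32u) ? 0xFFFFFFFFu : (((uint32_t)1 << width) - 1u);
}

/* Model of BFEXTU: extract an unsigned field, right-justified. */
static uint32_t bf_extract_u(uint32_t reg, unsigned offset, unsigned width)
{
    unsigned shift = 32u - offset - width;
    return (reg >> shift) & bf_mask(width);
}

/* Model of BFINS: replace the field with the low 'width' bits of value.
   This is the AND/OR sequence the single instruction replaces. */
static uint32_t bf_insert(uint32_t reg, unsigned offset, unsigned width,
                          uint32_t value)
{
    unsigned shift = 32u - offset - width;
    uint32_t mask = bf_mask(width) << shift;
    return (reg & ~mask) | ((value << shift) & mask);
}

/* Model of BFFFO: offset of the highest order set bit in the field,
   or offset + width if the field is all zeroes. */
static unsigned bf_find_first_one(uint32_t reg, unsigned offset, unsigned width)
{
    unsigned i;
    for (i = 0; i < width; i++) {
        if (reg & (0x80000000u >> (offset + i)))
            return offset + i;
    }
    return offset + width;
}
```

For example, under this model bf_insert(0x12345678, 8, 8, 0xAB) replaces the
second byte from the top and yields 0x12AB5678.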
Hardware resources
- Blitter
Use OwnBlitter()/DisownBlitter() to claim and relinquish ownership of
the blitter.
You must use the graphics.library WaitBlit(). This is as fast as
possible, uses no CPU registers, and knows about blitter bugs. You
cannot possibly write one that is more efficient and works on all
Amigas.
- Copper
If you really have to take over the copper, call LoadView(NULL),
do 2 WaitTOF()s, and then install your own copperlists in the cop1/2jmp
registers. I do not recommend this though. Future chipsets may have
faster and more efficient coppers with 32 bits, and we will
want to use these. If you load the old copper registers behind
graphics' back, we have no way of switching back to the old 16-bit mode.
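        struct View *temp;

        temp = GfxBase->ActiView;       /* remember the system's view */
        LoadView(NULL);                 /* remove the current view */
        WaitTOF();                      /* wait two frames for it to take effect */
        WaitTOF();
        /* custom.cop1lc = ??? */       /* your own copper list goes here */
        ...
        WaitTOF();
        WaitTOF();
        LoadView(temp);                 /* restore the system's view */
        custom.cop1lc = (ULONG)GfxBase->copinit;  /* restart the system copper list */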
- Audio
Use the Audio device. There are functions to change the volume, period,
frequency, data etc that is played on any of the channels. If you must
hit the audio hardware, you can ask for the channel you need with the
highest priority (127), and the audio channel will never be stolen from
you until you give the channel back to the system.
- Timers
Use the timer device. Some of the timer.device functions work as
libraries, and so are easy to use. This allows you to be compatible
should we, say, use a third CIA timer.
The vertical blank can be used as a special low-frequency timer. See
below.
CIA timers can be allocated via the resource allocation calls. The
"Resources" chapter of the V37 RKM: Devices manual has a good example.
- Input
Input will usually come from keyboard, mouse, joystick, infra-red etc.
Mouse and joystick can be easily read from the hardware. Keyboard input
could come from the keyboard.device, which knows how to handle keyboard
timings, but it is easier by far to open an intuition window and read
either RAWKEY or VANILLAKEY IDCMP messages. These either give the raw
key number pressed, or the character the key pressed represents
(useful for international games).
- Interrupts
Set up interrupt servers with high priority. Your server will
then be the first called.
- Disk drives
Just use the DOS.library. It's so much easier, works on all possible
drives, past, present and future, and makes s/w so much more friendly
to the user. Floppy based copy protection can be accomplished by
allocating the blitter and inhibiting the drive while checking for
the key track.
Do's and Don'ts
- DO clear unused bits when writing, and mask out unused or unneeded bits
when reading.
- DON'T use timing loops. The reasons should be obvious.
- DON'T write self-modifying code unless you know how instruction caches
work.
- DON'T steal memory. You can always call CloseWorkbench().
- If you are hardware banging, don't assume anything about the initial
contents of the display registers when your program is started.
Initialize everything.
- If using ViewPorts, be sure to have a properly allocated ViewPortExtra.
Some graphics calls are faster when one is present.
- DO note well the warning around the copinit structure.
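
The first item in the list above (clear unused bits when writing, mask them
out when reading) can be sketched in C as follows. The register layout here is
entirely hypothetical and exists only to illustrate the habit; it does not
describe any real Amiga register.

```c
#include <stdint.h>

/* Hypothetical 16 bit register layout, for illustration only. */
#define HYPO_MODE_MASK     0x0007u  /* bits 0-2: a mode field */
#define HYPO_ENABLE        0x8000u  /* bit 15: an enable flag */
#define HYPO_DEFINED_BITS  (HYPO_MODE_MASK | HYPO_ENABLE)

/* Reading: keep only the bits you need. Undefined bits may read back
   as anything, and may gain a meaning on later hardware. */
static unsigned hypo_read_mode(uint16_t raw)
{
    return raw & HYPO_MODE_MASK;
}

/* Writing: build the value from defined fields only, so every unused
   bit is written as zero. */
static uint16_t hypo_make_value(unsigned mode, int enable)
{
    uint16_t value = (uint16_t)(mode & HYPO_MODE_MASK);
    if (enable)
        value |= HYPO_ENABLE;
    return value;
}
```

Code written this way keeps working if previously unused bits are given a
function in a later chip revision.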
CPU Differences
- Caches.
- Copyback and write-through modes.
- Access to CHIP RAM.
- '020, '030, '040 instruction and effective addresses.
- MMUs and FPUs
Conclusion
Writing OS-friendly games benefits you, the developer, and the gamers who buy your games.
It ensures the viability of your code on all Amiga platforms, not just the one you used for
development. By allowing the OS to do the mundane details of opening screens and windows, you
give yourself more time to do the creative portions of your game. It does not matter how fast
your game is if the gameplay is lousy and the artwork looks like a finger painting from a
pre-schooler.
An OS-friendly game is no longer something to be avoided. We've provided ample tools in
the OS to support your gaming needs, and we will continue to do so. The sooner you write
OS-friendly games, the sooner you'll be prepared for the future.
Uploaded on Sunday 2021-02-14