Commodore Hacking Issue 17

Published 25 Apr 2019

  

                   ######## 
             ################## 
         ######            ###### 
      ##### 
    #####  ####  ####      ##      #####   ####  ####  ####  ####  ####   ##### 
  #####    ##    ##      ####    ##   ##   ##  ###     ##    ####  ##   ##   ## 
 #####    ########     ##  ##   ##        #####       ##    ## ## ##   ## 
#####    ##    ##    ########  ##   ##   ##  ###     ##    ##  ####   ##   ## 
#####  ####  ####  ####  ####  #####   ####  ####  ####  ####  ####   ###### 
#####                                                                    ## 
 ######            ######           Issue #17 
   ##################            November 15, 1998 
       ######## 

 
............................................................................... 

	"The words of the wise are as goads, and as nails fastened by 
	 the masters of assemblies."  Ecclesiastes 12 

	"Before criticizing a man I try to walk a mile in his shoes.  That 
	 way, if he gets mad he's a mile away and barefoot."  John Ianetta 

............................................................................... 

BSOUT 

	For me, fall is a time for reflection.  The trees descend into their  
golden time and then seem to die.  And yet, under the surface they are quite  
alive, and teeming with activity at a smaller, less-visible scale, waiting to  
burst forwards again in full bloom.  I think there's a great metaphor for C=  
in this.  But I have no idea what it is. 
	In fact things are totally hectic around here, and I haven't given 
more than a few moments' thought to the 64, so this will be a mighty
short editorial.  Between a PhD thesis and begging for jobs there hasn't been 
much 64-time, but with a little here and a little there this issue is finally 
squeaking out.  Everybody worked hard over the summer, and my goal was 
to get it out in September.  Well, you know, these days, if something in 
the 64 world is only two months late it's doing pretty good, so no big whoop. 
C=Hacking ought to appear regularly after December, though.
	On the down side, some things, such as the challenge problem from 
last time, will have to wait until the next issue.  I also stayed pretty 
low-key while putting this issue together, but future issues will be more  
public in soliciting articles (e.g. on comp.sys.cbm). 
	In other news, a C64 C-compiler has finally appeared!  This outstanding 
effort is the work of Ullrich von Bassewitz, uz@musoftware.de, the force  
behind Elite128 among other things.  The cc65 webpage is at 

        http://www.von-bassewitz.de/uz/cc65/ 

so have a look, and tell Ullrich what a stellar guy he is :).  Also, as 
most of you know, the Chicago Expo took place on October 24, and was a real 
hoot!  Check out  

	http://driven.c64.org/ 

for some nice pictures from the Expo, taken by Mark Seelye. 

	Meanwhile, sit back, relax, and enjoy these latest musings from  
these, our masters of assembly. 

....... 
.... 
.. 
.                                    C=H #17 

::::::::::::::::::::::::::::::::::: Contents :::::::::::::::::::::::::::::::::: 

BSOUT 
	o Voluminous ruminations from your unfettered editor. 

 
Jiffies 
	o Is it a bug or a feature? 

 
The C=Hallenge 

	o To be continued... 

 
Side Hacking 

	o "SuperCPU Software Repair", by S. Judd <sjudd@nwu.edu>. 
	   An amateur's excursion into correcting errant wares. 

 
Main Articles 

	o "An Optimizing Hybrid LZ77 RLE Data Compression Program, aka 
	  Improving Compression Ratio for Low-Resource Decompression", 
	  by Pasi 'Albert' Ojala <albert@cs.tut.fi> 

	  Part two of a two-part article on data compression, giving a 
	  detailed description of the compression algorithms used in  
	  pucrunch, not to mention the decompression code. 

	o "VIC-20 Kernel ROM Disassembly Project", by Richard Cini 
	  <rcini@email.msn.com> 

	  This is the first in a series of articles which aims to 
	  present a complete, commented disassembly of the VIC-20 
	  ROMs. 

	o Masters Class: "NTSC/PAL fixing, part I", by Russell Reed  
	  <rreed@egypt.org>, Robin Harbron <macbeth@tbaytel.net>, and S. Judd. 

	  Sit up straight and pay attention.  In the Masters Class, a 
	  Commodore luminary attempts to instruct a couple of ignorant 
	  plebeians in his art.  In this case, Robin and I set out to
	  learn NTSC/PAL fixing from one of the greats, Decomp/Style. 
	  Our first fix, a demo from the obscure Finnish group Pu-239, 
	  is included, along with detailed descriptions of our experiences. 

	o "The Herd Mentality", by Bil Herd <bherd@zeus.jersey.net> 

	  This is a collection of entertaining musings on Commodore and the  
	  development of the C128, as provided by Bil Herd (and that's no 
	  bull).  If you don't know who Bil Herd _is_, why not type 
	  SYS 32800,123,45,6 on a 128 sometime... 

	   
.................................. Credits ................................... 

Editor, The Big Kahuna, The Car'a'carn..... Stephen L. Judd 
C=Hacking logo by.......................... Mark Lawrence 

Special thanks to Marko Makela, Olaf Seibert, and the rest of the cbm-hackers 
for their many otherwise unacknowledged contributions. 

Legal disclaimer: 
	1) If you screw it up it's your own damn fault!   
	2) If you use someone's stuff without permission you're a dork! 

For information on the mailing list, ftp and web sites, send some email 
to chacking-info@jbrain.com. 

About the authors: 

Pasi 'Albert' Ojala is a 29 year old software engineer, currently 
working at a VLSI design company on a RISC DSP core C compiler. 
Around 1984 a friend introduced him to the VIC-20, and a couple 
of years later he bought a 64+1541 to replace a broken Spectrum48K. 
He began writing his own BBS, using ML routines for speed, and 
later wrote a series of demos under the Pu-239 label.  In addition 
to pucrunch and his many C=Hacking articles, Pasi has written an Amiga
1581 filesystem, a graphics conversion package, a C64 burstloader, and a
number of demos.  He currently uses his 64 for hobbyist pursuits, and
is contemplating multipart demos for the 64 and the VIC-20, in addition 
to future C=Hacking articles.  Pasi is also a huge Babylon-5 fan, and 
has a B5 quote page at http://www.cs.tut.fi/~albert/Quotes/B5-quotes.html 

Richard Cini is a 31 year old senior loan underwriter who first became 
involved with Commodore 8-bits in 1981, when his parents bought him a 
VIC-20 as a birthday present.  Mostly he used it for general BASIC 
programming, with some ML later on, for projects such as controlling 
the lawn sprinkler system, and for a text-to-speech synthesizer.  All
his CBM stuff is packed up right now, along with his other "classic"  
computers, including a PDP11/34 and a KIM-1.  In addition to collecting 
old computers Richard enjoys gardening, golf, and recently has gotten 
interested in robotics.  As to the C= community, he feels that it 
is unique in being fiercely loyal without being evangelical, unlike 
some other communities, while being extremely creative in making the  
best use out of the 64. 

Robin Harbron is a 26 year old internet tech support at a local 
independent phone company.  He first got involved with C= 8-bits 
in 1980, playing with school PETs, and in 1983 his parents convinced 
him to spend the extra money on a C64 instead of getting a VIC-20. 
Like most of us he played a lot of games, typed in games out of 
magazines, and tried to write his own games.  Now he writes demos, 
dabbles with Internet stuff, writes C= magazine articles, and, yes, 
plays games.  He is currently working on a few demos and a few games, 
as well as the "in-progress-but-sometimes-stalled-for-a-real-long-time- 
until-inspiration-hits-again Internet stuff".  He is also working on 
raising a family, and enjoys music (particularly playing bass and guitar),  
church, ice hockey and cricket, and classic video games. 

 
................................... Jiffies .................................. 

 
0 REM SHIFT-L 
by the cbm.hackers 

	Everybody knows the old REM shift-L trick in BASIC 2.0, which 
generates a syntax error upon listing.  But why does it work?  The 
answer turns out to be quite interesting. 
	Normally, when the BASIC interpreter tokenizes a line it strips  
out characters with the high bit set.  One exception is characters within 
quotes; the other is characters after REM.  In those cases, characters 
are embedded literally into the program line. 
	Now, BASIC tokens all have the high bit set.  When LIST 
encounters a character with the high bit set, it checks whether 
it is in quote mode.  If it is, the character is output as normal.
If not, the character is treated as a token, which is expanded by 
QPLOP (located at $A717).  The part of QPLOP which prints keywords looks  
like this: 

:LOOP1   DEX              ;Traverse the keyword table 
         BEQ :PLOOP 
:LOOP2   INY              ;read a keyword 
         LDA RESLST,Y 
         BPL :LOOP2 
         BMI :LOOP1 
:PLOOP   INY              ;Print out the keyword 
         LDA RESLST,Y 
         BMI LISTENT1     ;Exit if on last char 
         JSR $AB47        ;Print char, AND #$FF 
         BNE :PLOOP 

The keyword strings in RESLST are all stored dextral character inverted
(the last char has the high bit set), and the above code just moves  
forward through the table until it has counted out .X keywords.  At 
that point, :PLOOP prints out the next keyword to the screen, and 
hops back into LIST. 

	Shift-L is character code 204, or $CC.  When LIST sees this 
inside of a REM, it sees the high bit set and de-tokenizes it. 
As it turns out, though, the last token is $CB, which is keyword GO 
(so that GO TO works).  It also turns out that RESLST, the list of 
BASIC keywords, uses 255 characters.  The 256th character is zero 
(value zero, not character zero). 
	So, the above code goes through the list, and then prints 
out token $CC, the first character of which is a null.  This zero 
is sent to $AB47.  $AB47 sends it to JSR $FFD2 (which does nothing 
with the character), performs an AND #$FF, and exits.  But that makes  
the BNE :PLOOP branch _not_ get taken, and the code erroneously moves  
forwards into...  the code which executes the FOR statement! 
	And the first thing FOR does is look for a valid variable. 
When you LIST a program, the character immediately following the 
LIST is a statement terminator (a colon : or else an end of line null). 
LET (which is called by FOR) reads this character, decides it's an 
invalid variable name, and generates a ?SYNTAX  ERROR. 
	Because QPLOP uses absolute addressing (LDA RESLST,Y), .Y 
can wrap around through 255 to traverse the list again.  This is 
why shift-M shift-N etc. all generate valid keywords.  Only shift-L 
will force an error, and it is all due to the zero in the keyword 
table. 
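
For the curious, here is a small Python model of the walk through the
keyword table.  It is a sketch under a few assumptions, not a copy of the
ROM: the keyword list is typed in following the BASIC 2.0 token order
($80 = END through $CB = GO), LIST is assumed to enter the loop with
.X = token - $7F and .Y = $FF, and LISTENT1 is assumed to print the final
character with its high bit cleared.  The names build_reslst() and qplop()
are made up for the illustration.

```python
# Toy model of QPLOP's keyword walk.  The keyword order is assumed to match
# the BASIC 2.0 token order ($80 = END ... $CB = GO).
KEYWORDS = (
    "END FOR NEXT DATA INPUT# INPUT DIM READ LET GOTO RUN IF RESTORE "
    "GOSUB RETURN REM STOP ON WAIT LOAD SAVE VERIFY DEF POKE PRINT# "
    "PRINT CONT LIST CLR CMD SYS OPEN CLOSE GET NEW TAB( TO FN SPC( "
    "THEN NOT STEP + - * / ^ AND OR > = < SGN INT ABS USR FRE POS SQR "
    "RND LOG EXP COS SIN TAN ATN PEEK LEN STR$ VAL ASC CHR$ LEFT$ "
    "RIGHT$ MID$ GO"
).split()


def build_reslst():
    """Keywords back to back, last char of each with bit 7 set, then a zero."""
    table = bytearray()
    for word in KEYWORDS:
        raw = word.encode("ascii")
        table += raw[:-1] + bytes([raw[-1] | 0x80])
    table.append(0)
    return bytes(table)


RESLST = build_reslst()


def qplop(token):
    """Return (text printed, True if the code fell out of the print loop)."""
    x = (token - 0x7F) & 0xFF   # assumed entry value of .X
    y = 0xFF                    # assumed entry value of .Y
    while True:
        x = (x - 1) & 0xFF      # :LOOP1  DEX
        if x == 0:              #         BEQ :PLOOP
            break
        while True:             # :LOOP2  skip one keyword
            y = (y + 1) & 0xFF  # .Y is 8 bits, so it wraps past 255
            if RESLST[y] & 0x80:
                break
    text = ""
    while True:                 # :PLOOP
        y = (y + 1) & 0xFF
        a = RESLST[y]
        if a & 0x80:            # BMI LISTENT1: last char of the keyword
            return text + chr(a & 0x7F), False
        if a == 0:              # null printed, Z set, BNE :PLOOP not taken
            return text, True
        text += chr(a)
```

In this model the table comes out at exactly 255 characters plus the null,
qplop(0xCC) prints nothing and drops out of the loop, and qplop(0xCD) wraps
around the table and comes out as FOR.
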
	Similar things can happen in other BASICs.  In BASIC 4.0, 
token $DB does the trick.  In BASIC 1.0, $CB ought to do it. 
The problem seems to have been fixed in BASIC 7.0; at least the 
trick doesn't seem to work on a 128. 

	Finally, like most things on the 64, embedding tokens into 
REM statements can naturally be put to some creative use.  For example, 
RUN once ran a contest where readers submitted stories and riddles 
and such, which were read by LISTing the program.  They were very clever 
and entertaining, and I close this summary with the one I've remembered  
since high school: 

10 REM WHAT'S AN APPLECOSTA? 
20 REM {C=-V}T A {INST CTRL-0}E 

 
............................... The C=Hallenge ............................... 

Wait until next time! 

 

................................ Side Hacking ................................ 

		SuperCPU software repair 
		---------------------------> by S. L. Judd 

	One of the feature articles in this issue deals with NTSC/PAL 
fixing.  But have you ever thought about SCPU fixing?  You know how 
it goes: you have that program that could really benefit from 
the speed boost, but doesn't work, and usually because of some silly 
little thing. 
	Well, it really bugs me to have programs not like my nice 
hardware for dumb reasons, so I decided I would try my hand at fixing  
up some programs.  The one that really did it for me was the game  
"Stunt Car Racer" -- I had never played it before, but after getting  
ahold of it it was clear that here was a game that would be just great  
with a SuperCPU.  I had never done something like this before, but it  
seemed a doable problem and so I jumped in head first, and this article 
sums up my inexpert experience to date. 

	By the way, stuntcar-scpu is totally cool :). 

	To date I have fixed up just three games: Stunt Car Racer, Rescue 
on Fractalus, and Stellar 7.  My goal was really to "CMD-fix" these programs, 
to make them run off of my FD-2000 as well as my SCPU.  Although these are  
all games, the techniques should apply equally well to application programs 
with a bad attitude.  Before discussing the fixes, it is probably worthwhile 
to discuss a few generalities. 

	I also note that programmers who don't have a SuperCPU might find 
some of this information helpful in designing their programs to work with 
SCPUs. 

	Finally, my fixes are available in the Fridge. 

Tools and Process 
----------------- 

The tools I used were: 

  o Action Replay 
  o Wits 
  o Paper for taking notes (backs of receipts/envelopes work) 

I think this is all that is necessary, although a good sector editor 
can come in handy for certain things. 

After trying a number of different approaches to the problem, the process 
I've settled on goes roughly like the following: 

	- Have an idea of what will need fixing 
	- Familiarize yourself with the program 
	- Track down the things that need fixing 
	- Figure out free areas of memory 
	- Apply patches, and test 

	Most programs work in more or less the same way: there are 
some initialization routines, there's a main loop, and there's an  
interrupt routine or series of routines.  The interrupts are easy to  
find, via the vectors at either $FFFA or at $0314 and friends.  The  
initialization routine can be tougher, but can be deduced from  
the loader or decompressor; also, some programs point the NMI vector to  
the reset code, so that RESTORE restarts the program.  Finding the 
things that need fixing usually involves freezing the program at the 
appropriate time, and doing a little disassembly.  Sometimes a hunt for 
things like LDA $DC01 is helpful, too.  Figuring out free areas of 
memory is easy, by either getting a good feel for the program, or 
filling some target memory with a fill byte and then checking it 
later, to see if it was overwritten.  Once the patch works on the 64, 
all that remains is to test it on the SCPU, and it's all done! 

Diagnosis 
--------- 

	It seems to me that, at the fundamental level, the SCPU is different  
from a stock machine in three basic ways: it is a 65816, it runs at 20MHz,  
and it has hardware registers/different configurations.  There are also  
some strange and mysterious problems that can arise. 

	All possible opcodes are defined on the 65816, which means that 
"illegal" or quasi-opcodes will not work correctly.  On the 65xx chips, the  
quasi-opcodes aren't real opcodes -- they are like little holes in the cpu,  
and things going through those holes fall through different parts of the  
normal opcode circuitry.  Although used by very few programs, a number of  
copy protection schemes make use of them, so sometimes the program works fine 
with a SCPU but the copy protection makes it choke -- how very annoying  
(example: EA's Lords of Conquest).  Naturally, disk-based protection methods  
mean it won't work on an FD-2000, either. 

	Running at 20MHz causes all sorts of problems.  Any kind of software
loop will run too fast -- delay loops, countdown loops, input busy-loops,
etc.  So will main program loops, so that the game runs unplayably fast
(most 3D games never had to worry about being too fast).  It can also
lead to flickering screens, as we shall see later, and the "play" of some
games is designed with 1MHz in mind -- velocities, accelerations, etc.
What looks smooth at the low frame rate might look poor at the high, as we
shall also see later.  Finally, fastloaders invariably fail at 20MHz,
like any other code using software-based timing.

	The SuperCPU also has a series of configuration registers located 
at $D07x and $D0Bx, which determine things like software speed and VIC 
optimization modes (which areas of memory are mirrored/copied to the C64's  
RAM).  Note also that enabling hardware registers rearranges $E000 ROM  
routines.  Although it is possible for programs to accidentally reconfigure 
the SCPU, it is awfully unlikely, since the enable register, which switches 
the hardware registers in, is sandwiched between disable registers: 

	$D07D	Hardware register disable 
	$D07E	Enable hardware registers 
	$D07F	Hardware register disable 

Strangely enough, though, different hardware configurations can sometimes  
cause problems.  For example, newer (v2) SCPUs allow selective mirroring of  
the stack and zero page, and by default have that mirroring turned OFF.   
For some totally unknown reason, this caused major problems with an early  
attempt of mine to fix Stunt Car Racer -- I am told that the old version  
would slow down to just double-speed, flicker terribly, and more.  Turning  
mirroring back on apparently fixes up the problem.  (I have an older SCPU,  
and hence did not have this problem).  So before going after a big fix, it  
is worthwhile to invest a few minutes in trying different configurations. 

	Finally, there are other strange problems that can arise.  For 
example, I have two 128s: one is a flat 128, one a 128D.  With my 128D,  
if $D030 is set then the SCPU sometimes -- but not always -- freaks out  
and locks up.  The flat 128 does not have this problem.  One reason this  
is important is that many decompressors INC $D030 to enable 2MHz mode. 
Replacing the INC with a BIT ($2C) fixes this problem up, but the point is
that the SCPU has to interact with the computer, so perhaps that interaction
can lead to problems in obscure cases.

	Now, if the goal is to CMD-fix the program, there may be a few 
disk-related things that may need fixing.  In addition to stripping out 
(or possibly fixing up) any fastloaders, most programs annoyingly assume 
drive #8 is the only drive in town.  Also, if the program uses a track-based  
loader (instead of a file-based loader), then that will need to be fixed up
as well, and any disk-based copy protection will have to be removed. 

	There's one other thing to consider, before you fix: is the 
program really busted?  For example, if you've tried a chess program 
with the SCPU, chances are that you saw no speed improvement.  Why 
not?  It turns out that most chess programs use a timer-based search 
algorithm -- changing the playing strength changes the amount of 
time the program spends searching, and not the depth of the search. 
(The reason is to make the gameplay flow a little better -- otherwise 
you have very slow play at the beginning, when there are many more 
moves to consider).  So although it might look like it isn't working 
right with the SCPU, it is actually working quite well. 

	And that pretty much covers the basic ideas.  The first program 
I fixed up was Stunt Car Racer. 

Stunt Car Racer 
--------------- 

	Stunt Car Racer, in case you don't know, is a 3D driving game, 
and quite fun.  It is also too fast, unplayably fast, at 20MHz.  Therefore,  
it needs to be slowed down!   
	My first plan... well, suffice to say that most of my original 
plans were doomed to failure, either from being a bad idea, or from 
poor implementation.  It is clear enough that some sort of delay is 
needed, though, in the main loop, or perhaps by intercepting the joystick 
reading routine. 
	The program has a main loop and an interrupt loop as well. 
The interrupt handles the display and other things, and all of the 
game calculations are done in the main loop, which flows like 

	Do some calculations 
	Draw buffer 1 
	Swap buffers 
	Do some calculations 
	Draw buffer 2 
	Swap buffers 
	JMP loop 

One of my first thoughts was to intercept the joystick I/O, which is 
easy to find by hunting for LDA $DC01 (or DC00, whichever joystick 
is used).  The patch failed, possibly because I didn't check that the
memory was safe, or possibly because it was in the interrupt routine
(I simply don't remember).
	Before patching, it is very important to make sure that the 
patch will survive, and not interfere with the program, so it is 
very important to find an area of memory that is not used by the 
program.  It took me a little while to figure this out!  Finding 
unused memory was pretty easy -- I just filled the suspect areas with 
a fill byte, ran the program, and checked that memory.  Mapping out the  
memory areas also aids in saving the file, as un-needed areas don't 
need to be saved, or can be cleared out to aid compression. 
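
The fill-byte check is easy to automate once the memory has been saved out
of the monitor.  The following Python sketch assumes a snapshot taken after
the program has run over memory that was pre-filled with a marker value;
the function name, the $AA marker and the minimum run length are all just
illustrative choices.

```python
def free_runs(snapshot, fill=0xAA, min_len=16):
    """Return (first, last) address pairs for runs still holding the marker."""
    runs = []
    start = None
    for addr, value in enumerate(snapshot):
        if value == fill:
            if start is None:
                start = addr            # a candidate run begins here
        elif start is not None:
            if addr - start >= min_len:
                runs.append((start, addr - 1))
            start = None                # run ended at an overwritten byte
    if start is not None and len(snapshot) - start >= min_len:
        runs.append((start, len(snapshot) - 1))
    return runs


# Example: a 256-byte page where the program overwrote $10-$1F.
page = bytearray([0xAA]) * 0x100
page[0x10:0x20] = bytes(16)
candidates = free_runs(page)
```

A surviving run only shows that the parts of the program you exercised
left it alone, and a program that happens to store the marker value itself
will fool the check, so it is no substitute for testing.
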
	The first free area of memory I found was at $C000.  It turns 
out that this is a sprite, though, and so put some garbage on the 
screen.  The second I tried was $8000, which worked great in practice 
mode but got overwritten in competition mode -- always test your 
patches thoroughly!  (I had only tested in practice mode).  Finally, 
I found a few little spots in low memory that survived, and placed the  
patch there.  The program does a whole lot of memory moving, and uses  
nearly all memory.  I also left some initialization code at $8000, since 
it only needed to be run once, at the beginning (to turn on mirroring  
in v2 SCPUs). 
	Recall that the main loop has two parts -- one for buffer 1, and  
one for buffer 2.  The trick is to find some code that is common to both  
sections, like a subroutine call: 

	JSR SomeRoutine 
	Draw buffer 1 
	JSR SomeRoutine 
	Draw buffer 2 

The patch routine I used was a simple delay loop, redirected from those 
two JSRs: 

	LDX #$FF
:WAIT	CPX $D012	;Wait until the raster's low byte equals .X
	BNE :WAIT
	DEX
	CPX #$FC	;Done after lines $FF, $FE and $FD
	BNE :WAIT
	JMP SomeRoutine

Of course, this will also slow the program down at 1MHz; later on I became
smarter about my patches, but this one works pretty well. 
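
To get a feel for how long that loop actually waits, here is a rough
Python model.  It assumes that $D012 simply holds the low eight bits of the
current raster line, that a frame has 263 lines on NTSC or 312 on PAL, and
that the comparisons take no time; delay_lines() is a made-up name.

```python
def delay_lines(start_line, lines_per_frame=263):
    """Raster lines that pass while the patch waits on $FF, $FE, then $FD."""
    line = start_line
    elapsed = 0
    for target in (0xFF, 0xFE, 0xFD):
        # CPX $D012 / BNE: spin until the low byte of the raster matches .X
        while (line & 0xFF) != target:
            line = (line + 1) % lines_per_frame
            elapsed += 1
    return elapsed


ntsc_frames = delay_lines(0) / 263
pal_frames = delay_lines(0, lines_per_frame=312) / 312
```

Starting from the top of an NTSC frame the model waits 779 lines, a bit
under three frames: each new target line has only just gone by when the
loop starts looking for it, so every pass after the first costs nearly a
whole frame.
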
	To save the game and patches, I simply froze it from AR.  Just 
saving from the monitor generally failed; the initialization routine 
doesn't initialize all I/O settings.  Part of the freezing process 
involves RLE compression, so if you freeze it is a good idea to 
fill all unused portions of memory -- temporary areas, bitmaps, etc. 
Another thing to do is to set a freeze point at the init routine, 
and then JMP there from the monitor.  By clearing the screen, you 
won't have to look at all the usual freezer ugliness, and at this 
point freezing isn't any different than saving from the ML monitor 
and RLE-packing the file.  Once saved, I tested a few times from the 
64 side, to make sure things worked right. 

	Whether freezing or saving from the monitor, if the file size 
is larger than 202 blocks or so, it can't be loaded on the SCPU without 
a special loader -- unless you compress it first.  I naturally recommend 
using pucrunch for that purpose, but if you want to do it on the 64
then I recommend using ABCrunch, which works well with the SCPU and 
gives about as good compression as you can get without an REU. 

	The result was stuntcar-scpu, which is *awfully* fun when fixed. 

 
Rescue on Fractalus 
------------------- 

	Next on my list was Rescue on Fractalus, an older (and quite cool)  
Lucasfilm game that just didn't cut it in the 64 conversion, for a number 
of reasons (that perhaps could have been avoided).  There are at least two 
versions of the game, one of which doesn't even work on a 128 (good ol' $D030),
but I have the older version, which does work. 

	With a SuperCPU, though, there are a number of problems.  The display 
flickers terribly.  The gameplay is smooth and not at all too fast -- in fact, 
it is too slow.  Specifically, the velocities and turning rates and such do 
not give a convincing illusion of speed or excitement.  The game is copy- 
protected and uses a track-based fastloader, loaded from disk via B-E, which 
also saves the high scores to disk.  Clearly, this one is a bigger job: the  
display is too fast, the game constants need adjusting, and the highscore code 
needs to be replaced by some kernal calls. 

	The structure of this code is a little different.  The main loop 
handles the (double-buffered) display -- it does all the calculations and  
draws to the two buffers.  The multi-part interrupt loop does the rest --  
it swaps buffers, changes the display in different parts of the screen, 
reads the joystick, and performs the game updates which change your 
position and direction.  It also handles enemies such as saucers, but 
doesn't handle the bunkers which fire at you from the mountains (the main 
loop takes care of those). 
	What does all this mean?  First, that the game can be a good ten 
steps ahead of the screen, which makes things like targeting very 
difficult.  Second, that the bunkers almost never fire at you at 1MHz 
(they go crazy at 20).  Third, that things like velocity and turning 
rate are rather low, because advancing or turning too quickly would 
get the game way out of sync (unfortunately, they are still too fast 
for 1MHz, making targeting difficult and movement clunky).  On the 
other hand, having the movement in the interrupt is the reason that 
the game does not become unplayably fast at 20MHz, and means that 
something besides a delay loop is needed. 
	The interrupt swaps buffers, but the main loop draws them, 
and because it draws so quickly it can start clearing and drawing to  
the visible buffer.  To make sure this was what I was seeing, I reversed 
the buffer swap code in the interrupt, so that the drawing buffer was 
always on-screen.  Sure enough, that's what the 20MHz version looked
like. 
	It turned out to be pretty easy to force the main loop to wait 
on the interrupt.  Although I messed around (unsuccessfully) with 
intercepting the interrupt loop, the buffer swap code actually 
modifies a zero-page variable upon swapping.  So all the main loop 
has to do is wait on that variable before charging ahead.  I may have 
made it wait for two frames, because it made the game play a little 
better. 

	Now, how to find the velocity and turn code?  Well it takes 
a keypress to change the velocity, so by hunting for LDA $DC01, and 
tracing back, the routine can be found; at the very least the 
affected variables may be found, and hunted for.  For example, if 
the result is stored in $D0, then you can search for LDA $D0.  The 
point is to locate the keypress processing code.  From there, a little 
trial and error (setting freeze points and pressing the velocity key) 
locates the piece of code which deals with changing the velocity, and  
in particular which variable corresponds to velocity.  Finally, from  
there it just takes another hunt for LDA velocity, ADC velocity, etc. 
to figure out where the code for updating position and direction is. 
	In this case, I was pretty sure I had found it, as it went 
something like 

	LDA velocity 
	LSR 
	LSR 
	ADC #$20 

and this was added to the position.  To check that this was the code, 
I just changed the ADC, or removed an LSR, to see that the speed changed. 
The code for turning left and right and moving up and down was similar, 
and again after a little trial and error it was clear what code did 
what.  Again, it wasn't necessary to totally understand how these 
routines worked exactly -- just the general idea of them, in this case 
to see that a multiple of the velocity was used to change the position 
and orientation of the player. 
	So, to fix it up, I just changed that multiple -- probably I 
NOPed out an LSR above, to basically double the speed, and changed the  
turning rates similarly.  This took a little experimentation, as it 
not only needed to be playably fast, but also couldn't overflow at 
high speeds, etc. 
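
The arithmetic in that fragment is worth spelling out, since the carry from
the last LSR rides along into the ADC.  The Python sketch below models just
those instructions on an 8-bit accumulator.  The function names are
invented here, the carry is assumed clear when no shift is done, and the
real routine certainly differs in detail.

```python
def lsr(a):
    """6502 LSR A: returns (result, carry), carry being the bit shifted out."""
    return a >> 1, a & 1


def adc(a, operand, carry):
    """6502 ADC in binary mode: returns (8-bit result, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def step(velocity, shifts=2, bias=0x20):
    """LDA velocity / LSR (shifts times) / ADC #bias, as in the fragment."""
    a, carry = velocity & 0xFF, 0
    for _ in range(shifts):
        a, carry = lsr(a)
    return adc(a, bias, carry)


original = step(0x80)            # both LSRs in place
faster = step(0x80, shifts=1)    # one LSR NOPed out
wrapped = step(0xFF, shifts=0)   # no scaling at all: the sum passes $FF
```

Dropping one LSR roughly doubles the velocity term, while dropping both
lets a large velocity wrap around, which is the kind of overflow to watch
for when experimenting.
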

	But once that was working, all that remained was the highscore 
table.  Finding the table location was pretty easy -- I just got a high 
score, and while entering my name froze the program, and figured out 
what got stored where.  From there it was pretty easy to figure out 
what was saved to disk.  From the original loader, I also knew where 
the highscores needed to be loaded to initially (the highscore table 
gets copied around a lot -- it doesn't just stay at that one location). 
Figuring out the exact number of bytes to save took a little bit of 
effort (saving either too many or too few bytes screws it up), but 
from there it was clear what memory needed to be saved. 
	So all that remained was to add the usual SETLFS etc. kernal 
calls, right?  Wrong.  The program uses all the usual kernal variables 
(from $90-$C0) for its own purposes.  Also recall that I wanted the 
program to work with device 9, etc.  To get around this, I did two 
things.  First, when the program first starts, I save some of the 
relevant variables to an unused part of memory -- in particular, I 
save the current drive number.  Second, before saving the highscore 
file, I actually copy all zero page variables from $90-$C2 or so 
to a temporary location, and then copy them back after saving. 
That way there are no worries about altering important locations. 
	Finding memory for the load/save patch was easy -- I just used 
the area which was previously used for the fastload load/save code. 
There was enough for the necessary code as well as temporary space 
for saving the zero page variables. 
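
The save-and-restore dance around the kernal calls can be sketched as
follows.  This is Python rather than the 6502 code actually used, with
memory modeled as a bytearray; the helper name, the temporary address and
the stand-in "kernal call" are all hypothetical.

```python
ZP_FIRST, ZP_LAST = 0x90, 0xC2   # range the article copies ("or so")


def call_preserving_zero_page(memory, temp_addr, kernal_call):
    """Stash $90-$C2, run the call, then put the game's values back."""
    size = ZP_LAST - ZP_FIRST + 1
    memory[temp_addr:temp_addr + size] = memory[ZP_FIRST:ZP_LAST + 1]
    try:
        return kernal_call(memory)
    finally:
        memory[ZP_FIRST:ZP_LAST + 1] = memory[temp_addr:temp_addr + size]


def fake_save(memory):
    """Stand-in for SETLFS/SETNAM/SAVE: scribbles on kernal variables."""
    memory[0xBA] = 9      # current device number
    memory[0x90] = 0x40   # status byte
    return "saved"


memory = bytearray(0x10000)
for addr in range(ZP_FIRST, ZP_LAST + 1):
    memory[addr] = addr ^ 0x5A           # pretend these are game variables
before = bytes(memory[ZP_FIRST:ZP_LAST + 1])
result = call_preserving_zero_page(memory, 0xC800, fake_save)
```

The temporary area has to be big enough for the whole range (51 bytes
here) and, as always, has to be memory the program leaves alone.
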

	Finally, I changed some text from Rescue on Fractalus to 
Behind Jaggi Lines, to distinguish it from the original, and that 
was that.  Works great!  And is now more playable and challenging; 
in short, more the game it always should have been. 

Stellar 7 
--------- 

	Finally, I tried my hand at Stellar 7.  Stellar 7 had several 
problems.  At the main screen, a delay loop tosses you to the mission 
screen after a while, if no keys are pressed.  This is a software loop, 
and so passes very quickly.  The game itself is too fast, so some sort 
of delay is needed.  The mission display is also too fast, and has 
software delay loops, so that needs fixing.  Finally, the game uses  
kernal calls for loading and saving, but is attached to drive #8; 
also, my version was split into a bunch of files, and I wanted to 
cut the number of files down. 

	Well, by this time it was all pretty straightforward.  From 
the loader, it was easy to figure out which files went where.  The 
mission and main displays were loaded in when needed, and swapped 
into unused parts of memory when not, so I loaded them in and 
adjusted the swap variable accordingly -- this left just the highscore 
and seven level files. 

	Finding the delay loops was easy -- I just went to the relevant 
sections of code, froze, and took a look at the loops.  There were your 
basic 

:LOOP 
	LDA $D4		;Check for keypress 
	BMI :key 
	DEX 
	BNE :LOOP 
	DEY 
	BNE :LOOP 
	DEC counter 
	BNE :LOOP 
:key 	LDX #$00 
	... 

Luckily, all routines were pretty much the same as the above.  The 
interrupt routine is in the $0314 vector, and the same routine is 
used during gameplay.   
	So the patch is very easy at this point.  First, change the 
IRQ code which does a JMP $EA7B to JMP $CE00 

. CE00  $EE INC $CFFF 
. CE03  $4C JMP $EA7B 

To fix up the keypress routines, the idea is to change the LDA $D4
into a JSR patch.  How to substitute 3 bytes for 2 bytes?  The 
trick is to place the LDX #$00 into the patch routine: 

. CE06  $20 JSR $CE15		;Wait for $CFFF 
. CE09  $A5 LDA $D4 
. CE0B  $10 BPL $CE11 
. CE0D  $A2 LDX #$00		;If key pressed, then LDX #$00 
. CE0F  $29 AND #$FF 
. CE11  $60 RTS 

The actual delay is accomplished by waiting on $CFFF: 

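. CE15  $AD LDA $CFFF		;Frames counted so far 
. CE18  $C9 CMP #$04		;Default delay; the operand lives at $CE19 
. CE1A  $90 BCC $CE15		;Not enough yet, keep waiting 
. CE1C  $A9 LDA #$00 
. CE1E  $8D STA $CFFF		;Reset the counter for the next wait 
. CE21  $60 RTS 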

As you can see, the default wait is 4 frames.  The patch in the 
game/mission rendering routine works similarly -- I just patched 
the rendering code to basically JSR $CE15.  I also decided to 
try something new: let the user change that CMP #$04 to make 
things faster or slower, to suit their tastes.  The keyscan 
values were pretty easy to figure out, so this just required a little 
patch to check for the "+" and "-" keys, and change location $CE19 
accordingly. 

	Well, that about sums it up.  Perhaps if you do some fixing, 
you might send me a little email describing your own experiences? 

....... 
.... 
.. 
.                                    C=H #17 

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 

An Optimizing Hybrid LZ77 RLE Data Compression Program, aka 
Improving Compression Ratio for Low-Resource Decompression 
=========================================================== 
by Pasi Ojala <albert@cs.tut.fi> 

Short: 

Pucrunch is a Hybrid LZ77 and RLE compressor, uses an Elias Gamma Code for 
lengths, mixture of Gamma Code and linear for LZ77 offset, and ranked RLE 
bytes indexed by the same Gamma Code.  Uses no extra memory in decompression. 

 
-------------------------------------------------------------------------- 

 
Introduction 
------------ 

Since I started writing demos for the C64 in 1989 I have always wanted to 
program a compression program.  I had a lot of ideas but never had the 
time, urge or knowledge to create one.  In retrospect, most of the ideas I 
had then were simply bogus ("magic function theory" as Mark Nelson nicely 
puts it).  But years passed, I gathered more knowledge and finally got an 
irresistible urge to finally realize my dream. 

The nice thing about the delay is that I don't need to write the actual 
compression program to run on a C64 anymore.  I can write it in portable 
ANSI-C code and just program it to create files that would uncompress 
themselves when run on a C64.  Running the compression program outside of 
the target system provides at least the following advantages. 
  * I can use portable ANSI-C code.  The compression program can be 
    compiled to run on a Unix box, Amiga, PC etc.  And I have all the 
    tools to debug the program and gather profiling information to see 
    why it is so slow :-) 
  * The program runs much faster than on C64.  If it is still slow, there 
    is always multitasking to allow me to do something else while I'm 
    compressing a file. 
  * There is 'enough' memory available.  I can use all the memory I 
    possibly need and use every trick possible to increase the compression 
    ratio as long as the decompression remains possible on a C64. 
  * Large files can be compressed as easily as shorter files.  Most C64 
    compressors can't handle files larger than around 52-54 kilobytes 
    (210-220 disk blocks). 
  * Cross-development is easier because you don't have to transfer a 
    file into C64 just to compress it. 

 
Memory Refresh and Terms for Compression 
---------------------------------------- 

Statistical compression 
       Uses the uneven probability distribution of the source symbols 
       to shorten the average code length.  Huffman code and arithmetic 
       code belong to this group.  By giving a short code to symbols 
       occurring most often, the number of bits needed to represent 
       the symbols decreases.  Think of the Morse code for example:  the 
       characters you need more often have shorter codes and it takes 
       less time to send the message. 

Dictionary compression 
       Replaces repeating strings in the source with shorter 
       representations.  These may be indices to an actual dictionary 
       (Lempel-Ziv 78) or pointers to previous occurrences (Lempel-Ziv 
       77).  As long as it takes fewer bits to represent the reference 
       than the string itself, we get compression.  LZ78 is a lot like 
       the way BASIC substitutes tokens for keywords:  one-byte tokens 
       expand to whole words like PRINT#.  LZ77 replaces repeated 
       strings with (length,offset) pairs, thus the string VICIICI can 
       be encoded as VICI(3,3) -- the repeated occurrence of the 
       string ICI is replaced by a reference. 

Run-length encoding 
       Replaces repeating symbols with a single occurrence of the 
       symbol and a repeat count.  For example assembly compilers have 
       a .zero keyword or equivalent to fill a number of bytes with 
       zero without needing to list them all in the source code. 

Variable-length code 
       Any code where the length of the code is not explicitly known 
       but changes depending on the bit values.  Some kind of end 
       marker or length count must be provided to make a code a prefix 
       code (uniquely decodable).  Compare with ASCII (or Latin-1) text, 
       where you know you get the next letter by reading a full byte from  
       the input.  A variable-length code requires you to read part of the  
       data to know how many bits to read next. 

Universal codes 
       Universal codes are used to encode integer numbers without the 
       need to know the maximum value.  Smaller integer values usually 
       get shorter codes.  Different universal codes are optimal for 
       different distributions of the values.  Universal codes include 
       Elias Gamma and Delta codes, Fibonacci code, and Golomb and 
       Rice codes. 

Lossless compression 
       Lossless compression algorithms are able to exactly reproduce 
       the original contents unlike lossy compression, which omits 
       details that are not important or perceivable by human sensory 
       system.  This article only talks about lossless compression. 
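
To make the dictionary and run-length definitions above concrete, here is a
minimal Python sketch of how such references expand.  It is only an
illustration, not pucrunch code:  the token format and the name `expand`
are made up for this example.

```python
def expand(tokens):
    """Expand a token list into a string.

    Token forms (hypothetical, for illustration only):
      "text"                  literal characters
      ("lz", length, offset)  copy `length` symbols from `offset` back
      ("rle", length, sym)    repeat `sym` `length` times
    """
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)
        elif tok[0] == "lz":
            _, length, offset = tok
            for _ in range(length):
                # Copy one symbol at a time so that a match may overlap
                # the data it is producing.
                out.append(out[-offset])
        elif tok[0] == "rle":
            _, length, sym = tok
            out.extend(sym * length)
    return "".join(out)

print(expand(["VICI", ("lz", 3, 3)]))        # VICIICI
print(expand(["A", ("rle", 5, "0"), "B"]))   # A00000B
```

The first call is the VICI(3,3) example from the text; the second mimics
what a .zero-style fill does.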

My goal in the pucrunch project was to create a compression system in which 
the decompressor would use minimal resources (both memory and processing 
power) and still have the best possible compression ratio.  A nice bonus 
would be if it outperformed every other compression program available. 
These understandably opposite requirements (minimal resources and good
compression ratio) rule out most of the state-of-the-art compression 
algorithms and in effect only leave RLE and LZ77 to be considered.  Another 
goal was to learn something about data compression and that goal at least 
has been satisfied. 

I started by developing a byte-aligned LZ77+RLE compressor/decompressor and 
then added a Huffman backend to it.  The Huffman tree took 384 bytes and 
the code that decoded the tree into an internal representation took 100 
bytes.  I found out that while the Huffman code gained about 8% in my
40-kilobyte test files, the gain was reduced to only about 3% after
accounting for the extra code and the Huffman tree.

Then I started a more detailed analysis of the LZ77 offset and length 
values and the RLE values and concluded that I would get better compression 
by using a variable-length code.  I used a simple variable-length code and 
scrapped the Huffman backend code, as it no longer increased the compression
ratio.  This version became pucrunch.

Pucrunch does not use byte-aligned data, and is a bit slower than the 
byte-aligned version because of this, but is much faster than the original 
version with the Huffman backend attached.  And pucrunch still does very 
well compression-wise.  In fact, it does very well indeed, beating even 
LhA, Zip, and GZip in some cases.  But let's not get too much ahead of 
ourselves. 

To get an improvement to the compression ratio for LZ77, we have only a few 
options left.  We can improve on the encoding of literal bytes (bytes that 
are not compressed), we can reduce the number of literal bytes we need to 
encode, and shorten the encoding of RLE and LZ77.  In the algorithm 
presented here all these improvement areas are addressed both collectively 
(one change affects more than one area) and one at a time. 

 1. By using a variable-length code we can gain compression for even 
    2-byte LZ77 matches, which in turn reduces the number of literal 
    bytes we need to encode.  Most LZ77-variants require 3-byte matches 
    to get any compression because they use so many bits to identify 
    the length and offset values, thus making the code longer than the 
    original bytes would've taken. 
 2. By using a new literal byte tagging system which distinguishes 
    uncompressed and compressed data efficiently we can reduce number 
    of extra bits needed to make this distinction (the encoding 
    overhead for literal bytes).  This is especially important for 
    files that do not compress well. 
 3. By using RLE in addition to LZ77 we can shorten the encoding for 
    long byte run sequences and at the same time set a convenient 
    upper limit to LZ77 match length.  The upper limit performs two 
    functions: 
       + we only need to encode integers in a specific range 
       + we only need to search strings shorter than this limit (if we 
         find a string long enough, we can stop there) 
    Short byte runs are compressed either using RLE or LZ77, whichever 
    gets the best results. 
 4. By doing statistical compression (more frequent symbols get 
    shorter representations) on the RLE bytes (in this case symbol 
    ranking) we can gain compression for even 2-byte run lengths, 
    which in turn reduces the number of literal bytes we need to 
    encode. 
 5. By carefully selecting which string matches and/or run lengths to 
    use we can take advantage of the variable-length code.  It may be 
    advantageous to compress a string as two shorter matches instead 
    of one long match and a bunch of literal bytes, and it can be 
    better to compress a string as a literal byte and a long match 
    instead of two shorter matches. 

This document consists of several parts, which are: 
  * C64 Considerations - Some words about the target system 
  * Escape codes - A new tagging system for literal bytes 
  * File format - What are the primaries that are output 
  * Graph search - How to squeeze every byte out of this method 
  * String match - An evolution of how to speed up the LZ77 search 
  * Some results on the target system files 
  * Results on the Calgary Corpus Test Suite 
  * The Decompression routine - 6510 code with commentary 

 
-------------------------------------------------------------------------- 

 
Commodore 64 Considerations 
--------------------------- 

Our target environment (Commodore 64) imposes some restrictions which we 
have to take into consideration when designing the ideal compression 
system.  A system with a 1-MHz 3-register 8-bit processor and 64 kilobytes 
of memory certainly imposes a great challenge, and thus also a great sense 
of achievement for good results. 

First, we would like it to be able to decompress as big a program as 
possible.  This in turn requires that the decompression code is located in 
low memory (most programs that we want to compress start at address 2049) 
and is as short as possible.  Also, the decompression code must not use any 
extra memory or only very small amounts of it.  Extra care must be taken to 
make certain that the compressed data is not overwritten during the 
decompression before it has been read. 

Secondly, my number one personal requirement is that the basic end address 
must be correctly set by the decompressor so that the program can be 
optionally saved in uncompressed form after decompression (although the 
current decompression code requires that you say "clr" before saving). 
This also requires that the decompression code is system-friendly, i.e. 
does not change KERNAL or BASIC variables or other structures.  Also, the 
decompressor shouldn't rely on file size or load end address pointers, 
because these may be corrupted by e.g.  X-modem file transfer protocol 
(padding bytes may be added). 

When these requirements are combined, there is not much selection in where 
in the memory we can put the decompression code.  There are some locations 
among the first 256 addresses (zeropage) that can be used, the (currently) 
unused part of the processor stack (0x100..0x1ff), the system input buffer 
(0x200..0x258) and the tape I/O buffer plus some unused bytes (0x334..0x3ff).
The screen memory (0x400..0x7ff) can also be used if necessary.  If we can  
do without the screen memory and the tape buffer, we can potentially  
decompress files that are located from 0x258 to 0xffff. 

The third major requirement is that the decompression should be relatively 
fast.  After 10 seconds the user begins to wonder if the program has crashed  
or if it is doing anything, even if there is some feedback like border color 
flashing.  This means that the arithmetic used should be mostly 8- or 9-bit 
(instead of full 16 bits) and there should be very little of it per each 
decompressed byte.  Processor- and memory-intensive algorithms like 
arithmetic coding and prediction by partial matching (PPM) are pretty much 
out of the question, and that is saying it mildly.  LZ77 seems the only 
practical alternative.  Still, run-length encoding handles long byte runs 
better than LZ77 and can have a bigger length limit.  If we can easily 
incorporate RLE and LZ77 into the same algorithm, we should get the best 
features from both. 

A part of the decompressor efficiency depends on the format of the 
compressed data.  Byte-aligned codes, where everything is aligned into byte 
boundaries, can be accessed very quickly; non-byte-aligned variable length 
codes are much slower to handle, but provide better compression.  Note that 
byte-aligned codes can still have other data sizes than 8.  For example you 
can use 4 bits for LZ77 length and 12 bits for LZ77 offset, which preserves 
the byte alignment. 
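
As an illustration of such a byte-aligned code, the following Python sketch
packs a 4-bit length and a 12-bit offset into exactly two bytes.  The field
order (length in the top nibble) and the function names are assumptions made
for the example; the text does not prescribe a layout.

```python
def pack_match(length, offset):
    """Pack a 4-bit length and a 12-bit offset into two bytes."""
    assert 0 <= length < 16 and 0 <= offset < 4096
    word = (length << 12) | offset
    return bytes([word >> 8, word & 0xFF])

def unpack_match(data):
    """Inverse of pack_match: returns (length, offset)."""
    word = (data[0] << 8) | data[1]
    return word >> 12, word & 0x0FFF

packed = pack_match(5, 0x123)
print(packed.hex())           # 5123
print(unpack_match(packed))   # (5, 291)
```

Because both fields end on a byte boundary, a decoder never has to shift
bits across bytes, which is what makes this kind of code fast.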

 
-------------------------------------------------------------------------- 

 
The New Tagging System 
---------------------- 

I call the different types of information my compression algorithm outputs 
primaries.  The primaries in this compression algorithm are: 
  * literal (uncompressed) bytes and escape sequences, 
  * LZ77 (length,offset)-pairs, 
  * RLE (length,byte)-pairs, and 
  * EOF (end of file marker). 

Literal bytes are those bytes that cannot be represented by shorter codes,  
unlike a part of previously seen data (LZ77), or a part of a longer 
sequence of the same byte (RLE). 

Most compression programs handle the selection between compressed data and 
literal bytes in a straightforward way by using a prefix bit.  If the bit 
is 0, the following data is a literal byte (uncompressed).  If the bit is 
1, the following data is compressed.  However, this presents the problem 
that non-compressible data will be expanded from the original 8 bits to 9 
bits per byte, i.e. by 12.5 %.  If the data isn't very compressible, this 
overhead consumes all the little savings you may have had using LZ77 or 
RLE. 

Some other data compression algorithms use a value (using variable-length 
code) that indicates the number of literal bytes that follow, but this is 
really analogous to a prefix bit, because 1-byte uncompressed data is very 
common for modestly compressible files.  So, using a prefix bit may seem 
like a good idea, but we may be able to do even better.  Let's see what we 
can come up with.  My idea was to somehow use the data itself to mark 
compressed and uncompressed data and thus not need any prefix bits. 

Let's assume that 75% of the symbols generated are literal bytes.  In this 
case it seems viable to allocate shorter codes for literal bytes, because 
they are more common than compressed data.  This distribution (75% are 
literal bytes) suggests that we should use 2 bits to determine whether the 
data is compressed or a literal byte.  One of the combinations indicates 
compressed data, and three of the combinations indicate a literal byte.  At 
the same time those three values divide the literal bytes into three 
distinct groups.  But how do we make the connection between the bit
pattern we see and the literal byte value it stands for?

The simplest way is to use a direct mapping.  We use two bits (let them be 
the two most-significant bits) _from the literal bytes themselves_ to 
indicate compressed data.  This way no actual prefix bits are needed.  We 
maintain an escape code (which doesn't need to be static), which is 
compared to the bits, and if they match, compressed data follows.  If the 
bits do not match, the rest of the literal byte follows.  In this way the 
literal bytes do not expand at all if their most significant bits do not 
match the escape code, and fewer bits are needed to represent the literal 
bytes. 

Whenever those bits in a literal byte would match the escape code, an 
escape sequence is generated.  Otherwise we could not represent those 
literal bytes which actually start like the escape code (the top bits 
match).  This escape sequence contains the offending data and a new escape 
code.  This escape sequence looks like 

     # of escape bits    (escape code) 
     3                   (escape mode select) 
     # of escape bits    (new escape bits) 
     8-# of escape bits  (rest of the byte) 
     = 8 + 3 + # of escape bits 
     = 13 for 2-bit escapes, i.e. expands the literal byte by 5 bits. 

Read further to see how we can take advantage of the changing escape code. 

You may also remember that in the run-length encoding presented in the 
previous article two successive equal bytes are used to indicate compressed 
data (escape condition) and all other bytes are literal bytes.  A similar
technique is used in some C64 packers (RLE) and crunchers (LZ77); the only
difference is that the escape condition is indicated by a fixed byte value.
My tag system is in fact an extension to this.  Instead of a full byte, I 
use only a few bits. 

We assumed an even distribution of the values and two escape bits, so 1/4 
of the values have the same two most significant bits as the escape code. 
I call this probability that a literal byte has to be escaped the "hit rate". 
Thus, literal bytes expand on average 25% of the time by 5 bits, making the
average length 25% * 13 + 75% * 8 = 9.25 bits.  Not surprisingly, this is
longer than using one bit to tag the literal bytes.  However, there is one
thing we haven't considered yet.  The escape sequence has the possibility
to change the escape code.  Using this feature to its optimum (escape
optimization), the average 25% hit rate becomes the -maximum- hit rate.

Also, because the distribution of the literal byte values is seldom flat  
(some values are more common than others) and there is locality (different  
parts of the file only contain some of the possible values), from which we  
can also benefit, the actual hit rate is always much smaller than that.   
Empirical studies on some test files show that for 2-bit escape codes the  
actual realized hit rate is only 1.8-6.4%, while the theoretical maximum  
is the already mentioned 25%. 

Previously we assumed the distribution of 75% of literal bytes and 25% of 
compressed data (other primaries).  This prompted us to select 2 escape 
bits.  For other distributions (differently compressible files, not 
necessarily better or worse) some other number of escape bits may be more 
suitable.  The compressor tries different numbers of escape bits and selects
the value which gives the best overall results.  The following table
summarizes the hit rates on the test files for different number of escape 
bits. 

   1-bit   2-bit   3-bit   4-bit   File 
   50.0%   25.0%   12.5%   6.250%  Maximum 
   25.3%    2.5%    0.3%   0.090%  ivanova.bin 
   26.5%    2.4%    0.8%   0.063%  sheridan.bin 
   20.7%    1.8%    0.2%   0.041%  delenn.bin 
   26.5%    6.4%    2.5%   0.712%  bs.bin 
    9.06    8.32    8.15   8.050   bits/Byte for bs.bin 

As can be seen from the table, the realized hit rates are dramatically 
smaller than the theoretical maximum values.  A thought might occur that we 
should always select 4-bit (or longer) escapes, because it reduces the hit 
rate and presents the minimum overhead for literal bytes.  Unfortunately 
increasing the number of escape bits also increases the code length of the 
compressed data.  So, it is a matter of finding the optimum setting. 
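
The bits/Byte row of the table follows directly from the hit rates:  an
escaped literal costs 8 + 3 + (number of escape bits) bits and every other
literal costs 8.  A small Python check, with `literal_bits` being a name
invented here:

```python
def literal_bits(hit_rate, escape_bits):
    """Average bits per literal byte for a given escape hit rate."""
    escaped = 8 + 3 + escape_bits     # length of a full escape sequence
    return hit_rate * escaped + (1.0 - hit_rate) * 8

# Hit rates for bs.bin, copied from the table above.
bs_bin = {1: 0.265, 2: 0.064, 3: 0.025, 4: 0.00712}
for n, rate in bs_bin.items():
    print(n, round(literal_bits(rate, n), 3))
```

With a flat 25% hit rate and 2-bit escapes the same function gives the 9.25
bits quoted earlier.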

If there are very few literal bytes compared to other primaries, 1-bit 
escape or no escape at all gives very short codes to compressed data, but 
causes more literal bytes to be escaped, which means 4 bits extra for each 
escaped byte (with 1-bit escapes).  If the majority of primaries are 
literal bytes, for example a 6-bit escape code causes most of the literal 
bytes to be output as 8-bit codes (no expansion), but makes the other 
primaries 6 bits longer.  Currently the compressor automatically selects 
the best number of escape bits, but this can be overridden by the user with 
the -e option. 

The cases in the example with a 1-bit escape code validate the original
suggestion:  use a prefix bit.  A simple prefix bit would produce better
results on three of the previous test files (although only slightly).  For 
delenn.bin (1 vs 0.828) the escape system works better.  On the other hand, 
1-bit escape code is not selected for any of the files, because 2-bit 
escape gives better overall results. 

-Note:- for 7-bit ASCII text files, where the top bit is always 0 (like 
most of the Calgary Corpus files), the hit rate is 0% for even 1-bit 
escapes.  Thus, literal bytes do not expand at all.  This is equivalent to
using a prefix bit and 7-bit literals, but does not need a separate
algorithm to detect and handle 7-bit literals.
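
The 0% figure is easy to check for a fixed escape code.  The sketch below
(Python, with a hypothetical helper `hit_rate`) just counts how many bytes
share their top bits with the escape code.  It deliberately ignores escape
optimization, which changes the code along the way, so it is not how the
compressor measures its own hit rate.

```python
def hit_rate(data, esc, esc_bits):
    """Fraction of bytes whose top `esc_bits` bits equal `esc`."""
    hits = sum(1 for b in data if b >> (8 - esc_bits) == esc)
    return hits / len(data)

text = b'LOAD "*",8,1'
print(hit_rate(text, 1, 1))   # 0.0: no 7-bit ASCII byte has its top bit set
```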

For Calgary Corpus files the number of tag bits per primary (counting the 
escape sequences and other overhead) ranges from as low as 0.46 (book1) to 
1.07 (geo) and 1.09 (pic).  These two files (geo and pic) are the only ones 
in the suite where a simple prefix bit would be better than the escape 
system.  The average is 0.74 tag bits per primary. 

In the Canterbury Corpus the number of tag bits per primary ranges from
0.44 (plrabn12.txt) to 1.09 (ptt5), which is the only one above 0.85 (sum).
The average is 0.61 tag bits per primary.

 
-------------------------------------------------------------------------- 

 
Primaries Used for Compression 
------------------------------ 

The compressor uses the previously described escape-bit system while 
generating its output.  I call the different groups of bits that are 
generated primaries, whether it is the correct term or not.  You are 
welcome to suggest a better term for them.  The primaries in this 
compression algorithm are:  literal byte (and escape sequence), LZ77 
(length,offset)-pair, RLE (length, byte)-pair, and EOF (end of file 
marker). 

If the top bits of a literal byte do not match the escape code, the byte is 
output as-is.  If the bits match, an escape sequence is generated, with the 
new escape code.  Other primaries start with the escape code. 

The Elias Gamma Code is used extensively.  This code consists of two parts: 
a unary code (a one-bit preceded by zero-bits) and a binary code part.  The 
first part tells the decoder how many bits are used for the binary code 
part.  Being a universal code, it produces shorter codes for small integers 
and longer codes for larger integers.  Because we expect we need to encode 
a lot of small integers (there are more short string matches and shorter 
equal byte runs than long ones), this reduces the total number of bits 
needed.  See the previous article for a more in-depth delve into 
statistical compression and universal codes.  To understand this article, 
you only need to keep in mind that small integer value equals short code. 
The following discusses the encoding of the primaries. 

The most frequent compressed data is LZ77.  The length of the match is 
output in Elias Gamma code, with "0" meaning the length of 2, "100" length 
of 3, "101" length of 4 and so on.  If the length is not 2, an LZ77 offset
value follows.  This offset takes 9 to 22 bits.  If the length is 2, the 
next bit defines whether this is LZ77 or RLE/Escape.  If the bit is 0, an 
8-bit LZ77 offset value follows.  (Note that this restricts the offset for 
2-byte matches to 1..256.) If the bit is 1, the next bit decides between 
escape (0) and RLE (1). 
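
The selection logic just described can be sketched as a small decision
function.  This is an illustration in Python, not the actual decompressor:
it works on a string of 0 and 1 characters, assumes by default 2-bit escapes
with the current code "11", and only names the primary without decoding its
payload.  The name `classify` and its return values are invented here.

```python
def classify(bits, esc="11"):
    """Name the primary at the front of `bits` (a string of 0s and 1s)."""
    if not bits.startswith(esc):
        return "literal"              # top bits differ from the escape code
    rest = bits[len(esc):]
    if rest[0] == "1":
        # Gamma-coded length of 3 or more; an LZ77 offset follows
        # (one offset code is reserved for EOF).
        return "lz77"
    if rest[1] == "0":
        return "lz77-2"               # length 2, 8-bit offset follows
    return "escape" if rest[2] == "0" else "rle"

print(classify("1101010001010"))      # escape
print(classify("00101010"))           # literal
```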

The code for an escape sequence is thus e..e010n..ne....e, where E is the
byte (its top bits, sent first, equal the current escape code and the rest
of its bits are sent last) and N is the new escape code.  Example:
  * We are using 2-bit escapes 
  * The current escape code is "11" 
  * We need to encode a byte 0xca == 0b11001010 
  * The escape code and the byte high bits match (both are "11") 
  * We output the current escape code "11" 
  * We output the escaped identification "010" 
  * We output the new escape bits, for example "10" (depends on escape 
    optimization) 
  * We output the rest of the escaped byte "001010" 
  * So, we have output the string "1101010001010" 

When the decompressor receives this string of bits, it finds that the first 
two bits match with the escape code, it finds the escape identification 
("010") and then gets the new escape, the rest of the original byte and 
combines it with the old escape code to get a whole byte. 
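
The worked example can be reproduced with a few lines of Python.  This is a
sketch under the assumptions stated in the comments (bit strings instead of
a packed bit stream, and the new escape code simply handed in by the
caller), not the pucrunch source; `encode_literal` and `decode_escape` are
names made up for the illustration.

```python
def encode_literal(byte, esc, esc_bits, new_esc=None):
    """Return the bit string for one literal byte.

    esc      current escape code (an integer of `esc_bits` bits)
    new_esc  code to switch to if the byte must be escaped; picking it
             well is the job of escape optimization, so it is passed in
    """
    bits = format(byte, "08b")
    if byte >> (8 - esc_bits) != esc:
        return bits                       # no clash: byte goes out as-is
    if new_esc is None:
        new_esc = esc
    return (format(esc, "0%db" % esc_bits)          # current escape code
            + "010"                                 # escape identification
            + format(new_esc, "0%db" % esc_bits)    # new escape code
            + bits[esc_bits:])                      # rest of the byte

def decode_escape(seq, esc, esc_bits):
    """Decode an escape sequence; returns (byte, new_esc)."""
    assert seq[:esc_bits] == format(esc, "0%db" % esc_bits)
    assert seq[esc_bits:esc_bits + 3] == "010"
    pos = esc_bits + 3
    new_esc = int(seq[pos:pos + esc_bits], 2)
    rest = seq[pos + esc_bits:pos + 8]
    # The top bits of the byte are the OLD escape code.
    byte = (esc << (8 - esc_bits)) | int(rest, 2)
    return byte, new_esc

print(encode_literal(0xca, 0b11, 2, new_esc=0b10))   # 1101010001010
print(decode_escape("1101010001010", 0b11, 2))       # (202, 2)
```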

The end of file condition is encoded to the LZ77 offset and the RLE is 
subdivided into long and short versions.  Read further, and you get a 
better idea about why this kind of encoding is selected. 

When I studied the distribution of the length values (LZ77 and short RLE 
lengths), I noticed that the smaller the value, the more occurrences. 
The following table shows an example of length value distribution. 

         LZLEN   S-RLE 
  2       1975     477 
  3-4     1480     330 
  5-8      492     166 
  9-16     125      57 
  17-32     31      33 
  33-64      8      15 

The first column gives a range of values.  The first entry has a single 
value (2), the second two values (3 and 4), and so on.  The second column 
shows how many times the different LZ77 match lengths are used, the last 
column shows how many times short RLE lengths are used.  The distribution 
of the values gives a hint of how to most efficiently encode the values. 
We can see from the table for example that LZ77 lengths 2-4 are used 3455
times, while lengths 5-64 are used only 656 times.  The more common values
need to get shorter codes, while the less-used ones can be longer.

Because in each "magnitude" there are approximately half as many values
as in the preceding one, it almost immediately occurred to me that the
optimal way to encode the length values (decremented by one) is:

  Value      Encoding       Range     Gained 
  0000000    not possible 
  0000001    0              1         -6 bits 
  000001x    10x            2-3       -4 bits 
  00001xx    110xx          4-7       -2 bits 
  0001xxx    1110xxx        8-15      +0 bits 
  001xxxx    11110xxxx      16-31     +2 bits 
  01xxxxx    111110xxxxx    32-63     +4 bits 
  1xxxxxx    111111xxxxxx   64-127    +5 bits 

The first column gives the binary code of the original value (with x 
denoting 0 or 1, xx 0..3, xxx 0..7 and so on), the second column gives the 
encoding of the value(s).  The third column lists the original value range 
in decimal notation. 
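
The encoding column of the table can be generated mechanically.  The Python
sketch below is one way to do it; the function names and the `max_value`
parameter are assumptions for the illustration, and bit strings stand in for
a real bit stream.

```python
def gamma(value, max_value=127):
    """Encode `value` (1..max_value) as in the table above."""
    assert 1 <= value <= max_value
    top = max_value.bit_length() - 1     # longest prefix, in 1-bits
    k = value.bit_length() - 1           # bits that follow the prefix
    # The terminating 0 is dropped when the prefix has maximum length.
    prefix = "1" * k + ("0" if k < top else "")
    suffix = format(value, "b")[1:]      # drop the implied leading 1
    return prefix + suffix

def ungamma(bits, max_value=127):
    """Decode one value from the front of `bits`: (value, bits used)."""
    top = max_value.bit_length() - 1
    k = 0
    while k < top and bits[k] == "1":
        k += 1
    pos = k if k == top else k + 1       # skip the terminating 0, if any
    value = (1 << k) | int(bits[pos:pos + k] or "0", 2)
    return value, pos + k

for v in (1, 2, 3, 63, 64):
    print(v, gamma(v))
```

Values 1, 2 and 3 come out as 0, 100 and 101, and 64 uses the 12-bit form
from the last table row.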

The last column summarizes the difference between this code and a 7-bit 
binary code.  Using the previous encoding for the length distribution 
presented reduces the number of bits used compared to a direct binary 
representation considerably.  Later I found out that this encoding in fact 
is Elias Gamma Code, only the assignment of 0- and 1-bits in the prefix is 
reversed, and in this version the length is limited.  Currently the maximum 
value is selectable between 64 and 256. 
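
How much is "considerably"?  Since every length in one row of the
distribution table gets a code of the same size, the totals can be worked
out from the row counts alone.  The Python sketch below does that and
compares against a flat 7 bits per value; the helper names are invented for
the example.

```python
# (low, high, LZ77 count, short-RLE count) for each row of the length table
ROWS = [(2, 2, 1975, 477), (3, 4, 1480, 330), (5, 8, 492, 166),
        (9, 16, 125, 57), (17, 32, 31, 33), (33, 64, 8, 15)]

def code_len(value, top=6):
    """Bits in the code for `value`; the longest prefix has `top` 1-bits."""
    k = value.bit_length() - 1
    return 2 * k + 1 if k < top else 2 * k

def total_bits(column):
    """Return (variable-length bits, 7-bit binary bits) for a column."""
    coded = fixed = 0
    for row in ROWS:
        low, high, count = row[0], row[1], row[column]
        # Lengths are decremented by one before encoding.
        assert code_len(low - 1) == code_len(high - 1)
        coded += count * code_len(low - 1)
        fixed += count * 7
    return coded, fixed

print(total_bits(2))   # LZ77 lengths:      (10117, 28777)
print(total_bits(3))   # short RLE lengths: (3158, 7546)
```

For this particular distribution the variable-length code needs well under
half the bits of the fixed-width one.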

So, to recap, this version of the Gamma code can encode numbers from 1 to 
255 (1 to 127 in the example).  LZ77 and RLE lengths that are used start 
from 2, because that is the shortest length that gives us any compression. 
These length values are first decremented by one, thus length 2 becomes 
"0", and for example length 64 becomes "11111011111". 

The distribution of the LZ77 offset values (pointer to a previous 
occurrence of a string) is not at all similar to the length distribution. 
Admittedly, the distribution isn't exactly flat, but it also isn't as 
radical as the length value distribution either.  I decided to encode the 
lower 8 bits (automatically selected or user-selectable between 8 and 12 
bits in the current version) of the offset as-is (i.e.  binary code) and 
the upper part with my version of the Elias Gamma Code.  However, 2-byte 
matches always have an 8-bit offset value.  The reason for this is 
discussed shortly. 

Because the upper part can contain the value 0 (so that we can represent
offsets from 0 to 255 with an 8-bit lower part), and the code can't directly
represent zero, the upper part of the LZ77 offset is incremented by one 
before encoding (unlike the length values which are decremented by one). 
Also, one code is reserved for an end of file (EOF) symbol.  This restricts 
the offset value somewhat, but the loss in compression is negligible. 

With the previous encoding 2-byte LZ77 matches would only gain 4 bits (with 
2-bit escapes) for each offset from 1 to 256, and 2 bits for each offset 
from 257 to 768.  In the first case 9 bits would be used to represent the 
offset (one bit for gamma code representing the high part 0, and 8 bits for 
the low part of the offset), in the latter case 11 bits are used, because 
each "magnitude" of values in the Gamma code consumes two more bits than 
the previous one. 
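
The 4-bit and 2-bit gains can be checked with a little arithmetic.  The
Python sketch below assumes that an offset of 1..256 is stored as 0..255 (so
that it fits the 8-bit low part) and ignores the code reserved for EOF; the
function names are made up for the illustration.

```python
def gamma_len(value):
    """Bits used by the Gamma code for `value` (no length cap applied)."""
    return 2 * (value.bit_length() - 1) + 1

def offset_bits(offset, low_bits=8):
    """Offset cost: high part plus one in Gamma code, low part in binary."""
    high = (offset - 1) >> low_bits
    return gamma_len(high + 1) + low_bits

def gain_2byte(offset, escape_bits=2):
    """Bits saved by one 2-byte match compared to two literal bytes."""
    code = escape_bits + 1 + offset_bits(offset)   # escape, length "0", offset
    return 16 - code

for off in (1, 256, 257, 768):
    print(off, offset_bits(off), gain_2byte(off))
```

Offsets 1 and 256 cost 9 bits and gain 4; offsets 257 and 768 cost 11 bits
and gain 2, matching the figures in the text.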

The first case (offset 1..256) is much more frequent than the second case, 
because it saves more bits, and also because the symbol source statistics 
(whatever they are) guarantee 2-byte matches in recent history (much better 
chance than for 3-byte matches, for example).  If we restrict the offset 
for a 2-byte LZ77 match to 8 bits (1..256), we lose hardly any
compression, and in return we can shorten the code by one bit.  This
one bit comes from the fact that before we had to use one bit to make the 
selection "8-bit or longer".  Because we only have "8-bit" now, we don't 
need that select bit anymore. 

Or, we can put that select bit to a new use: selecting whether this code
really is LZ77 or something else.  Compared to the older encoding (which
I'm not detailing here for clarity's sake; this is already much too
complicated to follow, and only slightly easier to describe) the codes for
escape sequence, RLE and End of File are still the same length, but the
code for LZ77 has been shortened by one bit.  Because LZ77 is the most 
frequently used primary, this presents a saving that more than compensates 
for the loss of 2-byte LZ77 matches with offsets 257..768 (which we can no 
longer represent, because we fixed the offset for 2-byte matches to use 
exactly 8 bits). 

Run length encoding is also a bit revised.  I found out that a lot of bits 
can be gained by using the same length encoding for RLE as for LZ77.  On 
the other hand, we still should be able to represent long repeat counts as 
that's where RLE is most efficient.  I decided to split RLE into two modes: 
  * short RLE for short (e.g. 2..128) equal byte strings 
  * long RLE for long equal byte strings 

The Long RLE selection is encoded into the Short RLE code.  Short RLE only 
uses half of its coding space, i.e.  if the maximum value for the gamma 
code is 127, short RLE uses only values 1..63.  Larger values switch the
decoder into Long RLE mode and more bits are read to complete the run
length value.

For further compression in RLE we rank all the used RLE bytes (the values 
that are repeated in RLE) in the decreasing probability order.  The values 
are put into a table, and only the table indices are output.  The indices 
are also encoded using a variable length code (the same gamma code, 
surprise..), which uses fewer bits for smaller integer values.  As there are 
more RLE's with smaller indices, the average code length decreases.  In 
decompression we simply get the gamma code value and then use the value as 
an index into the table to get the value to repeat. 

Instead of reserving a full 256 bytes for the table we only put the top 31 
RLE bytes into the table.  Normally this is enough.  If there happens to be 
a byte run with a value not in the table we use a technique similar to the 
one used for the short/long RLE selection.  If the table index is larger 
than 31, it means we don't have the value in the table.  We use the values 
32..63 to select the 'escaped' mode and simultaneously send the 5 most 
significant bits of the value (there are 32 distinct values in the range 
32..63).  The remaining 3 bits of the byte are sent separately. 
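
The table lookup and its escaped mode can be illustrated with a small C 
sketch.  This is not the Pucrunch source: the function name, the argument 
layout, and the assumption that index 32+n carries the 5 top bits n are 
all made up for illustration. 

```c
/* Sketch only: turn a decoded RLE table index (1..63) into the byte to
   repeat.  table[1..31] holds the ranked RLE bytes; entry 0 is unused
   because gamma code values start at 1 (an assumption of this sketch).
   low3 holds the 3 separately sent bits and is ignored for indices
   that are found in the table. */
unsigned char rle_byte(int index, const unsigned char table[32], int low3)
{
    if (index > 31) {
        /* Escaped mode: index-32 is assumed to be the 5 most
           significant bits, low3 supplies the remaining 3 bits. */
        return (unsigned char)(((index - 32) << 3) | (low3 & 7));
    }
    return table[index];
}
```

With this layout any of the 256 byte values can still be reached, while 
the 31 most common ones cost only a short gamma code. 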

If you are more than confused, forget everything I said in this chapter and 
look at the decompression pseudo-code later in this article. 

 
-------------------------------------------------------------------------- 

 
Graph Search - Selecting Primaries 
---------------------------------- 

In free-parse methods there are several ways to divide the file into parts, 
each of which is equally valid but not necessarily equally efficient in 
terms of compression ratio. 

     "i just saw justin adjusting his sting" 

     "i just saw ", (-9,4), "in ad", (-9,6), "g his", (-25,2), (-10,4) 
     "i just saw", (-9,4), "in ad", (-9,6), "g his ", (-10,5) 

The latter two lines show how the sentence could be encoded using literal 
bytes and (offset, length) pairs.  As you can see, we have two different 
encodings for a single string and they are both valid, i.e.  they will 
produce the same string after decompression.  This is what free-parse is: 
there are several possible ways to divide the input into parts.  If we are 
clever, we will of course select the encoding that produces the shortest 
compressed version.  But how do we find this shortest version?  How does 
the data compressor decide which primary to generate in each step? 

The most efficient way the file can be divided is determined by a sort of a 
graph-search algorithm, which finds the shortest possible route from the 
start of the file to the end of the file.  Well, actually the algorithm 
proceeds from the end of the file to the beginning for efficiency reasons, 
but the result is the same anyway:  the path that minimizes the bits 
emitted is determined and remembered.  If the parameters (number of escape 
bits or the variable length codes or their parameters) are changed, the 
graph search must be re-executed. 

     "i just saw justin adjusting his sting" 
                 \___/    \_____/    \_|___/ 
                  13       15        11 13 
                                      \____/ 
                                       15 

Think of the string as separate characters.  You can jump to the next 
character by paying 8 bits to do so (not shown in the figure), unless the 
top bits of the character match with the escape code (in which case you 
need more bits to send the character "escaped").  If the history buffer 
contains a string that matches a string starting at the current character 
you can jump over the string by paying as many bits as representing the 
LZ77 (offset,length)-pair takes (including escape bits), in this example 
from 11 to 15 bits.  And the same applies if you have RLE starting at the 
character.  Then you just find the least-expensive way to get from the 
start of the file to the end and you have found the optimal encoding.  In 
this case the last characters " sting" would be optimally encoded with 
8(literal " ") + 15("sting") = 23 instead of 11(" s") + 13("ting") = 24 
bits. 

The algorithm can be written either cleverly or not-so.  We can take a real 
short-cut compared to a full-blown graph search because we can/need to only 
go forwards in the file:  we can simply start from the end!  Our accounting 
information which is updated when we pass each location in the data 
consists of three values: 
 1. the minimum bits from this location to the end of file. 
 2. the mode (literal, LZ77 or RLE) to use to get that minimum 
 3. the "jump" length for LZ77 and RLE 

For each location we try to jump forward (to a location we already 
processed) one location, LZ77 match length locations (if a match exists), 
or RLE length locations (if equal bytes follow), select the shortest 
route, and update the tables accordingly.  In addition, if we have a LZ77 
or RLE length of for example 18, we also check jumps 17, 16, 15, ...  This 
gives a little extra compression.  Because we are doing the "tree traverse" 
starting from the "leaves", we only need to visit/process each location 
once.  Nothing located after the current location can change, so there is 
never any need to update a location. 
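
The backward pass can be sketched in C roughly as follows.  This is a 
simplified illustration and not the Pucrunch code: the struct, the names 
and the flat candidate list are hypothetical, every literal is assumed to 
cost exactly 8 bits (escapes are ignored), and the shorter-jump check 
(17, 16, 15, ...) is left out. 

```c
/* Hypothetical candidate: a jump of `len` bytes starting at `pos`
   whose code costs `bits` bits (an LZ77 match or an RLE run). */
struct jump { int pos, len, bits; };

enum { MODE_LITERAL = 0, MODE_JUMP = 1 };

/* Walk from the end of the data to the start.  bits_to_end[] needs
   n+1 entries; mode[] and jumplen[] need n entries.  Returns the
   minimum number of bits needed for the whole input. */
int parse_backwards(int n, const struct jump *cand, int ncand,
                    int *bits_to_end, int *mode, int *jumplen)
{
    int p, k;

    bits_to_end[n] = 0;                 /* nothing left to encode */
    for (p = n - 1; p >= 0; p--) {
        /* Default route: pay 8 bits for a literal byte. */
        bits_to_end[p] = 8 + bits_to_end[p + 1];
        mode[p] = MODE_LITERAL;
        jumplen[p] = 1;

        /* Try every candidate jump that starts at this location. */
        for (k = 0; k < ncand; k++) {
            int total;

            if (cand[k].pos != p || p + cand[k].len > n)
                continue;
            total = cand[k].bits + bits_to_end[p + cand[k].len];
            if (total < bits_to_end[p]) {
                bits_to_end[p] = total;
                mode[p] = MODE_JUMP;
                jumplen[p] = cand[k].len;
            }
        }
    }
    return bits_to_end[0];
}
```

Fed with the five matches and bit costs from the figure above (at 
positions 11, 20, 31, 32 and 33 of the 37-character string), the sketch 
picks a literal for the space before "sting" with 23 bits left to the end, 
and 219 bits for the whole sentence. 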

To be able to find the minimal path, the algorithm needs the length of the 
RLE (the number of the identical bytes following) and the maximum LZ77 
length/offset (an identical string appearing earlier in the file) for each 
byte/location in the file.  This is the most time-consuming -and- 
memory-consuming part of the compression.  I have used several methods to 
make the search faster.  See String Match Speedup later in this 
article.  Fortunately these searches can be done first, and the actual 
optimization can use the cached values. 

Then what is the rationale behind this optimization?  It works because you 
are not forced to take every compression opportunity, but can select the 
best ones.  The compression community calls this "lazy coding" or 
"non-greedy" 
selection.  You may want to emit a literal byte even if there is a 2-byte 
LZ77 match, because in the next position in the file you may have a longer 
match.  This is actually more complicated than that, but just take my word 
for it that there is a difference.  Not a very big difference, and only 
significant for variable-length code, but it is there and I was after every 
last bit of compression, remember. 

Note that the decision-making between primaries is quite simple if a 
fixed-length code is used.  A one-step lookahead is enough to guarantee 
optimal parsing.  If there is a more advantageous match in the next 
location, we output a literal byte and that longer match instead of the 
shorter match.  I don't have time or space here to go very deeply into that, 
but the main reason is that in fixed-length code it doesn't matter whether 
you represent a part of data as two matches of lengths 2 and 8 or as 
matches of lengths 3 and 7 or as any other possible combination (if matches 
of those lengths exist).  This is not true for a variable-length code 
and/or a statistical compression backend.  Different match lengths and 
offsets no longer generate equal-length codes. 

Note also that most LZ77 compression algorithms need at least a 3-byte 
match to break even, i.e.  to avoid expanding the data.  This is not 
surprising when 
you stop to think about it.  To gain something from 2-byte matches you need 
to encode the LZ77 match into 15 bits.  This is very little.  A generic 
LZ77 compressor would use one bit to select between a literal and LZ77, 12 
bits for moderate offset, and you have 2 bits left for match length.  I 
imagine the rationale to exclude 2-byte matches also includes "the potential 
savings percentage for 2-byte matches is insignificant".  Pucrunch gets 
around this by using the tag system and Elias Gamma Code, and does indeed 
gain bits from even 2-byte matches. 

After we have decided on what primaries to output, we still have to make 
sure we get the best results from the literal tag system.  Escape 
optimization handles this.  In this stage we know which parts of the data 
are emitted as literal bytes and we can select the minimal path from the 
first literal byte to the last in the same way we optimized the primaries. 
Literal bytes that match the escape code generate an escape sequence, thus 
using more bits than unescaped literal bytes, and we need to minimize these 
occurrences. 

For each literal byte there is a corresponding new escape code which 
minimizes the path to the end of the file.  If the literal byte's high bits 
match the current escape code, this new escape code is used next.  The 
escape optimization routine proceeds from the end of the file to the 
beginning like the graph search, but it proceeds linearly and is thus much 
faster. 

I already noted that the new literal byte tagging system exploits the 
locality in the literal byte values.  If there is no correlation between 
the bytes, the tagging system does not do well at all.  Most of the time, 
however, the system works very well, performing 50% better than the 
prefix-bit approach. 

The escape optimization routine is currently very fast.  A little 
algorithmic magic removed a lot of code from the original version.  A fast 
escape optimization routine is quite advantageous, because the number of 
escape bits can now vary from 0 (uncompressed bytes always escaped) to 8, 
and we need to run the routine again whenever the number of escape bits 
used to select the optimal escape code changes. 

Because escaped literal bytes actually expand the data, we need a safety 
area, or otherwise the compressed data may get overwritten by the 
decompressed data before we have used it.  Some extra bytes need to be 
reserved for the end of file marker.  The compression routine finds out how 
many bytes we need for safety buffer by keeping track of the difference 
between input and output sizes while creating the compressed file. 

            $1000 .. $2000 
            |OOOOOOOO|           O=original file 

        $801 .. 
        |D|CCCCC|                C=compressed data (D=decompressor) 

 $f7..      $1000     $2010 
 |D|            |CCCCC|          Before decompression starts 
            ^    ^ 
            W    R               W=write pointer, R=read pointer 

If the original file is located at $1000-$1fff, and the calculated safety 
area is 16 bytes, the compressed version will be copied by the 
decompression routine higher in memory so that the last byte is at $200f. 
In this way, the minimum amount of other memory is overwritten by the 
decompression.  If the safety area would exceed the top of memory, we need a 
wrap buffer.  This is handled automatically by the compressor.  The read 
pointer wraps from the end of memory to the wrap buffer, allowing the 
original file to extend up to the end of the memory, all the way to $ffff. 
You can get the compression program to tell you which memory areas it uses 
by specifying the "-s" option.  Normally the safety buffer needed is less 
than a dozen bytes. 
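
The address arithmetic of the example can be written out as a few lines 
of C.  This is only a sketch of the placement rule described above: the 
struct and function names are hypothetical, the compressed length is a 
made-up figure, and the wrap buffer case is not modelled. 

```c
/* Sketch only: where the compressed data is assumed to be moved
   before decompression starts.  All names are hypothetical. */
struct placement {
    unsigned first;     /* address of the first compressed byte */
    unsigned last;      /* address of the last compressed byte  */
};

struct placement place_compressed(unsigned orig_start, unsigned orig_len,
                                  unsigned safety, unsigned comp_len)
{
    struct placement p;

    /* The last compressed byte ends up `safety` bytes above the last
       byte of the original file. */
    p.last = orig_start + orig_len - 1 + safety;
    p.first = p.last - comp_len + 1;
    return p;
}
```

For a file at $1000-$1fff with a 16-byte safety area the last compressed 
byte lands at $200f, as in the figure.  With a hypothetical compressed 
length of $500 bytes the first one would land at $1b10. 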

To sum things up, Pucrunch operates in several steps: 
 1. Find RLE and LZ77 data, pre-select RLE byte table 
 2. Graph search, i.e.  which primaries to use 
 3. Primaries/Literal bytes ratio decides how many escape bits to use 
 4. Escape optimization, which escape codes to use 
 5. Update RLE ranks and the RLE byte table 
 6. Determine the safety area size and output the file. 

 
-------------------------------------------------------------------------- 

 
String Match Speedup 
-------------------- 

To be able to select the most efficient combination of primaries we of 
course first need to find out what kind of primaries are available for 
selection.  If the file doesn't have repeated bytes, we can't use RLE.  If 
the file doesn't have repeating byte strings, we can't use LZ77.  This 
string matching is the most time-consuming operation in LZ77 compression 
simply because of the amount of the comparison operations needed.  Any 
improvement in the match algorithm can decrease the compression time 
considerably.  Pucrunch is living proof of that. 

The RLE search is straightforward and fast:  loop from the current position 
(P) forwards in the file counting each step until a different-valued byte 
is found or the end of the file is reached.  This count can then be used as 
the RLE byte count (if the graph search decides to use RLE).  The code can 
also be optimized to initialize the counts for all locations that belong to 
the run, because by definition every byte in the run has the same value. 

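 unsigned char *a = indata + P, val = *a++; 
 int top = inlen - P;   /* bytes left from P to the end of the file */ 
 int rlelen = 1; 
 int i; 

 /* Loop for the whole RLE */ 
 while(rlelen<top && *a++ == val) 
     rlelen++; 

 /* Initialize the counts for the locations inside the run */ 
 for(i=0;i<rlelen-1;i++) 
     rle[P+i] = rlelen-i; 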

With LZ77 we can't use the same technique as for RLE (i.e.  using the 
information about current match to skip subsequent file locations to speed 
up the search).  For LZ77 we need to find the longest possible, and 
-nearest- possible, string that matches the bytes starting from the current 
location.  The nearer the match, the fewer bits are needed to represent the 
offset from the current position. 

Naively, we could start comparing the strings starting from P-1 and P, 
remembering the length of the matching part and then doing the same at P-2 
and P, P-3 and P, ..  P-j and P (j is the maximum search offset).  The 
longest match and its location (offset from the current position) are then 
remembered and initialized.  If we find a match longer than or equal to the 
maximum length we can actually use, we can stop the search there.  (The 
code used to represent the length values may have an upper limit.) 

This may be the first implementation that comes to your (and my) mind, and 
might not seem so bad at first.  In reality, it is a very slow way to do 
the search: the -Brute Force- method.  It could take somewhere about (n^3) 
byte compares to process a file of the length n (a mathematically inclined 
person would probably give a better estimate).  However, using the already 
determined RLE value to our advantage permits us to rule out the worst-case 
projection, which happens when all bytes are the same value.  We only 
search LZ77 matches if the current file position has a shorter RLE sequence 
than the maximum LZ77 copy length. 
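
For reference, the brute-force search could look something like the 
following C sketch.  It is an illustration of the method just described, 
not the code Pucrunch uses, and the function name and parameters are 
assumptions. 

```c
/* Sketch only: brute-force search for the longest, nearest match for
   the string starting at P.  Returns the match length (0 if none) and
   stores the distance back from P in *offset. */
int brute_match(const unsigned char *data, int n, int P,
                int maxoff, int maxlen, int *offset)
{
    int best = 0, j;

    *offset = 0;
    for (j = 1; j <= maxoff && j <= P; j++) {   /* P-1, P-2, .. P-j */
        int len = 0;

        /* Count matching bytes; the match may overlap position P. */
        while (len < maxlen && P + len < n &&
               data[P - j + len] == data[P + len])
            len++;

        /* Only a strictly longer match replaces the current one, so
           the nearest of several equally long matches is kept. */
        if (len > best) {
            best = len;
            *offset = j;
            if (best >= maxlen)
                break;          /* cannot use anything longer */
        }
    }
    return best;
}
```

On the sentence "i just saw justin adjusting his sting" this finds the 
6-byte match at offset 9 for the second "justin" and the 5-byte match at 
offset 10 for the final "sting". 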

The first thing I did to improve the speed is to remember the position 
where each byte has last been seen.  A simple 256-entry table handles that. 
Using this table, the search can directly start from the first potential 
match, and we don't need to search for it byte-by-byte anymore.  The table 
is continually updated when we move toward the end of the file. 

That didn't give much of an improvement, but then I increased the table to 
256*256 entries, making it possible to locate the latest occurrence of any 
byte -pair- instead.  The table is indexed with the two byte values, and 
its contents directly give the position in the file where these two bytes 
were last seen.  Because the shortest possible string that would offer any 
compression (for my encoding of LZ77) is two bytes long, this byte-pair 
history is very suitable indeed.  Also, the first (shortest possible, i.e. 
2-byte) match is found directly from the byte-pair history.  This gave a 
moderate 30% decrease in compression time for one of my test files (from 28 
minutes to 17 minutes on a 25 MHz 68030). 

The second idea was to quickly discard the strings that had no chance of 
being longer matches than the one already found.  A one-byte hash value 
(sort of a checksum here, it is never used to index a hash table in this 
algorithm, but I rather use "hash value" than "checksum") is calculated 
from each three bytes of data.  The values are calculated once and put into 
a table, so we only need two memory fetches to know if two 3-byte strings 
are different.  If the hash values are different, at least one of the data 
bytes differs.  If the hash values are equal, we have to compare the 
original bytes.  The hash values of the strategic positions of the strings 
to compare are then ..  compared.  This strategic position is the location 
two bytes earlier than the longest match so far.  If the hash values 
differ, there is no chance that the match is longer than the current one. 
It may not even be as long, because one of the two earlier bytes may be 
different.  If the hash values are equal, the brute-force byte-by-byte 
compare has to be done.  However, the hash value check already discards a 
huge number of candidates and more than generously pays back its own memory 
references.  Using the hash values the compression time shortens by 50% 
(from 17 minutes to 8 minutes). 
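
The hash check could be sketched in C as below.  The actual hash function 
is not given in this article, so the simple byte sum used here is purely 
an assumption, as are the function names. 

```c
/* Sketch only: one-byte hash of each 3-byte string in the data.
   hash[] needs n-2 entries.  The byte sum is an assumed hash. */
void build_hash(const unsigned char *data, int n, unsigned char *hash)
{
    int i;

    for (i = 0; i + 2 < n; i++)
        hash[i] = (unsigned char)(data[i] + data[i + 1] + data[i + 2]);
}

/* Can the candidate at `cand` give a longer match for the string at
   `cur` than the `best` bytes found so far?  The strategic position
   is two bytes before the end of the current best match. */
int may_be_longer(const unsigned char *hash, int n,
                  int cur, int cand, int best)
{
    if (best < 2)
        return 1;               /* no information, must compare bytes */
    if (cur + best >= n)
        return 0;               /* no room left for a longer match */
    return hash[cur + best - 2] == hash[cand + best - 2];
}
```

A return value of 0 means the candidate can be skipped without touching 
the data bytes.  A return value of 1 only means that the byte-by-byte 
compare still has to be done. 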

Okay, the byte-pair table tells us where the latest occurrence of any byte 
pair is located.  Still, for the latest occurrence before -that- one we 
have to do a brute force search.  The next improvement was to use the 
byte-pair table to generate a linked list of the byte pairs with the same 
value.  In fact, this linked list can be trivially represented as a table, 
using the same indexing as the file positions.  To locate the previous 
occurrence of a 2-byte string starting at location P, look at backSkip[P]. 

 /* Update the two-byte history & backSkip */ 
 if(P+1<inlen) 
 { 
     int index = (indata[P]<<8) | indata[P+1]; 

     backSkip[P] = lastPair[index]; 
     lastPair[index] = P+1; 
 } 

Actually the values in the table are one bigger than the real table 
indices.  This is because the values are of type unsigned short (can only 
represent non-negative values), and I wanted zero to mean "not occurred". 

This table makes the search of the next (previous) location to consider 
much faster, because it is a single table reference.  The compression time 
was reduced from 6 minutes to 1 minute 10 seconds.  Quite an improvement 
from the original 28 minutes! 

     backSkip[]   lastPair[] 
 ___  _______  ____ 
    \/       \/    \ 
 ...JOVE.....JOKA..JOKER 
    ^        ^     ^ 
    |        |     | 
    next     |     position 
             current match (3) 
    C        B     A 

In this example we are looking at the string "JOKER" at location A.  Using 
the lastPair[] table (with the index "JO", the byte values at the current 
location A) we can jump directly to the latest match at B, which is "JO", 2 
bytes long.  The hash values for the string at B ("JOK") and at A ("JOK") 
are compared.  Because they are equal, we have a potential longer match (3 
bytes), and the strings "JOKE.." and "JOKA.." are compared.  A match of 
length 3 is found (the 4th byte differs).  The backSkip[] table with the 
index B gives the previous location where the 2-byte string "JO" can be 
found, i.e.  C.  The hash value for the strategic position of the string in 
the current position A ("OKE") is then compared to the hash value of the 
corresponding position in the next potential match starting at C ("OVE"). 
They don't match, so the string starting at C ("JOVE..") can't include a 
longer match than the current longest match at B. 
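
The way the two tables chain the candidates together can be tried out 
with the following C sketch.  It rebuilds the tables from the start of the 
data on every call, which the real compressor of course does not do, and 
the function name and its interface are hypothetical. 

```c
#include <stdlib.h>

/* Sketch only: list the earlier positions, nearest first, where the
   byte pair at P occurs.  Stores at most maxout positions in out[]
   and returns their count, or -1 if memory runs out.  Table values
   are position+1 so that zero can mean "not occurred". */
int pair_candidates(const unsigned char *indata, int inlen, int P,
                    int *out, int maxout)
{
    unsigned short *lastPair, *backSkip;
    unsigned short q;
    int i, count = 0;

    if (P < 0 || P + 1 >= inlen || inlen > 65535)
        return 0;
    lastPair = calloc(65536, sizeof *lastPair);
    backSkip = calloc((size_t)inlen, sizeof *backSkip);
    if (lastPair == NULL || backSkip == NULL) {
        free(lastPair);
        free(backSkip);
        return -1;
    }

    /* Update the two-byte history & backSkip for positions before P */
    for (i = 0; i < P; i++) {
        int index = (indata[i] << 8) | indata[i + 1];

        backSkip[i] = lastPair[index];
        lastPair[index] = (unsigned short)(i + 1);
    }

    /* Latest occurrence first, then follow the chain backwards. */
    q = lastPair[(indata[P] << 8) | indata[P + 1]];
    while (q != 0 && count < maxout) {
        out[count++] = q - 1;
        q = backSkip[q - 1];
    }

    free(lastPair);
    free(backSkip);
    return count;
}
```

For the string "JOVE.....JOKA..JOKER" and P pointing at "JOKER" 
(position 15), the candidates come out as 9 ("JOKA") and then 0 ("JOVE"), 
i.e.  B and C in the figure. 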

There is also another trick that takes advantage of the already determined 
RLE lengths.  If the RLE lengths for the positions to compare don't match, 
we can directly skip to the next potential match.  Note that the RLE bytes 
(the data bytes) are the same, and need not be compared, because the first 
byte (two bytes) are always equal on both positions (our backSkip[] table 
guarantees that).  The RLE length value can also be used to skip the start 
of the strings when comparing them. 

Another improvement to the search code made it dramatically faster than 
before on highly redundant files (such as pic from the Calgary Corpus 
Suite, which was the Achilles' heel until then).  Basically the new search 
method just skips over the RLE part (if any) in the search position and then 
checks if the located position has an equal number (and value) of RLE bytes 
before it. 

backSkip[]      lastPair[] 
     _____  ________ 
          \/        \ 
       ...AB.....A..ABCD    rle[p] # of A's, B is something else 
          ^      ^  ^ 
          |      |  | 
          i      p  p+rle[p]-1 

The algorithm searches for a two-byte string which starts at p + rle[p]-1, 
i.e.  the last rle byte ('A') and the non-matching one ('B').  When it 
finds such a location (a simple lastPair[] or backSkip[] lookup), it checks 
if the rle in the compare position (i-(rle[p]-1)) is long enough (i.e.  the 
same number of A's before the B in both places).  If it is, the normal 
hash value check is performed on the strings and if it succeeds, the 
brute-force byte-compare is done. 

The rationale behind this idea is quite simple.  There are far fewer 
matches for "AB" than for "AA", so we get a huge speedup with this 
approach.  We are still guaranteed to find the most recent longest match 
there is. 

Note that a compression method similar to RLE can be realized using just 
LZ77.  You just emit the first byte as a literal byte, and output a LZ77 
code with offset 1 and the original RLE length minus 1.  You can thus 
consider RLE as a special case, which offers tighter encoding of the 
necessary information.  Also, as my LZ77 limits the copy size to 64/128/256 
bytes, a RLE version providing lengths up to 32 kilobytes is a big 
improvement, even if the code for it is somewhat longer. 
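
That an LZ77 copy with offset 1 behaves like RLE is easy to see from a 
plain byte-by-byte copy loop.  The following C sketch shows the idea.  The 
function name is made up, and no bounds checking is done. 

```c
/* Sketch only: copy `len` bytes from `offset` bytes back in the
   output buffer to the current output position `pos`.  The copy goes
   forward one byte at a time, so the source and destination areas
   are allowed to overlap. */
void lz_copy(unsigned char *out, int pos, int offset, int len)
{
    int i;

    for (i = 0; i < len; i++)
        out[pos + i] = out[pos + i - offset];
}
```

If out[0] holds a literal 'A', then lz_copy(out, 1, 1, 4) produces 
"AAAAA": each copied byte immediately becomes the source of the next one. 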

 
-------------------------------------------------------------------------- 

 
The Decompression Routine 
------------------------- 

Any lossless compression program is totally useless unless there exists a 
decompression program which takes in the compressed file and -- using only 
that information -- generates the original file exactly.  In this case the 
decompression program must run on the C64's 6510 microprocessor, which had its 
impact on the algorithm development also.  Regardless of the algorithm, 
there are several requirements that the decompression code must satisfy: 
 1. Correctness - the decompression must behave accurately to 
    guarantee lossless decompression 
 2. Memory usage - the less memory is used the better 
 3. Speed - fast decompression is preferred to a slower one 

The latter two requirements can be and are complementary.  A somewhat 
faster decompression for the same algorithm is possible if more memory can 
be used (although in this case the difference is quite small).  In any case 
the correctness of the result is the most important thing. 

A short pseudo-code of the decompression algorithm follows before I go to 
the actual C64 decompression code. 

     copy the decompression code to low memory 
     copy the compressed data forward in memory so that it isn't 
       overwritten before we have read it (setup safety & wrap buffers) 
     setup source and destination pointers 
     initialize RLE byte code table, the number of escape bits etc. 
     set initial escape code 
     do forever 
         get the number of escape bits "bits" 
         if "bits" do not match with the escape code 
             read more bits to complete a byte and output it 
         else 
             get Elias Gamma Code "value" and add 1 to it 
             if "value" is 2 
                 get 1 bit 
                 if bit is 0 
                     it is 2-byte LZ77 
                     get 8 bits for "offset" 
                     copy 2 bytes from "offset" bytes before current 
                         output position into current output position 
                 else 
                     get 1 bit 
                     if bit is 0 
                         it is an escaped literal byte 
                         get new escape code 
                         get more bits to complete a byte with the 
                             current escape code and output it 
                         use the new escape code 
                     else 
                         it is RLE 
                         get Elias Gamma Code "length" 
                         if "length" larger or equal than half the maximum 
                             it is long RLE 
                             get more bits to complete a byte "lo" 
                             get Elias Gamma Code "hi", subtract 1 
                             combine "lo" and "hi" into "length" 
                         endif 
                         get Elias Gamma Code "index" 
                         if "index" is larger than 31 
                             get 3 more bits to complete "byte" 
                         else 
                             get "byte" from RLE byte code table from 
                                 index "index" 
                         endif 
                         copy "byte" to the output "length" times 
                     endif 
                 endif 
             else 
                 it is LZ77 
                 get Elias Gamma Code "hi" and subtract 1 from it 
                 if "hi" is the maximum value - 1 
                     end decompression and start program 
                 endif 
                 get 8..12 bits "lo" (depending on settings) 
                 combine "hi" and "lo" into "offset" 
                 copy "value" number of bytes from "offset" bytes before 
                     current output position into current output position 
             endif 
         endif 
     end do 

The following routine is the pucrunch decompression code.  The code runs on 
the C64 or C128's C64-mode and a modified version is used for Vic20 and 
C16/Plus4.  It can be compiled by at least DASM V2.12.04.  Note that the 
compressor automatically attaches this code to the packet and sets the 
different parameters to match the compressed data.  I will insert 
additional commentary between strategic code points in addition to the 
comments that are already in the code. 

Note that at this point it is only possible to make the decompression code 
shorter by removing features.  At least I think that it is now so.  If I'm 
wrong, feel free to point it out to me.  Tim Rogers <timr@eurodltd.co.uk> 
did manage to snip off 2 bytes, thanks!  However, there are some features 
you may consider unnecessary.  The code can be shortened by: 
  * No basic end address set: 8 bytes 
  * No 2 MHz mode set/reset: 6 bytes 
  * No wrap option: 12 bytes 

Actually, if the wrap option is not used, the compressor automatically 
selects the shorter decompression code (only for the C64 version). 

        processor 6502 

BASEND  EQU $2d         ; start of basic variables (updated at EOF) 
LZPOS   EQU $2d         ; temporary, BASEND *MUST* *BE* *UPDATED* at EOF 

bitstr  EQU $f7         ; Hint the value beforehand 
xstore  EQU $c3         ; tape load temp 

WRAPBUF EQU $004b       ; 'wrap' buffer, 22 bytes ($02a7 for 89 bytes) 

        ORG $0801 
        DC.B $0b,8,$ef,0        ; '239 SYS2061' 
        DC.B $9e,$32,$30,$36 
        DC.B $31,0,0,0 

        sei             ; disable interrupts 
        inc $d030       ; or "bit $d030" if 2MHz mode is not enabled 
        inc 1           ; Select ALL-RAM configuration 

        ldx #0          ;** parameter - # of overlap bytes-1 off $ffff 
overlap lda $aaaa,x     ;** parameter start of off-end bytes 
        sta WRAPBUF,x   ; Copy to wrap/safety buffer 
        dex 
        bpl overlap 

        ldx #block200_end-block200+1    ; $54   ($59 max) 
packlp  lda block200-1,x 
        sta block200_-1,x 
        dex 
        bne packlp 

        ldx #block_stack_end-block_stack+1      ; $b3   (stack! ~$e8 max) 
packlp2 lda block_stack-1,x 
        dc.b $9d                ; sta $nnnn,x 
        dc.w block_stack_-1     ; (ZP addressing only addresses ZP!) 
        dex 
        bne packlp2 

        ldy #$aa        ;** parameter SIZE high + 1 (max 255 extra bytes) 
cploop  dex             ; ldx #$ff on the first round 
        lda $aaaa,x     ;** parameter DATAEND-0x100 
        sta $ff00,x     ;** parameter ORIG LEN-0x100+ reserved bytes 
        txa             ;cpx #0 
        bne cploop 
        dec cploop+6 
        dec cploop+3 
        dey 
        bne cploop 
        jmp main 

The first part of the code contains a sys command for the basic 
interpreter, two loops that copy the decompression code to zeropage/stack 
($f7-$1aa) and to the system input buffer ($200-$253).  The latter code 
segment contains byte, bit and Gamma Code input routines and the RLE byte 
code table, the former code segment contains the rest. 

This code also copies the compressed data forward in memory so that it 
won't be overwritten by the decompressed data before we have had a chance 
to read it.  The decompression starts at the beginning and proceeds upwards 
in both the compressed and decompressed data.  A safety area is calculated 
by the compression routine.  It finds out how many bytes we need for 
temporary data expansion, i.e.  for escaped bytes.  The wrap buffer is used 
for files that extend up to the end of memory, and would otherwise overwrite 
the compressed data with decompressed data before it has been read. 

This code fragment is not used during the decompression itself.  In fact 
the code will normally be overwritten when the actual decompression starts. 

The very start of the next code block is located inside the zero page and 
the rest fills the lowest portion of the microprocessor stack.  The zero 
page is used to make the references to different variables shorter and 
faster.  Also, the variables don't take extra code to initialize, because 
they are copied with the same copy loop as the rest of the code. 

 
block_stack 
#rorg $f7       ; $f7 - ~$1e0 
block_stack_ 

bitstr  dc.b $80        ; ZP    $80 == Empty 
esc     dc.b $00        ; ** parameter (saves a byte when here) 

OUTPOS = *+1            ; ZP 
putch   sta $aaaa       ; ** parameter 
        inc OUTPOS      ; ZP 
        bne 0$          ; Note: beq 0$; rts; 0$: inc OUTPOS+1;rts would be 
; $0100                 ;       faster, but 1 byte longer 
        inc OUTPOS+1    ; ZP 
0$      rts 

putch is the subroutine that is used to output the decompressed bytes.  In 
this case the bytes are written to memory.  Because the subroutine call 
itself takes 12 cycles (6 for jsr and another 6 for rts), and the routine 
is called a lot of times during the decompression, the routine itself 
should be as fast as possible.  This is achieved by removing the need to 
save any registers.  This is done by using an absolute addressing mode 
instead of indirect indexed or absolute indexed addressing (sta $aaaa 
instead of sta ($zz),y or sta $aa00,y).  With indexed addressing you would 
need to save+clear+restore the index register value in the routine. 

Further improvement in code size and execution speed is done by storing the 
instruction that does the absolute addressing to zero page.  When the 
memory address is incremented we can use zero-page addressing for it too. 
On the other hand, most of the time is spent in the bit input routine, so 
further optimization of this routine would gain little. 

 
newesc  ldy esc         ; remember the old code (top bits for escaped byte) 
        ldx #2          ; ** PARAMETER 
        jsr getchkf     ; get & save the new escape code 
        sta esc 
        tya             ; pre-set the bits 
        ; Fall through and get the rest of the bits. 
noesc   ldx #6          ; ** PARAMETER 
        jsr getchkf 
        jsr putch       ; output the escaped/normal byte 
        ; Fall through and check the escape bits again 
main    ldy #0          ; Reset to a defined state 
        tya             ; A = 0 
        ldx #2          ; ** PARAMETER 
        jsr getchkf     ; X=2 -> X=0 
        cmp esc 
        bne noesc       ; Not the escape code -> get the rest of the byte 
        ; Fall through to packed code 

The decompression code is first entered in main.  It first clears the 
accumulator and the Y register and then gets the escape bits (if any are 
used) from the input stream.  If they don't match with the current escape 
code, we get more bits to complete a byte and then output the result.  If 
the escape bits match, we have to do further checks to see what to do. 
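
In C the same check could be sketched roughly like this.  It is an 
illustration only: the bit reader, its MSB-first bit order and all the 
names are assumptions, and there is no end-of-data checking. 

```c
/* Sketch only: a bit reader over a byte array (MSB first, assumed). */
struct bitreader {
    const unsigned char *data;
    int bytepos;        /* index of the current byte           */
    int bitpos;         /* 0..7, 0 is the most significant bit */
};

static unsigned get_bits(struct bitreader *r, int count)
{
    unsigned v = 0;

    while (count-- > 0) {
        unsigned bit = (r->data[r->bytepos] >> (7 - r->bitpos)) & 1u;

        v = (v << 1) | bit;
        if (++r->bitpos == 8) {
            r->bitpos = 0;
            r->bytepos++;
        }
    }
    return v;
}

/* Read the escape bits.  If they differ from the current escape code
   `esc`, complete the byte and return it (0..255).  If they match,
   return -1: a packed code follows and must be decoded further. */
int read_literal(struct bitreader *r, int escbits, unsigned esc)
{
    unsigned top = get_bits(r, escbits);

    if (top == esc)
        return -1;
    return (int)((top << (8 - escbits)) | get_bits(r, 8 - escbits));
}
```

With two escape bits and escape code 0, the input byte $41 comes out 
unchanged as a literal, while any input starting with the bits 00 is 
reported as a packed code. 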

        jsr getval      ; X = 0 
        sta xstore      ; save the length for a later time 
        cmp #1          ; LEN == 2 ? 
        bne lz77        ; LEN != 2      -> LZ77 
        tya             ; A = 0 
        jsr get1bit     ; X = 0 
        lsr             ; bit -> C, A = 0 
        bcc lz77_2      ; A=0 -> LZPOS+1
        ;***FALL THRU*** 

We first get the Elias Gamma Code value (or actually my independently 
developed version).  If it says the LZ77 match length is greater than 2, it
means an LZ77 code and we jump to the proper routine.  Remember that the
lengths are decremented before encoding, so the code value 1 means the 
length is 2.  If the length is two, we get a bit to decide if we have LZ77 
or something else.  We have to clear the accumulator, because get1bit does 
not do that automatically. 

If the bit we got (shifted to carry to clear the accumulator) was zero, it 
is LZ77 with an 8-bit offset.  If the bit was one, we get another bit which 
decides between RLE and an escaped byte.  A zero-bit means an escaped byte 
and the routine that is called also changes the escape bits to a new value. 
A one-bit means either a short or long RLE. 
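
The decision tree described so far can be summarised with a small Python sketch.  It is a hypothetical model, not code from pucrunch: classify and bit_source are invented names, and len_code stands for the gamma value returned by getval.

```python
def bit_source(bits):
    """Return a function that hands out the bits of a '0'/'1' string."""
    it = iter(bits)
    return lambda: int(next(it))

def classify(len_code, next_bit):
    """Name the code that follows a matched escape sequence."""
    if len_code != 1:            # gamma value > 1: match length > 2
        return "lz77"
    if next_bit() == 0:          # e..e00
        return "lz77 (length 2, 8-bit offset)"
    if next_bit() == 0:          # e..e010
        return "escaped byte"
    return "rle"                 # e..e011

print(classify(5, bit_source("")))
print(classify(1, bit_source("0")))
print(classify(1, bit_source("10")))
print(classify(1, bit_source("11")))
```

Note that the extra selector bits are only read when the gamma value is 1, so ordinary LZ77 matches pay nothing for the RLE and escape cases.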

        ; e..e01 
        jsr get1bit     ; X = 0 
        lsr             ; bit -> C, A = 0 
        bcc newesc      ; e..e010 
        ;***FALL THRU*** 

        ; e..e011 
srle    iny             ; Y is 1 bigger than MSB loops 
        jsr getval      ; Y is 1, get len, X = 0 
        sta xstore      ; Save length LSB 
        cmp #64         ; ** PARAMETER 63-64 -> C clear, 64-64 -> C set.. 
        bcc chrcode     ; short RLE, get bytecode 

longrle ldx #2          ; ** PARAMETER  111111xxxxxx 
        jsr getbits     ; get 3/2/1 more bits to get a full byte, X = 0 
        sta xstore      ; Save length LSB 

        jsr getval      ; length MSB, X = 0 
        tay             ; Y is 1 bigger than MSB loops 

The short RLE only uses half (or actually 1 value less than a half) of the 
gamma code range.  Larger values switch us into long RLE mode.  Because
there are several values, we already know some bits of the length value. 
Depending on the gamma code maximum value we need to get from one to three 
bits more to assemble a full byte, which is then used as the less 
significant part for the run length count.  The upper part is encoded using 
the same gamma code we are using everywhere.  This limits the run length to 
16 kilobytes for the smallest maximum value (-m5) and to the full 64 
kilobytes for the largest value (-m7). 

Additional compression for RLE is gained using a table for the 31 
top-ranking RLE bytes.  We get an index from the input.  If it is from 1 to 
31, we use it to index the table.  If the value is larger, the lower 5 bits
of the value give us the 5 most significant bits of the byte to repeat.
In this case we read 3 additional bits to complete the byte. 
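
A minimal Python sketch of this lookup follows.  It is only a model of the idea: decode_rle_byte and bit_source are hypothetical names, the table contents are invented, and index stands for the gamma value that getval would return.

```python
def bit_source(bits):
    """Return a function that hands out the bits of a '0'/'1' string."""
    it = iter(bits)
    return lambda: int(next(it))

def decode_rle_byte(index, table, next_bit):
    """Map a gamma-coded index to the byte that is to be repeated."""
    if 1 <= index <= 31:
        return table[index - 1]          # one of the 31 top-ranking bytes
    value = index                        # 32..63: %1xxxxx, five useful bits
    for _ in range(3):                   # three more bits complete the byte
        value = (value << 1) | next_bit()
    return value & 0xFF                  # the leading 1 drops out of 8 bits

table = list(range(100, 131))            # made-up ranking table, 31 entries
print(decode_rle_byte(1, table, bit_source("")))
print(hex(decode_rle_byte(0b110101, table, bit_source("011"))))
```

In the second call the index %110101 contributes the bits 10101 and the stream contributes 011, giving the byte $AB.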

 
chrcode jsr getval      ; Byte Code, X = 0 
        tax             ; this is executed most of the time anyway 
        lda table-1,x   ; Saves one jump if done here (loses one txa) 

        cpx #32         ; 31-32 -> C clear, 32-32 -> C set.. 
        bcc 1$          ; 1..31 -> the byte to repeat is in A 

        ; Not ranks 1..31, -> 111110xxxxx (32..64), get byte.. 
        txa             ; get back the value (5 valid bits) 
        jsr get3bit     ; get 3 more bits to get a full byte, X = 0 

1$      ldx xstore      ; get length LSB 
        inx             ; adjust for cpx#$ff;bne -> bne 
dorle   jsr putch 
        dex 
        bne dorle       ; xstore 0..255 -> 1..256 
        dey
        bne dorle       ; Y was 1 bigger than wanted originally 
mainbeq beq main        ; reverse condition -> jump always 

After deciding the repeat count and decoding the value to repeat we simply 
have to output the value enough times.  The X register holds the lower part 
and the Y register holds the upper part of the count.  The X register value 
is first incremented by one to change the code sequence dex ; cpx #$ff ; 
bne dorle into simply dex ; bne dorle.  This may seem strange, but it saves 
one byte in the decompression code and two clock cycles for each byte that
is output.  It's almost a ten percent improvement.  :-)
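
To see how the pre-incremented X and the one-too-big Y work together, the loop can be simulated with 8-bit registers.  The Python below is an illustrative model with an invented function name; it only counts how many times putch would be called.

```python
def dorle_iterations(xstore, y):
    """Count the putch calls made by the dorle loop (8-bit X and Y)."""
    x = (xstore + 1) & 0xFF      # inx (255 wraps to 0, meaning 256 passes)
    calls = 0
    while True:
        calls += 1               # jsr putch
        x = (x - 1) & 0xFF       # dex
        if x != 0:               # bne dorle
            continue
        y = (y - 1) & 0xFF       # dey
        if y == 0:               # falls through to "beq main"
            return calls

for xstore, y in [(0, 1), (255, 1), (0, 2), (255, 3)]:
    print(xstore, y, dorle_iterations(xstore, y))
```

For every combination with Y at least 1 the count comes out as xstore + 1 + 256 * (Y - 1), so a stored length of 0..255 with Y = 1 produces 1..256 bytes.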

The next code fragment is the LZ77 decode routine and it is used in the 
file parts that do not have equal byte runs (and even in some that have). 
The routine simply gets an offset value and copies a sequence of bytes from 
the already decompressed portion to the current output position. 

 
lz77    jsr getval      ; X=0 -> X=0 
        cmp #127        ; ** PARAMETER  Clears carry (is maximum value) 
        beq eof         ; EOF 

        sbc #0          ; C is clear -> subtract 1  (1..126 -> 0..125) 
        ldx #0          ; ** PARAMETER (more bits to get) 
        jsr getchkf     ; clears Carry, X=0 -> X=0 

lz77_2  sta LZPOS+1     ; offset MSB
        ldx #8 
        jsr getbits     ; clears Carry, X=8 -> X=0 
                        ; Note: Already eor:ed in the compressor.. 
        ;eor #255       ; offset LSB 2's complement -1 (i.e. -X = ~X+1) 
        adc OUTPOS      ; -offset -1 + curpos (C is clear) 
        sta LZPOS 

        lda OUTPOS+1 
        sbc LZPOS+1     ; takes C into account 
        sta LZPOS+1     ; copy X+1 number of chars from LZPOS to OUTPOS 
        ;ldy #0         ; Y was 0 originally, we don't change it 

        ldx xstore      ; LZLEN 
        inx             ; adjust for cpx#$ff;bne -> bne 
lzloop  lda (LZPOS),y 
        jsr putch 
        iny             ; Y does not wrap because X=0..255 and Y initially 0 
        dex 
        bne lzloop      ; X loops, (256,1..255) 
        beq mainbeq     ; jump through another beq (-1 byte, +3 cycles) 

There are two entry-points to the LZ77 decode routine.  The first one 
(lz77) is for copy lengths bigger than 2.  The second entry point (lz77_2)
is for the length of 2 (8-bit offset value). 
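
The address arithmetic in the routine is compact enough to be worth checking by hand.  The following Python sketch is a model under stated assumptions, with invented function names: it assumes, as the comments in the listing say, that the compressor stores the offset low byte already inverted, and it replays the adc/sbc pair with explicit carries.  The second function shows the byte-by-byte copy, which is what lets a match overlap the bytes it is producing.

```python
def lz_source(outpos, offset_hi, stored_lo):
    """Replay 'adc OUTPOS' / 'sbc LZPOS+1' with 8-bit values and carry."""
    lo_sum = stored_lo + (outpos & 0xFF)           # adc OUTPOS, C clear
    carry = lo_sum >> 8
    lzpos_lo = lo_sum & 0xFF
    lzpos_hi = ((outpos >> 8) - offset_hi - (1 - carry)) & 0xFF   # sbc
    return (lzpos_hi << 8) | lzpos_lo

def lz_copy(out, offset, length):
    """Append `length` bytes starting `offset` + 1 bytes back in `out`."""
    src = len(out) - offset - 1
    for i in range(length):
        out.append(out[src + i])                   # may read bytes just written
    return out

offset = 0x0105
stored_lo = (offset & 0xFF) ^ 0xFF                 # inverted by the compressor
print(hex(lz_source(0x1234, offset >> 8, stored_lo)))
print(lz_copy(bytearray(b"ab"), 1, 4))
```

The first line prints 0x112e, which is OUTPOS - offset - 1, and the copy turns "ab" into "ababab" even though only two bytes existed when it started.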

 
        ; EOF 
eof     lda #$37        ; ** could be a PARAMETER 
        sta 1 
        dec $d030       ; or "bit $d030" if 2MHz mode is not enabled 
        lda OUTPOS      ; Set the basic prg end address 
        sta BASEND 
        lda OUTPOS+1 
        sta BASEND+1 
        cli             ; ** could be a PARAMETER 
        jmp $aaaa       ; ** PARAMETER 
#rend 
block_stack_end

Some kind of end-of-file marker is necessary for all variable-length
codes.  Otherwise we could not be certain when to stop decoding.  Sometimes
the byte count of the original file is used instead, but here a special EOF
condition is more convenient.  If the high part of an LZ77 offset is the
maximum gamma code value, we have reached the end of file and must stop
decoding.  The end of file code turns on BASIC and KERNAL, turns off 2 MHz
mode (for C128) and updates the BASIC end address before allowing
interrupts and jumping to the program start address.

The next code fragment is put into the system input buffer.  The routines 
are for getting bits from the encoded message (getbits) and decoding the 
Elias Gamma Code (getval).  The table at the end contains the ranked RLE 
bytes.  The compressor automatically decreases the table size if not all of 
the values are used. 

 
block200 
#rorg   $200    ; $200-$258 
block200_

getnew  pha             ; 1 Byte/3 cycles 
INPOS = *+1 
        lda $aaaa       ;** parameter 
        rol             ; Shift in C=1 (last bit marker) 
        sta bitstr      ; bitstr initial value = $80 == empty 
        inc INPOS       ; Does not change C! 
        bne 0$ 
        inc INPOS+1     ; Does not change C! 
        bne 0$ 
        ; This code does not change C! 
        lda #WRAPBUF    ; Wrap from $ffff->$0000 -> WRAPBUF 
        sta INPOS 
0$      pla             ; 1 Byte/4 cycles 
        rts 

 
; getval : Gets a 'static huffman coded' value 
; ** Scratches X, returns the value in A ** 
getval  inx             ; X must be 0 when called! 
        txa             ; set the top bit (value is 1..255) 
0$      asl bitstr 
        bne 1$ 
        jsr getnew 
1$      bcc getchk      ; got 0-bit 
        inx 
        cpx #7          ; ** parameter 
        bne 0$ 
        beq getchk      ; inverse condition -> jump always 

; getbits: Gets X bits from the stream 
; ** Scratches X, returns the value in A ** 
get1bit inx             ;2 
getbits asl bitstr 
        bne 1$ 
        jsr getnew 
1$      rol             ;2 
getchk  dex             ;2              more bits to get ? 
getchkf bne getbits     ;2/3 
        clc             ;2              return carry cleared 
        rts             ;6+6 

 
table   dc.b 0,0,0,0,0,0,0 
        dc.b 0,0,0,0,0,0,0,0 
        dc.b 0,0,0,0,0,0,0,0 
        dc.b 0,0,0,0,0,0,0,0 

#rend 
block200_end
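
The bit buffer with its marker bit, and the gamma decoding built on top of it, can be modelled in Python as follows.  This is a hedged sketch rather than a translation of pucrunch: the class and function names are invented, input wrap-around is left out, and max_gamma=7 simply mirrors the cpx #7 parameter in getval.

```python
class BitReader:
    """Model of bitstr: data bits followed by a single 1 as end marker."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.bitstr = 0x80                       # marker only: buffer is empty

    def next_bit(self):
        carry = self.bitstr >> 7                 # asl bitstr: top bit -> C
        self.bitstr = (self.bitstr << 1) & 0xFF
        if self.bitstr == 0:                     # that bit was the marker
            byte = self.data[self.pos]           # getnew: fetch the next byte
            self.pos += 1
            self.bitstr = ((byte << 1) | carry) & 0xFF   # rol: marker enters
            carry = byte >> 7                    # first data bit comes out
        return carry

def get_bits(reader, count, value=0):
    """Shift `count` bits into `value`, like repeated 'rol' in getbits."""
    for _ in range(count):
        value = (value << 1) | reader.next_bit()
    return value

def get_val(reader, max_gamma=7):
    """Decode one gamma value: n one-bits, a zero, then n more bits."""
    ones = 0
    while ones < max_gamma - 1 and reader.next_bit() == 1:
        ones += 1                                # no terminator at the maximum
    return get_bits(reader, ones, 1)             # implicit top bit is 1

r = BitReader(bytes([0b01001011, 0b10000000]))
print([get_val(r) for _ in range(4)])
```

The two input bytes hold the codes 0, 100, 101 and 11000, so the sketch prints [1, 2, 3, 4].  The last code straddles the byte boundary, which exercises the refill path.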

 
-------------------------------------------------------------------------- 

 
Target Application Compression Tests 
------------------------------------ 

The following data compression tests are made on my four C64 test files: 
   bs.bin is a demo part, about 50% code and 50% graphics data 
   delenn.bin is a BFLI picture with a viewer, a lot of dithering 
   sheridan.bin is a BFLI picture with a viewer, dithering, black areas 
   ivanova.bin is a BFLI picture with a viewer, dithering, larger black areas 

   Packer                 Size     Left    Comment 
   =============================================== 
   bs.bin                 41537 
   ----------------------------------------------- 
   ByteBonker 1.5         27326   65.8%    Mode 4 
   Cruelcrunch 2.2        27136   65.3%    Mode 1 
   The AB Cruncher        27020   65.1% 
   ByteBoiler (REU)       26745   64.4% 
   RLE + ByteBoiler (REU) 26654   64.2% 
   PuCrunch               26415   63.6%    -m5 
   =============================================== 
   delenn.bin             47105 
   ----------------------------------------------- 
   The AB Cruncher        N/A       N/A    Crashes 
   ByteBonker 1.5         21029   44.6%    Mode 3 
   Cruelcrunch 2.2        20672   43.9%    Mode 1 
   ByteBoiler (REU)       20371   43.2% 
   RLE + ByteBoiler (REU) 19838   42.1% 
   PuCrunch               19734   41.9%    -p2 
   =============================================== 
   sheridan.bin           47105 
   ----------------------------------------------- 
   ByteBonker 1.5         13661   29.0%    Mode 3 
   Cruelcrunch 2.2        13595   28.9%    Mode H 
   The AB Cruncher        13534   28.7% 
   ByteBoiler (REU)       13308   28.3% 
   PuCrunch               12526   26.6%    -p2 
   RLE + ByteBoiler (REU) 12478   26.5% 
   =============================================== 
   ivanova.bin            47105 
   ----------------------------------------------- 
   ByteBonker 1.5         11016   23.4%    Mode 1 
   Cruelcrunch 2.2        10883   23.1%    Mode H 
   The AB Cruncher        10743   22.8% 
   ByteBoiler (REU)       10550   22.4% 
   PuCrunch                9844   20.9%    -p2 
   RLE + ByteBoiler (REU)  9813   20.8% 
   LhA                     9543   20.3%    Decompressor not included 
   gzip -9                 9474   20.1%    Decompressor not included 
   ----------------------------------------------- 

 
-------------------------------------------------------------------------- 

 
Calgary Corpus Suite 
-------------------- 

The original compressor only allows files up to 63 kB.  To be able to
compare my algorithm to others I modified the compressor to allow bigger 
files.  I then got some reference results using the Calgary Corpus test 
suite. 

Note that the decompression code is included in the compressed files, 
although it is not valid for files over 63k (compressed or uncompressed 
size).  About 34 bytes are decompression parameters, the rest (approx.  300 
bytes) is 6510 machine language.  Kolmogorov complexity, anyone ?:-) 

To tell you the truth, the results surprised me, because the compression 
algorithm -IS- developed with a very special case in mind.  It only has a
fixed code for LZ77/RLE lengths, not even a static one (fixed != static != 
adaptive)!  Also, it does not use arithmetic code (or Huffman) to compress 
the literal bytes.  Because most of the big files are ASCII text, this 
somewhat handicaps my compressor, although the new tagging system is very 
happy with 7-bit ASCII input.  Also, decompression is relatively fast, and 
uses no extra memory. 

I'm getting relatively near LhA, and shorter than LhA for 8 files (300-byte 
decompressor included!), and relatively near or shorter than LhA in other 
cases if the decompressor is removed. 

The table contains the file name (file), compression options (options), the 
original file size (in) and the compressed file size (out) in bytes, 
average number of bits used to encode one byte (b/B), remaining size 
(ratio) and the reduction (gained), and the time used for compression.  For 
comparison, the last three columns show the compressed sizes for LhA, Zip 
and GZip (with the -9 option), respectively. 
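
The derived columns follow directly from the two sizes.  As a small worked example (the function name is made up, and the figures are taken from the bib row of the table below), the arithmetic in Python is:

```python
def table_columns(size_in, size_out):
    """Return (b/B, ratio %, gained %) for one row of the results table."""
    bits_per_byte = 8.0 * size_out / size_in     # average bits per input byte
    ratio = 100.0 * size_out / size_in           # remaining size in percent
    gained = 100.0 - ratio                       # reduction in percent
    return round(bits_per_byte, 2), round(ratio, 2), round(gained, 2)

# bib: 111261 bytes in, 35457 bytes out
print(table_columns(111261, 35457))
```

This reproduces the 2.55 b/B, 31.87% and 68.13% shown for bib.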

FreeBSD epsilon3.vlsi.fi PentiumPro® 200MHz 
Estimated decompression on a C64 (1MHz 6510) 6:47       LhA    Zip GZip-9 
file   options    in     out  b/B  ratio gained time    out    out    out 
========================================================================= 
bib    -p4    111261   35457 2.55 31.87% 68.13%  8.3  40740  35041  34900 
book1  -p4    768771  318919 3.32 41.49% 58.51% 65.1 339074 313352 312281 
book2  -p4    610856  208627 2.74 34.16% 65.84% 43.5 228442 206663 206158 
geo    -p2    102400   72812 5.69 71.11% 28.89% 11.4  68574  68471  68414 
news   -p3    377109  144566 3.07 38.34% 61.66% 15.2 155084 144817 144400 
obj1   -m6     21504   10750 4.00 50.00% 50.00%  0.1  10310  10300  10320 
obj2          246814   83046 2.70 33.65% 66.35% 13.5  84981  81608  81087 
paper1 -p2     53161   19536 2.94 36.75% 63.25%  1.5  19676  18552  18543 
paper2 -p3     82199   30676 2.99 37.32% 62.68%  4.3  32096  29728  29667 
paper3 -p2     46526   19234 3.31 41.35% 58.65%  1.4  18949  18072  18074 
paper4 -p1 -m5 13286    6095 3.68 45.88% 54.12%  0.2   5558   5511   5534 
paper5 -p1 -m5 11954    5494 3.68 45.96% 54.04%  0.1   4990   4970   4995 
paper6 -p2     38105   14159 2.98 37.16% 62.84%  0.8  13814  13207  13213 
pic    -p1    513216   57835 0.91 11.27% 88.73% 23.2  52221  56420  52381 
progc  -p1     39611   14221 2.88 35.91% 64.09%  0.7  13941  13251  13261 
progl  -p1     71646   17038 1.91 23.79% 76.21%  3.8  16914  16249  16164 
progp          49379   11820 1.92 23.94% 76.06%  1.3  11507  11222  11186 
trans  -p2     93695   19511 1.67 20.83% 79.17%  3.7  22578  18961  18862 
------------------------------------------------------------------------- 
total        3251493 1089796 2.68 33.52% 66.48% 3:18 

 
Canterbury Corpus Suite 
----------------------- 

The following shows the results on the Canterbury corpus.  Again, I am 
quite pleased with the results.  For example, pucrunch beats GZip -9 for 
lcet10.txt if you remove the decompression code. 

FreeBSD epsilon3.vlsi.fi PentiumPro® 200MHz 
Estimated decompression on a C64 (1MHz 6510) 6:00        LhA    Zip GZip-9 
file        opt     in    out  b/B  ratio gained time    out    out    out 
========================================================================== 
alice29.txt -p4 152089  55103 2.90 36.24% 63.76% 11.3  59160  54525  54191 
ptt5        -p1 513216  57835 0.91 11.27% 88.73% 23.2  52272  56526  52382 
fields.c         11150   3505 2.52 31.44% 68.56%  0.1   3180   3230   3136 
kennedy.xls    1029744 265887 2.07 25.83% 74.17%  571 198354 206869 209733 
sum              38240  13334 2.79 34.87% 65.13%  0.6  14016  13006  12772 
lcet10.txt  -p4 426754 144585 2.72 33.89% 66.11% 30.8 159689 144974 144429 
plrabn12.txt-p4 481861 199134 3.31 41.33% 58.67% 43.6 210132 195299 194277 
cp.html     -p1  24603   8679 2.83 35.28% 64.72%  0.4   8402   8085   7981 
grammar.lsp -m5   3721   1591 3.43 42.76% 57.24%  0.0   1280   1336   1246 
xargs.1     -m5   4227   2117 4.01 50.09% 49.91%  0.0   1790   1842   1756 
asyoulik.txt-p4 125179  50594 3.24 40.42% 59.58%  7.5  52377  49042  48829 
-------------------------------------------------------------------------- 
total          2810784 802364 2.28 28.55% 71.45% 11:29 

 
-------------------------------------------------------------------------- 

 
Conclusions 
----------- 

In this article I have presented a compression program which creates 
compressed executable files for C64, VIC20 and Plus4/C16.  The compression 
can be performed on an Amiga, an MS-DOS/Windows machine or any other machine
with a C compiler.  A powerful machine allows asymmetric compression:  a lot of
resources can be used to compress the data while needing minimal resources 
for decompression. This was one of the design requirements. 

Two original ideas were presented:  a new literal byte tagging system and 
an algorithm using hybrid RLE and LZ77.  Also, a detailed explanation of 
the LZ77 string match routine and the optimal parsing scheme was presented.

The compression ratio and decompression speed are comparable to other
compression programs for Commodore 8-bit computers. 

But what then are the real advantages of pucrunch compared to traditional
C64 compression programs, apart from the fact that you can now compress VIC20
and Plus4/C16 programs?  Because I'm lousy at praising my own work, I'll let you
see some actual user comments.  I have edited the correspondence a little, 
but I hope he doesn't mind.  My comments are marked with an asterisk. 
Maybe Steve has something to add also? 

 
---8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--- 

A big advantage is that pucrunch does RLE and LZ in one pass.  For demos I 
only used a cruncher and did my own RLE routines as it is somewhat annoying 
to use an external program for this.  These programs require some memory 
and ZP-addresses like the cruncher does.  So it can easily happen that the 
decruncher or depacker interfere with your demo-part, if you didn't know 
what memory is used by the depacker.  At least you have more restrictions 
to care about.  With pucrunch you can do RLE and LZ without having too much 
of these restrictions. 

* Right, and because pucrunch is designed that way from the start, it can 
* get better results with one-pass RLE and LZ than doing them separately. 
* On the other hand it more or less requires that you _don't_ RLE-pack the 
* file first.. 

This is true, we also found that out.  We did a part for our demo which had 
some tables using only the low-nybble.  Also the bitmap had to be filled 
with a specific pattern.  We did some small routines to shorten the part, 
but as we tried pucrunch, this became obsolete.  From 59xxx bytes to 12xxx 
or 50 blocks, with our own RLE and a different cruncher we got 60 blks! 
Not bad at all ;) 

Not to mention that you have the complete and commented source-code for the 
decruncher, so that you can easily change it to your own needs.  And it's 
not only very flexible, it is also very powerful.  In general pucrunch does 
a better job than ByteBoiler+Sledgehammer. 

In addition to that pucrunch is of course much faster than crunchers on my 
C64, this has not only to do with my 486/66 and the use of an HDD.  See, I 
use a cross-assembler-system, and with pucrunch I don't have to transfer 
the assembled code to my 64, crunch it, and transfer it back to my pc. 
Now, it's just a simple command-line and here we go...  And not only I can 
do this, my friend who has an amiga uses pucrunch as well.  This is the 
first time we use the same cruncher, since I used to take ByteBoiler, but 
my friend didn't have a REU so he had to try another cruncher. 

So, if I try to make a conclusion:  It's fast, powerful and extremely
flexible (thanks to the source-code). 

---8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--- 

Just for your info... 

We won the demo-competition at the Interjam'98 and everything that was 
crunched ran through the hands of pucrunch...  Of course, you have been 
mentioned in the credits.  If you want to take a look, search for 
KNOOPS/DREAMS, which should be on the ftp-servers in some time. 
So, again, THANKS!  :) 

       Ninja/DREAMS 

---8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--- 

 
So, what can I possibly hope to add to that, right?:-) 

If you have any comments, questions, article suggestions or just a general 
hello brewing in your mind, send me mail or visit my homepage. 

See you all again in the next issue! 

-Pasi 

 
-------------------------------------------------------------------------- 

 
Appendix: The Log Book 
---------------------- 

5.3.1997 
       Tried reverse LZ, i.e.  mirrored history buffer.  Gained some 
       bytes, but it's not really worth it, i.e.  the compress time
       increases hugely and the decompressor gets bigger. 

6.3.1997 
       Tried to have a code to use the last LZ copy position (offset 
       added to the lastly used LZ copy position).  On bs.run I gained 
       57 bytes, but in fact the net gain was only 2 bytes 
       (decompressor becomes ~25 bytes longer, and the lengthening of 
       the long rle codes takes away the rest 30). 

10.3.1997 
       Discovered that my representation of integers 1-63 is in fact 
       an Elias Gamma Code.  Tried Fibonacci code instead, but it was 
       much worse (~500 bytes on bs.run, ~300 bytes on delenn.run) 
       without even counting the expansion of the decompression code. 

12.3.1997 
       'huffman' coded RLE byte -> ~70 bytes gain for bs.run.  The RLE 
       bytes used are ranked, and top 15 are put into a table, which 
       is indexed by an Elias Gamma Code.  Other RLE bytes get a prefix
       "1111". 

15.3.1997 
       The number of escape bits used is again selectable.  Using only 
       one escape bit for delenn.run gains ~150 bytes.  If #-option is 
       not selected, automatically selects the number of escape bits 
       (is a bit slow). 

16.3.1997 
       Changed some arrays to short.  17 x inlen + 64kB memory used. 
       opt_escape() only needs two 16-element arrays now and is
       slightly faster. 

31.3.1997 
       Tried to use BASIC ROM as a codebook, but the results were not 
       so good.  For mostly-graphics files there are no long matches -> 
       no net gain, for mostly-code files the file itself gives a 
       better codebook..  Not to mention that using the BASIC ROM as a 
       codebook is not 100% compatible. 

1.4.1997 
       Tried maxlen 128, but it only gained 17 bytes on ivanova.run, 
       and lost ~15 bytes on bs.run.  This also increased the LZPOS
       maximum value from ~16k to ~32k, but it also had little effect. 

2.4.1997 
       Changed the coding so that LZ77 has the priority.  2-byte LZ
       matches are coded in a special way without big loss in 
       efficiency, and codes also RLE/Escape. 

5.4.1997 
       Tried histogram normalization on LZLEN, but it really did not 
       gain much of anything, not even counting the mapping table from 
       index to value that is needed. 

11.4.1997 
       8..14 bit LZPOS base part.  Automatic selection.  Some more bytes 
       are gained if the proper selection is done before the LZ/RLELEN 
       optimization.  However, it can't really be done automatically 
       before that, because it is a recursive process and the original 
       LZ/RLE lengths are lost in the first optimization.. 

22.4.1997 
       Found a way to speed up the almost pathological cases by using 
       the RLE table to skip the matching beginnings. 

2.5.1997 
       Switched to maximum length of 128 to get better results on the 
       Calgary Corpus test suite. 

25.5.1997 
       Made the maximum length adjustable.  -m5, -m6, and -m7 select 
       64, 128 and 256 respectively.  The decompression code now allows 
       escape bits from 0 to 8. 

1.6.1997 
       Optimized the escape optimization routine.  It now takes almost 
       no time at all.  It used a whole lot of time on large escape bit 
       values before.  The speedup came from a couple of generic data 
       structure optimizations and loop removals by informal 
       deductions. 

3.6.1997 
       Figured out another, better way to speed up the pathological 
       cases.  Reduced the run time to a fraction of the original time. 
       All 64k files are compressed under one minute on my 25 MHz 
       68030.  pic from the Calgary Corpus Suite is now compressed in 
       19 seconds instead of 7 minutes (200 MHz Pentium w/ FreeBSD). 
       Compression of ivanova.run (one of my problem cases) was 
       reduced from about 15 minutes to 47 seconds.  The compression of 
       bs.run has been reduced from 28 minutes (the first version) to 
       24 seconds.  An excellent example of how changes at the
       algorithm level give the most impressive speedups.

6.6.1997 
       Changed the command line switches to use the standard approach. 

11.6.1997 
       Now determines the number of bytes needed for temporary data 
       expansion (i.e.  escaped bytes).  Warns if there is not enough 
       memory to allow successful decompression on a C64. 

       Also, now it's possible to decompress the files compressed with 
       the program (must be the same version).  (-u) 

17.6.1997 
       Only checks the lengths that are powers of two in
       OptimizeLength(), because it does not seem to be any (much) 
       worse than checking every length.  (Smaller than found maximum 
       lengths are checked because they may result in a shorter file.) 
       This version (compiled with optimizations on) only spends 27 
       seconds on ivanova.run. 

19.6.1997 
       Removed 4 bytes from the decrunch code (begins to be quite 
       tight now unless some features are removed) and simultaneously 
       removed a not-yet-occurred hidden bug. 

23.6.1997 
       Checked the theoretical gain from using the lastly outputted 
       byte (conditional probabilities) to set the probabilities for 
       normal/LZ77/RLE selection.  The number of bits needed to code 
       the selection is from 0.0 to 1.58, but even using arithmetic 
       code to encode it, the original escape system is only 82 bits 
       worse (ivanova.run), 7881/7963 bits total.  The former figure is 
       calculated from the entropy, the latter includes 
       LZ77/RLE/escape select bits and actual escapes. 

18.7.1997 
       In LZ77 match we now check if a longer match (further away) 
       really gains more bits.  Increase in match length can make the 
       code 2 bits longer.  Increase in match offset can make the code 
       even longer (2 bits for each magnitude).  Also, if LZPOS low 
       part is longer than 8, the extra bits make the code longer if 
       the length becomes longer than two. 

       ivanova -5 bytes, sheridan -14, delenn -26, bs -29 

       When generating the output rescans the LZ77 matches.  This is 
       because the optimization can shorten the matches and a shorter 
       match may be found much nearer than the original longer match. 
       Because longer offsets usually use more bits than shorter ones, 
       we get some bits off for each match of this kind.  Actually, the 
       rescan should be done in OptimizeLength() to get the most out 
       of it, but it is too much work right now (and would make the 
       optimize even slower). 

29.8.1997 
       4 bytes removed from the decrunch code.  I have to thank Tim 
       Rogers (timr@eurodltd.co.uk) for helping with 2 of them. 

12.9.1997 
       Because SuperCPU doesn't work correctly with inc/dec $d030, I 
       made the 2 MHz user-selectable and off by default.  (-f) 

13.9.1997 
       Today I found out that most of my fast string matching 
       algorithm matches the one developed by [Fenwick and Gutmann, 
       1994]*.  It's quite frustrating to see that you are not a genius 
       after all and someone else has had the same idea.  :-) However, 
       using the RLE table to help still seems to be an original idea, 
       which helps immensely on the worst cases.  I still haven't read 
       their paper on this, so I'll just have to get it and see.. 

       * [Fenwick and Gutmann, 1994].  P.M.  Fenwick and P.C.  Gutmann, 
       "Fast LZ77 String Matching", Dept of Computer Science, The 
       University of Auckland, Tech Report 102, Sep 1994 

14.9.1997 
       The new decompression code can decompress files from $258 to 
       $ffff (or actually all the way up to $1002d :-).  The drawback
       is:  the decompression code became 17 bytes longer.  However, the 
       old decompression code is used if the wrap option is not 
       needed. 

16.9.1997 
       The backSkip table can now be fixed size (64 kWord) instead of 
       growing enormous for "BIG" files.  Unfortunately, if the 
       fixed-size table is used, the LZ77 rescan is impractical (well, 
       just a little slow, as we would need to recreate the backSkip 
       table again).  On the other hand the rescan did not gain so many 
       bytes in the first place (percentage).  The define BACKSKIP_FULL
       enables the old behavior (default).  Note also, that for smaller 
       files than 64kB (the primary target files) the default consumes 
       less memory. 

       The hash value compare that is used to discard impossible 
       matches does not help much.  Although it halves the number of 
       strings to consider (compared to a direct one-byte compare), 
       speedwise the difference is negligible.  I suppose a mismatch is 
       found very quickly when the strings are compared starting from 
       the third character (the two first characters are equal, because
       we have a full hash table).  According to one test file, on 
       average 3.8 byte-compares are done for each potential match.  A 
       define HASH_COMPARE enables (default) the hash version of the
       compare, in which case "inlen" bytes more memory is used. 

       After removing the hash compare my algorithm quite closely 
       follows the [Fenwick and Gutmann, 1994] fast string matching 
       algorithm (except the RLE trick).  (Although I *still* haven't 
       read it.) 

       14 x inlen + 256 kB of memory is used (with no HASH_COMPARE and
       without BACKSKIP_FULL).

18.9.1997 
       One byte removed from the decompression code (both versions). 

30.12.1997 
       Only records longer matches if they compress better than 
       shorter ones.  I.e.  a match of length N at offset L can be 
       better than a match of length N+1 at 4*L.  The old comparison 
       was "better or equal" (">=").  The new comparison "better" (">") 
       gives better results on all Calgary Corpus files except "geo", 
       which loses 101 bytes (0.14% of the compressed size). 

       An extra check/rescan for 2-byte matches in OptimizeLength() 
       increased the compression ratio for "geo" considerably, back to 
       the original and better.  It seems to help for the other files 
       also.  Unfortunately this only works with the full backskip 
       table (BACKSKIP_FULL defined).

21.2.1998 
       Compression/Decompression for VIC20 and C16/+4 incorporated 
       into the same program. 

16.3.1998 
       Removed two bytes from the decompression codes. 

17.8.1998 
       There was a small bug in pucrunch which caused the location 
       $2c30 to be decremented (dec $2c30 instead of bit $d030) when 
       run without the -f option.  The source is fixed and executables 
       are now updated. 

-------------------------------------------------------------------------- 

References 

   1. http://www.cs.tut.fi/~albert/ 
   2. http://www.cs.tut.fi/~albert/Dev/ 
   3. http://www.cs.tut.fi/~albert/Dev/pucrunch/

....... 
.... 
.. 
.                                    C=H #17 

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 

VIC-20 Kernal ROM Disassembly Project 
Richard Cini 

Introduction 
	In order to put this project into perspective, a little personal  
history is needed. I received my first Commodore as a gift from my parents  
back in 1982. I used Commodore PETs in the school's computer lab, and  
Radio Shack Model I's in the local R/S store. It was nice to have one  
of my own to hack on, though.  Back then, most of my work was with the  
built-in BASIC interpreter. My claim to fame (to my family, at least) was  
a BASIC/machine language mailing list management program, and an allophone  
speech synthesizer hardware-software hack. Both worked well, which surprised 
my mother, who claims that nothing I ever built worked right.  As I grew up,  
my computer-of-choice changed, but I never lost my love for the VIC. It is  
small, easy to program, has a very capable processor (by 1980s standards)  
and decent I/O capability. Its peripherals were varied, if not quirky  
(take the 1515 printer, for example), but at least everything worked well  
together.   
	Now, fast forward to 1994. Commodore International failed,  
crippled by years of a weak product strategy, squandered opportunities  
and the market's increased focus on mainstream PC-compatible or Macintosh  
machines as productivity tools. After Commodore's failure, I decided that  
I wanted to try to purchase the Commodore 8-bit intellectual property,  
including the rights to the Kernal and BASIC source code, primarily for  
preservation purposes. One of my hobbies is collecting and preserving  
obsolete and unsupported computers, accessories, documentation, etc.  
	Since Commodore's bankruptcy attorney would not return my calls  
(no surprise there), I embarked on decompiling the Kernal. This project,  
although time-consuming and rewarding from an informational perspective,  
was not truly trail-blazing. Many before me probably decompiled parts of  
the Kernal in order to gain some understanding as it related to another  
project. However, in my research, I don't recall ever seeing complete  
recompilable source code. Memory and ROM maps, yes; source code, no.  
	Marko Makela manages a great Commodore web site that contains lots  
of useful information, including these memory and ROM maps. These provided  
the starting point for my work. See http://www.hut.fi/misc/cbm/docs/ for  
this and much more information.  What I would like to accomplish in a  
series of articles is to explain the process and to discuss specific  
Kernal routines that may be of interest to C=Hacking readers. Also, where  
appropriate, I will make comparisons with the other Commodore machines  
that were the contemporaries of the VIC, the C64 and the PET, specifically.  
	One might ask, "Why is this project different from all of the other  
resources already available?" In short, here's why: 

1.	The end result is a fully modifiable and compilable source file. 
2.	Only the VIC memory map and ROM location map are available  
	*because* to date everyone has focused on the C64 as it is the  
	more functional (and hence, more popular) Commodore of the era.  
	The same goes for the PET, too. 
3.	As far as I know, there was no "Mapping the VIC" written, because  
	of #2, above. 
4.	Because of #3, the fact that I had some spare time, and that I  
	love my VIC (although I don't use it as much these days), I felt  
	that I had to do it. 

Brief VIC-20 History 
-------------------- 
Although I'd like to provide a complete history lesson on the VIC-20, many  
others before me have done a better job. The March 1985 issue of the  
IEEE Spectrum has an article, as does Marko's web site. See  
http://www.hut.fi/misc/cbm/docs/peddle.english.html for a great article on  
Chuck Peddle, the well-known creator of the 6502 microprocessor. Nonetheless, 
I will provide a Reader's Digest version of the history of the VIC. 
	In the late-70s, engineers in the Advanced Systems Design Group  
(ASDG) at Commodore created a multi-function video/sound interface chip,  
the 6560 (a.k.a., the VIC). Al Charpentier ran the LSI section of the ASDG, 
and was the lead in designing the VIC-I, and later, the VIC-II. The ASDG was 
the old MOS Technology operation that Commodore bought in the mid-70s. 
The 6560 supported complete composite color video, 3-voice plus white noise  
sound, a volume control, two analog-to-digital converters that supported  
the use of a game paddle or joystick, and a light pen interface.  
Feature-rich as it was, no manufacturer wanted to commit a product line to  
it. Since Commodore could not find anyone to buy the chip, they decided to  
build a computer that featured the chip. Hence, the VIC-20 was born. Al's  
buddy, Bob Yannes, was a senior systems designer at Commodore who developed 
the VIC-20 (and later, the C64) prototype. The VIC was Commodore's first  
color computer, and the first designed for home use.   

Relationships with other Commodore Products 
------------------------------------------- 
	The structure of the VIC-20 ROM parallels that of Commodore's  
other machines of that era, the PET and the C64. All three machines share  
the concept of a standard, public API accessed through a jump table  
located in the last page of the system ROM. 
	There's also another striking similarity, which comes somewhat as  
a surprise to this writer, but intuitively makes sense. The C64 and VIC  
Kernal ROMs are nearly identically laid-out and contain a lot of common  
code. The C64 ROM of course contains certain modifications relating to  
its unique hardware and capabilities, but otherwise is the same. Since  
the machines are so similar in design (the same engineer designed them), 
the similarity of the code isn't surprising. This recycling appears to  
have been a cost-effective way to develop a new computer in record time.  

The Process 
-----------  
I believe that the reverse engineering process for creating source code  
that produces binary-identical object code is fairly standard:  
know thy hardware, get the object code, turn object code into assembly code,  
give everything names, test compile, fix errors, re-compile and call it done. 
However, I'm certain that C=Hacking readers have reverse engineering methods 
which differ from mine. I'd be interested in hearing them, as I'm always  
looking for a better way to do things.  I started with an image of the ROM  
from my VIC (although a ROM image from funet would work) and cranked it  
through a disassembler (I used SuperMon). Since I wasn't too up on  
transferring data from the VIC to the PC, I used a brute-force method:  
scanning disassembler listings into TIFF files and running them through  
an OCR program. This produced plain-text files for me to work with.  Then,  
I used various available information on the Web and in books (such as "VIC  
Revealed", "The Commodore Innerspace Anthology", "Mapping the 64") 
to break the code into subroutines. I inserted meaningful memory and  
program location labels, taking names from all of the above sources.  
My assembler allowed me to create conventional data and code "segments", 
so I took the time to create a "real" data segment that mirrored the  
VIC memory map. These segments are not to be confused with the segments  
supported under the various PeeCee memory models. I used TASM 3.1, a  
shareware 8-bit table-based assembler by Squak Valley Software of  
Issaquah, WA. TASM supports the 6502, 6800/05/11, 8048/51/85/96, Z80 and  
TMS7000/32010/32025 processors. 

Next, I read the code and added comments as I went along. I do this on a  
routine-by-routine basis, as time permits, but always beginning at the  
first instruction after POR. Finally, I did a recompilation and compared  
the recompiled output with the ROM image to make sure that no errors were  
introduced in the source code creation process. I wasn't so lucky  
the first time around :-). I like to make gross checks first: ROM image  
size, location of well-known routines (such as the jump table at $FF85,  
the POR vector at $FD22, and the individual jump table routines).  
This helps to narrow down the location of any errors. 
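
The comparison step can be sketched in a few lines of Python. This is only an 
illustration of the idea, not the tool used for this project: the function 
name compare_images and the default load address are assumptions made for 
the example. 

```python
def compare_images(original: bytes, rebuilt: bytes, base: int = 0xE000):
    """Compare a reference ROM image with reassembled output.

    Returns (size_ok, mismatches), where mismatches is a list of
    (address, original_byte, rebuilt_byte) tuples. 'base' is the assumed
    address at which the first byte of both images is mapped.
    """
    size_ok = len(original) == len(rebuilt)   # the gross check: image size
    mismatches = [
        (base + offset, a, b)
        for offset, (a, b) in enumerate(zip(original, rebuilt))
        if a != b
    ]
    return size_ok, mismatches


if __name__ == "__main__":
    # Tiny stand-in "images"; a real run would read two 8K files from disk.
    rom = bytes([0xA2, 0xFF, 0x78, 0x9A])
    new = bytes([0xA2, 0xFF, 0x58, 0x9A])
    size_ok, diffs = compare_images(rom, new, base=0xFD22)
    print(size_ok, [(hex(addr), hex(a), hex(b)) for addr, a, b in diffs])
```

The address of the first mismatch is usually the interesting one, since a 
single missing or extra byte shifts everything that follows it. 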

Once I was satisfied that the disassembly was right, I created a ROM image to  
be burned into a test EPROM. The ROMs used in the VIC are 2364  
mask-programmed ROMs. The 2364 is a 24-pin 8k x 8-bit ROM device, and the  
closest commonly available EPROM is the 2764, a 28-pin 8k x 8-bit  
device. Since the 2364 has four fewer pins (reflecting its  
non-programmability), an adapter board needs to be built. A piece of perf board 
and two 28-pin wire-wrap DIP sockets should do the trick. 
	If all checks out, the source is then ready for modification. 

Issues and Considerations 
------------------------- 
	There is one important thing that I discovered while hacking the  
Kernal -- there is no wasted space in the Kernal ROM as it presently exists. 
Actually, the Kernal occupies 1,280 bytes less than 8k, beginning at  
$E500 (and consequently, the BASIC ROM overhangs the $E000 boundary by  
1,280 bytes). The Kernal developers maximized the 8k space, so any Kernal  
hack will have to use jumps to a patch area elsewhere in the processor  
address space. For example, let's say that I wanted to add BASIC 4.0 disk  
commands to the VIC Kernal by hacking it (recognizing that I could have  
used the easier wedge method). I could place jumps in the Kernal ROM to  
locations within my own non-autostart ROM located at $A000.  The other  
important consideration is backward compatibility. Certain programs may rely  
on the specific location of code within the Kernal ROM. For example, a  
game ROM may make direct calls to the internal Plot routine, as opposed  
to using the jump table at the end of the Kernal ROM (saving a few  
processor ticks in the process). Shifting code around would relocate that  
code, breaking that program. 

General Hardware Information 
---------------------------- 
The Microprocessor 
Manufactured by MOS and second-sourced from Rockwell Semiconductors, the stock  
6502 is an 8-bit, 1MHz NMOS-process processor. It supports 56 instructions  
in 13 addressing modes (although six are combinations of the seven basic  
modes), three processor registers (.A, .X, and .Y), stack pointer, and a  
condition code (flags) register. The 6502 supports both maskable and  
non-maskable interrupts with fixed vectors at the top of its 64k address  
space. 

I/O Processors 
	VIC-20 I/O is managed by two memory-mapped I/O processors, both  
6522 versatile interface adapters (VIAs). The I/O region occupies the  
4k-address space beginning at $9000.  The VIAs manage the keyboard,  
joystick, light pen, cassette deck, the IEEE serial interface and the  
user port. Each 6522 has 16 bi-directional I/O lines, four handshaking  
lines, an 8-bit shift register and two 16-bit timers (capable of  
generating free-running or triggered pulses). One of these timers is  
responsible for RTC, RS-232, IEEE, and cassette tape timing. Of the eight  
handshaking lines, two are free for general use, while the other six are  
used for the IEEE serial port, the RESTORE key, and cassette control.  
24 bits of the total 32 bits of I/O are used for keyboard scanning and  
joystick, light pen, and serial inputs. Truly available to the user are four  
handshaking lines and 8 bits of I/O. 

Video Interface 
The VIC-20 video interface is managed by the 6560/6561 VIC chip. The  
VIC screen is organized in 22 columns by 23 rows in text mode and  
176 by 184 pixels in "graphics" mode. Graphics mode is synthesized by  
mapping the character generator ROM to RAM and modifying the character  
glyphs.  The on-chip video sync generator is capable of generating video  
in NTSC (6560) or PAL (6561) formats in 16 colors. The VIC also contains  
three programmable tone generators, a white noise source, volume control, 
two A/D converters (for game paddle interfacing), a light pen input, screen 
centering, and independent control over background, foreground, and border  
colors.  The 6560/6561 performs its own DMA to separate 4-bit video RAM  
and 8-bit character generator ROM. The address bus is a private 2MHz bus, 
which is not shared with the 1MHz microprocessor bus. Shared ROM access  
is performed during the processor Phase 1 clock. 

Memory Map 
	Space limitations prohibit listing the complete VIC-20 memory map,  
but an abridged version may be helpful: 

HEX Offset	DESCRIPTION 

0000-00FF	Zero page: Kernal and BASIC system areas 
0100-01FF    	Page 1: tape error log area, processor stack 
0200-02FF    	Page 2: BASIC input buffer, file and device address tables, 
			keyboard and screen vars, RS232 vars 
0300-03FF    	Page 3: BASIC vectors, processor register storage, Kernal  
			vectors, cassette buffer area 
0400-0FFF	Pages 4-15: 3k expansion area 
1000-1DFF	User Basic area (unexpanded VIC) 
1E00-1FFF	Screen memory (unexpanded VIC) 
2000-3FFF	8K expansion RAM/ROM block 1 
4000-5FFF	8K expansion RAM/ROM block 2 
6000-7FFF	8K expansion RAM/ROM block 3 
NOTE: When additional memory is added to block 1, 2 or 3, 
the Kernal relocates the following things for BASIC: 
1000-11FF	Screen memory 
1200-?		User Basic area 
9400-95FF	Color RAM 
8000-8FFF	4K Character generator ROM 
8000-83FF	Upper case and graphics 
8400-87FF	Reversed upper case and graphics 
8800-8BFF	Upper and lower case 
8C00-8FFF	Reversed upper and lower case 
9000-93FF	I/O Block 0 
9000-900F	VIC chip registers 
9110-911F	6522 VIA#1 registers 
9120-912F	6522 VIA#2 registers 
9400-95FF	location of COLOR RAM with additional RAM at blk 1 
9600-97FF	Normal location of COLOR RAM 
9800-9BFF	I/O Block 2 
9C00-9FFF	I/O Block 3 
A000-BFFF	8K block for expansion ROM (autostart ROM) 
C000-DFFF	8K BASIC ROM 
E000-FFFF	8K Kernal ROM 
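
For readers who want to annotate addresses while reading the listings, the 
abridged map above can be turned into a small lookup table. The sketch below 
is only a convenience built from the table in this article; the names 
VIC20_MAP and region are made up for the example, and sub-ranges (such as 
the individual VIA register blocks) are folded into their parent region. 

```python
# (first address, last address, description), taken from the abridged map.
VIC20_MAP = [
    (0x0000, 0x00FF, "Zero page"),
    (0x0100, 0x01FF, "Page 1"),
    (0x0200, 0x02FF, "Page 2"),
    (0x0300, 0x03FF, "Page 3"),
    (0x0400, 0x0FFF, "3K expansion area"),
    (0x1000, 0x1DFF, "User BASIC area (unexpanded VIC)"),
    (0x1E00, 0x1FFF, "Screen memory (unexpanded VIC)"),
    (0x2000, 0x3FFF, "8K expansion block 1"),
    (0x4000, 0x5FFF, "8K expansion block 2"),
    (0x6000, 0x7FFF, "8K expansion block 3"),
    (0x8000, 0x8FFF, "Character generator ROM"),
    (0x9000, 0x93FF, "I/O block 0"),
    (0x9400, 0x95FF, "Color RAM (with RAM in block 1)"),
    (0x9600, 0x97FF, "Color RAM (normal location)"),
    (0x9800, 0x9BFF, "I/O block 2"),
    (0x9C00, 0x9FFF, "I/O block 3"),
    (0xA000, 0xBFFF, "Expansion (autostart) ROM"),
    (0xC000, 0xDFFF, "BASIC ROM"),
    (0xE000, 0xFFFF, "Kernal ROM"),
]


def region(addr: int) -> str:
    """Return the map entry that contains a 16-bit address."""
    for first, last, name in VIC20_MAP:
        if first <= addr <= last:
            return name
    raise ValueError(f"address out of range: {addr:#x}")


if __name__ == "__main__":
    for addr in (0x033C, 0x1E00, 0x9120, 0xFD22):
        print(f"${addr:04X}  {region(addr)}")
```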

Kernal Functions 
---------------- 

System Startup 
	Let's first take a look at how a VIC-20 boots. The process is  
substantially similar for the C64 and the PET (through the commonality  
of the microprocessor upon which each machine is based), although the  
locations of various routines differ, as do the memory and I/O maps. 
When power is first applied to the microprocessor, the RESET pin is held  
low by a 555 timer for a period long enough for the power supply and clock  
generator to stabilize. The 6502 utilizes the last six bytes of the address  
space to store three critical vectors: the NMI, RESET, and IRQ vectors,  
respectively. On power-up, the PC (program counter) is initialized to the  
address stored at location $FFFC and execution begins at that location  
(the first column in the source code represents the program line number): 
  
6495   FFFA ;=================================================	 
6496   FFFA ; - Power-on and hardware vectors				 
6497   FFFA ;				 
6498   FFFA A9 FE       	.dw NMI		;non-maskable interrupt 
6499   FFFC 22 FD       	.dw RESET	;POR 
6500   FFFE 72 FF       	.dw IRQ		;IRQ processor 
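
The three vectors are stored low byte first, which is why the bytes 22 FD 
read as the address $FD22. A short Python sketch, assuming an 8K Kernal 
image that maps to $E000-$FFFF (the function name hardware_vectors is 
invented for the example), shows how they can be pulled out of a ROM dump: 

```python
import struct

ROM_BASE = 0xE000  # assumed address of the first byte of the 8K image


def hardware_vectors(image: bytes, base: int = ROM_BASE) -> dict:
    """Decode the NMI, RESET and IRQ vectors stored at $FFFA-$FFFF."""
    offset = 0xFFFA - base
    # "<3H" = three little-endian unsigned 16-bit words
    nmi, reset, irq = struct.unpack_from("<3H", image, offset)
    return {"NMI": nmi, "RESET": reset, "IRQ": irq}


if __name__ == "__main__":
    # Stand-in image: all zeros except the six vector bytes from the listing.
    image = bytearray(0x2000)
    image[-6:] = bytes([0xA9, 0xFE, 0x22, 0xFD, 0x72, 0xFF])
    for name, addr in hardware_vectors(bytes(image)).items():
        print(f"{name:5s} ${addr:04X}")
```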

Execution begins at $FD22, the POR (power-on reset) vector: 

5971   FD22  ;#################################################	 
5972   FD22  ; Power-on RESET entry				 
5973   FD22  ;################################################# 
5974   FD22             RESET				 
5975   FD22 A2 FF       	LDX #$FF			 
5976   FD24 78          	SEI		;kill interrupts 
5977   FD25 9A          	TXS		;set stack top 
5978   FD26 D8          	CLD			 
5979   FD27 20 3F FD    	JSR SCNROM	;check for autostart ROM 
5980   FD2A D0 03       	BNE SKIPA0	;not there, skip ROM init 
5981   FD2C              
5982   FD2C 6C 00 A0    	JMP (A0BASE)	;jump to ROM init if present 
5983   FD2F              
5984   FD2F             SKIPA0				 
5985   FD2F 20 8D FD    	JSR RAMTAS	;test RAM 
5986   FD32 20 52 FD    	JSR IRESTR	;init work memory 
5987   FD35 20 F9 FD    	JSR IOINIT	;setup hardware 
5988   FD38 20 18 E5    	JSR CINT1	;init video 
5989   FD3B 58          	CLI		;re-enable interrupts 
5990   FD3C 6C 00 C0    	JMP (BENTER)	;enter BASIC 

	The startup routines set up the processor stack and check for the  
existence of an autostart ROM. Autostart ROMs are located in the $A000  
block and have a five-byte signature (A0CBM) at offset $04. If the  
signature is found, the Kernal jumps to the A0ROM initialization routine  
pointed to by offset $00 of the autostart ROM. 
	If no signature is found, the Kernal initialization continues by  
testing the RAM, initializing the system variables, system hardware, and  
the screen. Finally, the Kernal transfers control to the BASIC  
initialization entry point at $C000. 

Routine SCNROM 
	The first routine called from the POR code is the SCNROM routine.  
This routine looks for the special 5-byte signature that indicates the  
presence of an autostart ROM located in the $A000 block. 

5992   FD3F  ;==================================================== 
5993   FD3F  ; SCNROM - Scan ROM areas for Autostart ROM signature	 
5994   FD3F  ;	 
5995   FD3F             SCNROM				 
5996   FD3F A2 05       	LDX #$05	;5 chars to compare 
5997   FD41              
5998   FD41             SCNLOOP				 
5999   FD41 BD 4C FD    	LDA SCANEX,X	;start at end of signature 
6000   FD44 DD 03 A0    	CMP A0BASE+3,X	;compare to ROM sig area 
6001   FD47 D0 03       	BNE SCANEX	;no match, exit loop 
6002   FD49              
6003   FD49 CA          	DEX		;match; check next char 
6004   FD4A D0 F5       	BNE SCNLOOP	;loop 
6005   FD4C              
6006   FD4C             SCANEX				 
6007   FD4C 60          	RTS		;return; Z=0 if no match 
6008   FD4D             ;				 
6009   FD4D             ; ROMSIG - Autostart ROM signature 
6010   FD4D             ;				 
6011   FD4D             ROMSIG				 
6012   FD4D 4130C3C2CD  	.db "A0", $C3, $C2, $CD  ;A0CBM 
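
In higher-level terms, SCNROM is a five-byte comparison against $A004-$A008. 
The Python sketch below models that check; it is an illustration only. The 
names has_autostart and cold_start_vector are made up, and the model compares 
the whole signature at once, whereas the ROM code walks it backwards one 
byte at a time. 

```python
# "A0" followed by C, B, M with bit 7 set, as in the ROMSIG table.
SIGNATURE = bytes([0x41, 0x30, 0xC3, 0xC2, 0xCD])


def has_autostart(block_a000: bytes) -> bool:
    """block_a000 holds the contents of the $A000 block (index 0 = $A000)."""
    return block_a000[4:9] == SIGNATURE


def cold_start_vector(block_a000: bytes) -> int:
    """Address stored low byte first at $A000, the target of JMP (A0BASE)."""
    return block_a000[0] | (block_a000[1] << 8)


if __name__ == "__main__":
    # Hypothetical cartridge header: init vector $A009, two unused bytes,
    # then the signature.
    cart = bytes([0x09, 0xA0, 0x00, 0x00]) + SIGNATURE
    if has_autostart(cart):
        print(f"autostart ROM, init at ${cold_start_vector(cart):04X}")
    else:
        print("no autostart ROM")
```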

Routine RAMTAS 
	The RAMTAS routine is the second subroutine in the initialization  
process. It clears pages 0, 2, and 3 of RAM (page 1, the processor stack,  
is left alone), then searches for expansion 
memory. If any is found, the screen memory, color memory, and start of  
BASIC RAM pointers are adjusted to their documented alternates. 

6058   FD8D  ;=================================================== 
6059   FD8D  ; RAMTAS - Initialize system contents 
6060   FD8D  ;	 
6061   FD8D             RAMTAS				 
6062   FD8D A9 00       	LDA #$00	;zero regs .A and .X 
6063   FD8F AA          	TAX			 
6064   FD90              
6065   FD90             RAMTSLP1		;clear system memory areas 
6066   FD90 95 00       	STA USRPOK,X	;zero page 
6067   FD92 9D 00 02    	STA BUF,X	;clear page 2 
6068   FD95 9D 00 03    	STA ERRVPT,X	;clear page 3 
6069   FD98 E8          	INX			 
6070   FD99 D0 F5       	BNE RAMTSLP1	;loop till done 
6071   FD9B              
6072   FD9B A2 3C       	LDX #$3C	;setup cassette buffer 
6073   FD9D A0 03       	LDY #$03	;area to $033c 
6074   FD9F 86 B2       	STX TAPE1			 
6075   FDA1 84 B3       	STY TAPE1+1			 
6076   FDA3 85 C1       	STA STAL	;clear I/O start address 
6077   FDA5 85 97       	STA REGSAV	;register save 
6078   FDA7 8D 81 02    	STA OSSTAR	;and start of OS memory ptr 
6079   FDAA A8          	TAY		; .Y=0 
6080   FDAB A9 04       	LDA #$04	;check RAM from $0400 
6081   FDAD 85 C2       	STA STAL+1	;set I/O start to page 4 
6082   FDAF              
6083   FDAF             RAMTASLP2				 
6084   FDAF E6 C1       	INC STAL	;increment LSB 
6085   FDB1 D0 02       	BNE RAMTAS1	;not done with page, cont. 
6086   FDB3              
6087   FDB3 E6 C2       	INC STAL+1	;inc. to new page 
6088   FDB5              
6089   FDB5             RAMTAS1				 
6090   FDB5 20 91 FE    	JSR MEMTST	;test RAM 
6091   FDB8 A5 97       	LDA REGSAV			 
6092   FDBA F0 22       	BEQ RAMTAS3 
6093   FDBC B0 F1       	BCS RAMTASLP2	;next address 
6094   FDBE              
6095   FDBE A4 C2       	LDY STAL+1	;done testing,get RAM top MSB 
6096   FDC0 A6 C1       	LDX STAL	;and LSB 
6097   FDC2 C0 20       	CPY #$20	; top at $2000 
6098   FDC4 90 25       	BCC I6561LP	;page below $2000, halt 
6099   FDC6              
6100   FDC6 C0 21       	CPY #$21	;RAM at $2000? 
6101   FDC8 B0 08       	BCS RAMTAS2	;yes, set params 
6102   FDCA              
6103   FDCA A0 1E       	LDY #$1E	;$1E00 
6104   FDCC 8C 88 02    	STY HIPAGE			 
6105   FDCF              
6106   FDCF             RAMTAS1A				 
6107   FDCF 4C 7B FE    	JMP STOTOP	;CLC and set RAM top 
6108   FDD2              
6109   FDD2             RAMTAS2				 
6110   FDD2 A9 12       	LDA #$12	;With exp. RAM, BASIC starts 
6111   FDD4 8D 82 02    	STA OSSTAR+1	;at $1200 
6112   FDD7 A9 10       	LDA #$10	;and screen starts at $1000 
6113   FDD9 8D 88 02    	STA HIPAGE			 
6114   FDDC D0 F1       	BNE RAMTAS1A	;set top of RAM and exit 
6115   FDDE              
6116   FDDE             RAMTAS3				 
6117   FDDE 90 CF       	BCC RAMTASLP2	;loop to next address 
6118   FDE0              
6119   FDE0 A5 C2       	LDA STAL+1	;get MSB of I/O start 
6120   FDE2 8D 82 02    	STA OSSTAR+1	;save as start of OS 
6121   FDE5 85 97       	STA REGSAV	;save copy 
6122   FDE7 C9 11       	CMP #$11	;page $11 
6123   FDE9              
6124   FDE9             RATS3				 
6125   FDE9 90 C4       	BCC RAMTASLP2 
6126   FDEB              
6127   FDEB             I6561LP				 
6128   FDEB 20 C3 E5    	JSR V6561I-2	;$E5C3 init VIC regs 
6129   FDEE 4C EB FD    	JMP I6561LP 
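
The branching in RAMTAS is easier to follow as a decision procedure. The 
Python model below is my reading of the listing above, not a verified 
emulation: is_ram stands in for the MEMTST call, the function name 
ramtas_scan and the dictionary keys are invented for the example, and 
returning None stands in for the endless loop at I6561LP. 

```python
def ramtas_scan(is_ram):
    """Model the RAMTAS memory search.

    is_ram(addr) -> bool plays the role of MEMTST. Returns None where the
    ROM code would hang at I6561LP, otherwise a dict with the start-of-memory
    page, the screen page (HIPAGE) and the top-of-memory address.
    """
    start_page = None                      # REGSAV: stays 0 until RAM is found
    for addr in range(0x0401, 0x10000):    # STAL is incremented before each test
        ok = is_ram(addr)
        if start_page is None:
            if ok:
                start_page = addr >> 8
                if start_page >= 0x11:     # first RAM found too high: halt
                    return None
            continue
        if ok:
            continue                       # still inside RAM, keep scanning
        top_page = addr >> 8               # first failing address = RAM top
        if top_page < 0x20:
            return None                    # RAM ends below $2000: halt
        if top_page >= 0x21:               # RAM present in block 1
            return {"start_page": 0x12, "screen_page": 0x10, "top": addr}
        # RAM ends exactly at $2000: screen stays at $1E00
        return {"start_page": start_page, "screen_page": 0x1E,
                "top": 0x1E00 | (addr & 0xFF)}
    return None


if __name__ == "__main__":
    unexpanded = lambda a: 0x1000 <= a < 0x2000
    with_8k = lambda a: 0x1000 <= a < 0x4000
    print(ramtas_scan(unexpanded))
    print(ramtas_scan(with_8k))
```

Under this reading, an unexpanded machine ends up with the screen at $1E00 
and memory starting at page $10, while RAM in block 1 moves the screen to 
$1000 and the start of BASIC to $1200, matching the note in the memory map. 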

The MEMTST routine actually tests the RAM, and is called during the memory  
search loop at $FDB5. It writes and reads back a simple alternating-bit  
pattern ($55, then $AA) to test for memory defects: 

6271   FE91  ;=================================================== 
6272   FE91  ; MEMTST - Test memory 
6273   FE91  ;		.Y is index in page 
6274   FE91             MEMTST				 
6275   FE91 B1 C1       	LDA (STAL),Y	;get current contents 
6276   FE93 AA          	TAX		;save .A 
6277   FE94 A9 55       	LDA #%01010101	;set pattern 
6278   FE96 91 C1       	STA (STAL),Y	;save it 
6279   FE98 D1 C1       	CMP (STAL),Y	;and compare 
6280   FE9A D0 08       	BNE MEMTS1	;not equal 
6281   FE9C             			;pattern compares OK 
6282   FE9C 6A          	ROR A		;%10101010 invert pattern 
6283   FE9D 91 C1       	STA (STAL),Y	;save it 
6284   FE9F D1 C1       	CMP (STAL),Y	;and compare 
6285   FEA1 D0 01       	BNE MEMTS1	;not equal 
6286   FEA3 A9          	.db $A9		;LDA #$18 for OK, $55 or $AA 
6287   FEA4             			; for failed pattern 
6288   FEA4             MEMTS1			 
6289   FEA4 18          	CLC		;CLC only on error 
6290   FEA5 8A          	TXA		;restore previous .A 
6291   FEA6 91 C1       	STA (STAL), Y	;save it 
6292   FEA8 60          	RTS			 
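
The same test can be expressed in a few lines of Python. This is a sketch of 
the technique rather than a cycle-exact model; the memtst function and the 
Rom helper class are invented for the example, with Rom standing in for any 
location that ignores writes. 

```python
def memtst(memory, addr):
    """Write $55 then $AA to one cell and read each back.

    Returns True if both patterns survive. As in the ROM routine, the
    original contents are written back whether or not the test passes.
    """
    saved = memory[addr]
    ok = True
    for pattern in (0x55, 0xAA):
        memory[addr] = pattern
        if memory[addr] != pattern:
            ok = False
            break
    memory[addr] = saved
    return ok


class Rom(list):
    """Hypothetical stand-in for ROM or empty address space: writes vanish."""

    def __setitem__(self, index, value):
        pass


if __name__ == "__main__":
    ram = [0x12, 0x34, 0x56, 0x78]
    rom = Rom([0x12, 0x34, 0x56, 0x78])
    print(memtst(ram, 1), ram)   # RAM passes and keeps its contents
    print(memtst(rom, 1))        # ROM fails the first pattern
```

Using two complementary patterns means every bit of the cell has to hold 
both a 0 and a 1 for the test to pass. 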

	The RAMTAS routine also calls STOTOP to save the top of RAM pointer: 

6241   FE73  ;================================================== 
6242   FE73  ; IMEMTP - Set/read top of memory (internal) 
6243   FE73  ;	On entry, SEC to read, .X/.Y is LSB/MSB		 
6244   FE73  ;	          CLC to set, .X/.Y is LSB/MSB 
6245   FE73  ;				 
6246   FE73             IMEMTP				 
6247   FE73 90 06       	BCC STOTOP	;set or read? 
6248   FE75 AE 83 02    	LDX OSTOP	;get top of memory 
6249   FE78 AC 84 02    	LDY OSTOP+1			 
6250   FE7B              
6251   FE7B             STOTOP				 
6252   FE7B 8E 83 02    	STX OSTOP	;set top of memory 
6253   FE7E 8C 84 02    	STY OSTOP+1			 
6254   FE81 60          	RTS			 

Routine IRESTR  
	This routine loads (or re-loads) the default Kernal vectors upon  
POR or Run-Stop/Restore sequences. The default vectors that are loaded  
include the links to IRQ, NMI, Open, Close, Channel In, Channel Out,  
Clear Channels, Character In, Character Out, Scan Stop Key, Get Keyboard  
Character, Close All, Load and Save routines within the Kernal ROM. 

6014   FD52  ;==================================================== 
6015   FD52  ; IRESTR - Restore KERNAL hardware vectors (internal) 
6016   FD52  ;		 Called during POR and NMI sequences. 
6017   FD52  ;				 
6018   FD52             IRESTR				 
6019   FD52 A2 EA       	LDX #$EA 	;FIXUP2;#$6D points to list of  
6020   FD54 A0 EA       	LDY #$EA 	;FIXUP2+1;#$FD $FD6D KERNAL vecs 
6021   FD56 18          	CLC			 
6022   FD57             ; 
6023   FD57             ; IVECTR - Change vectors for user 
6024   FD57             ;	On entry, SEC= read vector to .X/.Y LSB/MSB 
6025   FD57             ;	          CLC= set vector from .X/.Y LSB/MSB 
6026   FD57             ; 
6027   FD57             IVECTR 
6028   FD57 86 C3       	STX MEMUSS	;save vector to temp 
6029   FD59 84 C4       	STY MEMUSS+1 
6030   FD5B A0 1F       	LDY #$1F	;# of bytes to move 
6031   FD5D   
6032   FD5D             VECLOOP	 
6033   FD5D B9 B6 02    	LDA IRQVP,Y	;get old vector address 
6034   FD60 B0 02       	BCS VECSK	;branch on CY=1/read 
6035   FD62              
6036   FD62 B1 C3       	LDA (MEMUSS),Y	;get new vector address 
6037   FD64              
6038   FD64             VECSK 
6039   FD64 91 C3       	STA (MEMUSS),Y	;save new address to temp 
6040   FD66 99 B6 02    	STA IRQVP,Y	;and to vector area 
6041   FD69 88          	DEY		;go to next one 
6042   FD6A 10 F1       	BPL VECLOOP	;loop 
6043   FD6C 60          	RTS			 
6044   FD6D   
6045   FD6D             ; 
6046   FD6D             ;KERNAL Vectors	 
6047   FD6D             ; 
6048   FD6D             KNRLSV				 
6053   FD6D BFEAD2FEADFE	.dw IRQVEC, WARMST, LNKNMI, IOPEN 
6053   FD73 0AF4 
6054   FD75 4AF3C7F209F3	.dw ICLOSE, ICHKIN, ICHKOT, ICLRCH 
6054   FD7B F3F3 
6055   FD7D 0EF27AF270F7	.dw ICHRIN, ICHROT, ISTOP,  IGETIN 
6055   FD83 F5F1 
6056   FD85 EFF3D2FE49F5	.dw ICLALL, WARMST, LNKLOD, LNKSAV 
6056   FD8B 85F6 
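
The 32 bytes at KNRLSV are sixteen little-endian addresses. As a quick 
cross-check of the table, the Python sketch below decodes the hex bytes 
copied from the listing above and pairs them with the labels used there 
(the function name decode_vectors is made up for the example): 

```python
import struct

# Hex bytes copied from the KNRLSV table at $FD6D-$FD8C.
KERNAL_VECTOR_BYTES = bytes.fromhex(
    "BFEAD2FEADFE0AF4"
    "4AF3C7F209F3F3F3"
    "0EF27AF270F7F5F1"
    "EFF3D2FE49F585F6"
)

# Labels in table order, as they appear in the listing.
NAMES = [
    "IRQVEC", "WARMST", "LNKNMI", "IOPEN",
    "ICLOSE", "ICHKIN", "ICHKOT", "ICLRCH",
    "ICHRIN", "ICHROT", "ISTOP", "IGETIN",
    "ICLALL", "WARMST", "LNKLOD", "LNKSAV",
]


def decode_vectors(raw: bytes = KERNAL_VECTOR_BYTES):
    """Return a list of (label, address) pairs for the 16 default vectors."""
    addresses = struct.unpack("<16H", raw)   # 16 little-endian words
    return list(zip(NAMES, addresses))


if __name__ == "__main__":
    for label, addr in decode_vectors():
        print(f"{label:7s} ${addr:04X}")
```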

Routine IOINIT 
	This routine initializes the VIAs. Lots of bit twiddling goes on  
here to set up the various ports. This routine also starts the system  
IRQ timer. 

6138   FDF9  ;=================================================== 
6139   FDF9  ; IOINIT - Initialize I/O registers 
6140   FDF9  ;		 
6141   FDF9             IOINIT				 
6142   FDF9 A9 7F       	LDA #%01111111	;disable HW interrupts 
6143   FDFB 8D 1E 91    	STA D1IER	;interrupt enable reg VIA1 
6144   FDFE 8D 2E 91    	STA D2IER	;interrupt enable reg VIA2 

6145   FE01 A9 40       	LDA #%01000000	;Sets tmr1/VIA2 to free- 
						;running; used for IRQ 
6146   FE03 8D 2B 91    	STA D2ACR	;VIA2 aux ctrl reg 

6147   FE06 A9 40       	LDA #%01000000	;same for tmr1/VIA1. Used for 
						; RS-232 timing 
6148   FE08 8D 1B 91    	STA D1ACR	;VIA1 aux ctrl reg 

6149   FE0B A9 FE       	LDA #%11111110	;sets CA1/2, CB1/2 modes 
						; CA2/CB2 manual H, CB1 pos 
						; trig. CA1 negative trig. 
						;CA1=Restore 
						;CA2=cassette motor 
						;CB1=user port 
						;CB2=user port 
6150   FE0D 8D 1C 91    	STA D1PCR	;VIA1 periph ctrl reg 

6151   FE10 A9 DE       	LDA #%11011110	;sets CA1/2, CB1/2 modes 
						; CB2 manual L, CB1 pos 
						; trig. CA1 negative trig. 
						; CA2 manual H 
						;CA1=cassette read 
						;CA2=*SCLK 
						;CB1=*SRQIN 
						;CB2=*SDATAOUT 
6152   FE12 8D 2C 91    	STA D2PCR	;VIA2 periph ctrl reg 

6153   FE15 A2 00       	LDX #$00	;DDR all bits IN 
6154   FE17 8E 12 91    	STX D1DDRB	;VIA1/B data dir reg 

6155   FE1A A2 FF       	LDX #%11111111	;DDR all bits OUT 
6156   FE1C 8E 22 91    	STX D2DDRB	;VIA2/B data dir reg 

6157   FE1F A2 00       	LDX #$00	;DDR all bits IN 
6158   FE21 8E 23 91    	STX D2DDRA	;VIA2/A data dir reg 

6159   FE24 A2 80       	LDX #%10000000	;BIT7=OUT, BITS6-0 IN 
6160   FE26 8E 13 91    	STX D1DDRA	;VIA1/A data dir reg 

6161   FE29 A2 00       	LDX #$00			 
6162   FE2B 8E 1F 91    	STX D1ORAH	;VIA1 output reg A MSB 

6163   FE2E 20 84 EF    	JSR SCLK1	;set IEEE clock line=1 

6164   FE31 A9 82       	LDA #%10000010	;enable IER CA1/VIA1 RESTOR 
6165   FE33 8D 1E 91    	STA D1IER	;VIA1 IER BIT7=1 

6166   FE36 20 8D EF    	JSR SCLK0	;set IEEE clock line=0 
6167   FE39             ; 
6168   FE39             ; ENABTM - Enable timers 
6169   FE39             ; 
6170   FE39             ENABTM			 
6171   FE39 A9 C0       	LDA #%11000000	;enable tmr1/VIA2 (IRQ) 
6172   FE3B 8D 2E 91    	STA D2IER	;VIA2 IER BIT7-6=1 

6173   FE3E A9 89       	LDA #%10001001	;$89 IRQ tic divisor LSB 
6174   FE40 8D 24 91    	STA D2TM1L	;VIA2 tmr1 LSB 

6175   FE43 A9 42       	LDA #%01000010	;$42 IRQ tic divisor MSB 
6176   FE45 8D 25 91    	STA D2TM1L+1	;VIA2 tmr1 MSB 
6177   FE48 60          	RTS			 
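
The two bytes loaded at ENABTM set the rate of the system IRQ. The 
arithmetic can be sketched as follows, under two assumptions that do not 
come from the listing: an NTSC CPU clock of 14.31818 MHz / 14, and a 6522 
timer 1 free-running period of the latch value plus two cycles. The function 
name t1_irq_rate is invented for the example. 

```python
NTSC_CPU_HZ = 14_318_180 / 14   # assumption: NTSC VIC-20 clock, ~1.0227 MHz


def t1_irq_rate(latch_lo: int, latch_hi: int, cpu_hz: float = NTSC_CPU_HZ):
    """Interrupts per second for a free-running 6522 timer 1.

    Assumes the period is (latch + 2) CPU cycles.
    """
    latch = (latch_hi << 8) | latch_lo
    return cpu_hz / (latch + 2)


if __name__ == "__main__":
    # $89 and $42 are the values stored to the VIA2 timer 1 registers.
    print(f"{t1_irq_rate(0x89, 0x42):.2f} interrupts per second")
```

With those assumptions the divisor $4289 works out to roughly 60 interrupts 
per second, which fits the one-sixtieth-of-a-second tick the system clock 
is built on. 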

Routine CINT1 
	This final routine initializes the character generator, sets the  
initial screen colors, clears the screen and "homes" the cursor, and  
updates screen and cursor pointers. 

1349   E518  ;================================================== 
1350   E518  ; CINT1 - Initialize I/O	 
1351   E518  ;	 
1352   E518    
1353   E518             ; 
1354   E518             ;Screen reset 
1355   E518             ; 
1356   E518             CINT1 
1357   E518 20 BB E5    	JSR IODEF1	;set deflt I/O and init VIC 
1358   E51B AD 88 02    	LDA HIPAGE	;get screen memory page 
1359   E51E 29 FD       	AND #%11111101	;$FD MS nibble is ChrROM and 
1360   E520 0A          	ASL A		;LS nibble is ChrRAM 
1361   E521 0A          	ASL A			 
1362   E522 09 80       	ORA #%10000000	;$80 
1363   E524 8D 05 90    	STA VRSTRT	;set chargen ROM to $8000 
1364   E527 AD 88 02    	LDA HIPAGE	;get screen mem page 
1365   E52A 29 02       	AND #%00000010	;$02 check for screen RAM at  
1366   E52C F0 08       	BEQ CINT1A	;$E536 $1E page 
1367   E52E              
1368   E52E A9 80       	LDA #%10000000	;$80 screen RAM is at $1E page 
1369   E530 0D 02 90    	ORA VRCOLS	;set Bit7 
1370   E533 8D 02 90    	STA VRCOLS			 
1371   E536              
1372   E536             CINT1A				 
1373   E536 A9 00       	LDA #$00			 
1374   E538 8D 91 02    	STA SHMODE	;enable shift-C= 
1375   E53B 85 CF       	STA BLNON	;start at no blink 
1376   E53D              
1377   E53D A9 EA       	LDA #$EA 	;FIXUP1+34;#$DC 
1378   E53F 8D 8F 02    	STA FCEVAL			 
1379   E542 A9 EA       	LDA #$EA 	;FIXUP1+35;#$EB 
1380   E544 8D 90 02    	STA FCEVAL+1	;shift mode evaluation 
1381   E547              
1382   E547 A9 0A       	LDA #$0A 
1383   E549 8D 89 02    	STA KBMAXL	;key buffer=10 
1384   E54C 8D 8C 02    	STA KRPTDL	;repeat delay=10 
1385   E54F A9 06       	LDA #$06 
1386   E551 8D 86 02    	STA CLCODE	;color=6(blue) 
1387   E554 A9 04       	LDA #$04			 
1388   E556 8D 8B 02    	STA KRPTSP	;repeat speed 
1389   E559 A9 0C       	LDA #$0C			 
1390   E55B 85 CD       	STA BLNCT	;blink timer=12ms 
1391   E55D 85 CC       	STA BLNSW	;set for solid cursor 
1392   E55F             ;				 
1393   E55F             ; Clear screen				 
1394   E55F             ;				 
1395   E55F             CLRSCN				 
1396   E55F AD 88 02    	LDA HIPAGE	;mem page for screen RAM 
1397   E562 09 80       	ORA #%10000000	;$80 
1398   E564 A8          	TAY			 
1399   E565 A9 00       	LDA #$00			 
1400   E567 AA          	TAX			 
1401   E568              
1402   E568             CLRLP1				 
1403   E568 94 D9       	STY SLLTBL,X	;address of screen line 
1404   E56A 18          	CLC			 
1405   E56B 69 16       	ADC #$16	;add 22 
1406   E56D 90 01       	BCC CLRSC1			 
1407   E56F              
1408   E56F C8          	INY			 
1409   E570              
1410   E570             CLRSC1				 
1411   E570 E8          	INX			 
1412   E571 E0 18       	CPX #$18	;all rows done? 
1413   E573 D0 F3       	BNE CLRLP1			 
1414   E575              
1415   E575 A9 FF       	LDA #$FF			 
1416   E577 95 D9       	STA SLLTBL,X			 
1417   E579 A2 16       	LDX #$16			 
1418   E57B              
1419   E57B             CLRLP2				 
1420   E57B 20 8D EA    	JSR CLRLIN	;clear line 
1421   E57E CA          	DEX			 
1422   E57F 10 FA       	BPL CLRLP2			 
1423   E581             ;				 
1424   E581             ; "Home" cursor				 
1425   E581             ;				 
1426   E581             HOME				 
1427   E581 A0 00       	LDY #$00			 
1428   E583 84 D3       	STY CSRIDX	;set column to 0 
1429   E585 84 D6       	STY CURROW	;and row to 0, too 
1430   E587             ;				 
1431   E587             ; Set screen pointers	 
1432   E587             ;	 
1433   E587             SCNPTR	 
1434   E587 A6 D6       	LDX CURROW 
1435   E589 A5 D3       	LDA CSRIDX 
1436   E58B              
1437   E58B             SCNPLP	 
1438   E58B B4 D9       	LDY SLLTBL,X 
1439   E58D 30 08       	BMI SCNPT1 
1440   E58F              
1441   E58F 18          	CLC 
1442   E590 69 16       	ADC #$16 
1443   E592 85 D3       	STA CSRIDX 
1444   E594 CA          	DEX 
1445   E595 10 F4       	BPL SCNPLP 
1446   E597              
1447   E597             SCNPT1	 
1448   E597 B5 D9       	LDA SLLTBL,X 
1449   E599 29 03       	AND #$03 
1450   E59B 0D 88 02    	ORA HIPAGE 
1451   E59E 85 D2       	STA LINPTR+1 
1452   E5A0 BD FD ED    	LDA LBSCAD,X 
1453   E5A3 85 D1       	STA LINPTR 
1454   E5A5 A9 15       	LDA #$15 
1455   E5A7 E8          	INX 
1456   E5A8              
1457   E5A8             SCNLP1	 
1458   E5A8 B4 D9       	LDY SLLTBL,X 
1459   E5AA 30 06       	BMI SCNEXIT 
1460   E5AC              
1461   E5AC 18          	CLC 
1462   E5AD 69 16       	ADC #$16 
1463   E5AF E8          	INX			 
1464   E5B0 10 F6       	BPL SCNLP1			 
1465   E5B2              
1466   E5B2             SCNEXIT				 
1467   E5B2 85 D5       	STA LINLEN			 
1468   E5B4 60          	RTS		; return to init routine 

	CINT1 calls one routine, IODEF1, which resets the default input
and output devices to the keyboard and screen, respectively, and then
resets the VIC chip's registers to their defaults. The one interesting
thing to note is the PANIC entry at $E5B5. The Kernal itself never calls
the PANIC entry! It bypasses the extra JSR by calling the IODEF1 code
directly. Not that there is any magic in the PANIC entry; it only calls
the IODEF1 code and homes the cursor.  I haven't yet examined the
BASIC ROM, so it is possible that BASIC calls or vectors the PANIC entry.

1404   E5B5		;================================================ 
1405   E5B5		; PANIC - Set I/O defaults (unused entry point??) 
1406   E5B5		; 
1407   E5B5             PANIC				 
1408   E5B5 20 BB E5    	JSR IODEF1	;reset devices and VIC regs 
1409   E5B8 4C 81 E5    	JMP HOME	;home cursor 
1410   E5BB           ;				 
1411   E5BB           ; Real PANIC entry; reset default devices				 
1412   E5BB           ;				 
1413   E5BB             IODEF1					 
1414   E5BB A9 03       	LDA #$03				 
1415   E5BD 85 9A       	STA OUTDEV	;reset output to screen	 
1416   E5BF A9 00       	LDA #$00				 
1417   E5C1 85 99       	STA INDEV	;reset input to kbrd	 
1418   E5C3             ;					 
1419   E5C3             ; Initialize 6561 VIC					 
1420   E5C3             ;					 
1421   E5C3 A2 10       	LDX #$10	;move 16 VIC registers	 
1422   E5C5              
1423   E5C5              
1424   E5C5             V6561I					 
1425   E5C5 BD E3 ED    	LDA VICSUP-1,X	;start at end of tbl 
1426   E5C8 9D FF 8F    	STA $8FFF,X	;start at end of regs	 
1427   E5CB CA          	DEX		;decrement index 
1428   E5CC D0 F7       	BNE V6561I	;do next register	 
1429   E5CE              
1430   E5CE 60          	RTS			 

2789   EDE4             ;
2790   EDE4             ;VIC chip video control constants
2791   EDE4             ;
2792   EDE4             VICSUP
2793   EDE4             	.db $05	;bit7 interlace, 6-0 HCenter
                        	.db $19	;VCenter
                        	.db $16	;bit7=video address, 6-0 #cols
                        	.db $2E	;bit6-1=#rows, bit0=8x8 or 8x16 chars
                        	.db $00	;current TV raster beam line
                        	.db $C0	;bit0-3 start of char memory
                        		;bit4-7 is rest of video address
                        ;BITS 3,2,1,0 CM starting address
                        ;             HEX   DEC
                        ;0000   ROM  8000  32768 *default
                        ;0001        8400  33792
                        ;0010        8800  34816
                        ;0011        8C00  35840
                        ;1000   RAM  0000  0000
                        ;1001        xxxx  }
                        ;1010        xxxx  }unavail.
                        ;1011        xxxx  }
                        ;1100        1000  4096
                        ;1101        1400  5120
                        ;1110        1800  6144
                        ;1111        1C00  7168
                        	.db $00	;Hpos of light pen
                        	.db $00	;Vpos of light pen
2794   EDEC             	.db $00	;Digitized value of paddle X
                        	.db $00	;Digitized value of paddle Y
                        	.db $00	;Frequency for oscillator 1 (low)
                        	.db $00	;Frequency for oscillator 2 (medium)
                        	.db $00	;Frequency for oscillator 3 (high)
                        	.db $00	;Frequency of noise source
                        	.db $00	;bit0-3 sets volume of all sound
                        		;bit4-7 are auxiliary color information
                        	.db $1B	;Screen and border color register
                        		;bits 4-7 select background color
                        		;bits 0-2 select border color
                        		;bit 3 selects inverted or normal mode
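
	As a quick cross-check of the table above, here is a small Python
sketch (not part of the Kernal; the function and key names are made up
for illustration).  It decodes a few of the default values using the bit
layouts given in the comments, and lists the addresses the V6561I loop
writes, assuming the VIC registers sit at $9000-$900F as the STA $8FFF,X
implies.

```python
# Default VIC register values copied by the V6561I loop (the VICSUP table).
VICSUP = [0x05, 0x19, 0x16, 0x2E, 0x00, 0xC0, 0x00, 0x00,
          0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1B]

def decode_vicsup(table):
    """Pull a few readable settings out of the 16 default bytes."""
    return {
        "columns": table[2] & 0x7F,              # bits 6-0 of the third byte
        "rows": (table[3] >> 1) & 0x3F,          # bits 6-1 of the fourth byte
        "double_height": bool(table[3] & 0x01),  # bit 0 clear = 8x8 characters
        "char_mem_bits": table[5] & 0x0F,        # 0000 = character ROM at $8000
        "background": table[15] >> 4,            # bits 7-4 of the last byte
        "border": table[15] & 0x07,              # bits 2-0 of the last byte
        "bit3_set": bool(table[15] & 0x08),      # inverted/normal mode bit
    }

def copy_order(count=16, first_register=0x9000):
    """Addresses written by the loop: X runs from 16 down to 1 ($8FFF+X)."""
    return [first_register - 1 + x for x in range(count, 0, -1)]

settings = decode_vicsup(VICSUP)
print(settings)
print([hex(address) for address in copy_order()])
```

The decoded defaults come out as 22 columns by 23 rows of 8x8 characters,
and the loop fills the registers from the last one down to the first.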

Conclusion 
	Next time, we'll examine more routines and answer any questions  
that you may have. 

....... 
.... 
.. 
.                                    C=H #17 

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 

NTSC-PAL fixing, part 1 
=======================
Russel Reed <rreed@egypt.org>, Robin Harbron <macbeth@tbaytel.net>, S. Judd 

Introduction 
============ 

	Just about everyone knows that there are differences between 
NTSC and PAL machines.  Most people are also familiar with at least a  
few of the technical issues, such as extra raster lines, and know that 
these differences can cause certain types of programs to fail, in particular 
games and demos.  But how does one actually go about fixing one of
these programs?  That is what this series of articles is about.
  	"Fixing" has been a job which a number of 64 hackers have taken on,  
in order to enjoy programs written in another country.  Some of the major 
game companies imported software from other countries and sometimes had 
to reprogram the software so it would run correctly.  Fixing is not 
always a simple task, but there are some techniques and strategies which 
are useful in most cases.  There are only a few different classes of 
problems overall. 

	We decided to approach this problem in a novel way: a pair of 
yay-hoos (Robin and Steve) would place themselves under the tutelage 
of an experienced fixer (Russ, a.k.a. Decomp/Style), fix up a program, 
and write up the results and experiences.  The first step was deciding 
on which program (or programs) to fix.  It had to first of all be 
fixable!  Since time is always in short supply it couldn't be a big,  
complicated project.  And finally, since we were just getting our fixing 
feet wet, the actual fixing job needed to be fairly straightforward (so  
the big custom track-loading demo will have to wait for a future article). 
	But demos are the natural fixing candidate, and after viewing  
several different ones, and examining the code to see what would need  
fixing, we finally decided on the demo "Slow Ideas", written by a couple 
of crazy Finns way back in 1989.  It's a cool demo, was challenging and 
fun to fix, and turned out to be pedagogically a good choice.  It's 
two pages, and each page had a number of effects in need of fixing. 
These pages, and the practical side of fixing, will be discussed in 
some detail below, but first we need to review the differences between 
PAL and NTSC, and discuss the implications of those differences, and 
how they manifest themselves in programs. 

 
NTSC/PAL Differences 
==================== 

	There is really just one primary difference between NTSC and PAL  
machines: the graphics, which means VIC.  But since VIC generates the 
machine clock cycles, that means the computers run at different speeds. 
And since they run at different speeds, the CIA timers run at different 
speeds, the SIDs run at different speeds, and the CPUs run at different 
speeds.  So just having a slightly different graphics format affects 
nearly every aspect of the machine's operation. 

VIC and graphics 
---------------- 

	The PAL television standard is different from the NTSC standard. 
The primary difference is actually the way color is encoded, but  
the main issue for the 64 is the frame rate, the number of raster  
lines, and the number of cycles per raster line:  

VIC chip          Frame rate  Raster lines  Cycles per line
6567R56A (NTSC):     60Hz         262             64
6567R8+  (NTSC):     60Hz         263             65
6569     (PAL) :     50Hz         312             63

As the video raster beam sweeps across the television tube, VIC tells  
it what to display, one pixel at a time.  The CPU clock is exactly 
1/8th of the "pixel clock", so that one CPU cycle corresponds to 
eight pixels on the screen.  Thus, it is clear from the above table
that a PAL machine has 63*8 = 504 pixels per raster line.  320 of
those pixels comprise the visible display, while the other 184 make
up the left and right borders and the horizontal blanking interval.

	What is important here are the three numbers in the table, though. 
First consider the number of cycles per line.  If a program is exactly 
synchronized with the raster, then it can make precise changes to the  
screen merely by letting a certain number of cycles elapse.  A simple  
example is making raster bars; a more involved example is opening the side  
borders, or generating an FLI display (which you can read about in previous 
issues of C=Hacking).  Needless to say, programs which require exact cycle  
timings will fail when run on a machine with a different number of cycles  
per line.  (Note that on older computers, like the old Atari 2600, the CPU 
actually built the screen display, and so _all_ the screen code had to  
be exactly timed). 
	Next observe the different number of raster lines.  The visible  
display begins on raster line 50, and there are 200 visible raster lines  
(320x200, you know).  That leaves 13 raster lines for the NTSC border, but  
62 raster lines for the PAL border.  This causes two problems.  Code which
waits for a raster line greater than 262, the last line of an NTSC frame,
will have problems on an NTSC machine.  A busy loop such as

	LDA #$10 	;Wait for line $110 (272)
	CMP $D012 
	BNE *-5 
	LDA $D011 
	BPL *-12 

will never exit, and a raster IRQ set for such a line will never occur.
The code must be adjusted to use a different raster line or another method
of timing.  Sprite graphics on lines greater than 262 will wrap around on
an NTSC screen and in some cases the image will be displayed twice.  The
sprites must either be moved or their images truncated to correct this.
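
	To see why such a loop hangs, here is a small Python model (a sketch
with made-up function names, not real C64 code).  It steps through every
raster line of one frame and checks whether both compares in the loop can
ever succeed at once.  It assumes 263 lines for NTSC and 312 for PAL, as
in the table above.

```python
LINES_PER_FRAME = {"NTSC": 263, "PAL": 312}

def raster_registers(line):
    """Split a raster line into ($D012 value, bit 7 of $D011)."""
    return line & 0xFF, (line >> 8) & 1

def busy_wait_exits(low_byte, high_bit, system):
    """True if some line in one frame satisfies both compares of the loop."""
    return any(raster_registers(line) == (low_byte, high_bit)
               for line in range(LINES_PER_FRAME[system]))

# The loop compares $D012 with $10 and needs bit 7 of $D011 set,
# which is raster line $110 (272 decimal).
for system in ("PAL", "NTSC"):
    print(system, busy_wait_exits(0x10, 1, system))
```

The same check can be run on any target line before choosing a new one
for an NTSC fix.
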
	Moreover, the extra raster lines mean extra cycles per PAL frame. 
Using the table above, an NTSC machine has 263*65 = 17095 cycles per frame,  
and a PAL machine has 312*63 = 19656 cycles per frame.  In many demos and
games those 2561 extra cycles are used to perform needed calculations and
operations before the next frame begins, which leads to all kinds of 
trouble on an NTSC machine.  Note that NTSC machines have more cycles  
available in the _visible_ display, while PAL machines have many more 
cycles available in the borders. 
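
	The arithmetic above is easy to reproduce.  The following Python
sketch (names are illustrative only) takes the lines and cycles per line
from the table and works out the cycles per frame, plus the raster lines
left over after the 200 visible lines that start at line 50.

```python
# (raster lines, cycles per line) for each VIC chip, from the table.
VIC = {
    "6567R56A (NTSC)": (262, 64),
    "6567R8+ (NTSC)": (263, 65),
    "6569 (PAL)": (312, 63),
}

def cycles_per_frame(chip):
    lines, cycles = VIC[chip]
    return lines * cycles

def lines_after_display(chip, first_visible=50, visible=200):
    """Raster lines remaining once the visible display has ended."""
    lines, _ = VIC[chip]
    return lines - (first_visible + visible)

for chip in VIC:
    print(chip, cycles_per_frame(chip), lines_after_display(chip))

extra_pal_cycles = (cycles_per_frame("6569 (PAL)")
                    - cycles_per_frame("6567R8+ (NTSC)"))
print("extra PAL cycles per frame:", extra_pal_cycles)
```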

	Finally, consider the frame rate.  On an NTSC machine, there are  
17095 cycles per frame, and 60 frames per second, giving 17095*60 = 1025700 
cycles per second, or 1.02 MHz.  On a PAL machine, there are 19656*50 = 982800 
cycles per second, or 0.98 MHz.  So although a PAL machine has more cycles 
per frame, the CPU runs slightly slower than on an NTSC machine.  Thus a 
game like Elite, which involves raw computation, runs a little faster 
on an NTSC machine.  But for most games and demos it is the cycles per 
frame which is important -- as long as all game calculations can get done 
before the next frame, the game can run at the full frame rate.  Also 
note that most tunes are synced to the screen, so when a PAL tune, 
designed to play at 50 calls per second, is suddenly called 60 times 
each second, it will play noticeably faster. 
	By the way, there actually aren't _exactly_ 50 or 60 frames per 
second.  The frames per second is actually determined by the machine 
clock rate, not the other way around!  The actual system clock rates are 
14318181 / 14 = 1022727Hz for NTSC and 17734472 / 18 = 985248Hz for PAL. 
Dividing by the cycles per frame (17095 NTSC, 19656 PAL) gives 59.826
frames/second NTSC and 50.125 fps PAL.
	The important thing to remember here is that PAL machines run  
slightly slower than NTSC machines, but have many more cycles per video  
frame. 
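
	For anyone who wants to check those figures, here is a minimal
Python sketch using the crystal frequencies and divisors quoted above
(the dictionary layout and function names are just for illustration).

```python
# (crystal frequency in Hz, divisor, cycles per frame)
SYSTEMS = {
    "NTSC": (14318181, 14, 263 * 65),
    "PAL": (17734472, 18, 312 * 63),
}

def cpu_clock_hz(system):
    crystal, divisor, _ = SYSTEMS[system]
    return crystal / divisor

def frames_per_second(system):
    _, _, cycles_per_frame = SYSTEMS[system]
    return cpu_clock_hz(system) / cycles_per_frame

for system in SYSTEMS:
    print(system, round(cpu_clock_hz(system)),
          round(frames_per_second(system), 3))
```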

	Just as a side note, the above calculation should indicate to you
that although the AC electricity lines are 60Hz in the US and 50Hz in Europe,  
that has nothing to do with the 50/60Hz PAL/NTSC frame rates.  Not only can  
an NTSC monitor easily display a 50Hz signal (let alone 59.826Hz), but the  
actual power frequency fluctuates around that 50/60Hz anyways, so that  
the AC line frequency is only 50/60Hz on _average_ -- good enough to 
run a clock, but not nearly precise enough to generate a video signal. 

	Also note that there are two different NTSC VIC chips in the table. 
The 64 cycles/line VIC was present in the earliest 64s shipped.  This is  
actually a bug in the chip, though, and 65 cycles/line is the "correct", 
not to mention most common, NTSC VIC chip, and the one which we will 
refer to in this article.  Oh?  You don't believe me?  Well, from the  
March 1985 IEEE Spectrum article: 

	In addition to the difficulty with the ROM [sparklies], "I made
	a logic error," Charpentier recalled.  The error, which was  
	corrected sometime after Charpentier left Commodore, caused  
	the early C-64s to generate the wrong number of clock cycles  
	on each horizontal video line.  "It was off by one," he said. 
	"Instead of 65 cycles per line, I had 64." 

	As a result, the 180-degree phase shift between the black-and-white 
	and color information, which would have eliminated color transition 
	problems, didn't occur.  Depending on their color and the color of 
	the background, the edges of some objects on the screen would appear 
	slightly out of line... 

There ya go! 

Machine cycles 
-------------- 

	The tiny CPU speed difference has a number of important ramifications. 
It means that the CIA timers on a PAL machine run slightly slower than on  
an NTSC machine, so timer values may have to be recalibrated; note that the 
system CIA interrupt has a different setting for PAL and NTSC.  Moreover,  
a disk drive runs at exactly 1MHz -- there are no "PAL disk drives" -- which  
means cycle-exact fastloaders will not synchronize correctly on  
different-speed machines.  And finally, it means that SID works differently. 
	Not only will the tempo of any interrupt-based tune (raster _or_ CIA)  
change, but the actual pitch will change as well.  SID generates its waveforms 
by simply updating an internal counter every cycle, so an NTSC SID is  
essentially playing a digital sample at 1.02 MHz.  When that sample is  
played at 0.98 MHz, it's like slowing a record player down a little -- the  
pitch decreases.  To be specific, the _absolute_ pitch changes, but the  
_relative_ pitch between notes does not; the tune plays the same, but 
the pitches are all a little over a quarter-step lower.   
	Practically speaking, this is totally irrelevant to fixing. 
Only the tune speed is of significant interest. 
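
	For the curious, the size of the pitch shift can be worked out from
the two clock rates.  This Python sketch assumes that pitch scales
directly with the system clock, as described above, and expresses the
change in cents (100 cents to a half-step, so a quarter-step is 50).

```python
import math

NTSC_CLOCK_HZ = 14318181 / 14
PAL_CLOCK_HZ = 17734472 / 18

def pitch_shift_cents(clock_from, clock_to):
    """Pitch change in cents when the same SID data runs at a new clock."""
    return 1200 * math.log2(clock_to / clock_from)

ntsc_tune_on_pal = pitch_shift_cents(NTSC_CLOCK_HZ, PAL_CLOCK_HZ)
print(round(ntsc_tune_on_pal, 1), "cents")
```

A negative result means the tune sounds lower; going the other way
(a PAL tune on an NTSC machine) gives the same size shift upwards.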

Fixing the Problems 
=================== 

	The previous section described different classes of problems  
which can occur from the different cycles per line, lines per frame,  
cycles per frame, and cycles per second.  These problems include screen  
syncs, bad interrupts and infinite busy-loops, too many cycles per frame, 
mis-timed timers and fastloaders, and different music tempos. 

	Fixing tunes is easy.  For programs in which the interrupt frequency 
is otherwise unimportant, the interrupt timer source can be adjusted to 
the correct frequency.  In other cases, if the interrupt frequency
cannot be changed, the music speed may be adjusted by calling the play
routine twice on every fifth interrupt (to speed a tune up from 50 to 60
calls per second) or by skipping the call on every sixth interrupt (to
slow it down from 60 to 50).  If perfection is needed, the music data might be adjusted or
the music routine rewritten to work at the different frequency.  Much of 
the time none of these are necessary, as the music sounds fine at the 
different speed anyway.   
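
	The skip/double trick is just a bit of counting.  The Python sketch
below (hypothetical names, nominal 50 Hz and 60 Hz interrupt rates
assumed) shows that skipping every sixth call takes 60 calls per second
down to 50, while doubling every fifth call takes 50 up to 60.

```python
def play_schedule(n_irqs, period, calls_on_nth):
    """Calls made on each interrupt: every `period`-th interrupt makes
    `calls_on_nth` calls (0 = skip, 2 = double), all others make one."""
    return [calls_on_nth if (i + 1) % period == 0 else 1
            for i in range(n_irqs)]

def calls_per_second(irq_hz, period, calls_on_nth):
    """Average play-routine calls per second under that schedule."""
    return irq_hz * ((period - 1) + calls_on_nth) / period

print(play_schedule(12, 6, 0))       # PAL tune on NTSC: skip every sixth
print(calls_per_second(60, 6, 0))
print(play_schedule(10, 5, 2))       # NTSC tune on PAL: double every fifth
print(calls_per_second(50, 5, 2))
```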

	Next consider too many cycles per frame.  In order to fix this  
class of problems, we must improve the efficiency of the code.  There are 
three cases here.  Sometimes busy waits are used to synchronize the code  
with a raster line on the screen.  This wastes cycles and imposes some  
restrictions which result in additional wasted cycles.  These busy waits  
may sometimes be replaced by raster interrupts which make better use of  
the available cycles. 
	At other times, programmers will set up all graphic updates so that 
the code executes in the vertical borders.  NTSC machines have a big  
disadvantage here, as we've seen earlier.  Usually this code can be  
rearranged to take advantage of the available cycles during the screen  
display.  This can be tricky, as you don't want to be updating the screen  
at the same time it is being displayed, but by splitting updates up it can  
usually be pulled off. 
	Finally there are some cases in which neither of these techniques  
is of use.  For a perfect fix, the code must be optimized, often by 
sacrificing memory.  If it can't be optimized, then something has to go; 
effects can be truncated or updates slowed down so that less is updated 
each frame. 

	Next consider the different cycles per raster line.  In most  
cases, this difference is not significant.  The routines for which it is 
significant are raster routines, where synchronization with the video 
chip is established by using exactly the right number of processor 
cycles.  FLI, VSP, and color bars are all affected by this.  Color bars 
will have flicker and look crooked; VSP effects will often be shifted 
the wrong amount; FLI routines usually either repeat the top row of 
graphics all the way down the screen or else don't display at all.  
These problems can all be corrected by adding or subtracting the correct 
number of cycles per raster line.  In most cases, this class of problem is  
actually the easiest to fix. 
	There are several approaches to putting the right number of cycles  
in for the fix.  If the source is available, inserting a NOP instruction 
may be all that is required for an NTSC fix.  Without the source, a 
modified routine may be inserted into some empty memory and the original 
routine bypassed.  Sometimes the code may be shuffled around enough to 
insert a NOP with a machine language monitor.  You can sometimes change 
the opcodes used to use a different number of cycles.  Consider the 
delays that can be added with just two bytes: 

	CMP #$EA	;2 cycles 
	BIT $EA		;3 cycles 
	NOP NOP		;4 cycles 
	INC $EA		;5 cycles 
	CMP ($EA,X)	;6 cycles 

This gives lots of room to work with, assuming the flags aren't important. 
If .X or .Y is unused, then 

	STA $1234  

can be changed to  

	STA $1234,X or STA $1234,Y  

to gain an extra cycle (provided the index register holds 0).  If .Y is known to contain a fixed
value like $FF, the target can be adjusted so that STA $1234 becomes  
STA $1135,Y.  Usually in an FLI routine, you'll see STA $D018, STA $D011, 
and STA $D016 together.  Two of these can be replaced with indexed opcodes 
to add the two extra cycles needed for an NTSC fix. 
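
	Working out the adjusted operand is a one-line subtraction.  Here
is a tiny Python helper (the name is made up) that gives the base address
to use with a known index value, together with the standard 6502 timings
for the two forms of STA.

```python
STA_ABSOLUTE_CYCLES = 4
STA_ABSOLUTE_INDEXED_CYCLES = 5   # always 5 for a store, page cross or not

def indexed_operand(target, index):
    """Base address such that base + index lands on `target`."""
    return (target - index) & 0xFFFF

print(hex(indexed_operand(0x1234, 0xFF)))   # STA $1234 with .Y=$FF
print(hex(indexed_operand(0xD017, 0xFF)))   # a VIC register with .Y=$FF
print(STA_ABSOLUTE_INDEXED_CYCLES - STA_ABSOLUTE_CYCLES, "extra cycle")
```
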
	Also note that when sprites are active, a different number
of cycles may get stolen depending on which instruction is executing 
when the sprite data is read.  C=Hacking #3 discusses this in detail, 
and explains how it may be used to synchronize code with the raster beam. 
From a fixing standpoint, it means that you don't always want to add 
two instruction cycles to convert a piece of PAL code to NTSC; sometimes, 
as in the demo below, only _one_ cycle must be added to fix certain routines,  
with the other cycle being eaten by VIC. 

	Sprites have a few differences between PAL and NTSC as well; like  
the other differences they are not evident in many programs.  The horizontal 
sprite positions start at zero on the left side of the screen and 
increase as you move to the right.  At some point past position 300, the 
positions wrap back around to the left side of the screen.  For these 
high horizontal positions, the sprites are 8 pixels farther right on a 
PAL screen than they would be on an NTSC screen.  When these high 
positions are used to create sprites that extend all the way to the left 
edge of the visible screen, problems show up.  A PAL routine will have 
an eight pixel wide gap in the sprites on an NTSC machine while an NTSC 
routine will have eight pixels of overlap on a PAL machine.  Often these 
positions are stored in a table or loaded into a register with immediate 
addressing.  In this case it is trivial to adjust the value by eight in 
the appropriate direction.  The vertical sprite registers present similar 
behavior, which is seen even less often. 

	As was stated earlier, the 65xx processors in the 64 and 1541 run  
at close to the same frequency, but the ratio of the 64's processor  
frequency to the 1541's processor frequency is not exactly 1.0, and the 
ratio is different on PAL and NTSC systems.  This means that although the  
6510 in the 64 and the 6502 in the 1541 may be synchronized at the start of  
a section of code, after executing the same number of cycles they will no 
longer be synchronized.  Cycles may be added or subtracted on either 
processor to bring them back into sync, but this adjustment varies 
depending on either a PAL or NTSC system.  This is most often seen in 
fastloaders, where the code depends on the two processors being in sync 
in order to transmit 8 bits one at a time or four pairs of two bits 
without the need for handshaking.  As with raster routines, a few cycles 
need to be added or subtracted for a fix.  Finding the right place to 
insert or remove these cycles can be challenging.  Instead of trying to 
remove cycles from the 64 routine, which may be impossible, you can instead 
e.g. add cycles to the complementary drive routine. 
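
	To get a feel for how quickly the two processors drift apart, here
is a rough Python sketch.  It assumes the nominal clock rates quoted
earlier and a drive running at exactly 1 MHz; real hardware will vary a
little, and the function name is purely illustrative.

```python
C64_CLOCK_HZ = {"NTSC": 14318181 / 14, "PAL": 17734472 / 18}
DRIVE_CLOCK_HZ = 1_000_000   # nominal 1541 clock

def drive_cycle_drift(system, c64_cycles):
    """Drive cycles gained (+) or lost (-) relative to the 64 while the
    64 executes `c64_cycles` cycles."""
    elapsed_seconds = c64_cycles / C64_CLOCK_HZ[system]
    drive_cycles = elapsed_seconds * DRIVE_CLOCK_HZ
    return drive_cycles - c64_cycles

for system in ("NTSC", "PAL"):
    print(system, round(drive_cycle_drift(system, 100), 2))
```

Over 100 cycles on the 64 the drive falls a bit over two cycles behind
on NTSC and gets about one and a half cycles ahead on PAL, which is why
the same fastloader needs different padding on each system.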

  	Finally, understanding what problems may arise and what should be  
done to correct them is only part of the skillset needed by a good fixer.   
In the best case, you'll be working with source code which you've written 
and understand.  In other cases, someone else may have written the code, 
in which case you must study and understand it before you can start 
trying to fix any NTSC/PAL problems.  Often the source code isn't even 
available, leaving you to work in a machine language monitor.  This 
imposes some new constraints.  In a monitor, it can be difficult to shift  
blocks of code around.  Several techniques are available to work around  
this.  Code can easily be left out by replacing it with NOP instructions  
or inserting JMPs to branch around sections which aren't needed.  Extra  
code can be patched in with a JMP instruction to the new code and another  
JMP instruction at its end to return to the old routines.  Where cycles  
need to be added, opcodes may be changed; for example, a LDA $ABCD might  
be replaced with LDA $ABCD,Y to add an extra cycle.  Finally, you may 
simply have to resort to a symbolic disassembly (i.e. disassemble into 
source code).  This may become an absolute necessity in some cases. 

	Then there are the cases where the object code isn't even readily 
accessible.  Some of the same skills used by crackers to remove copy 
protection are valuable.  Most commonly, games, tools, and demos will be 
compressed to make the files shorter, adding a single BASIC SYS command 
line as well.  These routines rarely try to be deceptive and are fairly 
short, so they can be undone, often just by modifying the existing 
decompression code a bit.  It is common practice to have a sequence 
cruncher on top of an RLE (or equal-byte) packer or linker.  Within the 
"scene", there will be intros to wade past, while commercial software 
will often have disk or tape copy protection and purposefully obscure code.

	Just remember that fixing is often about creativity and diligence. 
There is a common set of problems encountered and many creative solutions  
to correct them.  With practice, it becomes easier, just like anything else. 
You'll have an easier time of it if you pick your battles carefully.   
Remember that some problems just can't be fixed perfectly.  If you get  
stuck, try a different challenge and perhaps come back to the problem  
later.  And most of all, remember to have some fun! 

	With all this in mind, let's have a look at that demo! 

 
Slow Ideas, page 1 
------------------

Reconnaissance:

	Page 1 is divided into basically three parts.  The upper part 
of the screen contains a stretching sprite tech-tech extending into 
the borders, overlaid on top of raster bars.  Then the rest of the  
screen features a large Pu-239 picture/logo.  Finally, a bouncing 
sprite scroll takes place in the lower portion of the screen, extending 
into the lower (but not side) borders. 
	The screen is built using two interrupt routines, one located 
at $1240 and the other located at $1280.  The $1240 routine is short 
and occurs at the bottom of the screen (raster $F8 or so).  It removes 
the upper/lower borders, performs some calculations, and calls the 
music.  The $1280 routine basically controls the screen, and occurs
at raster line $xx, at the top of the screen.  It generates the tech-tech  
sideborder display, and performs most of the calculations.  Both routines 
are vectored through $0314, not $FFFE. 
	Fortunately, there is a lot of distance between different routines,  
that is, there is a lot of empty memory between routines -- probably it was  
coded in an ML monitor.  This means that adding patches, or shifting  
routines around, is much easier.  What a forward-thinking guy that 
Pasi was, to realize that it would need to be fixed by C=Hacking one 
day. 

	When run, the demo is a mess.  The most prominent defects are 
that the music plays very slowly, the screen flashes, with the main 
picture flickering between the top of the screen and the lower portion 
of the screen, the side borders are not open in the tech-tech, and 
sometimes the scrolling sprites are off the screen. 

Too many cycles: 

	The half-speed music and flickering screen indicate that interrupts  
are getting skipped, which points a finger straight at too many cycles -- if 
the next interrupt is set *after* it is supposed to occur, then a whole 
frame will of course pass by before it actually occurs. 
	The first question is, where are the cycles getting eaten?  That is, 
of the two interrupt routines, which is using too many cycles?  The answer
is immediately obvious: any interrupts that take place totally on the screen  
have *extra* cycles available -- 2 cycles per raster line.  The loss of cycles 
comes in the lower border, which means the $1240 interrupt: 

	jsr tune 
	jsr blah1 
 	jsr blah2 
	LDA $D012 
	CMP #$1E 
	BCC *-7 

First note that curious $D012 code.  It surely is meant to compare with 
line $011E, not line $1E.  Line $011E doesn't ever occur on an NTSC machine, 
though, so it actually waits until line $1E, well past the $1280 interrupt. 
So the first order of business is to BIT that BCC out of existence. 
	Alas, this still does not fix up the flickering.  The next step 
is to BIT ($2C) out the JSR tune call, to see if the problem really is 
cycles.  And sure enough, ditching the tune gives a suddenly stable, or 
at least mostly stable, screen.  BITting out the other two subroutines, 
but keeping the music, gives an unstable screen; the tune simply has to  
go somewhere else. 
	Finding extra cycles takes a bit of work; for now, it is enough to 
$2C-BIT out the JSR TUNE and focus on the other problems. 
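
	"BITting out" an instruction just means overwriting its opcode
byte with a BIT opcode of the same length, so the operand bytes get
swallowed as BIT's operand.  The Python sketch below illustrates the
idea on a byte string.  The tune address is not given above, so the
JSR $1234 is a placeholder, and the helper name is made up.

```python
BIT_ZERO_PAGE = 0x24   # two-byte BIT, replaces a two-byte instruction
BIT_ABSOLUTE = 0x2C    # three-byte BIT, replaces a three-byte instruction

def bit_out(code, offset, length):
    """Return a copy of `code` with the opcode at `offset` replaced by a
    BIT opcode of the same instruction length."""
    patched = bytearray(code)
    patched[offset] = {2: BIT_ZERO_PAGE, 3: BIT_ABSOLUTE}[length]
    return bytes(patched)

# Placeholder bytes: JSR $1234 ($20 $34 $12) followed by BCC *-7 ($90 $F9).
original = bytes([0x20, 0x34, 0x12, 0x90, 0xF9])
no_jsr = bit_out(original, 0, 3)
no_bcc = bit_out(original, 3, 2)
print(no_jsr.hex(" "))
print(no_bcc.hex(" "))
```

Keep in mind that BIT still changes the flags and still reads the address
it is pointed at, so this is only safe when neither of those matters.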

Tech-tech: 

	Let's have a look at the tech-tech routine: 

	$1280	LDA #$96 
		STA $DD00 
		LDY #$FF 
		BIT $EA 
		NOP 
		NOP 
		LDX #$09 
		DEX 
		BNE *-3 
	$1290	LDX #$5F 
	$1292	LDA $1000,X 
		STA $D018 
		DEC $D016	;Open border 
		STA $D021 
		INC $D016 
		LDA $1060,X 
	$12A4	STA $D017 
		STY $D017 
		LDA $1100,X 
		STA $D011 
		DEX 
	$12B1	LDA $1000,X 
		STA $D018 
		DEC $D016	;Open border 
		STA $D021 
		INC $D016 
		LDA $1060,X 
		STA $D017 
		STY $D017 
	$12C9	BIT $EA 
		NOP 
		DEX 
		BPL $1292 

The routine has three parts: the initial delay at $1280, and a two-part 
loop, the first part at $1292 and the second at $12B1.  The difference 
between the two is the LDA $1100,X STA $D011 at $12AA; without two 
parts to the loop, the branch would take too many cycles.  Each part will 
need fixing, since each part uses at least one raster. 
	Opening the side borders requires exact timing, yet the above 
routine is entered through $0314, which will always have some cycle 
variance.  The trick here is that all eight sprites are active; Pasi 
himself wrote a nice article in C=Hacking #3 on using sprites to 
synchronize the raster.  The basic idea is to get the CPU to wait on 
a specific instruction; VIC then frees up the bus on a specific cycle, 
and you know exactly where you are on the raster line. 
	First to fix is the initial line delay.  Since there are +2 extra 
NTSC cycles per raster line, it seems reasonable to first try adding 
+2 cycles to the delay: 

	1287 BIT $EA	to	AND ($00),Y	;+2 cycles 
	     NOP		NOP 
	     NOP		NOP 

Really, CMP ($EA),Y is in general a better choice since it doesn't 
affect .A, but it doesn't matter here and we were young and naive, 
besides.  If the above +2 cycles aren't enough, it will be clear 
soon enough (but it turns out they are enough). 
	Next up is the borders.  Although the immediate impulse is 
to add +2 cycles per raster line -- +2 cycles to both loop parts -- 
it's not obvious where the raster sync is taking place, and +2 cycles 
might cause the sync to fail.  In fact, what's needed is just +1 cycle  
per part: 

	12A4 STA $D017	to	STA $CF18,Y	;+1 cycle 

	12C9 BIT $EA	to	INC $00EA	;+1 cycle 
	     NOP 

The $12A4 fix uses the fact that .Y=$FF.  Three NOPs would have done
the trick at $12C9 and been safer, but we used INC in our reckless youth.
	And suddenly -- poof!  The borders open up and the screen stops 
flickering.  Why should the screen flicker at all?  Remember that the 
loop is split in two because of an STA $D011 instruction.  This STA 
pushes badlines off, so that the timing stays precise; since badlines 
never occur, the graphics data is never fetched.  It's only after the 
rasterbars that VIC starts fetching graphics data; without the $D011 
push, this data will appear on the first visible rasterline (hence the 
earlier flicker with too many cycles).  When an imperfect (from incorrect  
cycle timing) push takes place, the picture can get the jitters.  So 
now we know why the demo behaved as it did, earlier. 
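
	The cycle bookkeeping for the three tech-tech patches can be
double-checked with a small table.  This Python sketch uses the nominal
6502 byte and cycle counts; it does not model page-crossing penalties or
cycles stolen by VIC, so treat the numbers as the intended deltas rather
than what the hardware necessarily does.  All names are illustrative.

```python
# Nominal (bytes, cycles) for the instructions involved.
TIMING = {
    "BIT zp": (2, 3),
    "NOP": (1, 2),
    "AND (zp),Y": (2, 5),
    "STA abs": (3, 4),
    "STA abs,Y": (3, 5),
    "INC abs": (3, 6),
}

def size_and_cycles(instructions):
    """Total bytes and cycles of a straight-line instruction sequence."""
    total_bytes = sum(TIMING[name][0] for name in instructions)
    total_cycles = sum(TIMING[name][1] for name in instructions)
    return total_bytes, total_cycles

def patch_delta(old, new):
    """(extra bytes, extra cycles) when `old` is replaced by `new`."""
    old_bytes, old_cycles = size_and_cycles(old)
    new_bytes, new_cycles = size_and_cycles(new)
    return new_bytes - old_bytes, new_cycles - old_cycles

PATCHES = {
    "$1287": (["BIT zp"], ["AND (zp),Y"]),
    "$12A4": (["STA abs"], ["STA abs,Y"]),
    "$12C9": (["BIT zp", "NOP"], ["INC abs"]),
    "$12C9 (three NOPs)": (["BIT zp", "NOP"], ["NOP", "NOP", "NOP"]),
}

for where, (old, new) in PATCHES.items():
    print(where, patch_delta(old, new))
```

Every patch keeps the byte count unchanged, which is what makes it
possible to apply them in place from a monitor.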

Sprite scroll: 

	And yet... the screen still flickers, when sprites bounce down 
too low.  Clearly the sprites are eating into the cycles needed by the 
lower border routines.  The simplest way to fix this is of course to 
change the y-coordinates of all the sprites.  One option is to re-do 
all the coordinate tables.  But a much easier option exists: figure 
out which code stores the sprite coordinates, and subtract a fixed 
amount from each of the y-coordinates.  This routine just happens 
to be located at $1590, and the simple insertion of code 

	15AF STA $D001,Y to	CLC
	     ...		SBC #$0E	;carry clear, so subtracts $0F
				STA $D001,Y

fixes things up just dandy. 
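
	One detail worth spelling out: SBC subtracts an extra 1 when the
carry is clear, so CLC followed by SBC #$0E moves each sprite up by 15
lines, not 14.  Here is a minimal Python model of binary-mode SBC
(decimal mode is ignored, and the function name is made up) to show it.

```python
def sbc(accumulator, operand, carry):
    """6502 SBC in binary mode: A - operand - (1 - carry).
    Returns the 8-bit result and the new carry (1 = no borrow)."""
    result = accumulator - operand - (1 - carry)
    return result & 0xFF, 1 if result >= 0 else 0

print(sbc(0x64, 0x0E, 0))   # CLC then SBC #$0E
print(sbc(0x64, 0x0E, 1))   # SEC then SBC #$0E, for comparison
```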

Still too many cycles: 

	We still have the problem of what to do about the tune.  Since 
there aren't enough cycles in the $1240 interrupt, they need to be 
found somewhere else, and the only somewhere else is the $1280 interrupt, 
during the logo display.  The first thing to figure out is how many 
lines are needed, and how many lines are free.  This is easy enough, 
by simply moving the JSR TUNE to the end of the $1280 routine, 
sandwiched between an INC $D020 and a DEC $D020.  The border will 
then indicate the end of the $1280 interrupt as well as the size 
of the tune. 
	The good news is that $1280 has a fair amount of extra 
cycles available.  The bad news is that the music is fairly inefficient, 
and needs even more cycles.  But not many more -- just a good 8-12 raster 
lines.  $1280 is pretty large, so maybe by rewriting some code enough 
cycles can be gained to make it all work.  Note that if $1280 only 
had a few raster lines to spare, this task would be much more difficult 
(if not impossible). 

	Towards the end of the $1280 routine, there are a series of 
subroutine calls.  One of them is a JSR $1E50.  This code has two loops,  
one which copies values from a table at $1900 to a table at $1000, and  
one which ORAs a value into the $1000 table.  Instead of copying 
and then ORAing, why not just combine the two loops? 

1E50 	LDY $1FC6 
1E53 	LDX #$5C 
1E55 	LDA $1900,Y 
1E58 	STA $1001,Y 
1E5B 	DEY 
1E5C 	DEX 
1E5D 	BPL $1E55		LDY $1FC6 
1E5F 	NOP 
1E60 	LDA $1FC1 
1E63 	STA $1E70 
1E66 	LDA $1FC2 
1E69 	STA $1E6F 
1E6C 	LDX #$5F		LDX #$5C 
1E6E 	LDA $32B6 
1E71 	ORA $1000,X		ORA $1900,Y 
1E74 	STA $1000,X		STA $1001,X 
1E77 	INC $1E6F 
1E7A 	BNE $1E8B 
1E7C 	INC $1E70 
1E7F 	LDA $1E70 
1E82 	CMP #$34 
1E84 	BNE $1E8B 
1E86 	LDA #$30 
1E88 	STA $1E70 
1E8B 	DEX			DEY 
1E8C 	BPL $1E6E		DEX 
1E8E 	RTS			BPL $1E6E 
1E8F 	BRK 			RTS 

On the right are the patches we added, along with replacing the JSR $1E50 
with JSR $1E5D.  Instead of copying from $1900 to $1000 and then ORAing  
into $1000, it simply ORAs with $1900 and stores the result in $1000 ($1001  
actually, since that's where the first loop stored stuff).  Sharp-eyed readers may  
have noticed that the patch affects $1001-$105D, whereas the second loop  
affected $1000-$105F; doesn't the patch above lose some bytes?  Of course  
it does, what's your point? :).  The ancient hacker technique applies 
here: try it, and if it works, don't touch it and don't ask questions! 
Better than a huge rewrite of self-modifying code. 
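
	For the curious, the bytes that go missing are easy to list.  This is
only a sketch of the address ranges named above (the helper name is made
up); it says nothing about what those bytes are used for in the demo.

```python
def touched(base, first_x):
    """Addresses written by 'STA base,X' as X counts down from first_x to 0."""
    return {base + x for x in range(first_x, -1, -1)}


original = touched(0x1000, 0x5F)  # old second loop: LDX #$5F ... STA $1000,X
patched = touched(0x1001, 0x5C)   # merged loop:     LDX #$5C ... STA $1001,X

lost = sorted(original - patched)
print([f"${addr:04X}" for addr in lost])
```

Three bytes no longer get the ORA treatment: one at the start of the table
and two at the end.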

Wrapping up: 

	With the above fix in place, and the tune moved to the $1280 
interrupt, the demo finally seems to work great.  All that remains is 
to save it and crunch.  Figuring out which areas of memory are used 
is easy enough, by looking at the disassembler and the initialization 
code (which unpacks the code further).  But after crunching -- uh oh, 
lockup.  A program freeze shows that it is still running, but that the  
interrupts are not occurring.  A glance at the setup code shows that $D012  
is set, but $D011 is never set -- presumably the high bit is set, so the  
interrupt occurs on a nonexistent raster line.  Adding a simple 

	LDA #$1B 
	STA $D011 

to the initialization routine at $1443 fixes that up just fine, and 
the program decrunches successfully.  Woo-hoo!  One page down and 
one to go. 
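
	The raster compare value is nine bits wide: $D012 holds the low
eight and bit 7 of $D011 is the ninth.  A small sketch of why a stray high
bit kills the interrupt on NTSC; the 263-line figure assumes a 6567R8 VIC,
and the $D012 value is an example only, not the one from the setup code.

```python
NTSC_LINES = 263  # assumption: 6567R8; older 6567R56A chips have 262


def raster_compare(d011, d012):
    """9-bit raster compare line: bit 7 of $D011 is bit 8 of the line."""
    return ((d011 & 0x80) << 1) | d012


def can_fire(d011, d012, total_lines=NTSC_LINES):
    """True if the compare line actually exists on this machine."""
    return raster_compare(d011, d012) < total_lines


d012 = 0x31  # example value only
print(can_fire(0x9B, d012))  # high bit set: line $131 = 305, never reached
print(can_fire(0x1B, d012))  # high bit clear: line $31, fires every frame
```

Passing total_lines=312 models a PAL machine, where line 305 does exist.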

Slow Ideas, page 2 
------------------ 

Reconnaissance:

	Page 2 has essentially four visible parts: a tech-tech at the top of 
the screen, followed by a swinging FALSTAFF sprite on top of rasters and 
open borders, followed by the ubiquitous Pu-239 pic/logo, followed finally 
by a sprite scroll on top of rasters with open side and bottom borders. 
	All this is done with two interrupts, one ($11FE) occurring at 
raster line $31, the other ($1350) occurring at line $D1.  The $31 raster 
performs the tech-tech, the FALSTAFF rasters, and also performs some 
calculations for various sprite scroll effects.  The $D1 raster handles the  
lower rasters and scrolling sprites, and also plays the music, scrolls the  
sprites, and does the calculations for the FALSTAFF sprite. 
	When run, the screen is quite a jumble -- too many rasters -- and 
needless to say, the screen effects need retiming.  The music really sounds  
better at 50Hz, so it needs to be retimed as well. 

Top to bottom: 

	Before fixing the timing problems, the extra cycles need to be 
addressed.  In the $D1 interrupt, after the lower rasters are displayed 
there is a series of subroutine calls at $13B3: 

	$13B3	JSR $211C	;Tune 
		JSR $0F80	;Scroll sprites 
		JSR $1300	;Clear $7Fxx tables (used by JSR $0E00 
				;  in $31 interrupt) 
		JSR $1D00	;FALSTAFF sprite 

Deducing the function of each routine is easy enough -- just BIT it out 
and see what happens.  BITting out the first three subroutines frees up 
enough cycles to make the screen stabilize.  Finding cycles is usually 
more work than fixing up timing -- especially three subroutines worth! -- 
so it is enough to $2C-BIT out the three subroutine calls for now and fix  
up the timing first. 
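
	"BITting out" works because JSR ($20) and BIT absolute ($2C) are
both three-byte instructions, so changing just the opcode byte turns a
call into a harmless read that only touches the flags.  A sketch on a
64K byte array; the helper names are invented, and the three calls are
assumed to sit back to back starting at $13B3 as the listing suggests.

```python
JSR_ABS = 0x20  # JSR $nnnn
BIT_ABS = 0x2C  # BIT $nnnn: same length, but no call is made


def bit_out(mem, addr):
    """Turn the JSR at addr into a BIT by patching only the opcode byte."""
    if mem[addr] != JSR_ABS:
        raise ValueError(f"no JSR at ${addr:04X}")
    mem[addr] = BIT_ABS


def restore(mem, addr):
    """Undo bit_out(): put the JSR opcode back."""
    if mem[addr] != BIT_ABS:
        raise ValueError(f"no BIT at ${addr:04X}")
    mem[addr] = JSR_ABS


mem = bytearray(0x10000)
# The first three calls, operands stored low byte first.
mem[0x13B3:0x13BC] = bytes([0x20, 0x1C, 0x21,   # JSR $211C
                            0x20, 0x80, 0x0F,   # JSR $0F80
                            0x20, 0x00, 0x13])  # JSR $1300
for addr in (0x13B3, 0x13B6, 0x13B9):
    bit_out(mem, addr)
print(mem[0x13B3:0x13BC].hex(" "))
```

Since the operand bytes are untouched, putting the $20 back re-enables a
call without retyping the address.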

Tech-tech: 

	At the top of the screen is a normal tech-tech, controlled by 
the routine at $11FE.  The code flows roughly as follows: 

	$11FE	Set up $D020/$D021, waste a few cycles to get timing right 
		LDX #$08 
	$1214	LDA $xxxx,Y 
		STA $D018 
		LDA $xxxx,Y 
		STA $D016 
		INY 
		DEX 
		NOP NOP NOP 
		Change VIC bank ($DD00) 
		CPX #$00 
		BEQ $123F 
		NOP  
		NOP  
		NOP 
		... 
		CPY #$2F 
		BCC $1214 
		JMP $125E 

The thing to recognize here is that there are two loops -- on every eighth  
line the BEQ $123F branch is taken.  A simple cycle count shows that the  
first loop takes 63 cycles, and the BEQ branch adds an extra 20 cycles: 
obviously, these are simply timed for the normal and badlines.  So all 
that is needed here is to add 2 cycles to each loop: I changed the 
CPX #$00 above to INX DEX for the first loop, and changed a NOP NOP 
to a CMP ($EA,X) in the $123F branch.  With the raster timing correct, 
the next step is the initial timing at $11FE.  By doing a little rearranging 
of code two bytes can be freed up, giving 2-6 cycles to fiddle with: 

	$11FE	LDY #$0B	LDX #$0B 
		LDA #$01	LDA #$01 
		STA $D019	STA $D019 
		LDX #$02	LDY #$02 
	$1207	DEX		DEY 
		BNE $1207	BNE $1207 
		STY $D020	STX $D020 
		STY $D021	STX $D021 
		LDY #$00	NOP NOP 
		LDX #$08	LDX #$08 
	$1214	tech-tech loop 

As you can see, by using .Y in the delay loop instead of .X, the LDY #$00 
instruction becomes redundant.  As it turns out, just +2 cycles are needed. 
Finally, there is the matter of the last raster line.  When the JMP $125E 
is taken, there is some delay before changing the border and background  
registers, to get a nice solid line.  It turns out that four extra cycles 
are needed here, and fortunately there just happen to be several padding 
bytes before $125E.  By changing the JMP $125E to a JMP $125C, two 
extra NOPs are easily inserted.  That takes care of the tech-tech. 
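
	The "+2 cycles" figures above fall straight out of the line
lengths: a PAL raster line is 63 cycles and an NTSC line is 65.  The
sketch below redoes the arithmetic, assuming a badline steals the same
number of cycles on both systems, and checks that each instruction swap
adds two cycles without changing the byte count.  The cycle and byte
values are the standard 6502 ones; the helper names are invented.

```python
PAL_LINE, NTSC_LINE = 63, 65  # CPU cycles per raster line

# PAL timings quoted above: 63 cycles on a normal line, 20 on a badline.
stolen = PAL_LINE - 20             # cycles the VIC takes on a badline
ntsc_badline = NTSC_LINE - stolen  # assumes NTSC badlines steal the same amount
print(stolen, NTSC_LINE - 63, ntsc_badline - 20)

# (cycles, bytes) for the instruction forms involved
COST = {
    "CPX #imm": (2, 2), "INX": (2, 1), "DEX": (2, 1),
    "NOP": (2, 1), "CMP (zp,X)": (6, 2), "LDY #imm": (2, 2),
}


def delta(old, new):
    """(extra cycles, extra bytes) when sequence 'old' becomes 'new'."""
    cycles = sum(COST[i][0] for i in new) - sum(COST[i][0] for i in old)
    size = sum(COST[i][1] for i in new) - sum(COST[i][1] for i in old)
    return cycles, size


print(delta(["CPX #imm"], ["INX", "DEX"]))    # normal-line loop
print(delta(["NOP", "NOP"], ["CMP (zp,X)"]))  # $123F badline branch
print(delta(["LDY #imm"], ["NOP", "NOP"]))    # init code at $11FE
```

INX DEX is a handy replacement for CPX #$00 because it leaves .X alone
and still sets the zero flag for the BEQ that follows.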

Falstaff rasters: 

	Immediately after the tech-tech are the rasters and open borders 
behind the FALSTAFF sprite.  JSR $0EA0 handles this part.  We already  
fixed a sideborder routine in the first page, and this one is similar: 

	$0EA0	LDX #$05		;Initial delay 
		DEX 
		BNE *-3 
		LDX #$15 
		CMP #$EA 
	$0EA9	BIT $EA			;Change to NOP NOP for +1 cycle 
		LDA $xxxx,X 
		DEC $D016 
		STA $D021 
		INC $D016 
		STA $D020 
		LDA $xxxx,X 
		STA $D011 
		NOP 
		NOP 
		NOP 
		DEX 
		BNE $0EA9 
		LDX #$02		;Change to LDX #$03 for last line 
		delay loop 

As before, +1 cycle needs to be added to the border loop, which is 
easy enough to do by changing the $0EA9 instruction.  The end delay 
also needs a little change, to make the last line nice and solid 
(there are no sprites active on the last lines).  Finally, the 
initial delay needs to be changed, to line up the routine correctly. 
I changed a CMP #$EA at $1277, just before the JSR $0EA0 call, to 
NOP NOP.  As usual in this type of thing, it was easy to simply 
experiment with the initial timing until the borders opened up. 
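
	When experimenting with delays like these it helps to know what a
change of one in the LDX value buys.  A rough sketch, assuming the usual
LDX #n / DEX / BNE form for both delay loops (the end-of-routine loop is
not shown in the listing, so that part is a guess):

```python
def delay_cycles(n, page_cross=False):
    """Cycles used by LDX #n / DEX / BNE (back to the DEX), for n in 1..255."""
    if not 1 <= n <= 255:
        raise ValueError("n must be 1..255")
    taken = 4 if page_cross else 3  # taken BNE costs 3, or 4 across a page
    # LDX, n DEXs, n-1 taken branches, and the final branch that falls through
    return 2 + n * 2 + (n - 1) * taken + 2


print(delay_cycles(5))                    # the initial delay at $0EA0
print(delay_cycles(3) - delay_cycles(2))  # LDX #$02 -> LDX #$03
```

So each step in the counter is worth five cycles, which is why the fine
adjustments (+1, +2) are done with instruction swaps like BIT $EA ->
NOP NOP instead.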

Bottom rasters and sprite scroll: 

	Finally, the bottom rasters need fixing up.  Once again, the 
loop looks familiar: 

	  $135D LDY #$FF	 
		LDX #$55 
		CMP #$EA 
		NOP 
		NOP 
		NOP 
	  $1366	LDA $7F02,X 
		STA $CF18,Y			;Stretch if necessary 
		STY $D017 
		LDA $1504,X 
		STA $D011 
		LDA $1C32,X 
		DEC $D016			;Open border 
		STA $D021 
		INC $D016 
		DEX 
		BNE $1366 

And once again, we need to add a cycle.  As with page 1, .Y=$FF means 
that the STA $D011 can be changed to a STA $CF12,Y.  Changing the CMP #$EA  
to EA EA adds the two cycles needed in the init to get everything aligned, 
and poof -- instant open borders (as long as all the sprites are being 
displayed correctly). 
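
	The indexed-store trick is just address arithmetic: with .Y fixed
at $FF, a base address $FF below the target register lands on the same
place, and the indexed form of STA costs one cycle more.  A quick check
(the function name is made up):

```python
def effective(base, index):
    """Effective address of an absolute,Y access (16-bit wraparound)."""
    return (base + index) & 0xFFFF


Y = 0xFF
print(f"${effective(0xCF18, Y):04X}")  # STA $CF18,Y
print(f"${effective(0xCF12, Y):04X}")  # STA $CF12,Y

STA_ABS = 4    # cycles for STA $D011
STA_ABS_Y = 5  # cycles for STA $CF12,Y (stores always take the extra cycle)
print(STA_ABS_Y - STA_ABS)
```
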
	When the scroll is activated, something else becomes apparent: 
as the sprites scroll into the left border, they "pop" to the left by 
several extra pixels, i.e. the screen coordinates change by too much. 
To fix the sprite popping, all that was needed was to add 8 to $0F00 
(the sprite coordinate). 

Too many cycles: 

	Alas, the moment we've been dreading has arrived.  Where in the 
world do we find enough cycles for three subroutines?  One option is 
to get rid of some effects -- for example, the sprite scroller has 
many time-consuming effects that can occur; by getting rid of those, 
the tune can be placed in the $31 raster.  Working through the code 
doesn't reveal subroutines that can easily be rewritten.  Still, it 
sure would be nice to get all the effects going; but where to get the 
cycles? 
	A little thinking suggests something, though -- the PAL 
routine can't have THAT many extra cycles, simply because it doesn't 
get cycles until the raster routine is finished, at the bottom of 
the screen... oh duh.  A glance at the raster routine shows that 
it covers $55 rasters.  The interrupt takes place on line $D1, 
which means that the rasterbars extend all the way down to line  
$0126 = 294 or so, which is over 30 rasters past the last NTSC line. 
So suddenly there's a great big chunk of cycles, for free: 

		(Orig PAL)	(Fixes) 

	  $135D LDY #$FF	 
		LDX #$55	LDX #$35	;Only $35 raster lines 
		CMP #$EA	NOP NOP		;+2 cycles 
		NOP 
		NOP 
		NOP 
	  $1366	LDA $7F02,X	LDA $7F22,X	;Add $20 to compensate for .X 
		STA $CF18,Y 
		STY $D017 
		LDA $1504,X	LDA $1524,X	;Add $20 ... 
		STA $D011	STA $CF12,Y	;+1 cycle to open borders 
		LDA $1C32,X	LDA $1C52,X	;Fix probably not needed 
		DEC $D016			;Open border 
		STA $D021 
		INC $D016 
		DEX 
		BNE $1366 
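
	The numbers in the fix can be double-checked in a few lines.  This
sketch assumes a 263-line NTSC VIC (last line 262) and takes the interrupt
line and table addresses from the listing; the function name is invented.

```python
IRQ_LINE = 0xD1
NTSC_LAST_LINE = 262  # assumption: 263-line NTSC VIC, lines 0..262


def last_raster(irq_line, count):
    """Line reached after 'count' rasters starting at irq_line."""
    return irq_line + count


pal_end = last_raster(IRQ_LINE, 0x55)   # original PAL loop count
ntsc_end = last_raster(IRQ_LINE, 0x35)  # reduced loop count
offset = 0x55 - 0x35                    # table entries skipped at the low end
print(pal_end, pal_end - NTSC_LAST_LINE, ntsc_end, f"${offset:02X}")

# .X now starts at $35 instead of $55, so each table base moves up by the
# difference to keep the first raster line reading the same entry.
for base in (0x7F02, 0x1504, 0x1C32):
    print(f"${base:04X} -> ${base + offset:04X}")
```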

Note that the table offsets need to be increased; without that, the sprites 
start stretching all over the place and everything goes weird.  With $35  
raster lines instead of $55, the rasters go all the way down to the last  
NTSC lines, and free up just enough cycles for two of the $13B3 subroutines  
to be re-enabled: 

	  $13B3 JSR $211C	;Tune 
		JSR $0F80	;Scroll sprites 
		BIT $1300	;Clears $7Fxx tables (sprite scroll) 
		JSR $1D00	;FALSTAFF swing 

That pesky $1300 subroutine just pushes it over the edge, though: 

	$1300	LDX #$2F 
		LDA #$00 
		STA $7F01,X 
		STA $7F2D,X 
		DEX 
		BNE *-9 

It doesn't take *that* many cycles, so one option is to use even less  
than $35 rasters.  But this makes the bottom part look awfully small, and 
obscures parts of the scroll. 
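
	For a rough idea of what $1300 costs, here is a cycle estimate.
It assumes the loop is followed directly by an RTS (not shown in the
listing), that the BNE does not cross a page, and that no cycles are lost
to badlines or sprite DMA, so treat the result as a lower bound.

```python
def clear_loop_cycles(count, with_call=True):
    """Cycles for LDX #count / LDA #0 / STA abs,X / STA abs,X / DEX / BNE."""
    cycles = 2 + 2                   # LDX #imm, LDA #imm
    cycles += count * (5 + 5 + 2)    # two STA abs,X and a DEX per pass
    cycles += (count - 1) * 3 + 2    # BNE taken count-1 times, then not taken
    if with_call:
        cycles += 6 + 6              # JSR plus the assumed RTS
    return cycles


total = clear_loop_cycles(0x2F)
print(total, round(total / 65, 1))   # cycles, and NTSC raster lines' worth
```

That works out to about eleven raster lines, which is small next to the
tune but still more than the $D1 interrupt has left.
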
	If you think about it for a moment, there are some free cycles 
still hanging around elsewhere in the code: in the tech-tech routine! 
There are lots of NOPs inside of the tech-tech loops; enough to piggy-back 
the $1300 routine into it without much effort: 
	 
	$11FE	LDX #$0B	LDX #$0B 
		LDA #$01	LSR $D019	;Still 6 cycles, -2 bytes 
		STA $D019	LDY #$02 
		LDY #$02	DEY 
	$1207	DEY		BNE $1205 
		BNE $1207	STX $D020 
		STX $D020	STX $D021 
		STX $D021	LDX #$08 
	$1210	NOP		NOP 
		NOP		NOP 
	$1212	LDX #$08	LDA $1887,Y 
		LDA $1887,Y	STA $D018 
		STA $D018	LDA $1B87,Y 
		LDA $1B87,Y	STA $D016 
		STA $D016	INY 
		INY		ROL		;Replaces LSR LSR ... 
		DEX		ROL 
		NOP		ROL 
		NOP		AND #$03 
		NOP		STA $DD00 
		LSR		LDA #$00 
		LSR		STA $7F01,Y 
		LSR		STA $7F2A,Y 
		LSR		DEX 
		LSR		BEQ $123F	;Every 8th raster 
		LSR 
		STA $DD00 
		INX 
		DEX 
		BEQ $123F 

Using LSR $D019 instead of LDA #$01 STA $D019 saves 2 bytes (6502 
instructions like LSR use a "read-modify-write" cycle, and write the 
unmodified value back to the register before writing the shifted one; 
the pending interrupt bits read as 1, which is why LSR $D019 clears $D019!) 
while still using 6 cycles.  Replacing the LSR LSR ... stuff with the 
ROL code saves another byte.  Moving the DEX down saves two more bytes, 
and getting rid of the NOPs saves three more, for a total of 8 bytes -- 
just enough for the LDA STA STA which replaces the JSR $1300 code. 
The new code uses two fewer cycles, however -- but by rearranging the 
initialization code we can just branch to a NOP at $1211 above. 
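
	It may not be obvious that three ROLs and an AND #$03 do the same
job as six LSRs, so here is a brute-force check over every accumulator
value and both carry states.  This is a plain Python model of the two
instruction sequences, nothing more.

```python
def lsr6(a):
    """Six LSR A instructions: bits 7-6 end up in bits 1-0."""
    for _ in range(6):
        a >>= 1
    return a


def rol3_and3(a, carry):
    """ROL A three times (rotating through carry), then AND #$03."""
    for _ in range(3):
        a = (a << 1) | carry
        carry = (a >> 8) & 1
        a &= 0xFF
    return a & 0x03


same = all(lsr6(a) == rol3_and3(a, c) for a in range(256) for c in (0, 1))
print(same)
```

The incoming carry ends up in bit 2 after the third ROL, where the AND
throws it away, so no CLC is needed.
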
	Every eighth raster the branch is made to $123F.  This routine 
performs another INY, which has the effect of skipping every eighth entry  
in the $7Fxx tables.  Thus extra STAs need to be added: 

	  $123F	BIT $EA		BIT $EA 
		CMP ($EA,X)	STA $7F02,Y 
		NOP		STA $7F2B,Y 
		NOP 
		... 

	The two STA xxxx,Y each take 5 cycles, so 10 cycles exactly replaces  
the 10 cycles used by CMP ($EA,X) NOP NOP.  But since the STAs take two more  
bytes, the rest of the routine needs to be moved forwards; luckily, there  
are empty padding bytes immediately following the routine, so it moves 
forwards without a hitch... 
	Well, maybe with one hitch.  The whole tech-tech routine uses 
self-modifying code all over the place.  The modifying instructions 
immediately follow the routine, and so need to be adjusted to the 
new locations. 
	Finally, for some strange reason, the above fixes also affect 
the FALSTAFF rasters.  Adding an extra +2 cycles to the initial FALSTAFF 
delay fixes that up, though. 
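
	The byte and cycle bookkeeping for the piggy-backed clear can be
tallied up as follows.  The costs are the standard 6502 ones; the
sequences are read off the listings above, leaving out the STA $DD00 and
BEQ that both versions share, and the names are invented for the sketch.

```python
# (cycles, bytes) per instruction form
COST = {
    "LDA #imm": (2, 2), "STA abs": (4, 3), "LSR abs": (6, 3),
    "LSR A": (2, 1), "ROL A": (2, 1), "AND #imm": (2, 2),
    "NOP": (2, 1), "INX": (2, 1), "DEX": (2, 1),
    "STA abs,Y": (5, 3), "CMP (zp,X)": (6, 2),
}


def tally(seq):
    """Return (cycles, bytes) for a list of instruction forms."""
    return (sum(COST[i][0] for i in seq), sum(COST[i][1] for i in seq))


old_ack = tally(["LDA #imm", "STA abs"])  # LDA #$01 / STA $D019
new_ack = tally(["LSR abs"])              # LSR $D019

old_tail = tally(["DEX"] + ["NOP"] * 3 + ["LSR A"] * 6 + ["INX", "DEX"])
new_tail = tally(["ROL A"] * 3 + ["AND #imm", "LDA #imm"]
                 + ["STA abs,Y"] * 2 + ["DEX"])

old_branch = tally(["CMP (zp,X)", "NOP", "NOP"])  # $123F before
new_branch = tally(["STA abs,Y"] * 2)             # $123F after

print(old_ack, new_ack)
print(old_tail, new_tail)
print(old_branch, new_branch)
```

The two bytes saved by LSR $D019 pay for the two extra bytes in the loop
tail, the loop comes out two cycles short (hence the branch to the NOP),
and the $123F branch keeps its 10 cycles but grows by two bytes.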

	So much for extra cycles! 

The Tune: 

	Finally, the tune really does sound better at 50Hz, so a 5/6 
delay needs to be added.  There is some free memory at $09xx which is 
ideal: 

	$09E0	DEC $09FF 
		BEQ *+3 
		JMP $211C	;Play tune 
		LDA #$06	;Skip tune every 6th frame 
		STA $09FF 
		RTS 

As the comments say, it simply skips every sixth frame of the tune, to 
give it an effective play rate of 50Hz.  It works! 
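
	A countdown like this skips one call in every N, where N is the
reload value: a reload of 6 plays five frames out of six (50 per second
on a nominal 60Hz machine), while a reload of 5 would play four out of
five.  A small simulation, with the counter assumed to start at the reload
value and the function name invented for the sketch:

```python
def frames_played(reload, frames):
    """Simulate the DEC / BEQ / reload countdown over a number of frames."""
    counter = reload
    played = 0
    for _ in range(frames):
        counter = (counter - 1) & 0xFF
        if counter == 0:
            counter = reload  # reload the counter; the tune is skipped
        else:
            played += 1       # JMP to the tune routine
    return played


print(frames_played(6, 60))  # reload of 6: calls to the tune per 60 frames
print(frames_played(5, 60))  # reload of 5, for comparison
```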

Glitches: 

	Unfortunately, there are still a few glitches in the fix.  The 
bottom rasters have some flicker in the upper right (and occasionally upper- 
left) corners.  Pretty minor stuff, but still annoying.  Fixing means 
rewriting the raster loop. 

	Much worse, though, is that the screen will glitch badly when 
certain sprite scroll effects are activated.  This glitch seems to involve 
quite a lot of rasters, since even with the tune disabled glitches are 
still evident.  One choice is to reduce the number of bottom rasters, 
but that doesn't look very good.  Another is to eliminate effects, but 
that's no good.  A third is to do a big code rewrite -- bleah.  But the 
easiest is to just call it a 95% fix and say "good enough".  Good enough. 

	And there it is -- Our First Fix.  Hope u lik3 1t, d00dZ!  Next  
time, we'll have a look at some of the routines which weren't covered this  
time, such as fastload and FLI routines.  Until then, have fun fixin! 

....... 
.... 
.. 
.                                    C=H #17 

begin 644 slowideas-ntsc 
M`0@+".\`GC(P-C$```!X+##0Y@&B4[WP")W_`<K0]Z*QO4`(G?8`RM#WH%;* 
MO9I=G8CXBM#VSC$(SBX(B-#M3!4!@`*-``CF^M`"YOM@I/BB`R`O`H7XF*(% 
M("\"(/D`H`"8H@,@+P+%^-#L(!$"A2U*T$`@)0)*D$@@)0)*D,[((!$"A2W) 
M@)`+H@$@)@*%+2`1`J@@$0*JO3("X""0!HJB`R`F`J8MZ"#Y`,K0^HC0]_"L 
M(!$"R?_P).D`H@`@+P*%+B`C`F7ZIBV%+:7[Y2Z%+NBQ+2#Y`,C*T/?PTZDW 
MA0$L,-"E^H4MI?N%+EA,0Q1(K3&D[@("T`/N`P(JA?=H8.B*!O?0`R```I`2 
MZ.`(T/+P"Z('Z`;WT`,@``(JRM#U&&``_R`#`00,$`(?,*#P"`]`@`8'L`HD 
M8"[@&3]*<+;`3V_UA_)G\!>0DGS^DDZ/S]-W3I-]^)NJ8#A@0$H_UAG0"VQE 
MA\`/P&)M)PGCPIN1\?/CXG?OA&4_/R+Y-$\^M-A/)/?WXG`"!@X>/IB>?>GO 
MX)43(8#@"C]P&`/QF9FQTXQV!G;%7Q#YB/_G3C/&1\8ST[J$A$VD]^%-E-A- 
MXT*"%9-Z462:(>$>B/!-$<R/)'<CJ3WZ_S]3D_D0">R:<<"GA@>'PJ:`,//W 
MY5\)H!@*P7,-//7S]U64H__Y%2MSD5?Z.]9("=`8,&!"=?R,7X5>@2<ING@F 
MX>$!P)J'J"0Y34Y!TYKT%!3T5)G]!@+PG0>,6%.G7@,!G(F\0>GQ,$!#IZ&R 
M>])HMDA9EP@33HP<$]]U3DM2P"`P&`P",!(SX0=_P!<IWZ)"44G`GH$K#?0: 
MIS7(F\W.`9D(J_T2-/04]/83$>`<@9\?`0,>```IBQ/CHB(A6/@Q,2$@<)TF 
MRLWD3=5DW)N2R^$%7GYK,X%LK(WDZ?C\/`TZ0Z++H^/#`3P`0=9>'AP8:PX% 
M9#)TI$):>"-9/0!6#N/OZ<C^?P'PGF)P@3@%50G,IP$9Z<[;J<!TC/R&<M,= 
M.83I2^2U,14V*P-_#Y82A^^."RZ/@\6#)_#X.G@!@\.LNC/6F``K!+@.#P>$ 
MZ!Q\^>%ET+4]/Q`VGI`'ZE?TWC38_OY4?((DP?4,XZN1G2I2@0*$2R:'!7/" 
MP`%>\5[ZAUK#KF+L?5Y7<K/C\MHU3\<'P>'J0E)4_^[`#7A'9V9F9V=CX]KQ 
MLQV9]-RU95O(<#@Y7C;A:Q['_30KRN%TX5N=TP0&M,$`/K).W$JZ,5H0M(]H 
MH'(*>JHA^N&TPK5PNR/<E*M+,08<??GA3*KT'*];KJC"<&%ZD*G%=S6\\.NX 
MIAW<'<.ZJ!__EL2J+5%"E;!P='BB"<>%@T.+"D\'UD8:\=[$37OF`5YY4JJ] 
M0!^:?])C^KS'\^H,9]/^/H[O]UE_^*]P)N\>]H2LR6T%1R\LDM/ZNZ#/@!_Z 
M]>$`KWNQ&G!^/UE<V.4LJ\G__HIAP\>7&.ZS)(>6'JE9!T)+8V?EEO+9.40: 
MWDKO6ZCB.6V$MC=A^?V=VY3+1HX61;MRA`]@`KLMMY2%KRLH*?VMBS4_<9%: 
M[BZAI`UI=:LK&H!W5[[,@%'`_K+F+599+BW$+:V."EI=P,K+;F8<!<W,M>%B 
MEN'131>*3-CHMLI;$P?H063$MK=1&_/"ZQ!UQ;AL'[,#MR:J==$.Y/]QJ2^& 
M"O0/!*\LA`KRN[I,!!7D8"%QHX!JV_B,#!CQL>E<4];ELGS&_8#Y<:T1\!/Z 
MXPP:'C+8D<!CP@>'<-B,&!BML^&*W-@.*X/V!^!ORL!JX#+G.%6\0'8L_"FU 
M3.L;C'C8T9#UQQF9FQVL`S=ORJX,*ZM/E;SX&/8L*2E++S9F?,QZXIF<8-:V 
M/`9\:>U<?D!\#'@"6:+PZN!FZP1>,13_W0!\28^0CH'>H00)X14_W<`CI$X1 
M7U!M-?A7.%MT1`(!$0YK#&N>&,S;#Y`P4)$[]9:F*0M9`.9F8T77FL-C;S-F 
M:Q%-C9CL##8XKJ$(!)R$_95&LDA+R`OZNB"Y?L!"@=72P0H+"@]2`0H%"@E2 
M"0H-4F4*)$$(%U)1"B=`!]#\!/R$F)P(/#6(,8PL$"P0J!2DFJ">GJ"=XAGD 
M5&A0;=`NSS/)]\LVPM#@$N#"EC^"\?WV&?87^8_`S'Y_X_`__'.\?[R'+_+] 
M$_P4&Q0;]1KV"2>H!VCG-]<WYV8.++6V+BVM\_?<7W[/W\WG]OCO[(""@\'# 
MPH"``8.`PT##@8`"PSQX!@6`!(*!@X&"A/?O"@F)"(@+BPK_R[XK0#<+0'<* 
M1#0Z=FGBI@8T-W*D`C1K0J>J-%0.I,8T4`W'B,:C/-G<-QHBS["(T69]BK.Y 
MGV`QHSH+%$/WO`(SH`"90A[IC/INFG@@212B`+T`5&=6!T:'OW6PCW7(C6L_ 
M<G`T.IG5"3K8ZZLQG5GLLN\@3A18:8&11I"@+K-T&<80ZXPP$[@D$9QU)NSI 
M6$P-"&<`"Z;VG]TR_%$@4`E&B6A',:(QPP?[IH8P$&@00`F0``JVHH0*`\02 
M`\S):8!%O@J1%PCLXG'**EA@*5K,/)&B3Y9X#R9`]3>!`SK=>K?H5,M&@&[0 
M?YB`=751!.5H?M$OL51!R?V<+:$:0Z'<SZO6F`0F1C/C!=GN`.I7*?A<,:SD 
M<</<R;RA##>1XQ-'Z($<6=3+N*.I86C#S*!B(P[T!-P?$"ZS]`!-AR)-G=A# 
M^@']V"/UK/W)--`%J3"-9[]V,S]RF=(16E(FD@,:S[KF>;(H6C<!G5BN/<TC 
M<<?BAAI)'*I5ZY"XO7)1>98_&YX_0`]R0#0P\E`T4O\RCWB!#QK3!'BVYP^C 
MQC4>[`GW43_>P`VI!Q%\.3L`CZ@$?0`"H]!U'HCZ@$?0`#(+-^BH+T_&IW_U 
MT[[)T_V;W?1YS^PL//?V4#?&NJMH@:M0I%)_47(`,SO!L3H1LSI@C]2>&?L` 
MS]J&:0AT/N?ZQAG18!!H#U8PVHLAU\,.:+0<?#VV*DVA0&I<S75SMSVH"9>! 
M`#3;RC""N$2!3%$P`$I;,,$PZ9B2B8^#\?Q`$`?W[]_7Q[>OGY>'?V]G5T\_ 
M-R<?#P;^[N;>UL:^MJZFEHZ&?G9N9EY>5DY&/CXV+BXF)AX>%#_C[?8,CNAT 
MPY8<.-#;R,G*AHYF;G9^AHZ6IJZVOL;6WN;N_P</'R<W/T]79V]_AY>?K[?' 
MU]_O\`24?$16I-8HT2%9D'5H\:]T._0`^Z'?D@$CM(!`5,(%N3_Z`-)C]3"` 
M52"E@2I7_5-!2/Q+N6/J.Y/J4@$$#1QVYX_,H^;`K&&T'&E'NFHJ,-,!,B`U 
M6YB.@8&EB]^:LWO4N[6YR.LH4NOC*GZB!XH*J+U1>](78`:/L$"BX0T)D`T( 
M3_O$:]S`"B.X.I/^9`="]1:3OP#WJ+2=)]#*$,Q0="^A&CFA'#N@P)_-)?J\ 
MP,3(S-#2UMK>XN;H[/#T]OK_YM\AH>)BHR.CY&2E)66F)F;G)V>H*&BHZ2EI 
MZBIJJNKK*VNK^WDL;*RL[.TM+6U[ON[Z_6_KNYT<Y.<'-RQL;"OKZZMK*RKJ 
MJFHIZ:EI*2CHJ"?GIV<FYJ9F)>6E).2D9".C8R+BHB'AH6$@H&`?WY\>WIY> 
M'=U='-R<7!O;FUL:VII:&=F961C8F'MYU]>75U<6UI:65E85U=65E555.?ZG 
MS_4N?_>OE%0^W_]/#N_*7C?\;/-15YGJW,I8Y_\PN7WFV#G_RS7EY?7^5+#S 
M_8F)CY_LG/]EY_LW/_N_,_>OFAI^W_]/#QS>-_QL\U.9[%\&OPU<M%OX-?&W 
MCAQT4$].34Q+2DE(1T9%1$)!0#\^/3LZ.3@W-30S,C$O+BTL*RDH)R8E)"(A 
M(!\>'1P;&AGS>E\[R0\.#0T,"^WE">W;MX^WB[?`O;7=QWUX]W';7.3G!S<\ 
M\#>%!88&AP<'B`B)"@J+"XP,C8X.CQ`0D9(2DY05%9:7F!D9FIN<'1X?'Z"A 
MHJ,D)28G*"BIJJNLK?B]]?O/[]'[\/S\7SX@/M]^/P#^7]]/WX/'IB[Q@"^7 
MX!?;^^G[Z_/7UW[>('^KV^V%]K/Z?OTEOQ0WV=?DR?9:[Z:]''Y"4%_%]]P% 
M\_@L,\?O/SN;#_'[S\[G<!]?O=S\O;X=?PX?6S\,_QC]T!=NW;R]L\_.?`%W 
M0-S^!@[G]S]`PX#[N?'SZ\N?WCG_C:'FOW@Z%>/WGYW/*#^P,\?H.`^OWNY^ 
M7MZZ_7(%?#9^&?XQ_\1(>J'UF<P?E8<^?7E]+\9,DMD?9:OR*G"$RQ%7G>0[ 
MWT[WXN_>ZW\_+]?Y[F>_E_-,_#]A9W['SW,]_>N:9^'[RSLWA;&?A^^O_._> 
M>>?A,?W%L9^'Y67\[\NYY^'=R;&?A^]\\3/7+<6QGY?OC>%L9^7[YWA'&?Q< 
M@0'FH("Y^F@P/G>P2%9L9J%AF:AH<'YZH@(Y\Z)">SGAT;`UIG=(!X@'R$BT 
MSM$Q.4%)`5%986D!<7F!B98V"`S-#4@-M6P%FYN&:-C%@'G!N<X^CH[(#P]/ 
MC\@0$%"0T31L(J,C[#82$G-4I+3$U.3U!.45)3(%15SS597H%A9T;"TMKFPV 
M%U>7V#/586)C9&5F9VAI:FO.5LH&W-<;#<WN#CJV'(@<W1U=G9WSE>'EZ>WQ 
M]?G\Z@('.6"@X35L0L-#Q&RV(F*<HMSC(V.CY"1DI/3:A+S!!,D$UKM,FY_7 
M:7C_I:;5L81]%93U$=4U6SL_L;(@LR"TH+6VMZ"XSW<V+FZNR"\O;Z_SER0( 
M[!PL/$4XUV1DIKE99!F9MLN+83\USL\@T-$@TEQ;&FL<7[94XOI!N[RG%[XN 
M/DY>;GZ.GJZ^SM[N]?`6":^'CY>9`OA-CT]?;W7%L?"?Q<_-`#T0=<_`^,-N 
M`/J<9:=`-"I&)T!T,K*J0V9)]"($.2B_U.-(.Z$<IQI!='("QI*KU[29'V+T 
M4!0"219H#Y?<7T^ZJS&-Z5$:X#U;A$^F\>HE^M4"C`9,YLX"(=Q/(SI$F.<G 
MZ==(DGO$&<Q1@)140?.L`A.N!/V`)^R!/V@)^V`@3FZZW%)>WY+T4K,,?J?^ 
M9)O97@&3!$I,&*CN)U)EH/1K"(&/U@`([;)N!%R`"!13X$2Q"9)\9!,7Q8"_ 
M4H@`8?L!!?L$G[A$_8`G[`D_8(GZ@$_4$GZA$_0`GZ`D_0(G[@(!-PG[A$_8 
M`G[`D_8(GZ6\!-PGZA$_0`GZ`D_0)`Z"?_`4?Z%@22-[2*!.D_2?2"3Z0<K0 
M\:U;'S.1K&D^JUA.'N3Z.#;4OACMJ7TQXT`>4P*@^0`#K]B-#TZNISLX40^R 
MYBJ`'F?@_&&FD!B!#W9]!%56T3PP6?62R8XTS@=D'`F?B`FR`=I:<,R($#-@ 
M$^/9!=[WK2`KS(/\_H@>IH)T0&D.`U.A`UE"'DJ@A@P22&!XD*@`_@!3P^@? 
M_HL__P/__X4?K\^?CX</G\C^315^^?_/*/UY_SQ1VGP`C?`?A1K*-!1E*,3X 
M1<)AXL/U^6'[_JOT!1R*/U&]@*/T'Y^!1^A^'X*/U&,C#53GXH_4:C\H_87, 
M1L(_<!'[P(_?!<G[P++KP`%3QKCPC]/$?HXN3]4>/PFG@FG!"KH0.=%)A_#_ 
MS^#^C#31#Q^(_?!8?2/11W/ZV'US(`H//\`>(\T$/P@;_$?7XCT_D=W\>)H\ 
MIIYK#Z4>B.Y5]/*L]/Y8;GXCN^%AE0"NQ<`17IB=B@(0YIV)BJ)`?Q5]`:KZ 
M3I5T(B``'3L4;(`?IV4JTZEE@`Z\K``%P7J$T/@%7T#H_1]8>]`#\_\/\?\? 
M\/\_\"C]?D-9\C]'D?H<C]!D?J-Y'$NS]?^?\4?JDUA_G\$8F`0(J#]>-^J% 
MKQ0%(U4!Z%G^CU)+KRMD_/`.<X`<YP`YS@!SG`%E^J/11W*.@!SG``YSA9?J 
MCT7D:U2@@CD28XCU1MCB-(XX%I^H]%7<CH5<0#CJ/UQP<?[XX[\]QR.+CA'2 
MBT^T47O@`''`8;P`,'@`<#P`X"':`'@<`/`X`>$97@C$>$7QXBX'\!5]*/15 
MW#BKB5;B-91H*LI&(&#*8=IC^F?R9V)F2F3ZP\?=,YDS<20`"YP%9,*Q>CG[ 
MOG/[?/?R?>_P__G@_^/A_X?`@1Q`:)\'^$2RD\T;::0*:<Z?RHDE(P`]&`/( 
MP#Q&`<(P#Q$J//_`'C_@#A_X'L?\/</_.<#^A5/_!X'_@\!_X<`?\\1.(24> 
M")!1%(^@.H6L0#\M8B5<1_^B_S_`?A_@_`_Q_@?X_P/P?X?@/\T5JM#,`!61 
MF_+#*1B;#FCJ0H?"/K\.?]%"C]4?2/3^1W?RP[OA'=P'"8X)G`L/I1Z#B8N; 
MN)`#\5IHL8%>`A__`Q1^O&?\/$?\?@/^?Q.^?[DC]\#Q_^4?OW.'__/(_=8? 
M_^,/_\-&F`HQTZ`T3'@B2_'-5#?XXFBCZ<UB7RKNS_P<A_`X?$3!\B2'T?0? 
M`'`/@#@&AN`?!$-\(@'!'T0/RG;?#X^.']_>/?W\>/CX3OM3#!,/U<-/`@,^ 
M%C&*&)@<H_5'^FY#Q_?WCW]_#CX^#`F]-2K$]&[_##_^#!_\#`_X*1Q&$30P 
M!P`P`IT,+;]3%Y)+!2N$M?!?H>L/T>?T82F0937Z'J&50!ID:IB>6,3:L\A^ 
M!^%'ZLZU5G62LZO5=6*U7Q;U,GBU$"E3X1./E&#^H_7S\/WY'Z?/R)5'Z/GZ 
M)*__`^?15[_P#YZ*OO\`-15^_``\>`H_0#!0=(O<`\`>!%=@(OX!_@0`_PD/ 
M7(A4R$)<@AA)$Q).5QX`)_``1_@`@_P!`?X"`/\"`'\#`#X(@>!&(D@$R<BI 
M`6A4N-FUI^WZ/V;1^S*ZOVY1^_JU,!']P_'CY_'WY^%_#\`^'\`<'X"N28_4 
M?JP&/Y77T`_"'11^^"&O@<$;J)G=1)S^^?Q\>?IO&"+,:GX\3OS\C[/#S_/! 
MC_&!G_F%\;<`<&`P&/!X/(\C]X_QXQ_XQ?/Q@`:/E<HVI(2^AL`($X`<.`\^ 
M?/4?OCAQZE@%Y'Z@!#UC,X/HS#Y_@%J2"ANL$^P!@%7ZI4('@#@/6$DH^E'H 
MK3!5RI]Q.G]XA`?$S^7L?KXMOTX5?J='"XNYF`.'`4``'#H3,='USX"/U5XI 
M_)4`/GP"C&1F+B_5F3:C]7J=RK]4Q^^?`*@,&`++]3@?5V'_CRCT84?KPF(] 
M%8GHY3Q'Z&++]3#Q'7\*.A<&4P+`=.`\`<!^`X#G!P#!%.<XZ+>.``\<6+W/ 
M"K]5]X+XHM^.<1^ABKZX4>G!X*TQ^`!6V:S8Q=[0K&8]7C$*/V933#33G5?3 
M_A_RM:X'\'P#X!D0,`KI[^%A^O"Y:9D_D`I7C:AZ8/+!2.%5TJO?!>-^K(T? 
M%AK(T%64C$`U&"!X``@P6&\X+9Z/%XWZLL!0K>"K01E*H4^3[`.#@`''3.Y5 
M,`ZLZ%7ZJC$=R.A5Q)]E4GW'RP_1Q5^@*,99?JR\5IRI]42G\HN%23Q<9+?R 
M?G*D9;?J?J1J?TRY9C6A;-[PNDJ587+W.E7G4JP`5RL<".3`#P'P#@/P'@)3 
MELG"\[]1AO`<@(1=+CKT9%>%*.*GL#6>:J>+Y#]6.%NB`?U8L*GV!15`"P_6 
M#P`^'@`<-><;F`/@/"(#X6'[X!2,962H<CF0\<`<P(NL%8"BO3C6!`/IBX5> 
M5JWP`RC9`4":L:$`_\`!*QQ0U;2`"P_5><FIET4?J'8'1^AR/T<5?HP/K&H] 
M':F+;]5Z.P_@%@^L9VT+*&T`<$?K`M.5%(CKX!Q<<=KPL<VEX/%A^GP`N/5] 
MRD`:!)7ADH_4%*O%=3%97K#`'X'P0">"C)7-^CRC#`5KA+*2`2IKI`63SIR4 
M)'"F'8&,"S?_=3#'[B8!]<2]?+U?O];U42"Q>6[#Z2MD)[E^T]+I2`\#3H`+ 
M719`F\'[9F)IOWB@`RZ8%AZ@*?MK5.+P?MDZZ:ML6*;AQ673]#+9Z4`[H`8` 
M#7Y5X84AR^YUHM^FP!ZL#KKF;(-8,7C@RNIC7BX;K5CX7&4E\N&&/W$!@>XZ 
M!`#X*=?D7AQ`+Y'C7S(EZK&^%,0%\ANN2N!U@'>.,#_\+K%ZP\`'9LDQ[UQA 
MN5W/&]9Q"$`X4_I!@JX%"P0.?V`8:!!S^ZX!#G]QW_C?WKF];W-CUI<S.9'K 
M"Y^L$@H4'COL##`\<>AXV^+_\B_]<WK>YL>M+F9S(\8,!_WJ`\?\AS_[=.YY 
M.Y[^[]H=WT]^3KZY_6_XV/&ES-\9'C"Y`=SSJ"AW/!W.?=O=X=^1ZY^[?&QX 
MTN9OC([GM_[Y`>('Q">(?]$7ZE_$WR=\4'*/Q2^JG]%KXN?%YZO__>L#_OX` 
M>,!P]7OJ[]?_>?!WGGWJ]\.'?R'X,[O5^5C\J7X37PD/A#%A<8&1H;>^WF0$ 
M)#14=(2DOWT'WGH/_/=IJWV_U7\/:7):4D(Z*AH2`?'AT<&QH9&!<6%).2D9 
M"/CHV,BXJ*"0@'AH9Y[4D'!@4$`^W'OAGGJ!@@+YZ5T)"@P-#Q#SVKPN,C8Z 
M/D)&2DY25_]\@(*FP.F!OX%$PP/%R`)9>@(F6$`@`LP"`Q;S:0##P0)#@<ZX 
MB`!#`(%<=#BUI0<2`08)&#H@?-UTX:*B82!NNP*9W!X*'$`H"ACKHJF?`P*" 
M;XM0L!F?`F*,P`9G=GMYM,*,2A-D()O:S%Q()%'$:6-J$`(L"D`R'BI`"L^! 
M)BZ"O"J`P@M=?9`"!X<SC%V:K/83$'&<"B9^R&8^LP6&X>%LM0(/9FD)"VH" 
M'`A`)`H:S^<4I"]IX/9!DP@*EID9G),0!H"P+9!L[&%XDS#=9P2\KEF6P:!6 
M?+*$!%'7"S(`P%ETR5#R`#8!MN`RQB`A44>#(%ASLQF4H:0,]R`'9!LD&(9` 
M\S>;>AP>!H:S4HWQE#7ECTK&"`*,`(&*!@RP[=>`N"O(19?.#T%!A1U=,T`T 
M/+1</$@P$9;@+KI`7#.5.IPJ>5FSJ8N/$174L\&YER\NP4"6VV2XD,N2W<<C 
M+UZ3;%!EH\$H3`1D_/VU&043(7E]T_CQMY9+=M0Q<!7%RQ;#B=R:(JS!!A5Q 
M8.3N!78RLB`QG,\SL%1-XF$RC$W7B'O,W6-[-QCRP\!7$2\BX!#0D'`2UG`' 
M"C@-$WF8;TGF9[K*$&>;/@.`0QQU-MA[ET""#<M8`4CAX9IN@@99\AXRSO(& 
M8@!2B;)$$8&,@Q1Y>%&&LF&4`FXD6>#S`X-9I51.]I(GHF\7'Y*AR+(0'B#S 
M:UA\%L]WBPSLFBQ]QF`D`<"O,3BYQE/9B#!EUFAP6\P]=K4,O,#75X\,@4#W 
MF'+HK=:H.ZN'74Q,8G.QX90/-H'0.$./,H>9+!H"Y?AP"N'AX?!:\4A=;SS- 
M-X:C"49%V;#;T%+&P3<)H2\I4"=QT"(JYC]=?3E3`QC(R^<J\R:%01G@]7;= 
M-`D"/*S>I\L^6!KE505U3A[IY?`&/!!!'4JVKS;9F&Z=QEF6WHWJX7"VK\>* 
MH%&'K!XZD60("".-MTFO%>GO/\]S/$WE+MZL'X\3?"U3VU2]Y4[EI@=B@\8G 
MC(V+C$R9^G)P<LJC$U+C`S;M@FU=F&SN>\UKVP'O-67.8<`AG"]+(7=$KAB< 
MS9.V8`64KH[K79#,%@$29`O-/`%Q5@H6[$-AZR_!!42=A2ZGGG.6\M1YARZ+ 
MGRC@`N$]ZS>/>"!V%#+NE,3$K?"P;BX<4@&>FQHR.*`(9>Y8/%-V9Y0Z^&7+ 
M,4*#QEX:YB7F;I9`7AB%)2VM*Z3P=W+"^2X9,"H$&`K'P48I.&T9<6Q8^$\G 
M689/DA7C#$\H038N!+F!!F\Z3Q,5;8!1-W`*SG$A5A72FCF'L89W.-7D>M&3 
MFFH#9S`@J)ECV7'20[K;#@OB>5T\:GF2!49&IL0!82S:9-.:4W1\S<#01C&% 
MN+)[7;P<F/X&6F1@ZN7N>SV*7E8"[LEV?H@S]9DAPH\WRI0P!"!1RF`>(#): 
M[M!8]&Y<:/%XB"N?A*A`^4K7HMN9B7$B[LE8!"!:&O.1AU3#8V'`H6'L6@FJ 
M=T^B8J&LY!)R-@H[N=%7>&RP[>=!X*&AC/@G92"#'4#!6YZ"EO%P^1RACE\* 
M-Z&M4ZA:=[.]4X)7'NP\>;8O"#+'JVKP&0X/>8QLY'4<VV7/2)!@ERVK.<.% 
M7):E!^4O&`UC32/0L94L^V<C*IT,/:R<7KMV7F"K.-W16\I!S-HSQ8G$%:]0 
M%9`5CT,ZIX8%4\0T9\O2M?(QFCV9&'KF=MBH[MY>^!XP->:?N%('"'7J](84 
M;=W)R&KVQGU.[K/-R!YX*8F7/6#5ZOB[K0>LNK@&#W,1+`-FLXR&=`:Y@_<, 
MCJS><$[J?,2\R=LXXT@TN?#@\!+YZ07LJO&?L5=.[N3QU`AXD,N-1GJ[:]P2 
MLEW$<*N9VP9BJZ`2QW5D[8$3>5N)B#D#G2&[Q774<[FL#'C%`ZA8/AP6U?CR 
M;NV>=1;?"Y'#1.($2<W5LDW,)(HPG>:JA#O&.=TQO694/=V?.RU+'APE3P=" 
MIY/!8['@A;Q6\0`>OJI3:/_V?#?WQ.\KOW\`#Y:W`#UG`#V7\>[/CWE\>ZWC 
MWQQ^@.`&PWL.Z?QW+FK<Y_#=$73YQH[K\#Y5O*ODF7.0);-85G-EJ>ZS+Y[* 
M_@>LSX'_)8?^@9\FPW#E6Z<K^'K=GDS[T8&`?__A\OKYRC\YJ!ZYLEY9ZU7X 
M93N'``];*0E&\]?+'%3N_-)?7,!O;.&?G/9]_*=U?!5;UP!UH'D^L&`WR'#/ 
MJ&LOL'R_S3N-*PVP``YI^FM/WVP`/9SZT#6&6#8!9#E%E#ZTOK]6!X5E1@6. 
M!8YEH49R2RYH#*YDQK,M'`[]%-@%\D.7P8!V_3$KG+?+FL^L&R^P'+?`&H^H 
M&D@`Y+Y0TUW;)`'E`RT/1@%IX;%[?\!1\MZCZR[+[+<M\,ZSX"D]Q<GN_ZS_ 
MX;)(OA(X8!/K1)^\I/?6)]$T2^+G+___ZSX]3"M6N3)+'[,5U2`43?.\=]EE 
M"'U4_\:(W\:)8_1._T=8%HZYWFQ^]./WUT<7C&FE^P)%2`Z/EHJKV*M8U,KK 
MV,Q])M=0`MS'6LS-6L0(>][#8[\O^^O8CGT46)`$UU$+\T9T`U.6GC[Y?]&= 
M2/QHP8\XT7U#$N+:C!@%NA9_O`9_@,```0,P?]2!&IVM)^9(%X`^2`Z`(F"= 
M<P=Q&[N(W=Q&XCAG%.=^(`U:R@C6?$-?WO`SX&+^HK6>F:SNT!2]9`1O2QOG 
M6?757;S$XP!DSIL$P8L;QA[<\UMD_W@)9/_H"SAW9RO(J3.6N>$BF-,==&D] 
M^4P'@'5KCY'YT4O_[C=<;KR0'EUG\Z;/BZPJHN:"UA;VYH6?=\XL02_O')O, 
M7;&]A;@I\,GPT!^E9\,1A!P$(#,MG+D.N=,"[`@I[!P$N/IQF,YZ5UJ<&@2. 
M-H$>FU.H[KW=>CCS&&GUCS.HE_$.L-QJ<&3@:`4X^&,7Q?.+UX837AH#GCX# 
MZJ]/K:-7;&-:>:J)SW7N>[58TU_\MS`R8#0)O'P?G'1229[#6.<.T.-=I)\G 
M><NPGDF*-?>]AL&B>IJ(,@H0=S+2=3+](G0&_N2BFUG4\G9]DX/Z^K!B_F0' 
M4:)D`U+TA[H(UZ1S+GEJHX7O[U\MID%)UOI/H&)]/ULD7KT8_'NV([SO>3(* 
M65+:S(%J3H/?M$Z&WZ$Z9`T>@5?J`8O4=Y8J"U)?(0E(PB2DY$>MZ;7_@"[R 
M/L0`U9&5UHLSA1P^'#"W#!`EG]&D7_`EO_X^N7$'\0?1!U$'-1,&__C2`W(" 
MT0)Q`C02`G_HYDF.(07'(>2!TSQ\#O_O6$O@1/)283,82GP2F?\:$6+@+./U 
MG+DB%@"WK%C0=?93_.%&]<+0AX![C*T`IRT3E>`-T4/3`:?[:%LKF`9]V=[Z 
ML)/*^N8%GK:.-$#3%BRI6>X)X%&\1NU;4K:=9\4U_>,^\BY9Z##3ZZTWN5#S 
M*RELO!+O01R7%XRFOMJVC@]F!9%(?&C+CG*6G!N^QR8(U?NMXDO'$QFVAQ\L 
MPHT_9A%5K\LOXT^O\YFJ8!B<SX:;F=B9@6+B`O1UD,%;Q`S/&?YQ["G\SVO6 
M7^ZSZL`F=9][S.UQRNK9Q='CI\C3UUP)51$&`[>B+!D$D#*M9T'':RR<<-:S 
MD[64SC>3`BRO&9+-T6<<@PUV_;D;?ER%=EQ<\='@MG0'UT_-UJ;X)8,F(MFF 
M$]G>H_)[V2,)91NG9>//@]Y7(8?>FF^W>R^L-QII!''S(;%YZWUZB7]4<.&7 
M]<,0?9N&+]<+XU9T1Q0=-@,T0Q0--*3_N/&M9RN/AGWW,_=`#[F>$8',_1FS 
MC@VSK-]S#F,,L8@0^XU*8KT0@SR+F;A<S>*K`%S6=\@"=K/LXRM#K>76%.NT 
M(>UVX4&(3,0F8>.C:]8?M1(!+1"!I*/O"!\B3ACME;\LQIEPS&`*EL_#;*0> 
M2\W2#PW6*V2),VX%)K=V[-HDAI>V>1([(FW7>'R4M/^Z(LB@-,E&>3]#Q"GA 
MH"#(9E2S^*?!Q[7I??Z`U>@[D:&>CF+V]"K_I9$L^\D(L#!DSH%+.9R-3VYF 
MFEX<P;(.9&?`-<=M:R`>,2V@!.%4O<KDY@_<9[F'-QIN8>7*H`/@=,Z8XU<< 
MCE_:`)R?^0!J`$J8#0+8C9B*ZP#\`&T/:@`+EGJF)5HF*%I$.O^]*61K1R<. 
MXF7V$QM`(J?^=)VF:,:4W'PL&T[,SU7OY:QL9:1KY:7OZ3WC2>[Z3W_B>\F3 
MZ_R?4I:5H(0:AS&`;8@/L"N&HN>>'F(>(N17G24>EKRU)1Y&T-^9-)E_YGY^ 
M2%S;F03B%SO]R^)_N7+*,`-,O:Y@,%%F>Y&IP]WH:"&#A:?:/,]B-3AKO@,' 
M"T^T>8[4>G"K'?I8>%H]X\PVH].&._2P\/1[Q]@J]1Z<,=^EAX:BWS[!:SPY 
M8[Y+#PQ%OWV#UGESQWR7'AB+?NMH&K//FC_@NSX9SUQSQC.Z!=G^>?`<(V/Y 
MYZW#^>>1,5%A<9&AS9_@9DJ+"\R-3@[/T-'2T]465YD:G!W?H:.EI^HL[W)H 
MU.'N_0(%;9CI2UM>;3:^K:WN+2WNNYY/EZ05=IW`%<GMSK7('=#ETHW3-=/E 
MU(W2E=N5W*7=K<85Q7W>]\/Y6O%^\8[QSO'>\>[R-O)*\E[R<O*"\N;PNO>? 
MV985][TZ7O[>A%\27QO?--5],WUY?7MALV`);"ML2VQE;!\"@H+?3P&'Y]_/ 
MNWR^]@MQOM\/"2?P"0$@X.$CK\2DY25EI>;S[@(``P=GX"./P%"''X$N/`&< 
M?`.#!0%0`D`#(`.SX4`1`!C`(#0+`!*`04&]A1!`)EL@0069\;)@`(0`#((5 
MG>"D)@($@@F^`*;/@(9X9G@`"3KHS(!#$V5R"$9GBAH3KH9L(5&SX$0C8$&P 
M0>'P`N/AG1FD"%!@+<-JQP;<&P.@!06%XQ%`01F4"`,<`N/MU^!:$$!+R_7W 
M?T@^[OC`V.#7ZWY(#0\-CPXX_G\=\1+/>+9^Q@['#KXO9^!=S(<'R@=+A[P^ 
M0^5>7L]/F&3R:&0R:!0WXGU_P`J+#`+&-WC(@SCY!I$"C`:.9\R8%'`SR^D- 
MU_`3/R0=?+=X-O5+L_9![?46_&^HEQ]Q@S'/@^)!\F9)$A+&DD@2O#Y)8YY> 
MST^6+!.AD[&LGAGX'TQ9!6+:&`@<<!@`6V\85(!0F)PCPEGE]2`^0OF\88CD 
MBB&*2HI*W7Y_Y,$`83<WE.OIZ>>2S[>7@SO\<3!A&E`D@DWFA%'W".A(J^E7 
MBN3K<4U@B"$<A^"3<EA@0@9A00QE0B/>""@=1;0@5!!BP\A887-Y!ZJ:A!A' 
M4HEQ$31]J);#L"2,)D5Y-8<"2`3$\85P2&-1[%99$%).FZ(]U'D20Q85PA2C 
MK($>D,8&4(9PR/8S-?`S\#!5\0BKW79W&_@<02K[4?AJO0^$:1DC0\"[P*_` 
MJ\"GP)_`D\"+P'_`>\!WP&_`6\!C_P4$BX!1U5G^MO!>)U_^3$)]4(F02=+, 
MIA99G0#A62N+S!59MA08LO,,6W,CZ1T!2D[,2P,(@R,Z@P0$E9<4P2/Q`2J< 
M"M'L5NI7"%:/=1[(]U'^HZG,G)X(GA">**<9#OX$R@W1UX:D\$82YOR$,C!2 
M4Z-I'Y*(\2HKAE)M&<0CV,5)94B/DV-T9Z2(='AXL/N$/C]'<>'T$CW1\H\8 
M(ZE$<:B]59\,9%R(2@1Z(<EI0(:EM\02*$N4*:'=5^+"=A$=9F@\6&.7+"=1 
M;&IN>(_5E]*)N"/CPZA$=1JJH$:<$L,!#`3$0_1$G_XX&%$)Y^&(06$1,66G 
MHI;481^C8T+?ZEDPP/@A&$$P=258L%BZV\8)1H00F)"(?_Y,(X@&"292/Q2, 
M2/15\JO-6)",.&"!2"`P5-VA!0*#(8."H@%13HNT>H=#!X9"!`>'!@1#(\5* 
MGIN\(CT"O_#B"Q`2J"S)29Q(9J3.,H@N*80K+#`S.#<U,Z1!`<(%SI'Z>'"" 
M4\1O)(2":$4$T">*.I1^EBC])U'Z2*/T@4?HVH_1A1^BB-,R+E'Z4J/TF4?I 
M$H_1U1^C2C]%E'Z(__CAX"$C&H#!#^H_%,>984INA70(?#<S1D_]!.([*!1T 
M>/HK/48!X(*1\C"D]`=2Q[DK8IU5?;@J^W-0$J86\I6?:;EDH^A]'CE+#T48 
M6'4<"3`5%<020A/$%<8P;)RF()X0F9^4G`8$A,\N4PA&")G&.P_B&`I@P&@C 
M`ND"N$9Q,U7U<#DXC)QZF#D%"TNJ'AX&\(:NN40<8IRX]3)XOS_Y##1RE182 
M,E67)88%D@SK+"YF24NW0I';HP)P:F9@7ES!%U^$Y(1D3K\(!T;&'7X+"@B' 
MB(HY?L=(")U^$C'DL=?A<8,QW7X<'1X?PR"RY/T&$=?;K\/SPZ@F=)PQ_@C. 
M4=4#H*AC=B2;FK/YS-J2,R"Z4+G7>>'Q^?'D,?';E;V-$(S7,V-\,[21G"\4 
M-9WN,UGF\X_WBOT$W-O@]WK5T,##S@N9\Y+YO-FS!#Y.3C\(3,^6WF'R9GNP 
MK</&0Y#UGZ74!9(%+*F@),.0&!J(&Q\:&WA\$`T+/;P'R`?8".7K>"+P&#($ 
MP<CLZ^')'L/V++%-X?!A&&$H\17-723.&?_!,42UB>6NM@W.3J&=P8,_F?3/ 
M5@H=,D'<#3PW2E^1NG)JZW34W_P(`!PYMQ`6W80%Q,#/S\'04$1*RL`<?H,! 
MA(>'`(<S]"[<P;K```D)"&WB9L-74,'5PD`'^"`M/RL`P&O`%]@!U`(#P$*Y 
MFN+7Y7/`%AX`.5\U+4`I6`8`/N>0/.N:30&V;RGDHU2(DZ:YW0`S;JGU0)O_ 
M8M9)S5#D5>!O'<R[JMH)>.B82#@H6(CY:AK+O*W?$&'SA54,I.TV?\F47RKV 
MPQ!=J_I)&.BY!:ENM@%,F^L\D><^F?52LA5XWD>2Q')(O[;.!7STC$0\-$R$ 
M_78.QZ"YTPQE_YE3+2=!F_9Q"5\:AO-@3;NJ&0CXZ56I+O;`C%HK_=&6G?UD 
MU$O+4N)Z&<%Q*.BP@%@X(!`.?\`\(+".\`GC(P-C'.'@L,,G#F`:)3O?`(G? 
M\!RM#WHK&]0&>OL`./6@*,J]ER^=:>8^*T/;.,0C.+@B(T.U,%0&``8W@">; 
MZT`+F^VF"D^*("("\"A?B8H@9Q\$'R`4`!,/#MQ?C0["`19S%J5H(!`2@25( 
M)#P^L[(>'7DP$@%T0"0$SCU<>%0X_*J],@+@()`&BJ(#<=-,6]#BT8]_3(WW 
M\*QQPY/_X$G2`40`ZDRX@(P)J%^H;L,2E^^4NAX]%B6R-/(D)OAIR_WKA(+* 
M7ZE<<.583``02*TD:>Z-QD]P6!@15%2^[0P=$4#>]/(@``*0$NC@"-#R\*.H 
M/T73M*J77J,,%)G_+@'J`P80#`X/@`(_B.`'#1CI\`4*"XG`H5?&=_A/@!JS 
M!PAJ06-SV8"U3764XMWS^/07:`R4(#!`7=AF6("0H+V@8&[D#MX@>SL<]O+O 
M$%-WV"NTY!87&!D:&QP='A\A(B,D)28G*"DJ*RPM+B_>H89LF)EN0S--H:FQ 
MN<&Y`<G1V>'I\?H""A(:(BHR.D)*5M>Q+3$U.WB)Z@HJ2FJ*BJK*ZPLK2VN+ 
MJ\OL#"Q,;(RLS.T;Q&D#:MXC6V-OO]M7.7O]M9N_W_;6\`00/O]M=0N1<-#Q 
M$3PNQ1!%UAD:01T?(2)!)2<I*RTO,3,U-SD[/$$_0T1X04>;])2TU/GU45)! 
M55=96UU?86-E9VA!:VUO<7)!=7=Y>WV\2_Y>%.@S4!-,&?&81!AXD^LQ;`4, 
M,;'GN9&3E9>9FD&=5GS_U4B3@_FIJ.JIH&J)J/H!JO^)JO>L?O['OOQ^KIJN 
MT?J]'NO&B_O&`__1]-,>-'[(E(_1'_]1*CC^''K2ZQ%'[K2ZT1^[0<.V->]7 
MU>U5U:3]H'[JSX>DO_ZJ^'NNBE=5\!=5]BCZW8],EDK<UM,".']E%+QT+NBZ 
MKN`ZNNB[>NJ[/Q)(>C]#K#Z7=5WM/W4=U_0/U_4?[*+Q_?_WM,!I=94M'[#] 
M*7H_8?EV<H.O'"I^KL%#=U%_=U5?^M57^O55^`F;R_J+JOQ.*_C'7Z^\UW;3 
M`UO]:U]]:@3:8*_&GU@GU4_G50&%]%A95=WH3.0U/\?KO'[OQ^[OI&WX#ZZZ 
M+K;'HU;O_[*GMAGE.M'[#]K.LCB?4U^H#7VP$%V!!=%X!+(!L/T=5=0-CZ'H 
M_5Z:_UI@2)1K2_J_*SAPRZJNKXI7]7`?"D+_V6JJKUE^_JO^KJNTMJ/+E;0. 
MEK4>7K5>7G5==X_<JG7TP%K'5;Q(9>U3&Q0':U4U?QI:]&9&,NV7U[1:/@/K 
M;HNM;JNM46]-5HKR[`JU^O^7ZP*`&N7`#,$_:;5RP`8P\M`',Y^U%R`"FNJH 
MHA2I0!D!?YFM/W41D=65H>4YKN`0X8PYAQ<V1^6RSPZ0UBU(!&YWU(#&YW6/ 
MFC<[F#?'148$WU6"5Y6`)E@"<6&LHCJP3"P.XBUB3?+5`H>V[N^#*EWK>H@7 
M-33[1"K7#>J>Q7.%M"-(=#N8^BR#0O1(5C1'08Z$W^.B`V-J.3FCDZ<ZL#JQ 
M/W`5P:R$`A&IH(W/>I&^-NY%<U6<2PI@3-,`=S?V@A;?Z(`H?Y5!/#MTB727 
M=*4KI9ND"Z1[I.NEPI%DJUXA';6\#E&Q^M"9U!\]^$@+@],XP]((/`.DU*:, 
MG_T`JI@(7_J18_,C%*H,G^\`W)_:/S\H_/N/Q@EG+X'GXI[\?C/GY%>EK'95 
ML&_(H@Z]``\XTVIFSJ?.5E"'E6A0?DC$@%9/&?A29\%+4@I88R:GAC1`/<-< 
M84TG#EL>;6P`6`+34TX+7*`7I"'SI`'WJJ&+I@`Y7K3'TT1].,?3@!]'`#Z' 
M'0F&`20"(>$X$LT4E'#V!,*`_>->&,P,#Q4O\:&Z5G*U/T@4B<I`8"!!["%2 
M(QHJ!U/]&BB?9ES&B6@XE^R#0UH*V$PL$`#H,Z!*YV$[U5'(U0`]$M\(,0JD 
M%&C7CK[P)X\<".-#2605(]"_3#*`!4ZI`!4^@M/H&Z-#QJ0=/C963]VD]?>I 
M4@,G/4@$+\*8%_L"155.6@&PCY](A&D*`IY)YJ2$_<!AL[D@@1[@L1K4^LF` 
MT.AMC`!6!.A&<@@EX3HPL<*V"=*&B=,'"=.-``'V1.I&A.J&Q.K'!.L'0'7X 
M!AEKIW"X?A$9].D^?I5GZ6Y^F,]L\_36>W.?I[/;_/U"S]1<_4C)IS-;22E] 
M.40!%)#*>$_?A,#V@J3='2B@&HU+'*$/>B[VF?83V^G\PO-^B"U)PSH4`%$, 
MQAD&@S&!1!#,Z^\.,1HQH)]#<:+:&05)_E('&@&ZCLS/IO,E2K_*\`V>=8!? 
M*0U4Q<$B3JDL@2:]!6IZ_8RZIC-CDQV,JB!\A28(B3C!<D\&LPS=5$'5,NBU 
M=#;%-4Y#&ZG*Y1Y!0!W<)B5:G[&C(EDK>PD[A02B[AI/!/W)'-`,J1:-3QJ1 
MF-8\NIF9-K;:JFM)00&X'+&I(?0=T:F%:$:FAK:):HJ!:CB4V4](`!N:"B$& 
M8JR9T`T&2PJ?].JI&$73]2EC1UA;5F"`GV^IU$5SN$LZ6O_&\P;+T@S2T\`] 
MKZIP`YTG8S/%B))6T")M!29`]3'Z`,I"I4'(W4'&4`K5*";:>CG'TR*R?#AP 
M)J#.F@XREEIC`#_T1JO&KTB4_S(QGQB0EZ2"LR)9]ZI6'&OADV36XXMG0`KF 
M/7G4YQ(MQ#O'#QW,2B?K*W7!SC4]R#@"2"`#RP`3W`!VI!FF``88*D.0N<$S 
MZ3Z(J3=@\..D!:"TY!=H1K4KU)A@)K\8!6[*;;!MX%6X+>);X(6^"0I<](8M 
MXUN:W1;B(LY`0FF*)_BHJ+,Y!(P,I_C0T-C8XG^.CH\/#Z?ZT(4_R)/\C(R1 
MGRJ?Y,G^N_$Z[2\4%*/.FO]>?BM>OBQ?P?BUTUW#+<,!)]^*6K!JH8H4G%9N 
M"2#$6Q"K1A[1V1R8;4:$9$8$6Q=/\6101,1'-O-$!#4P/7.:&`8A4MZ$(@A! 
MM<X`J""!B!`"L7SVF`F$EFEJ_BWPL`D3_@(*3^3]!TA%W+'RUY)_,<ZZ!@!; 
M1@')_)^@8<!M<#'RUY)_,<ZQ][`7]\OC[5]?#PI_<37\\G_R.W]QI/V9\2C^ 
MZ$_N1\[^ZG;^XWS'+W?SYFQR/A?W0^I_<SM_/(J,'C\]H/-_,GH]W^V94#SE 
MROM[5^#70Y#07;^YB7B_NAR_GQF/BYWRO[@?&1F/F?W2U^ATT'M'V=I9/D.C 
MYB0^0..@`\IW/";3J93XO\OR4?F4PBXL,=9$W&01F0-R\&ND\/"0\*"`L(#` 
MD-"0X*8]0H)"VZXTEWD&A@<&AX:$-<['\%AP8GFQ_,9:935NFJ$.&,Q_,;82 
M@',?S7.U_A+7$'M?P6U_)C-?R6*[XPQP_@MC^"F)OX)</IKR=.%C_2+8EG+^ 
M28)\TUI_-?[5.PGHWL?S&($AC-LMCD"G+D8@@YP=&,9I0@@EAK9KF?GM)!'Y 
MR6PP*8T0AH_86G+&8RTRF3\8WG>I-KP:S@R+BXI;U++>"/CT@DQL9&,>,4UW 
MQ,?$QT1&Q$9'Q<?%1S'K'1\9>X<";\9%1<2Q]#1$4QSQ4;$QC7G$Q29Z[X&K 
ME-=C^:R&$,9+MC^B&!%?\<[<CIQ,7KA_+UZG`;NQ_1@$8$"A&L?S2;&NG@4G 
M0Z9[IH.F$X6SA6-4C4XU]L8S@J13IQ).7)'N!8QC.`X@M&L!R;L0X`$4R>R? 
M\:R=2:L0R;S6,T]M6["Z4R747%N'`U<M=[\^F#PW/N-UB?[./ZMXF/T'`6OU 
M=RY>S76UR,;K6L\]P+G9^XVF)_@;'ZGXN;%@6OU=RY>SAUN'"XV*/'9"_GD` 
MOWE)D`.MY-[30!'"&A1!S.>P<=(Q8!2?80J(IVZ([6C*RAAN&AV1AS$>0?*8 
MK!J47^<VL+'S79@/+9.&Y!'9#73QY(/5A4LD:`ZT(T#S]!<];0?/T)S]"^/O 
M'L/T(,`^RE^$D2#"_'E@,<WSH!G-]4OUDD&@"-[8]``6/RK,JLM.OB@`1AQU 
M\M0P52`&F&1&/8,.[_%MWZ"@X6'BXZ2EYN@I*>I0JJFGHYZ7CH1Z;F-734,Z 
M-"\L*RTQ-CU%3EA1A:G)Y?\U8:%@D#GHW-#$O+2NK*ZRN,+.VNCY!Q,?*3$U 
M.3DU,2LE'14-!P#\^8I_OS_`P<-#]QV9WO[V[N9/?0V5E9FIO=7Z(DY^KM]0 
M>=L<2'-T]73S<&QGXDDOBLGY.(?G9OZW6=J;7)V>\E8;N'R'AH2!?WY]XLH" 
M"AHEJ-D97J9FYB4CXJ#?'5N29V%=6>'I:76+'IS>7Z"X8D,E7IS:V)83T8]- 
MC$M*RPN,SI"3%=B;GF%$CI:>H_`@I*"<EY..BXB%A.@.!S_X4GEU<6UH9&!< 
M6EE:7%]E;'1^B92@J[;4!D49^IK[%&MIYN/+^M:RCF9&*A,U7U^@83!*5FZ" 
MEJ:NMK*JGHIV6CXB!>G1O:VEG9VAK;G%U>7V`Y-F'X1-H+BF>LT_WOCOB(S$ 
M69G)Z>G9J5CH9\<65934$V+"0>YS3QTC*C20^2E=C;WJ$C93K=G)J6DHV),R 
M5?7OPTQ'_)#L3*4"_/1G7N])G!)_AX['JAIJJLQZJ::AG):0BH7-Y]XE8F0F 
M*&JM+W4+6F:6ML:^KH96#+XK:*6BX!V976;T(&-G;'!T>'OD8,_X+?X`!``. 
M';X`$?']P@[!U41$0?9IQ\<&"#,*$#,%/M/C3\&$GW!!6$%8(?A!^"U\!"<@ 
M0QIX4Y_#'C!`^$#N'0%X07_@RQ3@N5<8\$N_&&*FFZY>[MZI^DG^`FBMC>&[ 
MD`D5,!G4?/I#`<RQ+8@;Y-5OMJ[`LW:=4[_<MX@C9_/V@(.=@4"G48@G4G/U 
M,X^B:GG81YX4I@[X43::L(]JV"5;'QD@7@#Y(#H`B91X=T8_=&/VGV,G.RB` 
M0!I=J%EU%GQ#)%ZAY\/I/45N>FDU.[0%+W3)8U5:0RO=:/IN(=Z*<P!DS((D 
MSV(I/B2USF)8?^@')$-2#A7#JS&DAC;_.N`D`LI'UIUY)]?R-UW5O'<@<=Q_ 
M.Z5R!X.X*J+G9)9$50-<[04?:.%_>$)@.5BC0J&)I\,3^@$D%V11HOJ,?3%) 
MND%*G5P9.!H!3'PE4J,)J*FFCVP(HFM`FH^#^G?HENI^TSK\.T.Z[1A]SO\N 
MS/),U@8BZ)_9W<6M#Q:YR8T<G'D".$L[KF((5*G0@FN7Z[.O773FZS$TZTDU 
M9@-,@.E2E4E3(!J7J0V@K*O<D-ST54<^@$][=$',7.N[SZ!L^KP0FA4N]N,^ 
MTR]N2:F2)U5S"UM<3\R!:U)T9):5,G9H5@*0^9`\\UO.O&/2.D4I+G5L1`5! 
MQ@90<9,T28=$];W7(/`%WD^Q`#;.K7)&-_(:Y,?#AA58;!D1G&0^4'BKDD5) 
M6?\;IF(F#8X_4YF^36O4C--!K[*?YB%O6(\0\`!VT+U&=6!<WV?:#>X+5SJP 
M5/;\]4+C@`TQ,(Z7/<$UX$VI-ZAR*!:D#Y._$.\S[N70F&'WPU?4Z?,[J2]! 
M*=.$;.I1H?ZZ`U>H4'':GHP0WJ%?2`,$V2#6!>A%H%#'./+:[=<RFEPY@J0% 
M4'O`,8[:D&=(3\T`)@.QV)-DH6-!DV&-1DWV')!D.3IO54Q,<=(%6/U$5'8` 
M^"Q'9*.UHAL2"7D#"I=5Z\$T;KA([UYUSZURD==2^0J)4X8HM2/QHQK4XQYX 
M@H`G36BB2`/0`EBB)A`21&>E@':@`;0]Z;F*O!,)H[:M7^7[K!(!AQZ[YW/5 
M^-%NH@0UH`:9,DE*@%(>%=`3VNLEC64DC54DO=5D>[-'NK1[]L>[9'UW4H(2 
M2;1M(93`52.$GDA]@9U%RG@R1D1<F"=.`0,%S%V((),<"MC)M7MUSN3-.K&I 
M6I\#"FD-8,6`?F;")B-@/`X<KL"L!2I4ONW4(G;44P*6+6ZUR@B<MZ$$S'3> 
MN)YNX/N9C)\2_-DDY&UC3D95C*,$.U8(K"JP5&</SB1Q:\S7!*43P^_+_U?X 
M$P<+3Y19GN2C4X>[]#!PH:?:/,]A$:G#74_`+S\UR:=J/3BG?FRL+1[RCS#: 
MD^7S7R3J/3ACOTL/#46^?8*JL\.6.5I\EAX9D2@C0P+#2B<CW)"@L,#0WL/6 
M.S_GRS/7:R>-SOP).>>;@#W#Q-H97D6"%/$Y"HLB+C)I09G@/V2DY46%YD:G 
M!V?H:/*10P!-HPHH^D<"_\PYQR`G)2PF7B``)VPG!R5/)I2HY-*D(&4E]R4+ 
M)ALF,"9`)FPF#Y0*#/K\@4UO(']]_'O[_%P?E6Y-UN3]P)4`!_B>@#'Z:`%Q 
M?H.@H,TG>,X`IF-)"7?(P0I[BT(,-?F*FMJ+?[-0;K4`NOX`!0=7X35Q-$;G 
MF.!_`T$!M\MWS7`Q"#7%P)^%Q4<N9S1]&<#((!IK`0!3A`M>O"Q/AG5%V%PC 
MBGK@Q\>$`QX;(`F^(`8BQ\"2?`*0IP$"P$"!5PV*##@@C$(B0K<ODI?GZ[7R 
MGF5O51R%SE\F#7RSVB'^R1R/(*#`KA^,YL;A^(4,>@,GPG(G4G@Q^)X*0@9P 
M^08$!/_`B`#%D'%HH@(_Q%/]LS/".*(Y2PSKSW$=T1PQ'*2([!"U1Q_Y"*JV 
M5RK%$$%QZ[J]_>XS;!KGW*"5@`A*%``6,#(`'!]L#)H!A1=;`K':(@!L!%"= 
MHHBZ/L=>A:,(D$I,-VP2`#ANN#@T3>'EP+!K&6V;0BO`!!U%NB?8>J_G:'SU 
M?Z>K#_9>TXP/#\4_OZ8'F1N<;J0N3/R;895\3`.,#_C\//%S`H1)Z<$^(!!A 
MAAF#S-*&&&"4!/6(>>)X1@F(T#/#CYC'M.^PZZ.^9.>N<0RS`\30#,B&_)O= 
MX<%I5&'@?QL6$1X1'V$0>></\LZ\*+6#`>&)R@/\$H"/:<<\24>TXH\H]H=! 
ME^J\I0T=EI(J3X(=[1Z@7B@G56@/@]EA?U'Y0/&$X/\"H0F9]@(6B/9?1_^4 
M5I_5TX<<=S4O_,JLS@,(RK#+ML9_F[C]0Y.VH<P)NMD^'\\JWRTURQ@'M3&Y 
M[3PX+3PM.C#JT*9N\&TH^%^GEPC]#`/GD>9\\K<CORT9:/#_LN6\,M$D_M!$ 
M\J,JB7\GEM"D&3<PR=(`XI*4R;,MDAF87=`V5HDN*)E*0G"0D5B9.IK^.$XH 
M(`>KNK#RTAIYS,B'J_Z/A\J_YG@_@]AU,6+_6BE<+5?%C7("`8!GD57_CU$$ 
M\;0HW>"<Z\%'Q\'T?_],CY#[#H!Q^2X0G_/L=#52#L?&.N^9MU5/@XG^P8'W 
MPMX6LA>.C&1?P;%NV8J=7P<.[OH+JN,T,$F#!LDE<PU8-#1F$P!CYD>V'C'$ 
M;,>4:6`'S=W^N+^/&CPE8L)[S/?KH)7^=^<(?G_PG5&=G3SSGO<]^$X.#FK_ 
M.+,Z*@E&V_/"C"\P.S/?_D@=S&(?''^X`:'W^!P["9#L"W^N>)H6\,`%/P&A 
M@<5>6`R.0:GT+M`O0.__V?Q]SWUGGXXCG_9ABYF9B8=LSC5-C;S-N^9CMC9C 
M1.)DD!JI`7GY_0;<%M*GX7*UXO#.BQU)MP,"!N0DE0V'JLZ<CQ8B(YM.'![K 
MPAMRZ$R7+X9`?5\RF!L15TEM_O*_HD)9:^.<SG)P##*$XY<'ACN7P@BO)!`_ 
M:ZE<')R=01A!X!^71`>J`-+.^DIFU!SR8<>>4YW'X_YNGQ^6<@%3E6"_Q/0# 
M!P\?I/\?M`'^LW-G0>&3>0_I+E`8<'#EA]J;&>7,/F)^P'W6-#`^;G3)CHPZ 
M0&D[&_(#X<"08,3%@O@C%,'`@"F#P9+(`::V.(8342CC:S$^T`.-=^,^=BQ: 
M_Y\`/)BB,C(>'1A#P;,S%O<8*H&-FJ__,P\7.K8^0#QB[&^0,;\8D>WJBX&? 
MRIQ^`R49Z\!X/:U4?EG50P,$^2;E4KGL[Z1*S-L0`L968S[.5FF#N2`O9+[* 
M6ZW#\QD>7`Z89Q]<0*`D,#-#0PO(8,$93A<-5AZN&JQL#E[J^K[[`'_X`=52 
M!TJ5E2=(`4\@&T\A\\\G4\\\EYVK_:I.X6I.0!D.0`5`=,PZ_Z_2-.7(!JJI 
MN`+XL/.W1555/2JM(;W29?O7_J])'<5^G5T&7[JB0-5TP%K@)O2'4]6`X8=( 
MY=#U==2Y=&J*J50+U(H#5H6+D_?VYL"C6HT`&,"]:`N[]O4@)K^W(,*ME&1J 
M^BUAL2I+!`5+;KN+6C`5;Z#`N;]U9E\(`'\`QU*IJ_35IO0`/V/Z](5LFFNL 
M]+T1`#__0HL++]#6E0JB+S/T/Z[DUE50M,"8)?!TFET_0VE0U@I'POW\3]Y* 
M4_4:W4[CIK`>!``_1CDH9R#7Z=*0&OU*)7J8_:1P_:EI1=.ARW'"T1"=U`1! 
M3;D'3]VD:TV7SOT]ZBNSK?!$#\`#UB?ZJE`8=4\;F?5#QY1&AZ+;G28!F0'4 
M@&'[(&'[2&@":D>/*!L:AZ*.L/T#L/VK`%9P)5?JQ`7=^WS*`7.BL=:!>VP" 
M!>M`>ED=--![UT_TNG^H@*GQ.0TZY$.D\@//TYR`!?N@-#KB5ZJ_"5J<E6GC 
MLU^?)5]N%!^3"!K5:57]50/O/_]6,#^5(#]K`Y:@Z0\3Y[,R>[C)`ZI`X_>H 
M`4_C(`TWCRU7NHW*N(?9>Z5``U>%0&L!J;4AN$W//%I-I^N7HY=#EN.50F`* 
MO2@#ZEZ^Z8'^\LBNJK5I``.XA%S)B1U>AU3Y>N?J!]]=>I6LX;@#]#]+*&-7 
M]4NU=[R=;?:X@+WL"^+$"D?[4U8[;Z:;-)HESDK%[975'Z(C)+WS`GS`EA:< 
ME=9H=+OU^G7A6$C6?H-2JH#2;&3Q<P$WT`6_P;7Y;_$QJYL_F(`>AZT`Y%#D 
M.OV3I./T-4U3?R7A@:7]!S8QC]_IZ]9'KK2TJUW7SR=T]*O!K!JIVX;@/2Q, 
MR5?H6)$Q^\J@<K1XG7+?'[8CH`LTI``YZOT_YK#BCK-669R3-2XS`_?]`KA8 
MYD.1)S)U)U)TAJO)4@7><8J6+#J?$<:?7[A+4G4DG,G2',]Q``:K\=8`8`&! 
MSE?W>!\-WLM[R[Z4K7I>5X/UJ(`NA)\4I^1P_=!?Q#GA`.]4CO4U^M)[28PE 
M:EPI6M]VTS5X<H#P>+;3]>K;=OUW/1Y*56I^2P&&GK'[+2)'J]=]ZR$@"K^< 
M@?CKH'ZJI2%XN+7JWO8%\2CYS8?H;>Q@!__=^(K/:^WB`_;RX`*NFW9*>BC] 
MI8.W8"]#IN7!`IX*:XYB56NO>]#]_ZY!K4<;B@#VXFJU!ZSR-6B6]_(`NW!` 
MN"#-8_8ZKQ);]SJ2_Z.==#'Z:O+K`)K)O.,,D@G4ZJ<G2FG[5,?L@U].W6?I 
MUN6RQH(IUO.@`<]<?M,U^AT\U^B?2KJC]M(DU6HC#J\%`(JKB:H/52KB_ZYP 
M[F<3^ZR]NLP#_0J&[,YY<JAYBV=/^?NNKP*Y;ZQ]OO;`Q`_$@#56S"#NOWXE 
MCW(/[X+E3]K,`/98_"Z_N@/'%Y"K2`=3DHK^7H8A^GIS`-L,0UUF@'T4D?>Y 
M7ZK`4LN,(9J^+$!Z.[&2#G7TY=>H-6``#5K%->/JL$\T[6\+92HOI``GVX"7 
MWY^GE#_K#I)PZL'NCWL<6V'>[`_7O(!WC`:T7<[GU3<`1X@39`08X25.'U5& 
MYGRR-_P/\`,?]_Q$CM>,!#@X?`.'OX3#'X.!P.`KR@.!5A/[0#!!@@0YCY*# 
MF8!Q54I,`P&!5HW53X.,<%GR:2X='E*`LI#`,+#B`HQI@_:]-G&((X`;9O0) 
M_KR!@SO5*@(-R5CSG@^!S`_X`WZOE#?T!_P'^K8CH>G>E/FKP__L_]L__O-% 
M!YRP2X'#;B\'GARA^ZSNV\V_DO+OMUD2)$A04)2G63K(##1$CA(B1@?>/.N$ 
MA0%!:PG%X7%Q6$V28H68H&0K3`0>`@8*4)`0*"PJ;Z4ZQRB$DB4"(N+"P<%^ 
M/\`*?($=(T:RK(.`0\.1WXO&Y>6E<\*.<3_3A""$`V;:CG_+&?R$?PA`]"'H 
M"_\'S<>,B8$O88"!-W/8_9RROJ/[N_GS\W?Z;C.3(W!QL>-FWXKD=\S=^?@< 
MMPX,'`,P!>"1.45/VY)`0K=/94T&=R9ZF];^$;K7`,!GNHVL"R8BP8"P;(1, 
M6K)R<.#G7C1!0D2%&LDR0DQ6EI:JF)BQ,C`8#S,R4#P/3"X,*,2<L9H&98." 
MJ\"ST>2T^(335['^\,W-:A5!0D+`K*DI24E4B4I<=M4$[]8&$_VYO';7HR(_ 
MG"=^$0><37@__)G/>H%7_#OO*\X0]XPQ[9&W;&'7'''I$%`@8(?^`@2E%35" 
MZTI,U\`,%G_N=@&`4I.A7P?:=%8A&)6S<,`PD*Y/BGV;'63$PR#S"#_"2'^M 
MP]C-%A@"E\29)DR98]<II`3)$2)4;,NF#`QU33>";B`UD!*P:!P0^K$0^$@Z 
M-07%:O\E*0*P;0<%B/\#_/]5@/QH<8"'"0I8A3`,D"5)B9(B)$B),-WY@[Z) 
M??[YO'$(!`/7P_@8,'C9]HN)ZFG'_\+!\[/"'%?>$Z^,">2`@32EV0&L\>8X 
MP0ZF,5#/$^48)FG96GRD^,$COM:V#`6G#)Y,0X($!"/`X%.OZF'R91B_C_P) 
MDMHC1JDB1$B3B+AJM4,K`P8`X'HR?]_.`R)("'.?X`<F3D1*3(@3(L12^R<6 
M&>'_)!I(JJ9ICA!C@CQ`@Q4A1@@IC$R/&##A.6`7_CSY4AGP@'L]BA/W:F?O 
MG#Q#.`?,7J_!2&>)L+A3'##G@UNAR=Z9\NC!"Q?0`KTX(=5Q3.G,&C_@'_JN 
MS$"?$"C_IQ;]V`/VMK<=G_+`?]P$.$&/=B/I#=+_!=)])%2KA@+__?V0@$)_ 
M8>%FG!@LH%TN3^!"#]#"#\3U^KVO(3#`\7(`D$!0$3(!`!$A1/4)6@$,K3Y_ 
ML4M;@0!Z?H@*EID9G)<0"NVJ;2:H:'D#'IP:8FWG2D`0`AR60:`IV)W!R<(" 
M&B`*))Z@R>@8UT%`*,`8*("C&6<PT5$PD#(!=+\#7"&L-K-:F0I-`/!6,8%# 
M"EJB='UP71S:NT\4QP1JG,C%'5IT83,2P#0),U(0Y.#E$9'D-3T$DNTK1)-4 
M4T#6,MJ*1M6XIV!P*A4,H5IJBB;0R'BK6'R'KF!<%6Y%CE+B8&D,MR:=/T&1 
M,`5N0RR=0B*A*.J9<B+8*@F$@SACI8BR+Y.D")@K2PP;(2L.;2RTW)OB3'5> 
MF8QVHZ`X"XG(4(4?J:J5S"N*)5*E+%`$H*@HHY1+'ZU=IVB3&:!!XC'@R`F( 
M`+09!_Q.C["!-/<2-$RUR8F1?_`!A[)T?2F`%3\$G![>`Y#$K"=DM@].=H;3 
M/!00)&D2?P#2[0I_0:M=C!4G2@<8I3K]A'A"YPY`<23V09$VNAR5F+!BU2"@ 
M]$0';+E[M!R;T%Q@4>:U$K'3#<2'=L`$:E"&SC0X373*#4<\-S;\@_D5F.+\ 
M3]^Q<I<@20.I$(P,DZE*",G,`LFR"A`&6I]LNP4&)@:$`@$C""26XNRKI2VY 
MP:,B:>#$.!"KBTX<@^`AH*XE8@P)L@*'I9"@:Q8KF$LV3<;%)!073@`6WE94 
M$/7#F539(0%!@]9(`N>]\T-QD7`7D;6B=S%-U4W[6GY:!_P&Z!"#2$;K'"A& 
MP5(0M-`2PZ,QTI=@K"0D^5V(B+'Z"LOVE:),G\DFFVUQM:CE;D[P/C:"(V@Z 
M*`DN87B?8@0?TJE>'%:MVW\9/I@W%0)E_+4_F4#/VBX&0=%!5"UEGQ</('++ 
MH+0=DXQ-N+UH*8GF,0L<"Q-!J6;(%H#93Q*QHA;;@4;&YR2D[@,(*:6F6^<3 
M88OP@/$&5Y$PE+)PQCTV-"UREVS%#@]A\%6L$-`R%K022+:Z1`,9+89<I8-$ 
M`E/)(,"`6Y\RFGD7;#X)2G`1=S`,):<0P$#B=L7PP(M9,=H5!1E#M'X!=*(0 
M20&F+MHE<DII2=*Q@6;GD;^QQG.-M[37+:/6*!BC2Y8ZGAF>"D<Q4!$AED>0 
M\29+<"$G3R*$LCDP*60U"Q(/`DF10)<#L59#E(9'\:9.%AH<AV93C9."!DV@ 
M\N^PU=@NX2K8F)+A;G"V"OAPH`@8HXQS,2XF+JDJ>S#$XRI-SB,0(<XE2D.V 
M<2#P2DV4N&8V:EP=F?L81>@0Z!0X#!77!7"DN!1KE9;H"9:W'@W'6RTPRR:" 
M5[%6XDPRJ`8PSLF<R/#8:IHX3]<>6DLO7>&DMKB0#0O=<0(;#\Y,S%.IQ@W2 
M:3Z9BY*S,U."!.LI=<*8"8#V6*<Q!:%$;'%6$8@\72B#F1;=U$X<41O_@-F? 
M%X9PX//%^QY":"`<V>`>MP=T";DS"R#\!8(N$,7P>6,D6PE4G<*.LR"<N`0P 
MD#"RE,K#]BW@I@MV@H!#<TX>V^[`8X)H")A;'8V/[9W.L0Y:W)(MABC$&!4U 
M`E-[]"DFQPH%W$H67^8J$R2$I<'-BP%Q)N]5J_0&:366K%/<-<X+!-QI1&67 
MXBT-7CD`3$8I`(!`F4["PX/%Q-T-?L`#`$:&4%V@UU&/W&&)(G]"2IEP_<1; 
M(`)7N16V$[<CJ>X.S8I6ZQ<`+<WCB$S-N9T["(&YUM,K^X.H,XRZBB;,?@.$ 
M/5*HJL-8HD\T:L$0.S:Z@BN!A;7HU<-@"VRSF;KP"(!Z0P8S#T6/!=L=!P4N 
MGZ4.>&*3LNO4+I4,V/-5:+PRYSV(LN9$@K,C8HYX)KAX%):Q0"3,4S)TMMPZ 
MPK`8DT\N+7H>IZK.J+')@S&@#I9H*,Q"`SG<`U&MHTG6Y5X,*KK<QEA?9C36 
MP0LO"G`MZ4BGG)MQ>K8H/T36[O7<+&I5JJE%&$D3+4&4U6\<)SN.0Z3:G[FN 
MSF'CG8FQF@S<,X()KF'&O9*XA;L=;>-$L)4[&-.)Q(EDXPEVEFR+$Q!"D)2U 
MK+7@C5L'ZT<:PA=/UL'F\"T&7J?S+90XB3('^I\%"4J328[+-LMP)N9NB8J( 
M"0"GTEQ0W9D'-UPLNH'@C%:4EC%&.(,PK%H$,I"M8(R$B!"0N9*N=&#GA=B3 
M-*;%K&)S!SCC&H'H;%C-+Y8L$M4#>%\+@+CAS`@JS+<'M3B2+&PCNDNR:QDQ 
M(,L^2ZX`S%&0MT6))G-L&QM-20"Q0@Y:S=$YN8TSPC,TXN!,V@)Z-,2'I,@X 
MOR)A@",I[EC3>#DDVE^UQ!C!D@V!A;7`RLS>-$#%"<C.?RYT*!.6[7FDS#;R 
M:68\TA&Y;TPVN)N)E*"H3$W"-1/"4('4N0<5&4L7,6@$"2R#&:*GV1F9&Y+Q 
MXY)V17`1)Q9D2<Z4V$#]8%$(TC2%PJO1@M9AI:K77),`FZ\%5N?<@9SIV8KF 
MFU*A;":::C>=ZF,'A("R&JV.,UXKK:A!HFXQ%)@I\PE&-)ULV:>F$8'.?[\6 
M1\829`3\241299%BS9S2TU@A;QK8=\G<@;0C7_6:UF\ZE=C=7J=86WD+)].; 
M/,I\6/#FS_N,[YI0:VK[:YK88S@0ALY6M]WC(;F6%1(.=YUWAT2&].4L:;A` 
MER66\4QK;;)9SP6D_&JYUPUWA*9O+2$!0YI-9/EPMDMW0*`A5K.=\;-H5<<F 
M%MK61*35;71RMFW9;@62%':[:T'?$O%0YHO35#G?&.<X?W`&<96!VI=NCQ"@ 
M.<6=S',`L5Z'#CA["3H3IU-3#N1J3>`&Q/N-=36T[Y$KCA2H\-Y_@(Y+XZ[D 
MXM0;..HZT;B@RW@!-=3A5MG*XVP`C7$US-R<-7C-TB;;G/(NNQW>\%<=+YK\ 
MUZ$PR]1.)A#M%(M/470D#A[K0P%MSKL13C<M;8\MY0[E?([C+A&K8;<UWST& 
M9I`D-9R+F;U46FYB&L6A[</5.H(;!);D^3AN.HQK878VIY=8X@"$MNF\VHPQ 
MB8",N&RSD!,Q10R;H$W5^W0FYU9<Q`:UN;</@:ZX+GZ%(')S@J>#KE5*@S," 
MKJ\K":B$S)$'M4Y`A"Y/1QDI!0,5#@05;*EUT!F\&KURVG6D9YC9C==<EEM, 
MW)[=<#KQW*YJ(`P4.9.%QH@9J,*FQ<*D:V\:AG:*0.O1J>'>C`6TJ93Y/EFK 
MIP$;RXV\Q)FN1(QE^@1CI<X9X<G$`\X-S4U,P`49CP!@DG2Q=8,&HQD;Q;4L 
M/L1<]+YQMN)K&24#`MU=%O@#\E:/#JFMN,ZR1NN+.\N40UQHM';-?0*Q2A(( 
?HT;F)R7&)D7./!%,L0$LN>9&1T:G*_!/]O_^"7-__\N4 
` 
end 

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 

The Herd Mentality as gleaned from the musings of Bil Herd 
------------------ 

This is sort of a story that was really a bunch of posts I did on CompuServe
back in 1993. The language is coarse, the English malformed, and if you're
a certain ex-Commodore employee you might be offended by the story.  In
that case, the story is probably about someone else and not you; my
fallback position would be that the whole story is a fabrication and meant
for entertainment, not to mention that worrying about things like 5 year old
stories written by burnt-out wackos is bad for your cholesterol.  As it
turns out, this is about a sixth of the silliness that went on; it is
more about some of the events that centered on the 8563 chip, which I was
being asked about at the time.  Oh yeah, the content is also dated, with
a slight slant towards the melodramatic.

12-Jan-93  19:28:05 

Coming soon to a terminal near you... the gruesome story of the chip that 
almost ruined CES (and the C128 along with it) 

EXPERIENCE the shame and horror of being a Chip Designer at Commodore during 
the Witch hunts! 

SEE the expressions on Managers' faces when they realize that their Bonuses are
at stake! 

HEAR the woeful lamenting of the programmers as they are beaten for no 
apparent reason! 

SHARE the experience of being a Hardware Engineer... stalking the halls in 
search of programmers to beat (for no apparent reason). 

LEARN how to say "THIS CHIP ONE SICK PUP" in Japanese. 

Find out just how badly busted up the 80 column chip was and how many DIRTY 
fixes were needed to make that all crucial show in Vegas on January 6. 
(Christmas, what Christmas?)

<Said rather coyly in an attempt to elicit any positive responses> Unless of 
course no one is interested..... :) Bil 

 
14-Jan-93  15:37:59 

This is the first of many parts as this thing went round and round during our 
mad dash to make the CES show. I don't even remember what year it was. The 
8563 was a holdover from the Z8000 based C900 (the "Z" machine as we called 
it). The people who worked on it were called the "Z" people, the place they 
hung out was called the "Z" lounge and well.... you get the idea. 

The most interesting thing that came out of that group besides a disk 
controller that prompted you for what sector and cylinder you'd like to write 
to on every access, was one day they stole the furniture out of the lobby and 
made their own lounge disguising it as a VAX repair depot. We were so 
amused by this that we stopped teasing them for a week.  (But I become 
distracted....) 

Now the very very very early concept of the C128 was based on the D128, a 
6509 based creature (boo... hiss).  The engineers on the project had tacked a 
VIC chip onto the otherwise monochrome (6845 based) design in an effort to
add some color to an otherwise drab machine. No one dreamed that C64
compatibility was
possible so no one thought along those lines. I was just coming off of 
finishing the PLUS 4 (before they added that AWFUL built in software to it) 
and even though I had done exactly what I was told to do I was not happy with 
the end result and had decided to make the next machine compatible with 
_something_ instead of yet another incompatible CBM machine. (I won't go into 
the "yes Virginia there is Compatibility" memo that I wrote that had the 
lawyers many years later still chuckling, suffice it to say I made some 
fairly brash statements regarding my opinion of product strategy) 
Consequently, I was allowed/forced to put my money where my mouth was and I 
took over the C128 project. 

I looked at the existing schematics once and then started with a new design 
based on C64ness. The manager of the chip group approached me and said they 
had a color version of the 6845 if I was interested in using it; it would
definitely be done in time, having been worked on already for a year and a
half...... And so the story begins..... (to be continued) 

 
16-Jan-93  19:06:28 

Looking back I realize that the source of a lot of the problems with the 8563 
is that it wasn't designed FOR the C128 and that the IC designers did not 
take part in the application of their chip the way the other designers did. 
The VIC and MMU designers took an active interest in how their chip was used 
and how the system design worked in relation to their chip. I overlooked 
ramifications of how the 8563 was spec'ed to work that came back to haunt me 
later. For example, it was explained to me how there was this block transfer 
feature for transferring characters for things like scrolling.  Cool.... we 
need that. Later it would turn out when this feature finally did work 
correctly that it only was good for 256 characters at a time. 256 characters 
at a time. 256 characters at a time?? I never stopped to think to ask if the 
feature was semi-useless because it could only block move 3 and 1/3 lines at 
a time. Did I mention the block move was only good for 256 characters?  Later 
a bug in this feature would almost prove a show stopper with a serious 
problem showing up in Vegas the night of setup before the CES show. But I get 
ahead of myself. It was also my understanding that this part had the same 
operating parameters as the 6845, a VERY common graphics adapter.  Not 
scrutinizing the chip for timing differences the way I normally did any new 
chip was another mistake I made.  The major timings indicated what speed 
class it was in and I didn't check them all. I blame myself as this really is 
the type of mistake an amateur makes.  I wonder if I was in a hurry that day. 
:) 
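
To put the 256 character limit mentioned above in perspective, here is a
rough back-of-the-envelope sketch in Python.  The 25 line screen height is an
assumption (the posts only talk about an "80 column chip"), and the function
names are made up purely for illustration.

```python
import math

COLUMNS = 80          # characters per line on the 80 column display
ROWS = 25             # assumed number of text lines (not stated in the posts)
MAX_BLOCK = 256       # most characters one block transfer can move

def lines_per_block(columns=COLUMNS, max_block=MAX_BLOCK):
    """How many full-width text lines fit in a single block transfer."""
    return max_block / columns

def blocks_to_scroll(columns=COLUMNS, rows=ROWS, max_block=MAX_BLOCK):
    """Block transfers needed to move every line but one up by a line."""
    chars_to_move = columns * (rows - 1)
    return math.ceil(chars_to_move / max_block)

print(lines_per_block())    # a little over three lines per transfer
print(blocks_to_scroll())   # transfers for a one-line scroll of the text
```

Under those assumptions a single scroll takes several separate block
transfers, each of which has to be set up and waited on in turn.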

 
16-Jan-93  19:06:39 

It turns out that a major change had been made to the way the Read/Write line 
was handled. When I asked about this, VERY late in the design cycle, like in 
Production when this problem turned up, I was told "remember, this was
designed to work in the Z8000 machine." ???!!!! ????!!!! Shoulda seen the 
look on my face! Even though the Z8000 machine was long dead and we had been 
TRYING for 6 months to use this damn thing in the C128 I'm being told NOW 
that you didn't design it to work the way we've been using it for 6 months? 
Shoulda asked.... it was my fault, shoulda asked "is this meant to work"..... 
:/ 

 
Don't get me wrong, the designer was VERY bright, he held patents for some of
the "cells" in the Motorola 68000. It's just that the chip had to work in
conjunction with other chips and that's where some of the problems lay.  Our
story opens as Rev 0 of the chip.... (what's that..... doesn't work.... OK,)
Our story opens as Rev 1 of the chip makes its debut and ......(pardon me a
moment.....) Our story opens as Rev 2 of the chip makes its debut..... <to be
continued> 

 
19-Jan-93  20:50:41 

Forgive the sporadic nature of these additions.  Now where was I .... oh 
yeah.... It was sometime in September when we got 8563 Silicon (or so memory 
serves) good enough to stick in a system. I can't remember what all was wrong 
with the Chip but one concern we had was it occasionally (no spell checker 
tonight, bear with me) blew up.... big time.... turn over die and then smell 
bad..... But then all of the C128 prototypes did that on a semi regular basis 
as there wasn't really any custom silicon yet, just big circuit boards 
plugged in where custom chips would later go... but you can't wait for a 
system to be completed before starting software development.   I don't think 
any of the Animals really gave it a thought until when the next rev of the 
chip came out and now with less other problems the blowing up 'seemed' more 
pronounced.  Also the prototypes got more solid _almost_ every day.  (I knew 
to go check on the programmer's prototype whenever I heard the sound of cold 
spray coming out of their office.... later it turned out they usually weren't 
spraying the boards just using their "Hardware Engineer" call.  Sometimes all 
I had to do was touch the board in a mystical way and then back out slowly 
sometimes accompanied by ritual like chanting and humming.  This became known
as the "laying of hands".  This worked every time except one, and that time 
it turned out I had stolen the power supply myself without telling them.... 
If anybody else got caught "messing with my guys" they'd get duct taped to a 
locker and then the box kicked out from under them leaving them stuck until 
they could peel themselves down, but that's another story.)  ANYWAY, when this 
problem still existed on Rev 4 (I think it was) we got concerned.  It was at 
this time that the single most scariest statement came out of the IC Design 
section in charge of the '63.  This statement amounted to "you'll always have 
some chance statistically that any read or write cycle will fail due to 
(synchronicity)". 

 
19-Jan-93  21:12:05 

Synchronicity problems occur when two devices run off of two separate clocks:
the VIC chip, and hence the rest of the system, runs off of a 14.318MHz crystal
and the 8563 runs off of a 16MHz Oscillator. Now picture walking towards a
revolving door with your arms full of packages and not looking up before 
launching yourself into the doorway. You may get through unscathed if your 
timing was accidentally just right, or you may fumble through losing some 
packages (synonymous to losing Data) in the process or if things REALLY foul 
up some of the packages may make it through and you're left stranded on the 
other side of the door (synonymous to a completely blown write cycle). What I 
didn't realize that he meant was that since there's always a chance for a bad 
cycle to slip through, he didn't take even the most rudimentary protection 
against bad synchronizing. It's MY FAULT I didn't ask, "what do you mean
fully by that statement" because I'd have found out early that there was NO
protection. As it turns out, instead of failing every 3 years or so (VERY
livable by Commodore standards), the 8563 failed about 3 times a second.  In
other words if you tried to load the font all in one shot it would blow up 
every time!  The IC designers refused to believe this up until mid December 
(CES in 2-3 weeks!) because "their unit in the lab didn't do it."  Finally I 
said "show us" and they led the whole rabble (pitch forks, torches, ugly 
scene) down to the lab.  It turns out they weren't EVEN TESTING THE CURRENT 
REV of the chip, (TWO revs old), they were testing it from Basic because it
"blew up" every time they ran it at system speeds (No %^$#%$# Sherlock.
That's what we're trying to tell you) and even then it screwed up once and 
the designer reached for the reset switch saying that something does 
occasionally go wrong.  Being one of the Animals with my reflexes highly 
tuned by Programmer Abusing I was able to snatch his arm in mid-air before he 
got to the reset switch, with blatant evidence there on the test screen. 
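
The revolving door picture can be turned into a small numerical sketch.
This is only an illustration under loud assumptions: the 5 ns "danger
window" is a made-up figure, not a number from the 8563, and real bus
cycles happen at CPU speed rather than on every crystal edge.  What it does
show is that two free-running clocks drift through every possible alignment,
so without a synchronizer a steady fraction of edges lands in the bad spot.

```python
F_SYSTEM_HZ = 14_318_180   # system crystal (the posts say 14.318MHz)
F_VDC_HZ = 16_000_000      # 8563 oscillator
WINDOW_NS = 5              # hypothetical danger window after each 8563 edge

def fraction_of_unlucky_edges(n_edges, window_ns=WINDOW_NS):
    """Fraction of system-clock edges that land inside the danger window.

    Time is counted in ticks of 1 / (F_SYSTEM_HZ * F_VDC_HZ) seconds, so
    every edge of either clock falls on a whole tick and no rounding
    error creeps in.  One 8563 clock period is F_SYSTEM_HZ ticks long.
    """
    window_ticks = window_ns * F_SYSTEM_HZ * F_VDC_HZ // 10**9
    unlucky = 0
    for k in range(n_edges):
        # System edge k happens at k * F_VDC_HZ ticks; reduce it to a
        # position inside the current 8563 clock period.
        phase = (k * F_VDC_HZ) % F_SYSTEM_HZ
        if phase < window_ticks:
            unlucky += 1
    return unlucky / n_edges

print(fraction_of_unlucky_edges(200_000))
```

With these made-up numbers the result comes out close to the ratio of the
window to the 8563 clock period, which is nowhere near "once every 3 years".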

 
19-Jan-93  21:12:15 

One of the rabble was their boss and (I have been speaking about two 
designers interchangeably, but then they were interchangeable,) the word 
Finally came down "FIX IT".  Hollow Victory as there were only two weeks till
we packed for the show, and there were 4 or 5 other major problems (I'll say
more later) with the chip and NO time to do another pass. It was obvious that
if we were going to make CES something had to give. As Josey Wales said,
"That's when ya gotta get Mean.... I mean downright plumb crazy Loco Mean".
And we knew we had to.  The programmer thrashings hit an all time high
shortly after.

 
22-Jan-93  14:17:32 

Memory flash, I just remembered when we found out there was no interrupt 
facility built in to the 8563.  I remember how patient the designer was when 
he sat me down to explain to me that you don't need an interrupt from the 
8563 indicating that an operation is complete because you can check the 
status ANY TIME merely by stopping what you're doing (over and over) and 
looking at the appropriate register, (even if this means banking in I/O) or 
better yet sit in a loop watching the register that indicates when 
an operation is done (what else could be going on in the system besides 
talking to the 8563 ???)  Our running gag became not needing a ringer on the 
phone because you can pick it up ANY TIME and check to see if someone's on 
it, or better yet, sit at your desk all day picking the phone up.   Even in 
the hottest discussions someone would suddenly stop, excuse himself, and pick 
up the nearest phone just to see if there was someone on it.  This utterly 
failed to get the point across but provided hours of amusement.  The owners 
at the local bar wondered what fixation the guys from Commodore had with the 
pay phone. 
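
The "pick up the phone to see if anyone is on it" approach looks roughly
like this in code.  This is a minimal sketch with a fake chip: the class,
the bit position and the read counts are all hypothetical stand-ins, not the
real 8563 register layout.

```python
class FakeVideoChip:
    """Stand-in for a chip with a 'done' status bit and no interrupt line."""

    DONE_BIT = 0x80  # hypothetical position of the status bit

    def __init__(self, busy_reads):
        self._busy_reads = busy_reads  # status reads that still report busy

    def read_status(self):
        if self._busy_reads > 0:
            self._busy_reads -= 1
            return 0x00
        return self.DONE_BIT


def wait_until_done(chip):
    """Busy-wait on the status register; return how many reads were wasted."""
    wasted = 0
    while not chip.read_status() & chip.DONE_BIT:
        wasted += 1   # the CPU can do nothing useful while it sits here
    return wasted


print(wait_until_done(FakeVideoChip(busy_reads=1000)))
```

Every pass through that loop is time the processor could have spent on
something else, which is the point the phone gag was trying to make.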

Anyways.... To back up to the other problems that plagued the 8563.  Going
into December a couple of things happened.  The design had been changed to 
support a "back-bias generator".  This thing is generally used to reduce 
power consumption and speed the chip up.  Well, something was not quite right 
somewhere in the design because the chip got worse.  The second thing that 
happened was that both designers took vacation.  Nothing against that from my 
point of view here 8-9 years in the future, but right then we couldn't 
understand what these people were doing working on a critical project. 

 
22-Jan-93  14:17:37 

Or maybe I was just getting too used to eating Thanksgiving Dinner out of
aluminum foil off of a Lab Bench.  Christmas consisted of stopping at 
someone's house who lived in the area for a couple of hours on the way home 
from work.  Anyways, the chips could no longer display a solid screen.  The 
first couple of characters on each line were either missing or tearing, until 
the thing heated up, then they were just missing.  Also, the yield of chips 
that even worked this good fell to where they only got 3 or 4 working chips 
the last run.  A run is a Half-Lot at MOS and costs between $40,000 and 
$120,000 to run.  Pretty expensive couple of chips. 

The other problem takes a second to explain, but first a story.....  Back 
when TED (the Plus four) had been mutilated, decimated and defecated upon,
management decided to kick the body one last time.  "TED shall Talk" came the
decree and the best minds in the industry were sought... We actually did have
two of the most noted consumer speech people at the time; the guys who
designed the TI "Speak & Spell" worked out of the Commodore Dallas office.
They did a custom chip to interface a speech chip set to the processor. 
Operating open loop, in other words without feedback from any of the system 
design people (US) they defined the command registers.  There was a register 
that you wrote to request a transfer.  To REALLY request the transfer you 
wrote the same value a second time. We referred to this as the "do it, do it 
now" register or the "come on pretty please" request, or my favorite, "those 
#$%&@ Texans" register. ANYWAYS, the 8563 also had a problem where the 256 
'bite' transfer didn't always take place properly, leaving a character 
behind. This ended up having the effect of characters scrolling upwards 
randomly. 
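
The "do it, do it now" register described above can be modeled as a toy in
a few lines.  This is a sketch of the behavior as told in the story, not of
the actual speech interface chip or the 8563; the class name, the values and
the "same value twice" rule are assumptions taken from the description.

```python
class DoItDoItNowRegister:
    """Toy model: a request only starts once the same value is written twice."""

    def __init__(self):
        self._pending = None     # value seen on the previous write, if any
        self.transfers = []      # requests that actually went through

    def write(self, value):
        if self._pending == value:
            self.transfers.append(value)   # second identical write: go
            self._pending = None
        else:
            self._pending = value          # first write: just remember it


def request_transfer(register, value):
    """Driver-side workaround: always write the request twice in a row."""
    register.write(value)
    register.write(value)


reg = DoItDoItNowRegister()
reg.write(0x2A)                 # a single write is not enough
print(reg.transfers)
request_transfer(reg, 0x10)     # the double write gets through
print(reg.transfers)
```

The catch with this kind of interface is that everyone writing software for
the chip has to know about the second write.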

 
22-Jan-93  14:17:45 

So to recap, going into December we had a chip with .001% yield, the left 
columns didn't work, anytime there was one pixel by itself you couldn't see 
it, the semi useless block transfer didn't work right, the power supply had 
to be adjusted for each chip, and it blew up before you loaded all of the 
fonts unless you took 10 seconds to load the fonts in which case it blew up 
only sometimes. Finger pointing was in High swing, (the systems guys should 
have said they wanted WORKING silicon) with one department pitted against the 
other, which was sad because the other hardworking chip designers had 
performed small miracles in getting their stuff done on time. Managers 
started getting that look rabbits get in the headlights of onrushing Mack 
trucks,  some started drinking, some reading poetry aloud and the worst were 
commonly seen doing both. Our favorite behavior was where they hid in their 
offices.  It was rumored that the potted plant in the lobby was in line for 
one of the key middle management positions. Programmer beatings had hit a new 
high only to fall off to almost nothing overnight as even this no longer 
quelled the growing tension. A sprinkler head busted and rained all over 
computer equipment stored in the hallway. Engineering gathered as a whole and 
watched as $100,000 worth of equipment became waterlogged, their
expressions much like the bystanders at a grisly accident who can't tear 
their attention away from the ensuing carnage.  I can honestly say that it 
didn't seriously occur to me that we wouldn't be ready for CES, for if it 
had, I might have succumbed to the temptation to go hide in my office 
(checking the telephone).  There were just too many problems to stop and 
think what if.  Next time (hopefully) I'll try and bring all the problems and 
answers together and explain why I stopped to tell that rather out of place 
TED story. 

 
30-Jan-93  19:27:11 

No single custom chip was working completely as we went into December with 
the possible exception of the 8510 CPU.  The MMU had a problem where data was 
"bleeding through" from the upper 64K bank into the lower.  This was in part 
due to a mixup in the different revisions of "layers" that are used to make
chips. This chip essentially had one of the older layers magically appear,
bringing old problems with it.  Unfortunately, this older layer had been used to
fix newer problems so we didn't have a way to combine existing layers to fix 
ALL problems. Dave D'Orio (start telling ya some of the names of a few of the 
unsung types here) did a GREAT job of bringing most of the IC design efforts 
together.  I was sitting with Dave in a bar, we were of course discussing 
work, when he suddenly figured out what the problem was.  He had looked at 
the bad MMU chip under a microscope that day.  Later that night, under the
influence of a few Michelobs, his brain "developed" the picture his eyes had 
taken earlier and he realized that an earlier layer had gotten into the 
design. 

 
30-Jan-93  19:49:06 

This would not be the first time a problem would be addressed at this 
particular bar.  (The Courtyard.... If you ever saw the David Letterman where 
the guy stops the fan with his tongue, he was a bartender there). The PLA had 
a problem where my group had made a typo in specifying the hundred some terms 
that comprised the different operating parameters. Well the designer in 
charge of the PLA took this rev as an opportunity to sneak a change into the 
chip without really going public with the fact he was making a change.  When 
the change went through it caused one of the layers to shift towards one side 
and effectively shorted the input pins together. Ya should've seen the scene 
where the designer's boss was loudly proclaiming that Hardware must have 
screwed up because his engineer DIDN'T make any changes (that would've been 
like admitting that something had been "broken"). You could tell by the way 
the designer's face was slowly turning red that he hadn't yet found a way of 
telling his boss that he had made a change. Talk about giving someone enough 
rope to hang themselves, we just kept paying it out yard by yard. 

 
30-Jan-93  19:53:45 

Anyways back to the 8563.  The first problem was relatively easy to fix, 
providing you didn't give a hang about your own self respect.  The 8563 
designer mentioned that the block copy seemed to work better when you wrote 
the same command twice in a row.  I made him explain this to me in public, 
mostly due to the mean streak I was starting to develop when it came to this 
particular subject.  He calmly explained that you merely wrote to this 
register and then wrote to it again.  I asked, "You mean do it and do it now?" 
"Exactly," the designer exclaimed, figuring he was on the home stretch to 
understanding (Intel, at last his eyes unfurled).  "Kinda like a 'come on 
pretty please register'?" I asked with my best innocent expression.  "Well, sort 
of," he replied, doubt creeping into his voice.  "You wouldn't be from Texas, 
would you?" I asked, my face the definition of sincerity.  (Said in the voice 
of the wanna-be HBO director on the HBO made-for-TV commercial) "Why yes.... 
yes I am," he replied. Mind you, a crowd had formed by this time; that poor guy 
never understood what was so funny about being from Texas or what a 'Damn 
Texan' register was. 

 
30-Jan-93  19:53:50 

This 'fix' actually did work somewhat; the only problem was that no one told 
the guy (Von Ertwine) who was developing CP/M at home (consultant). Von had 
wisely chosen not to try to follow all of the current Revs of the 8563; 
instead he latched onto a somewhat working Rev4 and kept it for software 
development.  Later we would find out that Von, to make the 8563 work 
properly, was taking the little metal cup that came with his hot air popcorn 
popper (it was a buttercup to be exact) and would put an ice cube in it and 
set it on the 8563.  He got about 1/2 hour of operation per cube. On our side 
there was talk of rigging cans of cold spray with foot switches for the CES 
show, "sparkle??? I don't <pissshhh> see any sparkle <pissshhh>". Anyways, 
no-one told Von.... but don't worry, he would find out the day before CES 
during setup in 'Vegas. 

 
    23-Oct-93  16:57:43 
Sb: C128, The Final Chapter 

Thought I'd finish what I'd started back in January of this year. I had been 
talkin' 'bout how busted up the 8563 was; now we get to the part about how it got 
fixed... well fixed good enough... well patched good enough to give every 
possible attempt at the appearance of maybe passably working... 

One of the things that got worse instead of better was something called the 
back bias generator.  Now as much as I admired the blind ambition (as opposed 
to unmitigated gall... no one ever said it was unmitigated gall and I am not 
saying that here and now) of slipping in a major change like that right 
before a CES show, it became obvious that it needed fixed.  Now the back-bias 
generator connects to the substrate of the chip and if you've ever seen the 
ceramic versions of the 40 and 48 pin chips you would notice that the pin 1 
indicator notch is gold colored.  That is actually a contact to the 
substrate.  I have never heard of anyone ever soldering to the pin 1 
indicator notch but I had little to lose.  At this point all I did have to 
lose was a HUGE jar of bad 8563's. (One night a sign in my handwriting 
"appeared" on this jar asking "Guess how many working 8563's there are in the 
jar and win a prize."  Of course if the number you guessed was a positive 
real number you were wrong.) I soldered a wire between this tab and the 
closest ground pin.  The left column reappeared though still a little broken 
up!  The "EADY" prompt now proudly stated that the machine was "READY" and not 
really proclaiming its desire to be known as the shortened version of 
Edward.  To fix the remaining tearing we put 330 ohm pullups on the outputs 
and adjusted the power supply to 5.3 volts.  This is the equivalent of 
letting Tim-the-Tool-Man-Taylor soup up your blender with a chainsaw motor 
but it worked.  The side effect was that it would limit the useful life of 
the part to days instead of weeks as was the normal Commodore Quality 
Standard.  I was afraid that this fix might be deemed worthy for production. 
(said with the kind of sardonic cynical smile that makes parole officers 
really hate their jobs) 

Remember the synchronicity problem?  Remember the revolving door analogy?  We 
built a tower for the VIC chip that had something called a Phase Lock Loop on 
it which basically acted as a frequency doubler.  This took the 8.18 MHz Dot 
Clock (I think it was 8.18 MHz.... been too long and too many other dot clock 
frequencies since then) and doubled it.  We then ran a wire over to the 8563 
and used this new frequency in place of its own 16 MHz clock.  Now this is 
equivalent to putting a revolving door at the other end of the room from the 
first door and synchronizing them so that they turn at the same rate.  Now if 
you get through the first door and walk at the right speed every time towards 
the second door you will probably get through.  This tower working amounted 
to a True Miracle and was accompanied by the sound of Hell Freezing over, the 
Rabbit getting the Trix, and several instances of Cats and Dogs sleeping 
together. This was the first time that making CES became a near possibility. 
We laughed, we cried, we got drunk.  So much in a hurry were we that the little 
3" X 3" PCB was produced in 12 hours (a new record) and cost us about $1000 
each. 
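
The arithmetic behind the tower is simple enough to sketch.  The snippet 
below assumes the 8.18 MHz figure recalled above (which the text itself 
hedges) and an ideal doubler; the names are illustrative only and do not 
come from any Commodore documentation. 

```python
# Sketch only: both frequencies are the approximate figures recalled in the text.
VIC_DOT_CLOCK_MHZ = 8.18        # assumed; the author is not certain of this value
NOMINAL_8563_CLOCK_MHZ = 16.0   # the 8563's own clock, per the text


def pll_doubled(freq_mhz: float) -> float:
    """Ideal frequency doubler: the output is exactly twice the input."""
    return 2.0 * freq_mhz


doubled = pll_doubled(VIC_DOT_CLOCK_MHZ)
offset_percent = 100.0 * (doubled - NOMINAL_8563_CLOCK_MHZ) / NOMINAL_8563_CLOCK_MHZ
print(f"{doubled:.2f} MHz, {offset_percent:+.2f}% from nominal")
```

Under these assumptions the 8563 ends up clocked a little faster than its 
nominal rate, but locked to the VIC, which is what the revolving door 
analogy is about. 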

A new problem cropped up with sparkle in multi-colored character mode when 
used for one of the C64 game modes.  Getting all too used to this type of 
crisis, I try a few things including adjusting the power supply to 4.75 
volts.  Total time-to-fix, 2 minutes 18 seconds; 'course now the 80 column 
display was tearing again. Machines are marked as to whether they can do 40 
column mode, 80 column mode or both. We averaged 1-3 of these crises a day 
the last two weeks before CES.  Several of us suffered withdrawal symptoms if 
the pressure laxed for even a few minutes.  The contracted security guards 
accidentally started locking the door to one of the development labs during 
this time.  A hole accidentally appeared in the wall allowing you to reach 
through and unlock it.  They continued to lock it anyways even though the 
gaping hole stood silent witness to the ineffectiveness of trying to lock us 
out of our own lab during a critical design phase.  We admired this 
singleness of purpose and considered changing professions. 

We finished getting ready for CES about 2:00 in the morning of the day we 
were to leave at 6:00.  On the way to catch the couple of hours sleep I hear 
the Live version of Solsbury Hill by Peter Gabriel, the theme song of the 
C128 Animals and take this as a good omen. Several hapless Programmers are 
spared the ritual sacrifice this night... little do they know they owe their 
lives to some unknown disc jockey. 

Advertisements in the Las Vegas airport and again on a billboard enroute from 
the airport inform us that the C128 has craftily been designed to be 
expandable to 512K.  Now it had been designed to be expandable originally and 
had been respecified by management so as to not be expandable in case next 
year's computer needed the expandability as the "New" reason to buy a 
Commodore computer. That's like not putting brakes on this year's model of car 
so that next year you can tout the New model as reducing those annoying 
head-on crashes. 

Upon arriving at the hotel we find that our hotel reservations have been 
canceled by someone who fits the description of an Atari employee.  Three 
things occur in rapid succession.  First, I find the nearest person owning a 
credit card and briskly escort her to the desk where I rent a room for all 
available days; second, a phone call is placed to another nearby hotel 
canceling the room reservations for Jack Tramiel and company; third, several 
of those C64's with built in monitors (C64DX's??? man it's been too long) are 
brought out and left laying around the hotel shift supervisor's path 
accompanied by statements such as "My my, who left this nifty computer laying 
here... I'd bet they wouldn't miss it too much". 

The next day we meet up with the guy who developed CPM (Von) for the C128. 
As I mentioned earlier, someone forgot to tell him about the silly little 
ramifications of an 8563 bug.  His 'puter didn't do it as he had stopped 
upgrading 8563s on his development machine somewhere around Rev 4 and the 
problem appeared somewhere around Rev 6.  As Von didn't carry all the 
machinery to do a CPM rebuild to fix the bug in software, it looked like CPM 
might not be showable.  One third of the booth's design and advertising was 
based on showing CPM.  In TRUE Animal fashion Von sat down with a disk editor 
and found every occurrence of bad writes to the 8563 and hand patched them. 
Bear in mind that CPM is stored with the bytes backwards in sectors that are 
stored themselves in reverse order. Also bear in mind that he could neither 
increase nor decrease the number of instructions, he could only exchange them 
for different ones.  Did I mention hand calculating the new checksums for the 
sectors?  All this with a Disk Editor. I was impressed. 
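
The constraints Von was working under can be sketched in a few lines. 
Everything here is hypothetical: the byte values, the helper names, and 
especially the checksum, which is shown as a plain 8-bit sum only because 
the text does not say what scheme the sectors really used. 

```python
def checksum8(sector: bytes) -> int:
    """Hypothetical checksum: sum of all bytes modulo 256 (not the real scheme)."""
    return sum(sector) & 0xFF


def patch_same_length(code: bytes, old: bytes, new: bytes) -> bytes:
    """Swap every occurrence of `old` for `new` without changing the size."""
    if len(old) != len(new):
        raise ValueError("patch must not grow or shrink the code")
    return code.replace(old, new)


# Made-up 8-byte "sector", stored byte-reversed on disk as the text describes.
logical = bytes([0x01, 0xAA, 0xBB, 0x02, 0xAA, 0xBB, 0x03, 0x04])
on_disk = logical[::-1]

# Un-reverse, patch the logical view, then reverse again for writing back.
patched_logical = patch_same_length(on_disk[::-1], b"\xAA\xBB", b"\xCC\xDD")
patched_on_disk = patched_logical[::-1]
new_sum = checksum8(patched_on_disk)
```

The length check is the part that mirrors the story: a patch that exchanges 
instructions one for one leaves every other offset in the image untouched, 
so only the checksum of the affected sector has to be redone. 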

Everything else went pretty smooth, every supply was adjusted at the last 
moment for best performance for that particular demo.  One application has 
reverse green (black on green) and the 330 ohm pullups won't allow the 
monitor to turn off fast enough for the black characters.  I had had 
alternate pullup packs made up back in West Chester and put them in to 
service.  On average, 2 almost-working 8563's would appear each day, hand 
carried by people coming to Vegas. Another crisis, no problem, this was 
getting too easy.  If a machine started to sparkle during the demo, I would 
pull out my ever present tweak tool and give a little demonstration as to the 
adjustability of the New Commodore power supplies.  People were amazed by 
Commodore supplies that worked, much less had a voltage adjustment and an 
externally accessible fuse.  I explained (and meant it) that real bad power 
supplies with inaccessible fuses were a thing of Commodore's past and that 
the New design philosophy meant increased quality and common sense. 

I'm told they removed the fuse access from production units the month after I 
left Commodore. 

The names of the people who worked on the PCB layout can be found on 
the bottom of the PCB. 

"RIP: HERD, FISH, RUBINO" 

The syntax refers to an inside joke where we supposedly gave our lives in an 
effort to get the FCC production board done in time, after being informed 
just the week before by a middle manager that all the work on the C128 must 
stop as this project has gone on far too long.  After the head of Engineering 
got back from his business trip and inquired as to why the C128 had been put 
on hold, the middle manager nimbly spoke, expounding the virtues of getting 
right on the job immediately and how someone else, _his_ boss perhaps, had made 
such an ill-suited decision.  The bottom line was we lived in the PCB layout 
area for the next several days.  I slept there on an air mattress or was 
otherwise available 24 hours a day to answer any layout questions.  The 
computer room was so cold that the Egg McMuffins we bought the first day were 
still good 3 days later. 

About the Z80: 

What court ordered Commodore to install the Z80? 

It wasn't mandated by court order, it was mandated by a 23 year old 
engineer that realized that marketing had gone and said that we were 100% 
compatible.  This turned out to be a hard nut to crack as no-one knew 
what C64 compatibility meant.  Companies who designed cartridges for the 
C64 used glitches to clock their circuitry not realizing that the glitches 
were not to be depended on, etc. 

The Z80/CPM cartridge didn't work on all C64's, and no-one had really taken 
the time to figure out why.  Someone noticed that a certain brand of the 
address buffer used in the CPM cart worked better than others so someone 
concluded that it must be the timing parameters that made a difference. 
This wasn't true, it was a very subtle problem that dealt with the way the 
6502, the Z80 and the DRAM had been interlaced together.  So here we had 
a CPM cart that didn't work with all C64's and it worked even 
less reliably with the C128 even though the timing parameters in the C128 
were far better.  In my opinion you couldn't call the C128 compatible 
with the CPM cart as it only ran 20% of the time when tested overnight. 

ALSO, I worked hard to make sure the C128 had a reliable power supply.  I 
was told "no fuse"..... oops one got in there by accident... in fact it 
was easily accessible... darn it anyway.  However, with the wide 
variations in minimum and maximum power supply requirements we couldn't 
handle the CPM cart, it needed an additional .5 amp because of some 
wasteful power techniques that were used in it.  I couldn't foot the 
bill for an additional .5 amp that might only occasionally be used. 

SO, with that said, I accidentally designed the Z80 into the next rev of 
the board.  We designed the C128 in 6 months from start to finish 
INCLUDING custom silicon, these were records back then, the Z80 was added 
around the second month. 

The Z80 normally calls for a DRAM cycle whenever it needs one... it might 
go 3 clocks and then 4 and then 6 and then 5 and then 7 between DRAM 
accesses. Since the processor shares the bus with the VIC chip there are 
only certain times when the bus is available for a DRAM cycle.  Since the 
shortest cycle for a DRAM access for the Z80 is 3 clock cycles, you are 
sure to catch a DRAM access if you do 2 cycles (wait for vic) then 2 
cycles (wait for vic).  Whether you catch the Z80 between clock 1 and 2 or 
between 2 and 3 doesn't matter due to the special circuitry in the design. 
Otherwise if you just let the Z80 rip it crashes when it tries to grab 
DRAM while there is a video(vic) cycle going on.  And that's why it runs 
at a clock-stretched 2MHz.  The REAL bitch was the Ready circuitry when  
flipping between DMA/6502/Z80. 
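
The counting argument above can be checked with a small sketch.  It 
assumes, as a simplification and one reading of the text rather than a 
description of the actual circuit, that the Z80 clock is released in bursts 
of two clocks with the VIC taking the bus in between, and that consecutive 
DRAM accesses start at least three Z80 clocks apart.  Under those 
assumptions no burst can contain two accesses. 

```python
from itertools import product

BURST_CLOCKS = 2         # Z80 clocks released between VIC slots (per the text)
MIN_ACCESS_SPACING = 3   # assumed shortest spacing between Z80 DRAM accesses


def bursts_hit(gaps, first_access=0):
    """Return the index of the burst in which each DRAM access starts."""
    clock = first_access
    hits = [clock // BURST_CLOCKS]
    for gap in gaps:
        clock += gap
        hits.append(clock // BURST_CLOCKS)
    return hits


def at_most_one_access_per_burst(gaps, first_access=0):
    """True if no two accesses land in the same two-clock burst."""
    hits = bursts_hit(gaps, first_access)
    return len(hits) == len(set(hits))


# Exhaustively try short access patterns with spacings of 3..7 clocks,
# starting on either clock of a burst.
all_ok = all(
    at_most_one_access_per_burst(gaps, start)
    for gaps in product(range(MIN_ACCESS_SPACING, 8), repeat=4)
    for start in range(BURST_CLOCKS)
)
```

With a spacing shorter than the burst length (for example a gap of 1) the 
same check fails, which is the situation the two-then-wait scheme avoids. 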

The C128 design team: SYS32800,123,45,6 

Bil Herd       Original design and Hardware team leader. 
Dave Haynie    Integration, timing analysis, and all those dirty 
               jobs involving computer analysis which was something 
               totally new for CBM. 
Frank Palaia   One of three people in the world who honestly knows 
               how to make a Z80 and a 6502 live peacefully with 
               each other in a synchronous, dual video controller, 
               time sliced, DRAM based system. 
Fred Bowen     Kernal and all system like things.  Dangerous when 
               cornered.  Has been known to brandish common sense 
               when trapped. 
Terry Ryan     Brought structure to Basic and got in trouble for 
               it. Threatened with the loss of his job if he ever 
               did anything that made as much sense again.  Has 
               been known to use cynicism in ways that violate most 
               Nuclear Ban Treaties. 
Von Ertwine    CPM.  Sacrificed his family's popcorn maker in the 
               search of a better machine. 
Dave DiOrio    VIC chip mods and IC team leader.  Ruined the theory 
               that most chip designers were from Pluto. 
Victor         MMU integration. Caused much dissension by being one 
               of the nicest guys you'd ever meet. 
Greg Berlin    1571 Disk Drive design.  Originator of Berlin-Speak. 
               I think of Greg every night.  He separated my 
               shoulder in a friendly brawl in a bar parking lot 
               and I still can't sleep on that side. 
Dave Siracusa  1571 Software.  Aka "The Butcher" 

Not to mention the 8563 designers who made this story possible. 

....... 
.... 
.. 
.                                    C=H #17 

				      -fin-