N64 Survival Guide

By Nintendo64 · Published in N64 various · 7 Jan 2020

1. CPU overview


 
MIPS RISC CPU, R4300 series.

32 general registers, each with a conventional name.

R0 = value '0'. Hard-wired.
T0-T9 = scratch registers. Not preserved across calls.
S0-S7 = saved registers, preserved across function calls by convention. Trash at will if you know how.
A0-A3 = parameter passing to subroutines. Formal but not rigid.
RA = return address from subroutine. Held in a register, not pulled from a 'stack'. Change at convenience.
V0-V1 = arithmetic values, function return values.
SP = stack pointer. Informal; the stack exists only by convention.
AT = assembler temporary. Free use.

These are conventions; the hardware does not enforce them. Except R0.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

BYTE = 8-bits
HALF-WORD = 16-bits
WORD = 32-bits
DOUBLE WORD = 64-bits

As lower-level coders, it's best to respect the alignment boundaries.

Data        Multiples of (bytes)
----        --------------------
BYTE        1
HALF-WORD   2
WORD        4
DWORD       8

So a WORD goes at addresses that are multiples of 4 (last hex digit 0, 4, 8 or C).

You may get an exception otherwise. Especially with DMA transfers.

Instructions are WORD sized (32-bits), so code sits on WORD boundaries too. Keep data at proper addresses.
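
The alignment rule is easy to check mechanically. A minimal Python sketch; the names `ALIGNMENT` and `is_aligned` are made up for illustration and the addresses are arbitrary.

```python
# Natural alignment of each data size, in bytes (same values as the table).
ALIGNMENT = {"BYTE": 1, "HALF-WORD": 2, "WORD": 4, "DWORD": 8}

def is_aligned(address: int, kind: str) -> bool:
    """Return True if `address` is a multiple of the size of `kind`."""
    return address % ALIGNMENT[kind] == 0

print(is_aligned(0x8006D234, "WORD"))       # True  (ends in 4)
print(is_aligned(0x8006D236, "WORD"))       # False (ends in 6)
print(is_aligned(0x8006D236, "HALF-WORD"))  # True
```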

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the CPU core, we deal with three coprocessors (COPs).

COP0 = system control. Memory management unit (MMU), better known as 'virtual memory'.
COP1 = floating-point unit (FPU).
COP2 = video coprocessor (RCP). Note that this one is a separate chip next to the CPU, not part of the R4300 itself.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

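Branch delays.

When a branch is performed, a 1-cycle delay is incurred: the instruction right after the branch (the 'delay slot') executes before the branch takes effect.

So 'beq r0,r0,8006D234h' would also execute the instruction following it.
There is a limit on which opcodes can be placed in the delay slot (another branch or jump, for one).

Note that 'beq r0,r0,TARGET' is effectively 'bra TARGET', since r0 always equals r0.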


ex. Mario Golf
[1120:0027] 800B0130: BEQ t1[800FBBD0],r0[00000000],800B01D0h
[0000:0000] 800B0134: NOP
(delay slot = NOP)

[0c02:c0d7] 800B01B4: JAL 800B035C
[0120:2021] 800B01B8: ADDU a0[00000038],t1[800FBBD0],r0[00000000]
(delay slot = ADDU)
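
The targets shown in these traces can be recomputed from the raw opcode words. A small Python sketch under the standard MIPS encoding (16-bit signed word offset for branches, 26-bit word index for J/JAL, both relative to the delay slot address); the function names are my own.

```python
def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed number."""
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc: int, word: int) -> int:
    """Target of an I-type branch (BEQ, BNE, ...) located at `pc`."""
    offset = sign_extend16(word & 0xFFFF) * 4      # offset is counted in WORDs
    return (pc + 4 + offset) & 0xFFFFFFFF          # relative to the delay slot

def jump_target(pc: int, word: int) -> int:
    """Target of a J-type jump (J, JAL) located at `pc`."""
    # Upper 4 bits come from the delay slot address, the rest from the opcode.
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

print(hex(branch_target(0x800B0130, 0x11200027)))  # 0x800b01d0
print(hex(jump_target(0x800B01B4, 0x0C02C0D7)))    # 0x800b035c
```

Both results match the Mario Golf lines above.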


There is a 'likely' version (BEQL). If the branch is taken, the delay slot
is executed. Otherwise the delay slot is skipped.


For our hobbyist purposes, it is much safer to always put a 'NOP' in the
slot, inefficient as that is.

If speed is needed, then optimize after the code works.

1a. Video coprocessor


 
The RCP (VDP,PPU,video controller) is mainly interesting for polygon crunching
and post-filter effects.

It is formed from two components - RSP and RDP.

Texture data and framebuffers live in RDRAM, which is shared with the CPU.

Actual texture memory (TMEM) is only 4KB -- the RDP is only allowed to operate
on this amount at a time.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RSP is your transform and lighting unit (TnL).
Manipulates world data and textures.

Mathematically, lots of matrices to transform from
local data -> world space -> view space (projection w/ z-perspective correction).

Transform means scaling, translating and rotating for polygons, lighting normals,
texture UVs.

Creates primitive lists of triangles and lines for the RDP to render.
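
To make the local -> world -> view chain concrete, here is a toy Python sketch with plain 4x4 matrices. It only illustrates the math; it is not how any real uCode does it (the RSP works on fixed-point vectors), and the matrices and helper names are invented for the example.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def scale(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

# Toy projection: copies z into w, so the divide below shrinks far points.
PROJECTION = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]

def perspective_divide(v):
    x, y, z, w = v
    return [x / w, y / w, z / w]

local = [1.0, 2.0, 0.0, 1.0]                                   # model-space vertex
world = mat_vec(translate(0, 0, 4), mat_vec(scale(2), local))  # scale, then move
view = perspective_divide(mat_vec(PROJECTION, world))
print(world)  # [2.0, 4.0, 4.0, 1.0]
print(view)   # [0.5, 1.0, 1.0]
```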

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RDP is the display unit.

Rasterizer, fog, environmental, color blending. Anti-aliasing effects.

Lower-level pixel handler.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RCP has its own language, dubbed 'uCodes' (256 microcodes).

Think of modern vertex and pixel DirectX shaders - both stages combined.

R4000 coprocessor (COP2). Each uCode is a string of ASM instructions
run by the RSP. Also sets up the RDP batch renderer.

Display lists are a sequence of uCodes defined by the game. This is fed
to the RSP.


Note:
Emulator authors choose to translate uCodes into higher-level languages.

The programmers can define their own vertex / texture formats. Lighting
methods. Overdraw detection and other flexible wizardry.

The microcodes are uploaded to the RCP at run-time. So each game has its
own library of drawing functions (some are like DSP1 -> DSP1A -> DSP1B
and others are akin to DSP2,DSP3).

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add complexity, there are several texture variations (linear mode).

Games use custom formats. Furthermore, linear bitmaps can be any size
(320x8, 48x13). 4/8/16/32-bpp is the norm.

The microcode libraries tend to define several 'accepted' formats.


This is a sample list from video plugin authors.

16-bit RGBA = 5551. Red,green,blue,alpha (transparency).
32-bit RGBA = 8888.

4-bit IA = 31. Grayscale intensity (luminosity,brightness) + alpha.
8-bit IA = 44.
16-bit IA = 88.

4-bit I = 40. Grayscale only.
8-bit I = 80. Grayscale.
16-bit I = (16)(0).

4-bit CI = 40. Palette lookup --> 16-bit RGBA or 16-bit IA.
8-bit CI = 80. Palette lookup --> 16-bit RGBA or 16-bit IA.

YUV = some other color format. Output is RGB (888) + Full alpha.
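
Two of the formats above, unpacked in Python as a sketch. Assumptions: the fields are packed most-significant first in the order the names give (R,G,B,A and I,A), and the small fields are widened to 8 bits by bit replication, which is one common choice rather than anything this guide specifies. Function names are hypothetical.

```python
def decode_rgba5551(pixel: int):
    """16-bit RGBA 5551 -> (r, g, b, a), each widened to 0..255."""
    r = (pixel >> 11) & 0x1F
    g = (pixel >> 6) & 0x1F
    b = (pixel >> 1) & 0x1F
    a = pixel & 0x01
    widen5 = lambda c: (c << 3) | (c >> 2)   # replicate the top bits into the low bits
    return (widen5(r), widen5(g), widen5(b), 255 if a else 0)

def decode_ia31(nibble: int):
    """4-bit IA 31 -> (intensity, alpha), each widened to 0..255."""
    i = (nibble >> 1) & 0x07
    a = nibble & 0x01
    i8 = (i << 5) | (i << 2) | (i >> 1)      # 3 bits stretched to 8 bits
    return (i8, 255 if a else 0)

print(decode_rgba5551(0xF800))  # (255, 0, 0, 0)  pure red, transparent
print(decode_rgba5551(0xFFFF))  # (255, 255, 255, 255)
print(decode_ia31(0xF))         # (255, 255)
```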

1b. Memory mapping



CPU      Main Memory (RDRAM) (2)      CDROM (8)
 |                 |                      |
 ------------------------------------------
       |                       |
 Registers (1)         Cartridge (ROM) (4)


Naturally, we'd like to keep our data in registers (1 clock cycle).

Having the CPU execute from CDROM would be slow (8 cycles).
Even cartridge memory is slower than RDRAM.

Most likely, the game will copy the important code and data to
RDRAM to decrease load times and improve execution speed.

This is done via Direct-Memory Access (DMA), a speedy block transfer.

Note that you may also find self-modifying code since we're running
from RAM.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because the code is run from RAM, it is compiled (linked) for RAM
addresses and not ROM addresses.

Think of 8-bit NES/SMS/GB page offsets.


ex.
[0361:d824] 8002A14C: AND k1[0000FF03],k1[0000FF03],at[FFFF00FF]
[0369:d825] 8002A150: OR k1[00000003],k1[00000003],t1[0000FF00]
[3c09:a430] 8002A154: LUI t1[0000FF00],FFFFA430h

is at ROM $554C (Fushigi no Dungeon 2).

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Our memory map goes from $0000:0000-FFFF:FFFF.

COP0 can touch $2000:0000 upwards, meaning it can change the physical
address that a virtual address points to. Pages range from 4KB up to
16 MB (24-bit offsets).

Generally,

- $0000:0000 = ROM.
- $1000:0000 = ROM.

- $8000:0000 = RDRAM (cached). Code.
- $A000:0000 = RDRAM (uncached). Data.

- $A400:0000 = PI,SI. DMA registers.

- $B000:0000 = ROM (DMA, LD).

You'll see COP0's mapping table referred to as the 'Translation Look-aside
Buffer' (TLB). The $8000:0000-$BFFF:FFFF range bypasses it: the physical
address is simply the lower 29 bits.
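
For the direct-mapped range ($8000:0000-$BFFF:FFFF) the conversion to a physical address is just a mask; the Zelda trace further down does exactly this with `AND ..,1FFFFFFF` before writing the DMA source. A short Python sketch (the function name is my own):

```python
def kseg_to_physical(vaddr: int) -> int:
    """Strip the segment bits from a $8000:0000-$BFFF:FFFF address."""
    if not 0x80000000 <= vaddr <= 0xBFFFFFFF:
        # Anything else goes through the TLB and cannot be converted by masking.
        raise ValueError("address is not in a direct-mapped segment")
    return vaddr & 0x1FFFFFFF

print(hex(kseg_to_physical(0xB03B62F0)))  # 0x103b62f0 (cartridge ROM)
print(hex(kseg_to_physical(0x80215230)))  # 0x215230   (RDRAM)
print(hex(kseg_to_physical(0xA4600000)))  # 0x4600000  (PI registers)
```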

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We can DMA from:
= RDRAM <--> Cartridge (ROM,SRAM,FlashRAM,..)
= RDRAM <--> RCP


$A460:0000 = RDRAM address (the destination when loading from cartridge)
$A460:0004 = cartridge address (the source when loading from cartridge)
$A460:0008 = DMA from RAM to cartridge
$A460:000C = DMA from cartridge to RAM

The last two registers accept the DMA copy length (minus 1) -- BPL loop.
Writing the length starts the transfer.
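
The register protocol can be modelled in a few lines of Python. This is a toy model only: the class name is invented, the cartridge is assumed to start at physical $1000:0000, and the RAM address is assumed to be masked to 24 bits. Real hardware has more registers and rules than shown.

```python
class PiDmaModel:
    """Toy model of the four registers listed above (not real hardware)."""

    def __init__(self, ram: bytearray, rom: bytes):
        self.ram, self.rom = ram, rom
        self.ram_addr = 0    # last value written to $A460:0000
        self.cart_addr = 0   # last value written to $A460:0004

    def write(self, reg: int, value: int) -> None:
        if reg == 0xA4600000:
            self.ram_addr = value & 0x00FFFFFF       # assumption: 24-bit RAM offset
        elif reg == 0xA4600004:
            self.cart_addr = value & 0x1FFFFFFF      # physical cartridge address
        elif reg == 0xA460000C:                      # cartridge -> RAM, starts the copy
            length = value + 1                       # register holds length minus 1
            src = self.cart_addr - 0x10000000        # assumption: ROM base $1000:0000
            dst = self.ram_addr
            self.ram[dst:dst + length] = self.rom[src:src + length]
        elif reg == 0xA4600008:                      # RAM -> cartridge, not modelled
            raise NotImplementedError("RAM to cartridge is left out of this sketch")

rom = bytes(range(256))
ram = bytearray(0x200)
pi = PiDmaModel(ram, rom)
pi.write(0xA4600000, 0x100)        # RAM address
pi.write(0xA4600004, 0x10000020)   # cartridge address
pi.write(0xA460000C, 0x10 - 1)     # length minus 1 starts the transfer
print(ram[0x100:0x110].hex())      # 202122232425262728292a2b2c2d2e2f
```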

It is important to note that memory transfers are done in 32-bit blocks.

The N64 is big-endian (ABCD). In a RAM dump taken from an emulator on a
little-endian PC, each 32-bit word will appear byte-swapped (DCBA).
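
Undoing the swap is a matter of reversing every group of four bytes. A minimal Python sketch (`swap32` is a made-up name); running it twice gives back the original data.

```python
def swap32(data: bytes) -> bytes:
    """Reverse the byte order inside every 32-bit word."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    out = bytearray(len(data))
    for i in range(0, len(data), 4):
        out[i:i + 4] = data[i:i + 4][::-1]
    return bytes(out)

print(swap32(b"ABCDEFGH"))  # b'DCBAHGFE'
```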

2. Practice - Zelda 64


 
Make sure the ROM is in Z64 format (big-endian).
Let's start with the known bitmap 'Attack'.

Using a plugin that dumps textures, we get a 48x16 bitmap. 4-bpp IA linear.

Address = 20000000, Offset = 00215230
Size = 3072 bits = 384 bytes ($180).
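
The size arithmetic, as a quick Python check (the helper name is my own). The same formula reproduces the sizes quoted for the other two practice bitmaps in this guide.

```python
def texture_bytes(width: int, height: int, bpp: int) -> int:
    """Size in bytes of a linear bitmap with `bpp` bits per pixel."""
    bits = width * height * bpp
    return (bits + 7) // 8          # round up to whole bytes

print(hex(texture_bytes(48, 16, 4)))   # 0x180  'Attack' (Zelda 64)
print(hex(texture_bytes(320, 8, 4)))   # 0x500  dialogue strip (Shiren 2)
print(hex(texture_bytes(128, 32, 4)))  # 0x800  '3D/Z/B' (Sin and Punishment)
```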

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We will follow this from TMEM (RCP texture memory) -> RAM -> ROM.

Dump the RAM and check $215230. Don't forget that the bytes will appear
'swapped'. Endian-order.

We see 'Attack' and 'Return'. And 'Attack' above these ones.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You'll need a debugger for this approach.
Place a write hook at RAM $215230.

8000488c: PI Copy CART to RDRAM 384b ($180) from B08C4DA0 to 80215230

Now we know that it lives at ROM ~$8C4DA0.


Set your tile editor to 48 pixels width to see the other text.

- Tile Molester says 2-dimensional, block size = 6
  + 2D means pixels are stored as scanlines (row1, row2, .., rowN)
  + 1D means tile-based (8x8 tile 1, 8x8 tile 2, ..)
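
The difference between the two layouts shows up when you compute where pixel (x, y) lives. A Python sketch; the function names are invented, and the tiled version assumes 8x8 tiles stored left-to-right, top-to-bottom, which is the usual tile editor convention but an assumption here.

```python
def linear_pixel_index(x: int, y: int, width: int) -> int:
    """2D / scanline storage: rows follow one another."""
    return y * width + x

def tiled_pixel_index(x: int, y: int, width: int, tile: int = 8) -> int:
    """1D / tile storage: whole tiles follow one another."""
    tiles_per_row = width // tile
    tile_number = (y // tile) * tiles_per_row + (x // tile)
    return tile_number * tile * tile + (y % tile) * tile + (x % tile)

def byte_offset(pixel_index: int, bpp: int = 4) -> int:
    """Byte that holds the pixel, for `bpp` bits per pixel."""
    return pixel_index * bpp // 8

# Second row, first pixel of a 48-pixel-wide 4-bpp bitmap:
print(byte_offset(linear_pixel_index(0, 1, 48)))  # 24
print(byte_offset(tiled_pixel_index(0, 1, 48)))   # 4
```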

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[0c00:0ed8] 80004830: JAL 80003B60
[afa5:0024] 80004834: SW a1[00000000],0024h(sp[80009B68])
[8fa5:0024] 80004838: LW a1[00000000],0024h(sp[80009B68])
[3c0a:a460] 8000483C: LUI t2[00000001],FFFFA460h

; Write DMA destination address

[ad42:0000] 80004840: SW v0[001B9CE0],0000h(t2[A4600000])

; ---------------------------------------------------------------------

[8fac:0028] 80004844: LW t4[00000001],0028h(sp[80009B68])
[8e0b:000c] 80004848: LW t3[80006330],000Ch(s0[80009EA0])
[3c01:1fff] 8000484C: LUI at[1FFFFFFF],1FFFh
[3421:ffff] 80004850: ORI at[1FFF0000],at[1FFF0000],FFFFh
[016c:6825] 80004854: OR t5[80127618],t3[B0000000],t4[003B62F0]
[01a1:7024] 80004858: AND t6[801B9CE0],t5[B03B62F0],at[1FFFFFFF]

; T7 = $A460(0000)

[3c0f:a460] 8000485C: LUI t7[00000001],FFFFA460h

; Write DMA source ($3B62F0 - $300*5 = $3B53F0)

[10a0:0006] 80004860: BEQ a1[00000000],r0[00000000],8000487Ch
[adee:0004] 80004864: SW t6[103B62F0],0004h(t7[A4600000])

(..)

; ---------------------------------------------------------------------

; T8 = STACK[ $9B68 + $30 ] = $300

[8fb8:0030] 8000487C: LW t8[00000000],0030h(sp[80009B68])

; T0 = $A460(0000)

[3c08:a460] 80004880: LUI t0[00000000],FFFFA460h

; T9 = T8-1

[2719:ffff] 80004884: ADDIU t9[00000000],t8[00000300],FFFFFFFFh

; Start DMA, jump to $48a8

[1000:0007] 80004888: BEQ r0[00000000],r0[00000000],800048A8h
[ad19:000c] 8000488C: SW t9[000002FF],000Ch(t0[A4600000])

2a. Practice - Shiren 2


 
This time, we talk to a villager. VWF tiles.

Address = 20000000, Offset = 0023ec80
320x8, 4-bit I. $500 bytes.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Three lines of dialogue at RAM location. The terms are backwards. >.<

Write RAM, addr = 2023EF18, PC=80082F90
Read RAM, addr = 8023EF18, PC=80082F9C

Write RAM, addr = 2023EE78, PC=80082FC0
Read RAM, addr = 8023EE78, PC=80082FDC

Write RAM, addr = 2023EF1C, PC=80082F84
Read RAM, addr = 8023EF1C, PC=80082F9C

Write RAM, addr = 2023EF23, PC=80082FD0
Read RAM, addr = 8023EF23, PC=80082FDC

VWF code appears below.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

More interestingly, the font should likely be in ROM uncompressed.

80029c44: PI Copy CART to RDRAM 136b ($88) from B016D2D6 to 80165980
80029c44: PI Copy CART to RDRAM 136b ($88) from B016B08C to 80165980
80029c44: PI Copy CART to RDRAM 136b ($88) from B0157F10 to 80165980
80029c44: PI Copy CART to RDRAM 136b ($88) from B015814A to 80165980

Looks like some header bytes - shows up as pixels.

On really close examination, we see shadow pixels in-game.
So the font uses lots of different colors.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

; VWF: Importantly, data is 4-bit linear storage (!)
; So (1234)(1234) equals two pixels. This isn't planar math.

[1aa0:0013] 80083194: BLEZ s5[0000000E],r0[00000000],800831E4h
[0000:802d] 80083198: DADDU s0[00000000],r0[00000000],r0[00000000]

[0014:1400] 8008319C: SLL v0[00000070],s4[00000030],0010h
[0002:a403] 800831A0: SRA s4[00000030],v0[00300000],0010h
[0013:1400] 800831A4: SLL v0[00300000],s3[0000000C],0010h

; ---------------------------------------------------------------------

[0002:1403] 800831A8: SRA v0[000C0000],v0[000C0000],0010h
[0440:0006] 800831AC: BLTZ v0[0000000C],800831C8h
[2652:fff8] 800831B0: ADDIU s2[801659F2],s2[801659F2],FFFFFFF8h

[0054:102a] 800831B4: SLT v0[0000000C],v0[0000000C],s4[00000030]
[1040:0003] 800831B8: BEQ v0[00000001],r0[00000000],800831C8h
[0220:202d] 800831BC: DADDU a0[0000000C],s1[0000E128],r0[00000000]

[0c02:0bcf] 800831C0: JAL 80082F3C
[0240:282d] 800831C4: DADDU a1[00000006],s2[801659EA],r0[00000000]

[2631:fec0] 800831C8: ADDIU s1[0000E128],s1[0000E128],FFFFFEC0h
[0620:0005] 800831CC: BLTZ s1[0000DFE8],800831E4h
[2673:ffff] 800831D0: ADDIU s3[0000000C],s3[0000000C],FFFFFFFFh

; One pixel row done

[2610:0001] 800831D4: ADDIU s0[00000000],s0[00000000],0001h

; Continue looping if rows not done

[0215:102a] 800831D8: SLT v0[00000000],s0[00000001],s5[0000000E]

[1440:fff2] 800831DC: BNE v0[00000001],r0[00000000],800831A8h
[0013:1400] 800831E0: SLL v0,s3,0010h

; =====================================================================
; =====================================================================

; Clear horizontal lcv

[0000:402d] 80082F3C: DADDU t0[0000FF01],r0[00000000],r0[00000000]

[0004:1fc2] 80082F40: SRL v1[0000E128],a0[0000E128],001Fh
[3c02:8014] 80082F44: LUI v0[00000001],FFFF8014h
[8c42:e8e4] 80082F48: LW v0[80140000],FFFFE8E4h(v0[80140000])
[0083:1821] 80082F4C: ADDU v1[00000000],a0[0000E128],v1[00000000]
[0002:1080] 80082F50: SLL v0[00000000],v0[00000000],0002h
[3c01:801b] 80082F54: LUI at[80000000],FFFF801Bh
[0022:0821] 80082F58: ADDU at[801B0000],at[801B0000],v0[00000000]

[8c22:9050] 80082F5C: LW v0[00000000],FFFF9050h(at[801B0000])
[0003:1843] 80082F60: SRA v1[0000E128],v1[0000E128],0001h

; Src tile address (VWF cache)

[0043:3021] 80082F64: ADDU a2[0000000E],v0[80236A80],v1[00007094]

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

; Load next pair of pixels <-- a1

[90a2:0000] 80082F68: LBU v0[80236A80],0000h(a1[801659EA])

; Retrieve pixel #1 --> v1 (upper 4)
; Retrieve pixel #2 --> a3 (lower 4) [see delay slot]

[0002:1902] 80082F6C: SRL v1[00007094],v0[00000000],0004h

; ---------------------------------------------------------------------

; If blank pixel #1, skip VWF code

[1060:000b] 80082F70: BEQ v1[00000000],r0[00000000],80082FA0h
[3047:000f] 80082F74: ANDI a3[00000000],v0[00000000],000Fh

; a0 is even --> store to first 4-bits of VWF 'tile' (byte)
; a0 is odd --> store to last 4-bits of VWF 'tile' (byte)

[3082:0001] 80082F78: ANDI v0[000000F0],a0[0000E12A],0001h
[5040:0004] 80082F7C: BEQL v0[00000000],r0[00000000],80082F90h
[0003:1900] 80082F80: SLL v1[0000000F],v1[0000000F],0004h

; Load upper 4-bits of VWF cache

[90c2:0000] 80082F84: LBU v0[00000001],0000h(a2[8023DB1A])
[0802:0be6] 80082F88: J 80082F98
[3042:00f0] 80082F8C: ANDI v0[00000000],v0[00000000],00F0h

; Load lower 4-bits of VWF cache

[90c2:0000] 80082F90: LBU v0[00000000],0000h(a2[8023DB15])
[3042:000f] 80082F94: ANDI v0[00000000],v0[00000000],000Fh

; Add old pixel and store new

[0043:1025] 80082F98: OR v0[00000000],v0[00000000],v1[000000F0]
[a0c2:0000] 80082F9C: SB v0[000000F0],0000h(a2[8023DB15])

; Bump VWF cursor position (even/odd alignment)

[2484:0001] 80082FA0: ADDIU a0[0000E128],a0[0000E128],0001h

; Bump VWF dst ptr

[3082:0001] 80082FA4: ANDI v0[00000000],a0[0000E129],0001h
[5040:0001] 80082FA8: BEQL v0[00000001],r0[00000000],80082FB0h
[24c6:0001] 80082FAC: ADDIU a2[8023DB1A],a2[8023DB1A],0001h

; --------------------------------------------------------------------

; If blank pixel #2, skip VWF code (and bump VWF cursor)

[50e0:000c] 80082FB0: BEQL a3[00000000],r0[00000000],80082FE4h
[2484:0001] 80082FB4: ADDIU a0[0000E129],a0[0000E129],0001h

; a0 is even --> store to first 4-bits of VWF 'tile' (byte)
; a0 is odd --> store to last 4-bits of VWF 'tile' (byte)

[1040:0005] 80082FB8: BEQ v0[00000001],r0[00000000],80082FD0h
[0007:1900] 80082FBC: SLL v1[000000F0],a3[0000000F],0004h

; Load upper 4-bits of VWF cache

[90c2:0000] 80082FC0: LBU v0[00000001],0000h(a2[8023DB18])
[3042:00f0] 80082FC4: ANDI v0[000000F0],v0[000000F0],00F0h

; Add old pixel and store new

[0802:0bf7] 80082FC8: J 80082FDC
[00e2:1025] 80082FCC: OR v0[000000F0],a3[0000000F],v0[000000F0]


; Load lower 4-bits of VWF cache

[90c2:0000] 80082FD0: LBU v0[00000000],0000h(a2[8023DB1B])
[3042:000f] 80082FD4: ANDI v0[00000000],v0[00000000],000Fh

; Add old pixel

[0043:1025] 80082FD8: OR v0[00000000],v0[00000000],v1[00000070]

; Store new result

[a0c2:0000] 80082FDC: SB v0[000000FF],0000h(a2[8023DB18])

; Bump VWF cursor position (even/odd alignment)

[2484:0001] 80082FE0: ADDIU a0[0000E131],a0[0000E131],0001h

; Bump VWF dst ptr

[3082:0001] 80082FE4: ANDI v0[00000001],a0[0000E12A],0001h
[5040:0001] 80082FE8: BEQL v0[00000000],r0[00000000],80082FF0h
[24c6:0001] 80082FEC: ADDIU a2[8023DB14],a2[8023DB14],0001h

; --------------------------------------------------------------------

; Two more pixels done --> t0 + 2

[2508:0002] 80082FF0: ADDIU t0[00000000],t0[00000000],0002h

; Loop through 16 pixels in row (check t0)
; Bump src ptr --> a1 + 1

[2902:0010] 80082FF4: SLTI v0[00000000],t0[00000002],0010h
[1440:ffdb] 80082FF8: BNE v0[00000001],r0[00000000],80082F68h
[24a5:0001] 80082FFC: ADDIU a1[801659EA],a1[801659EA],0001h

; Exit

[03e0:0008] 80083000: JR ra[800831C8]
[0000:0000] 80083004: NOP
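
The routine at 80082F3C boils down to the following, written here as a Python sketch of my reading of the trace. Names are invented, and returning the new cursor is my own addition. Each source byte holds two pixels; non-blank pixels replace one nibble of the VWF cache byte, chosen by the parity of the pixel cursor.

```python
def blit_row_4bpp(dst: bytearray, dst_x: int, src_row: bytes, width: int = 16) -> int:
    """Copy one row of 4-bit glyph pixels into a 4-bit linear buffer.

    `dst_x` is the cursor position in pixels. Returns the new cursor.
    """
    for i in range(width):
        packed = src_row[i // 2]
        # Pixel #1 of each byte is the upper nibble, pixel #2 the lower.
        pixel = (packed >> 4) if i % 2 == 0 else (packed & 0x0F)
        if pixel:                          # blank pixels leave the cache untouched
            index = dst_x // 2
            if dst_x % 2 == 0:             # even cursor: new upper nibble, keep lower
                dst[index] = (dst[index] & 0x0F) | (pixel << 4)
            else:                          # odd cursor: new lower nibble, keep upper
                dst[index] = (dst[index] & 0xF0) | pixel
        dst_x += 1                         # cursor advances even for blank pixels
    return dst_x

cache = bytearray(4)
cursor = blit_row_4bpp(cache, 1, bytes([0x12, 0x30]), width=4)
print(cache.hex(), cursor)  # 01230000 5
```

Starting at an odd cursor is what makes this a VWF: the glyph lands half a byte to the right, straddling two cache bytes.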

2b. Practice - Sin and Punishment


 
Bitmap for '3D/Z/B'

Width = 128, Height = 32 ($800 bytes)
format = CI, size = 4bit
Address = 20000000, Offset = 0028fb70.

note: Actually our trace found a different one. But nonetheless.

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This example lives @ ROM $A49AD0 (search for '00 00 0e 48' in that order).

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[0c01:1748] 8003A5B4: JAL 80045D20
[0000:0000] 8003A5B8: NOP

; Load length of output data

[8e02:0004] 8003A2E4: LW v0[00000000],0004h(s0[800E4780])
[0222:8823] 8003A2E8: SUBU s1[802A0370],s1[802A0370],v0[00022F00]

; Src / Dst

[0200:2021] 8003A2EC: ADDU a0[00000001],s0[800E4780],r0[00000000]
[0220:2821] 8003A2F0: ADDU a1[00000000],s1[8027D470],r0[00000000]

; Run LZ decoder

[0c01:5b74] 8003A2F4: JAL 80056DD0
[0000:0000] 8003A2F8: NOP

; ===========================================================
; ===========================================================

[0270:9821] 8003A5BC: ADDU s3[00A49AD0],s3[00A49AD0],s0[00001000]
[0250:9021] 8003A5C0: ADDU s2[800E4780],s2[800E4780],s0[00001000]
[0230:8823] 8003A5C4: SUBU s1[0000B0F0],s1[0000B0F0],s0[00001000]
[0220:8021] 8003A57C: ADDU s0[00001000],s1[000000F0],r0[00000000]
[8fbf:0030] 8003A5D0: LW ra[8003A5BC],0030h(sp[800676D0])
[8fb3:002c] 8003A5D4: LW s3[00A54BC0],002Ch(sp[800676D0])
[8fb2:0028] 8003A5D8: LW s2[800EF870],0028h(sp[800676D0])
[8fb1:0024] 8003A5DC: LW s1[00000000],0024h(sp[800676D0])
[8fb0:0020] 8003A5E0: LW s0[000000F0],0020h(sp[800676D0])
[27bd:0070] 8003A5E4: ADDIU sp[800676D0],sp[800676D0],0070h
[03e0:0008] 8003A5E8: JR ra[8003A2E4]
[0000:0000] 8003A5EC: NOP

; ===========================================================
; ===========================================================

; a0 = src ptr
; a1 = dst ptr
; a2 = barrel lcv
; a3 = LZ ptr
;
; t8 = stop ptr
; t9 = run count

; Cache bytes

[8c98:0004] 80056DD0: LW t8[00000000],0004h(a0[800E4780])
[8c87:0008] 80056DD4: LW a3[000000E7],0008h(a0[800E4780])
[8c99:000c] 80056DD8: LW t9[00000000],000Ch(a0[800E4780])

; a2 <-- 0

[0000:3021] 80056DDC: ADDU a2[00000001],r0[00000000],r0[00000000]

; stop address

[0305:c020] 80056DE0: ADD t8[00022F00],t8[00022F00],a1[8027D470]

; Offset -> Absolute address

[00e4:3820] 80056DE4: ADD a3[00000E48],a3[00000E48],a0[800E4780]
[0324:c820] 80056DE8: ADD t9[000068A8],t9[000068A8],a0[800E4780]

; 16 source bytes read --> a0 + 16

[2084:0010] 80056DEC: ADDI a0[800E4780],a0[800E4780],0010h

; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

; Check if barrel empty (a2 == 0)

[14c0:0004] 80056DF0: BNE a2[00000000],r0[00000000],80056E04h
[0000:0000] 80056DF4: NOP

; Load 32-bit pattern

[8c88:0000] 80056DF8: LW t0[2000FF01],0000h(a0[800E4790])

; Set lcv (32 bits), bump src ptr

[2406:0020] 80056DFC: ADDIU a2[00000000],r0[00000000],0020h
[2084:0004] 80056E00: ADDI a0[800E4790],a0[800E4790],0004h

; -------------------------------------------------------------

; 0 = LZ, 1 = raw

[0100:482a] 80056E04: SLT t1[2000FF00],t0[FFFFFFFF],r0[00000000]
[1120:0006] 80056E08: BEQ t1[00000001],r0[00000000],80056E24h
[0000:0000] 80056E0C: NOP

; Copy raw byte and update src ptr

[832a:0000] 80056E10: LB t2[80060000],0000h(t9[800EB028])
[2339:0001] 80056E14: ADDI t9[800EB028],t9[800EB028],0001h
[a0aa:0000] 80056E18: SB t2[00000000],0000h(a1[8027D470])

; J 80056E58
; - bump dst ptr

[1000:000e] 80056E1C: BEQ r0[00000000],r0[00000000],80056E58h
[20a5:0001] 80056E20: ADDI a1[8027D470],a1[8027D470],0001h

; -------------------------------------------------------------

; read LZ mask

[94ea:0000] 80056E24: LHU t2[00000017],0000h(a3[800E55C8])

; bump LZ ptr

[20e7:0002] 80056E28: ADDI a3[800E55C8],a3[800E55C8],0002h

; grab window ptr

[000a:5b02] 80056E2C: SRL t3[0000FF00],t2[00003021],000Ch
[314a:0fff] 80056E30: ANDI t2[00003021],t2[00003021],0FFFh

; run == 0 --> extended run
; - setup window ptr

[1160:000d] 80056E34: BEQ t3[00000003],r0[00000000],80056E6Ch
[00aa:4822] 80056E38: SUB t1[00000000],a1[8027D492],t2[00000021]

; run + 2

[216b:0002] 80056E3C: ADDI t3[00000003],t3[00000003],0002h

; -------------------------------------------------------------

; copy LZ byte

[812a:ffff] 80056E40: LB t2[00000021],FFFFFFFFh(t1[8027D471])

; bump run count, window ptr

[216b:ffff] 80056E44: ADDI t3[00000005],t3[00000005],FFFFFFFFh
[2129:0001] 80056E48: ADDI t1[8027D471],t1[8027D471],0001h

; write LZ byte

[a0aa:0000] 80056E4C: SB t2[00000000],0000h(a1[8027D492])

; continue copy, bump dst ptr

[1560:fffb] 80056E50: BNE t3[00000004],r0[00000000],80056E40h
[20a5:0001] 80056E54: ADDI a1[8027D492],a1[8027D492],0001h

; -------------------------------------------------------------

; Check next pattern bit

[0008:4040] 80056E58: SLL t0[FFFFFFFF],t0[FFFFFFFF],0001h

; If dst ptr == exit ptr, return
; - bump lcv

[14b8:ffe4] 80056E5C: BNE a1[8027D471],t8[802A0370],80056DF0h
[20c6:ffff] 80056E60: ADDI a2[00000020],a2[00000020],FFFFFFFFh

; Exit

[03e0:0008] 80056E64: JR ra[8003A2FC]
[0000:0000] 80056E68: NOP

; -------------------------------------------------------------

; load run count, bump run ptr

[932b:0000] 80056E6C: LBU t3[00000000],0000h(t9[800EB8DC])
[2339:0001] 80056E70: ADDI t9[800EB8DC],t9[800EB8DC],0001h

; run count + 18

[1000:fff2] 80056E74: BEQ r0[00000000],r0[00000000],80056E40h
[216b:0012] 80056E78: ADDI t3[00000000],t3[00000000],0012h
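
; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The decoder above, rewritten as a Python sketch of my reading of the trace. The header layout (output length at +4, LZ table offset at +8, raw/run data offset at +$C, pattern words from +$10) is taken from the loads at 80056DD0-80056DEC; the function and variable names are my own. The loop stops on output length instead of the exact pointer compare the original uses.

```python
import struct

def lz_decode(src: bytes) -> bytes:
    """Decode one block: pattern bit 1 = raw byte, 0 = LZ copy."""
    out_len, lz_pos, raw_pos = struct.unpack_from(">III", src, 4)
    pattern_pos = 0x10
    out = bytearray()
    pattern = 0
    bits_left = 0                                   # the 'barrel' counter (a2)
    while len(out) < out_len:
        if bits_left == 0:                          # barrel empty: load 32 new bits
            pattern = struct.unpack_from(">I", src, pattern_pos)[0]
            pattern_pos += 4
            bits_left = 32
        if pattern & 0x80000000:                    # top bit set: raw byte
            out.append(src[raw_pos])
            raw_pos += 1
        else:                                       # top bit clear: LZ copy
            code = struct.unpack_from(">H", src, lz_pos)[0]
            lz_pos += 2
            run = code >> 12
            start = len(out) - (code & 0x0FFF) - 1  # window ptr (dst - offset - 1)
            if run == 0:                            # extended run: extra byte + 18
                run = src[raw_pos] + 18
                raw_pos += 1
            else:
                run += 2
            for i in range(run):                    # byte by byte, overlap allowed
                out.append(out[start + i])
        pattern = (pattern << 1) & 0xFFFFFFFF
        bits_left -= 1
    return bytes(out)

# Hand-made block: raw 'A', raw 'B', then copy 6 bytes from 2 bytes back.
header = b"\0\0\0\0" + struct.pack(">III", 8, 0x14, 0x16)
block = header + struct.pack(">I", 0xC0000000) + struct.pack(">H", 0x4001) + b"AB"
print(lz_decode(block))  # b'ABABABAB'
```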
