
N64d (v0.2) - Technical info on the N64 processor (now 100% complete)

Published in 
N64 various
 · 25 Jan 2020

Technical info on the R4300 processor
Compiled by LeatherWing of Denary Notation
Sources: http://www.sgi.com (+ various others)

Q: What am I reading?

A: The following outlines the architecture of the Nintendo64 MIPS R4300-class processor.
This documentation is pretty much complete with the exception of schematic diagrams. If anyone is willing to provide us with graphics, either in bmp, gif or (if at all possible) ASCII, then E-mail LeatherWing (Jim Lambert) at the address found in the Readme.txt file.

Q: Ok! To start off with: What Data Format is used by the R4300?

A: The format defines a 64-bit double-word, a 32-bit word, a 16-bit half-word and an 8-bit byte. The byte ordering can be configured as either Big-endian (where the most significant byte is at the lowest address) or Little-endian (where the most significant byte is at the highest address).
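To make the two byte orderings concrete, here is a small Python sketch. It is only an illustration run on a host machine (the value 0x11223344 is made up), not anything taken from the R4300 documentation. It packs the same 32-bit word both ways and shows which byte ends up at the lowest address.

```python
import struct

value = 0x11223344  # an arbitrary 32-bit word, chosen for illustration

# ">I" packs big-endian: the most significant byte lands at the lowest address.
big = struct.pack(">I", value)
# "<I" packs little-endian: the most significant byte lands at the highest address.
little = struct.pack("<I", value)

print(big.hex())     # 11223344
print(little.hex())  # 44332211
```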

Q: Can you give me any register information?

A: Yeah! The R4300 CPU contains sixty-four 64-bit wide registers:

Thirty-two of them are reserved for integer operations and are referred to as General Purpose Registers (or GPRs).
The other thirty-two are reserved for floating point operations and, unsurprisingly, are referred to as Floating Point General Purpose Registers (or FGRs).

The width of these registers depends on the operational mode. These modes of operation are either 32-bit or 64-bit.
Quite simply, 32-bit mode = 32 bits wide, 64-bit mode = 64 bits wide.

The R4300 also contains six special registers. These are:
The program counter (PC) - contains the address of the current instruction.
Multiply/divide result Hi and Lo - stores the result of integer multiply operations and the quotient and remainder of integer divide operations.
Load/Link (LL) bit - dedicated bit for load-link and store-conditional instructions used to perform SYNC operations.
Floating point implementation and control registers FCR0 and FCR31 - these provide the implementation/revision information and also the control/status of the floating point coprocessor (CP1)

Q: Ok. Is the processor stand-alone? Does it use co-processors?
A: The R4300 operates with up to three coprocessors. These are:

Coprocessor zero (CP0) which is utilised as an integral part of the main CPU and supports the virtual memory system and exception handling.
Coprocessor one (CP1) which is reserved for the floating point unit.
Coprocessor two (CP2) Which er... does, erm... We have no information on this right now :(

The CP0 registers can be considered as two separate functional groups:
Group one comprises registers supporting TLB operations.
Group two constitutes registers reflecting processor status.

All of the CP0 registers are software readable. Software can write all registers except Random, BadVAddr, PrId, and MskId (see CP0reg.doc)

Q: What about instruction formats for this processor?

A: All R4300 instructions are 32 bits (single word) long and aligned on word boundaries, simplifying the decoding of instructions. More complicated operations and addressing modes are synthesized by the compiler.

There are three instruction formats:
Immediate (referred to as I-type),
Jump (referred to as J-Type),
Register (referred to as R-Type).
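As a rough sketch of how the three formats share one 32-bit word, the Python below pulls out every field a decoder might need. The bit positions are the standard MIPS encoding (opcode in bits 31:26 and so on); they are not given in the text above, so treat them as an assumption brought in from general MIPS documentation. The function name decode_fields is made up for this example.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields used by the three formats."""
    return {
        "opcode": (word >> 26) & 0x3F,      # all formats
        "rs": (word >> 21) & 0x1F,          # I-type and R-type
        "rt": (word >> 16) & 0x1F,          # I-type and R-type
        "rd": (word >> 11) & 0x1F,          # R-type only
        "sa": (word >> 6) & 0x1F,           # R-type only (shift amount)
        "funct": word & 0x3F,               # R-type only
        "immediate": word & 0xFFFF,         # I-type only
        "target": word & 0x03FFFFFF,        # J-type only
    }

# I-type example: addiu $8, $0, 5
print(decode_fields(0x24080005))
# R-type example: addu $2, $4, $5
print(decode_fields(0x00851021))
# J-type example: j with a 26-bit target field of 0x100
print(decode_fields(0x08000100))
```

Which fields are meaningful depends on the opcode; the sketch simply extracts all of them and leaves that choice to the caller.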

A description of typical instructions and how they work with the given formats:-


Load and Store instructions

These move data between memory and general registers. They are all classed as I-type instructions. The only addressing mode that these instructions directly support is base register plus 16-bit signed extended immediate offset.

Load and store instruction opcodes determine the access type which indicates the size of the data item to be loaded or stored. The address field specifies the lowest byte of the address location being accessed. This is regardless of byte-numbering order or access type.
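The single addressing mode (base register plus sign-extended 16-bit offset) can be sketched in Python as follows. The helper names and the example addresses are hypothetical; the sketch only shows the arithmetic, not any access-type or alignment checking.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(base, offset16, width=32):
    """Base register plus sign-extended 16-bit offset, wrapped to the address width."""
    mask = (1 << width) - 1
    return (base + sign_extend16(offset16)) & mask

print(hex(effective_address(0x80001000, 0x0010)))  # 0x80001010
print(hex(effective_address(0x80001000, 0xFFFC)))  # 0x80000ffc (offset is -4)
```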

Computational instructions

These perform arithmetic, logical, shift, multiply and divide operations on values contained in registers. They occur in both R-Type format (where both operands are registers) and I-Type format (where one operand is a 16-bit immediate).
When operating in 64-bit mode, 32-bit operands must be correctly sign extended.
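What "correctly sign extended" means can be shown with a short Python sketch: bit 31 of the 32-bit operand is copied into bits 63:32 of the 64-bit register. The function name is made up for this illustration.

```python
def sign_extend32_to_64(value):
    """Copy bit 31 of a 32-bit operand into the upper 32 bits of a 64-bit value."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

print(hex(sign_extend32_to_64(0x7FFFFFFF)))  # 0x7fffffff
print(hex(sign_extend32_to_64(0x80000000)))  # 0xffffffff80000000
```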

Jump and branch instructions

These change the control flow of a program. Jumps occur in J-Type (J, JAL) and R-Type (JR, JALR) formats, and branches in I-Type format. All jump and branch instructions occur with a one-instruction delay. In other words, whilst the target instruction is being fetched, the instruction immediately following the jump/branch (the delay slot) is executed.

Coprocessor instructions

These perform operations in the respective coprocessors. Coprocessor loads and stores are I-type, and coprocessor computational instructions have formats dependent on the coprocessor.

Special instructions

These instructions allow the software to initiate traps. They are always in the R-Type format.

Exception instructions

These offer a trapping mechanism to assist in software debug. We are currently in the dark concerning the instruction format - we reckon R-type, but they could easily be J-type.

Q: Tell me about the pipeline of execution!

A: That's not a question! That's a statement! Anyway, the processor utilises a five-stage execution pipeline. Each pipeline stage takes 1 pclock to execute, the pclock frequency being either 1, 1.5, 2, or 3 times the MasterClock frequency (this is dependent on the state of the DivMode 1:0 signals).
The execution of each instruction therefore has a minimum latency of five pcycles. Once the pipeline has been filled, five instructions are always being executed simultaneously. When the pipeline is not stalled, the processor has a throughput of one instruction for every pcycle. The pipeline is in-order issue, in-order execution, and in-order completion, i.e. the same order as in the instruction stream.

The five stages of the R4300 pipeline are:
IC (Instruction Cache)
RF (Instruction Decode and Register File Read)
EX (Execution)
DC (Data Cache Read)
WB (Write Back to Register File or Data Cache)
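The latency and throughput figures can be checked with a toy model of an ideal, stall-free five-stage pipeline. This is only a counting sketch in Python (function names are hypothetical); it does not model interlocks, cache misses or exceptions.

```python
STAGES = ["IC", "RF", "EX", "DC", "WB"]

def total_pcycles(n_instructions, stall_cycles=0):
    """Cycles to run n instructions: fill the pipe once, then one per pcycle."""
    if n_instructions == 0:
        return 0
    return len(STAGES) + (n_instructions - 1) + stall_cycles

def stage_of(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in a given cycle (0-based)."""
    pos = cycle - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

print(total_pcycles(1))    # 5
print(total_pcycles(100))  # 104

# In cycle 4 the pipe is full: instruction 0 is in WB, instruction 4 is in IC.
print([stage_of(i, 4) for i in range(5)])  # ['WB', 'DC', 'EX', 'RF', 'IC']
```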

Q: What happens at each of these stages?

A: Read on and find out! The following occurs at each stage:

  • IC - An instruction address is presented by the address unit and the instruction cache fetch begins. The instruction micro-TLB starts the virtual-to-physical address translation.

  • RF - The instruction becomes available and the instruction decoder decodes the instruction and checks for interlock conditions. The instruction cache tag is compared against the page frame number (from the micro-TLB) and any required operands are read from the register file. The result from the EX or DC stages is then bypassed to the next EX stage (if required). The next instruction address is then generated by the address unit.

  • EX - The activity here is determined by the instruction class. For load and store class instructions, the ALU generates the data virtual address. For ALU class instructions, the ALU performs an arithmetic or logical operation. For branch instructions, the ALU decides whether the branch condition is true.

  • DC - For load and store instructions, the data cache is accessed and data virtual-to-physical address translation is performed. At the end of this particular stage, the data becomes available and the load-align shifts the data to a word or double-word boundary. Also, the data cache tag is checked against the page frame number (which is obtained from the joint TLB).

  • WB - For load instructions, the result is written back to the register file. For register to register instructions, the result is also written back to the register file. For store instructions, the data cache is updated with the store data.

There is a branch delay and a load delay, each of one cycle.

Direct quote from source: "The branch delay is observed by noting that the branch compare logic operates during the EX pipestage, producing the target address which is available for the IC pipestage of the second subsequent instruction. The first subsequent instruction is the instruction in the delay slot and will be allowed to complete whether the branch is taken or not".
"Similarly, the load delay of one is evident when the completion of a load at the end of the DC pipeline stage produces an operand which is available for the EX pipestage of the second subsequent instruction".

The hardware detects whether the instruction currently in the load delay slot is dependent on the register to be loaded, and interlocks to accommodate this. The pipeline flow is interrupted when such an interlock is detected or if an exception occurs. Stalling the pipeline will resolve an interlock condition. However, an exception will abort the relevant (and all the following) instructions.

If more than one pipestage requests a stall, the priority of resolve is as follows. A request from the DC pipestage has higher priority than a stall request from the RF pipestage. This priority minimises possible resource conflicts.

Q: How does the processor execution unit perform?

A: The design purpose of the R4300 execution unit is to reduce power consumption and simplify hardware requirements whilst providing a high level of performance. This higher performance level is brought about by maximizing the total use of each functional element. Integer performance is optimized for the R4300's target applications.

The execution unit is coupled to the cache memory system, instruction/data caches and the on-chip memory management unit (see CPO.doc). The unit has a multifunction pipe and is responsible for the execution of:

Integer arithmetic and logic instructions
Floating-point Coprocessor (CP1) instructions
Branch/Jump instructions
Load/Store instructions
Exception instructions
Special instructions

All floating-point instructions are processed by the same hardware as the integer instructions. However, the execution of floating-point instructions can still be disabled via the CU bit defined in the CP0 Status register (see CP0reg.doc).

The datapath of the execution unit consists of:
A 64-bit integer/mantissa datapath
An operand bypass network
32 64-bit integer registers
32 64-bit floating-point registers
A 12-bit exponent datapath
A 64-bit instruction virtual address generator

Note:
The integer/mantissa datapath is 64 bits wide and is compatible with both 32 and 64-bit operands for integer and floating-point numbers. It has a Carry-Propagate Adder, a CSA Multiplier, a bi-directional Shifter and a Boolean Logic functional unit.
The Logical Operation unit performs all integer logical operations.
The carry-propagated adder is used for all other integer and floating-point computational instructions.
The adder is also used to compute data virtual address for load and store instructions and to compare two operands in trap instructions. The CSA multiplier is used for both integer and floating-point multiplication, in single or double precision.
The shifter is responsible for integer variable shifts, store align shifts, and floating-point post-normalization. It also has built-in guard/round/sticky collection logic for floating-point pre-alignment shift.

In addition, the datapath has a Leading Zero Counter for floating-point normalization shift calculation, and a floating-point unpacker and repacker.

For load and store class instructions, the datapath is capable of handling partial-words in either big or little-endian form.
For store instructions, the main bi-directional shifter performs an alignment shift on the register read data.
For load instructions, a load delay of one pclock cycle needs to be maintained. Due to the timing requirements of this delay, a Load Aligner is needed to shift the memory read data in bytes/halfwords/words right or left.

The exponent datapath has a 12-bit width. The twelfth bit (or MSB) is used as both sign bit and an overflow bit. A feedback mux and 2 operand muxes are used to select the inputs from the adder, constant generating logic, a carry select adder, random logic to perform exception detection, and a register to hold the selected result from the adder.

The inputs to the above come from the unpack logic. Here, the exponents are extracted from single or double-precision floating-point operands. The carry-selected adder performs exponent subtraction, pre-alignment shift calculation and exponent addition for the final post-normalization update. The result is sent to the repack logic to be merged with the mantissa.
The result is compared (by the results checker) with constants or ranges to check for various conditions. These conditions may include underflow, overflow (in either single or double precision numbers), one, zero and convert limit check. The checks are performed as soon as data is available from the carry-select adder.

The instruction virtual address unit is responsible for the generation of 64-bit instruction virtual addresses to be used by the micro-TLB, I-Cache and CP0 (see CPO.doc). It has its own incrementor to calculate the next sequential address. It also has an equality comparator and a separate ripple-carry adder to generate the branch target address.

The address unit also has exception vector generator logic used in decoding the type of exception and presenting the appropriate vector as the next PC address. It also has the exception PC register pipe chain to maintain a history of PC addresses for each pipestage, so that the PC address associated with the exception-causing instruction can be loaded into the EPC (Exception Program Counter) register.

Q: Does the processor use a cache of some description?

A: Yes. The incorporation of on-chip instruction/data caches achieves a higher performance level, increases memory access bandwidth and reduces the latency of load and store instructions.

Each cache has an individual 64-bit datapath and can be accessed in parallel with the other cache. Both the instruction and data caches are directly mapped, virtually indexed and use physical tags.

Q: How are the R4300 caches organized?

A: The two caches are organized as follows:

The instruction cache

16 kilobytes in size, the instruction cache is organized as eight-word (or 32-byte) lines with a 21-bit tag entry per line. The tag entry consists of a valid bit and a 20-bit physical tag (bit 31:12 of the physical address).
Each line has two possible states, Invalid or Valid, determined simply by whether the line contains valid or invalid information.

The instruction cache is accessible in one p-cycle. Access begins on phase 2 of the IC pipestage and completes at the end of phase 1 of the RF pipestage. Two instructions are fetched for every one access - therefore instruction fetching is required only on every other run cycle or when there is a jump/branch instruction in the EX pipestage. When there is a miss detected during an access, a memory block read will be initiated from the system interface to replace the current cache line with the desired line.

The data cache

8 kilobytes in size, the data cache is organized as four-word (or 16-byte) lines with a 22-bit tag entry per line. The tag entry consists of a valid bit (V), a dirty bit (D) and a 20-bit physical tag (bits 31:12 of the physical address).
Each line has three possible cache states. These are invalid, valid clean, valid dirty. A data cache line with an invalid state does not contain valid information. A cache line in valid clean state contains valid information and is consistent with main memory. A cache line in valid dirty state contains valid data but is not consistent with main memory.
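The sizes quoted for the two caches pin down how many lines each holds and how an address splits into index, offset and tag. The Python sketch below works that out. The line counts follow by simple division from the figures above; taking the index from the low virtual address bits just above the line offset is an assumption based on the caches being direct mapped and virtually indexed. Function names and example addresses are hypothetical.

```python
ICACHE = (16 * 1024, 32)  # (size in bytes, line size in bytes)
DCACHE = (8 * 1024, 16)

def cache_geometry(size_bytes, line_bytes):
    """Return (number of lines, offset bits, index bits) for a direct-mapped cache."""
    lines = size_bytes // line_bytes
    offset_bits = line_bytes.bit_length() - 1
    index_bits = lines.bit_length() - 1
    return lines, offset_bits, index_bits

def split_address(vaddr, paddr, size_bytes, line_bytes):
    """Index and offset come from the virtual address, the tag from physical bits 31:12."""
    lines, offset_bits, _ = cache_geometry(size_bytes, line_bytes)
    offset = vaddr & (line_bytes - 1)
    index = (vaddr >> offset_bits) & (lines - 1)
    tag = (paddr >> 12) & 0xFFFFF
    return index, offset, tag

print(cache_geometry(*ICACHE))  # (512, 5, 9)
print(cache_geometry(*DCACHE))  # (512, 4, 9)
print(split_address(0x80001234, 0x00001234, *DCACHE))
```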

The data cache uses a write-back cache policy meaning that store data is written to the cache line rather than main memory. The modified cache line will only be written back to main memory when it needs to be replaced. For load or store misses, a cache block read will be issued to main memory to bring in a new line and the missed line will be handled like this:

For a data load miss:
If the missed line is not dirty it will be replaced with the new line.
If the missed line is dirty, the missed line will be moved to the flush-buffer, the new line will replace the missed line, and the data in the flush-buffer will be written back to main memory.

For a data store miss:
If the missed line is not dirty, it will be replaced with the new line.
If the missed line is dirty, then it will be moved to the flush-buffer, the new line will be written to the cache, and the data in the flush-buffer will be written back to main memory.

In either case, the store data is merged with the new line.

The data cache is accessible on reads in one p-cycle. The access begins on phase 1 of the DC pipestage and completes at the end of phase 2 of the DC pipestage. Each access will fetch a double word. Data cache writes, however, execute in two p-cycles: a cache read is initiated in the first p-cycle, and a cache write with the dirty bit set is initiated in the second p-cycle.
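The load and store miss handling described above can be sketched as a toy write-back cache in Python. This is a heavily simplified model, not the real hardware: each line holds a single value, memory is a dict, and all class and method names are invented for the illustration. It shows only the ordering of events: a dirty victim is parked in the flush buffer, the new line is brought in, then the parked data is written back.

```python
class Line:
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = None
        self.data = None

class WriteBackCache:
    """Toy direct-mapped write-back cache; one value per line for brevity."""

    def __init__(self, num_lines, memory):
        self.lines = [Line() for _ in range(num_lines)]
        self.memory = memory      # dict: line address -> value
        self.flush_buffer = []    # (line address, data) pairs awaiting write-back

    def _fill(self, addr):
        index = addr % len(self.lines)
        tag = addr // len(self.lines)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return line           # hit: nothing to replace
        if line.valid and line.dirty:
            victim_addr = line.tag * len(self.lines) + index
            self.flush_buffer.append((victim_addr, line.data))  # park dirty victim
        line.valid, line.dirty, line.tag = True, False, tag
        line.data = self.memory.get(addr, 0)  # block read brings in the new line
        self._drain()             # parked data is then written back to memory
        return line

    def _drain(self):
        while self.flush_buffer:
            addr, data = self.flush_buffer.pop(0)
            self.memory[addr] = data

    def load(self, addr):
        return self._fill(addr).data

    def store(self, addr, value):
        line = self._fill(addr)
        line.data = value         # store data is merged into the (possibly new) line
        line.dirty = True

memory = {0: 10, 4: 40}
cache = WriteBackCache(4, memory)
cache.store(0, 11)                # store miss: line filled, then marked dirty
print(memory[0])                  # 10 (write-back policy: memory not yet updated)
print(cache.load(4))              # 40 (same index as address 0, evicts the dirty line)
print(memory[0])                  # 11 (dirty victim written back via the flush buffer)
```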

Byte, half-word, three-byte, word, five-byte, six-byte, seven-byte, and double-word data cache accesses are permitted. The data size of a partial load is given from the access type from the integer control unit and the lower three address bits. The data alignment is performed by the datapath load aligner.

Direct source quote: "To reduce the cache miss penalty, the address of the block read request will point to the location of the desired double-word. Since the data cache has a two double-word line size, the system interface will return the critical double-word first, followed by the remaining double-word. The return data will be written to the cache as it is put on the data bus to be used by the execution unit."

A variety of cache operations are provided for use in maintaining the state and contents of both caches. During the execution of cache operation instructions, the processor may issue processor block read or write request to fill or write-back a cache line.

Q: So what is the function of the flush buffer (mentioned earlier)?

A: The flush buffer is used as temporary data storage for outgoing data. It is organized as a 4-deep FIFO, i.e. it can buffer 4 addresses along with 4 double-words of data.
For uncached write operations the flush buffer can accept any combination of single or double-word data until it reaches capacity (each write occupies one entry in the buffer). For data cache block write operations, the flush buffer accepts 2 double-words with 1 address (this time occupying two entries in the buffer). It is able to take two block references at a time. Instruction cache block writes use 4 doublewords with 1 address and occupy the entire flush buffer. The flush buffer is able to take one read memory reference at a time.

During an uncached store, data will be stored in the flush buffer until it is taken by the external interface. While the data waits there, the processor pipeline continues to execute.

During a load miss or a store miss to a cache line in the dirty state, a read request for the missing cache line is sent to the external interface. The dirty data is stored in the flush buffer until the requested data is returned from the external interface. The processor pipeline continues to run while the flush buffer writes the dirty data to the external interface.

If the flush buffer is full and the processor attempts a load or a store which requires external resources, the processor pipeline will stall until a point where the buffer is emptied.


Q: How is memory management implemented?

A: The R4300 Memory Management Unit (or MMU) uses an on-chip TLB to translate virtual addresses into physical addresses. The TLB holds 32 entries, each of which maps an odd/even pair of pages, for a total of 64 pages.
When address mapping is indicated, each TLB entry is checked simultaneously for a match with the virtual address that is extended with an ASID stored in the EntryHi register. The address is mapped to a page between 4Kbytes and 16Mbytes in size.

The processor virtual address can be either 32 or 64 bits wide. This depends on whether the processor is operating in 32-bit or 64-bit mode. Simple (!) as that. See...

In 32-bit mode, addresses are 32 bits wide. The maximum user process size is 2 gigabytes (2^31 bytes).
In 64-bit mode, addresses are 64 bits wide. The maximum user process size is 1 terabyte (2^40 bytes).

Q: How is a virtual address converted to a physical address?

A: Firstly, a comparison is made between the virtual address from the processor and the virtual address in the TLB. There is a match when the virtual page number of the address is the same as the VPN field of the entry and either:

The Global (G) bit of the TLB entry is set, or...
The ASID field of the virtual address is the same as the ASID field of the TLB entry.

The match is named a TLB hit.

If no match is found, a TLB Miss exception is taken by the processor and software is allowed to refill the TLB from a page table of virtual/physical addresses in memory.

If there is a virtual address match in the TLB, the physical address is output from the TLB and brought together with the Offset, which represents an address within the page frame space.
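The matching rule and the final address assembly can be sketched in Python as below. This is a simplified model under stated assumptions: a fixed 4 Kbyte page size, one page per entry (the odd/even pairing is ignored), and a sequential search where the hardware checks all entries at once. The names TlbEntry, TlbMiss and translate are made up for the example.

```python
from collections import namedtuple

PAGE_SHIFT = 12  # assumes 4 Kbyte pages; the real TLB supports several page sizes

# Hypothetical, simplified entry: one page per entry, no odd/even pairing.
TlbEntry = namedtuple("TlbEntry", ["vpn", "asid", "global_bit", "pfn"])

class TlbMiss(Exception):
    """Stands in for the TLB Miss exception that software would service."""

def translate(tlb, vaddr, current_asid):
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for entry in tlb:
        # Hit: VPN matches and either the G bit is set or the ASIDs agree.
        if entry.vpn == vpn and (entry.global_bit or entry.asid == current_asid):
            return (entry.pfn << PAGE_SHIFT) | offset
    raise TlbMiss(hex(vaddr))

tlb = [
    TlbEntry(vpn=0x402, asid=7, global_bit=False, pfn=0x1F0),
    TlbEntry(vpn=0x500, asid=0, global_bit=True, pfn=0x2A0),
]
print(hex(translate(tlb, 0x00402ABC, current_asid=7)))  # 0x1f0abc
print(hex(translate(tlb, 0x00500010, current_asid=3)))  # 0x2a0010 (global entry)
```

A lookup of 0x00402ABC with a different ASID (say 8) would raise TlbMiss in this sketch, which is where a software refill handler would take over.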


The processor has three operating modes that function in both 32 and 64-bit operations:

User mode
Supervisor mode
Kernel mode

Q: What address space is available for each mode of operation?

A: See the following:

In User mode, a single, uniform virtual address space labelled User segment is available; its size is:

2 Gbytes (2^31 bytes) in 32-bit mode (useg)
1 Tbyte (2^40 bytes) in 64-bit mode (xuseg)

The User segment starts at address 0 and the current active user process resides in either useg (32-bit mode) or xuseg (64-bit mode). The TLB maps all references identically to useg/xuseg from all modes and controls cache accessibility.

The processor operates in User mode when the Status register contains bit-values as follows:

KSU bits = 10 (binary)
EXL bit = 0
ERL bit = 0

The UX bit in the Status register selects between 32 or 64-bit User mode to use in conjunction with these bits. This is determined as follows:

When UX = 0, 32-bit useg space is selected.
When UX = 1, 64-bit xuseg space is selected.


Supervisor mode is designed for layered operating systems in which a true kernel runs in R4300 Kernel mode, and the rest of the operating system runs in Supervisor mode. Nb: Can anyone tell us if this pertains to the N64? We'll remove it if it doesn't.


The processor operates in Supervisor mode when the Status register contains bit-values as follows:

KSU = 01 (binary)
EXL = 0
ERL = 0

In conjunction with the bits above, the SX bit in the Status register selects between 32 or 64-bit Supervisor mode addressing. This is determined as follows:

When SX = 0, 32-bit supervisor space is selected
When SX = 1, 64-bit supervisor space is selected


The processor operates in Kernel mode when the Status register contains any one of the following values:

KSU = 00 (binary)
EXL = 1
ERL = 1

In conjunction with these bits, the KX bit in the Status register selects between 32 or 64-bit Kernel mode addressing in the following way:

When KX = 0, 32-bit kernel space is selected
When KX = 1, 64-bit kernel space is selected

The processor enters Kernel mode whenever an exception is detected. It remains as such until an Exception Return (or ERET) instruction is executed. The ERET restores the processor to whatever mode it was in before the exception occurred. Kernel mode virtual address space is divided into regions differentiated by the high-order bits of the virtual address.
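Putting the three sets of Status register conditions together, a mode decoder might look like the Python sketch below. The bit positions used (EXL = bit 1, ERL = bit 2, KSU = bits 4:3, UX/SX/KX = bits 5/6/7) are the usual R4000-family Status layout and are an assumption here, since the text above does not give them. The function name is hypothetical.

```python
# Assumed bit positions (usual R4000-family Status layout, not stated in the text).
EXL_BIT, ERL_BIT = 1, 2
KSU_SHIFT = 3
UX_BIT, SX_BIT, KX_BIT = 5, 6, 7

def operating_mode(status):
    """Return (mode, address width in bits) for a Status register value."""
    exl = (status >> EXL_BIT) & 1
    erl = (status >> ERL_BIT) & 1
    ksu = (status >> KSU_SHIFT) & 0b11
    if exl or erl or ksu == 0b00:
        mode, x_bit = "kernel", KX_BIT      # any one condition is enough
    elif ksu == 0b01:
        mode, x_bit = "supervisor", SX_BIT
    elif ksu == 0b10:
        mode, x_bit = "user", UX_BIT
    else:
        raise ValueError("KSU = 11 is not one of the three modes described")
    width = 64 if (status >> x_bit) & 1 else 32
    return mode, width

print(operating_mode(0x10))  # ('user', 32)
print(operating_mode(0x30))  # ('user', 64)
print(operating_mode(0x12))  # ('kernel', 32), EXL overrides KSU
```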

Q: How are exceptions generally managed?

A: The processor receives exceptions from a number of sources. These include:

TLB misses
Arithmetic overflows
System calls
I/O interrupts

When the CPU detects one of these, the normal instruction execution sequence is suspended and the processor enters Kernel mode.

The processor then disables interrupts and forces the execution of a software exception processor (also called a handler) which is located at a fixed address. This handler saves the processor context, including the contents of the program counter, the current operating mode (i.e. User or Supervisor) and the status of the interrupts (i.e. enabled or disabled). This is saved so it can be restored when the exception has been serviced.

When an exception occurs, the CPU loads the EPC (Exception Program Counter) register with a location where execution can be restarted after the exception has been serviced. The restart location in the EPC register is the address of the instruction that caused the exception. Alternatively, if the instruction was executing in a branch delay slot, the restart location is the address of the branch instruction immediately preceding the delay slot.
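Since every instruction is one 32-bit word, the restart rule amounts to "back up by four bytes if the faulting instruction sat in a delay slot". A minimal Python sketch, with a made-up function name and example addresses:

```python
INSTRUCTION_BYTES = 4  # all instructions are one 32-bit word

def restart_address(faulting_pc, in_branch_delay_slot):
    """Value to load into EPC: the faulting instruction, or the branch before it."""
    if in_branch_delay_slot:
        return faulting_pc - INSTRUCTION_BYTES
    return faulting_pc

print(hex(restart_address(0x80001008, False)))  # 0x80001008
print(hex(restart_address(0x80001008, True)))   # 0x80001004
```

Restarting from the branch means the branch and its delay slot are both executed again once the handler returns.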

Exceptions are separated into four vector spaces:

Reset and NMI vector
TLB Refill vector
XTLB Refill vector
General exception vector

The value of each of the above (except the Reset and NMI vector) depends on the Boot Exception Vector (BEV) bit of the Status register, which allows two alternate sets of vectors to be used: One set pointing to the PROM address space and the other pointing to cacheable address space.

The Reset and NMI exceptions are always vectored to location 0xBFC0 0000 in 32-bit mode, and location 0xFFFF FFFF BFC0 0000 in 64-bit mode.

The addresses for all other exceptions are a combination of a vector offset and a base address. The base address is determined by the BEV bit of the Status register.
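As a sketch of the "base plus offset" scheme, the Python below builds the vector addresses. Only the Reset/NMI address comes from the text above; the base addresses and offsets are the values usually documented for R4000-family processors and should be read as an assumption, to be checked against the processor manual. The names are hypothetical.

```python
RESET_VECTOR = 0xBFC00000  # from the text above

# Assumed values (usual R4000-family layout, not given in the text above).
VECTOR_BASE = {0: 0x80000000, 1: 0xBFC00200}  # indexed by the BEV bit
VECTOR_OFFSET = {"tlb_refill": 0x000, "xtlb_refill": 0x080, "general": 0x180}

def exception_vector(kind, bev, mode64=False):
    """Return the handler address for an exception kind and BEV setting."""
    if kind in ("reset", "nmi"):
        addr = RESET_VECTOR
    else:
        addr = VECTOR_BASE[bev] + VECTOR_OFFSET[kind]
    if mode64:
        addr |= 0xFFFFFFFF00000000  # sign-extend the 32-bit address to 64 bits
    return addr

print(hex(exception_vector("general", bev=0)))             # 0x80000180
print(hex(exception_vector("tlb_refill", bev=1)))          # 0xbfc00200
print(hex(exception_vector("reset", bev=0, mode64=True)))  # 0xffffffffbfc00000
```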

While more than one exception can occur for a single instruction, only the exception with the highest priority is reported. The following list gives all exceptions in order of priority.

Exception Priority Order

Reset (highest priority)
Soft Reset
NMI
Address Error -- Instruction fetch
TLB Refill -- Instruction fetch
TLB Invalid -- Instruction fetch
Bus Error -- Instruction fetch
System Call
Breakpoint
Coprocessor Unusable
Reserved Instruction
Trap Instruction
Integer Overflow
Floating-point Exception
Address Error -- Data access
TLB Refill -- Data access
TLB Invalid -- Data access
TLB Modified -- Data write
Watch
Bus Error -- Data access
Interrupt
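Selecting the exception to report is then just a walk down this list. A small Python sketch, using the names exactly as listed above (the function name is made up):

```python
# Highest priority first, in the order listed above.
PRIORITY = [
    "Reset",
    "Soft Reset",
    "NMI",
    "Address Error -- Instruction fetch",
    "TLB Refill -- Instruction fetch",
    "TLB Invalid -- Instruction fetch",
    "Bus Error -- Instruction fetch",
    "System Call",
    "Breakpoint",
    "Coprocessor Unusable",
    "Reserved Instruction",
    "Trap Instruction",
    "Integer Overflow",
    "Floating-point Exception",
    "Address Error -- Data access",
    "TLB Refill -- Data access",
    "TLB Invalid -- Data access",
    "TLB Modified -- Data write",
    "Watch",
    "Bus Error -- Data access",
    "Interrupt",
]

def highest_priority(pending):
    """Return the one exception that would be reported, or None if none are pending."""
    for name in PRIORITY:
        if name in pending:
            return name
    return None

print(highest_priority({"Interrupt", "Integer Overflow"}))  # Integer Overflow
```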

The instruction that causes an exception and all those that follow it are aborted (generally before committing any state) and can be re-executed after the exception has been serviced. When following instructions are aborted, exceptions associated with those instructions are also aborted. Therefore, exceptions are not taken in the order detected, but rather in order of instruction fetch.

The exception handling system is responsible for the efficient handling of events that occur with reasonable frequency. These may include:

Translation misses
Arithmetic overflow
I/O interrupts
System calls

Such events interrupt the normal flow of execution. The instruction which causes the exceptional condition and all those which follow it (including any that have already begun executing) are aborted. A direct jump into a designated handler routine then occurs.

Q: What is the purpose of interface signals?

A: Interface signals allow the processor to access the external resources needed to satisfy cache misses and uncached operations, whilst permitting an external agent access to some of the processor's internal resources. The signals include:

The System interface
The Clock/Control interface
The Interrupt interface
The Joint Test Action Group (JTAG) interface
The Initialization interface.


These interface signals provide the connection between the processor and other system components. The system interface consists of:

32-bit address and data bus
5-bit command bus
Multiple "handshaking" signals

Description of System Interface Signals

 
Name          Direction      Description
SysAD(31:0)   Input/Output   A 32-bit address and data bus for communication between the processor and an external agent.
SysCmd(4:0)   Input/Output   A 5-bit bus for command and data identifier transmission between the processor and an external agent.
EValid*       Input          Signals that an external agent is driving a valid address or valid data on the SysAD bus.
PValid*       Output         Signals that the processor is driving valid address or valid data on the SysAD bus.
EReq*         Input          Signals that an external agent is requesting the system interface bus.
PReq*         Output         Signals that the processor is requesting the system interface bus.
EOK*          Input          Signals that an external agent is capable of accepting a processor request.
PMaster*      Output         Signals that the processor is the master of the system interface bus.



The clock/control interface signals make up the interface for clocking and clock synchronization as shown below:

Table of Clock/Control Interface Signals

 
Name           Direction   Description
MasterClock    Input       Master clock input that establishes the processor operating frequency.
TClock         Output      Transmit clock that establishes the System interface frequency. TClock is aligned with SyncIn at the MasterClock frequency.
SyncOut        Output      Synchronization clock output. Must be connected to SyncIn through an interconnect that models the interconnect between TClock and the external agent aligned with MasterClock.
SyncIn         Input       Synchronization clock input.
DivMode 1:0*   Input       These signals select the internal processor PClock frequency as a multiple of MasterClock: 00 = 1 x MasterClock, 01 = 1.5 x MasterClock, 10 = 2 x MasterClock, 11 = 3 x MasterClock.
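The DivMode encoding can be turned into a tiny lookup in Python. This sketch follows the pipeline section's statement that the pclock runs at 1, 1.5, 2 or 3 times the MasterClock frequency; the 62.5 MHz input is just an illustrative figure, and the names are hypothetical.

```python
# DivMode 1:0 value -> PClock multiplier relative to MasterClock.
DIVMODE_MULTIPLIER = {0b00: 1.0, 0b01: 1.5, 0b10: 2.0, 0b11: 3.0}

def pclock_mhz(master_clock_mhz, divmode):
    """PClock frequency for a given MasterClock frequency and DivMode setting."""
    return master_clock_mhz * DIVMODE_MULTIPLIER[divmode]

print(pclock_mhz(62.5, 0b01))  # 93.75
print(pclock_mhz(62.5, 0b11))  # 187.5
```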


The initialization interface signals make up the interface by which an external agent may initialize the processor operating parameters.


Table of Initialization Interface Signals

 
Name         Direction   Description
Reset*       Input       Used to initiate a soft reset sequence.
ColdReset*   Input       When asserted, this signal indicates to the R4300 processor that the +3.3 volt power supply is stable and the R4300 chip should initiate a cold reset sequence. The assertion of ColdReset* will reset the PLL.
PLLCAP0      Input       A capacitor is connected between PLLCAP0 and the clock VssP to ensure proper operation of the PLL.
PLLCAP1      Input       A capacitor is connected between PLLCAP1 and the clock VccP to ensure proper operation of the PLL.
TestMode*    Input       Used for cache testing. This signal must be connected to Vcc during normal operation. This pin will be part of the JTAG scan chain.


The interrupt/status interface signals make up the interface used by the external agent to interrupt the processor and also to monitor instruction execution (for the current processor cycle).

Table of Interrupt/Status Interface Signals

 
Name        Direction   Description
Int*(4:0)   Input       Five general processor interrupts. These are visible as bits 14 to 10 of the Cause register.
NMI*        Input       Non-maskable interrupt.


The JTAG interface signals make up the interface that provides the JTAG boundary scan mechanism.

Table of JTAG Interface Signals

 
Name   Direction   Description
JTDI   Input       Data is serially scanned in through this pin.
JTCK   Input       The processor receives a serial clock on JTCK. On the rising edge of JTCK, both JTDI and JTMS are sampled.
JTDO   Output      Data is serially scanned out through this pin.
JTMS   Input       JTAG command signal, indicating the incoming serial data is command data.

The primary communication paths for the System interface are:

A 32-bit address/data bus (SysAD)
A 5-bit command bus (SysCmd)

These buses are bidirectional. They are driven by the processor to issue a processor request and by the external agent to issue an external request.

A request through the System interface consists of:

An address
A system interface command specifying the nature of the request
A series of data elements, if the request is a write or a read response

Some terminology:

Address cycles: Cycles in which the SysAD bus contains a valid address.
Data cycles: Cycles in which the SysAD bus contains valid data.


The most significant bit of the SysCmd bus indicates whether the current cycle is an address cycle or a data cycle. In an address cycle, the remainder of SysCmd contains a system interface command. In a data cycle, the remainder of SysCmd contains the data identifier.


The processor will repeat the address cycle until the external agent indicates that accepting the request is possible. The last address cycle is called the issue cycle - there are two types of these:

Processor read request issue cycles
Processor write request issue cycles

When the R4300 processor is driving the SysAD and SysCmd buses, the System interface is in master state. When the external agent is driving the SysAD and SysCmd buses, the System interface is in slave state. The processor is the default master of the system interface.

The external agent becomes master of the system interface only through protocols or an uncompelled change to the slave state. The latter is initiated by the processor.
There are two broad categories of requests:

Processor requests - include read requests and write requests.
External requests - include read responses and write requests.

A processor request is a request through the System interface, to access some external resource. The following rules apply to processor requests.

After issuing a processor read request, the processor cannot issue another read request until a read response has been received. A Processor Read Request is issued by driving a read command on the SysCmd bus, driving a read address on the SysAD bus, and asserting PValid*.

Only one processor read request may be pending at a time.

Direct quote from main source "The processor must wait for an external read response before starting a further read. The processor transitions to slave after the issue cycle of the read request by de-asserting the PMaster signal. An external agent may then return the requested data via a read response. The external agent, which has become master, may issue any number of writes before sending the read response data."

"A Processor Write Request is issued by driving a write command on the SysCmd bus, driving a write address on the SysAD bus, and asserting PValid* for one cycle, followed by driving the appropriate number of data identifiers on the SysCmd bus, driving data on the SysAD bus, and asserting PValid*. For 1- to 4-byte writes, a single data cycle is required. Byte writes of size 5, 6, & 7 are broken up into 2 address/data transactions; one 4 bytes in size, the other 1, 2, or 3 bytes. For all sizes greater than 7 bytes (e.g. 8, 16, 32), 4 bytes will be sent on each data cycle until the appropriate number of bytes have been transferred. When the last piece of data is being transferred, this final data cycle will be tagged as "Last Data" on the command bus."

An external agent should be able to receive write data over any number of cycles, with any number of idle cycles between any two data cycles - this ensures compliance with all implementations of the protocol. However, for the R4300, the data will begin on the cycle immediately following the write issue cycle, and from there the processor drives data at the rate specified by the data rate configuration signals.
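
The write sizing rules quoted above can be sketched as a small planner. This is an illustration only: the function name is hypothetical, the order of the two transactions for 5- to 7-byte writes (4 bytes first) is an assumption, and so is tagging the single data cycle of a short write as "Last Data".

```python
def plan_write(size):
    """Return a list of transactions for a write of `size` bytes.

    Each transaction is a list of data cycles, and each data cycle is a
    tuple (bytes_in_cycle, is_last_data).
    """
    if size < 1:
        raise ValueError("size must be at least 1 byte")
    if size <= 4:
        # 1- to 4-byte writes need a single data cycle.
        return [[(size, True)]]          # ASSUMPTION: lone cycle is "Last Data"
    if size <= 7:
        # 5, 6 and 7 bytes: two address/data transactions.
        return [[(4, True)], [(size - 4, True)]]   # ASSUMPTION: 4-byte part first
    if size % 4:
        raise ValueError("sizes above 7 are assumed to be multiples of 4")
    cycles = size // 4
    # 4 bytes per data cycle; only the final cycle is tagged "Last Data".
    return [[(4, i == cycles - 1) for i in range(cycles)]]
```

So a 6-byte write comes out as two transactions, and a 16-byte write as one transaction of four data cycles.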

Writes may be cancelled and retried with an EOK signal.

External Write Requests have similar properties to a processor single write with the exception of the signal EValid, which is asserted instead of a PValid signal.
An external write request consists of an external agent driving a write command on the SysCmd bus and a write address on the SysAD bus and asserting EValid for one cycle. This is followed by driving a data identifier on the SysCmd bus and data on the SysAD bus and asserting EValid again for one cycle. The data identifier in this case must contain a last data cycle indication.

During an External Read Response, the external agent returns data to the processor in response to a processor read request by waiting for the processor to transition to slave, and then returning the data via a single data cycle or a series of data cycles sufficient to transmit the requested data. After the last data cycle is issued, the read response is complete and the processor becomes master again (assuming EReq* was not asserted). If EReq* has been asserted by the end of the read response cycles, the processor remains slave until the external agent relinquishes the bus. When the processor is in slave mode and needs access to the SysAD bus, it asserts PReq* and waits until EReq* is de-asserted.
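
The master/slave hand-off around a read can be modelled roughly as follows. This is a toy state machine under simplifying assumptions: the class and method names are hypothetical, and signal timing, PReq* and external writes are left out.

```python
class SysInterface:
    """Toy model of who owns the SysAD/SysCmd buses around a read."""

    def __init__(self):
        self.state = "master"        # the processor is the default master
        self.read_pending = False

    def issue_read(self):
        if self.read_pending:
            raise RuntimeError("only one processor read may be pending")
        if self.state != "master":
            raise RuntimeError("processor must be master to issue a request")
        self.read_pending = True
        self.state = "slave"         # processor goes slave after the issue cycle

    def finish_read_response(self, ereq_asserted):
        if not self.read_pending:
            raise RuntimeError("read response without a pending read")
        self.read_pending = False
        if not ereq_asserted:
            self.state = "master"    # processor takes the bus back
        # otherwise the processor stays slave until the agent lets go

    def external_release(self):
        self.state = "master"        # external agent relinquishes the bus
```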

The data identifier may indicate that the data transmitted during that cycle is incorrect. However, an external agent must return a block of data of the correct size regardless of any rogue data cycles. A bus error will occur if a read response includes one or more of such data cycles.
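
As a sketch of the rule above, a read response can be treated as a list of data cycles, each carrying a word and an "erroneous" flag taken from its data identifier. The function name and the tuple layout are hypothetical.

```python
def read_response(cycles, expected_words):
    """Collect a read response; return (words, bus_error).

    `cycles` is a list of (word, is_erroneous) tuples, one per data cycle.
    The block must be the right size even if some cycles are flagged bad.
    """
    if len(cycles) != expected_words:
        raise ValueError("external agent must return a block of the correct size")
    words = [word for word, _ in cycles]
    bus_error = any(bad for _, bad in cycles)   # one bad cycle is enough
    return words, bus_error
```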

Read response data must be delivered to the processor only when a processor read request is pending.

Q: How are processor clocks used?

A: Processor clocks are controlled by an on-chip Phase Locked Loop circuit. This circuit keeps the internal clock edges aligned with the clock edges of the MasterClock signal - which acts as the system master clock.

The MasterClock signal is multiplied by a factor determined by DivMode 1:0. All internal clocks are then derived by dividing that signal down. There are two primary internal clocks:

The pipeline clock (PClock) - a multiple of the MasterClock frequency as determined by DivMode 1:0.
The system interface clock (SClock) - equal to the MasterClock frequency.

TClock is generated by the processor at the same frequency as SClock - it is aligned with SClock. External agents use it to drive data and also as the global clock for the external agent. TClock can be considered as the synchronized external system interface clock.
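
The clock relationships above boil down to a few lines of arithmetic. In this sketch the multiplier is passed in directly, because the mapping from DivMode 1:0 values to multipliers is not given in this section; the function name and the example frequency are illustrative assumptions.

```python
from fractions import Fraction


def derive_clocks(master_clock_hz, multiplier):
    """Return the PClock, SClock and TClock frequencies in Hz.

    `multiplier` stands in for whatever factor DivMode 1:0 selects
    (ASSUMPTION: supplied by the caller, not decoded from DivMode here).
    """
    sclock = Fraction(master_clock_hz)         # SClock equals MasterClock
    pclock = sclock * Fraction(multiplier)     # PClock is a multiple of MasterClock
    tclock = sclock                            # TClock runs at the SClock frequency
    return {"PClock": pclock, "SClock": sclock, "TClock": tclock}
```

With an illustrative 62.5 MHz MasterClock and a multiplier of 3/2, PClock works out to 93.75 MHz while SClock and TClock stay at 62.5 MHz.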


Q: Are there any user selectable operation modes?

A: Yes. There are several user selectable modes supported by the processor, the majority of which can be set and reset by writing to the CP0 Status register (see CP0.doc).

The Reduced Power mode (RP)

RP mode allows the user to change the processor operating frequency to 1/4 speed. This is implemented by setting bit 27 of the Status register.

The use of this feature is beyond the scope of this document.

The Floating-point Register mode (FR)

FR mode allows the user to access the full set of 32 64-bit floating point registers as defined in MIPS-III. When reset, the processor will access the registers as defined in the MIPS II architecture. This is implemented by setting bit 26 of the Status register.

The Reverse Endianness mode (RE)

RE mode allows the user to switch byte ordering between BigEndian and LittleEndian. This is implemented by setting bit 25 of the Status register.

The Instruction Trace Support mode (ITS)

ITS mode allows the user to track branches or jumps. When the ITS bit is set, the physical address to which the CPU has branched will be reported on the SysAD bus by forcing an instruction cache miss whenever a branch, jump or exception is taken. The mode is implemented by setting bit 24 of the Status register.

The Bootstrap Exception Vectors mode (BEV)

BEV (implemented through bit 22 of the Status register) causes the TLB refill exception vector to be relocated to a virtual address of 0xbfc00200 and the general exception vector to 0xbfc00380.

The Supervisor Extended Addressing mode (SX)

SX (implemented through bit 6 of the Status register) enables MIPS III opcodes in supervisor-mode and causes TLB misses on supervisor addresses to use the Extended TLB refill exception vector.

The User Extended Addressing mode (UX)

UX is implemented through bit 5 of the Status register. When set, it enables MIPS III opcodes in user-mode and causes TLB misses on user addresses to use the Extended TLB refill exception vector. When clear, it implements MIPS II compatibility on virtual address translation.

The Interrupt Enable mode (IE)

IE is implemented through bit 0 of the Status register. Clearing it disables all interrupts, with the exception of reset and NMIs.
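
The bit positions listed for these modes can be gathered into one small helper. This is a sketch only: the table and function names are hypothetical, and it operates on a plain integer rather than touching the real CP0 Status register.

```python
# Bit positions in the Status register, as listed in the mode descriptions above.
STATUS_BITS = {
    "RP": 27, "FR": 26, "RE": 25, "ITS": 24,
    "BEV": 22, "SX": 6, "UX": 5, "IE": 0,
}


def decode_status(status):
    """Return a dict of mode name -> True/False for a Status register value."""
    return {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}


def set_mode(status, name, enabled):
    """Return a copy of `status` with the named mode bit set or cleared."""
    mask = 1 << STATUS_BITS[name]
    return (status | mask) if enabled else (status & ~mask)
```

For instance, set_mode(0, "FR", True) gives 0x04000000, and clearing "IE" on that value leaves it unchanged since bit 0 was already clear.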

Appendix


 
R4300 Pin-Out table

1 Vcc 31 Vss 61 Vss 91 Vcc
2 Vss 32 Vcc 62 Vcc 92 Vss
3 SysAD22 33 SysAD16 63 JTDI 93 NMI
4 SysAD21 34 SysAD15 64 SysAD4 94 SysAD26
5 Vcc 35 Vss 65 JTDO 95 PMaster*
6 Vss 36 Vcc 66 SysAD3 96 Vcc
7 SysAD20 37 SysAD14 67 Vss 97 Vss
8 Vcc 38 SysAD13 68 Vcc 98 SysAD25
9 VccP 39 Vss 69 SysAD2 99 EReq*
10 VssP 40 Vcc 70 SysAD1 100 SysCmd0
11 PLLCAP0 41 SysAD12 71 Vss 101 Vcc
12 PLLCAP1 42 SysAD11 72 Vcc 102 Vss
13 VccP 43 Vss 73 SysAD0 103 SysCmd1
14 VssP 44 Vcc 74 PReq* 104 Reset*
15 Vcc 45 SysAD10 75 Vss 105 EValid*
16 MasterClock 46 Int0* 76 Vcc 106 SysCmd2
17 Vss 47 SysAD9 77 SysAD31 107 Vcc
18 TClock 48 Vss 78 PValid* 108 Vss
19 Vcc 49 Vcc 79 Vss 109 SysCmd3
20 Vss 50 SysAD8 80 Vcc 110 ColdReset
21 SyncOut 51 SysAD7 81 SysAD30 111 SysCmd4
22 SysAD19 52 JTMS 82 EOK* 112 DivMode1
23 Vcc 53 Vss 83 SysAD29 113 Vcc
24 SyncIn 54 Vcc 84 Vss 114 Vss
25 Vss 55 SysAD6 85 Vcc 115 SysAD24
26 SysAD18 56 SysAD5 86 SysAD28 116 DivMode0
27 SysAD17 57 JTCK 87 SysAD27 117 SysAD23
28 Int4* 58 Int1* 88 Int2* 118 Int3*
29 Vcc 59 Vss 89 Vss 119 Vcc
30 Vss 60 Vcc 90 Vcc 120 Vss


Part of the N64d compilation of Nintendo64 documents.
1997 Denary Notation
