O2 N64 emulator

bplaa.yai

Member
Aug 6, 2020
I've had this little idea for a long time and recently decided to give it a chance. Given how closely related the O2 and the N64 are from an architectural point of view, would it be possible to write an N64 "emulator" for the O2? It would be very cool to have N64 ROMs run as close to the metal as possible, with minimal modifications, and achieve full-speed emulation :cool:

To get an idea whether it is achievable or not, I've started to poke around and run various experiments that I'll describe in this thread. If all goes well, these experiments could lead to something close to an emulator; otherwise they will prove to be a dead end and fail miserably.

Disclaimer: I obviously don't know what I'm doing! But that's the point: to learn and have fun hacking along the way :)
 
CPU and Video Interface emulation.

Let's begin with the low-hanging fruit.
We will need some reference test code and ROMs, and these will be provided by Peter Lemon's excellent N64 bare-metal code repository.
We will also need a reference emulator that can be debugged. MAME and cen64 will do a great job here. Unfortunately, I was not able to get the cen64 debugger working, probably because it was designed with the GNU toolchain in mind, whereas Peter Lemon's code uses the Bass assembler. But the MAME debugger works fine.

Now, what happens if you load an N64 ROM into memory, and just jump into the code entry point ? Well, obviously this won't last long, and you will quickly hit a segfault.
Then, what happens if you allocate various memory regions and patch the ROM code, replacing the LUI instructions that load known memory regions with your allocated memory addresses? This time, as long as the code is CPU-only, it goes surprisingly well and the code runs without crashing (the HelloWorld examples from the code repository above). Below are some screenshots of the MAME debugger on the PC, and GDB running the experiment on the O2:
Screenshot from 2024-10-07 19-31-04.png

Screenshot from 2024-10-07 19-27-24.png
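A minimal sketch of the LUI patching described above, assuming the ROM has already been loaded in host word order (big-endian, which is native on the O2) and that each N64 region is recognized by the upper halfword its code loads (for example 0xA440 for the VI registers at kseg1 0xA4400000). The `region_map` table and `patch_lui()` are hypothetical helpers, not code from this project, and like the approach in the post the scan is blind: a data word that happens to look like a matching LUI gets patched too.

```c
#include <stddef.h>
#include <stdint.h>

#define MIPS_OP(insn)  ((insn) >> 26)       /* opcode field, bits 31..26 */
#define MIPS_IMM(insn) ((insn) & 0xFFFFu)   /* immediate field, bits 15..0 */
#define OP_LUI         0x0Fu

/* One N64 region to relocate (hypothetical structure). */
struct region_map {
    uint16_t n64_hi;     /* upper halfword used by the ROM, e.g. 0xA440 (VI regs) */
    uint32_t host_base;  /* our allocation, 0x10000-aligned */
};

/* Rewrite the immediate of every LUI that targets a known region and
   return how many instructions were patched. The rt register and the
   instructions that supply the low 16 bits are left untouched, which is
   why host_base must be 64 KiB aligned. */
size_t patch_lui(uint32_t *code, size_t nwords,
                 const struct region_map *map, size_t nmap)
{
    size_t patched = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t insn = code[i];
        if (MIPS_OP(insn) != OP_LUI)
            continue;
        for (size_t r = 0; r < nmap; r++) {
            if (MIPS_IMM(insn) == map[r].n64_hi) {
                code[i] = (insn & 0xFFFF0000u) | (map[r].host_base >> 16);
                patched++;
                break;
            }
        }
    }
    return patched;
}
```

For example, with `{ 0xA440, 0x20010000 }` in the table, `lui t0,0xA440` (0x3C08A440) becomes `lui t0,0x2001`, while the following `lw t1,0(t0)` is left alone and now reads from the host buffer.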


Obviously, the allocated memory regions have to be aligned on a 0x10000 boundary so you don't have to worry about the lower 16 bits of the addresses (only the LUI immediates need patching). Also, the cartridge domain needs to be aligned on a larger boundary (0x10000000) so that "J" instructions, which only encode the lower 28 bits of their target, work without patching.
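The alignment rules come straight from MIPS encoding arithmetic. A short sketch with hypothetical helper names: a J carries only bits 27..2 of its target and takes the upper 4 bits from the address of its delay slot, so a host copy whose base is 0x10000000-aligned keeps every unpatched J target inside the copy. The 0x30000000 host base used below is an arbitrary illustrative value.

```c
#include <stdint.h>

/* MIPS J/JAL target: the 26-bit instr_index shifted left by 2, combined
   with the upper 4 bits of the delay-slot address (PC + 4). */
uint32_t j_target(uint32_t pc, uint32_t insn)
{
    return ((pc + 4) & 0xF0000000u) | ((insn & 0x03FFFFFFu) << 2);
}

/* Move an N64 address into a host region, keeping the bits under `mask`.
   Only meaningful if host_base is aligned to (mask + 1):
   0x10000 for LUI-based addressing, 0x10000000 for J/JAL targets. */
uint32_t relocate(uint32_t n64_addr, uint32_t host_base, uint32_t mask)
{
    return host_base | (n64_addr & mask);
}
```

With the cartridge copy at 0x30000000, a `j` (0x08000800) executed at host address 0x30001000 lands on 0x30002000, which is exactly where the original target 0xB0002000 was relocated.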

Now, what are we missing to make the HelloWorld and Framebuffer samples work? Some code that handles the VI (Video Interface) registers and displays the framebuffer content according to the pixel format specified in the VI registers. For now, we don't care about nasty timing stuff, and just set up an SDL1 window with the correct resolution and pixel format. And here we are, Hello World and Framebuffer setup working!

snap1.png


snap2.png


snap3.png
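A minimal sketch of the pixel-format side of the VI handling, assuming the emulator reads VI_STATUS (N64 physical address 0x04400000, whose bits 1..0 select blank, 16-bit or 32-bit output) and converts 16-bit RGBA5551 pixels to 32-bit ARGB for an SDL surface. Function names are illustrative, and the VI timing is ignored, as in the post.

```c
#include <stdint.h>

/* VI_STATUS (N64 physical 0x04400000), bits 1..0: framebuffer type. */
enum { VI_TYPE_BLANK = 0, VI_TYPE_RESERVED = 1, VI_TYPE_16BIT = 2, VI_TYPE_32BIT = 3 };

/* Bytes per framebuffer pixel for the current VI mode; 0 means nothing to show. */
unsigned vi_bytes_per_pixel(uint32_t vi_status)
{
    switch (vi_status & 3u) {
    case VI_TYPE_16BIT: return 2;
    case VI_TYPE_32BIT: return 4;
    default:            return 0;  /* blank or reserved */
    }
}

/* Widen a 5-bit channel to 8 bits by replicating its top bits. */
static uint32_t expand5(uint32_t v) { return (v << 3) | (v >> 2); }

/* RGBA5551 (R 15..11, G 10..6, B 5..1, A bit 0) -> 0xAARRGGBB. */
uint32_t rgba5551_to_argb8888(uint16_t p)
{
    uint32_t r = expand5((p >> 11) & 0x1Fu);
    uint32_t g = expand5((p >> 6) & 0x1Fu);
    uint32_t b = expand5((p >> 1) & 0x1Fu);
    uint32_t a = (p & 1u) ? 0xFFu : 0x00u;
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

The framebuffer in RDRAM is big-endian, so on the O2 the halfwords can be read as-is; a little-endian host would need to byte-swap them first.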


At this stage, a lot of questions arise, such as:
  • Will the patching approach last, or will we need a full-blown recompiler at some point? Also, how do we detect code boundaries in the ROM to avoid patching data?
  • How should control registers be handled? Possible approaches include:
    • Have a separate thread poll the control registers memory region (like the current Video Interface code)
    • Set up the control registers memory with mmap and write-protect it, so that a SIGSEGV signal is triggered when a write is attempted
    • Aggressively recompile the code so that all memory accesses go through proxy functions
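The write-protection option can be sketched with standard POSIX calls (`mmap`, `mprotect`, `sigaction` with `SA_SIGINFO`). This is an illustration under assumptions, not the project's code: it uses a single fake register page, the Linux-style `MAP_ANONYMOUS` flag (mapping /dev/zero is the classic alternative where that flag is missing), and calls `mprotect()` from the signal handler, which is common practice but not on POSIX's async-signal-safe list.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t *regs_page;                 /* emulated control-register page */
static size_t   regs_len;
static volatile uintptr_t last_write_off = (uintptr_t)-1;

/* Record which register was written, then unprotect the page so the
   faulting store succeeds when the instruction is restarted. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr, base = (uintptr_t)regs_page;
    (void)sig; (void)ctx;
    if (regs_page == NULL || addr < base || addr >= base + regs_len)
        _exit(1);                          /* a genuine crash, not a register write */
    last_write_off = addr - base;
    mprotect(regs_page, regs_len, PROT_READ | PROT_WRITE);
}

/* Map one page of fake registers, readable but not writable. */
int regs_trap_init(void)
{
    struct sigaction sa;
    regs_len = (size_t)sysconf(_SC_PAGESIZE);
    regs_page = mmap(NULL, regs_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (regs_page == MAP_FAILED) {
        regs_page = NULL;
        return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return mprotect(regs_page, regs_len, PROT_READ);
}

/* Re-arm the trap once the emulator has handled the written value. */
int regs_trap_rearm(void)
{
    last_write_off = (uintptr_t)-1;
    return mprotect(regs_page, regs_len, PROT_READ);
}
```

The handler only learns *where* a write happened; the value becomes visible once the faulting store is restarted, so something else (for example the polling thread from the first option) still has to act on it and call `regs_trap_rearm()`.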
Next will be some tinkering with the VICE.
 
So, why target the O2 specifically, you may ask? Because of the VICE!

The N64 RSP is very similar to the O2 MSP. I'm still not sure about the similarities between the N64's RDP and the O2's BSP, but that's worth investigating.

To check whether the O2 VICE can execute N64 microcode, we first have to "take control" of the VICE, as there is unfortunately no documented Irix interface to the VICE. As I understand it, there are three ways to control the VICE:
  • Interact through Irix's libvice.so
  • Directly interact through exposed device files, like /dev/vice, /dev/vicedbg, /dev/vicedms, or even /dev/mem or /dev/kmem as a last resort
  • If none of the above userland options are available, go kernel side with a driver
First, I went the libvice route. As this lib is undocumented and not meant to be used by end users, we'll have to do some reverse engineering (RE) on it. So I imported libvice.so into Ghidra to see what we've got.

Lots of function symbols are available, and decompilation works pretty well on most of them.
Screenshot from 2024-10-07 21-12-21.png

Screenshot from 2024-10-07 21-16-49.png


What can we learn from here ?
  • There is a VICENOISE environment variable that makes libvice verbose :)
  • There are several global variables in the lib that can be used when linked in user code. Some of them are
    • _viceid, which holds the version of the VICE chip. The valid values are 0xe1 for the first VICE version (never seen in the wild?), 0xe2 for the VICE DX version, and 0xe3 for the VICE TRE version, as reported by hinv
    • _vice_base, which contains the address of a memory-mapped region where libvice does most of its interaction
    • _etcp, which is an offset in _vice_base
    • _vicefd, which holds file descriptors for all opens on the device; it seems limited to 64 open file descriptors at once.
  • The accesses to the VICE mapped memory are all offset according to the vice fd, so the library clearly manages concurrent / multiprocess access
  • There are some cryptic ioctl() calls that will be almost impossible to decipher without the kernel code :(
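For reference, the `_viceid` values listed above map to the names hinv prints. A trivial sketch; the real C type of `_viceid` is not known (`int` is an assumption here), and it is unverified at which point the library fills it in.

```c
/* Translate the _viceid values seen in libvice into hinv-style names. */
const char *vice_chip_name(int viceid)
{
    switch (viceid) {
    case 0xe1: return "VICE";      /* first revision, apparently rare */
    case 0xe2: return "VICE DX";
    case 0xe3: return "VICE TRE";
    default:   return "unknown";
    }
}
```

Linked against libvice, one could then try something like `extern int _viceid;` followed by `puts(vice_chip_name(_viceid));`, keeping in mind the type assumption above.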
With this I attempted to decode and understand some function signatures in the lib. For example :
  • longlong vice_open(char *param1)
    • param1 : the device special file, eg. /dev/vice or /dev/vicedms
    • return value is a "vice index" (fd) that has to be passed to subsequent calls
  • void *vice_mmap(int param1, longlong param2, int param3)
    • param1 : the vice index/fd
    • param2 : the offset
    • param3 : the length
    • return : pointer to the mapped region
  • undefined8 vice_load_msp_code(undefined8 param_1, undefined8 param_2, undefined8 param_3, undefined8 param_4)
    • param1 : vice index
    • param2 : microcode path (note vice_gfxfile() and vice_file() function are used to select appropriate arch dependent microcodes in the filesystem)
    • param3 : some kind of opaque structure
    • param4 : often 0?
  • functions vice_load_bsp_code(...), vice_load_bsp_table(...) and so on have similar signatures to vice_load_msp_code() without the 4th parameter
  • lVar2 = vice_event_start(*param_1,param_1 + 8) : starts microcode execution. param1 is the opaque structure found above.
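One way to experiment with these guessed signatures without writing a header for the whole library is to resolve them at run time with `dlopen()`/`dlsym()`. The typedefs below are hypothetical translations of Ghidra's output (`longlong`/`undefined8` rendered as `long long`, the opaque structure as `void *`); if a guess is wrong, calls through these pointers will misbehave, so treat them as a starting point only.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Guessed prototypes, transcribed from Ghidra's decompilation (hypothetical). */
typedef long long (*vice_open_fn)(char *dev);
typedef void     *(*vice_mmap_fn)(int fd, long long off, int len);
typedef long long (*vice_load_msp_code_fn)(long long fd, const char *path,
                                           void *opaque, long long unk);

struct libvice {
    void                 *handle;
    vice_open_fn          open;
    vice_mmap_fn          mmap;
    vice_load_msp_code_fn load_msp_code;
};

/* Resolve the symbols at run time. Returns 0 on success, -1 if the
   library or any of the symbols is missing. */
int libvice_load(const char *path, struct libvice *lv)
{
    lv->handle = dlopen(path, RTLD_NOW);
    if (lv->handle == NULL)
        return -1;
    lv->open          = (vice_open_fn)dlsym(lv->handle, "vice_open");
    lv->mmap          = (vice_mmap_fn)dlsym(lv->handle, "vice_mmap");
    lv->load_msp_code = (vice_load_msp_code_fn)dlsym(lv->handle, "vice_load_msp_code");
    if (!lv->open || !lv->mmap || !lv->load_msp_code) {
        dlclose(lv->handle);
        lv->handle = NULL;
        return -1;
    }
    return 0;
}
```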

All of these functions are called from various libdmedia encoder/decoder libraries, like vicejpeg.so, vicempeg.so and so on.

With all of that in mind, we can guess that libvice is designed to work closely with dmedia in a multiprocess environment, implementing some kind of work queue / event mechanism. Also, the mysterious ioctls() and the infamous opaque structure may be very difficult to RE.
I went as far as being able to load MSP/BSP code and data to the VICE, but couldn't get vice_event_start() to do anything useful without understanding the layout of the opaque struct argument.


My second attempt was mmap'ing /dev/vice and friends. No luck here either: not only is the exposed mapped memory closely tied to the libvice specifics described above, but the control registers do not seem to be exposed, so it's of no use in our case.

Finally, direct access through /dev/mem and /dev/kmem didn't bring anything, because the (supposed) address of the VICE, 0xb7000000, is located in kseg1, so it is not accessible through /dev/mem and not exposed by /dev/kmem either (see /var/sysgen/master.d/mem for the regions accessible via /dev/kmem).
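The kseg0/kseg1 arithmetic behind this is fixed by the MIPS architecture: both segments are unmapped 512 MiB windows onto the bottom of the physical address space (kseg1 being the uncached one), so translating a kseg1 address to a physical one is just a matter of masking off the top three bits. A small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* 32-bit MIPS compatibility segments:
   kseg0 0x80000000-0x9FFFFFFF  unmapped, cached
   kseg1 0xA0000000-0xBFFFFFFF  unmapped, uncached
   Both alias physical 0x00000000-0x1FFFFFFF. */
int is_kseg1(uint32_t va) { return va >= 0xA0000000u && va <= 0xBFFFFFFFu; }

uint32_t kseg01_to_phys(uint32_t va) { return va & 0x1FFFFFFFu; }
```

So 0xb7000000 corresponds to physical 0x17000000; whether a given driver or device file lets you reach that physical range is a separate question.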

To be continued, kernel side...
 
VICE kernel side

At this point I have to mention the great work of Ilya Volynets and Vivien Chappelier, which can be found here. This work is from 2002 and targets linux-mips. It implements a kernel device driver for the VICE, and a libvice for interacting with the device driver. Also worth mentioning is a binutils patch to support the BSP.
The libvice code comes with a user code example which supposedly loads a custom microcode in the MSP, loads data associated with it (did I hear "display lists" ? ;)), and retrieves the result.
That looks very promising, but unfortunately it is Linux specific so I couldn't test it as is. But there is definitely a lot to learn here.

Also, the most interesting part, i.e. the kernel driver, is not provided on that page, and does not seem to have ever been merged into the kernel. The kernel driver work was supposedly hosted at https://www.linux-mips.org/~glaurung/O2/, but linux-mips.org seems long gone too. Fortunately it is still available through web.archive.org, and the patches can still be downloaded from http://web.archive.org/web/20180830062036/https://www.linux-mips.org/~glaurung/

So, back to Irix. With this Linux knowledge, maybe it's time to locate and attempt reversing Irix's VICE kernel driver ? As we did previously, import the Irix kernel (/unix, for the kernel dependencies) and the vice module (vice.o) into the Ghidra project. Unfortunately this time it does not work as well as the libvice stuff, and after importing we are greeted with a lot of error messages looking like
Elf Relocation Failure: R_MIPS_HI16 (5, 0x5) at 00010070 (Symbol = vice_regs_init) - Relocation missing required LO16 Relocation
Elf Relocation Failure: R_MIPS_HI16 (5, 0x5) at 00010084 (Symbol = .bss) - Relocation missing required LO16 Relocation
Elf Relocation Failure: R_MIPS_HI16 (5, 0x5) at 0001009c (Symbol = vice_atom_init) - Relocation missing required LO16 Relocation
....
And indeed, the decompilation is not as useful as before :
Screenshot from 2024-10-07 22-21-06.png


So, what do these error messages even mean? Using readelf on vice.o gives us the following relevant parts:
Relocation section '.rel.text' at offset 0xbfec contains 478 entries:
Offset Info Type Sym.Value Sym. Name
0000000c 00003a06 R_MIPS_LO16 00001b0c vice_regs_init
00000020 00000406 R_MIPS_LO16 00000000 .bss
00000038 00003b06 R_MIPS_LO16 00003c88 vice_atom_init
00000058 00003d06 R_MIPS_LO16 00000000 lock_alloc
00000060 00003c06 R_MIPS_LO16 00000000 plbase
00000068 00000406 R_MIPS_LO16 00000000 .bss
00000078 00003e06 R_MIPS_LO16 000026d4 vice_intr_init
....

Relocation section '.rela.text' at offset 0xdea4 contains 485 entries:
Offset Info Type Sym.Value Sym. Name + Addend
00000008 00003a05 R_MIPS_HI16 00001b0c vice_regs_init + 0
0000001c 00000405 R_MIPS_HI16 00000000 .bss + 0
00000034 00003b05 R_MIPS_HI16 00003c88 vice_atom_init + 0
00000050 00003c05 R_MIPS_HI16 00000000 plbase + 0
00000054 00003d05 R_MIPS_HI16 00000000 lock_alloc + 0
00000064 00000405 R_MIPS_HI16 00000000 .bss + 0
00000074 00003e05 R_MIPS_HI16 000026d4 vice_intr_init + 0
0000008c 00000205 R_MIPS_HI16 00000000 .rodata + 18
00000090 00003f05 R_MIPS_HI16 00000000 printf + 0
With this, we understand that the R_MIPS_HI16 relocations are located in one section (.rela.text) and the R_MIPS_LO16 ones in another (.rel.text). From Ghidra's error message and some reading, it seems that either rel or rela is supposed to be processed depending on various conditions, but not both at the same time. So maybe we can help Ghidra by appending the R_MIPS_LO16 relocations to the RelA section?
For this, I will use the excellent patchelf utility. It's very handy for patching ELF files and if it does not provide the patching operation you need, the code is very easy to understand and well documented.
After a bit of code addition in patchelf for messing with Rel and RelA sections as we need, we get a patched vice device driver (vice_patched.o below) with all relocs in the same section. Time to load in Ghidra :

Screenshot from 2024-10-07 22-33-26.png


Great, no more error messages, and this time the decompilation looks a lot better :)
Looking at the vice_regs_init() code, we can see some familiar stuff:

Screenshot from 2024-10-07 22-35-12.png


And we get confirmation that the VICE is indeed located at 0xb7000000 (shown as the signed value -0x49000000 in the screenshot above), with some interesting control registers located just after.
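Some background on both the relocation errors and that negative constant. A 32-bit address such as 0xb7000000 is built from a LUI (the HI16 part) plus an instruction carrying a sign-extended 16-bit low half (the LO16 part). Because the low half is sign-extended, the HI16 value has to be rounded, and for REL-style relocations the MIPS ABI computes the combined addend from both halves, which is why a loader like Ghidra wants each HI16 paired with its LO16. The helpers below are a sketch of that arithmetic, not Ghidra's code.

```c
#include <stdint.h>

/* Split an address the way a compiler emits LUI + ADDIU/LW/SW.
   The +0x8000 compensates for the CPU sign-extending the low half. */
uint16_t mips_hi16(uint32_t addr) { return (uint16_t)((addr + 0x8000u) >> 16); }
uint16_t mips_lo16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }

/* Recombine as the CPU does: (hi << 16) + sign_extend(lo). */
uint32_t mips_hilo(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) + (uint32_t)(int32_t)(int16_t)lo;
}

/* The same 32-bit pattern read as a signed value, as a decompiler
   working with signed 32-bit constants may display it. */
int64_t as_signed32(uint32_t v) { return (int64_t)(int32_t)v; }
```

Note how 0x00018000 needs HI16 = 2 rather than 1: the LO16 half 0x8000 is sign-extended to -0x8000.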

Last piece of the puzzle is now : can we access the vice memory from the kernel side with our own device driver ?

Short answer: yes :) After reading a lot of SGI TechPubs, I was able to write a small kernel driver that can access the VICE memory and provides a device special file at /hw/vicen64. That file can be opened, mmap'ed and ioctl'ed from userland to interact with the kernel driver.
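From userland, using such a device boils down to `open()` plus `mmap()`. A minimal sketch: `vice_map()` is a hypothetical helper, and mapping at offset 0 with a caller-chosen length are assumptions about this particular driver, not documented behavior. The ioctl() side is left out since its commands are specific to the driver.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of the window exposed by `dev` (e.g. /hw/vicen64).
   On success stores the descriptor in *fd_out; returns NULL on error. */
volatile void *vice_map(const char *dev, size_t len, int *fd_out)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}
```

Returning a `volatile` pointer is deliberate: accesses to device registers must not be cached or reordered away by the compiler.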

Next step is attempting to make a minimal kernel driver based on the Linux one, mmap VICE registers and IO memory to userland, and try to load and execute microcode from userland.
 
Wow you're amazing :) keep up the good work!

I always thought that with the ROM decompilation tools available these days and the availability of sm64ex on IRIX (which can provide a workable front-end, input, sound and graphics driver output), something could be made.

But it sounds like you're doing an even better job, actually working with the bare metal and taking advantage of O2 specific chips. Very cool!
 
Holy cow! I'm absolutely going to bookmark this project and can't wait to start playing N64 games on the O2! Will you ever add IS-Viewer or even Indy N64 dev board emulation to this as well? The ultimate N64 development environment? What spec O2 are you running this on? Incredible work
 
Vice MSP and BSP

After spending more time poking at the VICE memory mapped by my driver, it occurred to me that a kernel driver may not be required after all, at least for now... The memory exposed by my driver looks exactly the same as the one mapped by /dev/vice and, given sufficient permissions, we can write to interesting locations (e.g. control registers). But having a kernel device driver may be handy later.

So, my next goal was being able to control the MSP and the BSP, and make them execute custom code. For these experiments, I unloaded the Irix Vice device driver, to make sure Irix would not interfere with what I was doing.

I started with the BSP, thinking it was simpler than the MSP, but I was quickly proven wrong. I went as far as (somewhat) loading some code into the BSP IRAM, starting execution, watching the Program Counter bounce around and finally stop. I could also stop and reset execution. Not bad, but at that point I wanted to execute something meaningful rather than let the PC go crazy all over the place. For that, I'll have to tackle two problems:
  1. Understand what kind of code the BSP actually expects. I expected some kind of R4k derivative, but it looks like it's not. I probably lack some fundamental understanding here...
  2. Handle writing to the BSP Instruction RAM. My attempts produced strange results, with values duplicated every 16 bits... once again, probably missing something here.

Not much success here, so let's leave the BSP aside for a moment and have a look at the MSP.

Thanks to the knowledge provided by the Linux VICE driver, we can try all of the above again with the MSP.
First step was to reset the MSP, fill MSP IRAM (located at 0x2000) with zeros, and start execution. For this, we need to set the Program Counter register at the beginning of the code, and hit the start button :

1728822258190.png


And watch our program counter :
PC 0000000000002000
PC 00000000000020EC
PC 0000000000003010
PC 0000000000003010
PC 0000000000003010
PC 0000000000003010
...
PC 0000000000003010
EPC 3000 Cause 8000001C Flags 80
As expected, the PC started at 0x2000, quickly proceeded through the NOPs (an all-zero MIPS word is a NOP) up to 0x3000 and stopped there. Because the MSP IRAM is 0x1000 (4K) long, that seems like pretty sane behavior :) Also, we can see that after stopping, the exception registers are set: the exception occurred at PC 0x3000, but unfortunately we don't have enough insight to interpret the Cause and Flags registers.
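The fill-and-load step can be sketched as follows, assuming `vice` points to the start of the mapped VICE window and that the MSP IRAM sits at offset 0x2000 with a size of 4K, as observed above. The reset and start register writes are not shown, since their offsets only appear in the screenshot; `msp_load()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define MSP_IRAM_OFF  0x2000u   /* as observed: IRAM starts at 0x2000 */
#define MSP_IRAM_SIZE 0x1000u   /* 4K */

/* Zero the whole IRAM (all-zero words are NOPs), then copy `nwords`
   32-bit instructions to byte offset `at` inside IRAM.
   Returns -1 if the code would not fit or `at` is misaligned. */
int msp_load(volatile uint8_t *vice, const uint32_t *code, size_t nwords, uint32_t at)
{
    volatile uint32_t *iram = (volatile uint32_t *)(vice + MSP_IRAM_OFF);
    if (at % 4 != 0 || at + nwords * 4 > MSP_IRAM_SIZE)
        return -1;
    for (size_t i = 0; i < MSP_IRAM_SIZE / 4; i++)
        iram[i] = 0;
    for (size_t i = 0; i < nwords; i++)
        iram[at / 4 + i] = code[i];
    return 0;
}
```

Loading at offset 0xFE0 reproduces the "very far end of IRAM" layout used further down.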

So far so good. Now let's try to put some code in the MSP IRAM. For this, I will keep using Bass, which can output code for the N64 RSP (which we expect to be similar to our MSP), and try the simplest possible code:
1728822943546.png


Modify our test program to add code loading, rinse and repeat :

MSP fill IRAM
Read code size 20
MSP IRAM
2000: 08000000 00000000 00000000 08000800 00000000 00000000 00000000 00000000
2020: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2040: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
2fe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
MSP run
PC 0000000000002000
PC 0000000000000000
PC 0000000000000000
PC 0000000000000000
...
EPC 0 Cause 8000001C Flags 80

Not what we hoped for, but not bad :) At the first instruction, the PC jumped straight to 0 and stopped there. Maybe we should give Bass some hints:

1728823466379.png


Try again :

MSP fill IRAM
Read code size 20
MSP IRAM
2000: 08000800 00000000 00000000 08000800 00000000 00000000 00000000 00000000
2020: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2040: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
2fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2fe0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
MSP run
PC 0000000000002000
PC 0000000000002000
PC 0000000000002008
PC 0000000000002008
PC 0000000000002008
PC 0000000000002008
PC 0000000000002000
PC 0000000000002000
PC 0000000000002008
PC 0000000000002008
...
PC 0000000000002008
PC 0000000000002008
PC 0000000000002008
PC 0000000000002008
PC 0000000000002000
PC 0000000000002008
PC 0000000000002000
PC 0000000000002008
PC 0000000000002008
PC 0000000000002000
EPC 0 Cause 8000001C Flags 0

Success! Our PC keeps jumping from 0x2008 (delay slot) to 0x2000 and so on. No more exceptions either (Flags = 0), and if we don't stop it, the MSP keeps looping in the background.
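The first word is the only difference between the two IRAM dumps, and it is the J target. A quick sketch of the MIPS J encoding (opcode 2 in the top 6 bits, target >> 2 in the low 26 bits) shows why: with the label at 0 the jump encodes as 0x08000000, while with the label at 0x2000 it becomes 0x08000800.

```c
#include <stdint.h>

/* Encode MIPS "j target": opcode 2 in bits 31..26, target >> 2 in bits 25..0.
   Only bits 27..2 of the target are stored; the upper bits come from the PC. */
uint32_t mips_encode_j(uint32_t target)
{
    return (2u << 26) | ((target >> 2) & 0x03FFFFFFu);
}
```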
Last attempt: we load our code at the very far end of the available MSP IRAM and try again:

MSP fill IRAM
Read code size 20
MSP IRAM
2000: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2020: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2040: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
2fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2fe0: 08000800 00000000 00000000 08000800 00000000 00000000 00000000 00000000
MSP run
PC 0000000000002000
PC 0000000000002170
PC 0000000000002ADC
PC 00000000000022B8
PC 0000000000002E2C
...
PC 0000000000002DB0
PC 0000000000002768
PC 00000000000020D8
EPC 3000 Cause 8000001C Flags 0

And here we have it, our MSP happily looping through Instruction RAM :)
 
First, I'll say thank you! I've had a very similar thought about the O2 being a workstation with the architecture of the N64, although I haven't had the skills/knowledge to prove it.

A few tidbits, which may be plain wrong, theory that needs to be proven, or simply rumors; here goes:
1. On N64 IMEM is 4KB, on O2 I understood IRAM to be 6KB. I agree 6KB seems like an odd value, but it could explain your exception at 0x3000. On N64 the block is 4KB and ends at 0x3000, the PC register can be 12 bits and will "naturally" roll over ignoring the upper truncated bits, no error. If it's 6 KB and the PC register is 13 bits the range becomes 8KB and weird things can happen. This kind of extra oddness would be a reason to not promote the creation of custom microcode.
2. The N64 Microcode runs on the RSP, a stripped down R4K processor w/ a COP2 Vector Co-Processor. Decompiling this code is much harder because these are not common instructions in MIPS and certainly not in most decompilers.
3. The O2 COP2 has a few more instructions than the N64, these are byte oriented versions of other instructions.
4. The microcode was intended to decode standards like MP3 or MPEG video. There would have been extra licensing fees to advertise these features or to include a "logo" in their manuals, so instead I think it "just worked".

Further N64 References: https://ultra64.ca/resources/documentation/
In the Silicon Graphics section I recommend the RSP and RDP manuals, there are some interesting tidbits

P.S. Don't waste time or money on the N64 Keyboard: I already did and RE'd it here: https://sites.google.com/site/conso...s-documentation/n64-specific/randnet-keyboard
 
Thanks for sharing your thoughts and for the linked resources :)

Regarding your comments on IMEM, all the information I could find about the VICE indicates 4K of MSP IMEM and 4K of BSP IMEM. So the MSP PC behavior looks perfectly fine to me: IMEM starts at 0x2000 and we get an exception 4K later, at 0x3000.
But indeed, the VICE has 6K of DMEM (3 x 2K banks) where the RSP/RDP shares 4K; maybe that's what you recall? That certainly raises some questions regarding DMEM addressing differences between the VICE and the RSP, but for the purpose of emulation I guess it shouldn't be an issue (at least at this stage!).

I haven't made much visible progress lately, but I'm not stuck either! After the (relative) success with the CPU and the RSP/MSP, I decided to focus my attention on the BSP. Being unable to write anything coherent to the BSP "IMEM" makes me think that it may not be intended to be accessed directly from the main CPU, and that the correct way to handle this may be DMA transfers.
That's what I've been working on lately, but this involves a lot more housekeeping on the driver side, in particular TLB and page management.
 
Following a cold trail...

After a bit more experimentation, I finally understood why I couldn't get anything into the BSP's IMEM. The chip is supposedly 16-bit and, while writing BSP code word by word to the IMEM, I was trying to read it back 64 bits at a time :( (see the disclaimer in the first post!). After fixing this mistake, I could read back exactly what was written to instruction memory.
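In code, the fix amounts to going through 16-bit pointers on both the write and the read-back side. A sketch with hypothetical helpers, assuming `imem` already points at the BSP instruction memory inside the mapped VICE window:

```c
#include <stddef.h>
#include <stdint.h>

/* Write BSP code one 16-bit word at a time. */
void bsp_write_imem(volatile uint16_t *imem, const uint16_t *code, size_t n)
{
    for (size_t i = 0; i < n; i++)
        imem[i] = code[i];
}

/* Read back with the same 16-bit width. Returns the index of the first
   mismatching word, or -1 if everything matches. */
int bsp_verify_imem(const volatile uint16_t *imem, const uint16_t *code, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (imem[i] != code[i])
            return (int)i;
    return -1;
}
```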

So, now that we can feed *something* to the BSP, maybe it's time to have a look at that binutils BSP patch from 2002. There is no indication of which binutils version it is supposed to apply against. Looking at binutils versions from around that time frame, I downloaded and attempted to patch all versions from 2.12 (~March 2002) to 2.17 (~June 2006). All patch attempts failed to some degree, so I settled on the version that failed least, 2.14. After a few hours of tweaking, I finally got the patched binutils to compile.

Time to try to compile something. Looking at the source code and the instruction set, there is a lot of unusual stuff, but we can also see some familiar MIPS-looking instructions adapted to a 16-bit ISA. Here is a quick grep that should give you an idea of the beast's instruction set:
lh $rAB $imm
lbl $rAB $imm
lbh $rAB $imm
sh $rAB $imm
sbl $rAB $imm
sbh $rAB $imm
lil $rAB $imm
lih $rAB $imm
nop
cmpi $rAB $imm
andi $rAB $imm
addi $rAB $imm
b $imm
beq $imm
bne $imm
bge $imm
blt $imm
bext0 $imm
bext1 $imm
bext2 $imm
jr $rT
jreq $rT
jreq $rT
jrge $rT
jrge $rT
jrext0 $rT
jrext1 $rT
jrext2 $rT
break
resume
add $rDCD $rSC $rT
addc $rDCD $rSC $rT
sub $rDCD $rSC $rT
subc $rDCD $rSC $rT
and $rDCD $rSC $rT
or $rDCD $rSC $rT
sll $rDCD $rSC $rT
sra $rDCD $rSC $rT
mul $rDCD $rSC $rT
xor $rDCD $rSC $rT
abs $rDCD $rT
copyto $rAltC $rSC
copyfrom $rAltC $rDCD
lhr $rDCD $rSC
lhr $rSC $rT
gtbitsi $bitswallow $rDCD $N
probebitsi $rDCD $N
shiftstream $bitswallow $N
getbitsr $bitswallow $rDCD $rT
genlookuppack $rT
leafrunlevelparse $bitswallow
blockrunlevelparse $bitswallow
loadcodepackH261 $bitswallow $p $imm
genericleafparse
blockrunsizeparse $bitswallow
codesearch $q $p
packbitstream $q $L $rT
loadcodepack $q $p $imm
bytealign
There are also 8 general purpose registers, and some other exotic stuff I still have no clue about.

So let's set the bar low and compile the simplest thing we can, just a "break" instruction (interestingly there is also a "resume" instruction). Then load and attempt to make the BSP run the code :
regs 4000000
buffers 400F000
BSP reset
loop halt
loop halt reset
MSP fill IRAM
BSP fill IRAM
Read code size 18
BSP IRAM

4000: 07010200 00000000 00000000 00000100 20000000 00000000 00000000 00000000
4020: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4040: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4060: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
4fc0: 00060006 00060006 00060006 00060006 00060006 00060006 00060006 00060006
4fe0: 00060006 00060006 00060006 00060006 00060006 00060006 00060006 00060006
EPC 8 Cause 0
BSP run
PC 0000
PC 0016
PC 0016
PC 0016
...
PC 0016
PC 0016
EPC 10 Cause 4

Not so impressive, but at least it's doing what we asked for! Execution started with the PC at 0x0000 (or absolute 0x4000; it does not seem to matter which one we load into the PC register) and stopped at 0x16, with an exception PC of 0x10, which is the address of our "break" instruction. I attempted more sophisticated stuff with branches and jumps but didn't get anything to work yet.

It would be nice to be able to disassemble some known BSP code and have a look at how it works. In the previous posts, we noted that libvice loads code from /var/arch/vicedx/ or /var/arch/vicetre/ depending on your VICE chip revision. Here is an example of what you will find on your O2:
total 144
drwxr-xr-x 2 root sys 4096 Feb 1 2001 .
drwxr-xr-x 8 root sys 79 Feb 1 2001 ..
-r--r--r-- 1 root sys 2716 Feb 1 2001 cjfif.mex
-r--r--r-- 1 root sys 7680 Feb 1 2001 cjpeg-mcoef.bin
-r--r--r-- 1 root sys 650 Feb 1 2001 cjpeg.bex
-r--r--r-- 1 root sys 2176 Feb 1 2001 cjpeg.btbl
-r--r--r-- 1 root sys 2564 Feb 1 2001 cjpeg.mex
-r--r--r-- 1 root sys 650 Feb 1 2001 cjpeg_luma.bex
-r--r--r-- 1 root sys 2564 Feb 1 2001 cjpeg_luma.mex
-r--r--r-- 1 root sys 1184 Feb 1 2001 dfjpeg.bex
-r--r--r-- 1 root sys 3664 Feb 1 2001 dfjpeg.mex
-r--r--r-- 1 root sys 3088 Feb 1 2001 djfif.mex
-r--r--r-- 1 root sys 304 Feb 1 2001 djpeg-mcoef.bin
-r--r--r-- 1 root sys 1276 Feb 1 2001 djpeg.bex
-r--r--r-- 1 root sys 2688 Feb 1 2001 djpeg.btbl
-r--r--r-- 1 root sys 3056 Feb 1 2001 djpeg.mex
-r--r--r-- 1 root sys 1088 Feb 1 2001 dvcVLC.bin
-r--r--r-- 1 root sys 1450 Feb 1 2001 dvcntsc.bex
-r--r--r-- 1 root sys 4064 Feb 1 2001 dvcntsc.mex
-r--r--r-- 1 root sys 1460 Feb 1 2001 dvcpal.bex
-r--r--r-- 1 root sys 4060 Feb 1 2001 dvcpal411.mex
-r--r--r-- 1 root sys 3528 Feb 1 2001 dvcpal420.mex
-r--r--r-- 1 root sys 4024 Feb 1 2001 dvencodentsc.mex
-r--r--r-- 1 root sys 4024 Feb 1 2001 dvencodepal411.mex
-r--r--r-- 1 root sys 3688 Feb 1 2001 dvencodepal420.mex
-r--r--r-- 1 root sys 5056 Feb 1 2001 m1tabs.btbl
-r--r--r-- 1 root sys 5056 Feb 1 2001 m2tabs.btbl
-r--r--r-- 1 root sys 2034 Feb 1 2001 mpeg1dec.bex
-r--r--r-- 1 root sys 3492 Feb 1 2001 mpeg1dec.mex
-r--r--r-- 1 root sys 2032 Feb 1 2001 mpeg2dec.bex
-r--r--r-- 1 root sys 3548 Feb 1 2001 mpeg2dec.mex
-r--r--r-- 1 root sys 2032 Feb 1 2001 mpeg2dec_fld.bex
-r--r--r-- 1 root sys 3548 Feb 1 2001 mpeg2dec_fld.mex
-r--r--r-- 1 root sys 2976 Feb 1 2001 rs.mex
The MEX files are MSP executable code, the BEX files are BSP executables and the BTBL are BSP tables to be loaded in BSP table memory.

Let's attempt to disassemble djpeg.bex (supposedly the BSP part of the JPEG decoder) :
bash-5.0$ ./binutils/objdump -b a.out-bsp -m bsp -D /var/arch/vicetre/djpeg.bex
./binutils/objdump: /var/arch/vicetre/djpeg.bex: File format not recognized
Pretty disappointing indeed :(

But the attentive reader may have noticed that when we loaded our compiled BSP code, we had an object 18 bytes long for a single instruction that should be only 2 bytes :
Read code size 18
BSP IRAM

4000: 07010200 00000000 00000000 00000100 20000000 00000000 00000000 00000000
The first 16 bytes look like a header and, on closer inspection, the bytes at offsets 2 and 3 look like our code length. This can be verified by compiling a few code samples and confirming that only these bytes change, and that they reflect the code length.
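The header trick can be scripted instead of done in a hex editor. A sketch under stated assumptions: the layout is inferred from a single sample, where bytes 2-3 held "02 00" for a 2-byte body, which suggests a little-endian length; every other byte is copied verbatim from a header produced by the patched assembler. `bex_make_header()` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

#define BEX_HDR_LEN 16

/* Copy a reference header and patch the (presumed little-endian)
   code length into bytes 2 and 3. */
void bex_make_header(uint8_t out[BEX_HDR_LEN],
                     const uint8_t tmpl[BEX_HDR_LEN], uint16_t code_len)
{
    memcpy(out, tmpl, BEX_HDR_LEN);
    out[2] = (uint8_t)(code_len & 0xFFu);
    out[3] = (uint8_t)(code_len >> 8);
}
```

Writing the resulting 16 bytes followed by the raw .bex contents to a new file gives objdump something it can recognize; djpeg.bex is 1276 (0x4FC) bytes, so bytes 2-3 become "fc 04".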

So let's take our djpeg.bex object, prepend this 16-byte header, hexedit bytes 2 and 3 to reflect our code length, and try objdump again:
Disassembly of section .text:

0000000000000000 <.text>:
0: 64 00 abs r0 r0
2: bd 01 lih r5 0x1
4: b5 f0 lil r5 0xf0
6: b4 00 lil r4 0x0
8: 1f 04 *unknown*
a: bc 80 lih r4 0x80
c: 20 00 break
e: 00 00 nop
10: 20 00 break
12: be 80 lih r6 0x80
14: b6 00 lil r6 0x0
16: 00 00 nop
18: 79 c5 *unknown*
1a: 4d fe and r3 r7 r6
1c: 48 3e *unknown*
1e: 19 fb bne 0xfb
20: 00 00 nop
22: 70 20 copyto rpage r4
24: 00 00 nop
26: 81 09 lh r1 0x9
28: 82 0b lh r2 0xb
2a: 64 db xor r1 r3 r3
2c: 86 0f lh r6 0xf
2e: b5 00 lil r5 0x0
30: bd 14 lih r5 0x14
32: 41 6c add r2 r5 r4
34: 70 28 copyto rpage r5
36: 00 00 nop
38: 99 00 sh r1 0x0
3a: 99 02 sh r1 0x2
3c: 9a 01 sh r2 0x1
3e: 9a 03 sh r2 0x3
40: 9b 04 sh r3 0x4
42: 9b 05 sh r3 0x5
44: 98 06 sh r0 0x6
46: 0e 04 cmpi r6 0x4
48: 19 04 bne 0x4
4a: 9e 07 sh r6 0x7
4c: b6 01 lil r6 0x1
4e: be 00 lih r6 0x0
50: 9e 07 sh r6 0x7
52: 81 07 lh r1 0x7
54: 77 b8 *unknown*
56: b7 04 lil r7 0x4
58: bf 00 lih r7 0x0
5a: 51 b7 or r3 r6 r7
5c: 73 f0 copyto mask_h r6
5e: b5 04 lil r5 0x4
60: 77 b8 *unknown*
62: bd 00 lih r5 0x0
64: 65 b7 xor r3 r6 r7
66: 73 f0 copyto mask_h r6
68: b7 00 lil r7 0x0
6a: bf 70 lih r7 0x70
6c: 7c 2f *unknown*
6e: 00 00 nop
70: 79 87 *unknown*
72: 00 00 nop
74: 65 b5 xor r3 r6 r5
76: 00 00 nop
78: 7c 2f *unknown*
7a: c1 cf gtbitsi puke r3 0xf
7c: c1 cf gtbitsi puke r3 0xf
7e: b7 db lil r7 0xdb
80: bf ff lih r7 0xff
82: 71 78 copyto alpha_l r7
84: 71 80 copyto beta_h r0
86: b7 ff lil r7 0xff
88: bf ff lih r7 0xff
8a: 71 f8 copyto beta_h r7
8c: 72 00 copyto beta_l r0
8e: ea 00 codesearch 0x1 0x0
90: b5 30 lil r5 0x30
92: bd 00 lih r5 0x0
94: 41 6c add r2 r5 r4
96: 70 28 copyto rpage r5
98: 00 00 nop
9a: c1 cf gtbitsi puke r3 0xf
9c: c1 cf gtbitsi puke r3 0xf
9e: c1 c7 gtbitsi puke r3 0x7
a0: c1 87 gtbitsi puke r3 0x7
a2: c1 c7 gtbitsi puke r3 0x7
a4: 9e 00 sh r6 0x0
a6: 9f 01 sh r7 0x1
a8: c1 87 gtbitsi puke r3 0x7
aa: c1 c7 gtbitsi puke r3 0x7
ac: 9e 08 sh r6 0x8
ae: 9f 10 sh r7 0x10
b0: c1 87 gtbitsi puke r3 0x7
b2: c1 c7 gtbitsi puke r3 0x7
b4: 9e 09 sh r6 0x9
b6: 9f 02 sh r7 0x2
b8: c1 87 gtbitsi puke r3 0x7
ba: c1 c7 gtbitsi puke r3 0x7
bc: 9e 03 sh r6 0x3
be: 9f 0a sh r7 0xa
c0: c1 87 gtbitsi puke r3 0x7
c2: c1 c7 gtbitsi puke r3 0x7
c4: 9e 11 sh r6 0x11
c6: 9f 18 sh r7 0x18
c8: c1 87 gtbitsi puke r3 0x7
ca: c1 c7 gtbitsi puke r3 0x7
cc: 9e 20 sh r6 0x20
ce: 9f 19 sh r7 0x19
d0: c1 87 gtbitsi puke r3 0x7
d2: c1 c7 gtbitsi puke r3 0x7
d4: 9e 12 sh r6 0x12
d6: 9f 0b sh r7 0xb
d8: c1 87 gtbitsi puke r3 0x7
da: c1 c7 gtbitsi puke r3 0x7
dc: 9e 04 sh r6 0x4
de: 9f 05 sh r7 0x5
e0: c1 87 gtbitsi puke r3 0x7
e2: c1 c7 gtbitsi puke r3 0x7
e4: 9e 0c sh r6 0xc
e6: 9f 13 sh r7 0x13
e8: c1 87 gtbitsi puke r3 0x7
ea: c1 c7 gtbitsi puke r3 0x7
ec: 9e 1a sh r6 0x1a
ee: 9f 21 sh r7 0x21
f0: c1 87 gtbitsi puke r3 0x7
f2: c1 c7 gtbitsi puke r3 0x7
f4: 9e 28 sh r6 0x28
f6: 9f 30 sh r7 0x30
f8: c1 87 gtbitsi puke r3 0x7
fa: c1 c7 gtbitsi puke r3 0x7
fc: 9e 29 sh r6 0x29
fe: 9f 22 sh r7 0x22
100: c1 87 gtbitsi puke r3 0x7
102: c1 c7 gtbitsi puke r3 0x7
104: 9e 1b sh r6 0x1b
106: 9f 14 sh r7 0x14
108: c1 87 gtbitsi puke r3 0x7
10a: c1 c7 gtbitsi puke r3 0x7
10c: 9e 0d sh r6 0xd
10e: 9f 06 sh r7 0x6
110: c1 87 gtbitsi puke r3 0x7
112: c1 c7 gtbitsi puke r3 0x7
114: 9e 07 sh r6 0x7
116: 9f 0e sh r7 0xe
118: c1 87 gtbitsi puke r3 0x7
11a: c1 c7 gtbitsi puke r3 0x7
11c: 9e 15 sh r6 0x15
11e: 9f 1c sh r7 0x1c
120: c1 87 gtbitsi puke r3 0x7
122: c1 c7 gtbitsi puke r3 0x7
124: 9e 23 sh r6 0x23
126: 9f 2a sh r7 0x2a
128: c1 87 gtbitsi puke r3 0x7
12a: c1 c7 gtbitsi puke r3 0x7
12c: 9e 31 sh r6 0x31
12e: 9f 38 sh r7 0x38
130: c1 87 gtbitsi puke r3 0x7
132: c1 c7 gtbitsi puke r3 0x7
134: 9e 39 sh r6 0x39
136: 9f 32 sh r7 0x32
138: c1 87 gtbitsi puke r3 0x7
13a: c1 c7 gtbitsi puke r3 0x7
13c: 9e 2b sh r6 0x2b
13e: 9f 24 sh r7 0x24
140: c1 87 gtbitsi puke r3 0x7
142: c1 c7 gtbitsi puke r3 0x7
144: 9e 1d sh r6 0x1d
146: 9f 16 sh r7 0x16
148: c1 87 gtbitsi puke r3 0x7
14a: c1 c7 gtbitsi puke r3 0x7
14c: 9e 0f sh r6 0xf
14e: 9f 17 sh r7 0x17
150: c1 87 gtbitsi puke r3 0x7
152: c1 c7 gtbitsi puke r3 0x7
154: 9e 1e sh r6 0x1e
156: 9f 25 sh r7 0x25
158: c1 87 gtbitsi puke r3 0x7
15a: c1 c7 gtbitsi puke r3 0x7
15c: 9e 2c sh r6 0x2c
15e: 9f 33 sh r7 0x33
160: c1 87 gtbitsi puke r3 0x7
162: c1 c7 gtbitsi puke r3 0x7
164: 9e 3a sh r6 0x3a
166: 9f 3b sh r7 0x3b
168: c1 87 gtbitsi puke r3 0x7
16a: c1 c7 gtbitsi puke r3 0x7
16c: 9e 34 sh r6 0x34
16e: 9f 2d sh r7 0x2d
170: c1 87 gtbitsi puke r3 0x7
172: c1 c7 gtbitsi puke r3 0x7
174: 9e 26 sh r6 0x26
176: 9f 1f sh r7 0x1f
178: c1 87 gtbitsi puke r3 0x7
17a: c1 c7 gtbitsi puke r3 0x7
17c: 9e 27 sh r6 0x27
17e: 9f 2e sh r7 0x2e
180: c1 87 gtbitsi puke r3 0x7
182: c1 c7 gtbitsi puke r3 0x7
184: 9e 35 sh r6 0x35
186: 9f 3c sh r7 0x3c
188: c1 87 gtbitsi puke r3 0x7
18a: c1 c7 gtbitsi puke r3 0x7
18c: 9e 3d sh r6 0x3d
18e: 9f 36 sh r7 0x36
190: c1 87 gtbitsi puke r3 0x7
192: c1 c7 gtbitsi puke r3 0x7
194: 9e 2f sh r6 0x2f
196: 9f 37 sh r7 0x37
198: c1 87 gtbitsi puke r3 0x7
19a: c1 c7 gtbitsi puke r3 0x7
19c: 9e 3e sh r6 0x3e
19e: 9f 3f sh r7 0x3f
1a0: 09 00 cmpi r1 0x0
1a2: 19 04 bne 0x4
1a4: b7 c8 lil r7 0xc8
1a6: bf 02 lih r7 0x2
1a8: 37 07 *unknown*
1aa: 00 00 nop
1ac: c5 87 probebitsi r3 0x7
1ae: 00 00 nop
1b0: 0e ff cmpi r6 0xff
1b2: 19 04 bne 0x4
1b4: 00 00 nop
1b6: ea 00 codesearch 0x1 0x0
1b8: c1 cf gtbitsi puke r3 0xf
1ba: c1 cf gtbitsi puke r3 0xf
1bc: b6 80 lil r6 0x80
1be: be 00 lih r6 0x0
1c0: 41 6e add r2 r5 r6
1c2: 70 28 copyto rpage r5
1c4: 00 00 nop
1c6: c1 c7 gtbitsi puke r3 0x7
1c8: c1 87 gtbitsi puke r3 0x7
1ca: c1 c7 gtbitsi puke r3 0x7
1cc: 9e 00 sh r6 0x0
1ce: 9f 01 sh r7 0x1
1d0: c1 87 gtbitsi puke r3 0x7
1d2: c1 c7 gtbitsi puke r3 0x7
1d4: 9e 08 sh r6 0x8
1d6: 9f 10 sh r7 0x10
1d8: c1 87 gtbitsi puke r3 0x7
1da: c1 c7 gtbitsi puke r3 0x7
1dc: 9e 09 sh r6 0x9
1de: 9f 02 sh r7 0x2
1e0: c1 87 gtbitsi puke r3 0x7
1e2: c1 c7 gtbitsi puke r3 0x7
1e4: 9e 03 sh r6 0x3
1e6: 9f 0a sh r7 0xa
1e8: c1 87 gtbitsi puke r3 0x7
1ea: c1 c7 gtbitsi puke r3 0x7
1ec: 9e 11 sh r6 0x11
1ee: 9f 18 sh r7 0x18
1f0: c1 87 gtbitsi puke r3 0x7
1f2: c1 c7 gtbitsi puke r3 0x7
1f4: 9e 20 sh r6 0x20
1f6: 9f 19 sh r7 0x19
1f8: c1 87 gtbitsi puke r3 0x7
1fa: c1 c7 gtbitsi puke r3 0x7
1fc: 9e 12 sh r6 0x12
1fe: 9f 0b sh r7 0xb
200: c1 87 gtbitsi puke r3 0x7
202: c1 c7 gtbitsi puke r3 0x7
204: 9e 04 sh r6 0x4
206: 9f 05 sh r7 0x5
208: c1 87 gtbitsi puke r3 0x7
20a: c1 c7 gtbitsi puke r3 0x7
20c: 9e 0c sh r6 0xc
20e: 9f 13 sh r7 0x13
210: c1 87 gtbitsi puke r3 0x7
212: c1 c7 gtbitsi puke r3 0x7
214: 9e 1a sh r6 0x1a
216: 9f 21 sh r7 0x21
218: c1 87 gtbitsi puke r3 0x7
21a: c1 c7 gtbitsi puke r3 0x7
21c: 9e 28 sh r6 0x28
21e: 9f 30 sh r7 0x30
220: c1 87 gtbitsi puke r3 0x7
222: c1 c7 gtbitsi puke r3 0x7
224: 9e 29 sh r6 0x29
226: 9f 22 sh r7 0x22
228: c1 87 gtbitsi puke r3 0x7
22a: c1 c7 gtbitsi puke r3 0x7
22c: 9e 1b sh r6 0x1b
22e: 9f 14 sh r7 0x14
230: c1 87 gtbitsi puke r3 0x7
232: c1 c7 gtbitsi puke r3 0x7
234: 9e 0d sh r6 0xd
236: 9f 06 sh r7 0x6
238: c1 87 gtbitsi puke r3 0x7
23a: c1 c7 gtbitsi puke r3 0x7
23c: 9e 07 sh r6 0x7
23e: 9f 0e sh r7 0xe
240: c1 87 gtbitsi puke r3 0x7
242: c1 c7 gtbitsi puke r3 0x7
244: 9e 15 sh r6 0x15
246: 9f 1c sh r7 0x1c
248: c1 87 gtbitsi puke r3 0x7
24a: c1 c7 gtbitsi puke r3 0x7
24c: 9e 23 sh r6 0x23
24e: 9f 2a sh r7 0x2a
250: c1 87 gtbitsi puke r3 0x7
252: c1 c7 gtbitsi puke r3 0x7
254: 9e 31 sh r6 0x31
256: 9f 38 sh r7 0x38
258: c1 87 gtbitsi puke r3 0x7
25a: c1 c7 gtbitsi puke r3 0x7
25c: 9e 39 sh r6 0x39
25e: 9f 32 sh r7 0x32
260: c1 87 gtbitsi puke r3 0x7
262: c1 c7 gtbitsi puke r3 0x7
264: 9e 2b sh r6 0x2b
266: 9f 24 sh r7 0x24
268: c1 87 gtbitsi puke r3 0x7
26a: c1 c7 gtbitsi puke r3 0x7
26c: 9e 1d sh r6 0x1d
26e: 9f 16 sh r7 0x16
270: c1 87 gtbitsi puke r3 0x7
272: c1 c7 gtbitsi puke r3 0x7
274: 9e 0f sh r6 0xf
276: 9f 17 sh r7 0x17
278: c1 87 gtbitsi puke r3 0x7
27a: c1 c7 gtbitsi puke r3 0x7
27c: 9e 1e sh r6 0x1e
27e: 9f 25 sh r7 0x25
280: c1 87 gtbitsi puke r3 0x7
282: c1 c7 gtbitsi puke r3 0x7
284: 9e 2c sh r6 0x2c
286: 9f 33 sh r7 0x33
288: c1 87 gtbitsi puke r3 0x7
28a: c1 c7 gtbitsi puke r3 0x7
28c: 9e 3a sh r6 0x3a
28e: 9f 3b sh r7 0x3b
290: c1 87 gtbitsi puke r3 0x7
292: c1 c7 gtbitsi puke r3 0x7
294: 9e 34 sh r6 0x34
296: 9f 2d sh r7 0x2d
298: c1 87 gtbitsi puke r3 0x7
29a: c1 c7 gtbitsi puke r3 0x7
29c: 9e 26 sh r6 0x26
29e: 9f 1f sh r7 0x1f
2a0: c1 87 gtbitsi puke r3 0x7
2a2: c1 c7 gtbitsi puke r3 0x7
2a4: 9e 27 sh r6 0x27
2a6: 9f 2e sh r7 0x2e
2a8: c1 87 gtbitsi puke r3 0x7
2aa: c1 c7 gtbitsi puke r3 0x7
2ac: 9e 35 sh r6 0x35
2ae: 9f 3c sh r7 0x3c
2b0: c1 87 gtbitsi puke r3 0x7
2b2: c1 c7 gtbitsi puke r3 0x7
2b4: 9e 3d sh r6 0x3d
2b6: 9f 36 sh r7 0x36
2b8: c1 87 gtbitsi puke r3 0x7
2ba: c1 c7 gtbitsi puke r3 0x7
2bc: 9e 2f sh r6 0x2f
2be: 9f 37 sh r7 0x37
2c0: c1 87 gtbitsi puke r3 0x7
2c2: c1 c7 gtbitsi puke r3 0x7
2c4: 9e 3e sh r6 0x3e
2c6: 9f 3f sh r7 0x3f
2c8: bd 01 lih r5 0x1
2ca: b5 f8 lil r5 0xf8
2cc: 7c 05 *unknown*
2ce: b7 da lil r7 0xda
2d0: bf ff lih r7 0xff
2d2: 71 78 copyto alpha_l r7
2d4: 71 80 copyto beta_h r0
2d6: b7 ff lil r7 0xff
2d8: bf ff lih r7 0xff
2da: 71 f8 copyto beta_h r7
2dc: 72 00 copyto beta_l r0
2de: eb 00 codesearch 0x1 0x1
2e0: c3 cf gtbitsi swallow r7 0xf
2e2: c3 cf gtbitsi swallow r7 0xf
2e4: be 00 lih r6 0x0
2e6: b6 01 lil r6 0x1
2e8: 45 fe *unknown*
2ea: 45 fe *unknown*
2ec: c3 47 gtbitsi swallow r6 0x7
2ee: 45 fe *unknown*
2f0: 19 fd bne 0xfd
2f2: 00 00 nop
2f4: 71 48 copyto alpha_l r1
2f6: 64 49 xor r0 r1 r1
2f8: 64 92 xor r1 r2 r2
2fa: 64 db xor r1 r3 r3
2fc: bd 01 lih r5 0x1
2fe: b5 f0 lil r5 0xf0
300: be 80 lih r6 0x80
302: b6 00 lil r6 0x0
304: 00 00 nop
306: 79 c5 *unknown*
308: 4d fe and r3 r7 r6
30a: 48 3e *unknown*
30c: 19 fb bne 0xfb
30e: 00 00 nop
310: 73 60 copyto cmp_l r4
312: 70 20 copyto rpage r4
314: 73 80 copyto mask_h r0
316: 00 00 nop
318: b5 00 lil r5 0x0
31a: bd 00 lih r5 0x0
31c: 72 68 copyto beta_l r5
31e: 00 00 nop
320: 00 00 nop
322: e6 00 blockrunsizeparse swallow
324: b5 06 lil r5 0x6
326: bd 00 lih r5 0x0
328: 72 68 copyto beta_l r5
32a: 00 00 nop
32c: 00 00 nop
32e: e4 00 blockrunsizeparse puke
330: b5 00 lil r5 0x0
332: 77 f8 *unknown*
334: bd 01 lih r5 0x1
336: 4d 7d and r2 r7 r5
338: 18 fc beq 0xfc
33a: b5 00 lil r5 0x0
33c: 85 00 lh r5 0x0
33e: 40 4d add r0 r1 r5
340: 99 00 sh r1 0x0
342: 77 68 *unknown*
344: b6 80 lil r6 0x80
346: be 00 lih r6 0x0
348: 41 6e add r2 r5 r6
34a: 73 68 copyto cmp_l r5
34c: 70 28 copyto rpage r5
34e: 73 80 copyto mask_h r0
350: 00 00 nop
352: b5 00 lil r5 0x0
354: bd 00 lih r5 0x0
356: 72 68 copyto beta_l r5
358: 00 00 nop
35a: 00 00 nop
35c: e6 00 blockrunsizeparse swallow
35e: b5 06 lil r5 0x6
360: bd 00 lih r5 0x0
362: 72 68 copyto beta_l r5
364: 00 00 nop
366: 00 00 nop
368: e4 00 blockrunsizeparse puke
36a: b5 00 lil r5 0x0
36c: 77 f8 *unknown*
36e: bd 01 lih r5 0x1
370: 4d 7d and r2 r7 r5
372: 18 fc beq 0xfc
374: b5 00 lil r5 0x0
376: 85 00 lh r5 0x0
378: 40 4d add r0 r1 r5
37a: 99 00 sh r1 0x0
37c: 75 68 *unknown*
37e: 48 28 *unknown*
380: 18 74 beq 0x74
382: 00 00 nop
384: 0d 01 cmpi r5 0x1
386: 18 3b beq 0x3b
388: 00 00 nop
38a: 77 68 *unknown*
38c: b6 80 lil r6 0x80
38e: be 00 lih r6 0x0
390: 41 6e add r2 r5 r6
392: 73 68 copyto cmp_l r5
394: 70 28 copyto rpage r5
396: 73 80 copyto mask_h r0
398: 00 00 nop
39a: b5 00 lil r5 0x0
39c: bd 00 lih r5 0x0
39e: 72 68 copyto beta_l r5
3a0: 00 00 nop
3a2: 00 00 nop
3a4: e6 00 blockrunsizeparse swallow
3a6: b5 06 lil r5 0x6
3a8: bd 00 lih r5 0x0
3aa: 72 68 copyto beta_l r5
3ac: 00 00 nop
3ae: 00 00 nop
3b0: e4 00 blockrunsizeparse puke
3b2: b5 00 lil r5 0x0
3b4: 77 f8 *unknown*
3b6: bd 01 lih r5 0x1
3b8: 4d 7d and r2 r7 r5
3ba: 18 fc beq 0xfc
3bc: b5 00 lil r5 0x0
3be: 85 00 lh r5 0x0
3c0: 40 4d add r0 r1 r5
3c2: 99 00 sh r1 0x0
3c4: 77 68 *unknown*
3c6: b6 80 lil r6 0x80
3c8: be 00 lih r6 0x0
3ca: 41 6e add r2 r5 r6
3cc: 73 68 copyto cmp_l r5
3ce: 70 28 copyto rpage r5
3d0: 73 80 copyto mask_h r0
3d2: 00 00 nop
3d4: b5 00 lil r5 0x0
3d6: bd 00 lih r5 0x0
3d8: 72 68 copyto beta_l r5
3da: 00 00 nop
3dc: 00 00 nop
3de: e6 00 blockrunsizeparse swallow
3e0: b5 06 lil r5 0x6
3e2: bd 00 lih r5 0x0
3e4: 72 68 copyto beta_l r5
3e6: 00 00 nop
3e8: 00 00 nop
3ea: e4 00 blockrunsizeparse puke
3ec: b5 00 lil r5 0x0
3ee: 77 f8 *unknown*
3f0: bd 01 lih r5 0x1
3f2: 4d 7d and r2 r7 r5
3f4: 18 fc beq 0xfc
3f6: b5 00 lil r5 0x0
3f8: 85 00 lh r5 0x0
3fa: 40 4d add r0 r1 r5
3fc: 99 00 sh r1 0x0
3fe: 77 68 *unknown*
400: 41 6e add r2 r5 r6
402: 73 68 copyto cmp_l r5
404: 70 28 copyto rpage r5
406: 73 80 copyto mask_h r0
408: 00 00 nop
40a: b5 03 lil r5 0x3
40c: bd 00 lih r5 0x0
40e: 72 68 copyto beta_l r5
410: 00 00 nop
412: 00 00 nop
414: e6 00 blockrunsizeparse swallow
416: b5 17 lil r5 0x17
418: bd 00 lih r5 0x0
41a: 72 68 copyto beta_l r5
41c: 00 00 nop
41e: 00 00 nop
420: e4 00 blockrunsizeparse puke
422: b5 00 lil r5 0x0
424: 77 f8 *unknown*
426: bd 01 lih r5 0x1
428: 4d 7d and r2 r7 r5
42a: 18 fc beq 0xfc
42c: b5 00 lil r5 0x0
42e: 85 00 lh r5 0x0
430: 40 95 add r1 r2 r5
432: 9a 00 sh r2 0x0
434: 77 68 *unknown*
436: 41 6e add r2 r5 r6
438: 73 68 copyto cmp_l r5
43a: 70 28 copyto rpage r5
43c: 73 80 copyto mask_h r0
43e: 00 00 nop
440: b5 03 lil r5 0x3
442: bd 00 lih r5 0x0
444: 72 68 copyto beta_l r5
446: 00 00 nop
448: 00 00 nop
44a: e6 00 blockrunsizeparse swallow
44c: b5 17 lil r5 0x17
44e: bd 00 lih r5 0x0
450: 72 68 copyto beta_l r5
452: 00 00 nop
454: 00 00 nop
456: e4 00 blockrunsizeparse puke
458: b5 00 lil r5 0x0
45a: 77 f8 *unknown*
45c: bd 01 lih r5 0x1
45e: 4d 7d and r2 r7 r5
460: 18 fc beq 0xfc
462: b5 00 lil r5 0x0
464: 85 00 lh r5 0x0
466: 40 dd add r1 r3 r5
468: 9b 00 sh r3 0x0
46a: 77 68 *unknown*
46c: 41 6e add r2 r5 r6
46e: 70 28 copyto rpage r5
470: 00 00 nop
472: bd 01 lih r5 0x1
474: b5 f8 lil r5 0xf8
476: 7c 05 *unknown*
478: b4 00 lil r4 0x0
47a: bc 80 lih r4 0x80
47c: b5 00 lil r5 0x0
47e: bd 14 lih r5 0x14
480: 41 25 add r2 r4 r5
482: 70 20 copyto rpage r4
484: 00 00 nop
486: 84 04 lh r4 0x4
488: 00 00 nop
48a: 48 20 *unknown*
48c: 18 0a beq 0xa
48e: 00 00 nop
490: bf de lih r7 0xde
492: b7 ad lil r7 0xad
494: be 00 lih r6 0x0
496: b6 02 lil r6 0x2
498: 73 f0 copyto mask_h r6
49a: 00 00 nop
49c: 64 49 xor r0 r1 r1
49e: 64 92 xor r1 r2 r2
4a0: 64 db xor r1 r3 r3
4a2: 84 02 lh r4 0x2
4a4: b5 01 lil r5 0x1
4a6: bd 00 lih r5 0x0
4a8: 45 25 *unknown*
4aa: 19 0a bne 0xa
4ac: 9c 02 sh r4 0x2
4ae: 84 00 lh r4 0x0
4b0: 00 00 nop
4b2: 9c 02 sh r4 0x2
4b4: 84 03 lh r4 0x3
4b6: 45 25 *unknown*
4b8: 19 03 bne 0x3
4ba: 9c 03 sh r4 0x3
4bc: 1f 10 *unknown*
4be: 00 00 nop
4c0: 77 28 *unknown*
4c2: b6 00 lil r6 0x0
4c4: be f8 lih r6 0xf8
4c6: 4d 26 and r2 r4 r6
4c8: b6 00 lil r6 0x0
4ca: be 08 lih r6 0x8
4cc: 65 26 xor r2 r4 r6
4ce: bd 01 lih r5 0x1
4d0: b5 f0 lil r5 0xf0
4d2: be 80 lih r6 0x80
4d4: b6 00 lil r6 0x0
4d6: b7 04 lil r7 0x4
4d8: bf 03 lih r7 0x3
4da: 37 07 *unknown*
4dc: 00 00 nop
4de: bd 01 lih r5 0x1
4e0: b5 f0 lil r5 0xf0
4e2: be 80 lih r6 0x80
4e4: b6 00 lil r6 0x0
4e6: 00 00 nop
4e8: 79 c5 *unknown*
4ea: 4d fe and r3 r7 r6
4ec: 48 3e *unknown*
4ee: 19 fb bne 0xfb
4f0: 00 00 nop
4f2: b7 0d lil r7 0xd
4f4: bf 90 lih r7 0x90
4f6: 00 00 nop
4f8: 1f fe *unknown*
And that looks like a very plausible BSP source code to me :)

With what we just learned, we can indeed strip the 16-byte header from the compiled code and it will run just fine on the BSP.
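Here is a minimal sketch of that step. It assumes the header is simply the first 16 bytes of the compiled file, as described above. A host-side helper copies everything after the header into a raw image for the BSP. The function name `strip_bsp_header` and the file paths are hypothetical.

```c
#include <stdio.h>

#define BSP_HEADER_LEN 16 /* size of the header to drop, per the analysis above */

/* Hypothetical helper: copy in_path to out_path without its first
 * BSP_HEADER_LEN bytes. Returns the number of code bytes written,
 * or -1 on error (including a file shorter than the header). */
long strip_bsp_header(const char *in_path, const char *out_path)
{
    unsigned char buf[4096];
    size_t n;
    long written = -1;

    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(out_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    /* Consume the header; only continue if it is complete */
    if (fread(buf, 1, BSP_HEADER_LEN, in) == BSP_HEADER_LEN) {
        written = 0;
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
            if (fwrite(buf, 1, n, out) != n) {
                written = -1;
                break;
            }
            written += (long)n;
        }
        if (ferror(in))
            written = -1;
    }

    fclose(in);
    if (fclose(out) != 0)
        written = -1;
    return written;
}
```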

Now that we have a better understanding of the BSP, it is clear that it is a very different thing from the N64's RDP... But is it really so different? After all, they share a common goal: bitstream processing. So, would it be possible to replicate the RDP behavior with our BSP? :unsure: That would definitely be a very long shot...
 
Just checking, do you already have the "VICE Design Specification 099-0123-003" PDF?
I stumbled onto it by accident: it turns out NetBSD developer Michael Lorenz (macallan) had actually uploaded it to the NetBSD CDN alongside the CRIME 1.5 specification, in a file called "docs.tar.bz2", which also contains the "GBE ASIC spec for REV1.1" PDF.

With the MACE, VICE, GBE and CRIME 1.5 specification documents available, I think that should cover all the O2 ASICs, apart from the R5k/R7k-specific CRIME 1.1 specification (which the R10k/R12k CRIME 1.5 spec does touch upon somewhat, detailing differences between the revisions) :cool:

It doesn't seem to be particularly well known, given that the documents aren't indexed by search engines and such because they sit inside a compressed tarball.
 
Wow :oops:

I can't THANK YOU enough for sharing this! This is a pure gold mine of info I couldn't even hope to find for this project... Everything is in there!
I've always wondered how the authors of the drivers and patches from 2002 gathered all this knowledge of the chip; now I have the answer.

Once again, a HUGE THANK YOU :)
 
No problem! As soon as I realized what was in that tarball, I figured it'd be good to share it, as it seems to be thought of as a long-lost document.
If the tarball truly has been there since 2011, it's been hiding in plain sight for a long time!
 
Less VICE, more CRIME

After some good-quality reading, it appears that our BSP may not be the best candidate for implementing RDP rasterization operations, but it may not be out of the game yet ;)

RDP commands are encoded as 64-bit payloads that drive the RDP to output primitives (points, lines, triangles or rectangles) to a framebuffer. These primitives can be flat, shaded, textured, Z-buffered, or a combination of these, as summarized in the example below for triangle operations:

(all the following screenshots come from SGI's "Nintendo Ultra64 RDP Command Summary" document)

1731680075348.png
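The triangle variants are easier to see in code. The 6-bit command ID sits in bits 61..56 of the first 64-bit word. The eight triangle commands (IDs 0x08 to 0x0F) differ only in three flag bits. Here is a minimal decoding sketch; `rdp_tri_flags` and `rdp_decode_triangle` are hypothetical names.

```c
#include <stdint.h>

/* Flags of an RDP triangle command (hypothetical struct) */
typedef struct {
    int is_triangle; /* command ID in 0x08..0x0F */
    int shade;       /* ID bit 2: shade coefficients follow the edges */
    int texture;     /* ID bit 1: texture coefficients follow */
    int zbuffer;     /* ID bit 0: depth coefficients follow */
} rdp_tri_flags;

/* w0 is the first 64-bit word of the command; the 6-bit ID is bits 61..56 */
rdp_tri_flags rdp_decode_triangle(uint64_t w0)
{
    rdp_tri_flags f = {0, 0, 0, 0};
    unsigned id = (unsigned)(w0 >> 56) & 0x3F;

    if (id < 0x08 || id > 0x0F)
        return f; /* not a triangle command */
    f.is_triangle = 1;
    f.shade = (id >> 2) & 1;
    f.texture = (id >> 1) & 1;
    f.zbuffer = id & 1;
    return f;
}
```

So 0x08 is the flat triangle and 0x0F is the shaded, textured, Z-buffered one.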


The O2 provides a chip dedicated to this kind of operation in the form of the CRIME's Rendering Engine, and more specifically its Pixel Pipeline. All operations, like primitive output, texture mapping, antialiasing, fog, scissoring, logic ops, etc., are provided through an extensive set of registers. The host has to write directly to the desired registers to configure the requested operation, and then trigger execution by writing the last register at an offset address. As register writes may occur faster than the rendering engine can process them, an internal FIFO buffer is used to queue requests, and some features are provided to prevent the FIFO buffer from overflowing.

If we wanted to perform a "simple" RDP operation like, say, draw a flat triangle, the RDP command would look something like this :

1731681305574.png


Where one would expect a simple set of X,Y coordinates for 3 vertices (at least, that's what the CRIME pixel pipeline expects!), we are greeted with multiple coordinates and slope factors :(. The meaning of the various parameters is explained in the following figure:

1731681610683.png


In fact, that makes more sense from the rasterizer's point of view: the triangle is not defined by 3 vertices, but by 3 edge functions E(x,y). These functions make it fairly easy to tell, for each pixel, whether it lies within the triangle (e.g. all 3 functions give E(x,y) > 0), outside the triangle (at least one function gives E(x,y) < 0) or on an edge (E(x,y) = 0).
This is illustrated in the following figures :

1731682451257.png
1731682461976.png

(source www.scratchapixel.com)
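Following the scratchapixel formulation, here is a minimal sketch of the edge-function test. Rather than assuming a particular winding order, it treats a point as inside when all three edge functions share the same sign. That also covers the "all > 0" case above. The function names are illustrative.

```c
/* Edge function of the directed edge (x0,y0)->(x1,y1), evaluated at (px,py).
 * Its sign tells which side of the edge the point lies on; 0 means on the line. */
static float edge_fn(float x0, float y0, float x1, float y1, float px, float py)
{
    return (px - x0) * (y1 - y0) - (py - y0) * (x1 - x0);
}

/* Classify (px,py) against triangle v: 1 inside, 0 on an edge, -1 outside.
 * Winding-independent: inside means all three edge functions share a sign. */
int triangle_classify(const float v[3][2], float px, float py)
{
    float e0 = edge_fn(v[0][0], v[0][1], v[1][0], v[1][1], px, py);
    float e1 = edge_fn(v[1][0], v[1][1], v[2][0], v[2][1], px, py);
    float e2 = edge_fn(v[2][0], v[2][1], v[0][0], v[0][1], px, py);
    int has_neg = e0 < 0 || e1 < 0 || e2 < 0;
    int has_pos = e0 > 0 || e1 > 0 || e2 > 0;

    if (has_neg && has_pos)
        return -1; /* on opposite sides of some edges */
    if (e0 == 0 || e1 == 0 || e2 == 0)
        return 0;  /* on an edge (or a degenerate triangle) */
    return 1;
}
```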

The coordinates and coefficients from the RDP command described above are derived from the edge (or "half plane") functions.

This is all fine and dandy for our RDP, but if we wanted to emulate it with the help of the CRIME, we would need to convert all of this back into plain X,Y coordinates for each triangle vertex.
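Here is a rough sketch of that conversion. It assumes the parameters have already been converted to floating point, and it ignores the RDP's sub-pixel and scanline rules, which is what the GLideN64 posts mentioned next deal with. With those caveats, the three vertices fall out of the edge start points and the major-edge slope:

- XH and XM are the X positions of the H and M edges at YH.
- XL is the X position of the L edge at YM.
- The H edge runs from YH to YL.

The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    double yl, ym, yh; /* low (bottom), middle and high (top) Y */
    double xl, dxldy;  /* L edge: X at YM and its slope */
    double xh, dxhdy;  /* H (major) edge: X at YH and its slope */
    double xm, dxmdy;  /* M edge: X at YH and its slope */
} rdp_edge_params;

typedef struct { double x, y; } vertex2;

/* Fixed-point helpers: Y coordinates are s11.2, X values and slopes s15.16.
 * raw is assumed to be already sign-extended from its field width. */
double rdp_y_to_double(int32_t raw) { return raw / 4.0; }
double rdp_x_to_double(int32_t raw) { return raw / 65536.0; }

/* Recover the top, middle and bottom vertices from the edge parameters */
void rdp_edges_to_vertices(const rdp_edge_params *p, vertex2 v[3])
{
    v[0].x = p->xh;                              /* top: where H and M edges start */
    v[0].y = p->yh;
    v[1].x = p->xl;                              /* middle: where the L edge starts */
    v[1].y = p->ym;
    v[2].x = p->xh + p->dxhdy * (p->yl - p->yh); /* bottom: end of the H edge */
    v[2].y = p->yl;
}
```

A cheap consistency check is that XM + DxMDy * (YM - YH) should land on XL.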

Fortunately, the N64 emulation scene is vast :), and a few people working on RDP LLE (Low Level Emulation) faced this exact same issue. The author of the GLideN64 plugin, known as Gonetz, described this very well in two excellent blog posts (part 1 and part 2). TL;DR: he solved many of his plugin's earlier issues (performance- and quality-related) by computing the line equations and the sub-pixel coordinates of the vertices.

If we had to implement such an RDP emulation layer on the O2, a possible approach could be to use the BSP for this specific purpose. First, expose its FIFO buffer to the RSP (or the CPU in some cases) so that it can push RDP commands. Then decode those commands, compute the vertex coordinates and parameters for triangle operations, and push the corresponding operations and parameters to the CRIME pixel pipeline registers. I'm not sure the BSP would have enough horsepower for that, the only computationally intensive task being the triangle parameter conversion.
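On the decode side, the loop mostly needs to know how long each command is before handing it to a converter. Here is a minimal sketch, assuming the command list is a plain array of 64-bit words. Command lengths work out as follows:

- Triangles take 4 words, plus 8 for shade, 8 for texture and 2 for Z coefficients.
- Texture Rectangle (0x24) and its flipped variant (0x25) take 2 words.
- Everything else takes 1 word.

The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in 64-bit words of the RDP command starting with word w0 */
size_t rdp_cmd_words(uint64_t w0)
{
    unsigned id = (unsigned)(w0 >> 56) & 0x3F;

    if (id >= 0x08 && id <= 0x0F)      /* triangles */
        return 4 + ((id & 4) ? 8 : 0)  /* shade coefficients */
                 + ((id & 2) ? 8 : 0)  /* texture coefficients */
                 + ((id & 1) ? 2 : 0); /* Z-buffer coefficients */
    if (id == 0x24 || id == 0x25)      /* Texture Rectangle (Flip) */
        return 2;
    return 1;
}

/* Split a command list into commands: store each command's starting word
 * index in starts[] (up to max entries) and return the number of complete
 * commands found. A trailing partial command is left for the next pass. */
size_t rdp_split_commands(const uint64_t *buf, size_t nwords, size_t *starts, size_t max)
{
    size_t pos = 0, count = 0;

    while (pos < nwords && count < max) {
        size_t len = rdp_cmd_words(buf[pos]);
        if (pos + len > nwords)
            break; /* incomplete command */
        starts[count++] = pos;
        pos += len;
    }
    return count;
}
```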


Now, enough with the theory, time to code bits and bobs to experiment with the CRIME! An interesting point I didn't mention is that the CRIME framebuffer is not linear (i.e. a contiguous chunk of memory), but tiled. Each tile is 64 KB and covers a 128x128 area of 32-bit pixels, a 256x128 area of 16-bit pixels, or 512x128 in 8-bit mode. As the framebuffer memory is scattered all over the place, the CRIME accesses it through its TLB, with one entry per tile. This is fine as long as the framebuffer is accessed through the CRIME engine, but it may be a problem if our N64 CPU code wanted to access it directly (like in the code samples of my second post).
In fact, the CRIME provides several TLBs: 3 tiled TLBs, 2 linear TLBs and a few others. So I guess this issue would be addressed differently depending on whether we want to draw directly into the Xsgi framebuffer or use a dedicated onscreen/offscreen framebuffer.
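To make the tiling concrete: every tile is 64 KB, i.e. 128 lines of 512 bytes. A tile is therefore 128, 256 or 512 pixels wide at 32, 16 or 8 bpp. Here is a minimal sketch of locating a pixel. It makes two assumptions that have not been checked against the CRIME spec:

- Tiles are numbered left to right, then top to bottom.
- Pixels are stored linearly within a tile.

The tile index would then select the TLB entry, and the offset the byte within that 64 KB page. Names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t tile;   /* tile index, i.e. which TLB entry */
    uint32_t offset; /* byte offset inside the 64 KB tile */
} tile_addr;

tile_addr crime_tile_addr(uint32_t x, uint32_t y, uint32_t bytes_per_pixel,
                          uint32_t tiles_per_row)
{
    const uint32_t tile_line_bytes = 512; /* 64 KB / 128 lines */
    const uint32_t tile_lines = 128;
    uint32_t tile_width_px = tile_line_bytes / bytes_per_pixel;
    tile_addr a;

    a.tile = (y / tile_lines) * tiles_per_row + (x / tile_width_px);
    a.offset = (y % tile_lines) * tile_line_bytes
             + (x % tile_width_px) * bytes_per_pixel;
    return a;
}
```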

So, once we have properly populated the TLB in use, we can start pushing commands to the rendering engine:

1731685587945.png

(push some red triangles and play with various logic ops)

And here is the result
snap4.png

(triangles on my toolchest !)
 
Hey Bplaa, this is very interesting work and I appreciate that you're doing some exploration in it.

I'll be excited to see what you come up with in the coming months.
 
WOW is an understatement: it looks as though the full suite of O2 technical documents has also been hiding in plain sight elsewhere.

Googling for VICE Design Specification 099-0123-003 yielded this result

 
DMA and TLB management

Not many updates from me in this thread lately, but I was able to make some progress nonetheless.

As a short-term goal, I would like to be able to emulate this kind of ROM, and it will be the common thread of the following posts.
These are nice samples because they involve the CPU, the RSP and the RDP with DMA transfers in-between. Basically, this listing defines 3 segments : CPU code, RSP code and RDP commands. Then, the execution sequence looks like :
  • CPU code :
    • Initialization
    • Selects DMEM as the source of RDP commands (more on that later; RDP commands can be fed from RCP DMEM or main system memory)
    • DMA (CPU side) the RSP code to IMEM
    • DMA (CPU side) the RDP commands to DMEM
    • Kick off RSP execution
    • Busy loop
  • RSP Code
    • Sets the RDP command list START and END addresses
    • Break
  • RDP Commands
    • Clear the framebuffer background (Fill rectangle)
    • Draw 4 overlapping rectangles with different depths
Expected result :
1737200267747.png



So, first stop will be DMA management, as it's a key component for moving data in and out of the N64's RCP.

The N64 DMA engine can be driven either by the CPU, by writing to memory-mapped registers (SP registers, base address 0x04040000), or by the RSP, through its COP0 (coprocessor 0) registers. The mapping looks like the following from the CPU point of view (note that these registers are not only related to the DMA engine, but also to RSP/RDP status and RDP I/O):

1737200871153.png
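Since DMA requests will have to be intercepted, here is a minimal sketch of the SP DMA register addresses and of how a length register value breaks down. The field layout follows the commonly documented one:

- Length in bits 11..0, handled in 8-byte units.
- Count in bits 19..12.
- Skip in bits 31..20.

It is worth double-checking against the register figure above before relying on it. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* SP DMA registers as seen from the CPU */
#define SP_MEM_ADDR  0x04040000u /* DMEM/IMEM address */
#define SP_DRAM_ADDR 0x04040004u /* RDRAM address */
#define SP_RD_LEN    0x04040008u /* write starts RDRAM -> DMEM/IMEM */
#define SP_WR_LEN    0x0404000Cu /* write starts DMEM/IMEM -> RDRAM */
#define SP_STATUS    0x04040010u

typedef struct {
    uint32_t row_bytes; /* bytes per row, rounded up to 8 */
    uint32_t rows;      /* number of rows */
    uint32_t skip;      /* RDRAM bytes skipped between rows */
} sp_dma_len;

sp_dma_len sp_decode_len(uint32_t reg)
{
    sp_dma_len l;

    l.row_bytes = ((reg & 0xFFFu) | 7u) + 1u; /* field holds length - 1 */
    l.rows = ((reg >> 12) & 0xFFu) + 1u;      /* field holds count - 1 */
    l.skip = (reg >> 20) & 0xFFFu;
    return l;
}
```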


On the other side, the VICE DMA engine provides a similar interface, with memory-mapped registers (from offset 0x1000) and lots of COP3 (MSP view) registers. Apart from a few main control/status registers, you will find no fewer than 64 16-bit registers (!!) for managing the VICE DMA engine. The reason for this abundance of registers is that the engine offers 2 independent DMA channels, each providing 4 descriptors of 8 registers (2 × 4 × 8 = 64). You can configure each descriptor for a different DMA operation, and the engine will process the descriptors sequentially if requested to do so.

I won't go into too much detail about the various operation modes and capabilities of the VICE DMA engine, but from our emulation perspective there are some points of interest:
  • Triggering a DMA transfer on the VICE requires more housekeeping than on the RCP. This means that we can't rely on a simple 1:1 register mapping for the emulation, and we'll have to intercept writes to the RCP DMA control registers to set up the VICE DMA engine accordingly.
  • For now, all the N64 samples I've seen work with physical addresses, and I still have no clue about the N64's TLB management. On the other side, the VICE DMA engine requires the VICE TLB, so we have to properly set up the TLB prior to any transfer. The VICE DMA can operate on physical addresses, but for now that's not very useful for our use case.
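Here is a sketch of that interception. It assumes two things have already happened: the SP DMA request has been decoded into row length, row count and skip, and its RDRAM address has been translated into a VICE-TLB-mapped address. The request can then be turned into the eight descriptor fields written by the DMA test code in this post (DMA_CTL, SMEM_HI/LO, WIDTH, STRIDE, LINES, VMEM_Y, VMEM_C). Treating STRIDE as "bytes skipped between lines" is an assumption, as are the struct and function names. The actual semantics are in the VICE spec.

```c
#include <stdint.h>

/* One VICE DMA descriptor; comments give the channel 1 / descriptor 1
 * register offsets used by the test program in this post */
typedef struct {
    uint16_t ctl;     /* 0x1000 DMA_CTL_CH1_D1 */
    uint16_t smem_hi; /* 0x1008 DMA_SMEM_HI_CH1_D1: upper 16 bits of the system address */
    uint16_t smem_lo; /* 0x1010 DMA_SMEM_LO_CH1_D1: lower 16 bits */
    uint16_t width;   /* 0x1018 DMA_WIDTH_CH1_D1: bytes per line */
    uint16_t stride;  /* 0x1020 DMA_STRIDE_CH1_D1 (assumed: bytes skipped per line) */
    uint16_t lines;   /* 0x1028 DMA_LINES_CH1_D1 */
    uint16_t vmem_y;  /* 0x1030 DMA_VMEM_Y_CH1_D1: VICE memory destination */
    uint16_t vmem_c;  /* 0x1038 DMA_VMEM_C_CH1_D1 */
} vice_dma_desc;

/* Hypothetical translation of an intercepted RDRAM -> DMEM/IMEM request */
vice_dma_desc vice_desc_from_sp(uint32_t mapped_addr, uint32_t row_bytes,
                                uint32_t rows, uint32_t skip,
                                uint16_t vmem_dest, uint16_t ctl_flags)
{
    vice_dma_desc d;

    d.ctl = ctl_flags; /* e.g. direction and halt flags */
    d.smem_hi = (uint16_t)(mapped_addr >> 16);
    d.smem_lo = (uint16_t)(mapped_addr & 0xFFFFu);
    d.width = (uint16_t)row_bytes;
    d.stride = (uint16_t)skip;
    d.lines = (uint16_t)rows;
    d.vmem_y = vmem_dest;
    d.vmem_c = 0; /* left at 0, as in the test program */
    return d;
}
```

With a mapped address of 0x00800000, 64 bytes and 1 row, it reproduces the values of the test program (SMEM_HI 0x0080, SMEM_LO 0, WIDTH 64, STRIDE 0, LINES 1).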
To prepare the TLB and Vice DMA buffers we'll need some help from our kernel driver. More specifically, we need to :
  1. Allocate some 64k buffers in the kernel and store the physical addresses. No need for the buffers to be contiguous.
  2. Map the buffers to virtual memory pages
  3. When requested from userland (via an IOCTL), fill the TLB with the physical addresses of the requested buffers
From userland this would look like :
  1. mmap() a buffer of desired length
  2. Fill the buffer with appropriate data (cartridge memory for example)
  3. IOCTL the desired buffer set to fill the TLB
You may wonder what's the point of filling the TLB on demand from userland when that could be a one-time setup. The problem is that the TLB contains 64 entries of 64 KB pages, which gives us 4 MB of mapped memory (64 × 64 KB). This would have been sufficient for the whole 4 MB of main memory of a base N64 without the Expansion Pak, but unfortunately DMA transfers can (and I believe most of the time do) operate from cartridge memory, which can be far bigger than this (64 MB?). So the idea is to be able to switch the TLB buffer set according to the memory region we want to DMA. Of course, this TLB refill is a costly operation and we will need to minimize it with some strategy, but that's for another day; we'll live with 4 MB for now!
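The window arithmetic behind that idea is simple enough to sketch. With 64 entries of 64 KB, one TLB load covers a 4 MB window. A transfer can reuse the loaded window only if both its first and last byte fall inside it. This assumes the naive strategy of 4 MB-aligned windows; the names are hypothetical.

```c
#include <stdint.h>

#define VICE_TLB_ENTRIES 64u
#define VICE_PAGE_SIZE   0x10000u                            /* 64 KB */
#define VICE_WINDOW_SIZE (VICE_TLB_ENTRIES * VICE_PAGE_SIZE) /* 4 MB */

typedef struct {
    uint32_t window; /* which 4 MB buffer set must be loaded */
    uint32_t entry;  /* TLB entry within that set */
    uint32_t offset; /* byte offset within the 64 KB page */
} vice_tlb_slot;

vice_tlb_slot vice_tlb_locate(uint32_t addr)
{
    vice_tlb_slot s;

    s.window = addr / VICE_WINDOW_SIZE;
    s.entry = (addr % VICE_WINDOW_SIZE) / VICE_PAGE_SIZE;
    s.offset = addr % VICE_PAGE_SIZE;
    return s;
}

/* 1 if [addr, addr + len) lies inside the loaded window, so no TLB refill is needed */
int vice_dma_fits_window(uint32_t addr, uint32_t len, uint32_t loaded_window)
{
    uint64_t last;

    if (len == 0)
        return 1;
    last = (uint64_t)addr + len - 1;
    return addr / VICE_WINDOW_SIZE == loaded_window &&
           last / VICE_WINDOW_SIZE == loaded_window;
}
```

A transfer that straddles two windows would need either a refill in the middle or a window that starts at the transfer address instead of on a 4 MB boundary.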

So, let's put all of this together: set up a DMA transfer from system memory to the VICE DRAM, then dump the VICE DRAM to check the result:

C:
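    int fd = open("/hw/vicen64", O_RDWR);
    if (fd == -1) {
        perror("/hw/vicen64");
        return NULL;
    }

    // Map the VICE register space
    void *regs = mmap(NULL, VICE_IO_MAX_OFFSET + 1,
             PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) {
        perror("mmap regs");
        return NULL;
    }
    printf("regs %lX\n", (unsigned long)regs);

    // Map the driver-allocated DMA buffers (the pages the VICE TLB will point to)
    void *vice_buffers = mmap(NULL, VICE_BUFF_LEN,
             PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, VICE_MIN_OFFSET);
    if (vice_buffers == MAP_FAILED) {
        perror("mmap buffers");
        return NULL;
    }
    printf("buffers %lX\n", (unsigned long)vice_buffers);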
 
    // Clear Vice DRAM and buffers
    memset((char *)regs + VICE_DRAM_A, 0, 0x800);
    memset(vice_buffers, 0, VICE_BUFF_LEN);

    // Fill a known pattern in our buffer
    ((unsigned char *)vice_buffers)[0] = 0x23;
    ((unsigned char *)vice_buffers)[1] = 0x24;
    ((unsigned char *)vice_buffers)[2] = 0x25;
    ((unsigned char *)vice_buffers)[3] = 0x26;

    // setup DMA TLB
    if(ioctl(fd, VICE_IOCTL_MAP_DMA, 0) == -1) {
        fprintf(stderr, "error mmapping buffer 0 into VICE: %s\n",
                    strerror(errno));
        return NULL;
    }

    vice_write_reg(regs, DMA_CH1_CTL,VICE_DMA_CTL_RESET);
    vice_write_reg(regs, DMA_CH2_CTL,VICE_DMA_CTL_RESET);
    usleep(16);
    vice_write_reg(regs, DMA_CH1_CTL,VICE_DMA_CTL_DESC1);
    vice_write_reg(regs, DMA_CH2_CTL,VICE_DMA_CTL_DESC1);
    usleep(16);

    print_vice_dma_status(vice_read_reg(regs, DMA_CH1_STAT));

    // Setup the transfer
    vice_write_reg(regs, 0x1000, VICE_DMA_TOVICE|VICE_DMA_HALT); // DMA_CTL_CH1_D1
    vice_write_reg(regs, 0x1008, 0x0080); // DMA_SMEM_HI_CH1_D1
    vice_write_reg(regs, 0x1010, 0x0); // DMA_SMEM_LO_CH1_D1
    vice_write_reg(regs, 0x1018, 64); // DMA_WIDTH_CH1_D1
    vice_write_reg(regs, 0x1020, 0); // DMA_STRIDE_CH1_D1
    vice_write_reg(regs, 0x1028, 1); // DMA_LINES_CH1_D1
    vice_write_reg(regs, 0x1030, VICE_DRAM_A); // DMA_VMEM_Y_CH1_D1
    vice_write_reg(regs, 0x1038, 0); // DMA_VMEM_C_CH1_D1

    printf("DMA RUN");
    vice_write_reg(regs, DMA_CH1_CTL,(1 << 4)|VICE_DMA_CTL_GO);
    // Busy-wait until bit 0 of DMA_CH1_STAT (DMA done) is set
    while (!(vice_read_reg(regs, DMA_CH1_STAT) & 1))
        ;
    printf(".");
    printf(" Done\n");

    dump_buf((char *)regs + VICE_DRAM_A, 0x800);

    print_vice_dma_status(vice_read_reg(regs, DMA_CH1_STAT));

And the result :

regs 4000000
buffers 400F000
DMA Done : DMA Complete
DMA Error : No DMA error has occurred
DMA Active : DMA is not running
DMA R/W : Write descriptor
DMA Descriptor : DMA working on or pointing to First Descriptor Set
DMA Status Code : DMA Idle
DMA RUN. Done
000000 23 24 25 26 00 00 00 00 00 00 00 00 00 00 00 00 #$%&............
000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
<snip>
DMA Done : DMA Complete
DMA Error : No DMA error has occurred
DMA Active : DMA is not running
DMA R/W : Read descriptor
DMA Descriptor : DMA working on or pointing to First Descriptor Set
DMA Status Code : DMA Halted from DMA Halt bit in Descriptor
 
Memory access interception

Now that we have various bits and bobs of experimentation code, it's time to put the pieces together. To use them in some kind of emulation, we need to be able to call various routines when the emulated code writes (or reads) memory regions of interest, specifically memory-mapped control registers.

As I mentioned in the first post, there are various options to handle this.

The first one that comes to mind is write-protecting the memory segments we're interested in, then handling the page fault: call our emulation routine, unprotect the page, write the data, re-protect the page, then continue execution. I didn't test this scenario, not only because it feels terribly slow and a bit painful, but also because it can't intercept read access, which is also required (unless you disable read access, not sure if that's possible). It would also require protecting whole memory pages, whereas we want finer-grained control to catch access to specific registers.
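For comparison, here is a minimal POSIX sketch of this first option, written with Linux in mind; `MAP_ANONYMOUS` availability on IRIX is an assumption. It also shows that `mprotect()` with `PROT_NONE` blocks reads as well as writes. Only the trap-and-unprotect half is shown. Re-protecting the page after the access would need single-stepping, which is part of why this option looks costly. Note also that `mprotect()` is not on POSIX's async-signal-safe list, even though this pattern is common on Linux. Names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *trap_page;                 /* stands in for a page of emulated registers */
static long trap_page_size;
static volatile sig_atomic_t trap_hits; /* number of accesses caught */
static void *volatile trap_last_addr;   /* address of the last caught access */

static void trap_handler(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t a = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)trap_page;

    (void)sig;
    (void)ctx;
    if (a < base || a >= base + (uintptr_t)trap_page_size) {
        signal(SIGSEGV, SIG_DFL); /* not our page: let the retry crash normally */
        return;
    }
    trap_hits++;
    trap_last_addr = si->si_addr;
    /* An emulator would dispatch to its register handler here. Unprotecting lets
     * the faulting load/store restart and succeed; restoring the protection
     * afterwards would need single-stepping, which this sketch does not do. */
    mprotect(trap_page, (size_t)trap_page_size, PROT_READ | PROT_WRITE);
}

/* Returns how many accesses were trapped, or -1 on setup failure */
int trap_demo(void)
{
    struct sigaction sa;

    trap_page_size = sysconf(_SC_PAGESIZE);
    trap_page = mmap(NULL, (size_t)trap_page_size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (trap_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = trap_handler;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSEGV, &sa, NULL) == -1)
        return -1;

    (void)((volatile char *)trap_page)[1]; /* a read faults too under PROT_NONE */
    ((volatile char *)trap_page)[2] = 42;  /* page now unprotected: no second trap */
    return trap_hits;
}
```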

The second possibility would be to patch the code to jump to an emulation routine for each memory access we're interested in. This is probably the best scenario in terms of performance, but also the most complex, because it means detecting accesses to specific memory regions by statically analyzing the assembly code, which is not trivial at all. More on the challenges of static patching later.

The third one involves memory watchpoints and debugging tools. I've explored various possibilities on the kernel side, but the most obvious and simple way seems to be the procfs filesystem. This pseudo filesystem gives userland access to information about running processes. More interestingly, it also provides some debugging facilities, especially the memory watchpoints we're interested in. Let's dig into that option and put some code together to test it.

First, we need to set up a memory region to watch, then set a SIGTRAP signal handler to catch the signal generated by the kernel on watchpoint access, and finally set up the specific watchpoints we want to catch:
C:
    // The memory we're interested in
    unsigned char test[32];

    // Open a file descriptor to our own process in the procfs
    char proc_file[32];
    mypid = getpid();
    snprintf(proc_file, sizeof(proc_file), "/proc/%010d", (int)mypid);
    proc_fd = open(proc_file, O_RDWR);
    if (proc_fd == -1)
        perror(proc_file);

    // Set a SIGTRAP handler; SA_SIGINFO makes the kernel pass siginfo_t and ucontext_t
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sigtrap_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGTRAP, &sa, NULL);

    // Set up 2 watchpoints, 1 byte each, one for write access and one for read access
    prwatch_t prw;
    memset(&prw, 0, sizeof(prw));
    prw.pr_vaddr = test;
    prw.pr_size = 1;
    prw.pr_wflags = MA_WRITE;
    if (ioctl(proc_fd, PIOCSWATCH, &prw) == -1)
        perror("PIOCSWATCH (write)");

    prw.pr_vaddr = test + 1;
    prw.pr_size = 1;
    prw.pr_wflags = MA_READ;
    if (ioctl(proc_fd, PIOCSWATCH, &prw) == -1)
        perror("PIOCSWATCH (read)");

The signal handler could look like this :

C:
void sigtrap_handler(int sig, siginfo_t *siginfo, void *ctx)
{
    // SA_SIGINFO handlers receive the user context as a void *
    ucontext_t *ucontext = (ucontext_t *)ctx;
    printf("Got SIGTRAP SIGNO %d SICODE %d ERRNO %d ADDR %p EPC %llX\n",
        siginfo->si_signo, siginfo->si_code, siginfo->si_errno, siginfo->si_addr,
        (unsigned long long)ucontext->uc_mcontext.__gregs[CTX_EPC]);
<snip>
}

Now, the next step is to get the memory address that generated the SIGTRAP so we can call the right routine. According to the documentation, we are supposed to get the accessed memory address in the si_addr member of the siginfo_t structure. Unfortunately, my experiments showed that we don't get anything there :(.

Fortunately, we can get valuable info from the ucontext parameter of the handler. This ucontext_t structure contains the full user context (general registers, floating point registers, exception PC and a few other things) at the time the exception was raised.
The first useful piece of info is the exception PC, because it means we can get the specific instruction that raised the exception. Then, the user context provides everything else we need to complete the puzzle, that is, all the registers at the time of the exception.
So, by decoding the instruction we can tell if the access was read or write, and with the registers we can tell the specific address of the memory access. Now our SIGTRAP handler could look like :

C:
void sigtrap_handler(int sig, siginfo_t *siginfo, void *ctx)
{
    ucontext_t *ucontext = (ucontext_t *)ctx;
    printf("Got SIGTRAP SIGNO %d SICODE %d ERRNO %d ADDR %p EPC %llX\n",
        siginfo->si_signo, siginfo->si_code, siginfo->si_errno, siginfo->si_addr,
        (unsigned long long)ucontext->uc_mcontext.__gregs[CTX_EPC]);

    // Fetch the instruction that raised the exception
    unsigned int insn = *(unsigned int *)(unsigned long)ucontext->uc_mcontext.__gregs[CTX_EPC];

    // I-type load/store fields: opcode(6) base(5) rt(5) offset(16)
    unsigned int base = (insn >> 21) & 0x1F;
    unsigned int rt = (insn >> 16) & 0x1F;
    short offset = (short)(insn & 0xFFFF); // sign-extended displacement
    unsigned long long addr = ucontext->uc_mcontext.__gregs[base] + offset;

    switch (insn >> 26)
    {
        // Read
        case 0x24: // LBU
            printf("LBU $%u, %d($%u)\n", rt, offset, base);
            printf("$%u <- [%llX]\n", rt, addr);
            // Read access into register rt, from address addr
            break;
        // Write
        case 0x28: // SB
            printf("SB $%u, %d($%u)\n", rt, offset, base);
            printf("[%llX] <- %lld\n", addr, (long long)ucontext->uc_mcontext.__gregs[rt]);
            // Write access to address addr, from register rt
            break;
    }
}
(printf and other functions that are not async-signal-safe have no business in a signal handler, I promise not to do it anymore!)
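The handler above only recognizes LBU and SB. As a sketch (not the author's code), a decoder covering the other common MIPS load/store opcodes could fill a small descriptor that the handler then acts on. The `mem_access` struct and both function names are hypothetical; the opcode values follow the standard MIPS III encoding, and unaligned (LWL/LWR...), LL/SC and FPU accesses are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical descriptor of a decoded MIPS load/store instruction */
typedef struct {
    int is_write;    /* 1 for stores, 0 for loads */
    int size;        /* access width in bytes */
    int is_signed;   /* loads only: 1 if the result is sign-extended */
    unsigned base;   /* base register index (rs field) */
    unsigned rt;     /* destination (load) or source (store) register */
    int32_t offset;  /* sign-extended 16-bit immediate */
} mem_access;

/* Fills *out and returns 1 for a plain load/store, returns 0 otherwise */
int decode_mem_access(uint32_t insn, mem_access *out)
{
    int is_write, size, is_signed = 0;

    switch (insn >> 26) {
    case 0x20: is_write = 0; size = 1; is_signed = 1; break; /* LB  */
    case 0x24: is_write = 0; size = 1; break;                /* LBU */
    case 0x21: is_write = 0; size = 2; is_signed = 1; break; /* LH  */
    case 0x25: is_write = 0; size = 2; break;                /* LHU */
    case 0x23: is_write = 0; size = 4; is_signed = 1; break; /* LW  */
    case 0x27: is_write = 0; size = 4; break;                /* LWU */
    case 0x37: is_write = 0; size = 8; break;                /* LD  */
    case 0x28: is_write = 1; size = 1; break;                /* SB  */
    case 0x29: is_write = 1; size = 2; break;                /* SH  */
    case 0x2B: is_write = 1; size = 4; break;                /* SW  */
    case 0x3F: is_write = 1; size = 8; break;                /* SD  */
    default: return 0;
    }

    /* Sign-extend the immediate without relying on implementation-defined casts */
    int32_t imm = (int32_t)(insn & 0xFFFF);
    if (imm & 0x8000)
        imm -= 0x10000;

    out->is_write = is_write;
    out->size = size;
    out->is_signed = is_signed;
    out->base = (insn >> 21) & 0x1F;
    out->rt = (insn >> 16) & 0x1F;
    out->offset = imm;
    return 1;
}

/* Effective address = base register value + sign-extended offset (wraps like the CPU) */
uint64_t effective_address(uint64_t base_value, int32_t offset)
{
    return base_value + (uint64_t)(int64_t)offset;
}
```

In the handler, the accessed address would then be `effective_address(ucontext->uc_mcontext.__gregs[m.base], m.offset)`, and `m.is_write` / `m.size` select which emulation routine to call.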

Progress! But unfortunately we still have an issue on our hands. When the signal handler returns control to the kernel, execution resumes at the instruction that raised the exception, which triggers the watchpoint again, so we get an infinite loop...
We need a way to resume execution at the next instruction. I experimented with various inline assembly tricks, from jumping back to EPC + 4 to messing with the stack to change the return address, but all of them proved to be dead ends. Finally I settled on:
C:
ucontext->uc_mcontext.__gregs[CTX_EPC] += 4;
And it does the trick :)
When the handler returns control to the kernel, the kernel restores the user context from the ucontext_t structure. More interestingly, it restores the PC from the CTX_EPC "register" of that structure, so bumping it by 4 skips the faulting instruction.
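One caveat worth keeping in mind: on MIPS, when the faulting instruction sits in a branch (or jump) delay slot, the BD bit (bit 31) of the Cause register is set and EPC points to the branch, not to the faulting instruction. In that case both the instruction fetch at EPC and the `EPC += 4` trick go wrong. A minimal sketch of the check, assuming the saved Cause register is available as `__gregs[CTX_CAUSE]` on IRIX (the helper names are hypothetical):

```c
#include <stdint.h>

#define CAUSE_BD (UINT64_C(1) << 31) /* Branch Delay bit of the Cause register */

/* Address of the instruction that actually raised the exception */
uint64_t faulting_insn_addr(uint64_t epc, uint64_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/* 1 if resuming at EPC + 4 is correct. If BD is set, EPC + 4 is the delay
   slot itself, so the resume address has to be computed by evaluating the
   branch at EPC instead (its target if taken, EPC + 8 otherwise). */
int can_resume_at_epc_plus_4(uint64_t cause)
{
    return (cause & CAUSE_BD) == 0;
}
```

Compilers often fill delay slots with loads and stores, so this case is likely to show up with real N64 code.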

So far so good: we can catch read/write accesses at a specific memory location and get the instruction, registers and memory address at access time, but one last problem remains.
If we want our read/write instruction to *effectively* read/write the memory, we can't simply read/write that memory from our handler, because that would trigger the watchpoint again and we would end up in an infinite loop!

Fortunately, procfs gives us a solution to this issue: we can read() or write() our procfs file descriptor at an offset equal to the process virtual address, and the memory will be read/written without triggering our watchpoints :)
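A minimal sketch of that idea, assuming `procfd` is the descriptor of the process's own `/proc/<pid>` file opened read/write beforehand, and that `off_t` is wide enough to hold the virtual address. It uses `lseek()` followed by `read()`/`write()`, which are all on the POSIX list of async-signal-safe functions; `proc_read` and `proc_write` are hypothetical helper names.

```c
#include <sys/types.h>
#include <unistd.h>
#include <stdint.h>

/* Read len bytes at virtual address vaddr through procfs, so the load does
   not touch the watched page directly. Returns 0 on success, -1 on error
   or short read. */
int proc_read(int procfd, uint64_t vaddr, void *buf, size_t len)
{
    if (lseek(procfd, (off_t)vaddr, SEEK_SET) == (off_t)-1)
        return -1;
    return read(procfd, buf, len) == (ssize_t)len ? 0 : -1;
}

/* Write len bytes at virtual address vaddr through procfs, so the store
   does not trigger the watchpoint again. */
int proc_write(int procfd, uint64_t vaddr, const void *buf, size_t len)
{
    if (lseek(procfd, (off_t)vaddr, SEEK_SET) == (off_t)-1)
        return -1;
    return write(procfd, buf, len) == (ssize_t)len ? 0 : -1;
}
```

For an emulated LBU, the handler would presumably call `proc_read(procfd, addr, &byte, 1)`, store the zero-extended byte into `__gregs[rt]` (which the kernel restores on return, like CTX_EPC), and then bump CTX_EPC.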

All in all, this gives us everything we need to intercept reads and writes and trigger the right emulation handlers. This is probably *very* slow because of the context switches involved in every access, but it will do for now!
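To route an intercepted access to the right emulation handler, one option (a sketch, not the author's design) is a small table mapping N64 address ranges to read/write callbacks. The VI register block at 0x04400000 is used as the example region; the region table, the handler names and the register storage are hypothetical, and the translation from the O2 address back to the N64 address is assumed to happen before the lookup.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*mmio_read_fn)(uint32_t addr, int size);
typedef void (*mmio_write_fn)(uint32_t addr, uint32_t value, int size);

typedef struct {
    uint32_t start;      /* first N64 address of the region */
    uint32_t end;        /* one past the last address */
    mmio_read_fn read;
    mmio_write_fn write;
} mmio_region;

/* Hypothetical backing store for the Video Interface registers */
#define VI_BASE 0x04400000u
#define VI_NUM_REGS 14
static uint32_t vi_regs[VI_NUM_REGS];

static uint32_t vi_read(uint32_t addr, int size)
{
    uint32_t idx = (addr - VI_BASE) >> 2;
    (void)size;
    return idx < VI_NUM_REGS ? vi_regs[idx] : 0;
}

static void vi_write(uint32_t addr, uint32_t value, int size)
{
    uint32_t idx = (addr - VI_BASE) >> 2;
    (void)size;
    if (idx < VI_NUM_REGS)
        vi_regs[idx] = value; /* a real VI emulation would presumably also react here */
}

static const mmio_region regions[] = {
    { VI_BASE, VI_BASE + 0x100000u, vi_read, vi_write },
};
#define NUM_REGIONS (sizeof regions / sizeof regions[0])

/* Linear search is fine for a handful of regions */
static const mmio_region *find_region(uint32_t addr)
{
    size_t i;
    for (i = 0; i < NUM_REGIONS; i++)
        if (addr >= regions[i].start && addr < regions[i].end)
            return &regions[i];
    return NULL;
}

/* Returns 0 on success, -1 if no handler covers addr */
int mmio_read(uint32_t addr, int size, uint32_t *value)
{
    const mmio_region *r = find_region(addr);
    if (r == NULL || r->read == NULL)
        return -1;
    *value = r->read(addr, size);
    return 0;
}

int mmio_write(uint32_t addr, uint32_t value, int size)
{
    const mmio_region *r = find_region(addr);
    if (r == NULL || r->write == NULL)
        return -1;
    r->write(addr, value, size);
    return 0;
}
```

Other interfaces would simply add rows to the table, so the SIGTRAP handler only needs to decode the instruction, translate the address and call `mmio_read()` or `mmio_write()`.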
 

About us

  • Silicon Graphics User Group (SGUG) is a community for users, developers, and admirers of Silicon Graphics (SGI) products. We aim to be a friendly hobbyist community for discussing all aspects of SGIs, including use, software development, the IRIX Operating System, and troubleshooting, as well as facilitating hardware exchange.
