Memory access interception
Now that we have various bits and bobs of experimentation code, it's time to put the pieces together. To use these pieces in some kind of emulation, we need to be able to call various routines when the emulated code writes (or reads) memory regions of interest, specifically memory-mapped control registers.
As I mentioned in the first post, there are various options to handle this.
The first one that comes to mind is write-protecting the memory segments we're interested in, then handling the page fault: call our emulation routine, un-protect the page, write the data, re-protect the page, then continue execution. I didn't test this scenario, both because it feels terribly slow and a bit painful, and because it can't intercept read access, which is also required (unless you disable read access too, not sure if it's possible). It would also require protecting full memory segments, whereas we want finer-grained control to catch accesses to specific registers.
The second possibility would be to patch the code to jump to an emulation routine for each memory access we're interested in. This is probably the best scenario in terms of performance, but also the most complex, because it means detecting accesses to specific memory regions by statically analyzing the assembly code, which is not trivial at all. More on the challenges of static patching later.
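For reference, the write-protect approach can be sketched on a generic POSIX system. This is untested on the target machine, and every name in it (guard_setup, guarded_page, fault_count, last_fault_addr) is made up for illustration. It assumes mmap, mprotect and a SIGSEGV handler installed with SA_SIGINFO are available, and it uses PROT_NONE so that reads fault as well as writes. To stay short, the handler only un-protects the page; re-protecting it once the access has completed would need an extra step that is left out here.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
static unsigned char *guarded_page;        /* page standing in for a register block */
static size_t guarded_size;                /* size of that page in bytes */
static volatile sig_atomic_t fault_count;  /* number of intercepted accesses */
static void *volatile last_fault_addr;     /* address reported for the last fault */

static void segv_handler(int sig, siginfo_t *info, void *context)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t start = (uintptr_t)guarded_page;

    (void)sig;
    (void)context;

    if (guarded_page == NULL || addr < start || addr >= start + guarded_size) {
        /* Not our page: restore the default action so a real crash still crashes. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }

    last_fault_addr = info->si_addr;
    fault_count++;

    /* Un-protect the page; the faulting instruction is restarted on return. */
    mprotect(guarded_page, guarded_size, PROT_READ | PROT_WRITE);
}

/* Map one page, install the handler, then remove all access to the page.
 * Returns 0 on success, -1 on failure. */
int guard_setup(void)
{
    struct sigaction sa;

    guarded_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, guarded_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED) {
        guarded_page = NULL;
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    /* PROT_NONE makes both reads and writes fault. */
    return mprotect(guarded_page, guarded_size, PROT_NONE);
}
```

After guard_setup() returns 0, the first access to any byte of guarded_page raises one fault, the handler records the address, and the access then completes normally. The granularity is a whole page, which is exactly the limitation mentioned above.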
The third one involves memory watchpoints and debugging tooling. I've explored various possibilities kernel side, but the most obvious and simple way seems to be the procfs filesystem. This pseudo filesystem gives userland access to information about running processes. More interestingly, it also provides some debugging facilities, especially the memory watchpoints we're interested in. Let's dig into that option, and put some code together to test it.
First, we need to set up a memory region to watch, then install a SIGTRAP signal handler to catch the signal the kernel generates when a watchpoint is hit, and finally set up the specific watchpoints we want to catch :
C:
// The memory we're interested in
unsigned char test[32];
// Open a file descriptor to our own process in the procfs
char proc_file[32];
mypid = getpid();
snprintf(proc_file, 32, "/proc/%010d", mypid);
proc_fd = open(proc_file, O_RDWR);
// Set a SIGTRAP handler, and get some infos in our handler
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction = sigtrap_handler;
sa.sa_flags = SA_SIGINFO;
sigaction(SIGTRAP, &sa, 0);
// Setup 2 watchpoints, 1 byte each, one for write access and one for read access
prwatch_t prw;
prw.pr_vaddr = test;
prw.pr_size = 1;
prw.pr_wflags = MA_WRITE;
int res = ioctl(proc_fd, PIOCSWATCH, &prw);
prw.pr_vaddr = test + 1;
prw.pr_size = 1;
prw.pr_wflags = MA_READ;
res = ioctl(proc_fd, PIOCSWATCH, &prw);
The signal handler could look like this :
C:
void sigtrap_handler(int sig, siginfo_t* siginfo, ucontext_t* ucontext)
{
printf("Got SIGTRAP SIGNO %d SICODE %d ERRNO %d ADDR %X EPC %X\n",
siginfo->si_signo, siginfo->si_code, siginfo->si_errno, siginfo->si_addr, ucontext->uc_mcontext.__gregs[CTX_EPC]);
<snip>
}
Now, the next step is to get the memory address that generated the SIGTRAP so we can call the right routine. According to the documentation, we are supposed to get the accessed memory address in the si_addr member of the siginfo_t structure. Unfortunately, my experiments showed that we don't get anything there.
Fortunately, we can get valuable info from the ucontext parameter of the handler. This ucontext_t structure contains the full user context (general registers, floating point registers, exception PC and a few other things) at the time the exception was raised.
The first useful piece of info is the exception PC, because it means we can get the specific instruction that raised the exception. Then, the user context provides everything else we need to complete the puzzle, namely all the registers at the time of the exception.
So, by decoding the instruction we can tell if the access was read or write, and with the registers we can tell the specific address of the memory access. Now our SIGTRAP handler could look like :
C:
void sigtrap_handler(int sig, siginfo_t* siginfo, ucontext_t* ucontext)
{
printf("Got SIGTRAP SIGNO %d SICODE %d ERRNO %d ADDR %X EPC %X\n",
siginfo->si_signo, siginfo->si_code, siginfo->si_errno, siginfo->si_addr, ucontext->uc_mcontext.__gregs[CTX_EPC]);
unsigned int insn = *(unsigned int *)(ucontext->uc_mcontext.__gregs[CTX_EPC]);
unsigned char base, rt;
short offset;
switch (insn >> 26)
{
// Read
case 0b100100: // LBU
base = (insn >> 21) & 0b11111;
rt = (insn >> 16) & 0b11111;
offset = insn & 0xFFFF;
printf("LBU $%d, %d($%d)\n", rt, offset, base);
printf("%d <- %X\n", ucontext->uc_mcontext.__gregs[rt], ucontext->uc_mcontext.__gregs[base] + offset);
// Now we know it's read access into rt register, at ucontext->uc_mcontext.__gregs[base] + offset location
break;
// Write
case 0b101000: // SB
base = (insn >> 21) & 0b11111;
rt = (insn >> 16) & 0b11111;
offset = insn & 0xFFFF;
printf("SB $%d, %d($%d)\n", rt, offset, base);
printf("%X <- %d\n", ucontext->uc_mcontext.__gregs[base] + offset, ucontext->uc_mcontext.__gregs[rt]);
// Now we know it's write access to ucontext->uc_mcontext.__gregs[base] + offset location, from rt register
break;
}
}
(printf and other calls that are not async-signal-safe are very, very bad in signal handlers, I promise not to do it anymore !)
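The decoding part can also be pulled out of the handler into a small pure function, which makes it easy to try on any machine. The sketch below is one possible shape and not the code used in the project: the names mem_access, decode_mem_access and effective_address are hypothetical, the register file is assumed to be passed in as 32 values of 32 bits, and the halfword and word opcodes are added next to the LBU and SB cases shown above.

```c
#include <stdint.h>

/* Hypothetical types and names, for illustration only. */
enum mem_access_kind { ACCESS_NONE = 0, ACCESS_READ, ACCESS_WRITE };

struct mem_access {
    enum mem_access_kind kind; /* ACCESS_NONE if not a handled load/store */
    unsigned base;             /* index of the base register */
    unsigned rt;               /* destination (load) or source (store) register */
    int32_t offset;            /* sign-extended 16-bit immediate */
    unsigned size;             /* access width in bytes */
};

/* Decode a MIPS I-type load/store: opcode(6) base(5) rt(5) offset(16). */
struct mem_access decode_mem_access(uint32_t insn)
{
    struct mem_access a = { ACCESS_NONE, 0, 0, 0, 0 };

    a.base = (insn >> 21) & 0x1F;
    a.rt = (insn >> 16) & 0x1F;
    a.offset = (int32_t)(insn & 0xFFFF);
    if (a.offset & 0x8000)
        a.offset -= 0x10000;   /* sign-extend without relying on casts */

    switch (insn >> 26) {
    case 0x20: /* LB  */
    case 0x24: /* LBU */ a.kind = ACCESS_READ;  a.size = 1; break;
    case 0x21: /* LH  */
    case 0x25: /* LHU */ a.kind = ACCESS_READ;  a.size = 2; break;
    case 0x23: /* LW  */ a.kind = ACCESS_READ;  a.size = 4; break;
    case 0x28: /* SB  */ a.kind = ACCESS_WRITE; a.size = 1; break;
    case 0x29: /* SH  */ a.kind = ACCESS_WRITE; a.size = 2; break;
    case 0x2B: /* SW  */ a.kind = ACCESS_WRITE; a.size = 4; break;
    default:   /* anything else is left as ACCESS_NONE */   break;
    }
    return a;
}

/* Address touched by the access: base register value plus the offset.
 * regs is assumed to hold the 32 general registers saved in the context. */
uint32_t effective_address(const struct mem_access *a, const uint32_t regs[32])
{
    return regs[a->base] + (uint32_t)a->offset;
}
```

In the handler, the instruction word read at the exception PC would be passed to decode_mem_access, and the saved general registers to effective_address.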
Progress ! But unfortunately we still have an issue on our hands. When the signal handler returns control to the kernel, process execution resumes where the exception occurred, so we get an infinite loop...
We need to find a way to resume execution at the next instruction. I experimented with various things in inline assembly, from jumping back to EPC + 4 to messing with the stack to change the return address, but all of this proved to be a dead end. Finally I settled on :
C:
ucontext->uc_mcontext.__gregs[CTX_EPC] += 4;
And it does the trick.
When the user process returns control to the kernel, the kernel will restore user context from the ucontext_t structure. More interestingly, it will restore PC from the CTX_EPC "register" of the ucontext_t structure.
This is all fine : we caught read/write access at a specific memory location, were able to get the specific instruction, registers and memory address at access time, but we have one last problem.
If we want our read/write instruction to *actually* read/write the memory, we can't simply read/write memory from our handler, because it will trigger the watchpoint again and end up in an infinite loop !
Fortunately, the procfs gives us a solution to this issue : we can read() or write() to our procfs file descriptor at an offset representing the process virtual address, and the memory will be read/written without triggering our watchpoints.
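The access pattern itself is just a seek followed by a read or a write. A minimal sketch, with hypothetical helper names (open_mem_fd, mem_fd_read, mem_fd_write), could look like this. Nothing in it is specific to procfs, so it can be tried on any seekable file; on the target, the descriptor would be the one opened on the process's procfs entry earlier, and the claim that watchpoints are bypassed rests on the behaviour described above, not on this code.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers, for illustration only. */

/* Open the file that exposes the process memory for reading and writing.
 * On the target, path would be the procfs entry built earlier. */
int open_mem_fd(const char *path)
{
    return open(path, O_RDWR);
}

/* Read len bytes located at virtual address addr through fd.
 * Returns 0 on success, -1 on a failed seek or a short read. */
int mem_fd_read(int fd, uintptr_t addr, void *buf, size_t len)
{
    if (lseek(fd, (off_t)addr, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

/* Write len bytes to virtual address addr through fd.
 * Returns 0 on success, -1 on a failed seek or a short write. */
int mem_fd_write(int fd, uintptr_t addr, const void *buf, size_t len)
{
    if (lseek(fd, (off_t)addr, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}
```

lseek, read and write are on the POSIX list of async-signal-safe functions, so calling these helpers from the SIGTRAP handler should not add to the printf problem mentioned earlier.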
All in all, this gives us all we need to intercept reads and writes and trigger the right emulation handlers. This is probably *very* slow because of the context switches, but that will do for now !