OpenBSD/sgi

johnnym

First of all I wish you all a happy new year!

I'm not sure if this is the right forum, so please move this thread to where it belongs if need be.

I'd like to make you aware of something I've been plotting for a while now - ever since OpenBSD retired the sgi architecture after 6.9 in October 2021. I was pretty sad about that move, although it had been on the table for a while by then - since 6.6 IIRC - but someone (or several people) kept it alive until after 6.9. Much obliged for that.
Back then I discovered by chance - well, more by trial and error - that the octeon userland is actually compatible with sgi kernels (they share the same packages tree, so they ought to be). You only need to change the baud rate of the console device to 9600 for sgi. I later found out that the 6.9 IP30 kernels even work with 7.0 octeon file systems. As I run my machines diskless, switching between kernels and userlands is as easy as changing a symlink. So I thought this could allow me to forward-port only the sgi kernels, use the octeon userland, and hence avoid building sgi userlands altogether.
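
To give a concrete idea (the paths below are just an illustration of how my diskless setup might look, not anything authoritative), the kernel symlink on the boot server and the console speed in the client's /etc/ttys are all there is to it:

Code:
# hypothetical diskless layout on the boot/NFS server - paths are examples
ln -sf bsd.IP30-7.0 /export/octane/bsd    # choose which kernel gets netbooted
# the only userland tweak: console at 9600 in the client's /etc/ttys
grep '^console' /export/octane/root/etc/ttys
console "/usr/libexec/getty std.9600"   vt220   on  secure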

Well, time passed and I didn't get anywhere with that, but in late November 2022 I finally found the time and dedication to get things going. The first thing I noticed was that it seems impossible to compile sgi kernels from an octeon userland, as it lacks the gcc 4.2.1 needed to compile them - it looks like LLVM/clang can't build sgi kernels. So I was stuck with a 6.9 sgi userland, but that has so far allowed me to build all the sgi kernels.

I used the GitHub mirror of OpenBSD's CVS src repository for this endeavour - mainly because I'm much more familiar with git than with cvs and would rather invest my time in the actual task than in learning all about cvs. Starting from the src repo at the point where it left 7.0-beta, I reverted the commits that removed the sgi-related stuff. Then I tried to compile the sgi kernels and worked through all compilation errors by reverting further commits that removed additional sgi bits and by replaying changes from the octeon arch that seemed to be needed. Unfortunately not everything that is missing shows up during compilation, but so far I was able to get everything going on my test machines, except for one kernel (see below).
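
The git side of it is nothing fancy; a rough sketch of the workflow (branch name and commit IDs are placeholders):

Code:
# start from the state of the tree at 7.0-beta and undo the sgi removal
git checkout -b sgi-is-alive-at-7.0 <commit-at-7.0-beta>
git revert --no-edit <commit-that-removed-sgi-code>     # bring sgi code back
git cherry-pick -x <octeon-change-also-needed-for-sgi>  # replay octeon fixes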

I've created three branches, check the links (and the history of those branches) for details about the changes I made:

My available resources for compilation are unfortunately rather limited: I am doing everything on my dual 300 MHz R12K Octane, which is good enough for building the kernels (about 50 minutes per kernel, and a little faster than my Octane2 with a single 400 MHz R12K) but struggles with userland builds. I have been trying to build an OpenBSD/sgi 7.0 userland for a while now - so far it has run for an aggregate of 110 hours, mostly on LLVM/clang actually - and it's still not done. I modified the main Makefile so a build can be interrupted and continued later without losing what has already been compiled, as I don't want to run my machine unattended for too long.
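
For reference, the kernel builds themselves follow the stock procedure (config names as they existed in 6.9); the 50 minutes above refer to one run of this:

Code:
# standard OpenBSD kernel build, one config per machine family
cd /sys/arch/sgi/conf
config GENERIC-IP30
cd ../compile/GENERIC-IP30
make            # roughly 50 minutes on the dual 300 MHz R12K Octane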

I'm not sure what LLVM/clang is used for on OpenBSD/sgi other than compiling LLVM/clang itself: the sgi userland was compiled mainly with gcc 4.2.1 IIRC, and the kernels can only be compiled with gcc 4.2.1 IIUIC. Hence I'm thinking about skipping the LLVM/clang build and modifying the release file lists accordingly, so I can build an OpenBSD/sgi userland without building LLVM/clang. It looks like I simply don't have the computing resources to build LLVM/clang in an acceptable amount of time. Assuming an Origin 350 with 4 x 1 GHz R16K really is about six times as fast as my Octane with 2 x 300 MHz R12K (twice the processors at roughly three times the clock), this could be done in maybe under a day on such a machine.
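
The per-set file lists live under distrib/sets/lists/ in the src tree, so finding the entries that would have to go is basically a grep (a first stab, nothing more):

Code:
# look for LLVM/clang related entries in the base and comp set lists
cd /usr/src/distrib/sets/lists
grep -rnE 'clang|libLLVM|libc\+\+' base comp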

If someone is interested in trying that, please feel free to contact me, I can get you going. I needed 2 GiB of memory to compile some LLVM/clang parts with two processes; I started with three processes but ran out of memory. So a four-processor machine will likely need a minimum of 4 GiB of memory, ideally more, to be able to use all processors for the compilation.
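
The "processes" above are just make jobs; a minimal sketch of the trade-off, assuming the build is driven the usual way from /usr/src:

Code:
# two jobs fit into 2 GiB on the Octane; three ran out of memory while
# compiling the larger LLVM/clang objects, so -j has to match the RAM
cd /usr/src && make -j2 build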

So far I have built kernels for 7.0, 7.1 and 7.2:


...and for 7.0 and 7.1 everything runs well on the machines I have tested them on:

  • R4400 Indy (IP22)
  • R12000 Origin200 (IP27, both SP and MP)
  • R10000 Indigo² (IP28)
  • R12000 Octane (IP30, both SP and MP)
  • R5000 O2 (IP32)

You can check the logs linked from GitHub over at https://dmesgd.nycbug.org/ for details. I haven't yet uploaded logs for 7.2, but everything seems to work so far for Indy, Octane (SP and MP) and O2; the Origin200 is not yet tested.

Current problem

Now to my current problem:

Unfortunately, between 7.1 and 7.2 something broke for the R10000 Indigo² (IP28), and so far I haven't been able to figure out what. The problem only affects the R10000 Indigo²; all the other machines happily boot their 7.2 kernels with the 7.2 octeon file system. But on the R10000 Indigo² things fall apart as soon as parts of the userland start executing, with segfault after segfault. The interesting thing is that this does not happen in single-user mode: trying a few of the userland tools for which I could see core files left over from the last boot(s), even with the root FS mounted r/w, doesn't show any problems. But when exiting single-user mode things fall apart again, though this time with bus errors instead of segfaults. So everything works in single-user mode but falls apart in multi-user mode.

I'd appreciate any help to figure out what's causing this problem.

Help wanted

If you're interested in helping me and being part of this "project", please let me know. I'd also be interested in test results from machines I don't have available or don't have in working order, like Tezro, Fuel, Origin 350, R12K O2, R4K Indigo², R8K Indigo², R5K Indy and R4K Indigo (I have one, but it was gutted and partly destroyed by some idiot). If you have the time and interest I'll try to get you going in no time.

Outlook

My rough plan for the future is to get new sgi userlands built and from there work on the other stuff, like RAMdisk kernels and maybe an ISO, so users can install their systems on disk if needed and don't have to resort to network booting and nfsrb2 for building their file systems.
 

Elf

You might see if there is anything qemu on a fast machine, or cross-compiling, could do to help accelerate builds. This is sort of along the lines of what the SGUG-RSE team is looking at :)

Neat find though, and an interesting project!
 

johnnym

Cross-compiling OpenBSD seems to be some kind of higher magic and I couldn't find much information about how it could be done, apart from the official FAQ and two gists:


In this regard qemu also wouldn't be an option as it can't boot OpenBSD/sgi.

Really, I think the best way to "speed things up" would be to not build LLVM/clang for OpenBSD/sgi, because frankly, trying to build a new OpenBSD userland feels more like building LLVM/clang. ;)
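
(For completeness, the entry point those gists revolve around is the unsupported Makefile.cross at the top of the src tree - roughly as below. I haven't tried it for sgi myself, so treat it as a pointer rather than a recipe:)

Code:
# unsupported cross-build scaffolding shipped in the src tree
cd /usr/src
make -f Makefile.cross TARGET=sgi cross-tools      # cross toolchain first
make -f Makefile.cross TARGET=sgi cross-distrib    # then attempt the userland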

Neat find though, and an interesting project!
Indeed! :)
 

johnnym

Made some progress since my last post:

Guess what: if you do not intend to build LLVM/clang for OpenBSD/sgi userlands, you can do the whole userland build on a dual 300 MHz R12K driven Octane with 2 GiB of RAM in just a little more than 12 hours! Much better than 110 hours plus extra (for the parts that were never built because I didn't continue that build).

It then took another 10 hours to make the actual release files out of the userland created earlier: sets, kernels (which needed to be built again, with the respective Makefile patched to make use of the second processor), RAMdisk kernels (respective Makefile unfortunately not yet patched, so this run took extra time because it was done serially), boot loaders and the CDROM ISO.
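
For anyone wanting to reproduce this: apart from my Makefile tweaks, the release step is the stock procedure from release(8); the destination paths below are just examples:

Code:
# stock release build (see release(8)) - paths are examples
export DESTDIR=/build/dest RELEASEDIR=/build/release
cd /usr/src/etc && make release
cd /usr/src/distrib/sets && sh checkflist   # compare the result against the
                                            # (here: trimmed) set file lists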

So I now have an OpenBSD/sgi 7.0 userland available which includes the usual files (minus X) that were available in the past, e.g. on https://ftp.eu.openbsd.org/pub/OpenBSD/6.9/sgi/

I still need to test this, but at least during compilation of the files for the userland and release LLVM/clang wasn't needed, so it might indeed be safe to do without it on OpenBSD/sgi.

****

Actually I also came here (and also to the IRIXNetwork forums) with this topic because I'm in need of testers for all the machinery I don't have available. Hence I'd be really grateful for people with the following machines:
  • Tezro (uses the IP27 kernels)
  • Fuel (ditto)
  • Origin 350 (ditto)
  • R12K O2 (uses the IP32 kernels)
  • R4K Indigo² (uses the IP22 kernels)
  • R8K Indigo² (uses the IP26 kernels)
  • R5K Indy (uses the IP22 kernels)
  • R4K Indigo (ditto)
...to test the kernels (and in the future also userlands) I have available on GitHub (linked from the original post).
 

johnnym

I did some work on OpenBSD/sgi since my last post about it:

1. I tested the created 7.0 userland (incl. the recreated kernel) on my Octane with the help of nfsrb2 and it boots fine, just like the last "original" 6.9 version and the respective octeon file systems did.
There is one issue with this release though: you'll remember that I decided not to build LLVM/clang because it takes much longer than the userland and kernels combined. It turns out the extra software that is available for octeon (and that was available for sgi, too, in the past) through the ports tree (e.g. from https://ftp.eu.openbsd.org/pub/OpenBSD/7.0/packages/mips64/) is built with LLVM/clang and so seems to require the LLVM/clang runtime libraries:

Code:
octane# pkg_add nano
quirks-4.53 signed on 2021-10-06T15:02:34Z
quirks-4.53: ok
nano-5.8:libiconv-1.16p0: ok
Can't install gettext-runtime-0.21p1 because of libraries
|library c++.8.0 not found
| not found anywhere
|library c++abi.5.0 not found
| not found anywhere
Direct dependencies for gettext-runtime-0.21p1 resolve to libiconv-1.16p0
Full dependency tree is libiconv-1.16p0
Can't install nano-5.8: can't resolve gettext-runtime-0.21p1
Couldn't install gettext-runtime-0.21p1 nano-5.8
Well, as bad as this looks, the solution is to just copy the missing LLVM/clang libs over from an octeon userland or from the respective set. I assume they live in the base set, because if they were in the comp(iler) set, software from the ports tree wouldn't work without the comp set installed:

Code:
octane# cp -v /tmp/llvm-libs/* /usr/lib/
/tmp/llvm-libs/libLLVM.so.6.0 -> /usr/lib/libLLVM.so.6.0
/tmp/llvm-libs/libc++.a -> /usr/lib/libc++.a
/tmp/llvm-libs/libc++.so.8.0 -> /usr/lib/libc++.so.8.0
/tmp/llvm-libs/libc++_p.a -> /usr/lib/libc++_p.a
/tmp/llvm-libs/libc++abi.a -> /usr/lib/libc++abi.a
/tmp/llvm-libs/libc++abi.so.5.0 -> /usr/lib/libc++abi.so.5.0
/tmp/llvm-libs/libc++abi_p.a -> /usr/lib/libc++abi_p.a
/tmp/llvm-libs/libcompiler_rt.a -> /usr/lib/libcompiler_rt.a
I just copied everything I thought was related to LLVM/clang and that seems to have solved that issue:

Code:
octane# pkg_add nano
quirks-4.53 signed on 2021-10-06T15:02:34Z
nano-5.8:gettext-runtime-0.21p1: ok
nano-5.8: ok
octane# pkg_add htop 
quirks-4.53 signed on 2021-10-06T15:02:34Z
htop-3.0.5pl20210418:libffi-3.3p1: ok
htop-3.0.5pl20210418:pcre-8.44: ok
htop-3.0.5pl20210418:xz-5.2.5: ok
htop-3.0.5pl20210418:sqlite3-3.35.5p0: ok
htop-3.0.5pl20210418:bzip2-1.0.8p0: ok
htop-3.0.5pl20210418:python-3.8.12: ok
htop-3.0.5pl20210418:glib2-2.68.4: ok
htop-3.0.5pl20210418:desktop-file-utils-0.26: ok
htop-3.0.5pl20210418: ok
Running tags: ok
New and changed readme(s):
    /usr/local/share/doc/pkg-readmes/glib2
The binaries I tested from the ports tree - i.e. nano, htop, git, 7z, eopenssl-3.0 - installed and worked just fine after copying the LLVM/clang libs over.
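
(If you don't have an octeon userland mounted somewhere to copy from, the same libraries can presumably be pulled straight out of the octeon base set - an untested sketch, with the library versions taken from the pkg_add output above:)

Code:
# extract only the needed runtime libraries from the octeon base set
cd /
tar -xzphf /tmp/base70.tgz ./usr/lib/libc++.so.8.0 ./usr/lib/libc++abi.so.5.0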

So that is not the perfect solution I had hoped for, but for now I can live with it. And maybe, if some people with more powerful machines chime in in the future, we can bring LLVM/clang back to OpenBSD/sgi - though I'm unsure whether it is worth the compilation time if we can otherwise just copy over a few files to make the ports-tree software work.

****

2. I had another idea to tackle the IP28 problem that arose in OpenBSD/sgi 7.2: I don't know how to debug the problem any further without help from someone more skilled than me, but I could do some bisecting to chase down a possible cause. Up until now that wasn't possible, because I use a different branch for each version of OpenBSD/sgi and added the needed reverts and adaptations on top. But what if I could make all these changes in one branch, at the points in history where they actually happened (or should have happened)?
Enter the sgi-never-retired branch, where I rewrote the history of the official OpenBSD source, "deleted" the commits that removed sgi stuff, and adapted the commits that changed functionality which also needed to be changed in the now still-existing sgi code. I will make another post about how to do that with git once I have reached 7.2 in that branch; at the moment I am still at 7.1. Once I have reached 7.2 I can also finally start the bisecting process and hopefully narrow things down to a reasonable commit, fingers crossed.
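
(The rough idea, for the impatient - the real write-up will follow, and the commit reference below is a placeholder:)

Code:
# rewrite history so the sgi removal never happened
git checkout -b sgi-never-retired master
git rebase -i <commit-before-the-first-sgi-removal>
# in the rebase todo list: "drop" the commits that removed sgi code and
# "edit" the commits whose octeon changes need an sgi counterpart as well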
 

johnnym

The OpenBSD/sgi 7.0 release files are now on a web server, reachable via HTTP and HTTPS; you can find everything needed here:


Despite the server's hostname I haven't yet configured an FTP daemon for anonymous access, but if that is still a thing I think I will do that, too.

I know proftpd and vsftpd, though I'm not sure which one to use. Any suggestions, also for other FTP servers?

****

To allow verification of the contents of the files I have created a signify(1) keypair and signed the SHA256 file (resulting in SHA256.sig). You can thus verify that what you downloaded comes from me by running signify on OpenBSD (or signify-openbsd on Debian GNU/Linux) with my public key below.

My signify public key is:

Code:
RWRmGQ1rewM9vHtQ6vMcAUnRrsJqKO/Z+n07CXxQkTPpAOnsVa26CIUj
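Verification then looks roughly like this; save the key line into a file of your choosing (signify expects an "untrusted comment:" line above the key material), and note that the file names below are just examples:

Code:
# verify the checksum list and the downloaded files against it
signify -C -p openbsd-sgi-70.pub -x SHA256.sig base70.tgz bsd.IP30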
As all kernels were rebuilt during the release build, they have different hash values than the ones on GitHub. But as I haven't yet tested each one on the matching hardware (bsd.mp.IP30 is OK though), I haven't yet updated the release page on GitHub. They shouldn't differ in functionality, as they don't differ code-wise.

I haven't tested on-disk installations from ISO or the RAMdisk kernels, so if interested, suit yourself. I'm happy to provide assistance if needed.
 

johnnym

:unsure: Got my first ambiguity here:

I consider it a good commit when:
  • booting to the login prompt works
  • login works
  • uname, machine and sysctl hw work

Bad commits usually mount the NFS file systems but give bus errors, illegal instructions or segmentation faults when going through /etc/rc.

Now with a39c18f28d16b1a61658f6ce07a74bc58176db30 applied, the Indigo² hangs after mounting the NFS root and the NFS swap (the NFS server logs them as "authenticated mount requests"). I remember seeing something similar on my Octane when putting together the commits for the sgi-is-alive-at-7.2 branch; it stopped happening after applying some specific commit - of course I didn't log which one :rolleyes: . So I can't really say whether this "badness" is related to my actual IP28 issue with OpenBSD 7.2.

In the end I decided to mark it as a bad commit - well, it didn't boot to the login prompt - but I now think that was wrong. That's because:

  • Nobody claims that a kernel will always work correctly for each and every commit
  • That issue seems to have been solved later on, because although the bad commits found so far are bad in the sense described above, none of them hung after mounting the NFS shares

So marking it as a good commit would have moved the "left" border (good commits) a little closer to the "right" border (bad commits). It could be that I am now working on the wrong half. Well, if this doesn't give a reasonable result, I can still restart the bisect with that ambiguous commit marked as good, but that will cost me hours... :mad:
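
(For what it's worth, git has an escape hatch for exactly this kind of ambiguity, though whether it would have helped here is another question:)

Code:
# mark the current revision as untestable instead of good/bad;
# bisect will then pick a nearby commit to test instead
git bisect skip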
 

johnnym

Indeed! Each bisect step requires the following steps for verification:

  1. tar repo state - seconds
  2. copy it over to the NFS server - about 1m
  3. untar it on the Octane to a 15K SCSI disk from NFS - about 20m
  4. compile it - about 28m
  5. test it on Indigo² - under 5m

...so roughly 50 minutes per step.
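
(Steps 1 and 2 are easy enough to script; a throwaway sketch with made-up host and path names:)

Code:
# package the current bisect revision and push it to the NFS server
rev=$(git rev-parse --short HEAD)
git archive --format=tar --prefix="src-$rev/" HEAD | gzip > "/tmp/src-$rev.tar.gz"
scp "/tmp/src-$rev.tar.gz" nfs-server:/export/build/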
 

johnnym

Ok, I'm through, though I don't believe 4fc402b61836e8a270d48f5516496cfbfe57776b (0541ad29664165bece2a7957d02a188f9bafb73b in the unmodified repo) is the real culprit. I mean, yes, it is in the network code, so it could play a part in the problem - EDIT: and it introduced a bug for big-endian machines according to the next change to that file - but why should it specifically affect IP28 and no other machine, not even the Indy, which uses the same NIC?

I rather believe this is the cause of the problem I mentioned earlier (hangs after the NFS mounts), which I also saw on the Octane but which was gone later on - EDIT: my memory played a trick on me here: the Octane didn't hang but, according to my logs, took a long time around the NFS mounts (the solution - judging by the commit dates - was to replay 85caa4b for sgi as well, with 1583f1e). Also, newer commits identified as bad did not hang on the Indigo². I'll probably look into that on the weekend.

Code:
$ git bisect log
git bisect start
# bad: [0d58b8b8b5e0621f84efa993ee9ef47605603beb] drop the -beta
git bisect bad 0d58b8b8b5e0621f84efa993ee9ef47605603beb
# good: [5b7ece61fa1aa6c1348e9b8f2e7b0863e6ea20e7] close enough to release, we drop -beta
git bisect good 5b7ece61fa1aa6c1348e9b8f2e7b0863e6ea20e7
# good: [9f850877c8e5a89e6bfb255f1f7026c00bb7875e] vmm(4): reference count vm's and vcpu's
git bisect good 9f850877c8e5a89e6bfb255f1f7026c00bb7875e
# bad: [970cf9c09324fa9781be02dc122de675098fbf1d] Don't yet configure smmu(4) on Qualcomm SoCs as used on the Lenovo x13s as it is still not ready for runtime use and probably needs further quirks.
git bisect bad 970cf9c09324fa9781be02dc122de675098fbf1d
# good: [a7cdf5850edf952aab05a1d12a910add326a1f7c] Make test table based, extend it a little
git bisect good a7cdf5850edf952aab05a1d12a910add326a1f7c
# bad: [a39c18f28d16b1a61658f6ce07a74bc58176db30] strlen was in v6 libc (s5/perror.c) but not documented till v7 ok schwarze@
git bisect bad a39c18f28d16b1a61658f6ce07a74bc58176db30
# good: [17fc9e5b1d3178a8d65cacff7114b83163f14a02] The IPv4 reassembly code is MP safe, so we can run it in parallel. Note that ip_ours() runs with shared netlock, while ip_local() has exclusive netlock after queuing.  Move existing the code into function ip_fragcheck() and call it from ip_ours(). OK mvs@
git bisect good 17fc9e5b1d3178a8d65cacff7114b83163f14a02
# bad: [4fc402b61836e8a270d48f5516496cfbfe57776b] Checking the fragment flags of an incoming IP packet does not need the mutex for the fragment list.  Move this code before the critical section.  Use ISSET() to make clear which flags are checked. OK mvs@
git bisect bad 4fc402b61836e8a270d48f5516496cfbfe57776b
# good: [cd0dd8f18578e5b883f43aa8ea64d31710ca1159] Force disabling the use of delay slots. This is ugly but gets the compiler to produce 99+% correct code at all optimization levels, and can help people who would like to tinker a bit with the backend.
git bisect good cd0dd8f18578e5b883f43aa8ea64d31710ca1159
# good: [0e641b41fe54fcc8de2a3351ff974e0677060113] Remove bogus mtw_read_cfg.
git bisect good 0e641b41fe54fcc8de2a3351ff974e0677060113
# good: [5e24e96cb0c2a092c174a5e9f83d4cbadf271e3f] Zap prototypes for nonexistent nd6_setmtu() and in6_ifdel()
git bisect good 5e24e96cb0c2a092c174a5e9f83d4cbadf271e3f
# good: [e62afb52dea0a7b7d0c6c099652a54e60340a22d] Fix RFC number in comment
git bisect good e62afb52dea0a7b7d0c6c099652a54e60340a22d
# good: [61f35befa9a0619b3becea84efb445917c00389e] Add a second test to validate the tables in the library.
git bisect good 61f35befa9a0619b3becea84efb445917c00389e
# first bad commit: [4fc402b61836e8a270d48f5516496cfbfe57776b] Checking the fragment flags of an incoming IP packet does not need the mutex for the fragment list.  Move this code before the critical section.  Use ISSET() to make clear which flags are checked. OK mvs@
 

johnnym

Couldn't let it rest yesterday:

So I followed my suspicion from Wednesday and indeed, with the changes from commit 1109691f1d2 applied on top of 4fc402b6183 the hangs after the NFS mounts are gone and the Indigo² happily boots to the login prompt and works correctly - so this was not the real problem. Checking out the whole repo at 1109691f1d2 and compiling it gives the same result. :D

So we have a new good commit, and the last bad commit that didn't hang after the NFS mounts becomes the new bad one, for another round of bisecting. This time there are only 104 commits to search through - "roughly 7 steps" according to git bisect.
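
(Kicking off the new round is just a matter of resetting the bisect state and restarting with the new endpoints - the commit IDs below are placeholders:)

Code:
git bisect reset                                      # finish the previous run
git bisect start <new-bad-commit> <new-good-commit>   # bad first, then good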
 
