OpenBSD/sgi

johnnym

First of all I wish you all a happy new year!

I'm not sure if this is the right forum, so please move this thread to where it belongs if need be.

I'd like to make you aware of something I've been plotting for a while now - ever since OpenBSD retired the sgi architecture after 6.9 in October 2021. I was pretty sad about that move, although it had been on the table for a while by then - since 6.6 IIRC - but someone (or several people) kept it alive until after 6.9. Much obliged for that.
Back then I discovered by chance - well, more by trial and error - that the octeon userland is actually compatible with sgi kernels (they share the same packages tree, so they ought to be). You only need to change the baud rate of the console device to 9600 for sgi. I later found out that the 6.9 IP30 kernels even work with 7.0 octeon file systems. As I run my machines diskless, switching between kernels and userlands is as easy as changing a symlink. So I thought this could allow me to forward-port only the sgi kernels, use the octeon userland, and hence avoid building sgi userlands altogether.
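
To give a concrete idea (the paths below are just an illustration of how my diskless setup might look, not anything authoritative), the kernel symlink on the boot server and the console speed in the client's /etc/ttys are all there is to it:

Code:
# hypothetical diskless layout on the boot/NFS server - paths are examples
ln -sf bsd.IP30-7.0 /export/octane/bsd    # choose which kernel gets netbooted
# the only userland tweak: console at 9600 in the client's /etc/ttys
grep '^console' /export/octane/root/etc/ttys
console "/usr/libexec/getty std.9600"   vt220   on  secure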

Well, time passed and I didn't get anywhere with that, but in late November 2022 I finally found the time and dedication to get things going. The first thing I noticed was that it seems impossible to compile sgi kernels from an octeon userland, as it lacks the gcc 4.2.1 needed to compile them - it looks like LLVM/clang can't build sgi kernels. So I was stuck with a 6.9 sgi userland, but that has so far allowed me to build all the sgi kernels.

I used the GitHub mirror of OpenBSD's CVS src repository for this endeavour - mainly because I'm much more familiar with git than with cvs and would rather invest my time in the actual task than in learning all about cvs. Starting from the src repo at the point where it left 7.0-beta, I reverted the commits that removed the sgi-related stuff. Then I tried to compile the sgi kernels and worked through all compilation errors by reverting further commits that removed additional sgi bits and by replaying changes from the octeon arch that seemed to be needed. Unfortunately not everything that is missing shows up during compilation, but so far I was able to get everything going on my test machines, except for one kernel (see below).
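
The git side of it is nothing fancy; a rough sketch of the workflow (branch name and commit IDs are placeholders):

Code:
# start from the state of the tree at 7.0-beta and undo the sgi removal
git checkout -b sgi-is-alive-at-7.0 <commit-at-7.0-beta>
git revert --no-edit <commit-that-removed-sgi-code>     # bring sgi code back
git cherry-pick -x <octeon-change-also-needed-for-sgi>  # replay octeon fixes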

I've created three branches, check the links (and the history of those branches) for details about the changes I made:

My available resources for compilation are unfortunately rather limited: I am doing everything on my dual 300 MHz R12K Octane, which is good enough for building the kernels (about 50 minutes per kernel, and a little faster than my Octane2 with a single 400 MHz R12K) but struggles with userland builds. I have been trying to build an OpenBSD/sgi 7.0 userland for a while now - so far it has run for an aggregate of 110 hours, mostly on LLVM/clang actually - and it's still not done. I modified the main Makefile so a build can be interrupted and continued later without losing what has already been compiled, as I don't want to run my machine unattended for too long.
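
For reference, the kernel builds themselves follow the stock procedure (config names as they existed in 6.9); the 50 minutes above refer to one run of this:

Code:
# standard OpenBSD kernel build, one config per machine family
cd /sys/arch/sgi/conf
config GENERIC-IP30
cd ../compile/GENERIC-IP30
make            # roughly 50 minutes on the dual 300 MHz R12K Octane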

I'm not sure what LLVM/clang is used for on OpenBSD/sgi other than compiling LLVM/clang itself: the sgi userland was compiled mainly with gcc 4.2.1 IIRC, and the kernels can only be compiled with gcc 4.2.1 IIUIC. Hence I'm thinking about skipping the LLVM/clang build and modifying the release file lists accordingly, so I can build an OpenBSD/sgi userland without building LLVM/clang. It looks like I simply don't have the computing resources to build LLVM/clang in an acceptable amount of time. Assuming an Origin 350 with 4 x 1 GHz R16K really is about six times as fast as my Octane with 2 x 300 MHz R12K (twice the processors at roughly three times the clock), this could be done in maybe under a day on such a machine.
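
The per-set file lists live under distrib/sets/lists/ in the src tree, so finding the entries that would have to go is basically a grep (a first stab, nothing more):

Code:
# look for LLVM/clang related entries in the base and comp set lists
cd /usr/src/distrib/sets/lists
grep -rnE 'clang|libLLVM|libc\+\+' base comp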

If someone is interested in trying that, please feel free to contact me, I can get you going. I needed 2 GiB of memory to compile some LLVM/clang parts with two processes; I started with three processes but ran out of memory. So a four-processor machine will likely need a minimum of 4 GiB of memory, ideally more, to be able to use all processors for the compilation.
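
The "processes" above are just make jobs; a minimal sketch of the trade-off, assuming the build is driven the usual way from /usr/src:

Code:
# two jobs fit into 2 GiB on the Octane; three ran out of memory while
# compiling the larger LLVM/clang objects, so -j has to match the RAM
cd /usr/src && make -j2 build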

So far I have built kernels for 7.0, 7.1 and 7.2:


...and for 7.0 and 7.1 everything runs well on the machines I have tested them on:

  • R4400 Indy (IP22)
  • R12000 Origin200 (IP27, both SP and MP)
  • R10000 Indigo² (IP28)
  • R12000 Octane (IP30, both SP and MP)
  • R5000 O2 (IP32)

You can check the logs linked from GitHub over at https://dmesgd.nycbug.org/ for details. I haven't yet uploaded logs for 7.2, but everything seems to work so far for Indy, Octane (SP and MP) and O2; the Origin200 is not yet tested.

Current problem

Now to my current problem:

Unfortunately, between 7.1 and 7.2 something broke for the R10000 Indigo² (IP28), and so far I haven't been able to figure out what. The problem only affects the R10000 Indigo²; all the other machines happily boot their 7.2 kernels with the 7.2 octeon file system. But on the R10000 Indigo² things fall apart as soon as parts of the userland start executing, with segfault after segfault. The interesting thing is that this does not happen in single-user mode: trying a few of the userland tools for which I could see core files left over from the last boot(s), even with the root FS mounted r/w, doesn't show any problems. But when exiting single-user mode things fall apart again, though this time with bus errors instead of segfaults. So everything works in single-user mode but falls apart in multi-user mode.

I'd appreciate any help to figure out what's causing this problem.

Help wanted

If you're interested in helping me and being part of this "project", please let me know. I'd also be interested in test results from machines I don't have available or don't have in working order, like Tezro, Fuel, Origin 350, R12K O2, R4K Indigo², R8K Indigo², R5K Indy and R4K Indigo (I have one, but it was gutted and partly destroyed by some idiot). If you have the time and interest I'll try to get you going in no time.

Outlook

My rough plan for the future is to get new sgi userlands built and from there work on the other stuff, like RAMdisk kernels and maybe an ISO, so users can install their systems on disk if needed and don't have to resort to network booting and nfsrb2 for building their file systems.
 

Elf

You might see if there is anything qemu on a fast machine, or cross-compiling, could do to help accelerate builds. This is sort of along the lines of what the SGUG-RSE team is looking at :)

Neat find though, and an interesting project!
 

johnnym

Cross-compiling OpenBSD seems to be some kind of higher magic and I couldn't find much information about how it could be done, apart from the official FAQ and two gists:


In this regard qemu also wouldn't be an option as it can't boot OpenBSD/sgi.

Really, I think the best way to "speed things up" would be to not build LLVM/clang for OpenBSD/sgi, because frankly, trying to build a new OpenBSD userland feels more like building LLVM/clang. ;)
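
(For completeness, the entry point those gists revolve around is the unsupported Makefile.cross at the top of the src tree - roughly as below. I haven't tried it for sgi myself, so treat it as a pointer rather than a recipe:)

Code:
# unsupported cross-build scaffolding shipped in the src tree
cd /usr/src
make -f Makefile.cross TARGET=sgi cross-tools      # cross toolchain first
make -f Makefile.cross TARGET=sgi cross-distrib    # then attempt the userland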

Neat find though, and an interesting project!
Indeed! :)
 

johnnym

Made some progress since my last post:

Guess what: if you do not intend to build LLVM/clang for OpenBSD/sgi userlands, you can do the whole userland build on a dual 300 MHz R12K driven Octane with 2 GiB of RAM in just a little more than 12 hours! Much better than 110 hours plus extra (for the parts that were never built because I didn't continue that build).

It then took another 10 hours to make the actual release files out of the userland created earlier: sets, kernels (which needed to be built again, with the respective Makefile patched to make use of the second processor), RAMdisk kernels (respective Makefile unfortunately not yet patched, so this run took extra time because it was done serially), boot loaders and the CDROM ISO.
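
For anyone wanting to reproduce this: apart from my Makefile tweaks, the release step is the stock procedure from release(8); the destination paths below are just examples:

Code:
# stock release build (see release(8)) - paths are examples
export DESTDIR=/build/dest RELEASEDIR=/build/release
cd /usr/src/etc && make release
cd /usr/src/distrib/sets && sh checkflist   # compare the result against the
                                            # (here: trimmed) set file lists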

So I now have an OpenBSD/sgi 7.0 userland available which includes the usual files (minus X) that were available in the past, e.g. on https://ftp.eu.openbsd.org/pub/OpenBSD/6.9/sgi/

I still need to test this, but at least during compilation of the files for the userland and release LLVM/clang wasn't needed, so it might indeed be safe to do without it on OpenBSD/sgi.

****

Actually I also came here (and also to the IRIXNetwork forums) with this topic because I'm in need of testers for all the machinery I don't have available. Hence I'd be really grateful for people with the following machines:
  • Tezro (uses the IP27 kernels)
  • Fuel (ditto)
  • Origin 350 (ditto)
  • R12K O2 (uses the IP32 kernels)
  • R4K Indigo² (uses the IP22 kernels)
  • R8K Indigo² (uses the IP26 kernels)
  • R5K Indy (uses the IP22 kernels)
  • R4K Indigo (ditto)
...to test the kernels (and in the future also userlands) I have available on GitHub (linked from the original post).
 

johnnym

I did some work on OpenBSD/sgi since my last post about it:

1. I tested the created 7.0 userland (incl. the recreated kernel) on my Octane with the help of nfsrb2 and it boots fine, just like the last "original" 6.9 version and the respective octeon file systems did.
There is one issue with this release though: you'll remember that I decided not to build LLVM/clang because it takes much longer than the userland and kernels combined. It turns out the extra software that is available for octeon (and that was available for sgi, too, in the past) through the ports tree (e.g. from https://ftp.eu.openbsd.org/pub/OpenBSD/7.0/packages/mips64/) is built with LLVM/clang and so seems to require the LLVM/clang runtime libraries:

Code:
octane# pkg_add nano
quirks-4.53 signed on 2021-10-06T15:02:34Z
quirks-4.53: ok
nano-5.8:libiconv-1.16p0: ok
Can't install gettext-runtime-0.21p1 because of libraries
|library c++.8.0 not found
| not found anywhere
|library c++abi.5.0 not found
| not found anywhere
Direct dependencies for gettext-runtime-0.21p1 resolve to libiconv-1.16p0
Full dependency tree is libiconv-1.16p0
Can't install nano-5.8: can't resolve gettext-runtime-0.21p1
Couldn't install gettext-runtime-0.21p1 nano-5.8
Well, as bad as this looks, the solution is to just copy the missing LLVM/clang libs over from an octeon userland or from the respective set. I assume they live in the base set, because if they were in the comp(iler) set, software from the ports tree wouldn't work without the comp set installed:

Code:
octane# cp -v /tmp/llvm-libs/* /usr/lib/
/tmp/llvm-libs/libLLVM.so.6.0 -> /usr/lib/libLLVM.so.6.0
/tmp/llvm-libs/libc++.a -> /usr/lib/libc++.a
/tmp/llvm-libs/libc++.so.8.0 -> /usr/lib/libc++.so.8.0
/tmp/llvm-libs/libc++_p.a -> /usr/lib/libc++_p.a
/tmp/llvm-libs/libc++abi.a -> /usr/lib/libc++abi.a
/tmp/llvm-libs/libc++abi.so.5.0 -> /usr/lib/libc++abi.so.5.0
/tmp/llvm-libs/libc++abi_p.a -> /usr/lib/libc++abi_p.a
/tmp/llvm-libs/libcompiler_rt.a -> /usr/lib/libcompiler_rt.a
I just copied everything I thought was related to LLVM/clang and that seems to have solved that issue:

Code:
octane# pkg_add nano
quirks-4.53 signed on 2021-10-06T15:02:34Z
nano-5.8:gettext-runtime-0.21p1: ok
nano-5.8: ok
octane# pkg_add htop 
quirks-4.53 signed on 2021-10-06T15:02:34Z
htop-3.0.5pl20210418:libffi-3.3p1: ok
htop-3.0.5pl20210418:pcre-8.44: ok
htop-3.0.5pl20210418:xz-5.2.5: ok
htop-3.0.5pl20210418:sqlite3-3.35.5p0: ok
htop-3.0.5pl20210418:bzip2-1.0.8p0: ok
htop-3.0.5pl20210418:python-3.8.12: ok
htop-3.0.5pl20210418:glib2-2.68.4: ok
htop-3.0.5pl20210418:desktop-file-utils-0.26: ok
htop-3.0.5pl20210418: ok
Running tags: ok
New and changed readme(s):
    /usr/local/share/doc/pkg-readmes/glib2
The binaries I tested from the ports tree - i.e. nano, htop, git, 7z, eopenssl-3.0 - installed and worked just fine after copying the LLVM/clang libs over.
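
(If you don't have an octeon userland mounted somewhere to copy from, the same libraries can presumably be pulled straight out of the octeon base set - an untested sketch, with the library versions taken from the pkg_add output above:)

Code:
# extract only the needed runtime libraries from the octeon base set
cd /
tar -xzphf /tmp/base70.tgz ./usr/lib/libc++.so.8.0 ./usr/lib/libc++abi.so.5.0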

So that is not the perfect solution I had hoped for, but for now I can live with it. And maybe, if some people with more powerful machines chime in in the future, we can bring LLVM/clang back to OpenBSD/sgi - though I'm unsure whether it is worth the compilation time if we can otherwise just copy over a few files to make the ports-tree software work.

****

2. I had another idea to tackle the IP28 problem that arose in OpenBSD/sgi 7.2: I don't know how to debug the problem any further without help from someone more skilled than me, but I could do some bisecting to chase down a possible cause. Up until now that wasn't possible, because I use a different branch for each version of OpenBSD/sgi and added the needed reverts and adaptations on top. But what if I could make all these changes in one branch, at the points in history where they actually happened (or should have happened)?
Enter the sgi-never-retired branch, where I rewrote the history of the official OpenBSD source, "deleted" the commits that removed sgi stuff, and adapted the commits that changed functionality which also needed to be changed in the now still-existing sgi code. I will make another post about how to do that with git once I have reached 7.2 in that branch; at the moment I am still at 7.1. Once I have reached 7.2 I can also finally start the bisecting process and hopefully narrow things down to a reasonable commit, fingers crossed.
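
(The rough idea, for the impatient - the real write-up will follow, and the commit reference below is a placeholder:)

Code:
# rewrite history so the sgi removal never happened
git checkout -b sgi-never-retired master
git rebase -i <commit-before-the-first-sgi-removal>
# in the rebase todo list: "drop" the commits that removed sgi code and
# "edit" the commits whose octeon changes need an sgi counterpart as well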
 

johnnym

The OpenBSD/sgi 7.0 release files are now on a web server, reachable via HTTP and HTTPS; you can find everything needed here:


Despite the server's hostname I haven't yet configured an FTP daemon for anonymous access, but if that is still a thing I think I will do that, too.

I know proftpd and vsftpd, though I'm not sure which one to use. Any suggestions, also for other FTP servers?

****

To allow verification of the contents of the files I have created a signify(1) keypair and signed the SHA256 file (resulting in SHA256.sig). You can thus verify that what you downloaded comes from me by running signify on OpenBSD (or signify-openbsd on Debian GNU/Linux) with my public key below.

My signify public key is:

Code:
RWRmGQ1rewM9vHtQ6vMcAUnRrsJqKO/Z+n07CXxQkTPpAOnsVa26CIUj
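Verification then looks roughly like this; save the key line into a file of your choosing (signify expects an "untrusted comment:" line above the key material), and note that the file names below are just examples:

Code:
# verify the checksum list and the downloaded files against it
signify -C -p openbsd-sgi-70.pub -x SHA256.sig base70.tgz bsd.IP30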
As all kernels were rebuilt during the release build, they have different hash values than the ones on GitHub. But as I haven't yet tested each one on the matching hardware (bsd.mp.IP30 is OK though), I haven't yet updated the release page on GitHub. They shouldn't differ in functionality, as they don't differ code-wise.

I haven't tested on-disk installations from ISO or the RAMdisk kernels, so if interested, suit yourself. I'm happy to provide assistance if needed.
 

johnnym

:unsure: Got my first ambiguity here:

I consider it a good commit when:
  • booting to the login prompt works
  • login works
  • uname, machine and sysctl hw work

Bad commits usually mount the NFS file systems but give bus errors, illegal instructions or segmentation faults when going through /etc/rc.

Now with a39c18f28d16b1a61658f6ce07a74bc58176db30 applied, the Indigo² hangs after mounting the NFS root and the NFS swap (the NFS server logs them as "authenticated mount requests"). I remember seeing something similar on my Octane when putting together the commits for the sgi-is-alive-at-7.2 branch; it stopped happening after applying some specific commit - of course I didn't log which one :rolleyes: . So I can't really say whether this "badness" is related to my actual IP28 issue with OpenBSD 7.2.

In the end I decided to mark it as a bad commit - well, it didn't boot to the login prompt - but I now think that was wrong. That's because:

  • Nobody claims that a kernel will always work correctly for each and every commit
  • That issue seems to have been solved later on, because although the bad commits found so far are bad in the sense described above, none of them hung after mounting the NFS shares

So marking it as a good commit would have moved the "left" border (good commits) a little closer to the "right" border (bad commits). It could be that I am now working on the wrong half. Well, if this doesn't give a reasonable result, I can still restart the bisect with that ambiguous commit marked as good, but that will cost me hours... :mad:
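
(For what it's worth, git has an escape hatch for exactly this kind of ambiguity, though whether it would have helped here is another question:)

Code:
# mark the current revision as untestable instead of good/bad;
# bisect will then pick a nearby commit to test instead
git bisect skip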
 

johnnym

Indeed! Each bisect step requires the following steps for verification:

  1. tar repo state - seconds
  2. copy it over to the NFS server - about 1m
  3. untar it on the Octane to a 15K SCSI disk from NFS - about 20m
  4. compile it - about 28m
  5. test it on Indigo² - under 5m

...so roughly 50 minutes per step.
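
(Steps 1 and 2 are easy enough to script; a throwaway sketch with made-up host and path names:)

Code:
# package the current bisect revision and push it to the NFS server
rev=$(git rev-parse --short HEAD)
git archive --format=tar --prefix="src-$rev/" HEAD | gzip > "/tmp/src-$rev.tar.gz"
scp "/tmp/src-$rev.tar.gz" nfs-server:/export/build/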
 

johnnym

Ok, I'm through, though I don't believe 4fc402b61836e8a270d48f5516496cfbfe57776b (0541ad29664165bece2a7957d02a188f9bafb73b in the unmodified repo) is the real culprit. I mean, yes, it is in the network code, so it could play a part in the problem - EDIT: and it introduced a bug for big-endian machines according to the next change to that file - but why should it specifically affect IP28 and no other machine, not even the Indy, which uses the same NIC?

I rather believe this is the cause of the problem I mentioned earlier (hangs after the NFS mounts), which I also saw on the Octane but which was gone later on - EDIT: my memory played a trick on me here: the Octane didn't hang but, according to my logs, took a long time around the NFS mounts (the solution - judging by the commit dates - was to replay 85caa4b for sgi as well, with 1583f1e). Also, newer commits identified as bad did not hang on the Indigo². I'll probably look into that on the weekend.

Code:
$ git bisect log
git bisect start
# bad: [0d58b8b8b5e0621f84efa993ee9ef47605603beb] drop the -beta
git bisect bad 0d58b8b8b5e0621f84efa993ee9ef47605603beb
# good: [5b7ece61fa1aa6c1348e9b8f2e7b0863e6ea20e7] close enough to release, we drop -beta
git bisect good 5b7ece61fa1aa6c1348e9b8f2e7b0863e6ea20e7
# good: [9f850877c8e5a89e6bfb255f1f7026c00bb7875e] vmm(4): reference count vm's and vcpu's
git bisect good 9f850877c8e5a89e6bfb255f1f7026c00bb7875e
# bad: [970cf9c09324fa9781be02dc122de675098fbf1d] Don't yet configure smmu(4) on Qualcomm SoCs as used on the Lenovo x13s as it is still not ready for runtime use and probably needs further quirks.
git bisect bad 970cf9c09324fa9781be02dc122de675098fbf1d
# good: [a7cdf5850edf952aab05a1d12a910add326a1f7c] Make test table based, extend it a little
git bisect good a7cdf5850edf952aab05a1d12a910add326a1f7c
# bad: [a39c18f28d16b1a61658f6ce07a74bc58176db30] strlen was in v6 libc (s5/perror.c) but not documented till v7 ok schwarze@
git bisect bad a39c18f28d16b1a61658f6ce07a74bc58176db30
# good: [17fc9e5b1d3178a8d65cacff7114b83163f14a02] The IPv4 reassembly code is MP safe, so we can run it in parallel. Note that ip_ours() runs with shared netlock, while ip_local() has exclusive netlock after queuing.  Move existing the code into function ip_fragcheck() and call it from ip_ours(). OK mvs@
git bisect good 17fc9e5b1d3178a8d65cacff7114b83163f14a02
# bad: [4fc402b61836e8a270d48f5516496cfbfe57776b] Checking the fragment flags of an incoming IP packet does not need the mutex for the fragment list.  Move this code before the critical section.  Use ISSET() to make clear which flags are checked. OK mvs@
git bisect bad 4fc402b61836e8a270d48f5516496cfbfe57776b
# good: [cd0dd8f18578e5b883f43aa8ea64d31710ca1159] Force disabling the use of delay slots. This is ugly but gets the compiler to produce 99+% correct code at all optimization levels, and can help people who would like to tinker a bit with the backend.
git bisect good cd0dd8f18578e5b883f43aa8ea64d31710ca1159
# good: [0e641b41fe54fcc8de2a3351ff974e0677060113] Remove bogus mtw_read_cfg.
git bisect good 0e641b41fe54fcc8de2a3351ff974e0677060113
# good: [5e24e96cb0c2a092c174a5e9f83d4cbadf271e3f] Zap prototypes for nonexistent nd6_setmtu() and in6_ifdel()
git bisect good 5e24e96cb0c2a092c174a5e9f83d4cbadf271e3f
# good: [e62afb52dea0a7b7d0c6c099652a54e60340a22d] Fix RFC number in comment
git bisect good e62afb52dea0a7b7d0c6c099652a54e60340a22d
# good: [61f35befa9a0619b3becea84efb445917c00389e] Add a second test to validate the tables in the library.
git bisect good 61f35befa9a0619b3becea84efb445917c00389e
# first bad commit: [4fc402b61836e8a270d48f5516496cfbfe57776b] Checking the fragment flags of an incoming IP packet does not need the mutex for the fragment list.  Move this code before the critical section.  Use ISSET() to make clear which flags are checked. OK mvs@
 

johnnym

Couldn't let it rest yesterday:

So I followed my suspicion from Wednesday and indeed, with the changes from commit 1109691f1d2 applied on top of 4fc402b6183 the hangs after the NFS mounts are gone and the Indigo² happily boots to the login prompt and works correctly - so this was not the real problem. Checking out the whole repo at 1109691f1d2 and compiling it gives the same result. :D

So we have a new good commit, and the last bad commit that didn't hang after the NFS mounts becomes the new bad one, for another round of bisecting. This time there are only 104 commits to search through - "roughly 7 steps" according to git bisect.
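
(Kicking off the new round is just a matter of resetting the bisect state and restarting with the new endpoints - the commit IDs below are placeholders:)

Code:
git bisect reset                                      # finish the previous run
git bisect start <new-bad-commit> <new-good-commit>   # bad first, then good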
 
