Origin 350 Modification Log - Expansion Brick to CPU Brick

CiaoTime

Public Enemy Number One
Jan 15, 2020
Vancouver, BC, Canada
Greetings! This is a half-reference, half-journey through my efforts to take the expansion chassis out of my O350 rack and turn it into a NUMAlink-capable processor module without losing any of its original functionality. While this guide is centered on modifying a graphics brick, the same procedure should work for any Origin 350 chassis that contains a full baseboard, such as an MPX brick.


Section 1: Original Configuration


1582852611722.png
1582852989931.png
1582853493368.png


Discord regulars may have seen a system called 'Octodad' mentioned in passing: here it is as originally configured. Purchased in September 2018 from eBay user and SGI guru mopar5150, the system has been running nearly non-stop ever since as a remotely accessible build server, test system, and Quake 3 machine. The processor brick contains four 800 MHz R16000s with 8 gigabytes of RAM. The PCI slots are populated by sound devices, an IO9 board, and a USB card. Attached to it is a G2 brick, originally from an Onyx4 system: the G2 brick connects over the XIO port and provides an enclosure with two ATI FireGL X1 graphics cards for native four-monitor video output and synchronization. It's by far my fastest IRIX machine, to the point where it's essentially daily-drivable in a pinch.


Section 2: Closer Look At The G2 Brick

1582853804188.png

Opening the G2 brick is straightforward: roughly a dozen Phillips-head screws hold the lid to the rest of the chassis, and it swings vertically open once freed. The interior of the unit is very similar to that of a compute brick, with a few exceptions (which will prove troublesome later on).

From the top: the power supply is standard fare for an O350, as is the single exhaust blower at the top left of the image above. The aft center of the system is taken up by the ImageSync connector panel; on a normal Origin 350 compute brick, this spot holds a second exhaust fan. The dual-AGP Pro riser board is exclusive to the G2 and G2N bricks and sits in place of the usual PCI-X riser. An additional GPU-facing fan sits where the SCSI backplane would normally be attached.

There is also a metal divider sitting where the processor board is normally mounted: thankfully it's only held in place with two Torx T20 screws and slides out of the chassis once they're removed. All of the factory stand-offs for mounting a CPU board are present, and the press-fit connectors at the center of the baseboard are populated by default.


Section 3: Installation


The IP53 processor board itself was a complete assembly that I had purchased in January (thanks, 3ddoc!). It has four 700 MHz R16000s and came pre-loaded with 4 gigabytes of memory of its own.

No physical modifications are needed to connect the processor board to the G2 brick's baseboard, though the installation itself can be slightly perilous. Unplugging the ImageSync cables from the AGP riser helps clear some space in the center of the chassis. I found it best to hold the front-facing end of the processor board with one hand and the 'handle' over the CPU heat spreader with the other, keeping the board as level as possible. Be gentle, use the chassis' screw stand-offs as an alignment guide, avoid heavy pressure and side-to-side movement, and the board will eventually find a spot where it seats firmly onto the connectors.

1582857461334.png

The processor board appears to mount to the chassis with 8-32 thread screws. I didn't have any of those on hand, but I did have a bag of NAS1801-08-06 aircraft bolts! They worked perfectly, keeping a good ground connection and holding everything snug and secure.

1582856474694.png

Even without an IO9 board, an expansion chassis set up like this will still have an L1 controller accessible via the serial console port at the back of the unit. Connect a null-modem serial cable, set your terminal to 38400 baud, 8 data bits, no parity, 1 stop bit (8N1) with RTS/CTS hardware flow control, and plug in the chassis to start the L1 controller. The system can be tested in this state, but I instead opted to connect the G2 brick back to my original C-brick. Since the system now requires CPU-to-CPU communication, the two bricks have to be connected over their NUMAlink ports instead of their XIO ports; the same cable can be used without any modification.
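For anyone scripting the console connection instead of using a terminal program like minicom or screen, here's a minimal sketch of those line settings using Python's standard termios module. It assumes a Unix host and a USB serial adapter at /dev/ttyUSB0 (a hypothetical path; substitute your own), and it only applies the 38400/8N1/RTS-CTS settings described above. It doesn't implement anything L1-specific.

```python
import os
import termios

# Hypothetical device path for a USB-to-serial adapter; adjust for your host.
L1_PORT = "/dev/ttyUSB0"

def l1_console_attrs(attrs):
    """Return a copy of a termios attribute list set to 38400 baud, 8N1,
    with RTS/CTS hardware flow control and XON/XOFF turned off."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    # 8 data bits, no parity, 1 stop bit
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    # RTS/CTS hardware flow control
    cflag |= termios.CRTSCTS
    # Disable software flow control so it can't interfere
    iflag &= ~(termios.IXON | termios.IXOFF | termios.IXANY)
    return [iflag, oflag, cflag, lflag, termios.B38400, termios.B38400, list(cc)]

def open_l1_console(path=L1_PORT):
    """Open the serial port and apply the L1 console line settings."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    termios.tcsetattr(fd, termios.TCSANOW, l1_console_attrs(termios.tcgetattr(fd)))
    return fd
```

Splitting the attribute math from the open() call keeps the settings easy to check without hardware attached; pyserial or a terminal program will do the same job just as well.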

Typing '* pwr up' at the compute brick's L1 controller automatically initializes both sides of the system. From there, I booted IRIX from the main unit as usual and... success!

Code:
OCTODAD% hinv
Processor 0: 800 MHZ IP35
CPU: MIPS R16000 Processor Chip Revision: 2.2
FPU: MIPS R16010 Floating Point Chip Revision: 2.2
Processor 1: 800 MHZ IP35
CPU: MIPS R16000 Processor Chip Revision: 2.2
FPU: MIPS R16010 Floating Point Chip Revision: 2.2
Processor 2: 800 MHZ IP35
CPU: MIPS R16000 Processor Chip Revision: 2.2
FPU: MIPS R16010 Floating Point Chip Revision: 2.2
Processor 3: 800 MHZ IP35
CPU: MIPS R16000 Processor Chip Revision: 2.2
FPU: MIPS R16010 Floating Point Chip Revision: 2.2
Processor 4: 700 MHZ IP35
CPU: MIPS R16000 Processor Chip Revision: 2.1
FPU: MIPS R16010 Floating Point Chip Revision: 2.1
Processor 5: 700 MHZ IP35
CPU: MIPS R16000 Processor Chip Revision: 2.1
FPU: MIPS R16010 Floating Point Chip Revision: 2.1
Processor 6: 700 MHZ IP35
CPU: MIPS R16000 Processor Chip Revision: 2.1
FPU: MIPS R16010 Floating Point Chip Revision: 2.1
Processor 7: 700 MHZ IP35
CPU: MIPS R16000 Processor Chip Revision: 2.1
FPU: MIPS R16010 Floating Point Chip Revision: 2.1
Main memory size: 11264 Mbytes
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 4 Mbytes
Secondary unified instruction/data cache size: 4 Mbytes
Secondary unified instruction/data cache size: 4 Mbytes
Secondary unified instruction/data cache size: 4 Mbytes
Secondary unified instruction/data cache size: 8 Mbytes
Secondary unified instruction/data cache size: 8 Mbytes
Secondary unified instruction/data cache size: 8 Mbytes
Secondary unified instruction/data cache size: 8 Mbytes
Integral SCSI controller 2: Version IDE (ATA/ATAPI) IOC4
  CDROM: unit 0 on SCSI controller 2
Integral SCSI controller 0: Version QL12160, low voltage differential
  Disk drive: unit 1 on SCSI controller 0
Integral SCSI controller 1: Version QL12160, low voltage differential
IOC3/IOC4 serial port: tty3
IOC3/IOC4 serial port: tty4
IOC3/IOC4 serial port: tty5
IOC3/IOC4 serial port: tty6
Graphics board: SG2
Graphics board: SG2
Integral Gigabit Ethernet: tg0, module 001c01, PCI bus 1 slot 4
Iris Audio Processor: version EMU revision A4, number 1
IOC3/IOC4 external interrupts: 1
USB controller: type OHCI
USB controller: type OHCI
At this point, the system is fully functional for short-term testing, but there are still a few issues that need extra attention. One bank of RAM on my new processor board appears to be faulty (which is why hinv reports 11264 MB, or 11 GB, instead of 12 GB), and the system starts hitting the advisory temperature threshold after about half an hour of use. My next post will describe the O350's cooling system in more detail, along with some of the things I've tried to solve the problems this mod introduces.
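The memory arithmetic works out like this: the original brick has 8 GB and the IP53 board adds 4 GB, so hinv should report (8 + 4) × 1024 = 12288 MB. It reports 11264 MB, leaving exactly 1024 MB (one 1 GB bank) disabled. Here's a small Python sketch that pulls the processor speeds and memory total out of hinv output as a quick sanity check after a mod like this. The regular expressions assume the line formats shown above, and EXPECTED_MB is just this system's 8 GB + 4 GB; adjust both for your own setup.

```python
import re
from collections import Counter

# Excerpt of the hinv output above (trimmed to the relevant lines).
HINV = """\
Processor 0: 800 MHZ IP35
Processor 1: 800 MHZ IP35
Processor 2: 800 MHZ IP35
Processor 3: 800 MHZ IP35
Processor 4: 700 MHZ IP35
Processor 5: 700 MHZ IP35
Processor 6: 700 MHZ IP35
Processor 7: 700 MHZ IP35
Main memory size: 11264 Mbytes
"""

# 8 GB in the original C-brick plus 4 GB on the IP53 board.
EXPECTED_MB = (8 + 4) * 1024

def summarize_hinv(text):
    """Return (Counter of CPU speeds in MHz, main memory in MB or None)."""
    speeds = Counter(int(mhz) for mhz in re.findall(r"^Processor \d+: (\d+) MHZ", text, re.M))
    mem = re.search(r"^Main memory size: (\d+) Mbytes", text, re.M)
    return speeds, int(mem.group(1)) if mem else None

speeds, mem_mb = summarize_hinv(HINV)
missing_mb = EXPECTED_MB - mem_mb
print(dict(speeds), mem_mb, missing_mb)  # prints {800: 4, 700: 4} 11264 1024
```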

massiverobot

irix detailer
Feb 8, 2019
Philly
Nice setup. Is there any reason you don't just get a bunch of high-vel fans, drop them in the case facing the right way, and hang a PC PSU to power them? Other than the clearly UN-aesthetic appearance of it, or the noise. But you would have an SGI that works longer than 30 min.
 

Elf

Storybook / Retired, ex-staff
Feb 4, 2019
Mountain West (US)
Just looking at the power in there I do see what looks like plenty of 12V; could probably tap some of that to drive the fans w/out needing an extra PSU.

CiaoTime
I've since confirmed that the 1 GB of disabled memory is an issue with my processor board itself. Bad luck! I've never encountered an IP53 board with memory issues before; that said, the system's still got 11 gigs of perfectly happy RAM to work with. Moving on:

Here are the L1 env results after twenty minutes of sitting at an idle IRIX desktop. Voltage readings are all perfect, so I'm only showing fan and thermal results. This is what a normal Origin 350 compute module outputs:
Code:
Description     State       Warning RPM  Current RPM
--------------- ----------  -----------  -----------
FAN  0  EXHST 1    Enabled         1980         2311
FAN  1  EXHST 2    Enabled         1980         2295
FAN  2       PS    Enabled         3200         4272
FAN  3    PCI 1    Enabled         1980         2909
FAN  4    PCI 2    Enabled         1980         2657

                              Advisory   Critical   Fault      Current
Description       State       Temp       Temp       Temp       Temp
----------------- ----------  ---------  ---------  ---------  ---------
0 INTERFACE 0       Enabled   31C/ 87F   48C/118F   55C/131F   16C/ 60F
1 INTERFACE 1       Enabled   31C/ 87F   48C/118F   55C/131F   17C/ 62F
2 INTERFACE 2       Enabled   31C/ 87F   48C/118F   55C/131F   19C/ 66F
3 PCI RISER         Enabled   31C/ 87F   48C/118F   55C/131F   20C/ 68F
4 ODYSSEY        <not present>
5 NODE              Enabled   31C/ 87F   48C/118F   55C/131F   18C/ 64F
6 BEDROCK           Enabled   31C/ 87F   48C/118F   55C/131F   15C/ 59F

And this is what's going on with my modified expansion chassis:
Code:
Description    State       Warning RPM  Current RPM
-------------- ----------  -----------  -----------
FAN 0  EXHST 1    Enabled         2160         3214
FAN 1       PS    Enabled         1575         2860
FAN 2    PCI 1    Enabled         1980         2295
FAN 3    PCI 2    Enabled         1980         2445

                              Advisory   Critical   Fault      Current
Description       State       Temp       Temp       Temp       Temp
----------------- ----------  ---------  ---------  ---------  ---------
0 INTERFACE 0       Enabled    [Autofan Control]    75C/167F   53C/127F
1 INTERFACE 1       Enabled    [Autofan Control]    75C/167F   51C/123F
2 INTERFACE 2       Enabled    [Autofan Control]    75C/167F   42C/107F
3 PCI RISER         Enabled    [Autofan Control]    75C/167F   32C/ 89F
4 ODYSSEY        <not present>
5 NODE              Enabled    [Autofan Control]    75C/167F   36C/ 96F
6 BEDROCK           Enabled    [Autofan Control]    85C/185F   70C/158F
Yikes! Fan speeds are fine, but the processors are running much hotter than they should at idle, and the Bedrock ASIC, at 70C against an 85C fault threshold, is approaching the auto-shutdown limit! What gives? Well...
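To put numbers on that, subtracting each sensor's current reading from its fault threshold gives the headroom left before shutdown. Here's a rough Python sketch that does this for L1 env temperature rows in either of the two layouts above (with or without the [Autofan Control] column). It assumes the fault temperature is always the second-to-last Celsius value on a row and the current reading is the last; that holds for both tables shown here but may not for other L1 firmware.

```python
import re

# Two rows from the modified chassis' env output above.
ENV_MODIFIED = """\
0 INTERFACE 0       Enabled    [Autofan Control]    75C/167F   53C/127F
6 BEDROCK           Enabled    [Autofan Control]    85C/185F   70C/158F
"""

# The same sensors from the stock compute brick, for comparison.
ENV_STOCK = """\
0 INTERFACE 0       Enabled   31C/ 87F   48C/118F   55C/131F   16C/ 60F
6 BEDROCK           Enabled   31C/ 87F   48C/118F   55C/131F   15C/ 59F
"""

def temp_headroom(env_text):
    """Map sensor name -> (current C, fault C, degrees C left before fault)."""
    result = {}
    for line in env_text.splitlines():
        # Temperature rows start with a sensor index; fan rows start with "FAN".
        m = re.match(r"\s*\d+\s+(.+?)\s+Enabled\b", line)
        if not m:
            continue
        celsius = [int(c) for c in re.findall(r"(\d+)C/", line)]
        if len(celsius) < 2:
            continue
        fault, current = celsius[-2], celsius[-1]
        result[m.group(1)] = (current, fault, fault - current)
    return result

for name, (cur, fault, left) in temp_headroom(ENV_MODIFIED).items():
    print(f"{name}: {cur}C now, fault at {fault}C, {left}C of headroom")
```

Comparing headroom rather than raw readings accounts for the two bricks reporting different fault thresholds: the stock Bedrock has 40C to spare, the modified one only 15C.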

Section 4.1: Principles of O350 Cooling


1583085063671.png

This is what a normal Origin 350 compute brick looks like, fully assembled. All airflow is from left to right in this image. The SCSI and PCI side of the chassis (up top) is partitioned off, and its fans push air past the cards and out through ventilation holes at the top right. The power supply has an integral fan that pushes its own airflow to the back; the rest of the system is cooled entirely by the two exhaust blowers at the back. Cold air is pulled in through the front of the chassis, directed along the VRM and through the CPU cage by the beige plastic shroud, and then vented out the rear.

1583086024649.png

Thanks to Discord user NoodlesMcPastaMan, here's an image of the processor board itself, with the CPU cage removed and some simple labels added. It's narrow, and the heat sinks are relatively short: the cage itself seems to serve no thermal purpose beyond redirecting airflow.

With this in mind, the issues with the expansion brick are obvious: air is flowing over the cage, rather than through it, and one of the two fans responsible for all system cooling is missing in action.
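The missing exhaust fan also shows up directly in the two env fan tables above. Because the fan indices differ between the two chassis (FAN 1 is EXHST 2 on the stock brick but PS on the expansion brick), it's safer to compare fans by description than by number. A small Python sketch, assuming the row format shown in those tables:

```python
import re

STOCK_FANS = """\
FAN  0  EXHST 1    Enabled         1980         2311
FAN  1  EXHST 2    Enabled         1980         2295
FAN  2       PS    Enabled         3200         4272
FAN  3    PCI 1    Enabled         1980         2909
FAN  4    PCI 2    Enabled         1980         2657
"""

MODIFIED_FANS = """\
FAN 0  EXHST 1    Enabled         2160         3214
FAN 1       PS    Enabled         1575         2860
FAN 2    PCI 1    Enabled         1980         2295
FAN 3    PCI 2    Enabled         1980         2445
"""

def fan_rpm(env_text):
    """Map fan description (e.g. 'EXHST 1') -> current RPM."""
    fans = {}
    # Row layout: FAN <index> <description> Enabled <warning RPM> <current RPM>
    for m in re.finditer(r"^FAN\s+\d+\s+(.+?)\s+Enabled\s+\d+\s+(\d+)", env_text, re.M):
        fans[m.group(1)] = int(m.group(2))
    return fans

stock, modified = fan_rpm(STOCK_FANS), fan_rpm(MODIFIED_FANS)
missing = sorted(set(stock) - set(modified))
print("Missing fans:", missing)  # prints Missing fans: ['EXHST 2']
```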

Section 4.2: Initial Supplementary Cooling Attempts

1583086434477.png

Back to the expansion chassis. Elf's comment is correct: the baseboard itself provides multiple 12 volt headers to play around with, and one of them is conveniently located right next to where the processor board mounts. It is not a standard PC fan header. Plugging a normal 3-pin PC fan into it will fry said fan to smithereens.

1583086866962.png


Pin 2 on the SGI header varied between about +3V and +4.2V during my testing. Not wanting to mess around with speed control, I re-pinned an extension cable to ignore pin 2 and provide the correct +12V and ground to a few standard 40mm Noctua fans. My first idea was to push air alongside the VRM heat sink with one fan, and have one or two more fans push air through the CPU cage, in the hope that doing so would eliminate the need for the beige CPU shroud. The exhaust fans at the back are fairly powerful, and in theory just one should be able to vent everything to a reasonable level.

I ran the extension cable and one 40mm fan over to the VRM, temporarily tied it in place, and checked the heat on the VRM with an IR thermometer.

1583087169338.png

Seems to work. I forgot to log VRM temperatures as it warmed up and reached equilibrium, but the good old touch test confirms that the fan is making a notable difference.

Unfortunately, placing two 40mm fans flush against the front of the CPU cage was not enough to make a tangible difference in cooling the Bedrock and the chips themselves. I had to stop testing as the chips came close to shutoff range: my next post will cover further attempts at a more permanent cooling solution.

About us

  • Silicon Graphics User Group (SGUG) is a community for users, developers, and admirers of Silicon Graphics (SGI) products. We aim to be a friendly hobbyist community for discussing all aspects of SGIs, including use, software development, the IRIX Operating System, and troubleshooting, as well as facilitating hardware exchange.
