Origin 350 Cluster Mixing Bricks from 2 different Systems....

asmay

New member
Aug 13, 2020
Hey all, don't mean to hijack this thread, but I thought folks on here might be the closest to an answer. I have 2 older systems: L2 controller, NUMAlink router and 6 bricks (base I/O and MPX modules), all Origin 350s of the same vintage as this thread. Does anyone know of, or has anyone found, a hack to mix bricks from 2 different systems with different serial numbers and still be able to use all CPUs across the cluster? My understanding is that SGI implemented a hardware check: if you remove one of the bricks from the cluster (let's say the base I/O) and replace it with a brick from a different stack, the check catches it and the system doesn't work. And in fact it doesn't work: /var/adm/SYSLOG shows only the boot brick's CPUs being used, with all the others (5 bricks) throwing a bunch of CPU exception errors and being de-allocated. If anyone has any ideas on how to hack/use bricks from different systems (serial numbers), that would be awesome. I am trying to keep a critical system going and the 20-year-old parts are failing one at a time... thanks a lot in advance for any help.
 

weblacky

Active member
Jan 13, 2020
Seattle, WA
I’ve only seen posts about this and I don’t own any systems to try this myself. But the easiest way to do what you’re asking is to pull the RTC chips from the bricks you want to add to an existing cluster, buy new blank ones and insert them in their place, hook all the bricks up to your L2, apply power, and then manage the serial numbers from the L2.

Look up the L2 documentation, but there should be a command to re-enumerate all attached systems. It should just give serials to all units that don’t have a serial number.

The serial numbers are stored in the RTC chips, in the NVRAM. With blank chips you’ll get errors saying the brick has no serial number at all; then you use the L2 to establish serial numbers on the “new” bricks.

Also, I was under the impression that there can only be a certain number of compute bricks for every main brick. But I also believe you can have multiple groups under a single L2?

The reason I talk about the RTC is that if things go sideways and you want to go back to the way things were, you simply re-insert the old RTCs into the bricks and they’ll have their old serial numbers and act like they did before. (That assumes the RTC batteries aren’t dead; if they are, the chips will wipe immediately on power-off. But you knew that already.) Just don’t mix up which RTCs go to which bricks.

I’d recommend you try “adding” bricks to an existing setup (like you want). Research the L2 commands first, then get new RTC chips off eBay or wherever for the bricks you want to add. Replace those chips and hook the bricks up to the existing cluster. Try booting the entire cluster and watch the L1s of the “new” nodes. They should complain about not having an SSN. Then go to the L2 and issue the command to reassign all serials. I don’t think existing bricks will be affected; I think it will assign new serials in the existing cluster’s range to cover the “new” bricks.
That’s all I really know. You can find more in old Nekochan postings about O350 serial hacking; there are still some remnants left on gainos.
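
For anyone who wants the expected behavior spelled out, here is a toy Python model of the idea above. It is only a sketch of what the post assumes, not an emulation of the L2 firmware and not based on SGI documentation: the `Brick` class, the function names and the serial strings are all made up for illustration, and it assumes a brick with a blank RTC reports no serial and that every brick in a working cluster has to carry the same system serial.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Brick:
    name: str                     # hypothetical label for the brick
    serial: Optional[str] = None  # None models a blank RTC/NVRAM (no serial)


def mismatched(bricks: List[Brick], cluster_serial: str) -> List[str]:
    """Names of bricks whose serial is missing or differs from the cluster's."""
    return [b.name for b in bricks if b.serial != cluster_serial]


def assign_missing_serials(bricks: List[Brick], cluster_serial: str) -> List[str]:
    """Give the cluster serial to bricks that have none.

    Bricks that already carry a serial (matching or not) are left untouched,
    which mirrors the expectation that existing bricks are not affected.
    Returns the names of the bricks that were changed.
    """
    changed = []
    for brick in bricks:
        if brick.serial is None:
            brick.serial = cluster_serial
            changed.append(brick.name)
    return changed


if __name__ == "__main__":
    # Made-up serial strings; real SSNs will look different.
    cluster = [
        Brick("baseio", "SSN-A"),
        Brick("mpx-1", "SSN-A"),
        Brick("donor-blank-rtc", None),     # donor brick with a new blank RTC
        Brick("donor-old-rtc", "SSN-B"),    # donor brick still on its old RTC
    ]
    print("before:", mismatched(cluster, "SSN-A"))
    print("assigned:", assign_missing_serials(cluster, "SSN-A"))
    print("after:", mismatched(cluster, "SSN-A"))
```

In this model the donor brick that kept its old RTC stays mismatched, which is the reason for swapping in blank chips before trying the reassignment.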
 

asmay

New member
Aug 13, 2020
Thank you so much, at least it's a path... I'll look at the other Nekochan postings as well.
 
