350 MHz O2 CPU Back From The Dead

pierocks

New member
Mar 2, 2021
18
13
3
It's been a while...

I've had a lot of SGI systems in the last 16 years, but I got rid of everything I had left about 7 years ago when I moved from the Midwest to the PNW. I briefly became active again shortly before Nekochan kicked the bucket, and in that brief time I rounded up a couple of O2s (one care of everyone's favorite curmudgeon, Hamei) and a really rad Tandem re-badged Challenge L deskside (that's a different topic, though...). Well, life got in the way for four or five years and my machines sat untouched and unloved. I can't say why I decided to pull my O2s down off the shelf a few days ago, but I did. After kicking around a bit, I decided the best course of action would be to take all the best parts I had and put them into the chassis with the best plastics and leave the rest as spares...

The best CPU I've got on hand is a 350 MHz RM7000, so I installed it into the best R5k motherboard I've got, filled it up with RAM, stuck it into the chassis and...nada. Red LED. The O2 boot diagnostic flowchart says it's a bad CPU. But O2s are finicky, right? I spent a while cleaning contacts, re-seating things, swapping motherboards, CPU risers, you name it. I sometimes got it to boot, but it was usually very unstable. Eventually the "sometimes" became "never", and I assumed the worst. Swapping other CPUs into the same board worked. Maybe I'd just settle for a lousy 180 MHz R5k.

But why quit there? The odd thing was, I was usually able to get it to boot once after cleaning the contacts with alcohol. Not every time, but probably half the time, if I cleaned the CPU connector off real good, waited a bit, then gave it a go, she'd boot. She might even run for a bit, then lock up. Odd indeed. Then I thought, mayhaps...just a little bit of alcohol was hanging out and bridging a marginal contact or broken solder joint between the CPU connector and the daughterboard? See figure A for a visual aid if you've never seen the bottom side of an O2 CPU daughterboard:

IMG_0868.png

(Figure A)

What the hell, I had time during a boring conference call. I installed the CPU riser blocks (see figure B if you have no idea what I'm talking about) into the CPU and did a continuity test on every last pin.

IMG_0870.png

(Figure B)

All seemed continuous (ominous foreshadowing!!!). If it wasn't the connection between the CPU daughterboard and these riser blocks, then surely there was a faulty solder joint somewhere. With no shortage of conference calls at $DAYJOB that I can completely tune out of, I heated up the soldering iron with my trusty SMT tip installed and re-melted every last one of those ~200 contacts. Did it fix the problem? No. Of course not.

So what now? Defeat? Nay! I did what I should have done in the first place: I took a close look at the CPU connector on the daughterboard. The picture above isn't the R7k module (it's one of my R5k modules), but the connectors are identical. The difference was that on the R7k module, maybe 10 or 12 of the little metal fingers inside one of the connectors had lost their spring. They were sitting flush up against the plastic connector housing rather than standing proud like their 190ish other comrades. So I did what anybody in my position would do and raided my wife's sewing kit for a needle and got to work prying them away from the connector housing and back into shape. With the fingers once again standing proud and ready to make a solid connection, I re-installed the CPU and...success! I booted into IRIX (much more quickly than with the R5k) and away I went. Here's what she looks like now:

Code:
bash-4.2# hinv
CPU: QED RM7000 Processor Chip Revision: 3.3
FPU: QED RM7000 Floating Point Coprocessor Revision: 2.0
1 350 MHZ IP32 Processor
Main memory size: 1024 Mbytes
Secondary unified instruction/data cache size: 256 Kbytes on Processor 0
Ternary unified instruction/data cache size: 1 Mbyte on Processor 0
Instruction cache size: 16 Kbytes
Data cache size: 16 Kbytes
FLASH PROM version 4.18
Integral SCSI controller 0: Version ADAPTEC 7880
  Disk drive: unit 1 on SCSI controller 0
  CDROM: unit 4 on SCSI controller 0
Integral SCSI controller 1: Version ADAPTEC 7880
On-board serial ports: tty1
On-board serial ports: tty2
On-board EPP/ECP parallel port
CRM graphics installed
Integral Ethernet: ec0, version 1
Iris Audio Processor: version A3 revision 0
Video: MVP unit 0 version 1.4
AV: AV1 Card version 1, Camera not connected.
Vice: TRE
I'm currently riding high. Now I just need to dig a PS/2 keyboard and mouse out of storage so I can setenv console g and have the full experience...
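
For anyone who wants to compare a few machines without squinting at the full listing, here is a minimal Python sketch that pulls the headline facts out of `hinv` text like the output above. The function name `parse_hinv` and the dictionary keys are made up for illustration, and the patterns are only fitted to the lines shown in this post; other systems or IRIX versions may format things differently.

```python
import re

def parse_hinv(text):
    """Extract CPU model, clock, board type and memory from hinv text output.

    Hypothetical helper: the patterns are fitted to the listing in this
    thread only, so treat unmatched fields as simply missing.
    """
    info = {}

    # e.g. "1 350 MHZ IP32 Processor"
    m = re.search(r"^(\d+)\s+(\d+)\s+MHZ\s+(\S+)\s+Processor", text, re.MULTILINE)
    if m:
        info["cpu_count"] = int(m.group(1))
        info["cpu_mhz"] = int(m.group(2))
        info["board"] = m.group(3)

    # e.g. "CPU: QED RM7000 Processor Chip Revision: 3.3"
    m = re.search(r"^CPU:\s+(.*?)\s+Processor Chip", text, re.MULTILINE)
    if m:
        info["cpu_model"] = m.group(1)

    # e.g. "Main memory size: 1024 Mbytes"
    m = re.search(r"^Main memory size:\s+(\d+)\s+Mbytes", text, re.MULTILINE)
    if m:
        info["memory_mb"] = int(m.group(1))

    return info

# First few lines of the listing above, pasted in as sample input.
sample = """CPU: QED RM7000 Processor Chip Revision: 3.3
FPU: QED RM7000 Floating Point Coprocessor Revision: 2.0
1 350 MHZ IP32 Processor
Main memory size: 1024 Mbytes
"""

print(parse_hinv(sample))
```

Saving the real output with something like `hinv > hinv.txt` and feeding the file contents to the function should work the same way, as long as the lines look like the ones above.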
 

Jacques

Active member
Dec 21, 2019
166
65
28
Somerset, United Kingdom
Hey well done! Nothing quite like getting one of the little toasters up and running. Those 350 MHz RM7000 CPUs aren’t actually that bad at all, certainly more grunt than the 300 MHz RM5200 ones, and you have less noise and more storage options than with an R10k/R12k. And you have 1 GB of RAM, very jealous!! Been trying to find another 256 MB for mine for ages to get it to 512 MB.

I ran my O2 a bit two weeks ago but I suspect mine is starting to show signs of a failing PSU. It turned on fine and then halfway through just powered down. Had to unplug it and let it sit, and only after 2-3 min would it start and keep running!

Well done on the revival though!
 

Elf

Storybook / Retired, ex-staff
Feb 4, 2019
792
252
63
Mountain West (US)
Great troubleshooting work! It is interesting what kind of problems are happening -- physically, electrically -- with machines of this age. I have a similar intermittent failure issue with some Fuel RAM slot connectors and probably need to go through the same sort of exercise.
 

pierocks

Don't congratulate me too heartily on my troubleshooting skills...

While trying to troubleshoot this, I put my system together a little too quickly after spraying down the CPU connectors with contact cleaner (in my defense, it did look like it had all evaporated) and the instant I plugged the power cord into the PSU, something inside the PSU popped. Luckily, I had a second PSU so I could continue, and nothing else got fried, but now I'm on to fixing my (now) spare PSU.

Took me a while to see any fried components, but finally found these guys on the AC side:
IMG_0878.png


Is this the only damage? We shall find out. Kinda surprised the damage happened on the AC side, but I'd say the two EE classes I took in college 16 years ago don't qualify me in any way to speculate.

Now to do the thing we are probably all too familiar with...counting the minutes until the arrival of my Digikey order :ROFLMAO:
 

Elf

Ouch, interesting, you only sprayed down the CPU connectors and nothing inside the power supply? Maybe it was on its way out anyways 😅
 

pierocks

The CPU connector, and the mating connector on the motherboard. Of course there was some overspray...maybe some hiding underneath a component, shorting out 3.3 V or 5 V and GND.

Or like you said...maybe the PSU was already teetering on the edge of oblivion. It had gotten tons of plug/unplug cycles in a short period of time during this exercise as I was constantly swapping out motherboards/CPUs. ¯\_(ツ)_/¯

Edited to add: I was spraying outside, and even though this can of contact cleaner seems to spray very aggressively, I doubt any made its way inside the PSU still sitting in the chassis indoors :ROFLMAO:
 

pierocks

Well…no joy on the power supply. Replaced the fried resistors and plugged it in…and the PSU fuse blew, so clearly something else is going on (maybe a capacitor failed short?). I’ll start another thread specifically about the dead PSU at some point. Don’t have much time this week…
 

Elf

Not entirely surprised I guess; those resistors don't usually go that bad on their own, so the likely cause is something downstream drawing too much current. It could be the line side capacitors, or maybe the switching MOSFETs, or a whole host of other smaller things.
 

pierocks

Yeah, I was hoping that the resistors blew because there was a short somewhere from un-evaporated contact cleaner, but obviously there's more going on here. Either some sort of cascading failure from the aforementioned contact cleaner, or it was just a wild coincidence and something else failed and took it down.

This time the fuse blew rather than the resistors, so I wonder if I ended up buying resistors with a higher power rating than the original spec (since I had to guess at that part...the only marking I could read was the resistance). Or maybe after years of use they had deteriorated enough that the fuse wasn't enough to protect them. ¯\_(ツ)_/¯
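
To put rough numbers on that first guess, here is a small Python sketch comparing the steady current at which a resistor hits its rated dissipation (from P = I²R) with the fuse rating. Every value in it is hypothetical, since the real resistance, wattage and fuse rating aren't given here, and it ignores fuse time-current curves and surge behavior, so it only illustrates the idea.

```python
import math

def max_continuous_current_a(rating_w, resistance_ohm):
    """Current at which the resistor dissipates exactly its rated power.

    From P = I^2 * R, so I = sqrt(P / R).
    """
    return math.sqrt(rating_w / resistance_ohm)

def weakest_link(rating_w, resistance_ohm, fuse_a):
    """Which part is overloaded first as a steady fault current rises.

    Simplified steady-state comparison only: real fuses and resistors
    both have time-dependent overload behavior that is not modeled.
    """
    if max_continuous_current_a(rating_w, resistance_ohm) < fuse_a:
        return "resistor"
    return "fuse"

# All values below are hypothetical, chosen only to show the effect.
RESISTANCE_OHM = 0.5   # assumed resistance
FUSE_A = 4.0           # assumed fuse rating

for rating_w in (1.0, 10.0):  # assumed "original" vs. "replacement" wattage
    limit = max_continuous_current_a(rating_w, RESISTANCE_OHM)
    print(f"{rating_w:>4} W resistor: overloaded above {limit:.2f} A, "
          f"weakest link is the {weakest_link(rating_w, RESISTANCE_OHM, FUSE_A)}")
```

With numbers like these, the lower-wattage part would be overloaded by a current the fuse is happy to carry, while the higher-wattage part would outlast the fuse, which is at least consistent with what happened.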

If this is anything like the rackmount UPS I fixed a while back, it could take me 2 or 3 years of occasionally poking at it to find and fix the issue :ROFLMAO:
 
