SGI Fuel does not power up

jvakon (New member, joined Sep 2, 2020)

Hi,

I recently saved an SGI Fuel from being disposed of. The system does not power up, and I get a whole bunch of ALERT messages when I connect to the L1 port on the system board:

ALERT: NODE EEPROM read error, no acknowledge
ALERT: Error initializing the NODE 0 monitor, no acknowledge
ALERT: Error initializing the NODE 1 monitor, no acknowledge
ALERT: Error initializing the NODE 2 monitor, no acknowledge
ALERT: Error initializing the PIMM monitor, no acknowledge
ALERT: Error initializing the ODYSSEY monitor, no acknowledge
ALERT: Error initializing the BEDROCK monitor, no acknowledge

and so on, complaining about voltages, fan speeds, temperature sensors, etc. Then, after many more lines:

****************************************
controller firmware panic! resetting...
****************************************

IMAGE B: Rev. 1.30.16
[thread ID 300013cc stack]
TR: ffecb6e4 ffec85dc ffedc4c4 ffedc710 ffedc87e ffedaaa0 ffedb41a
TR: ffedb604 ffeb8918 ffeb8a5e ffec41ec ffe805cc ffeb1728 00000000

(if you see this, please email ssh@sgi.com and include
the output from the 'log' command and a description of
what caused the problem)
ALERT: Running Flash image A because image B failed during boot and is probably corrupt.

The same ALERT messages then repeat in semi-random order for flash image A. It does not drop to a prompt at any point.

Does anyone know what might be causing this? I think it's unlikely both flash images are corrupted. Is it a PSU problem?

Thanks for any hints!

John
 

Elf (Storybook / Retired, ex-staff; joined Feb 4, 2019; Mountain West, US)

I am not sure a PSU issue would be my first guess, but at least the power supply is easy to test. If you have a multimeter, try measuring the output of the standby power rail while the L1 controller is booting.

Measure it in DC mode to get the level, and also in AC mode for a very rough estimate of the ripple.
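
As a rough illustration of what the two meter modes report, here is a minimal Python sketch, assuming an idealized true-RMS meter: DC mode shows the average of the rail voltage, and AC mode shows the RMS of what is left once that average is removed. The sample values are made up for illustration, not measured from this Fuel.

```python
import math


def dc_and_ac_rms(samples):
    """Return (mean, RMS of the AC component) for a list of voltage samples.

    Roughly what an idealized true-RMS meter shows in DC and AC mode."""
    if not samples:
        raise ValueError("need at least one sample")
    mean = sum(samples) / len(samples)
    ac_rms = math.sqrt(sum((v - mean) ** 2 for v in samples) / len(samples))
    return mean, ac_rms


# Hypothetical 5VSB rail: 5.10 V DC with a 50 mV-peak sine ripple (assumed values).
samples = [5.10 + 0.05 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
dc, ac = dc_and_ac_rms(samples)
print(f"DC {dc:.3f} V, AC ripple {ac * 1000:.1f} mV RMS")  # DC 5.100 V, AC ripple 35.4 mV RMS
```

For a sine-shaped ripple, peak-to-peak is about 2.83 times the RMS figure (2√2). Many meters' AC ranges are only specified for low frequencies, so switching-supply ripple may read low; that is why this is only a very rough check, and an oscilloscope gives the real picture.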

Also, welcome to the user group :)
 

jvakon

Hi Elf,

Thanks for the suggestion. I measured all the PSU lines. With the P1 Molex connector disconnected from the main board, I found two lines at +5.15V; with the connector plugged into the mainboard, these lines read +4.72V. There was no significant ripple in either case. Is this a case of slight undervoltage?

Thanks again.
 

jvakon

OK, so I remeasured the voltage on the purple line, which should be 5VSB according to the pinouts I could find, and it is +5.15V with the L1 booting. So, probably not an undervolt after all!

John
 

Elf

Hm, I would guess something has gone wrong with the L1 controller or its connection to the rest of the system. Might be worth trying to disassemble things and see if there are any loose connectors or things that need cleaning?

Unfortunately the troubleshooting stage is pretty wide open from here...
 

jvakon

A further update on this misbehaving Fuel. I managed to get to the L1 prompt (actually, the prompt was always there but was being overwritten by ALERT messages), so now I can get more information about the system. The env command produces the following output for the power supply:

Description     State      Warning Limits     Fault Limits       Current
--------------  ---------  -----------------  -----------------  -------
12V             Wait Pwr   10% 10.80/ 13.20   20%  9.60/ 14.40      0.00
12V IO          Wait Pwr   10% 10.80/ 13.20   20%  9.60/ 14.40      0.00
5V              Wait Pwr   10%  4.50/  5.50   20%  4.00/  6.00      0.00
3.3V            Wait Pwr   10%  2.97/  3.63   20%  2.64/  3.96      0.55
2.5V            Wait Pwr   10%  2.25/  2.75   20%  2.00/  3.00      0.05
1.5V            Wait Pwr   10%  1.35/  1.65   20%  1.20/  1.80      0.23
5V AUX          Wait Pwr   10%  4.50/  5.50   20%  4.00/  6.00      3.95
3.3V AUX        Wait Pwr   10%  2.97/  3.63   20%  2.64/  3.96      0.00
PIMM 12V BIAS   Wait Pwr   10% 10.80/ 13.20   20%  9.60/ 14.40      0.00
SRAM            Wait Pwr   10%  2.25/  2.75   20%  2.00/  3.00      0.00
VCPU            Wait Pwr   10%  1.44/  1.76   20%  1.28/  1.92      0.00
PIMM 1.5V       Wait Pwr   10%  1.35/  1.65   20%  1.20/  1.80      0.00
PIMM 3.3V AUX   Wait Pwr   10%  2.97/  3.63   20%  2.64/  3.96      0.00
PIMM 5V AUX     Wait Pwr   10%  4.50/  5.50   20%  4.00/  6.00      0.00

5V AUX is reported as 3.95V, and there are no other live lines. Can anyone comment on whether this may cause the ALERT messages (and lack of power-up) in this Fuel?
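
The env table's limits follow a simple pattern: each warning window is the nominal voltage ±10% and each fault window ±20% (for 5V AUX, 4.50/5.50 and 4.00/6.00). Here is a minimal sketch of checking a reading against those windows, assuming the L1 compares each reading with both windows in this way (its firmware logic isn't documented in this thread); Rail, limits and classify are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    name: str
    warn: tuple   # (low, high) volts, as in the "Warning Limits" column
    fault: tuple  # (low, high) volts, as in the "Fault Limits" column


def limits(nominal, pct):
    """Symmetric window of +/- pct percent around the nominal voltage."""
    return (round(nominal * (1 - pct / 100), 2),
            round(nominal * (1 + pct / 100), 2))


def classify(rail, volts):
    """Assumed check: outside the fault window is FAULT, outside the warning window is WARNING."""
    if not rail.fault[0] <= volts <= rail.fault[1]:
        return "FAULT"
    if not rail.warn[0] <= volts <= rail.warn[1]:
        return "WARNING"
    return "OK"


aux5 = Rail("5V AUX", warn=limits(5.0, 10), fault=limits(5.0, 20))
print(aux5.warn, aux5.fault)   # (4.5, 5.5) (4.0, 6.0)
print(classify(aux5, 3.95))    # FAULT  (the env reading)
print(classify(aux5, 5.15))    # OK     (the multimeter reading)
```

By this reading of the table, the reported 3.95V sits just below the 4.00V fault limit for 5V AUX, while the 5.15V measured with a multimeter sits comfortably inside the warning window.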

Thanks!
John
 

indigofan (Member, joined Jun 8, 2020)

I had an issue with mine; I reseated the fans and voilà, it worked. Not sure if it will solve your issue, but it may be worth a try.
 

jvakon

Thanks for the suggestion, indigofan. No joy from reseating the fan connections, I'm afraid. I also disconnected the HDD and CD-ROM, and even took out the graphics module. No change :(
 

Elf

I would think 5V AUX being that low could cause instability in the L1 controller. It might be worth trying a PSU replacement!

I'd also try testing the PSU's 5V AUX line with the PSU out of the case, under a bit of load (e.g. a resistive load). That should tell you whether something in the system is dragging the standby rail low by nearly shorting it, or whether the PSU itself is at fault. I would suspect the PSU first, but it's always good to check.
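
For the out-of-case load test, a dummy load resistor can be sized with Ohm's law. Here is a minimal sketch, assuming a 1 A test current (the standby rail's rated current isn't given in this thread, so check the PSU label and stay well below it) and the common rule of thumb of choosing a resistor rated for about twice the power it will dissipate; load_resistor is a hypothetical helper.

```python
def load_resistor(volts, amps, derate=2.0):
    """Size a dummy load with Ohm's law.

    Returns (resistance in ohms, watts dissipated, suggested minimum rating in watts)."""
    ohms = volts / amps
    watts = volts * amps          # same as volts**2 / ohms
    return ohms, watts, watts * derate


ohms, watts, rating = load_resistor(5.0, 1.0)   # assumed 1 A test current
print(f"{ohms:.1f} ohm, dissipates {watts:.1f} W, use a >= {rating:.0f} W resistor")
# 5.0 ohm, dissipates 5.0 W, use a >= 10 W resistor
```

For example, a 10 ohm resistor on the same rail draws 0.5 A and dissipates 2.5 W, which is a gentler starting point if the rail's rating is unknown.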
 

jvakon

OK, I did a bit more investigation with the help of a friend more gifted in the electronics department...

With the PSU unplugged from the system, the 5V AUX line gives 5.15V with no load and 5.05V under a resistive load. We also measured the voltage again with the PSU plugged in and the L1 running (by inserting wires into the top of the connector), and that came to 5V too. So we're fairly sure the PSU's 5V AUX line works as it should and that nothing on the motherboard is loading it down. We also tested the PSU's short-circuit protection, and that works fine.

Despite this, the env command still reports 3.7-3.9V in the 5V AUX line. I'm at a loss as to what might be happening electrically!
 

Elf

Hm, the voltage rails are monitored by a Dallas DS1780, which sometimes fails. A failed one could cause the low voltage reading, though I'm not sure it would make the L1 controller panic; then again, I suppose anything is possible when an I2C device sends back data that may not be expected :)

You might try to locate the DS1780, see where standby power is fed to it, and check whether that still reads 5V. You can also try turning environmental monitoring off (env off, I believe) and see if it boots, though that may not fix the issue; without scoping it out, it's hard to say whether the L1 controller still tries to talk to the DS1780 when environmental monitoring is off.

For reference: https://gainos.org/~elf/sgi/nekonomicon/forum/3/16206/1.html
 

jvakon

Hi Elf,

Thanks for linking to the old post. It doesn't look like a quick fix, so it will probably go on the back burner. env off does not fix things, BTW.

Thanks again!

John
 

weblacky (Active member, joined Jan 13, 2020; Seattle, WA)

Hi John,
Having done my own Fuel mainboard repair, I’ve seen something like this. The “no acknowledge” style errors mean that one or more of your ENV monitoring ICs has failed and is shorting the common I2C bus they all use to communicate to ground, which takes out every DS1780 IC’s ability to report in during startup.

I would try one thing before going further: remove your graphics card, reconnect to the L1, and see what changes. You won’t be able to boot, but you may find the number of “no acknowledge” errors drops from that huge list to one or two. If it does, the I2C bus is being shorted by a fault on your graphics card, and you've isolated it. If the number doesn’t change, the fault is on your mainboard or processor module.
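
The isolation logic described above can be sketched as a toy model. It assumes that a single shorted monitor IC stops every transfer on the shared I2C bus from completing, and that the L1 also reports “no acknowledge” for ICs that are simply absent. The monitor names come from the ALERT log, but which board each one lives on is an assumption for illustration, and nacks and locate_fault are hypothetical helpers, not anything in the L1 firmware.

```python
# Toy model only: the board grouping below is assumed, not taken from SGI docs.
MAINBOARD_OR_CPU = ["NODE 0", "PIMM", "BEDROCK"]
GRAPHICS = ["ODYSSEY"]
EXPECTED = MAINBOARD_OR_CPU + GRAPHICS


def nacks(expected, present, shorted):
    """Return the monitors that would be reported as 'no acknowledge'.

    Assumes any shorted IC that is present kills the whole shared bus,
    and that absent ICs also fail to acknowledge."""
    if any(ic in shorted for ic in present):
        return sorted(expected)                   # nothing can answer
    return sorted(set(expected) - set(present))   # only missing ICs fail


def locate_fault(shorted):
    """Compare the NACK list with and without the graphics module installed."""
    before = nacks(EXPECTED, EXPECTED, shorted)
    after = nacks(EXPECTED, MAINBOARD_OR_CPU, shorted)  # graphics removed
    if len(after) < len(before):
        return "graphics"
    return "mainboard or processor module" if before else "no fault"


print(locate_fault({"ODYSSEY"}))   # graphics
print(locate_fault({"BEDROCK"}))   # mainboard or processor module
```

The design choice mirrors the advice: the absolute NACK count matters less than whether it shrinks to just the removed module's monitors once that module is out.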

Over on Irixnet I have a repair ad in the services section for DS1780 replacements (I have the tools to do them easily, with a high success rate).

If you’re in the USA, please consider it. I can’t reasonably tell you which DS1780 has failed without a lot more time, but I can quickly replace all the mainboard ENV ICs with new ones. I can also replace them on the processor and graphics boards, but if either of those has damaged ICs, it means a short on the processor card or graphics board caused that damage (so in those two cases there is more damage than just the ENV ICs).

Try the graphics removal trick and let us know what env reports! Also, a poorly regulated PSU will damage these monitoring ICs, so don’t test for long if you don’t have a clean power source.

Good luck.
 

megaimg (Member, joined Nov 24, 2021)

In my case, the Fuel would power up and then power down after a couple of seconds. If I got lucky, it would stay up after a couple of tries. Watching the boot process, I saw the 12V rail sometimes drop to 10.6V, which tells the monitoring system to power the machine down to prevent damage. For me this was a PSU issue. I posted the process I used to repair the PSU; in my case it meant replacing the capacitors on all the PSU boards (80% of them were already bulging). Now it's working like a champ. I put the procedure and the cap list for my PSU model in my post.
 