NIC programming

Rascal

New member
Nov 14, 2021
20
19
3
Poway, CA
One of my CPU modules has an issue. The computer reports "CPU board NIC diagnostic *FAILED*, Check or replace: CPU module".
One forum member suggested the problem could be the 1-wire chip (DS2505). One could transplant the DS2505 from a working CPU module to solve the issue, but then the serial number reported would not match the one on the CPU module. It makes more sense to figure out a way to program the DS2505 chip.

Programmer:

I found a one-wire programmer from m1l3n on Tindie (my saved copy of the page: file:///C:/Users/Atlas/Documents/Arduino/One%20Wire%20Programmer%20from%20m1l3n%20on%20Tindie.html).
It is designed to work with the DS250X family. It took me some time to learn the code and modify it to work with the DS2505. Later, I found another one-wire programmer on AliExpress. It costs a bit more, but it works with the official OneWireViewer.jar. The AliExpress programmer took many weeks to arrive, so I ended up using the Arduino version for most of my study.

Reading DS2505 from the CPU module

Below is the output of my Arduino code:

R12k 360MHz original chip

In hex:
Page 0: 01 20 20 20 20 48 42 50 35 32 33 20 20 20 20 20 20 20 20 20 20 30 33 30 2D 31 35 39 31 2D 44 6E
Page 1: 20 20 20 30 30 31 20 20 20 42 FF FF FF FF FF 7F 20 20 20 20 50 4D 32 30 33 36 30 4D 48 5A E2 98


Converting the first 30 bytes of page 0 (shown in orange in the original post) to ASCII, we get ' HBP523 030-1591-'.
Repeating the process for the first 30 bytes of page 1 (shown in green), we get ' 001 Bÿÿÿÿÿÿ PM20360MHZ'.

The last 2 bytes of each page are the CRC-16/Maxim of the preceding 30 bytes, stored low byte first (so the trailing bytes 44 6E on page 0 read as 0x6E44). To compute the CRC-16/Maxim value, use this website: https://crccalc.com/.
Enter the input value in HEX, click CRC-16, and read the output value for CRC-16/Maxim.
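The same check can be done offline. Below is a minimal Python sketch, assuming the standard CRC-16/MAXIM parameters (polynomial 0x8005 processed in reflected form as 0xA001, initial value 0, final XOR 0xFFFF). The function names `crc16_maxim` and `page_crc_ok` are made up for this example and are not from the Arduino code used in this thread. The input bytes are the corrected page 0 data, for which the post reports 0x6E44.

```python
def crc16_maxim(data: bytes) -> int:
    """CRC-16/MAXIM: reflected polynomial 0xA001, init 0x0000, final XOR 0xFFFF."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial when a 1 falls off.
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


def page_crc_ok(page: bytes) -> bool:
    """Check a 32-byte page: 30 data bytes, then the CRC stored low byte first."""
    stored = page[30] | (page[31] << 8)
    return crc16_maxim(page[:30]) == stored


# The 30 data bytes of page 0 with the label text 'KFP523' (from the post).
corrected = bytes.fromhex(
    "01 20 20 20 20 4B 46 50 35 32 33 20 20 20 20 20 "
    "20 20 20 20 20 30 33 30 2D 31 35 39 31 2D"
)
print(f"CRC-16/MAXIM = 0x{crc16_maxim(corrected):04X}")
```

Passing a full 32-byte page read from the chip to `page_crc_ok` tells you whether the data and the stored CRC agree.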


If we take the first 30 bytes of page 0 and compute their CRC-16/Maxim value, we get 0xF0C9, which does not match the stored 0x6E44.

My CPU module has a label reading '030-1591-001 Rev B KFP523'. Clearly the chip contents are wrong: the chip reads HBP523 where the label says KFP523. The first 2 letters are corrupted.

01 20 20 20 20 4B 46 50 35 32 33 20 20 20 20 20 20 20 20 20 20 30 33 30 2D 31 35 39 31 2D

Converting the above bytes to ASCII, we get ' KFP523 030-1591-', and their CRC-16/Maxim value is 0x6E44, which matches the CRC stored on the chip.
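One way to look at the corruption more closely is to compare the label text with the text read back, bit by bit. The sketch below is an observation added here, not something from the original analysis, and the helper name `compare` is hypothetical. It lists each differing byte and whether the change consists only of bits going from 1 to 0.

```python
def compare(expected: bytes, observed: bytes) -> list:
    """Return (offset, expected, observed, only_cleared) for each differing byte."""
    diffs = []
    for offset, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            # True when every 1 bit in the observed byte is also 1 in the expected byte,
            # i.e. the only changes are bits that went from 1 to 0.
            only_cleared = (e & o) == o
            diffs.append((offset, e, o, only_cleared))
    return diffs


label = b"KFP523"      # text on the module label
read_back = b"HBP523"  # text read from the chip

for offset, e, o, only_cleared in compare(label, read_back):
    print(f"offset {offset}: expected 0x{e:02X}, read 0x{o:02X}, "
          f"changed bits 0x{e ^ o:02X}, only 1->0: {only_cleared}")
```

Both differing letters ('K' read as 'H', 'F' read as 'B') differ only by bits that dropped from 1 to 0, which is the direction in which EPROM bits get programmed.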

Note (1) 'ÿ' in page 1 is just 0xFF, which means the byte has not been programmed (the default value).

Note (2) There is other data outside 'main' memory, with the following contents:

write protect pages (0H)
FF FF FF FF FF FF FF FF

write protect redirection (20H)
FF FF FF FF FF FF FF FF

bitmap of used pages for file structure (40H)
FC FF FF FF FF FF FF FF

page redirection bytes (100H)
all FF
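The used-pages bitmap can be decoded with a few lines of Python. This is a sketch under two assumptions that should be checked against the DS2505 datasheet: a cleared bit (0) marks a page as used, and bit 0 of the first byte corresponds to page 0. The function name `used_pages` is made up for this example.

```python
def used_pages(bitmap: bytes) -> list:
    """List page numbers whose bitmap bit is 0 (assumed to mean 'used')."""
    pages = []
    for byte_index, value in enumerate(bitmap):
        for bit in range(8):
            if not (value >> bit) & 1:
                pages.append(byte_index * 8 + bit)
    return pages


# Bitmap bytes as read from the chip at 40H.
bitmap = bytes.fromhex("FC FF FF FF FF FF FF FF")
print(used_pages(bitmap))  # [0, 1]
```

Under those assumptions, FC marks pages 0 and 1 as used, which agrees with the two pages of data found in main memory.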


To make sure my understanding is correct, I also read a working CPU module. Below is what it looks like:

R12k 400MHz, single cpu module

030-1475-002 Rev B KWX142

main memory
01 20 20 20 20 4B 57 58 31 34 32 20 20 20 20 20 (0H)
20 20 20 20 20 30 33 30 2D 31 34 37 35 2D CF 73 CRC 0x73CF ' KWX142 030-1475-'
20 20 20 30 30 32 20 20 20 42 FF FF FF FF FF FF (20H)
20 20 20 20 50 4D 31 30 34 30 30 4D 48 5A 54 11 CRC 0x1154 ' 002 Bÿÿÿÿÿÿ PM10400MHZ'

The remaining pages are all FF.
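The page layout described above (30 data bytes followed by a 2-byte CRC, low byte first) can be unpacked as follows. This is only an illustrative sketch; `parse_page` is a hypothetical helper, and Latin-1 decoding is chosen so that 0xFF shows up as 'ÿ', as in the listings.

```python
def parse_page(page: bytes):
    """Split a 32-byte page into its text (30 bytes) and stored CRC (low byte first)."""
    if len(page) != 32:
        raise ValueError("expected a 32-byte page")
    text = page[:30].decode("latin-1")           # 0xFF decodes to 'ÿ'
    stored_crc = int.from_bytes(page[30:], "little")
    return text, stored_crc


# Pages 0 and 1 of the working R12k 400MHz module, as listed above.
page0 = bytes.fromhex(
    "01 20 20 20 20 4B 57 58 31 34 32 20 20 20 20 20 "
    "20 20 20 20 20 30 33 30 2D 31 34 37 35 2D CF 73"
)
page1 = bytes.fromhex(
    "20 20 20 30 30 32 20 20 20 42 FF FF FF FF FF FF "
    "20 20 20 20 50 4D 31 30 34 30 30 4D 48 5A 54 11"
)

for page in (page0, page1):
    text, crc = parse_page(page)
    print(repr(text), f"stored CRC 0x{crc:04X}")
```

The stored CRC values come out as 0x73CF and 0x1154, matching the annotations in the listing.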

write protect pages (0H)
FF FF FF FF FF FF FF FF

write protect redirection (20H)
FF FF FF FF FF FF FF FF

bitmap of used pages for file structure (40H)
FC FF FF FF FF FF FF FF

page redirection bytes (100H)
all FF



Programming the DS2505

Once I figured out how it works, I wanted to burn a new chip. Unfortunately, there is a chip shortage. I ordered several chips from 2 different eBay sellers, and all the chips I received were used. Since the DS2505 is a one-time-programmable EPROM, not an erasable one, they are all useless. (Maybe there is a way to use them, but I don't know how.) I contacted the sellers and got my refunds: 2 months wasted. Why did I order chips from eBay sellers? Because I couldn't find any at DigiKey or Mouser. In the end, I placed an order with DigiKey, and it was on back-order for 6+ months. Today I received the 10 chips I ordered and immediately started programming them.
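On the open question of reusing a used chip: as a general property of EPROM, programming can only change bits from 1 to 0. A used page could therefore only take new data if every 1 bit in the new data is still 1 on the chip. The sketch below checks that necessary condition. It is not a tested procedure; `can_program` is a hypothetical helper, and the check ignores the write-protect and redirection bytes.

```python
def can_program(current: bytes, target: bytes) -> bool:
    """True if target can be reached from current by only clearing bits (1 -> 0)."""
    if len(current) != len(target):
        return False
    # Every 1 bit wanted in the target must still be 1 in the current contents.
    return all((c & t) == t for c, t in zip(current, target))


blank_page = bytes([0xFF] * 32)   # an unprogrammed page
wanted = b"KFP523".ljust(32)      # illustrative target data, padded with spaces

print(can_program(blank_page, wanted))  # True: a blank page accepts any data
print(can_program(b"H", b"K"))          # False: 0x48 cannot become 0x4B
print(can_program(b"K", b"H"))          # True: 0x4B can be turned into 0x48
```

In practice the full 32-byte page, including its two CRC bytes, would have to pass this check, which makes a used page holding unrelated data very unlikely to be reusable.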

Programming the DS2505 is not difficult if you have the programmer. I was able to burn the chip with the correct info, but that did not solve my CPU issue. I will work on this once I have more time. In the meantime, I would like to share my findings. Hopefully, this is useful for someone.
 

Elf

Storybook / Retired, ex-staff
Feb 4, 2019
792
252
63
Mountain West (US)
Thanks for sharing! I have similarly worked with some 1wire things in the past for custom electronics projects. It used to be that there were a few relatively cheap 1wire interface tools direct from Maxim as well; you don't need a programming tool per se, just something that allows you to speak 1wire. Unfortunate that the chips are so unavailable, they used to be plentiful!
 

weblacky

Active member
Jan 13, 2020
181
45
28
Seattle, WA
Yo Rascal!

We talked over both eBay and a few other channels about your project when you got going (I pointed you to Ian’s info on the PROM CPU ID needs when you originally posted your CPU for sale as parts).
I didn’t remember that you’d hit another roadblock. If I recall, back then I questioned the lines feeding power to the DS2505 on the CPU board, in that maybe your newly programmed chip is getting shorted or otherwise damaged by the CPU board. Something ruined the original DS2505, and my thought was that the same condition may still be present on the CPU board.

I think I recommended a multimeter diode test from the DS2505 pins to board ground, comparing readings between the good and bad CPU boards.

Likewise, if that doesn’t yield anything, remove both chips and then do both a resistance (ohms) test and a diode test to ground on the resulting pads/footprint on the PCB where the chips were removed, on both boards.

Either/both tests shouldn’t harm the boards and hopefully would give you a path to follow.

Keep us apprised.
 

Rascal

Thanks for the suggestion. Now I have a direction I can work on.
 

weblacky

Also, given how long it’s been, what was the “new issue” on the CPU after you installed a supposedly correct/working DS2505?

Did you get the SAME error you got before or was it different?

Assuming you find nothing with the above test suggestions I made, I’d ask what really is on an Octane CPU module (under the huge heatsink)?

The reason I ask is that I have worked on a Fuel PIMM (SGI Fuel CPU Module) and it had an onboard voltage regulation circuit/module built into the main CPU PCB.

This concept MAY or MAY NOT be relevant to an Octane. But Octane and older platforms have no official monitoring system for system-wide voltages like the later platforms do.

So without knowing all the possible error codes the PROM might output, it’s unknown (to me) whether it would separately throw an error for voltage onboard the CPU module. A slightly high voltage would also cause minor damage to other components using the same rail.

It’s possible the DS2505 uses that rail and hence became damaged as a side effect.

I’m not basing this on any symptom you’ve mentioned.
 

Rascal

Hi weblacky,

I am no longer getting that "NIC" warning. The computer powers up the moment I plug it in. No front panel lights. The fan is on.
Based on the troubleshooting guide, the problem could be (1) system module not seated, (2) CPU failure, or (3) frontplane failure.

I opened up the CPU module and took measurements. Both modules (working vs. non-working) read very similarly. I did notice the voltage regulators on either side of the board. I plan to de-solder them to do some tests. It is probably easier to just replace them if I can find the parts.

Right now I have 2 CPU modules that behave the same (R12K 360 dual and R10k 250 dual). Hopefully, both modules have the same issue.

As always... I really appreciate your help.
 

Rascal

This is the R10k 250MHz dual. The actual CPUs are attached (or stuck) to the heatsink. The DS2505 is in the middle, below that silver oscillator.

61044FC0-165E-4588-A76A-9524F155F9C4.jpeg



This picture shows both the R12k 360MHz dual (top) and the R10k 250MHz dual (bottom). The boards are similar but not the same.
930AB1E8-57EC-4D8D-80CB-A69680F0EF59.jpeg

The blue circled area is the DS2505 chip. Only pins 1 (near the white dot) and 2 are in use. The voltage regulator is next to the red wire. There is another VR on the other side of the board. The red wire is my attempt to fix a trace.
2285EA9E-DF20-4C5B-81AE-F86403BF0903.jpeg

CPUs unstuck from the heatsink.
7F819E04-828F-44EC-A55B-41DA03C74EC5.jpeg
 

Rascal

During my initial study, I found these two test points (TPs). We can use them to read the DS2505 without opening up the CPU module. However, I was not able to program the chip through these 2 points, probably due to voltage drop.
3A77E249-772D-497B-85F2-69A8B305FC80.jpeg
 

weblacky

Neat, would you mind taking additional pictures and posting them that show the small (long) daughter card components? I'd like to see them as well.

Also anything on the underside besides the CPU - to - PCB interface connector?
 

Rascal

The first three pictures are of the R12k 360MHz dual.

DF8F20EE-BA69-4C95-A738-B1BB7B4EA96D.jpeg
BF7E84CD-3EF7-494F-B111-5B9B6B15DA00.jpeg
476A7E4F-4CF7-433A-8B89-3B8022DB79DA.jpeg

The next three are of the R10k 250MHz dual.
539AA7DF-A886-4570-862E-BA548552B915.jpeg
FE23AA2C-D3C7-4957-AA1A-4983E16FE847.jpeg
FA2218DF-3667-441F-AECC-5EC625AF2354.jpeg
 

weblacky

I believe I'm out of my depth on this. None of these components look like a true VRM. Yes there's one or two voltage regulation modules but nothing with enough power to actually directly run the CPUs. The daughter cards are kind of strange, I can't tell if they are filtering some sort of interface or perhaps they contain the reset/power-on logic for the CPU?

Your description of the new problem seems kind of odd.

I'm a little confused so perhaps you can clarify something for me. You say you have two CPUs where this happens. Does this mean you have a CPU where everything works fine?

That is to say does the system actually fully function when you replace any of these items? Like the motherboard or the CPU or the power supply?

The immediate power-on really bothers me. No one has done an investigation of the startup sequence for an Octane power supply, but I assume that it's a system where the main board or some part of the frontplane has to actually trigger the power supply "on", versus something like an Indy where the power supply is held "off" by the mainboard by default.

The fact that it auto-starts would lead me to believe that the logic that controls the power supply on/off function is shorted or not working?

This is why I'm asking about the CPU specifically. Your description sort of left it ambiguous enough where maybe there's such a thing as a "working CPU" in this case? Could you please clarify a little to correct my misunderstanding?
 

Elf

For what it's worth, that board with the Pulse transformer does look like a high current DC-DC buck converter to me. You can find things in that form factor that will do a good 40A on the output.

I wouldn't expect any voltage regulation to be the root cause of something immediately powering on but being unresponsive though. As weblacky was saying above at least on the earlier systems there is usually some more simple logic that controls the power on sequence as the CPUs are not powered up yet and there is not enough standby power to run them. There is usually an input into this so that a bit in some sort of (external) register can be flipped, in software, to power the system off again, but the button to power on sequence is usually just simple logic. It is only with the advent of the L1 controllers that I would expect things to get any more complex than that, and still not involving the main system CPU.
 

weblacky

Hi Elf, is what I’m mentally missing under the black epoxy then? I don’t recognize this because I don’t see any switching FETs. Can you explain how this is a DC-to-DC converter? It’s missing the visual landmarks I normally associate with a DC converter.

What’s the theory of operation here?

The power-up thing is why I asked if there was any success with any part swapping.
 

Elf

weblacky said: Hi Elf, is what I’m mentally missing under the black epoxy then? I don’t recognize this because I don’t see any switching FETs. Can you explain how this is a DC-to-DC converter? It’s missing the visual landmarks I normally associate with a DC converter.
There are a few things about it that would immediately say DC-DC converter, specifically high efficiency synchronous buck converter:
  • Large banks of capacitance in high value MLCCs right next to magnetics (big giveaway, not a lot of other things that would do this)
  • General layout of the module, power oriented with a bunch of ganged pins dropping out the bottom
  • The physical arrangement of what is presumably the switching MOSFETs between the magnetics and the capacitors
The only thing really unusual about it is that the switching and rectification MOSFETs are die bonded underneath an epoxy blob rather than packaged surface mount components. Given it was an earlier design and the lack of any apparent heat sink I am wondering if they just wanted to die bond it for thermal reasons so that the board becomes the heat sink.

In any case this style of module and that topology is actually quite common, for low currents all the way up to 40+ amps, e.g.:
1658078257685.png
1658078328784.png
1658078412238.png


weblacky said: What’s the theory of operation here?
Given the relative lack of heat sinking and presumably high current, synchronous buck converter.
 

Rascal

Hi All,

Good news! I fixed one of the CPU modules: the R10K 250MHz dual.
This one's DS2505 is correct, so I didn't need to do anything to it.
The symptom for this module: the computer powers on, there are no front panel lights, and the fan is on.
I de-soldered the voltage regulator (VR), thinking it was bad. The model number of this VR is LT1585 CT.
Using the data sheet, I determined it is not broken.
Next, I used the diagram to verify the connections on the PCB.
I quickly found the voltage divider and the capacitor (R1, R2, C4 in the schematic).
On the back side of the board there is another VR, and it is wired as in the diagram.
However, the VR on the top seems to have a broken connection.
So I added a jumper wire between pin 1 of the VR and the voltage divider.
LT1585CT.png
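For context on why the divider connection matters: on an adjustable linear regulator, the R1/R2 divider on the adjust pin sets the output voltage. The sketch below uses the usual adjustable-regulator relation Vout = Vref * (1 + R2/R1) + Iadj * R2. The 1.25 V reference is the nominal figure for this regulator family and should be checked against the data sheet. The resistor values are hypothetical, not measured from the board, and `adjustable_vout` is a made-up name.

```python
def adjustable_vout(r1_ohms: float, r2_ohms: float,
                    v_ref: float = 1.25, i_adj: float = 0.0) -> float:
    """Output voltage of an adjustable regulator set by the R1/R2 divider.

    r1_ohms: resistor between the output and the adjust pin
    r2_ohms: resistor between the adjust pin and ground
    v_ref:   reference voltage (assumed nominal 1.25 V)
    i_adj:   adjust-pin current in amps (small, ignored by default)
    """
    return v_ref * (1 + r2_ohms / r1_ohms) + i_adj * r2_ohms


# Hypothetical divider values, chosen only to illustrate the formula.
print(adjustable_vout(r1_ohms=100.0, r2_ohms=100.0))  # 2.5
```

With the adjust pin disconnected from the divider, the regulator no longer has this feedback, so its output is not held at the intended voltage.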


The red line below depicts the connection.
R10K250MHz_topview2.png


That fixed it. The computer boots up and sees both CPUs.

Now I have to figure out what is wrong with my R12k 360MHz dual.
 
