We have a server with a RAID array, running Windows XP Professional, which is
installed on a separate hard drive. The problem we face is that once every
3-4 days it halts with a single-line blue screen error message:
System halted due to hardware failure.
and the entire system hangs. We checked the OS event log, but it gives no
clue. Is this related to a RAM failure or some other hardware failure? The
MS support documents don't mention anything about this error message either.
If anyone can shed some light on this issue, it will be greatly appreciated.

Some configuration notes:
1. The server is running normal applications only.

Hardware Details
1. Intel Entry Server Board SE7230NH1-E
2. Intel Pentium D pro processor
3. 4 x Transcend 1 GB DDR2 RAM modules
4. Seagate Barracuda 80 GB HDD (which holds the OS).
5. MSI graphics card
6. 3ware RAID controller card.
7. 4 x Hitachi Deskstar SATA 500 GB drives (for the RAID array).

I can provide more details if needed.

System halted due to hardware failure... by Oscar

Oscar
Wed Oct 22 00:37:37 PDT 2008

Hi,
I read your post and I have to say that I have run into the same problem.
It appears to be a memory error; in fact, we found that one of the DIMMs was faulty, but after we fixed it the problem came back. So I don't know what it is.

The system is a Dell Precision T7400 workstation with 32 GB RAM and two quad-core Xeon 3.2 GHz processors.

Did you figure out what the problem is?

I hope you did and can tell me...

Thank you
Oscar

Re: System halted due to hardware failure... by Paul

Paul
Wed Oct 22 07:37:56 PDT 2008

Oscar Cainelli wrote:
> Hi,
> I read your post and I have to say that I found the same problem.
> Apparently it seems a memory error, in fact we found one of the Dimm broken but after we fixed it the problem came out again. So I don't know what it is.
>
> The system is a Dell Workstation Precision T7400 with 32 Gb Ram and 2 Xeon quadcore 3.2 Ghz.
>
> Did you figure out what is the problem?
>
> hope you did and can tell me...
>
> Thank you
> Oscar

http://groups.google.ca/group/microsoft.public.windowsxp.hardware/browse_frm/thread/12205373db700a76/d02251f7e0c7a27b

Your board is based on the Intel 5400 chipset and uses FBDIMMs.

http://support.dell.com/support/edocs/systems/wsT7400/en/UG/html/paris320.jpg
http://support.dell.com/support/edocs/systems/wsT7400/en/UG/html/parts.htm#wp1658505
http://support.dell.com/support/edocs/systems/wsT7400/en/UG/html/about.htm#wp1224499

If you're using just the motherboard memory slots, and don't have
the riser cards, your wiring looks like this. (The riser assembly
would fit into the four slots marked X in my diagram here. And
I don't understand how that can possibly work, but that is a
puzzle for another time.)

      branch0___/Channel0 -----X0--Y0
5400            \Channel1 -----X1--Y1
      branch1___/Channel2 -----X2--Y2
                \Channel3 -----X3--Y3

If using just the motherboard slots, install the DIMMs in pairs,
filling one branch at a time and populating the X slots first.
The X slots are white in color on the Dell web page, and would
be the ones nearest the Northbridge heatsink.
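
As a rough sketch of that fill order, here is a small Python helper. It
assumes one reading of the rule above (matched pairs, branch 0 completed
before branch 1, X slots before Y slots). The slot names come from the
diagram above; the function name fill_order is made up for illustration
and the order is not taken from Dell's documentation.

```python
# Hypothetical helper: slot names follow the diagram above, not Dell's labels.
def fill_order(num_pairs):
    """Return the slot pairs to populate, in order, for num_pairs DIMM pairs."""
    order = [
        ("X0", "X1"),  # branch 0, X (white) slots first
        ("Y0", "Y1"),  # branch 0, Y (black) slots next
        ("X2", "X3"),  # branch 1, X slots
        ("Y2", "Y3"),  # branch 1, Y slots
    ]
    if not 0 <= num_pairs <= len(order):
        raise ValueError("expected between 0 and 4 pairs")
    return order[:num_pairs]


for pair in fill_order(2):
    print(pair)
```

Check the slot numbering in your own user manual before relying on this
order.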

FBDIMMs are serially connected. X0 and Y0 would be the two
leftmost DIMMs in the following picture. To reach Y0, the packet
goes through the AMB (Advanced Memory Buffer) on X0, and on
towards Y0. That means a problem in the output pad of the X0 DIMM
could affect the behavior of the Y0 DIMM.

http://www.xbitlabs.com/images/cpu/intel-skulltrail/fbdimm.png
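
To make the serial dependency concrete, here is a minimal Python sketch
that treats one channel as an ordered chain of slots. It assumes a fault
on a DIMM's outgoing link only disturbs the DIMMs further down the same
channel; the function name affected_slots is hypothetical, and real fault
behavior on this platform may differ.

```python
# Hypothetical model of one FBDIMM channel as a daisy chain of slots.
def affected_slots(chain, bad_output_slot):
    """Return the slots downstream of a DIMM whose output link is faulty.

    chain is ordered from the memory controller outwards, e.g. ["X0", "Y0"].
    """
    idx = chain.index(bad_output_slot)  # raises ValueError if not in chain
    return chain[idx + 1:]


print(affected_slots(["X0", "Y0"], "X0"))
print(affected_slots(["X0", "Y0"], "Y0"))
```

In this model a bad output on X0 puts Y0 at risk, while a bad output on
the last DIMM in the chain affects nothing behind it.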

In short, you have more memory configurations to test than
with conventional memory. You start simple and work up.
Test each of the two pairs of memory on its own in this
configuration first, to prove the memory chips are working.

                              white  black
      branch0___/Channel0 -----X0
5400            \Channel1 -----X1
      branch1___/Channel2 -----
                \Channel3 -----

Next, install both pairs, using corresponding white
and black slots (check the slot numbering for a hint
of which ones to use).

                              white  black
      branch0___/Channel0 -----X0-----Y0
5400            \Channel1 -----X1-----Y1
      branch1___/Channel2 -----
                \Channel3 -----

That configuration proves the outputs of X0 and X1
are OK. Now swap the two pairs, so the pair that was in
the white slots goes into the black slots and vice versa,
and repeat the test. That would prove the outputs
of the second set of modules are OK.

If the technology can report whether an error is on
the packet bus or in the memory array itself, that
could reduce the number of test cases necessary.

On an Asus motherboard that uses the 5400, they allow
a single DIMM to be used in a designated slot (perhaps
only X0 in my diagrams above). While most of the time,
you deal with the memory in pairs of DIMMs, it is possible
your user manual may mention which slot will work with
only a single DIMM installed.

For more information, see "Intel 5400 Chipset Memory Controller Hub (MCH) Datasheet".
Figure 5-4 on page 296 and Figure 5-5 on page 297 show
some details of what is involved. I have no first-hand
knowledge of this platform, so I cannot tell
you whether it gives any fault isolation information
to work with or not. In any case, the serial interconnect
between DIMMs on the same channel means there are
potentially more fault modes to consider when
constructing test cases. It is more complicated
than the old registered DIMMs.

http://www.intel.com/Products/Server/Chipsets/5400/5400-technicaldocuments.htm

http://www.intel.com/Assets/PDF/datasheet/318610.pdf

Paul