[Beowulf] Re: UPS & power supply instability

David Kewley kewley at gps.caltech.edu
Thu Sep 29 16:39:09 EDT 2005


On Thursday 29 September 2005 11:50, Robert G. Brown wrote:
> David Kewley writes:
> >> This is a classic symptom of cheap, poorly designed and made power
> >> supplies. Or bad room wiring, with undersized neutral lines.

I blew right on by Maurice's mention of undersized neutral lines, because I 
believe they were designed properly to avoid overheating.  But I'm not 100% 
sure that's true, especially considering harmonics.  I need to check with 
experts here.  Even more, your detailed description of harmonics below 
makes me wonder about undersized neutrals in the sense of neutrals whose IR 
drops will induce significant voltage fluctuations on the 120V signal seen 
by each power supply.

> > The PDUs have a front panel that displays lots of diagnostic
> > measurements, and they sound a rather piercing alarm when any
> > measurement goes over its Liebert-defined limit (they are the only
> > alarms I've heard in that room that can reliably be heard over the room
> > noise, from any part of the room :).  The PDUs also have suitably sized
> > breakers and suitably sized conductors on each of the 93 branch
> > circuits.
> >
> > The three output phase currents all stay well under their limits, even
> > when they begin to become unstable (at the low-power end of the
> > instability, and well into the instability domain).  Toward the
> > high-power end of the instability domain that we've tested, the current
> > oscillations become large enough, and sit on top of a large enough
> > average current, so the PDUs *do* give overcurrent alarms (plus other
> > alarms due to the wild oscillations).
> >
> > Unless something is going on that is not alarmed for, the PDUs and the
> > Liebert techs who've been onsite don't indicate any problem with the
> > neutral wiring or the power supplies per se.
>
> I don't understand.  Didn't you say in the original post that under
> balanced load there was a substantial neutral line current?
>
> To make things perfectly clear -- with a balanced load and PF 1 and no
> harmonic distortion -- the neutral line current should perfectly cancel.
> Even with small variations -- the same number of nodes turned "on" on
> each phase but maybe not in identical state -- you should cancel to a
> few per cent.

The load is *almost* as balanced as I can get it (I intend to work on it 
some more still), but it is not perfectly balanced.  I'm not at the PDUs at 
the moment to read the front panel, but as best I recall, the three phases' 
currents are within 10% or so when all the computers are idle.

There is not zero THD.  As I recall, the PDUs report that the voltage THD is 
around 5% and the current THD is around 15% for the three phases, with a 
little variation among the phases.

The PDUs report the PF as 1.00 or 0.99.

Even with all this, the neutral current reported by the PDUs is about 30-50% 
of any one phase current.
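
As a rough sanity check on those numbers, here is a minimal sketch (not a 
measurement) of how third-harmonic current alone can put current on a shared 
neutral even when the three phases are perfectly balanced.  It assumes, 
purely for illustration, that each phase carries the same fundamental plus 
an in-phase third harmonic, and that all of the reported ~15% current THD is 
third harmonic; the function name and sample count are made up for this 
example.

```python
import math

def neutral_to_phase_ratio(third_harmonic_fraction, samples=3600):
    """RMS neutral current / RMS phase current for three identical phase
    currents, each a fundamental plus an in-phase third harmonic.
    (Illustrative model only; real supplies draw more complex waveforms.)"""
    shifts = (0.0, 2 * math.pi / 3, 4 * math.pi / 3)  # phases A, B, C
    sum_neutral_sq = 0.0
    sum_phase_sq = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples  # one full 60 Hz cycle
        phase_currents = [
            math.sin(theta - s) + third_harmonic_fraction * math.sin(3 * (theta - s))
            for s in shifts
        ]
        neutral = sum(phase_currents)  # the neutral carries the sum of the phases
        sum_neutral_sq += neutral ** 2
        sum_phase_sq += phase_currents[0] ** 2
    return math.sqrt(sum_neutral_sq / sum_phase_sq)

for fraction in (0.0, 0.10, 0.15):
    ratio = neutral_to_phase_ratio(fraction)
    print(f"3rd harmonic {fraction:.0%}: neutral/phase = {ratio:.2f}")
```

The fundamentals cancel in the sum, but the third harmonics of the three 
phases line up and add, so under these assumptions the neutral carries 
roughly three times the third-harmonic fraction: about 0.45 of a phase 
current at 15%.  That is in the same range as the 30-50% the PDUs report, 
though it does not prove that harmonics are the cause here.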

Keep in mind that aside from the computers, which are all Dell PE1850s, we 
also have three different kinds of switches (Myrinet, Nortel Baystack, EMC 
Fibre Channel), a tape library, a console, and a rack full of disks & disk 
controllers.  I'm not sure what the total power of these components is, but 
I'd estimate 20-50kW.  When all the compute nodes are turned on and idle, 
the total power of all computers & other computing components is around 
220kW.  There are no motors or other non-switching loads connected to the 
PDUs or the problematic UPS.

> If there is a neutral line current AT ALL during balanced operation
> there is very likely a problem with the PC power supplies, in my
> opinion, the only question is where.

If that's so, then why do the high-quality, expensive PDUs report PF=1?
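
One possible reading, offered as a sketch rather than a diagnosis: a power 
factor that rounds to 0.99 or 1.00 is not by itself inconsistent with 15% 
current THD.  The example below uses the textbook relation between true 
power factor, displacement power factor, and current THD, which assumes a 
sinusoidal voltage (only approximately true with ~5% voltage THD).  I don't 
know which definition of PF the PDU front panel actually displays, so that 
part is an assumption.

```python
import math

def true_power_factor(displacement_pf, current_thd):
    """True PF for a sinusoidal voltage and a distorted current.

    displacement_pf: cosine of the angle between the 60 Hz voltage and the
                     60 Hz component of the current.
    current_thd:     current THD as a fraction (0.15 for 15%).
    """
    return displacement_pf / math.sqrt(1.0 + current_thd ** 2)

# Reported values: current THD around 15%; displacement PF assumed to be 1.
print(round(true_power_factor(1.0, 0.15), 3))  # 0.989
```

So even a meter that accounts for distortion would show about 0.99 at this 
THD level; a PF near 1 says little about neutral current.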

> I liked Jim's idea of trying a 
> balanced, large resistive load on the lines.  A heater, an electric
> range.

I want to say (but am not sure at the moment) that the instabilities show up 
at around 140kW.  I don't have 140kW of resistors available easily, let 
alone a full-power load of 350kW.  We initially looked at renting resistive 
load banks to test the room, but that idea fell by the wayside because of 
the cost involved in renting the units & having professionals do the 
testing.

It's not out of the question to do such a test, but I don't think it'd be 
easy at a facility of this size.  Suggestions welcome.

> I also liked the idea of putting a 'scope on the lines because I 
> mistrust the KAW for things like PF testing, although some of the stuff
> reported here suggests that maybe it is ok.  I'd still worry that what
> it measures is phase alignment at 60 Hz, not for waveform distortion
> (basically additional fourier components at higher frequencies, in
> particular 180 Hz).  A scope and measure of neutral line currents could
> help.

I agree wholeheartedly that scoping out all three phases' currents and 
voltages, together with neutral current, at the outputs of both PDUs and at 
the output of the UPS, would be ideal.  I don't have that sort of testing 
equipment at hand; I suppose I can look into renting it or trying to hunt 
down someone on campus from whom I could borrow.

Liebert has done some of these measurements.  I'm hoping they'll come out 
very soon and do a complete suite of measurements as I describe.

I wouldn't trust a KAW completely either, let alone the freebie KAW clone 
that I have.  But I *would* be inclined to trust that the very expensive 
Liebert PDUs are measuring & reporting numbers correctly.  Or should I not 
even trust the Liebert PDU reports?

> IF the KAW is indeed insensitive to harmonic distortion, your symptoms
> are very, very similar to what we experienced under similar
> circumstances with switching power supplies on multiple phases with
> shared neutrals and what the Mirus FAQ predicts will happen under those
> cases.  Hoofbeats equals horses unless proven otherwise (there ARE
> zebras, they're just locally rare, right?:-).

I need to read the Mirus FAQ completely.

> Also, seriously, the phases shouldn't share a neutral and if the cluster
> is wired so that they do, this is a design bug all by itself -- this is
> just a bad idea for loads that carry harmonics OR are phase shifted
> relative to the voltage.

About 13 computers are plugged into each rack PDU.  (I believe I mentioned 
these earlier but to repeat: the rack PDUs are APC AP7960s.  They take 
3ph+neutral 208V ph-ph in, and offer 24 L5-15 outlets at 120V.  I'm not 
sure, but I don't believe they do any significant filtering etc.)  There 
are three rack PDUs per rack.  Each has its own whip traveling 
independently back to the breaker panel.  Each of seven breaker panels ties 
its neutrals together.  There is then a short, thick cable from the breaker 
panels to a bus bar which ties neutral to chassis ground.  The neutral 
return to the transformer is also tied to that bus bar.  From the breaker 
panels to the bus bar to the transformer is all Liebert design, with one 
exception.  One PDU had to have its breaker panel box separated from its 
transformer box onsite due to some mysterious miscommunication, during the 
ordering process, about the physical configuration needed.  There is a fairly 
short (~15 feet), thick neutral cable running from the neutral/ground bus 
bar in the breaker panel, under the floor, to the transformer neutral 
return bus bar.  Liebert did not make this modification themselves, but 
they know very clearly about this modification, and have not raised any 
issues with it.

So the distal portions of the neutral return lines carry the current for 
about 13 computers each, split among the three phases.  And the proximal 
portion is made up of thick conductors designed or tacitly approved by 
Liebert.

Do your concerns about shared neutrals remain, considering that "only" ~13 
computers distributed across the three phases share one neutral wire?
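
To put a number on the 60 Hz part of that question, here is a small sketch 
of the fundamental neutral current on one whip.  It assumes 13 identical, 
unity-PF computers split 4/4/5 across the phases (the exact split is my 
assumption, as is the function name) and ignores harmonics entirely.

```python
import cmath
import math

def neutral_fundamental(loads_per_phase):
    """Magnitude of the 60 Hz neutral current, in units of one computer's
    current, for the given count of identical unity-PF loads on each phase."""
    total = sum(
        n * cmath.exp(-1j * 2 * math.pi * k / 3)  # phase k lags by k * 120 degrees
        for k, n in enumerate(loads_per_phase)
    )
    return abs(total)

print(round(neutral_fundamental((4, 4, 5)), 3))  # 1.0
print(round(neutral_fundamental((4, 4, 4)), 3))  # 0.0
```

Under these assumptions the imbalance alone puts one computer's worth of 
60 Hz current on the shared neutral, against four or five computers' worth 
on each phase conductor.  Any third-harmonic current would add on top of 
that rather than cancel.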

Would you raise an issue, then, with the tying-together of the neutral across 
phases, by the AP7960s and the whips?  If we should have separate neutral 
runs for the three phases back to the breaker panels, we'd have to have 
about eight whips & power strips per rack instead of three.

Leaving aside for the moment the costs of retrofitting, how would you design 
things so that we have 13+ kW of power used per rack (at 120V), and at 
least 40 outlets, all taking up zero U (mounted at the back-left side of 
the cabinet or on the left rear door of a Dell rack), and not too much 
under-floor space?  Assume we want flexible under-floor whips and 
under-floor mounting of the whip-end outlets, where the power strips would 
plug into.  I might be able to find appropriate power strips, although I'd 
welcome suggestions.

> The neutral line builds up a resistive back
> voltage proportional to line current times wire resistance -- hence
> instead of running at nominal "120V" you might measure a loaded line
> voltage of 110V.

Yes.  I need to measure it.
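
Until I have a measurement, a back-of-the-envelope estimate of the IR drop 
on one neutral conductor looks like the sketch below.  Every number in it 
is a placeholder: I have not checked the actual whip length, conductor 
size, or neutral current, and the resistivity is a typical room-temperature 
value for copper.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # approximate value near 20 C

def conductor_voltage_drop(current_a, length_m, area_mm2):
    """Return the I*R drop in volts along one conductor."""
    resistance_ohm = COPPER_RESISTIVITY_OHM_M * length_m / (area_mm2 * 1e-6)
    return current_a * resistance_ohm

# Hypothetical: 20 A of neutral current, 15 m run, 5.26 mm^2 (about 10 AWG).
print(f"{conductor_voltage_drop(20.0, 15.0, 5.26):.2f} V")
```

With those placeholder values the drop is a little under 1 V.  The real 
figure depends on the installed conductors and on how much harmonic current 
the neutral actually carries.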

> For a resistive load, this has the effect of reducing nominal voltages
> by a few volts with no meaningful waveform distortion and any
> well-engineered device will tolerate peak voltage variations of maybe
> 10-15%.  For a third-harmonic distorted load, this back voltage
> modulates the neutral line voltage and ADDS to the 60 Hz phase
> signal.  You get beats at 120 Hz, for example.  Except that there are
> more harmonics -- the neutral line voltage isn't a pure sinusoid -- and
> ANY of those harmonics could couple with accidental resonances anywhere
> in your system that the power supply and system designers couldn't have
> imagined being present in the system and didn't engineer any defenses
> against (not like high frequency surges and noise, for example, that is
> pretty well defended even without a surge protector).

Interesting.  I'll have to think this over & measure things.  I'll also call 
my Liebert engineering contact to ask about this.

> Think of it as putting a signal of order volts with lots of LOW frequency
> noise right across the hot/neutral lines of your system power supplies.
> The noise maybe affects the point in the switching cycle where the
> system draws current, and you get real fed-back oscillation where the
> trigger point rings and resonates.
>
> I mean, if I grabbed a lab frequency voltage generator and set it to 3V
> and 180 Hz and tried to hook it up across your power supply inputs,
> you'd hurt me, right?  Who would do this and expect things to work
> right?  It might well go right through your de facto system high pass
> filters designed to keep out "surges" because the impedance of your
> power supply capacitors at these low frequencies is not that high.
> Maybe it finds something in the natural frequencies of your PC power
> supply to resonate with by screwing with trigger points (where phase A
> is dropping its current draw right where phase B is trying to raise its
> current draw). Maybe you see a 180 Hz signal floated on top of your
> supposedly flat line voltages at various places on your mother board.
> Maybe you just brown out the supply so your motherboards run a bit under
> their nominal voltage.  What GOOD thing could come of it?

Interesting.  I have to learn some more in order to make an educated reply 
to your details, but you have me thinking, and I'll talk to Liebert about 
it.

Thanks very much, Robert, for pointing out possible problems with a neutral 
shared among the three phases.

David