
Introduction

Welcome to a new series on ClusterMonkey! While the news and articles have been a bit sparse lately, it is not because the head monkey has been idle. Indeed, there is so much to write about and so little time. Another issue we ran into was how to present all the recent projects, which may seem rather disparate, under an easy-to-understand overarching theme. Welcome to edge computing.

Defining edge computing has become tricky because it now has a marketing buzz associated with it. Thus, like many over-hyped technology topics, it may take on several forms and have some core aspects that allow it to be treated as a "thing."

In this series, the definition of edge is going to be as specific as possible. In general, edge computing is that which does not take place in the data center or the cloud (hint: the cloud is a data center). Such a definition is too broad, however, since computing is everywhere (from cell phones to actual desktop workstations). A more precise definition of edge computing can be written as:

Data center level computing that happens outside of the physical data center or cloud.

That definition seems to eliminate many smaller forms of computing but still is a little gray in terms of "data center level computing." This category of computing usually operates 24/7 and provides a significantly higher level of performance and storage than mass-marketed personal systems.

A good way to designate this type of edge computing is to categorize it as "No Data Center Needed" (NDN) computing. Thus, a good definition is "those systems that can rival data center system performance, storage, and networking, but are not physically housed in a data center or in the cloud." This categorization is still fairly broad but does keep the focus on the type of computing not normally found in the consumer marketplace. For those interested in technical coverage of edge computing, have a look at the Edge Section on The Next Platform website, and of course, check back here or sign up for our newsletters to read about our continuing series.

Location, Location, Location

In business, location is often a key to success. For edge computing, location is a constraint. Inside a data center, the power, noise, and heat envelope can be quite large and flexible. Outside the data center, the location may actually determine the level of edge computing that can take place. With regard to makeshift "personal data centers" in labs, offices, and closets (I have seen quite a few of these), NDN computing should not require building a personal remote data center or making changes to the surrounding infrastructure. To provide workable plug-and-play NDN computing, a standard environment needs to be defined. Unlike the data center, the power, noise, and heat envelope must be smaller and based on unmodified standard office (work or home), factory, lab, and classroom environments. One way to define an NDN edge computing environment is to consider the chart in Figure One below.


Figure One: Defining the edge computing envelope

From a design standpoint, an NDN system should fit within the green cube in Figure One. There is a bit of wiggle room, but in general, the volume of the NDN cube can't get that much bigger before data center level (makeshift or otherwise) services are needed. If systems are engineered to work within the NDN cube, there is a good chance they can be deployed almost anywhere.

This series of articles is going to address engineering high performance--numeric and data--systems to fit within the NDN envelope in Figure One. In other words, our data processing environment is defined by a fixed set of specifications and we will tune the NDN design to fit within these parameters. Thus, our definition of location is derived from power, heat, and noise--all of which are NOT independent. The design aspects and how these play out in terms of other issues are discussed below.

Available Power

Available power is a fixed variable for most NDN computing. Adding new electrical service may not be possible or may be prohibitively expensive. In the U.S., typical electrical service is either 15 or 20 Amps of current at 120 Volts. That translates into a maximum of 1800 Watts (15A x 120V) for most office, classroom, or residential locations. Assuming 20% headroom for safety, the usable power is about 1440 Watts. Thus, on any single circuit, an NDN system should not exceed this amount. However, this assumes exclusive use of the circuit. Most outlets are not single runs and are shared with other outlets and possibly overhead lights. These considerations are important; otherwise, when an NDN system places a large computational load on the circuit, a breaker may trip. There may also be other shared devices on the same circuit including monitors, lamps, audio equipment, and even a coffee pot. Thus, the true available power often takes some investigation.

There is a hard power ceiling of 1440 Watts (1920 Watts for a 20A service) on most circuits, but the actual ceiling may be much less depending on how the location is wired. Therefore, when designing NDN systems, it is best not to assume--unless verified--that the full 1440 Watts is available and to design accordingly (e.g. a good baseline for unknown circuits is 500-600W). Keep in mind, big CPUs and/or GPUs can add up to 250W each to the underlying system.
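As a quick back-of-the-envelope check, the available power on a circuit can be estimated from the breaker rating, the line voltage, a safety margin, and whatever else is already plugged in. The short Python sketch below simply restates the arithmetic from this section; the shared-device wattages are made-up examples.

# Rough NDN power budget (circuit numbers follow the discussion above;
# the shared-device values are hypothetical examples)

def usable_watts(breaker_amps=15, volts=120, headroom=0.20):
    """Circuit capacity minus a safety margin."""
    return breaker_amps * volts * (1.0 - headroom)

circuit = usable_watts(15, 120)                           # 1440 W for a 15A circuit
shared = {"monitor": 40, "lamp": 10, "coffee pot": 900}   # hypothetical shared loads
budget = circuit - sum(shared.values())

print(f"Circuit limit: {circuit:.0f} W")
print(f"Left for the NDN system: {budget:.0f} W")

Calling usable_watts(20, 120) gives the 1920 Watt ceiling for a 20A circuit mentioned above.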

Another issue lies in the number of wall receptacles. Ideally, the use of plug strips should be avoided and an NDN system should plug directly into a single wall receptacle. This goal becomes difficult if the system has multiple power supplies or components (e.g. an Ethernet switch) that each need their own receptacle. The use of a multi-plug uninterruptible power supply (UPS) or power conditioner can help with this situation; however, several small AC/DC power supplies normally create more heat than a single large power supply and can lead to inefficient power use.

Heat Generation

In terms of heat, processors and GPUs are rated using Thermal Design Power (TDP), which is the maximum amount of heat a component is expected to generate under a typical heavy workload. This number helps determine how to cool the system. Keep in mind that TDP is not a hard upper limit on the amount of heat a device can create. There is some controversy regarding the effectiveness and accuracy of TDP metrics, but for the purposes of NDN designs, it is usable. In addition to TDP rated components, power supplies can generate a lot of heat. Power supplies built to the current 80-Plus rating system ensure that 20% or less of the electricity drawn by the power supply will be lost as heat (e.g. a 1000W 80-Plus power supply can potentially create 200W of heat under load).
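To see how the power supply adds to the heat load, the efficiency rating can be turned into watts of waste heat. The sketch below assumes a single fixed efficiency value; real 80-Plus supplies vary efficiency with load (and by Bronze/Gold/Platinum tier), so the numbers are only illustrative.

# Waste heat from a power supply at a given efficiency
# (80-Plus implies roughly 80% or better efficiency at common load points)

def psu_heat_watts(load_watts, efficiency=0.80):
    """Heat dissipated by the supply while delivering load_watts of DC power."""
    wall_draw = load_watts / efficiency
    return wall_draw - load_watts

print(psu_heat_watts(800, 0.80))   # 800 W DC load at 80% -> ~200 W of heat
print(psu_heat_watts(800, 0.90))   # a more efficient unit -> ~89 W of heat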

In the chart in Figure One, the TDP limit per processor is given as 65-95 Watts. For a typical systems builder, this may seem quite low. Conventional wisdom suggests that for the fastest performance, use the fastest processor available. In the data center, this may be true, but cooling fast processors and GPUs presents other design issues. We are also going to assume air cooling for the moment. Water cooling will be addressed in later installments.

Faster processors always mean more heat per processor. For example, if we decide to use an AMD Threadripper 2990WX (32 cores, 250W TDP) or an Intel Core i9-7980XE (18 cores, 165W TDP), a very specific CPU cooler is needed. In the case of the Threadripper, a typical cooler may have a volume of 1.9x10^6 mm^3, which translates to 7722 mm^3 per watt--the cooler volume needed per watt of TDP. Note that for a 65W processor (e.g. Intel i7-8700), the volume is 2.5x10^5 mm^3 and the per-watt rating is 3836 mm^3 (or about half). Thus, as the TDP increases, the size of coolers seems to increase in a nonlinear fashion. These calculations were based on commercially available desktop coolers.
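The "cooler volume per watt" figures come straight from dividing a cooler's physical volume by the TDP of the processor it serves. A minimal sketch using the rounded volumes quoted above (small differences from the per-watt numbers in the text are due to rounding of the cooler volumes):

# Cooler volume per watt of TDP (volumes are the rounded examples quoted above)

coolers = {
    "Threadripper 2990WX (250 W TDP)": (1.9e6, 250),   # cooler volume mm^3, TDP W
    "Intel i7-8700 (65 W TDP)":        (2.5e5, 65),
}

for name, (volume_mm3, tdp_watts) in coolers.items():
    print(f"{name}: {volume_mm3 / tdp_watts:.0f} mm^3 per watt")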

This nonlinear growth means coolers tend to become quite large and need much more air to be moved through the cooler. Fortunately, bigger and quieter fans (see below) can be used on big coolers; however, the amount of air that needs to move through the cooler can create noise. Some coolers are rated at 40dB (decibels) at top speed. From a design standpoint, systems that need to cool over 100 Watts place airflow, noise, and space constraints on the physical design.

These constraints can be lifted if several cooler processors are used instead of a single hot processor (admittedly, "hot" and "cool" are relative terms here). For instance, there are several 65W coolers that are less than 30mm in height, which is usually just under the height of motherboard memory modules. This arrangement allows stacking of multiple motherboards in a confined volume.

Using multiple processors also has other advantages in terms of heat movement. First, instead of having to move heat away from one concentrated location in the design, the total heat--which may actually equal that of a single large, hot processor--is spread across several locations and is more easily dissipated (i.e. with smaller and quieter coolers). Second, multiple processors also allow for power control. It is possible to turn off entire motherboards if they are unused, thus reducing both power consumption and the amount of "do nothing" heat generated. Finally, there are some system performance aspects that favor this design. As will be covered later in the series, the choice of system architecture can have a huge impact on system performance. For instance, is one large multi-core processor better than a handful of slower processors (i.e. cluster design vs. SMP) from a performance standpoint? As will be shown, the answer is not as simple as counting up cores or memory bandwidth.
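To put rough numbers on the "several cool processors versus one hot processor" argument, the per-watt cooler figures from the previous section can be reused. The comparison below is hypothetical and only meant to show the trend:

# One hot processor vs. several cooler processors with a similar total TDP
# (cooler volume-per-watt figures are the ones calculated above)

hot  = {"count": 1, "tdp": 250, "mm3_per_watt": 7722}   # single 250 W part
cool = {"count": 4, "tdp": 65,  "mm3_per_watt": 3836}   # four 65 W parts

for label, cfg in (("one hot CPU", hot), ("four cool CPUs", cool)):
    total_heat = cfg["count"] * cfg["tdp"]
    total_cooler_mm3 = total_heat * cfg["mm3_per_watt"]
    print(f"{label}: {total_heat} W total, {cfg['tdp']} W per cooler, "
          f"~{total_cooler_mm3 / 1e6:.1f} x 10^6 mm^3 of total cooler volume")

Roughly the same total heat is produced in both cases, but the four-processor design needs about half the total cooler volume and spreads the heat over four small, quiet coolers.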

Chip designers have gone to great lengths to power down parts of the processors that are not being used--and then quickly power them back up--as a means to reduce heat. Without such measures, modern X86 processors would run "hot" even when they are not doing anything. This thermal design is also used in GPUs. Even with these measures, high TDP CPUs and GPUs still create a certain level of "do nothing heat" and can become an under-desk "space heater." This type of heat is not an issue in a data center.

NDN design does not preclude high temperature CPUs and GPUs from being used. The performance per watt may be attractive, but without data center level service, the heat must often be handled by an office or lab HVAC system. Under full load, the amount of heat generated by NDN systems must be considered. A fast processor and two fast GPUs may require moving 600W of heat from three specific devices and a power supply in the system. Heat concentration means faster and louder fans.

Before moving to the noise aspect of NDN design, a quick mention of the Arrhenius equation is important. In practical terms, the equation states that approximately every ten-degree (Celsius) increase in temperature doubles the rate of a chemical reaction. Translated to computer hardware, it simply means that "the hotter electronics operate, the more likely they are to fail." Outside of the controlled data center, ambient temperatures can be expected to fluctuate and can get hot for many reasons. Depending on the specific needs, designing with multiple lower temperature (slower) components may offer better stability than a few high temperature (faster) components.
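For reference, the Arrhenius equation is k = A * exp(-Ea / (R * T)), and the practical "ten degrees doubles the rate" rule of thumb can be written as a simple acceleration factor. The sketch below is only illustrative--real component failure rates depend on much more than temperature, and the 45 C reference point is an arbitrary example.

# Rule-of-thumb thermal acceleration: rate roughly doubles every 10 C
# (a simplification of the Arrhenius equation k = A * exp(-Ea / (R * T)))

def acceleration_factor(temp_c, reference_c=45.0):
    """Relative reaction/failure rate versus an arbitrary reference temperature."""
    return 2 ** ((temp_c - reference_c) / 10.0)

print(acceleration_factor(55))   # 10 C hotter -> ~2x the rate
print(acceleration_factor(65))   # 20 C hotter -> ~4x the rate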

Noise Tolerance

The noise component is pretty simple to understand. Anyone who has been in a data center would not want one or two loud servers sitting next to their desk. The general rule is "small fans, big noise; big fans, small noise." Everything else being equal, small fans must rotate faster to move the same amount of air.
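The "small fans, big noise" rule falls out of the idealized fan similarity (affinity) laws, in which airflow scales roughly with rotational speed times the cube of the fan diameter for geometrically similar fans. The sketch below uses that idealized scaling only as a rough guide--real fan curves and noise behavior are more complicated.

# Idealized fan similarity: airflow ~ rpm * diameter^3 (a rough approximation)

def rpm_for_same_airflow(ref_rpm, ref_diameter_mm, new_diameter_mm):
    """Speed a smaller (or larger) fan needs to move the same air as the reference."""
    return ref_rpm * (ref_diameter_mm / new_diameter_mm) ** 3

# A quiet 140 mm fan at 800 RPM vs. an 80 mm fan moving the same air
print(rpm_for_same_airflow(800, 140, 80))   # roughly 4300 RPM, hence the noise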

Understanding that noise levels vary by distance, a general office, lab, or classroom environment has a sound level of about 40-60 dB. In these conditions, people can have conversations or talk on the phone without being bothered by continuous ambient noise. A good measure of fan sound is the Sone scale. Sones do not measure sound pressure like decibels; they measure how loud a sound is actually perceived to be. The Sone scale is linear, and a normal office environment is usually between 1-4 Sones. As an example, bathroom fans are usually rated in Sones. A quiet fan is less than 2 Sones, while a loud fan is 4 Sones.
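For the curious, the usual rule of thumb connecting the decibel-based loudness level (phons) to Sones is that 40 phons corresponds to 1 Sone and every additional 10 phons doubles the Sone value. A small sketch of that conversion (reasonable only for moderate loudness levels):

# Phons (loudness level) to Sones (perceived loudness)
# Rule of thumb: 40 phons = 1 Sone, and +10 phons doubles the loudness

def phons_to_sones(phons):
    return 2 ** ((phons - 40.0) / 10.0)

for p in (40, 50, 60):
    print(f"{p} phons is roughly {phons_to_sones(p):.0f} Sones")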

A design target of less than two Sones is important for NDN systems. Anything louder could be considered mildly annoying and, at worst, create issues with conversations. A good rule of thumb is that an NDN system sitting next to you should not impinge on your workflow, cause interruptions, or be annoying. This rule may vary by person, age, etc.

Keeping Computing in the Green Box

Designing within the above constraints results in systems that are not usually "off-the-shelf" or easy to construct on your own. Each situation may require some customization to fit within the NDN green box shown in Figure One. These constraints shape system design in ways that data center computing can ignore. Over the course of this series, the following aspects will be covered (in no particular order). Like most design exercises, there will be overlapping concerns and trade-offs. The good news is that highly effective NDN edge-based systems can provide data center level computing almost anywhere.
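Pulling the power, heat, and noise constraints together, the green box can be thought of as a simple checklist that a candidate design either passes or fails. The limits below are the ones discussed in this article (1440 Watts of wall power, 65-95 Watts of TDP per processor, under 2 Sones); the example designs are hypothetical.

# A rough "green box" check using the limits discussed above
# (the example designs below are hypothetical)

NDN_LIMITS = {"max_wall_watts": 1440, "max_tdp_per_cpu": 95, "max_sones": 2.0}

def fits_green_box(wall_watts, tdp_per_cpu, sones, limits=NDN_LIMITS):
    return (wall_watts  <= limits["max_wall_watts"] and
            tdp_per_cpu <= limits["max_tdp_per_cpu"] and
            sones       <  limits["max_sones"])

# Four 65 W nodes plus storage and networking, quiet fans
print(fits_green_box(wall_watts=600, tdp_per_cpu=65, sones=1.5))    # True
# One 250 W CPU with two 250 W GPUs and loud coolers
print(fits_green_box(wall_watts=1100, tdp_per_cpu=250, sones=4.0))  # False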

What is in a name?
The icon and name for this series harkens back to the rock band known as Yes and their album called Close to the Edge. The da Vinci-looking flying machine could often be found in small places on their vinyl album covers beautifully illustrated by William Roger Dean.

Open Design/Open Software

One very practical aspect of NDN systems is a flexible design process. This requirement implies that the underlying software--or "the plumbing"--should be as open as possible. The obvious and logical choice is GNU/Linux, which drives many data center systems. Open software does not preclude closed-source solutions, but rather ensures a large amount of choice and control over the design. In areas like High Performance Computing (HPC) and data analytics (Hadoop/Spark), it also allows very easy migration of applications between the edge and the data center.

Expect Less, Compute More

Based on the design parameters, NDN edge computing may not be performance competitive with similar hardware housed in a real data center. In exchange for this performance hit, there is a huge degree of environmental freedom for NDN systems. Keep this aspect in mind because as we move through the design of NDN systems, it will be tempting to "Yeah, but..." the design. For instance, "Yeah, but I could just use a server on a desk with 4 GPUs and get better performance." Such a proposal is certainly possible if you can live outside the green box.

By assuming a clearly defined power, heat, and noise envelope, a large amount of "location freedom" is provided to end users. NDN systems should be as easy to relocate as a deskside or desktop PC, and they should provide a reasonable fraction of data center performance, often at lower cost, to anyone. As we journey closer to the edge, so too does our attention turn to the design, efficiency, and performance of NDN systems. Welcome aboard.