1.0 Server Architecture – Just another IT blog…

1.0 Server Architecture

1.1 Explain the purpose and function of server form factors

As it has been said – size matters. Different form factors mean different sizes of servers and this section describes them all.

Rack mount refers to servers installed in special cabinets called racks. Rack height varies, but 42U and 48U racks are the most common. The mounting width is 19″ (the external width of the cabinet varies). The depth needed depends mainly on the type of equipment installed and on whether cable management is used.
- Dimensions – server height is measured in rack units (“U”), starting at 1U. One rack unit spans 3 round or square holes on the mounting rails of the rack, where rack servers are mounted. Square holes need cage nuts: nuts held in a thin spring-steel cage that clips into the square hole.
  - 1U, 2U, 4U
    - 1U – one rack unit is 1.75″ (44.45 mm); rack dimensions are usually given in inches.
    - 2U – 3.5″ (88.9 mm)
    - 4U – 7″ (177.8 mm)
- Cable management arms keep the cables attached to a specific server in order. They are vendor dependent, but usually it is a hinged arm attached to both the server and the rack that forms small curved channels behind the server inside the rack. The slack of each cable is routed through these channels so the server can be slid out of the rack without the risk of ripping out a cable.
- Rail kits allow rack mount servers to be installed in the rack. Usually there are two pairs of rails: one pair is attached to the server and the second pair is attached inside the rack. The server then slides into the rack (rails on the server into the rails in the rack).
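The rack unit sizes listed above are simple multiples of 1.75″, which can be sketched in a few lines of Python. The function names are made up for this illustration, and `servers_per_rack` assumes the whole rack is filled with identical servers, which is rarely the case in practice.

```python
# Rack-unit arithmetic: 1U = 1.75 inches = 44.45 mm (figures from the notes above).
INCHES_PER_U = 1.75
MM_PER_INCH = 25.4


def rack_units_to_inches(units: int) -> float:
    """Nominal height in inches of equipment that is `units` U tall."""
    return units * INCHES_PER_U


def rack_units_to_mm(units: int) -> float:
    """Nominal height in millimetres of equipment that is `units` U tall."""
    return rack_units_to_inches(units) * MM_PER_INCH


def servers_per_rack(rack_height_u: int, server_height_u: int) -> int:
    """Hypothetical helper: how many identical servers fit in a rack.

    Ignores the space normally taken by switches, PDUs and blanking panels.
    """
    return rack_height_u // server_height_u


if __name__ == "__main__":
    for u in (1, 2, 4):
        print(f"{u}U = {rack_units_to_inches(u)} in = {rack_units_to_mm(u):.2f} mm")
    print("2U servers in a 42U rack:", servers_per_rack(42, 2))
```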
Tower form factor refers to a form factor similar to desktop PCs. It is a free-standing server that is not mounted in a rack and contains everything it needs to run.
Blade technology allows a greater density of servers in a rack; the servers (called “blades”) slot into a shared blade enclosure.
- 9U blade enclosure holding 14 blades
- Blade enclosure – contains the blades, power, cooling system and management access. Blade enclosures differ from vendor to vendor, but most of their components are redundant and hot-pluggable. An enclosure may require a high-amperage power circuit.
- Modular system of blade enclosure.
- Backplane / Midplane is a board with connectors located in the middle of the blade enclosure. Blades attach from one side and the supporting systems (power, cooling, network…) attach from the other.
  - Power supply sockets are located on the back side of the midplane.
  - Network modules / switch modules manage connectivity – they range from simple pass-through modules with RJ-45 or fiber connectors to fully featured switches.
  - Management modules are used to manage the blade enclosure remotely in a LOM (lights-out management) fashion. Usually this is done through a dedicated IPMI interface with KVM capabilities.
- Blade server – one computing unit in the enclosure. A blade server has at least its own CPU and memory; the remaining devices are shared through the midplane.

1.2 Given a scenario, install, configure and maintain server components

CPU – Central Processing Unit
- Multiprocessor vs. multicore
  - Multicore refers to multiple cores located in one single processor. In the past all processors were single core – one core per processor. Nowadays CPUs contain more than one core and are multicore.
  - Multiprocessor – on the other hand means multiple processors located in multiple sockets (the interface on the motherboard where a processor is installed).
  - Example – if a motherboard has two sockets, each with a 6-core processor, the operating system will see 12 processors (2 processors, 6 cores each).
- Socket type – each processor needs a specific socket. Neither the socket on the motherboard nor the matching interface on the processor can be upgraded. Sockets are generally not compatible with previous or next versions, so specific processors need specific sockets.
- Cache levels: L1, L2, L3 – special memory located inside the processor. It is checked first, before the processor contacts the main memory. Modern processors have multiple levels of cache. Both the size and the latency of the cache grow with the level number, so L1 is the smallest but also the fastest. In multicore processors each core usually has its own L1 cache, L2 is either private to a core or shared by a small group of cores (depending on the design), and L3 is shared among all cores.
- Speeds
  - Core – the listed speed of the processor (e.g. 2.4 GHz, 3 GHz etc.)
  - Bus – the speed of the base clock (bus) that the rest of the system is timed from.
  - Multiplier – the ratio of core speed to bus speed; raising it is the usual way to overclock a CPU.
  - Core = Bus * Multiplier
- CPU stepping refers to the revision of the processor. A stepping consists of a letter and a number (e.g. A0; the letter changes with an update of the base layer masks, the number with an update of the metal layers).
- Architecture is the way the chip was designed, along with the features it provides.
  - x86 designed by Intel; it started as a 16-bit design and is best known as the 32-bit architecture. Backward compatible, CISC arch.
  - x64 designed by AMD and then adopted by Intel. The 64-bit architecture allows addressing more memory and stays backward compatible: the older instructions are still part of the architecture, so 32-bit apps run under a 64-bit OS without emulation (and therefore without performance loss). The CPU still supports 16-bit code in its legacy modes, but 16-bit real-mode programs cannot run under a 64-bit OS. CISC arch.
  - ARM architecture based on RISC; traditionally offers lower peak performance than x86/x64 chips but is more power efficient (therefore it is used in mobile phones and portable devices overall).
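The two pieces of arithmetic in the CPU notes above (Core = Bus * Multiplier and the 2 sockets x 6 cores example) can be sketched as follows. The function names are illustrative only, and the `threads_per_core` parameter is an assumption added here to cover CPUs with simultaneous multithreading, which the notes do not discuss.

```python
def core_clock_mhz(bus_mhz: float, multiplier: float) -> float:
    """Core = Bus * Multiplier, with both clocks expressed in MHz."""
    return bus_mhz * multiplier


def logical_processors(sockets: int, cores_per_socket: int,
                       threads_per_core: int = 1) -> int:
    """Number of processors the operating system will see.

    threads_per_core is an assumption of this sketch (1 = no multithreading).
    """
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # A 100 MHz bus with a multiplier of 24 gives a 2.4 GHz core clock.
    print(core_clock_mhz(100, 24), "MHz")
    # Two sockets with a 6-core processor each.
    print(logical_processors(2, 6), "processors visible to the OS")
```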
RAM – Random Access Memory (a very good resource on how memory timing works: http://www.hardwaresecrets.com/understanding-ram-timings/)
- ECC vs. non-ECC
  - ECC – Error Correcting Code – this memory is able to detect and correct an error in one bit (per 64-bit word). These modules carry an extra chip used for the error checking and correction and are therefore more expensive.
  - non-ECC – modules without the extra chip; they cannot detect or correct bit errors, but they are cheaper.
- DDR2, DDR3, (DDR4) – Double Data Rate Synchronous Dynamic Random Access Memory (phew). The numbers denote generations of DDR memory, which are not backward or forward compatible – the modules differ in notch position, voltage and mostly also in number of pins. Performance grows with each generation while the operating voltage and heat output drop. Density also increases with each generation, allowing more memory to be put on a DDR module.
  - Number of pins
    - DDR 184
    - DDR2 240
    - DDR3 240
    - DDR4 288
    - SO-DIMMs – Small Outline Dual In-line Memory Modules – modules used in laptops, with a different number of pins (almost always fewer) than their full-sized equivalents used in desktop PCs
- Static vs. dynamic RAM
  - Static RAM (SRAM) uses 6 transistors per stored bit and has much higher performance than dynamic RAM, at a higher cost (more transistors per bit also means lower density). It is therefore used where speed matters more than cost, such as the L# caches of processors.
  - Dynamic RAM (DRAM) uses 1 transistor and 1 capacitor per stored bit. It usually offers higher capacity and lower performance at a lower cost. To retain data, the cells must be refreshed periodically (typically every 64 ms). Reading from dynamic memory is also slower because a read clears the cells, so the data have to be written back after each read.
- Module placement – what to be aware of when placing memory modules into memory banks?
  - NUMA – stands for Non-Uniform Memory Access. Basically this architecture connects CPU sockets to specific memory banks. A CPU in a NUMA architecture has its “own” memory banks, which it can access faster than memory banks that belong to another processor or that are shared.
- CAS latency – see below
- Timing
  - Memory speed depends on several parameters – clock speed and various latencies. Memory timing is listed on the memory module like 5-5-5-15 or 7-7-7-21. Generally speaking, the lower these numbers the better (older DDR modules have lower numbers than DDR3, but DDR3 operates at much higher (2x, 4x…) clock speeds, so its absolute latency is similar or smaller).
  - The four numbers separated by dashes have this meaning: Tcas-Trcd-Trp-Tras
    - Tcas (Column Address Strobe, also CAS latency or Tcl) – the number of cycles one has to wait for data from RAM once the correct row is already selected
    - Trcd (RAS to CAS delay, i.e. Row Address to Column Address delay) – delay between selecting a row (RAS) and selecting a column (CAS)
    - Trp (Row Precharge Time) – delay in cycles needed to close the current row before a different row can be selected
    - Tras (Row Active Time) – the minimum number of cycles a row has to stay active (open) before it can be closed again; usually Tras>=Trcd+Tcas, and modules are often rated at about Trcd+Tcas+Trp (as in 5-5-5-15)
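To see why a module with higher timing numbers is not necessarily slower, the cycle counts can be converted to nanoseconds. The sketch below assumes the timing is counted in cycles of the memory I/O clock, which for DDR memory is half of the advertised transfer rate; the three modules used are illustrative examples chosen for this note, not measurements.

```python
def cycle_time_ns(data_rate_mts: float) -> float:
    """Length of one memory clock cycle in nanoseconds.

    DDR transfers data twice per clock, so clock (MHz) = data rate (MT/s) / 2.
    """
    clock_mhz = data_rate_mts / 2
    return 1000.0 / clock_mhz


def latency_ns(cycles: int, data_rate_mts: float) -> float:
    """Convert a timing value given in cycles (e.g. CAS latency) to nanoseconds."""
    return cycles * cycle_time_ns(data_rate_mts)


if __name__ == "__main__":
    # (label, data rate in MT/s, CAS latency in cycles) - example modules only
    modules = [("DDR-400 CL3", 400, 3),
               ("DDR2-800 CL5", 800, 5),
               ("DDR3-1600 CL9", 1600, 9)]
    for label, rate, cl in modules:
        print(f"{label}: {latency_ns(cl, rate):.2f} ns")
```

In this example the CAS latency grows from 3 to 9 cycles across the generations, while the absolute delay shrinks from 15 ns to 11.25 ns.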
- Memory pairing
  - Dual-channel – allows two memory modules to be placed in “paired” slots. The modules have to be the same size and preferably also the same model (from the same manufacturer) for the best performance. As a result the theoretical bandwidth of modules in dual channel (bandwidth = DDR data rate x bus width in bits / 8) is doubled, because the memory controller is able to use both memory channels simultaneously.
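The bandwidth formula from the dual-channel note can be tried out with a short sketch. It assumes a 64-bit wide memory channel and uses DDR3-1600 purely as an example; the result is a theoretical peak, and real-world throughput is lower.

```python
def peak_bandwidth_mb_s(data_rate_mts: float, bus_width_bits: int = 64,
                        channels: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s.

    data rate (million transfers/s) * bits per transfer / 8 bits per byte,
    multiplied by the number of channels used simultaneously.
    """
    return data_rate_mts * bus_width_bits / 8 * channels


if __name__ == "__main__":
    single = peak_bandwidth_mb_s(1600)            # DDR3-1600, one channel
    dual = peak_bandwidth_mb_s(1600, channels=2)  # the same modules in dual channel
    print(f"single channel: {single:.0f} MB/s, dual channel: {dual:.0f} MB/s")
```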
Bus types, bus channels and expansion slots
- Internal vs External
  - Internal buses – connect the CPU to memory modules and all other important internal components; the bus of the CPU is also called the FSB (front-side bus)
  - External buses – connect expansion components (graphics, sound, network cards…) and external peripherals (keyboard, mouse, printer) so they can work with the computer
- Height differences and bit rate differences
  - Full height PCI Cards
    - Bracket: 4.7″ (120 mm)
    - Card: 4.2″ (107 mm)
  - Half height (low profile – for rack mounted servers)
    - Bracket: 3.1″ (79 mm)
    - Card: 2.5″ (64 mm)
- PCI (Peripheral Component Interconnect), the oldest
  - Shared bus – for multiple slots (and multiple devices in them)
    - Only one device can use the bus at a time
    - The bus runs at the speed of the slowest card – if there are two 66 MHz PCI cards and one 33 MHz card, the bus operates at 33 MHz.
  - Uses parallel communication (multiple bits are sent simultaneously)
  - Half duplex, 32-bit/64-bit, 33 or 66 MHz. Lowest speed: 32 (bit width) * 33.33 (MHz) = 1066 Mb/s, / 8 = 133 MB/s. Highest speed (64-bit at 66 MHz): 533 MB/s
  - Hot-plug capable where the platform supports it (as are the other PCI standards)
- PCI-X (PCI extended)
  - 64-bit, 133 MHz in 1.0 (1064 MB/s)
  - Version 2.0 introduced speeds of 266 MHz and 533 MHz, but was not widely used due to the rise of PCI-e
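The PCI and PCI-X figures above all follow from the same calculation: bus width times clock, divided by 8 bits per byte. A small sketch (the function names are made up for this note) reproduces them and also the "slowest card wins" rule of the shared PCI bus.

```python
def parallel_bus_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput of a parallel bus in MB/s (one transfer per clock)."""
    return width_bits * clock_mhz / 8


def shared_bus_clock_mhz(card_clocks_mhz: list) -> float:
    """A shared PCI bus runs at the speed of its slowest card."""
    return min(card_clocks_mhz)


if __name__ == "__main__":
    print(f"PCI 32-bit/33 MHz:    {parallel_bus_mb_s(32, 33.33):.0f} MB/s")
    print(f"PCI 64-bit/66 MHz:    {parallel_bus_mb_s(64, 66.66):.0f} MB/s")
    print(f"PCI-X 64-bit/133 MHz: {parallel_bus_mb_s(64, 133):.0f} MB/s")
    print("Bus clock with 66, 66 and 33 MHz cards:",
          shared_bus_clock_mhz([66, 66, 33]), "MHz")
```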
- PCI-e (PCI express)
  - Serial communication
    - Dedicated BUS per device
      - Cards don’t affect each other in terms of speed
      - All devices can communicate at one time
  - Full duplex
    - Communication in both directions
      - Communication in “lanes” – 1 bit per clock cycle per lane in each direction (pic on http://s.hswstatic.com/gif/pci-express-lanes.gif) – x1 (1 lane, i.e. one path in and one path out), x4 (4 lanes) etc.
  - PCI-e slots are labeled x1, x4, x16 etc. (both the speed and the length of the slot increase) – the more bandwidth a card needs, the longer its connector (graphics cards use x16 etc.)
  - PCI-e comes in different versions – the version sets the speed per lane, and the total depends on the number of lanes used. For example PCI-e 3.0 x1 has a speed of about 1 GB/s, while x16 has about 16 GB/s.
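The "about 1 GB/s per lane" figure for PCI-e 3.0 can be derived from the raw transfer rate and the line encoding. The transfer rates and encodings in the table below are not taken from the notes above; they are the commonly published values for PCI-e 1.0 to 3.0, and the results are per direction, before protocol overhead.

```python
# version: (raw rate in GT/s per lane, payload bits, encoded bits)
# 8b/10b encoding for 1.0 and 2.0, 128b/130b for 3.0 (commonly published values).
PCIE_GENERATIONS = {
    "1.0": (2.5, 8, 10),
    "2.0": (5.0, 8, 10),
    "3.0": (8.0, 128, 130),
}


def pcie_bandwidth_gb_s(version: str, lanes: int) -> float:
    """Usable bandwidth in GB/s in one direction for a given version and width."""
    raw_gt_s, payload_bits, encoded_bits = PCIE_GENERATIONS[version]
    usable_gbit_s = raw_gt_s * payload_bits / encoded_bits
    return usable_gbit_s / 8 * lanes


if __name__ == "__main__":
    for version in PCIE_GENERATIONS:
        for lanes in (1, 4, 16):
            print(f"PCI-e {version} x{lanes}: "
                  f"{pcie_bandwidth_gb_s(version, lanes):.2f} GB/s")
```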
NICs
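- This hardware component connects a computer to a network
- NICs differ in speed, measured in bits per second
  - 10 Mb/s – 802.3 to 802.3j (standards defined based on the physical media used – from thick coax cable to optical fibre)
  - 100 Mb/s – 802.3u
  - 1000 Mb/s – 802.3z (fibre), 802.3ab (copper)
  - 10 Gb/s – 802.3ae (fibre), 802.3an (copper); 802.3aq added 10GBASE-LRM
  - 40 Gb/s – 802.3ba
  - 100 Gb/s – 802.3ba (802.3bm later added further fibre variants for 40 and 100 Gb/s)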
- Advanced features
  - Jumbo Frames – enable frames with a payload larger than the standard 1500 bytes (commonly up to 9000 bytes); must be set on both ends and supported by every device on the path
  - TCP Checksum offload (can be enabled for IPv4/IPv6 separately) – offloads the calculation of the TCP segment checksum from the CPU to the NIC
  - TCP Segmentation offload – offloads the TCP segmentation of data from the CPU to the NIC
Hard drives – internal vs. external; directly attached to the server or provided over the network (iSCSI, FC)
- HDD – mechanical disks – platters carrying data (data are recorded by magnetizing a thin film of ferromagnetic material), spindle, heads, actuator etc. Sizes – 1.8″, 2.5″, 3.5″
  - SATA (Serial AT Attachment) disks – 5400 RPM, 7200 RPM, 10 000 RPM; usually the cheapest option, used where performance is less critical; largest capacities
  - SAS (Serial Attached SCSI) disks – 10 000 RPM, 15 000 RPM; usually the standard for server environments. They have smaller capacities than SATA drives because SAS drives have fewer platters inside to achieve a better lifespan (fewer platters means smaller capacity, but also less load on the mechanical parts inside the disk).
- SSD (Solid State Drive) – contains no moving parts, is based on NAND flash memory and performs much better than mechanical drives (hundreds of times better) in terms of random access to the storage. It may wear out faster because the number of writes to an SSD is limited (how limited depends on the type of NAND chip – SLC/MLC/TLC). Size 2.5″
Riser cards
- A riser card is an “adapter” that lets expansion cards fit into a chassis that is too low for them. Riser cards are often used in 1U servers, where a full-height expansion card would not physically fit upright. The riser plugs into the slot on the motherboard and offers the same kind of slot turned by 90 degrees, so the expansion card is inserted horizontally (parallel to the motherboard).
RAID controllers
BIOS/UEFI
- The very first software loaded on the computer. It works directly with the hardware components and brings everything to life. Its main purposes can be briefly described as:
  - SETUP – saves the basic configuration (boot order, time etc.) of the computer and its HW components. Setup retains the configuration when power is lost.
  - Runs the POST (Power-On Self-Test) – tests of the HW components
  - Loads the bootloader (which loads the OS afterwards)
- The version of BIOS can be upgraded via a process called “flashing”. In the case of UEFI this can be done from the OS. Flashing is a critical process, therefore you can often find two separate copies of the BIOS in case something goes wrong with either of them.
- UEFI – is the successor to BIOS. Main features are – programmability from the OS, the ability to use specific drivers for peripherals, reworked boot configuration, enhanced security (Secure Boot…).
- CMOS battery
  - CMOS memory holds the setup configuration, and its real-time clock keeps track of time (on HW level) even when the server is off. To do so it is powered by a separate battery.
Firmware
- Basic software that makes communication with a device possible or enables its basic functionality/features. It is tightly bound to the device in which it is embedded. For example, the firmware of a calculator monitors the buttons pressed and provides the mathematical operations. In computers the firmware enables basic functions and makes communication with the device possible; it can be imagined as a sort of self-awareness of the device.
- It is stored in some sort of non-volatile memory. The upgrade process is similar to that of a BIOS (in fact BIOS/UEFI can be viewed as a specialised kind of firmware), and the device often holds more than one copy of it as well.
USB interface/port (Universal Serial Bus)
- Common interface for connecting mouse, keyboard and other peripherals
- Versions and max theoretical speeds:
  - 1.1 12 Mb/s
  - 2.0 480 Mb/s
  - 3.0 5 Gb/s
  - 3.1 10 Gb/s
- Connector types – A (3 types) vs. B (3 types) – standard, mini and micro, each of which looks different for each type. USB 3.0 connectors are usually blue. USB Type-C – new universal connector
Hotswap vs. non-hotswap components
- Hot swap makes it possible to replace a component while the underlying system is still running.
- It depends on everything that the hot swap process goes through. For example, to hot swap an HDD we need support from the motherboard (controller), from the device and from the OS.
- Hot swap components can be (usually):
  - Hard drives
  - Power supplies
  - Fans
  - Various network modules
- Non hot-swappable (usually – there are systems where even these can be hot swapped!)
  - Processor
  - Memory

1.3 Compare and contrast power and cooling components

Every electrical system that consumes power generates heat and therefore needs cooling. Computer systems operate within a certain temperature range, and maintaining the proper temperature is crucial. The more powerful a system is (more powerful components or simply more of them), the more power it needs to run, the more heat it generates and the more cooling is needed to keep it running in optimal conditions.

Power
- Voltage – single-phase supplies range from 100 to 240V. Higher voltages are typical for distribution centers.
  - 110V vs. 220V vs. -48V
    - Voltage for 1-phase power.
    - Lower voltage means that more amperage is needed to deliver the same power. Higher voltage tends to be slightly more efficient, mainly because of lower energy loss and lower wiring cost (higher voltage permits thinner cables). 110V and 220V/240V supplies are alternating current (AC; it is called alternating because it periodically changes direction), which runs on most electrical circuits. The direct current (DC, one direction only) that the device itself needs is produced from AC by the power supply.
    - -48V can be found in bigger data centers and telephone systems. It is a DC supply that requires special -48V DC power supplies and has its own pros (less heat and power loss because there is no AC/DC conversion in the device, less noise) and cons (more expensive).
  - 208V vs. 440V/460V/480V
    - Voltage for 3-phase power.
    - Three-phase power uses three separate phases (each offset by 120 degrees) for delivering the current. The combined power of the three phases never drops to zero, therefore the delivery of power is much more consistent. The load can be distributed more efficiently across all phases. The installation costs are usually higher, but maintenance costs can be lower (less conductor material etc.).
- Wattage
  - Watt is the unit of electric power – the rate at which electrical energy is transferred by an electric circuit.
  - Watts are calculated as amperes multiplied by volts (W=A*V)
    - Volt is the unit of electric potential difference between two points.
    - Ampere is the unit of the “strength” of the electric current.
    - One watt is then the rate at which work is done when a current of one ampere flows through a potential difference of one volt.
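Rearranging W=A*V shows why a lower supply voltage needs more amperage for the same load. The sketch below uses a hypothetical 800 W server and ignores power factor and power supply efficiency, so treat the numbers as a rough illustration only.

```python
def power_watts(current_amps: float, voltage_volts: float) -> float:
    """W = A * V."""
    return current_amps * voltage_volts


def current_amps(power_w: float, voltage_volts: float) -> float:
    """A = W / V: the current needed to deliver a given power at a given voltage."""
    return power_w / voltage_volts


if __name__ == "__main__":
    load_w = 800  # hypothetical server load in watts
    for volts in (110, 220):
        print(f"{load_w} W at {volts} V draws {current_amps(load_w, volts):.2f} A")
```

Doubling the voltage halves the current, which is why higher voltage permits thinner cables.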
- Consumption
  - Consumption is measured in watt-hours (Wh), i.e. watts multiplied by hours, not watts per hour. Wh, or more usually kWh, is the figure the electricity distribution company charges you for.
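Energy consumption is power multiplied by the time it is drawn, so a device running at 1000 W for one hour uses 1 kWh. A minimal sketch follows; the 500 W load and the price of 0.20 per kWh are made-up example values, not real tariffs.

```python
def energy_kwh(power_w: float, hours: float) -> float:
    """Energy used in kilowatt-hours: watts * hours / 1000."""
    return power_w * hours / 1000


def energy_cost(power_w: float, hours: float, price_per_kwh: float) -> float:
    """Cost of running a load for a given time at a given price per kWh."""
    return energy_kwh(power_w, hours) * price_per_kwh


if __name__ == "__main__":
    load_w = 500          # hypothetical average draw of one server
    month_hours = 24 * 30
    print(f"Per day:   {energy_kwh(load_w, 24):.1f} kWh")
    print(f"Per month: {energy_kwh(load_w, month_hours):.1f} kWh")
    print(f"Monthly cost at 0.20 per kWh: {energy_cost(load_w, month_hours, 0.20):.2f}")
```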
- Redundancy
  - Redundancy is not only a power topic, but for the power supply of servers/devices it is critical. It needs to be maintained on several levels:
    - Redundant power supplies in the server – each attached to a separate branch of the (fallback) electricity source, ideally to:
    - Redundant UPS (Uninterruptible Power Supply) – a device that has its own source of power drawn from batteries, which ensures – at least – a proper shutdown of the devices connected to it. Redundant UPSs are connected to the electrical distribution, which is backed by:
    - Redundant power generators – which are able to supply power to the site in case something goes wrong in the transformer stations or power plants. Redundant power generators are hooked to:
    - Redundant power feeds, which supply the data center from two independent power sources (full redundancy at every level is in fact rare and can be found in Tier 4 data centers)
- 1-phase vs. 3-phase power
  - Single phase can be found in many homes. It runs on 120V or 240V.
  - Three-phase power is used for large buildings or distribution centers. It ranges from 208V to 440-480V.
- Plug types
  - NEMA (National Electrical Manufacturers Association). A North American standards organization. It defines male and female power connectors for currents of 15-60 A and voltages of 125-600V.
    - Image of NEMA outlets.
  - Edison – a type of NEMA connector – NEMA 5-15. Rated for 125V, 15 A (used on 120V circuits).
  - Twist lock
    - Type of connector where the plug is twisted after being inserted into a matching twist-lock receptacle, which locks it and prevents the cable from being pulled out unintentionally.
Cooling
- Airflow – maintaining proper airflow is crucial for cooling servers.
  - Servers take in cool air at the front and exhaust the hot air at the back. This way the air flows over the disks (the cooler components) at the front to the inner parts – chips like CPU, RAM, chipset etc. – and is expelled at the back.
    - Rows of server racks are arranged in the same manner. Usually the racks face each other with their front sides, creating cold and hot aisles. Air conditioning is then designed to supply the cold aisles with cold air and to collect hot air from the hot aisles to maintain proper air circulation.
  - In a server (no matter whether rack, tower or blade) it is very important to keep the chassis closed. The server was designed to run this way and the airflow is optimal in this state.
- Thermal dissipation
  - Active – active dissipation usually means a fan cooling a passive heat sink behind it. It is used for chips with greater thermal output like CPUs, GPUs etc., where a purely passive heat sink would have to be unmanageably large.
  - Passive – passive thermal dissipation is done through a heat sink (or a combination of heat sinks): heat is drawn away from the chip by a metal heat sink attached to it (with thermal compound between the chip and the heat sink). This way the heat dissipates inside the server (in cooperation with the cooling of the whole server) and the chip is protected from overheating.
- Baffles / shrouds
  - Both baffles and shrouds help with proper airflow inside the server – each in its own way:
    - Baffles usually block airflow where it should not go – for example at the back of the server, where there are unoccupied card slots (baffles sit there instead of cards).
    - Shrouds, on the other hand, help direct the flow. These are often shaped plastic parts that stop the air from spreading over the whole chassis and direct it through the most critical parts of the server.
- Fans at the front draw air into the chassis and force it over the main heat-producing components. Other fans at the back of the server push the hot air out.
- Liquid cooling
  - Liquid cooling is another way of cooling the components in the server. Usually tubes with flowing water are connected to special heat sinks (water blocks) on the chips. The warmed water is then carried to a heat exchanger, which removes the heat.
