3.0 Storage

3.1 Given a scenario, install and deploy primary storage devices based on given specifications and interfaces

  • Disk specifications
    • RPM (Revolution Per Minute)
      • Magnetic disks come with different rotational speeds ("spins" per minute, RPM); generally, the higher the RPM, the faster the disk. SATA disks are typically 7.2k/10k RPM, while SAS disks are 10k/15k RPM.
    • Dimensions/form factor
      • 2.5″ – most SSDs come in this size; in servers you can also find 2.5″ SAS disks. Installing a 2.5″ disk into a 3.5″ slot requires an adapter bracket.
      • 3.5″ – usually used in desktop computers or servers. Disks of this size can store more data (more platters fit inside the drive).
    • Capacity – the amount of data that can be stored on a disk. In the past this was mostly measured in MB; nowadays it is measured in GB or TB.
    • Bus width
    • IOPS (Input/Output Operations per Second)
      • This is a basic measurement of the performance of a storage subsystem (disk(s), RAIDs etc.). For a single drive, IOPS ranges from roughly 85 for a 7.2k SATA drive to about 200 for a 15k SAS drive. The range can vary even for the same interface and RPM, because drives differ between manufacturers (each has a different average seek time, so the resulting IOPS differ). It can be estimated as 1/(average seek time + average rotational latency) – see the calculation sketch at the end of this section.
    • Seek time and latency – to understand seek and latency times, one should first understand how magnetic disks are composed:
      • [Image: a disk platter divided into concentric tracks and pie-slice sectors]
      • The key concepts in the image above are tracks and sectors:
        • Tracks are concentric circles running from the innermost part of the platter to the outermost.
        • Sectors are pie-slice subdivisions of the platter that cut each track into segments.
          • Data (or parts of it) are stored in these track segments.
      • Seek time – the time needed to re-position the read/write head from one track to another. Because moving from track 0 (the innermost) to track 10 (the outermost) takes longer than moving from track 2 to track 3, the value is usually quoted as an average.
      • Latency – the time it takes for the platter to spin to the correct sector. Because the correct sector may be right next to the head’s current position or may require waiting almost a full revolution, the value is usually quoted as an average. Latency is closely tied to RPM – the faster the disk spins, the lower the latency: 7.2k RPM disks average around 4.2 ms, while 15k disks average around 2 ms.
    • Hotswap vs non-hotswap components
      • Hot swap support in disks depends on the interface – all modern interfaces (SATA, SAS, FC, USB, and even some SCSI implementations with higher-end disks and controllers) support hot swapping; in every case both ends of the connection (drive and controller) must support it.
  • Interfaces – the type of connection between the hard drive and the server; the server must be compatible with the interface (it must provide a matching interface itself, either built in, which is the standard case for SATA, or through an HBA).
    • SAS (Serial Attached SCSI)
      • Better performance than SCSI: a point-to-point serial interface with full-duplex transmission, up to 128 devices connected over thinner and longer cables, 3.0–12 Gb/s speeds and hot-pluggable devices. SAS disks tend to be more expensive and are more of a server standard, mainly because of greater reliability – they have fewer platters (lower capacity) but a longer lifespan due to lower stress on the mechanical parts.
    • SATA (Serial ATA)
      • An evolution of (parallel) ATA: a serial interface and the de facto standard in desktop computers. Uses the AHCI controller interface, which enables features like NCQ and hot plug; comes in 1.5 Gb/s, 3 Gb/s and 6 Gb/s versions.
    • SCSI (Small Computer System Interface)
      • Parallel interface (multiple devices share one bus), with a limit on the number of connected devices (8 or 16 per bus, depending on the width) and speeds up to 320 MB/s (Ultra-320) in the latest parallel versions.
    • USB (Universal Serial Bus)
      • An external interface that can also be used for connecting HDDs.
    • FC (Fibre Channel)
      • Connects devices through optical cables at speeds from 2 Gb/s to 16 Gb/s; used mainly in SANs (Storage Area Networks).
  • Hard drive vs SSD
    • Hard drive refers to magnetic disks (HDDs); their basic operation was described above.
    • SSD refers to Solid State Drive. This is a type of non-volatile memory that has no moving parts and stores data on NAND flash chips. Performance, especially for random I/O, can be hundreds to thousands of times greater than magnetic disks, and SSDs also use less power.
      • The downsides of SSDs are price and durability (in some cases, mostly for write-intensive systems). Capacity is often only in the hundreds of GB rather than the terabytes common for SATA drives, and the price grows much faster with added capacity.
      • Durability is one of the few real problems SSDs have. The smallest units where bits are stored are cells – here you will hear abbreviations like SLC (single-level cell), MLC (multi-level cell) and TLC (triple-level cell). Cells form pages, and pages form blocks. Reads and writes happen at the page level, but data cannot be overwritten in place – an erase must happen first, and erases work at the block level. So even writing a small amount of data can force the SSD to erase and rewrite a much larger group of cells, and since cells survive only a limited number of program/erase cycles before wearing out, this can lead to a premature end of life for the whole SSD.
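  • Example – estimating single-drive IOPS. A minimal Python sketch of the formula above (IOPS ≈ 1/(average seek time + average rotational latency)); the seek-time figures are illustrative assumptions, since real drives vary by manufacturer:

        def avg_rotational_latency_ms(rpm):
            # Average rotational latency is half of one full revolution.
            return (60_000 / rpm) / 2

        def estimated_iops(avg_seek_ms, rpm):
            # IOPS ~= 1 / (average seek time + average rotational latency)
            return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))

        print(round(estimated_iops(9.0, 7200)))    # 7.2k SATA, assuming ~9 ms seek  -> ~76 IOPS
        print(round(estimated_iops(3.5, 15000)))   # 15k SAS, assuming ~3.5 ms seek  -> ~182 IOPS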

3.2 Given a scenario, configure RAID using best practices

  • RAID levels and performance considerations – RAID stands for Redundant Array of Inexpensive/Independent Disks. It is a technique that combines physical disks into one logical disk with increased performance, reliability and capacity. RAID is a basic storage feature that helps prevent data loss.
    • RAID terminology often mentions the term “striping”, which means that data is segmented and stored across different physical drives. The term “stripe” (or stripe width) usually refers to the total amount of logically sequential data written across all physical disks, while “stride” (also chunk, stripe size, stride size, stripe length etc.) refers to one segment of that data on one physical disk.
    • 0 (striping)
      • Chunks of contiguous data are split and stored alternately across the disks in the array. This is the best-performing RAID level, because both read and write operations use all the disks simultaneously.
      • For RAID 0 you would need at least 2 disks.
      • Capacity penalty is none. Capacity = #disks*capacity.
      • Fault tolerance is none – if any of the disks fails, the RAID is broken.
      • Use cases – not many uses for RAID 0 in production environments. Usable for testing and lab purposes, where you gain both capacity and performance and can tolerate downtime (and potential data loss).
    • 1 (mirroring)
      • Data is stored on one disk and mirrored to the second. Reads can be served from both disks; writes carry a penalty of 2, because every piece of data written to one disk must also be written as an exact copy to the second disk.
      • Even though it is theoretically possible to use more than 2 disks for RAID 1, most current hardware RAID controllers won’t let you and will switch the configuration to RAID 1+0 (or RAID 0+1) instead.
      • For RAID 1 you would need at least 2 disks.
      • Capacity penalty is 50%. Capacity = #disks*capacity/2.
      • Fault tolerance is 1 disk. With more mirrored disks it would tolerate as many failures as there are disks minus one (logically, one drive with the data must remain).
      • Use cases – usually used for system disks in servers.
    • 5
      • Data is striped alternately across all disks, and for each stripe of contiguous data a parity block is calculated. Parity is special data from which the contents of a failed drive can be recalculated. The parity is spread across all disks, which gives fault tolerance for any single disk.
      • RAID 5 has a write penalty of 4 I/O operations: to save data you have to read the old data, read the old parity, calculate the new parity, write the new data and write the new parity – 4 I/O operations for every single write.
      • For RAID 5 you would need at least 3 disks.
      • Capacity penalty is 1 disk (parity information takes up the equivalent of one disk). Capacity = (#disks – 1)*capacity.
      • Fault tolerance is 1 disk. If any one disk fails, the RAID won’t break and you won’t lose data. When this happens the RAID 5 is in a “degraded” state, which means performance will be far from optimal (data from the missing disk has to be recalculated from parity on every access).
      • Use cases – less important storage space.
    • 6
      • Data is striped alternately across all disks, and for each stripe of contiguous data two sets of parity are calculated. Works like RAID 5, but with one additional parity.
      • RAID 6 has a write penalty of 6 I/O operations – just like RAID 5, but with one more read and one more write for the second parity.
      • For RAID 6 you would need at least 4 disks.
      • Capacity penalty is 2 disks (parity information takes up the equivalent of two disks). Capacity = (#disks – 2)*capacity.
      • Fault tolerance is 2 disks. If any two disks fail, the RAID won’t break and you won’t lose data, but again the RAID will be in a degraded state.
      • Use cases – important storage space.
    • 10
      • Data is stored in two levels: first smaller arrays – mirrors – are created, then data is striped across them.
      • RAID 10 is often confused with RAID 01, where the levels are reversed – data is mirrored across striped arrays. RAID 01 is worse mainly because of its lower fault tolerance in larger arrays.
      • RAID 10 has a write penalty of 2: each chunk of data is written twice (to both disks of a mirrored pair).
      • For RAID 10 you would need at least 4 disks.
      • Capacity penalty is 50%. Capacity = #disks*capacity/2.
      • Fault tolerance is at least 1 disk, and up to one disk per mirrored pair. Because the first level of the array is made of RAID 1 mirrors, the whole array stays functional as long as each mirror still has one working disk.
      • Use cases – database servers.
    • Please note that there are more RAID levels – 2, 3 and 4, as well as more multi-level types such as 50 and 60. Most of these are used only for very specific purposes; levels like 50 or 60 are used within SANs with many disks.
  • Software vs hardware RAID
    • Software RAID
      • The controller is implemented in software – for example inside the operating system. Less costly than HW RAID.
    • Hardware RAID
      • Specialized HW device (often a card in PCI slot).
      • HW RAID controllers have their own firmware, which you can access during the server boot process; logical volumes can be set up there before any OS is installed. More costly than SW RAID.
    • Performance considerations
      • While HW RAID may be more expensive (you need a separate controller card), it is much faster and definitely the choice in a server environment. HW controllers have their own built-in cache memory, their own processing chips and often their own battery. From a performance point of view, having memory and chips dedicated to array calculations saves resources elsewhere – mainly the CPU, which SW RAID has to use for things like parity computation.
  • Configuration specifications
    • Capacity – for 1 TB drives in a RAID, the resulting capacity depends on the RAID level (see the calculation sketch at the end of this section):
      • Type of RAID | #disks | Capacity*
        RAID 0       |   4    |   4 TB
        RAID 1       |   2    |   1 TB
        RAID 5       |   4    |   3 TB
        RAID 6       |   4    |   2 TB
        RAID 10      |   4    |   2 TB
      • *Capacity is calculated from the raw (marketing) capacity of the drives, not the usable space after formatting.
      • Unless a special RAID type is used (e.g. Synology Hybrid RAID), you cannot mix capacities. More precisely, you can mix them, but only the capacity of the smallest disk will be used on the bigger disks (e.g. with 4 disks – two 500 GB, one 1 TB and one 2 TB – you can create only a 1.5 TB RAID 5, wasting the extra capacity of the larger disks).
    • Bus types – compared to each other:
      • SATA – shortest cable length, slowest speed, cheapest
      • SAS – moderate cable length, moderate speed, moderate price
      • FC – longest cable length, fastest speed, most expensive
    • Drive RPM
      • Comes in various speeds – 7.2k/10k/15k. Mixing RPMs in a RAID is possible but not wise – the array will usually be limited to the speed of the slowest disk.
  • Hotswap support and ramifications
    • Hot swap usually has to be supported on both ends – both the device and the controller. This is pretty much standard today, as all modern systems allow you to hot swap SAS and SATA drives (both hot-swappable by specification).
  • Hot spare vs cold spare
    • Hot spare – a disk that sits in standby until a working disk in an array is marked as faulty. The hot spare then immediately takes the faulty disk’s place and starts working in its stead. This common concept ensures a failed disk is replaced as soon as possible and minimises the time spent in a degraded state (reducing the window in which another drive failure could cause data loss with some RAID types).
    • Cold spare – a drive kept ready to replace a faulty one, stored either in the IT equipment room or in the server itself, waiting for an administrator’s intervention. Nothing happens automatically with cold spares, but at least the time to order/buy a new drive is eliminated.
  • Array controller (RAID controller, HW RAID etc.)
    • Memory (cache)
      • Dedicated memory used just for block operations on the array, usually configured with some read/write ratio. It also allows enabling write-back mode, which first writes data to the cache and synchronises it to disk later (for example when the cache fills up or the array is not under pressure).
    • Battery backed cache
      • A battery on the controller itself keeps the cache memory powered if power is lost, so data waiting in the cache is not lost. This is crucial when write-back mode is used, and most controllers disable write-back mode once the battery reaches the end of its life.
      • The downside is that the battery’s lifespan is much shorter than the controller card’s, so you have to replace the battery every 5 years or so.
    • Redundant controller
      • An option for eliminating a single point of failure – keeping a spare RAID controller available in case the one in operation fails.
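  • Example – RAID usable capacity and write penalty. A small Python sketch that reproduces the 1 TB-drive capacity table above and the write penalties discussed for each level; this is a simplified model (RAID 1 is treated as a plain two-disk mirror and real arrays lose a little extra space to metadata):

        def raid_specs(level, disks, disk_tb=1.0):
            # Returns (usable capacity in TB, write penalty) for common RAID levels.
            if level == 0:
                return disks * disk_tb, 1              # striping, no redundancy
            if level == 1:
                return disk_tb, 2                      # mirror: both disks hold the same data
            if level == 5:
                return (disks - 1) * disk_tb, 4        # one disk's worth of parity
            if level == 6:
                return (disks - 2) * disk_tb, 6        # two disks' worth of parity
            if level == 10:
                return disks * disk_tb / 2, 2          # mirrored pairs, then striped
            raise ValueError("unsupported RAID level")

        for level, disks in [(0, 4), (1, 2), (5, 4), (6, 4), (10, 4)]:
            capacity, penalty = raid_specs(level, disks)
            print(f"RAID {level:>2}: {disks} x 1 TB -> {capacity:g} TB usable, write penalty {penalty}")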

3.3 Summarize hardware and features of various storage technologies

  • DAS (direct-attached storage)
    • Storage directly connected to the server.
    • This is the basic, or rather standard, approach for servers. There is physical space in the server (usually at the front, with disks facing outward for easy hot-swap access), and the disks plug into a SAS/SATA backplane – a board with connectors that all front-mounted disks attach to, which is in turn connected to the motherboard or a controller. This is the most common approach – internal DAS.
      • In some cases a server is connected to DAS externally via SCSI or FC (an external enclosure with HDDs is connected this way).
    • SAS and SATA aren’t the only usable protocols; DAS can also use SCSI, FC, eSATA or USB.
    • Works with magnetic disks and SSDs.
  • NAS (network-attached-storage)
    • Shared storage on a network. Usually a server stripped of most functionality, with a lower-performing processor, focused on providing features around shared storage space. It runs a small OS with a management interface (typically web-based) for configuring basic and advanced features.
    • Usually cheaper than a SAN, but offers more features than DAS with an easier setup.
    • Common protocols for NAS are SMB, CIFS or NFS.
    • iSCSI (internet Small Computer Systems Interface)
      • In short, this is the SCSI protocol over an IP network: you can attach a network card interface on the NAS/SAN to a switch and make the storage accessible via iSCSI. iSCSI encapsulates SCSI commands in IP packets.
      • Uses an initiator/target model:
        • Initiator – software (hardware initiators, i.e. dedicated iSCSI network cards, exist but are rare) that connects through a NIC to a:
        • Target – the NAS/SAN storage that accepts commands from initiators.
      • A common approach is to dedicate a physical network just to this storage traffic – the NICs on the servers, the switches and the storage carry only this traffic.
      • It is possible to secure the connection from initiator to target with authentication methods (e.g. CHAP).
      • With iSCSI it is possible to connect servers directly to the storage, but this has to be supported on both ends (especially on the storage side). The most common approach is to connect everything through dedicated switches.
    • FCoE (Fibre Channel over Ethernet)
      • Uses the Fibre Channel protocol over an Ethernet network. The goal is access to both worlds – Ethernet as well as Fibre Channel – without needing separate NICs and HBAs.
      • Usually runs on a 10GbE network.
      • Uses special cards called Converged Network Adapters (CNAs), which provide both functions – NIC and HBA – on a single card.
  • SAN (Storage Area Networks)
    • SANs are larger storage arrays containing many disks, presented over a network as if the storage were physically present in the server.
      • The difference from NAS is the protocol used – SANs usually work with lower-level block protocols such as SCSI, iSCSI or FC, while a NAS provides file shares to the server through “higher-level” protocols like SMB. “Usually” is the right word here, because enterprise-class NAS devices exist that offer block protocols like iSCSI too.
    • Fibre Channel
      • A protocol for connecting the SAN to servers – either through switches or directly (direct attachment works too, but has to be supported on both sides).
    • LUN & LUN masking
      • LUN (Logical Unit Number) – LUNs are presented to the server as block devices, just like a disk. Typically this lets you carve a smaller capacity out of a logical volume built on the physical disks and present it as a whole block device.
      • LUN masking – a way to ensure that only the intended systems have access to a LUN; it hides (masks) the LUN from other systems. It can be implemented at the HBA level (less secure) or at the controller level (more secure) – see the conceptual sketch at the end of this section.
    • HBAs and fabric switches
      • HBA (Host Bus Adapter) – a highly specialised card for a particular protocol. For FC on the server side you need a dedicated HBA with FC ports, since these are not standard interfaces on a motherboard. The downside of an HBA is price; the upside is a dedicated card with chips and memory devoted solely to that protocol, offloading work from critical components such as the processor.
        • Fabric switches are switches used to connect servers (with FC HBAs) to the SAN (with FC ports on its controllers). They usually provide extra redundancy and scalability for the whole infrastructure.
  • JBOD (just-a-bunch-of-disks)
    • JBOD is exactly what the name says. There is no added value – no RAID configuration, no advanced protocol for provisioning block devices. You typically find it in servers with multiple disks and no configuration layered on top: the operating system on one disk, data for one project on a second, installation packages on a third, databases on a fourth, and so on.
    • There is no redundancy or performance scaling as with the various RAID levels; you rely entirely on each individual disk – its small cache memory and its lifespan.
      • I have seen this only once in the enterprise world (a developer environment with virtual machines), and the approach usually goes hand in hand with ignorance or arrogance. You will typically find it in environments that have no administrator (or too many) and lack a systematic approach.
    • On the other hand, there are environments designed to use JBOD from scratch. In these special use cases, redundancy is handled by intelligent applications that are aware they run on JBOD and use their own logic to achieve redundancy, spanning themselves or their parts across multiple physical disks (even though those disks may be presented as one large logical drive).
  • Tape
    • Drive
      • The drive performs I/O operations on the tape. External tape drives can be bought separately, much like an external DVD drive. Tapes are changed manually, and the drive holds only one cartridge at a time.
    • Libraries
      • Tape libraries store tapes. Tapes are magnetic media used for storing data – usually inexpensive and offering quite large capacities. There are many tape standards – IBM, DEC, LTO etc.
      • Tape libraries consist of magazines (for storing tapes), drives (for performing IO operations with tapes) and robotics (mechanics or mechanical arm for taking the tape from magazine and putting it to the drive and vice versa).
      • Currently in decline – tape libraries used to be very popular. With the decreasing cost of HDDs and the high price of libraries (and especially the number of mechanical parts that can fail), the cheaper media no longer justify the investment as easily. They still offer a good ratio of cost to storage capacity and lifespan, especially for data archiving.
      • Often used as the second level for data backup – Disk-to-Disk-to-Tape.
  • Optical drive
    • CD-ROM, DVD-ROM, Blu-ray – technologies using laser beams for I/O. Limited use today, mostly personal data archiving. In a server environment you would use them for initial installations when other access methods are not available. Small capacity, limited lifespan.
  • Flash, Compact Flash and USB drive
    • Flash and CompactFlash cards are storage mainly used in digital cameras; a special reader is required to read/write them from a laptop. In servers they are used for special cases such as booting the VMware hypervisor.
    • USB drives are portable, small-scale drives for various purposes – easily portable storage for SW installation media, or even a drive with a whole operating system installed on it.
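  • Example – LUN masking as a concept. A conceptual Python sketch (not any vendor’s API) of the access list a controller keeps per LUN; the initiator names are made-up iSCSI IQNs, and in an FC SAN the initiators would be identified by WWPNs instead:

        # The controller keeps a mask (set of allowed initiators) per LUN and
        # exposes the LUN only to initiators in that set.
        lun_masks = {
            "lun-12": {"iqn.2024-01.com.example:host-db01",
                       "iqn.2024-01.com.example:host-db02"},
            "lun-13": {"iqn.2024-01.com.example:host-web01"},
        }

        def can_access(initiator, lun_id):
            # A LUN with no entry is masked from everyone.
            return initiator in lun_masks.get(lun_id, set())

        print(can_access("iqn.2024-01.com.example:host-db01", "lun-12"))   # True
        print(can_access("iqn.2024-01.com.example:host-web01", "lun-12"))  # False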

3.4 Given a scenario, calculate appropriate storage capacity and plan for future growth

  • Base10 vs Base2 disk size calculation (1000 vs 1024) – there is a difference in capacity calculations; basically it is a marketing thing, where HDD manufacturers count capacity in the decimal system while the computer works with capacity in the binary system (see the conversion sketch at the end of this section).
    • Hardware manufacturers use Base10 for calculating the capacity of an HDD:
      • In the decimal system (Base10):
        • 1KB = 1000 bytes
        • 1MB = 1000 KB
        • 1GB = 1000 MB
    • The computer works with Base2 for calculating the capacity of the HDD:
      • In the binary system (Base2):
        • 1KB = 1024 bytes
        • 1MB = 1024 KB
        • 1GB = 1024 MB
    • Example – when you buy a 4 TB HDD, how much will the OS show?
      • 1 TB = 1000 GB = 1 000 000 MB = 1 000 000 000 KB = 1 000 000 000 000 bytes
        • Divide the number of bytes by 1024^4 (1024 is the base-2 kilobyte, and the fourth power matches the four groups of zeros above), then multiply by 4 (the drive is 4 TB): roughly 3.63 TB (rounded down).
        • Another technique is to convert the whole capacity to bytes – 4 000 000 000 000 bytes – and divide by 2^40 (one TB in base 2), giving about 3.64 TB.
    • Example 2 – when you buy a 120 GB HDD, how much will the OS show?
      • 120 GB = 120 000 000 000 bytes; 120 000 000 000 / 1024^3 ≈ 111.75 GB (rounded down)
  • Disk quotas
    • A technique to limit stored data per user or per volume.
      • In a Windows client OS you can set this up by right-clicking the disk in This PC and using the Quota tab.
      • In Windows Server, quotas can be set easily through FSRM (File Server Resource Manager).
  • Compression
    • Disk space compression can be achieved in software or by specialised hardware (mostly in the past), using various compression algorithms.
    • Not used very widely today, as storage capacity is no longer as expensive as it used to be.
    • Compression of data can be done in two main ways:
      • file compression utilities that store files in archives – files with a special type (.zip, .rar, .7z etc.)
      • on-the-fly compression of data by utilities that hook into the operating system’s handling of I/O operations
    • Pros – can save space for data that is not already compressed. Cons – adds processing overhead (data being saved has to be compressed and data being read has to be decompressed before use).
  • Capacity planning considerations:
    • In general you should always consider not only the current need but also a plan for the coming year(s); this keeps you prepared for extraordinary requirements and future growth.
    • Operating system growth – the first consideration is the operating system itself; the initial space requirements are specified by the manufacturer. You should also account for other installed applications or OS components (server roles etc.).
      • Patches – patches and patch logs can consume a lot of space over time; obsolete patch installation packages should be cleaned up on a regular basis.
      • Service packs – usually bundle a lot of operating system changes, so the SP installation package is quite large.
        • In Windows, you can clean up superseded SP installation files with the command Dism.exe /online /Cleanup-Image /SPSuperseded
      • Log files – can grow large over time; a log rotation or cleanup policy should be defined and logs should be archived.
        • For certain services, especially those publicly facing the internet such as FTP, a specific approach to log files should be planned from the start.
    • Temporary directories
      • Temporary directories store temporary data used for calculations now or in the future. They can usually be emptied without harm, though this may cause some configuration changes, e.g. browsers losing per-site settings, auto-login etc.
    • Databases
      • Databases are a topic of their own. For temporary data, a dedicated disk or storage is advised, as well as a plan for configuring the growth of the temporary database (in MS SQL this is tempdb).
      • Methods for shrinking databases can also be used from time to time to save space. This should always be left to people who understand databases – database administrators. Shrinking a database can be a very bad idea in certain situations and can require additional steps afterwards (rebuilding indexes etc.).
    • Application servers
      • Application servers usually grow the most on the log side, recording information about page requests and application usage.
    • File servers
      • File servers grow with user data and usually require planning not only for growth but also for restrictions in the form of file quotas.
    • Archival – archived data also needs capacity planning; moving older data to cheaper media (e.g. tape) can offset growth on primary storage.
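  • Example – Base10 vs Base2 conversion. A short Python sketch of the calculation shown above, converting an advertised (decimal) capacity to what the OS reports (binary):

        def advertised_to_reported(capacity, power):
            # power = 3 for GB, 4 for TB: marketing uses 1000^power bytes per unit,
            # the OS divides the same number of bytes by 1024^power.
            bytes_total = capacity * 1000 ** power
            return bytes_total / 1024 ** power

        print(f"{advertised_to_reported(4, 4):.2f} TB")     # 4 TB drive   -> ~3.64 TB
        print(f"{advertised_to_reported(120, 3):.2f} GB")   # 120 GB drive -> ~111.76 GB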
