Understanding the Performance of Quantum Solid State Disks
Specialty Storage Products Group
Copyright ©1995, Quantum Corporation. All Rights Reserved.
Performance of systems is dependent on system configuration and application workload, so actual performance may vary. The data presented herein is representative of performance on the systems and configurations discussed.
Since workstation and server systems were introduced about a decade ago, their performance has increased by almost two orders of magnitude. In the same period, the performance of magnetic disk drives has increased by only a factor of 2 to 3. This disparity has placed increasing emphasis on storage performance alternatives, particularly certain variations of RAID and caching.
This paper introduces a technology that may be new to workstation and server users: solid state disk (SSD). Until recently, the cost of such devices typically exceeded that of workstations, limiting their use to high-value, commercial mainframe applications. SSDs also tended to be large devices using special interconnects and protocols to interface with systems.
The convergence of open storage standards (SCSI), new DRAM technologies (4 and 16 Mb) and lower cost, higher capacity low profile magnetic disks has brought solid state disk to the workstation and server market. What in 1989 cost over $1000 per megabyte and was the size of a refrigerator now sells to end users for less than $100 per megabyte, packaged in standard 5¼" and 3½" disk form factors.
The explosion in computing performance has also caused a trend to migrate applications from mainframe and super-minicomputers to smaller client-server systems based on workstation technology. Quantum's new SSDs support this migration by allowing those performance-critical applications to take advantage of the same high-performance technology used in the mainframe environment at a cost appropriate for the server environment.
This paper describes the benefits of Quantum's SCSI solid state disks. The results of performance benchmarks are presented, along with a comparison of SSD technology to caching.
What is a solid state disk
A solid state disk is a disk drive that uses semiconductor storage (DRAMs) instead of magnetic platters as the storage medium. The lack of mechanical components leads to very fast, predictable access time. Standard disk interconnects, such as SCSI, and standard form factors allow solid state disks to be added to systems just like magnetic disk drives. Most SSDs include some type of data retention mechanism to compensate for the volatile nature of DRAMs. This may be either battery backup alone or a combination of batteries and a magnetic disk.
What are the advantages of solid state disk
The main advantage of solid state disk is fast access time. The typical seek and rotational latency component of magnetic disk access time is virtually eliminated, replaced by the time required for a microprocessor to interpret the SCSI command and program the hardware to transfer the requested data from the semiconductor "media" to an internal buffer. As can be seen in Figure 1, this leads to data access times up to 30 times faster than magnetic disk.
Solid state disk gives system and subsystem providers a competitive advantage. It provides high performance, even on workloads where caching and RAID provide no benefits.
Figure 1 - SCSI Command Timing
The timing of the rest of the command processing and data transfer is like that of magnetic disks. Solid state disks, then, have their greatest advantage for small I/Os where the mechanical access time is a significant portion of the total time to complete the I/O on a magnetic disk. Industry and academic studies have shown that small I/Os are typical for most operating systems and applications.
The access time advantage of solid state disk decreases as the I/O request size grows, because data transfer time becomes the dominant component of I/O completion time. The advantage also diminishes for sequential workloads where seeks are short and infrequent. Quantum's new ESP3000 and ESP5000 families of solid state disks have greatly improved sustained bandwidth. With a sustained transfer rate of over 13 megabytes per second, they significantly outperform magnetic disk drives, even for purely sequential workloads.
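The way access time gives way to transfer time as requests grow can be illustrated with a simple model in which completion time is access time plus request size divided by sustained bandwidth. This is a sketch under stated assumptions, not a measurement: the 15 ms access time and 6.5 MB/s bandwidth for the magnetic disk are illustrative values, and the function and constant names are hypothetical.

```python
# Share of I/O completion time spent on access (seek + rotation) rather than
# on data transfer, for a simple model: completion = access + size / bandwidth.
# The magnetic disk parameters are illustrative assumptions, not measurements.

DISK_ACCESS_S = 15e-3     # assumed mechanical access time: 15 ms
DISK_BANDWIDTH = 6.5e6    # assumed sustained transfer rate: 6.5 MB/s


def access_fraction(size_bytes, access_s=DISK_ACCESS_S, bandwidth=DISK_BANDWIDTH):
    """Return the fraction of completion time that is access time."""
    transfer_s = size_bytes / bandwidth
    return access_s / (access_s + transfer_s)


if __name__ == "__main__":
    for size in (512, 8 * 1024, 64 * 1024, 1024 * 1024):
        share = access_fraction(size)
        print(f"{size:>8} bytes: {share:6.1%} of completion time is access time")
```

Under these assumptions the access component dominates small requests and shrinks to a minor share for megabyte-sized requests, which is where removing mechanical latency helps least.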
When to use solid state disk
Solid state disks are aimed at applications whose performance is limited by I/O performance. They are particularly effective in write-intensive applications, or applications where data locality is poor. As discussed below, caching tends to be ineffective in those applications.
Solid state disks are most often used in commercial processing and on-line transaction processing applications. The tangible costs of lost business, or of time wasted by valuable human or computer resources, readily justify an investment in solid state disk technology.
Although solid state disks have sustained bandwidths twice that of high performance magnetic disks, they still may not be the optimum choice for applications that require high bandwidth for large, sequential I/O requests. Striping of high performance magnetic disks may be a lower cost method to meet high bandwidth requirements.
Some examples of SSD use, and the performance benefits obtained, are:
- putting journal files and key parts of an options trading database on SSD reduced the response time in a financial trading environment from 24-62 seconds to 2-4 seconds.
- moving active operating system files, such as job and security databases, to SSD reduced login time by 75% and essentially eliminated waits during batch and print queue operations.
- using SSD to hold chemical modeling data reduced job run time by 26%.
This type of performance boost can be important to system providers in demonstrating the full potential of their products on workloads such as TPC-A and AIM.
Qualifying the need for solid state disk
Figure 2 illustrates the decision process used to evaluate the need for solid state disk by a system or application. The data needed can be obtained using performance monitoring utilities such as the UNIX utility iostat. As in any performance tuning exercise, the goal here is to determine which subsystem is causing the bottleneck and determine a cost-effective solution.
Figure 2 - SSD Qualification Decision Tree
How much solid state disk is needed
Studies done by Princeton University [Staelin 88] and by Digital [Ferguson 89] show that applications tend to do the bulk of their I/O to a very small percentage of the available on-line data. Typically, over 80% of the I/O requests go to less than 20% of the megabytes of storage in the system. What is even more important is that over 50% of the I/O requests go to only 1% to 3% of the megabytes. Isolating and placing this small amount of "hot" storage on solid state disk means over 50% of the I/O requests performed by the system will see the very fast access time of solid state disk.
Figure 3 - The 80/20 rule applies to storage access
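One way to size the "hot" storage on a given system is to count I/O requests per storage region and find the smallest set of regions that captures a target share of the requests. The sketch below shows the idea; the function name is hypothetical and the sample counts are invented for illustration, not taken from the cited studies.

```python
def hot_fraction(io_counts, target_share=0.5):
    """Return the fraction of regions needed to capture target_share of all I/O.

    io_counts holds one I/O request count per equal-sized storage region.
    Regions are taken busiest first until the target share is reached.
    """
    total = sum(io_counts)
    if total == 0:
        return 0.0
    covered = 0
    regions = 0
    for count in sorted(io_counts, reverse=True):
        if covered >= target_share * total:
            break
        covered += count
        regions += 1
    return regions / len(io_counts)


if __name__ == "__main__":
    # Invented sample: 100 regions, a few of which receive most of the I/O.
    sample = [3000] * 2 + [200] * 18 + [10] * 80
    print("regions holding 50% of I/O:", hot_fraction(sample, 0.5))
    print("regions holding 80% of I/O:", hot_fraction(sample, 0.8))
```

The per-region counts would come from whatever I/O tracing facility the system provides. The regions selected for the 50% target are the natural candidates for placement on solid state disk.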
Quantum's Solid State Disks
The performance tests documented in this report were performed on ESP3013 and ESP3026 drives. The ESP5000 series shares a common controller and firmware design with the ESP3000 family. Performance for the ESP5000 family members will be the same as the ESP3000 results. Quantum's solid state disk families include the following models:
Table 1 - Quantum Solid State Disk Models
Model Name   Capacity   Form Factor   Integrated Data Retention
ESP3013      134 MB     3½" HH        Yes
ESP3026      268 MB     3½" HH        Yes
ESP5047      475 MB     5¼" FH        Yes
ESP5095      950 MB     5¼" FH        Yes
The integrated data retention system in the ESP family consists of a magnetic disk drive and sufficient batteries to move the entire contents of memory onto the magnetic drive multiple times in the event of power failure. Modified data is actually moved continuously to the data retention disk by a background process, so under normal conditions the save under battery power completes in under one minute. The battery capacity, derated for temperature and battery aging, is thirty minutes for the ESP3000 family and forty minutes for the ESP5000 series. When power returns, data is automatically restored to the memory from the magnetic disk. The complete data retention system is integrated into the 5¼" and 3½" form factor for easy plug-and-play operation.
All members of both families share a common controller and firmware design. This commonality reduces device qualification and application performance testing time.
Evaluating ESP performance
The best test of solid state disk performance is in an actual running application, or an artificial workload that accurately simulates the I/O profile of the target application.
When comparing solid state disk and magnetic disk performance, it is important to choose a workload that reflects the differences between solid state disk and magnetic disk. Such a workload has small, random I/O requests. This workload is typical for most modern operating systems. Operating systems today, including UNIX, Mac OS, MS-DOS, Windows NT and OpenVMS, include some level of disk caching. Since this cache satisfies a large share of the read requests, the resulting workload seen by the disks is random with a large number of writes. Furthermore, the I/O request size tends to be small. For UNIX, it is the size of a buffer cache page (typically 8 KB). The sizes tend to be even smaller for other operating systems.
The first approach many people take to test a new disk drive is a large disk-to-disk copy. This typically results in large, sequential I/O requests. Although such a test will show a marked performance advantage for the ESP3000/5000 over magnetic disks, it is not a realistic simulation of a running application. The results of such a test will generally understate the true performance potential of solid state disk technology. Application benchmarks, such as TPC-A, are much better indications of performance than large file copies.
Access and response time
The new ESP3000 and ESP5000 solid state disks have greatly improved performance over earlier models. The ESP3000 series access time, from end of command to first byte of data, is approximately 150 µs for a single 512-byte sector request. This time includes about 120 µs to interpret the command, allocate buffers and set up the internal memory access, and about 30 µs to read and correct the first data block. Data access is pipelined. Subsequent sectors are output approximately one every 30 µs. Depending on the settings on the Disconnect/Reconnect Mode Page, the access time may be longer for multiblock commands. If the ESP3000 disconnects after accepting the command, it will read a number of blocks, specified in the Maximum Burst Size field, before reselection.
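The pipelined timing described above can be expressed as a small calculation. This is a sketch that assumes the drive does not disconnect and that the bus keeps pace with the drive; the names are hypothetical and the two constants are the approximate figures quoted in the text.

```python
SETUP_US = 120.0       # interpret command, allocate buffers, set up memory access
PER_SECTOR_US = 30.0   # read and correct one 512-byte sector


def time_to_sector_us(n):
    """Approximate time from end of command until the n-th sector is output."""
    if n < 1:
        raise ValueError("sector index starts at 1")
    return SETUP_US + PER_SECTOR_US * n


if __name__ == "__main__":
    for sectors in (1, 2, 16):
        print(f"sector {sectors:>2}: about {time_to_sector_us(sectors):.0f} us")
```

For a single-sector request this reproduces the 150 µs access time; for an 8 KB (16-sector) request the last sector would follow roughly 600 µs after the end of the command under the same assumptions.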
The following trace shows a typical synchronous, single-sector command. This trace was taken during the testing shown below, on a 60 MHz Pentium system with an Adaptec controller. The entire access takes 216 µs, of which 31 µs is arbitration and command transfer, 28.5 µs is data transfer and 151 µs is drive access time.
Figure 4 - SCSI Bus Trace, 60 MHz Pentium, Adaptec AHA 2940W Controller, Synchronous 512B I/O
ssss.mmm_uuu_nnn----------------------------------------------------
                  Bus Free
                  Arb Start  7
           2_300  Arb Win    7
           1_140  (Atn Assertion)
             140  Sel Start  0 7
             760  Sel End
          19_300  (Atn Deassert)
             200  Msg Out    80
           4_380  Command    28 00 00 00 00 64
           2_720             00 00 01 00
         151_120  Data In    0000 0000 0000 0000 0000 0000
             600             0000 0000 0000 0000 0000 0000
           1_480             0000 0000 0000 0000 0000 0000
             600             0000 0000 0000 0000 0000 0000
                             [...]
          22_200             0000 0000 0000 0000 0000 0000
             600             0000 0000 0000 0000
           2_460  Status     00
           3_660  Msg In     00
           1_480  Bus Free
--------------------------------------------------------------------
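The arbitration and command transfer figure can be cross-checked by adding up the intervals in the trace. The sketch below assumes that each time value is the interval, in nanoseconds, between successive bus events (an interpretation of the uuu_nnn column header), and uses the eight values that appear between the start of arbitration and the start of the Data In phase. The variable names are hypothetical.

```python
# Interval values copied from the trace, in nanoseconds.
# Assumption: each value is the time elapsed between successive bus events.
SETUP_INTERVALS_NS = [2_300, 1_140, 140, 760, 19_300, 200, 4_380, 2_720]
DRIVE_ACCESS_NS = 151_120   # interval immediately preceding Data In


def to_microseconds(nanoseconds):
    """Convert a nanosecond count to microseconds."""
    return nanoseconds / 1000.0


setup_us = to_microseconds(sum(SETUP_INTERVALS_NS))
access_us = to_microseconds(DRIVE_ACCESS_NS)

if __name__ == "__main__":
    print(f"arbitration through command: {setup_us:.2f} us")
    print(f"drive access time:           {access_us:.2f} us")
```

The first sum comes to 30.94 µs and the second value to 151.12 µs, in line with the rounded 31 µs and 151 µs quoted for this trace.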
The ESP3000 series of solid state disks is capable of over 1800 I/O requests per second. Obtaining such throughput requires multiple simultaneous requests, so that the host, bus and drive components of access time overlap.
The following graphs were produced using a saturation workload tool. This tool attempts to maintain a user-specified number of commands queued to the drive. It does this by issuing that number of commands asynchronously, then re-issuing commands as part of the I/O post processing using an asynchronous trap. Note that these asynchronous commands are made synchronous (from the drive's perspective) by a non-queuing driver. Except where noted, all tests used a maximum random seek and a request size of 512 bytes.
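The general shape of such a tool can be sketched as follows. This is not the tool used for the tests in this paper: it is an analogous, hypothetical example that keeps a fixed number of random single-sector reads outstanding by using worker threads in place of asynchronous traps, and it relies on os.pread, which is available on UNIX-like systems only.

```python
import os
import random
import tempfile
import threading

SECTOR = 512  # request size in bytes


def run_saturation(path, depth, total_requests, seed=0):
    """Keep up to `depth` random reads outstanding; return the number completed.

    Each worker issues one read and, as soon as it completes, issues the next,
    which mirrors re-issuing a command from I/O post-processing.
    """
    sectors = os.path.getsize(path) // SECTOR
    fd = os.open(path, os.O_RDONLY)
    lock = threading.Lock()
    state = {"issued": 0, "completed": 0}
    rng = random.Random(seed)

    def worker():
        while True:
            with lock:
                if state["issued"] >= total_requests:
                    return
                state["issued"] += 1
                offset = rng.randrange(sectors) * SECTOR  # random seek target
            data = os.pread(fd, SECTOR, offset)           # blocking read
            with lock:
                if len(data) == SECTOR:
                    state["completed"] += 1

    threads = [threading.Thread(target=worker) for _ in range(depth)]
    try:
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        os.close(fd)
    return state["completed"]


if __name__ == "__main__":
    # Stand-in target: a small temporary file rather than a real drive.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"\0" * SECTOR * 1024)
        target = handle.name
    try:
        print("completed:", run_saturation(target, depth=4, total_requests=1000))
    finally:
        os.remove(target)
```

Reads against an ordinary file are largely satisfied from the operating system's cache, so this sketch shows only the queueing structure. A meaningful measurement would have to target the drive itself and bypass host caching.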
Figure 5 shows the throughput as a function of request size in 512B sectors with a non-queuing driver.
Figure 5 - ESP3000 Series Request Throughput, 60 MHz Pentium system, Adaptec AHA 2940W Controller
The throughput for synchronous I/O is limited by the combination of the ESP3000 access time (approximately 150 µs) and the host CPU and adapter overheads (approximately 40 µs).
As shown in the following graph (Figure 6), 16-bit single-ended members of the ESP3000 series have a sustained transfer rate of over 13.0 MB/s. The ESP3000 series excels over magnetic disk with a random workload, even at very large request sizes. Here, the advantage of the solid state disk's lack of mechanical latency is visible.
Figure 6 - ESP3000 Series Sustainable Bandwidth Performance, 60 MHz Pentium, Adaptec AHA 2940W Controller
Comparing SSD and caching
Conceptually, a solid state disk with data retention is very much like a cached magnetic disk. There are two key differences that give solid state disk an advantage over cache.
The first difference is that data on a solid state disk is always in memory. Cache memory holds only recently used blocks and the blocks near them. Caches work on the principle of locality of reference: programs tend to use data closely spaced in address and time. Locality improves the probability that data will be found in the cache when needed (a cache hit), but does not guarantee it. The hit ratio, or percentage of accesses that generate cache hits, depends on the locality of the workload, the size of the cache and how it is managed. Given the hit ratio, the average access time can be calculated from the following equation:
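One form consistent with the surrounding text, offered as a reconstruction rather than the original typesetting, is shown below. The symbols are chosen here for illustration: T_avg is the average access time, T_cache the cache access time, T_disk the magnetic disk access time, and h the hit ratio expressed as a fraction between 0 and 1.

```latex
T_{avg} = T_{cache} + (1 - h)\,T_{disk}
```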
This equation assumes that the cache access time is incurred on every access in order to determine whether or not the data is in the cache.
The following chart shows average access time plotted against hit ratio for a 15 ms disk with a 250 µs cache access time. The magnetic disk and solid state disk access times are shown as horizontal lines for reference.
Figure 7 - Access Time vs. Cache Hit Ratio
Studies have shown that for typical sized disk caches, hit ratios in the range of 75% to 85% can be expected. There are some industries and applications, telecommunications for example, that exhibit little or no locality and experience lower hit ratios. There are also some workloads that have better locality and achieve greater than 95% hit ratios. Because data on solid state disks is always in memory, SSDs act as a cache with a guaranteed 100% hit ratio.
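To put numbers on these hit ratios, the sketch below evaluates a simple model in which every access pays the cache access time and every miss additionally pays the disk access time. The model form is an assumption consistent with the text, the parameters are the 15 ms disk and 250 µs cache used for Figure 7, and the function name is hypothetical.

```python
DISK_MS = 15.0    # magnetic disk access time
CACHE_MS = 0.25   # cache access time (250 us)


def average_access_ms(hit_ratio, cache_ms=CACHE_MS, disk_ms=DISK_MS):
    """Average access time when every access pays cache_ms and misses add disk_ms."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    for ratio in (0.75, 0.85, 0.95, 1.00):
        print(f"hit ratio {ratio:4.0%}: {average_access_ms(ratio):5.2f} ms")
```

Under this model a 75% hit ratio gives an average of about 4 ms and an 85% hit ratio about 2.5 ms; only at a 100% hit ratio does the average fall to the cache access time itself.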
The second difference is also related to predictable performance. Most disk caches are volatile. To ensure the safety and integrity of the user's data, caches use a technique called write-through. Writes to the disk are written through the cache, all the way to the magnetic media, before being considered complete. While this ensures that data is safely on the magnetic media in the event of a power outage, it means that performance depends on the read-to-write ratio.
Given the read-to-write ratio, the average access time equation is modified as follows:
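One form consistent with the surrounding text, offered as a reconstruction rather than the original typesetting, is shown below. The symbols are chosen here for illustration: w is the fraction of operations that are writes, h_r is the read hit ratio, T_cache is the cache access time and T_disk is the magnetic disk access time. Reads behave as in the cached case, while every write goes through to the disk.

```latex
T_{avg} = (1 - w)\,\bigl[\,T_{cache} + (1 - h_r)\,T_{disk}\,\bigr] + w\,T_{disk}
```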
This equation assumes that the hit/miss statistics are kept only on reads, and that cache management adds no additional overheads to writes. Using this equation, Figure 8 shows average access time as a function of write ratio for a 15 ms disk with a 250 µs cache and an 80% read hit ratio.
Most workloads are read-mostly, with 20-30% of operations being writes, but that is not necessarily what is seen by the cache. Modern systems often have multiple levels of cache. The UNIX and Macintosh operating systems both have a disk cache in main memory, referred to as the buffer cache in UNIX. Most modern SCSI disks implement some form of write-through caching. Many also have a write-back mode. It is important to understand how these caches interact.
Figure 8 - Access Time vs. Percentage of Write Operations
The presence of an I/O cache in host memory radically changes the workload that a cache in the disk drive sees. Since most of the reads are satisfied out of the cache in host memory, the disk drive workload becomes write-intensive. Instead of 25% writes, 70%-80% of the operations to the disk are writes. As seen in Figure 8, such a workload sees very little benefit from the use of caching.
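The effect of this shift toward writes can be illustrated with a simple write-through model: reads pay the cache access time plus the disk access time on a miss, and every write pays the full disk access time. The model form is an assumption consistent with the text, the parameters are the 15 ms disk, 250 µs cache and 80% read hit ratio used for Figure 8, and the function name is hypothetical.

```python
def write_through_access_ms(write_ratio, read_hit_ratio=0.80,
                            cache_ms=0.25, disk_ms=15.0):
    """Average access time for a write-through cache.

    Reads cost cache_ms plus disk_ms on a miss; writes always cost disk_ms.
    """
    if not 0.0 <= write_ratio <= 1.0:
        raise ValueError("write_ratio must be between 0 and 1")
    read_ms = cache_ms + (1.0 - read_hit_ratio) * disk_ms
    write_ms = disk_ms
    return (1.0 - write_ratio) * read_ms + write_ratio * write_ms


if __name__ == "__main__":
    for ratio in (0.25, 0.75):
        print(f"{ratio:4.0%} writes: {write_through_access_ms(ratio):6.2f} ms")
```

Under this model a workload with 25% writes averages about 6.2 ms, while one with 75% writes averages about 12.1 ms, not far below the 15 ms of the uncached disk.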
In addition, since much of the locality is extracted from the workload in the host cache, the hit ratio in the disk drive is much less than would have been achieved with only the cache in the disk drive. Applications that have a disk bottleneck, even while taking advantage of a host disk cache, are excellent candidates for solid state disk technology.
Summary and Conclusions
Solid state disk technology achieves its performance advantage by removing the mechanical latencies from a magnetic disk access. When evaluating how Quantum Solid State Disks will perform, it is important to understand what part those latencies play in the workload under evaluation.
It is important to evaluate disk drive performance with workloads that are representative of the target application. The degree to which solid state disk will improve system performance is very dependent on the seek and other characteristics of the I/O workload.
[Staelin 88] Staelin, Carl. File Access Patterns. Technical Report CS-TR-179-88, Princeton University, September 1988.
[Ferguson 89] Ferguson, Paula. Hot File Analysis of FSTRACE I/O Data. Internal Digital Memorandum, September 1, 1989.