The first models of the Pentium line were clocked at 60 and 66 MHz. Most Pentium CPUs run at clock speeds between 75 and 233 MHz.
The Pentium(TM) processor is the newest and most powerful member of Intel's x86 family of microprocessors. While incorporating new features and improvements made possible by advances in semiconductor technology, the Pentium processor is 100% code compatible with previous members of the x86 family, preserving the value of users' software investments.
The Pentium processor incorporates a superscalar architecture, an improved floating point unit, separate on-chip code and write-back data caches, a 64-bit external data bus, and other features designed to provide a platform for high-performance computing.
The State of Processor Design Art
In recent years, developments in the art of semiconductor design and manufacturing have made it possible to produce increasingly powerful microprocessors in smaller and smaller packages. Chief among these developments has been the decreasing size of components. Microprocessor designers are now working with CMOS (complementary metal-oxide semiconductor) process technology with features of less than a micron (one-millionth of a meter) in size.
The use of sub-micron components allows designers to fit more of them on a chip. The number of transistors in each member of the x86 family has continued to grow, culminating in the Pentium processor, which is implemented in 0.8 micron CMOS technology and has 3.1 million transistors.
The increase in transistors has made it possible to integrate components that were previously external to the CPU (such as math coprocessors and caches) and place them on board the chip. Placing components on board decreases the time required to access them and increases performance dramatically. Another way to decrease the distance between components (and therefore increase the speed of communication between them) is to provide multiple levels of metal for interconnection. Intel's current microprocessor technology uses three metal layers, the layout of which requires special computer-aided design tools.
The Pentium processor utilizes the latest in microprocessor design technology to provide performance comparable to that of alternative architectures used in scientific and engineering workstations, while maintaining compatibility with the immense installed base of software now available for the x86 family of microprocessors.
Intel's x86 family
The history of the personal computer industry is intimately associated with the history of Intel's x86 chip family. In 1985, Intel introduced the ground-breaking Intel386(TM) DX CPU, a 32-bit microprocessor that executed 3 to 4 million instructions per second (MIPS). Available in speeds ranging from 16 MHz up to 33 MHz, the Intel 80386 addresses up to 4 gigabytes of physical memory, and up to 64 terabytes of "virtual memory" (a technology borrowed from mainframe computers that allows systems to work with programs and data larger than their actual physical memory).
The Intel 80386 provided for true, robust multitasking and the ability to create "virtual 8086" systems, each running securely in its own 1-megabyte address space. Like its predecessors, the i386 DX microprocessor spawned a new generation of personal computers, which had the ability to run 32-bit operating systems and ever-more complicated applications, all the while maintaining compatibility with previous members of the x86 family.
In 1989, Intel shipped the Intel486(TM) DX microprocessor, which incorporated an enhanced 386-compatible core, math coprocessor, cache memory, and cache controller--a total of 1.2 million transistors--all on a single chip. Operating at an initial speed of 25 MHz, the Intel 80486 DX CPU processed up to 20 MIPS. At its current peak speed of 50 MHz, the Intel486 DX CPU processes up to 41 MIPS. By incorporating RISC principles in its CPU core (specifically, instruction pipelining), the Intel 80486 DX CPU is able to execute most instructions in a single clock cycle. In spite of these powerful new features, the Intel486 DX microprocessor maintains 100% compatibility with previous members of the x86 family, thereby preserving customers' investment in software.
With the 1992 introduction of the Intel486(TM) DX2 microprocessor, Intel increased the speed of the 486 family by as much as 70 percent. The DX2 family features a technology called "clock doubling," which allows the processor to operate twice as fast internally as externally. The Intel486 DX2 CPU is nevertheless pin-compatible with the Intel486 DX processor. At its current peak speed of 66 MHz, the Intel486 DX2 CPU executes up to 54 MIPS.
The Pentium processor is the next step in Intel's commitment to provide the highest possible performance at the best price, while maintaining compatibility with previous Intel processors.
First Superscalar x86 Compatible Processor
The heart of the Pentium processor is its superscalar design, built around two instruction pipelines, each capable of performing independently. These pipelines allow the Pentium processor to execute two integer instructions in a single clock cycle, nearly doubling the chip's performance relative to an Intel486 chip at equal frequency.
The Pentium processor's pipelines are similar to the single pipeline of the Intel486 CPU, but they have been optimized to provide increased performance. Like the Intel486 CPU's pipeline, the pipelines in the Pentium processor execute integer instructions in 5 stages: Prefetch, Instruction Decode, Address Generate, Execute, and Write Back. When an instruction passes from Prefetch to Instruction Decode, the pipeline is then free to begin another operation.
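The overlap between stages described above can be illustrated with a short sketch. It assumes an idealized pipeline in which every stage takes exactly one clock and nothing stalls; the function names are illustrative and do not come from Intel documentation.

```python
# The five integer pipeline stages named in the text.
STAGES = ["Prefetch", "Instruction Decode", "Address Generate",
          "Execute", "Write Back"]

def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Clocks needed when a new instruction enters the pipeline every clock."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1

def unpipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Clocks needed if each instruction finished all stages before the next began."""
    return n_stages * n_instructions

def stage_of(instruction_index, clock):
    """Stage occupied by instruction i (0-based) at a given clock (0-based), or None."""
    position = clock - instruction_index
    return STAGES[position] if 0 <= position < len(STAGES) else None
```

Under these assumptions, ten instructions complete in 14 clocks rather than 50, because at clock 1 the first instruction is already in Instruction Decode while the second is in Prefetch.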
In many instances, the Pentium processor can issue two instructions at once, one to each of the pipelines, in a process known as "instruction pairing." In this case, the instructions must both be "simple", and the v-pipe always receives the next sequential instruction after the one issued to the u-pipe. Each pipeline has its own ALU (arithmetic logic unit), address generation circuitry, and interface to the data cache.
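A simplified model of instruction pairing might look like the sketch below. The text states only that both instructions must be "simple" and sequential; the additional register-dependency check, the `Instr` record, and the greedy issue loop are assumptions made for illustration, not Intel's complete pairing rules.

```python
from collections import namedtuple

# Hypothetical instruction record: text, register written, registers read,
# and whether the instruction counts as "simple".
Instr = namedtuple("Instr", ["text", "dest", "srcs", "simple"], defaults=[True])

def can_pair(u, v):
    """True if v may issue to the v-pipe alongside u in the u-pipe."""
    if not (u.simple and v.simple):
        return False              # rule from the text: both must be "simple"
    if u.dest is not None and u.dest in v.srcs:
        return False              # assumed rule: v must not read what u writes
    if u.dest is not None and u.dest == v.dest:
        return False              # assumed rule: both must not write one register
    return True

def issue_cycles(program):
    """Count issue clocks, pairing each instruction with its successor when allowed."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2                # two instructions issued this clock
        else:
            i += 1                # only the u-pipe is used this clock
        cycles += 1
    return cycles
```

In this model, four independent simple instructions issue in two clocks, while a two-instruction sequence in which the second reads the first's result still needs two clocks.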
While the Intel 80486 microprocessor incorporated a single 8 Kbyte cache, the Pentium processor features two 8K caches, one for instructions and one for data. These caches act as temporary storage places for instructions and data obtained from slower, main memory; when a system uses data, it will likely use it again, and getting it from an on-chip cache is much faster than getting it from main memory.
The Pentium processor's caches are 2-way set-associative caches, an improvement over simpler, direct-mapped designs. They are organized in 32-byte lines, and on each access the cache circuitry searches only the two 32-byte lines of one set rather than the entire cache. The 32-byte line size (up from 16-byte lines on the 486 DX) matches the Pentium processor's bus width (64 bits) multiplied by its burst length (4 chunks).
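The cache geometry described above implies a particular split of each address into tag, set index, and byte offset. The sketch below works through that arithmetic. It assumes 32-bit physical addresses and the conventional layout in which the low-order bits select the byte and the set; the text does not spell out the actual bit assignment.

```python
CACHE_BYTES = 8 * 1024     # 8K cache, from the text
WAYS = 2                   # 2-way set-associative
LINE_BYTES = 32            # 32-byte lines
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 128 sets

BUS_BYTES = 64 // 8        # 64-bit external data bus
BURST_TRANSFERS = 4        # burst length of 4 chunks
BURST_BYTES = BUS_BYTES * BURST_TRANSFERS       # 32 bytes, one full line

def split_address(addr):
    """Split an address into (tag, set_index, offset) under the assumed layout."""
    offset = addr % LINE_BYTES
    set_index = (addr // LINE_BYTES) % NUM_SETS
    tag = addr // (LINE_BYTES * NUM_SETS)
    return tag, set_index, offset
```

With these numbers a lookup compares the tag against only the two lines stored in one of 128 sets, and one four-transfer burst on the 64-bit bus fills exactly one 32-byte line.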
When the circuitry needs to store instructions or data in a cache that is already filled, it discards the least recently used information (according to an "LRU" algorithm) and replaces it with the information at hand.
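The LRU replacement policy described above can be sketched for a single two-entry set. The class and method names are hypothetical, and a real cache tracks recency with a status bit per set rather than an ordered list.

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache with LRU replacement."""

    def __init__(self):
        # Tags currently held, ordered from least to most recently used.
        self.tags = []

    def access(self, tag):
        """Touch a tag; return (hit, evicted_tag)."""
        if tag in self.tags:
            # Hit: move the tag to the most-recently-used position.
            self.tags.remove(tag)
            self.tags.append(tag)
            return True, None
        # Miss: if both ways are full, discard the least recently used tag.
        evicted = self.tags.pop(0) if len(self.tags) == 2 else None
        self.tags.append(tag)
        return False, evicted
```

For example, after accessing A, B, and then A again, a miss on C evicts B, because B is the entry that has gone longest without being used.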
The data cache has two interfaces, one to each of the pipelines, which allows it to provide data for two separate operations in a single clock cycle. When modified data is removed from the data cache (and only then), it is written to main memory, a technique known as write-back caching. Write-back caching provides better performance than simpler write-through caching, in which the processor writes data to the cache and main memory at the same time (though the Pentium processor can be dynamically configured to support write-through caching).
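The difference in memory traffic between the two policies can be estimated with a small sketch. It assumes 32-byte lines, that every store hits in the cache, and that each modified line is eventually evicted exactly once; the function names are illustrative.

```python
def write_through_traffic(store_addresses):
    """Write-through: every store is also written to main memory."""
    return len(store_addresses)

def write_back_traffic(store_addresses, line_bytes=32):
    """Write-back: each modified line is written to memory once, on eviction.

    Assumes every dirty line is evicted exactly once after all the stores.
    """
    dirty_lines = {addr // line_bytes for addr in store_addresses}
    return len(dirty_lines)
```

Sixteen 4-byte stores to consecutive addresses cause sixteen memory writes under write-through but only two line write-backs under these assumptions, since the stores touch just two 32-byte lines.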
To ensure that the data in the cache and in main memory are consistent with one another (especially a concern with multiprocessor systems), the data cache implements a cache consistency protocol known as MESI. This protocol defines four states, which are assigned to each line of the cache based on actions performed on that line by a CPU. By obeying the rules of the protocol during memory read/writes, the Pentium processor maintains cache consistency and circumvents problems that might be caused by multiple processors using the same data.
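The four MESI states (Modified, Exclusive, Shared, Invalid) and their transitions can be sketched as a small state function. This follows the generic textbook form of the protocol; the event names are hypothetical, and the Pentium processor's actual bus-level behavior (for example, how writes to shared lines are handled) is not described in the text and may differ.

```python
from enum import Enum

class State(Enum):
    M = "Modified"    # only copy, changed relative to main memory
    E = "Exclusive"   # only copy, identical to main memory
    S = "Shared"      # other caches may hold the line, identical to memory
    I = "Invalid"     # line holds no valid data

def next_state(state, event, others_have_copy=False):
    """Next state of one cache line after a local or snooped access."""
    if event == "local_read":
        if state is State.I:
            # A read miss loads the line as Shared if another cache has it.
            return State.S if others_have_copy else State.E
        return state
    if event == "local_write":
        # Writing makes this the only valid, modified copy.
        return State.M
    if event == "snoop_read":
        # Another processor reads: a valid line drops to Shared
        # (a Modified line is written back first).
        return State.I if state is State.I else State.S
    if event == "snoop_write":
        # Another processor writes: this copy is no longer valid.
        return State.I
    raise ValueError(f"unknown event: {event}")
```

For instance, a line read by one processor starts as Exclusive, becomes Modified on a local write, and drops to Shared when a second processor reads the same address.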
The use of separate caches for instructions and data works in conjunction with other elements of the Pentium processor's design to provide increased performance and faster throughput compared to the Intel486 microprocessor. For example, the first stage of the pipeline is Prefetch, during which instructions are obtained from the instruction cache. With a single cache, conflicts might occur between instruction prefetches and data accesses. Providing separate caches for instructions and data precludes such conflicts and allows both operations to take place simultaneously.
The Pentium processor also increases performance by using a small cache known as the Branch Target Buffer (BTB) to provide dynamic branch prediction. When an instruction leads to a branch, the BTB "remembers" the instruction and the address of the branch taken. The BTB uses this information to predict which way the instruction will branch the next time it is used, thereby saving time that would otherwise be required to retrieve the desired branch target. When the BTB makes a correct prediction, the branch is executed without delay, which enhances performance.
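A toy model of a branch target buffer is sketched below. The text says only that the BTB remembers a branch and its target; the 2-bit saturating counter, its initial value, and the class interface are assumptions chosen because they are a common way to implement dynamic prediction.

```python
class BranchTargetBuffer:
    """Toy BTB: remembers branch targets and a 2-bit history counter."""

    def __init__(self):
        self.entries = {}   # branch address -> [target address, counter 0..3]

    def predict(self, branch_addr):
        """Return the predicted target, or None to predict 'not taken'."""
        entry = self.entries.get(branch_addr)
        if entry is not None and entry[1] >= 2:
            return entry[0]
        return None

    def update(self, branch_addr, taken, target=None):
        """Record the actual outcome of a branch."""
        entry = self.entries.get(branch_addr)
        if taken:
            if entry is None:
                self.entries[branch_addr] = [target, 2]   # assumed start value
            else:
                entry[0] = target
                entry[1] = min(3, entry[1] + 1)
        elif entry is not None:
            entry[1] = max(0, entry[1] - 1)

def count_correct(btb, branch_addr, outcomes, target):
    """Run a sequence of taken/not-taken outcomes; count correct predictions."""
    correct = 0
    for taken in outcomes:
        predicted = btb.predict(branch_addr)
        if (predicted == target) == taken:
            correct += 1
        btb.update(branch_addr, taken, target)
    return correct
```

For a loop branch that is taken nine times and then falls through, this model mispredicts only the first and last executions, giving 8 correct predictions out of 10.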
The combination of instruction pairing and dynamic branch prediction can speed operations considerably. For example, a single iteration of the classic Sieve of Eratosthenes benchmark requires 6 clock cycles to execute on the Intel486 microprocessor. The same code executes in only 2 clock cycles on the Pentium processor.
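For readers unfamiliar with the benchmark, the standard Sieve of Eratosthenes is sketched below. The text does not give the benchmark's actual source, so this shows the algorithm only; the cycle counts quoted above refer to compiled x86 code, not to this Python version.

```python
def sieve(limit):
    """Return all primes up to and including limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    n = 2
    while n * n <= limit:
        if is_prime[n]:
            # Inner loop: a store followed by an increment, a compare,
            # and a backward branch that is taken on most iterations.
            multiple = n * n
            while multiple <= limit:
                is_prime[multiple] = False
                multiple += n
        n += 1
    return [i for i, flag in enumerate(is_prime) if flag]
```

The inner marking loop is short and ends in a highly predictable branch, which is the kind of code that benefits from both instruction pairing and branch prediction.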
Improved Floating Point Unit
The floating point unit in the Pentium processor has been completely redesigned over that in the Intel486 microprocessor. It incorporates an 8-stage pipeline, which can execute one floating point operation every clock cycle. (In some instances, it can execute two floating point operations per clock--when the second instruction is an Exchange.)
The first four stages of the FPU pipeline are the same as those of the integer pipelines. The final four stages consist of a two-stage Floating Point Execute, rounding and writing of the result to the register file, and Error Reporting. The FPU incorporates new algorithms that increase the speed of common operations (such as ADD, MUL, and LOAD) by a factor of three.
The Pentium processor's new architectural features--its superscalar design, separate instruction and data caches, write-back data caching, branch prediction, and redesigned FPU--will enable the development of new applications software, in addition to improving the performance of current applications in a manner that is completely transparent to the end user.
Internally, the Pentium processor uses a 32-bit bus, like that of the Intel486. However, the external data bus to memory is 64 bits wide, doubling the amount of data that may be transferred in a single bus cycle. The Pentium processor supports several types of bus cycles, including burst mode, which loads large (256-bit) portions of data into the data cache in a single bus cycle. The 64-bit data bus allows the Pentium processor to transfer data to and from memory at rates up to 528 Mbyte/sec, a more than 3-fold increase over the peak transfer rate of the 50 MHz Intel486 (160 Mbyte/sec).
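The bandwidth figures above follow from simple arithmetic, sketched here. The calculation assumes the 66 MHz Pentium processor transferring one 64-bit quantity per clock at peak; the 160 Mbyte/sec figure for the Intel486 is taken from the text as given rather than derived.

```python
def peak_mbytes_per_sec(bus_bits, clock_mhz, clocks_per_transfer=1):
    """Peak transfer rate in Mbyte/sec for a bus of the given width and clock."""
    return bus_bits / 8 * clock_mhz / clocks_per_transfer

PENTIUM_PEAK = peak_mbytes_per_sec(64, 66)    # 8 bytes x 66 MHz
I486_PEAK = 160                               # Mbyte/sec, quoted in the text
SPEEDUP = PENTIUM_PEAK / I486_PEAK            # the "more than 3-fold" increase

BURST_BITS = 4 * 64                           # four 64-bit transfers per burst
```

Here `PENTIUM_PEAK` evaluates to 528.0, `SPEEDUP` to 3.3, and `BURST_BITS` to 256, matching the 256-bit burst and the transfer rates quoted above.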
Several instructions (such as MOV and ALU operations) have been hardwired into the Pentium processor, which allows them to operate more quickly. In addition, numerous microcode instructions execute more quickly due to the Pentium processor's dual pipelines. Finally, the Pentium processor features an increased page size, which results in less page swapping in larger applications.
The result of the Pentium processor's new architectural features and enhancements to the 486 architecture is performance improvement ranging from 3 to 5 times (5 to 10 times for floating point intensive applications) when compared to a 33 MHz 486 DX and 2.5 times when compared to the 66 MHz Intel486 DX2 CPU.
Such dramatic performance improvements will meet the demands of computing in a number of areas: advanced multitasking operating systems that support graphical user interfaces, such as Windows NT, OS/2, and new Unix implementations; compute-intensive graphics applications such as 3-D modeling, computer-aided design/engineering (CAD/CAE), large-scale financial analysis, high-throughput client/server; handwriting and voice recognition; network applications; virtual reality; electronic mail that combines many of the above areas; and new applications yet to be developed.
The Pentium processor employs a number of techniques to maintain the integrity of the data with which it is working. Error detection is performed on two levels: via parity checking at the external pins, and internally, on the on-chip memory structures (cache, buffers, and microcode ROM).
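Parity checking of the kind mentioned above can be sketched as follows. The sketch assumes even parity computed over each byte, which is a common arrangement; the text does not state the parity scheme, and the function names are illustrative.

```python
def parity_bit(byte):
    """Even parity: the bit that makes the total count of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2

def parity_ok(byte, parity):
    """True if the received byte is consistent with its parity bit."""
    return parity_bit(byte) == parity
```

A byte such as `0b10110000` has three 1 bits, so its parity bit is 1. If any single bit flips in transit, the check fails, although two flipped bits in the same byte would go undetected.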
For situations where data integrity is especially crucial, the Pentium processor supports Functional Redundancy Checking (FRC). FRC requires the use of two Pentium chips, one acting as the master and the other as the "checker". The two chips run in tandem, and the checker compares its output with that of the master Pentium processor to assure that errors have not occurred. The use of FRC results in an error detection rate that is greater than 99 percent.
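The master/checker arrangement can be illustrated in software with the sketch below. It models each chip as a Python function and the comparison as an equality test; the names are hypothetical, and the real mechanism compares signals at the pins in hardware.

```python
def run_frc(master, checker, inputs):
    """Run both units on the same inputs; return the index of the first mismatch.

    Returns None if the checker agrees with the master on every input.
    """
    for index, value in enumerate(inputs):
        if master(value) != checker(value):
            return index       # an error has been detected
    return None

def good_unit(x):
    """A correctly working unit."""
    return x * x

def faulty_unit(x):
    """A unit with a fault that corrupts one particular result."""
    return x * x + (1 if x == 3 else 0)
```

With two good units the function returns `None`; pairing `good_unit` with `faulty_unit` on the inputs 0 through 4 reports a mismatch at index 3.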
The Pentium processor includes a number of built-in features for testing the reliability of the chip. These include: a Built-In Self Test that tests 70% of the Pentium processor's components upon resetting the chip; an implementation of the IEEE 1149.1 standard (Test Access Port and Boundary Scan Architecture), which provides a standard interface for manufacturers to test the external connections to the Pentium processor; and Probe Mode, which provides access to the software-visible Pentium processor registers for the purpose of determining the current state of the processor.
The Pentium processor also provides performance monitoring features that will make it easier for developers to take fullest advantage of the Pentium processor's superscalar architecture. System developers will be able to monitor the "hit rates" of the instruction and data caches, as well as the length of time the Pentium processor spends waiting for the external bus, which will help in the optimal design of external memory. The ability to measure address generation interlocks and parallelism will help compiler authors develop the most effective methods for instruction scheduling.
So that system developers may design systems with different features for specific applications, the Pentium processor incorporates a System Management Mode (SMM) similar to that of the Intel386 SL architecture. Power management and security features are two areas for which SMM is useful.
High Performance While Maintaining Compatibility
The Pentium processor is a high-performance microprocessor that incorporates the latest state-of-the-art design principles to meet the needs of newly developing areas of applications software, while nevertheless maintaining complete compatibility with the $50 billion installed base of software currently running on members of the x86 family.
Users will experience dramatic performance improvements while running their current software, and can anticipate new applications that take advantage of the Pentium processor's high-performance features.
General
Q1. Which markets will be the first to employ Pentium processor-based systems?
A1. We expect that initial customers for Pentium processor-based systems will be traditional early adopters who require increased performance to meet their needs. The Pentium processor will power advanced personal computers, workstations and super servers.
Q2. I just bought an Intel486(TM) CPU-based system; is the Pentium processor going to obsolete it?
A2. No. The Intel486(TM) CPU remains the mainstream processor. The Pentium processor will have limited availability in '93 and will be targeted at high-end applications, such as servers. As we have seen with the Intel486 CPU, the Pentium processor will evolve downward in the market and one day become the volume mainstream processor.
Speed/Performance
Q3. What is the performance of the Pentium processor in comparison to an Intel486 CPU?
A3. The Pentium processor runs applications up to five times as fast as the popular, desktop-standard 33-MHz Intel486 DX CPU. The 66-MHz Pentium processor operates at 112 Dhrystone MIPS (million instructions per second); it has a SPECint92 rating of 64.5, a SPECfp92 rating of 56.9, and an Intel iCOMP(TM) Index rating of 567. The performance delta between the 66- and 60-MHz versions of the Pentium processor is about 10 percent.
Q4. What is the performance of the Pentium processor in comparison to RISC machines?
A4. The Pentium processor has equal or greater integer performance (SPECint92) than all current volume shipping RISC-based systems. In addition, the Pentium processor has demonstrated workstation-class floating-point performance. The RISC processors available today are designed to be very high-end processors. In the mainstream volume workstation and PC marketplace, it is important to be able to ship millions of processors, not just thousands.
Q5. What is the iCOMP(TM) Index?
A5. The iCOMP(TM) Index was created by Intel as an easy-to-use index to give PC buyers useful processor performance information when selecting an Intel-based PC. This tool reflects the performance of the microprocessor and should not be used as a measurement of overall system performance. For example, the Intel486 SX CPU at 25 MHz has an iCOMP rating of 100, the Intel486 DX2 CPU at 66 MHz has an iCOMP rating of 297, and the Pentium processor at 66 MHz has an iCOMP rating of 567.
Naming
Q6. Why did you name it the Pentium processor?
A6. The purpose of naming it the Pentium processor is to help users recognize the genuine Intel processor. Imitators sell products using the "386" and "486" designation when the products are not on par with Intel's. We want to ensure that the PC user knows which processor is the genuine Intel chip. The Pentium name will designate that: no one else can legally use that name.
Upgradability
Q7. I have heard people refer to Pentium Ready or OverDrive(TM) Pentium systems. What are they and when will they be available?
A7. Many Intel486 DX2 CPU-based systems will be upgradable to Pentium processor technology. Whether systems are upgradable is based on system design considerations. The Pentium processor-based OverDrive(TM) Processor will be introduced in 1994.
Software
Q8. What applications are best suited for Pentium processor-based machines?
A8. The Pentium processor will enable high-performance servers at a lower cost than currently available. The Pentium processor is capable of running all major network operating systems with scalability from the desktop to the data center. Performance-intensive desktop and technical applications, such as imaging, real-time video and voice recognition, will benefit from the increased performance available from the Pentium processor. In addition, it will expand the acceptance of Intel processor-based systems into applications such as scientific modeling, computer-aided design/engineering (CAD/CAE), large-scale financial analysis and high-throughput client/server applications.
Q9. Will software written for 286/386/486 CPU-based systems run on the Pentium processor? What will be the difference?
A9. Yes, Intel has always been committed to compatibility across processor generations and that will continue. To achieve the highest possible software application performance from Pentium processor and Intel486 CPU-based systems, software can be optimized.
Q10. What is software optimization?
A10. Optimization is the process by which operating systems and application software are developed or recompiled to take full advantage of the Intel architecture. Results are most dramatic on the Intel486 and Pentium processor-based systems.
Q11. How much faster can the Pentium processor run today's software than the Intel486 DX2 CPU?
A11. About 40-70% faster than the 66-MHz Intel486 DX2 CPU running existing software.
Q12. Which software developers have committed to optimizing their applications for the Intel architecture?
A12. Currently, Andersen Consulting*, Adobe*, Aldus*, Autodesk*, Cadre*, Calera*, ComputerVision*, Dragon*, EDS*, Frame Technology*, Gain Technology*, Gupta*, Hypercube*, IBM*, Ithaca*, Interleaf*, Knowledgeware*, Kurzweil*, Lotus*, Microsoft*, Novell*, NCR*, Oracle*, Pixar*, Reuters*, SAS*, SCO*, Set Technology*, Sigma Design*, SunSoft*, Sybase*, Univel*, Viewlogic*, Ventura* Software, and Wolfram* have all committed that one or more of their applications will be optimized for the Intel architecture. More software companies are committing every week.
Q13. Which operating system suppliers are committed to supporting the Pentium processor? When?
A13. IBM*, Microsoft*, NeXT*, Novell*, SCO*, SunSoft*, Univel* and USL*. You will need to check with them on announcement plans or ship schedules.
Q14. Which compiler and tools companies are supplying optimized tools and compilers?
A14. Absoft*, Borland*, IBM*, Liant*, MetaWare*, Micro Focus*, Microsoft*, NeXT*, SCO*, USL*, and WATCOM*.
Q15. If Pentium processor performance is so great, why would I want or need to optimize my software?
A15. While the Pentium processor is significantly more powerful than its predecessors, performance can be enhanced when software is optimized for the Intel architecture. Intel has been working with its software partners for over a year to ensure that full advantage of the Pentium processor and Intel486 microprocessor performance can be taken by tools, compilers, operating systems and application software.
Q16. How much incremental performance can I expect from an optimized application running on a Pentium processor-based system?
A16. Performance enhancements will vary, but early optimization projects have yielded up to 30% performance enhancement over the enhancement provided by the chip alone.
Technical Details
Q17. How does the Pentium processor differ from the Intel486 CPU? What are new features of the Pentium processor?
A17. The Pentium processor includes both new architectural features and enhancements to the Intel486 CPU. New architectural features are superscalar architecture, a totally redesigned Floating Point Unit (FPU), branch prediction, separate code and data caches, a write-back cache with the MESI (Modified, Exclusive, Shared, Invalid) protocol, multiprocessor support and built-in data integrity for increased reliability. Other enhancements to the architecture include hardwired instructions, enhanced microcode, increased page size, 64-bit data bus and pipelining.
Q18. What is superscalar?
A18. Superscalar is new to the Pentium processor and is a microarchitecture design technique that allows multiple instructions to be executed simultaneously on chip. (An analogy: superscalar is like adding another lane to a single-lane highway; more cars (instructions) can go to the same place at the same time.)
Q19. What is branch prediction?
A19. Branch prediction is new to the Pentium processor and is another performance improvement technique. Software execution incurs substantial delays on branches, which are points in the instruction stream that require a jump to a new, non-contiguous location in system memory to fetch the next instruction. This Intel-developed technology will predict where the program is going next and can begin working on the next instruction before it is actually called upon.
Q20. Why do you have separate data and instruction (code) caches?
A20. Having the two separate caches allows the CPU to fetch data and code in parallel, doubling the available cache bandwidth. In addition, the Pentium processor has very large on-chip data paths, some as large as 256 bits. The data cache is dual access, meaning two instructions can read and write data in parallel. This complements the superscalar design (dual pipeline).