In the latest issue of Custom PC magazine, out now, Stuart Andrews recalls how Intel built the CPU of the future in the late 1990s.
Think of the Pentium Pro as Intel’s ugly duckling: the CPU that launched to little serious acclaim and suffered a whole lot of criticism in its early years, but slowly transformed into something incredible. The first CPU based on the P6 architecture, it was arguably the biggest step forward in the company’s CPU design since the original 8086, with Intel’s engineers making big, risky bets on where mainstream computing was headed, and most of those bets paying off in the long term.
When it appeared in November 1995, many saw it as a failure. Nearly 27 years later, it remains a huge influence on Intel’s CPU design.
The Pentium Pro launched at a weird time for Intel. On one level, the company dominated the hardware side of personal computing. The 486 had put Intel far ahead of any rival, and the Pentium, launched in 1993, had extended Intel’s lead in terms of both sales and performance.
Yet, outside the business and consumer PC market, Intel faced serious competition, with fast, efficient RISC processors surging ahead in the server and workstation markets. Intel might not have had much to fear (yet) from AMD and Cyrix, but it had plenty to worry about from Digital, MIPS and the PowerPC alliance comprising IBM, Motorola and Apple. Intel ruled in the PC market, but RISC processors were squeezing the company out of datacentres and high-performance computing sectors.
The obvious move for Intel was an evolution of the Pentium architecture and its powerful superscalar design. Where previous Intel CPUs had worked on one instruction per clock cycle, the Pentium had two data pipelines that could operate on two instructions simultaneously. Admittedly, only one pipeline could handle all instructions, with the second limited to a subset of simple, frequently used instructions, but it was still a speedy chip.
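To make the dual-pipeline idea concrete, here’s a minimal sketch of in-order dual issue as described above. It’s a toy model, not Intel’s actual pairing rules: the `SIMPLE_OPS` set, the instruction tuples and the `count_cycles` helper are all assumptions made up for illustration.

```python
# Toy model of in-order dual issue. The "simple" subset below is a
# hypothetical stand-in, not the Pentium's real list of pairable instructions.
SIMPLE_OPS = {"mov", "add", "sub", "and", "or", "cmp"}


def count_cycles(program):
    """Count issue cycles for a list of (op, dest, sources) tuples run in order."""
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        if i + 1 < len(program):
            _, first_dest, _ = program[i]
            second_op, _, second_sources = program[i + 1]
            # The second pipeline only takes a simple instruction that does
            # not read the result the first instruction is still producing.
            if second_op in SIMPLE_OPS and first_dest not in second_sources:
                i += 2
                continue
        i += 1
    return cycles


example = [
    ("mov", "a", ()),
    ("mov", "b", ()),
    ("add", "c", ("a", "b")),
    ("mul", "d", ("c",)),   # not in the simple subset, so it cannot go second
    ("add", "e", ("a",)),
]
print(count_cycles(example))  # 3 cycles, against 5 for one instruction per cycle
```

The model ignores instruction latencies and other hazards, so it only shows the pairing idea: two instructions share a cycle when the second is simple and independent of the first.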
Intel could have optimised these pipelines further, or simply added more, but the P6 team running under Fred Pollack and Bob Colwell had other ideas. The team had been working quietly since 1990 on something more revolutionary.
RISC meets CISC and goes OoOE
At the time, RISC vs CISC was the hottest debate in computing. Should you go for a complex instruction set (the CIS in CISC), with a lot of instructions covering most operations built into your hardware, or a reduced instruction set (you guessed it, the RIS in RISC), with fewer, simpler instructions that could be decoded and processed at a higher speed?
With the 386, 486 and Pentium, Intel had become the champion of CISC. It was core to Intel’s technology, and where the expertise of most of its engineers lay. However, Pollack and Colwell’s team was actively researching RISC approaches, and rethinking ways in which you could process instructions more efficiently.
Crucially, the team came up with a type of design that Intel hadn’t really done before: it designed P6 as an architecture that could evolve over the long term. In a 2009 interview, Colwell talks of how the team built it ‘with an eye towards a long-term contribution to the company, as opposed to “let’s do a chip and after that let’s do another chip.”’ P6 was an architecture for the future of computing, not just the next component release.
How? Well, firstly, the P6 architecture was designed for out-of-order execution (OoOE). Previous Intel processors were designed to handle instructions in the order in which they were received, as defined by the programmer and the compiler that built their code. With its dual pipelines and clever instruction caches, the Pentium architecture was pretty smart about how it did this, but the P6 architecture took it to another level.
With OoOE, the CPU itself would look at the instructions coming through the pipeline and make intelligent decisions about which instructions to execute next, which instructions were waiting on data from other instructions, and which instructions it could bring forward and deal with during otherwise unused clock cycles.
It could then allocate instructions to the pipeline accordingly. Thanks to out-of-order execution, the P6 architecture wasn’t just faster, but smarter, moving instructions from its 8KB instruction cache through its three decoders to the execution units with smooth efficiency.
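The difference between in-order and out-of-order issue can be sketched in a few lines. This is a deliberately simplified model under stated assumptions: the `run` function, the instruction names and the latencies are all hypothetical, and it leaves out the things a real out-of-order core needs, such as register renaming and in-order retirement.

```python
def run(program, out_of_order, width=2):
    """Return the cycle at which the last instruction finishes.

    program: list of (name, depends_on, latency) tuples in program order,
    with unique names. Up to `width` instructions may issue per cycle.
    """
    done_at = {}              # instruction name -> cycle its result is ready
    pending = list(program)
    cycle = 0
    while pending:
        issued = []
        for instr in pending:
            name, depends_on, latency = instr
            ready = all(dep in done_at and done_at[dep] <= cycle
                        for dep in depends_on)
            if ready:
                issued.append(instr)
                if len(issued) == width:
                    break
            elif not out_of_order:
                break         # in-order: a stalled instruction blocks the rest
        for name, _, latency in issued:
            done_at[name] = cycle + latency
        for instr in issued:
            pending.remove(instr)
        cycle += 1
    return max(done_at.values(), default=0)


example = [
    ("load_a", [], 3),
    ("use_a", ["load_a"], 1),
    ("load_b", [], 3),
    ("use_b", ["load_b"], 1),
    ("indep", [], 1),
]
print(run(example, out_of_order=False))  # 7
print(run(example, out_of_order=True))   # 4
```

In the in-order run, `use_a` stalls while it waits for `load_a`, and everything behind it waits too. In the out-of-order run, the second load and the independent instruction slip into those otherwise idle cycles, which is the effect the P6 team was after.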
Secondly, the P6 didn’t fight against RISC – it embraced it. The instruction decoders took x86 CISC instructions and broke them down into RISC micro-operations, which could then be processed at higher speeds by the six execution units. This, combined with the out-of-order execution and an optimised 14-stage pipeline, turned the P6 into a number-crunching monster.
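As a rough illustration of that decoding step, the sketch below splits a toy x86-style instruction into simpler load, compute and store steps. The `decode` function, the tuple format and the `tmp` register names are hypothetical, invented for this example; they aren’t Intel’s actual micro-operation encoding.

```python
def is_memory(operand):
    """Treat operands written in square brackets as memory references."""
    return operand.startswith("[")


def decode(instruction):
    """Split a toy (op, dest, src) instruction into RISC-like micro-ops."""
    op, dest, src = instruction
    micro_ops = []
    left, right = dest, src
    if is_memory(src):
        # Fetch a memory source into a temporary register first.
        micro_ops.append(("load", "tmp_src", src))
        right = "tmp_src"
    if is_memory(dest):
        # A memory destination is read, modified, then written back.
        micro_ops.append(("load", "tmp_dest", dest))
        left = "tmp_dest"
    micro_ops.append((op, left, right))
    if is_memory(dest):
        micro_ops.append(("store", dest, left))
    return micro_ops


for micro_op in decode(("add", "[counter]", "eax")):
    print(micro_op)
# ('load', 'tmp_dest', '[counter]')
# ('add', 'tmp_dest', 'eax')
# ('store', '[counter]', 'tmp_dest')
```

A register-to-register instruction such as `("add", "eax", "ebx")` passes through as a single micro-op, while the read-modify-write form becomes three simple steps that the execution units can schedule independently.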
Floating point performance, in particular, was far ahead of the improved floating point performance of the Pentium design – a critical point when 3D rendering applications and CAD were spicing up the workstation market. Some of us even saw the P6’s potential for 3D gaming.
Thirdly, P6 was the first Intel architecture to bring the Level 2 cache into the CPU package itself, rather than leaving it on the motherboard. The 486 and Pentium processors had between 8KB and 32KB of L1 cache on the die to store the data the CPU was most likely to use next, but the next level of cache, the larger and slower L2 cache, was held separately on the motherboard.
This meant that when the CPU needed data and couldn’t find it in the on-die L1 cache, it had to go out over the external bus (32 bits wide on the 486, 64 bits on the Pentium) to check whether it was in the motherboard’s L2 cache. Say a big hello to latency.
The Pentium Pro, however, placed between 256KB and 1MB of L2 cache on a separate die held within the processor module itself, where it connected to the main CPU die through a dedicated, full-speed 64-bit bus. Having more high-speed cache in such close proximity to the CPU dramatically improved performance, to the extent that Intel claimed that 256KB of on-chip L2 cache was as good as having 2MB on the motherboard.
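The standard way to reason about this is average memory access time: the L1 hit time, plus the share of accesses that miss L1 multiplied by the cost of going to L2, and so on down to main memory. Here’s a minimal sketch of that calculation. Every number in it is an invented placeholder chosen to show the shape of the effect, not a measured Pentium or Pentium Pro figure, and `average_access_cycles` is a hypothetical helper.

```python
def average_access_cycles(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, memory):
    """Average cycles per memory access for a two-level cache hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)


# Illustrative assumptions only: identical hit rates, with the L2 cache
# costing 10 cycles to reach on the motherboard and 3 cycles in the package.
motherboard_l2 = average_access_cycles(1, 0.10, 10, 0.20, 50)
in_package_l2 = average_access_cycles(1, 0.10, 3, 0.20, 50)

print(f"L2 on motherboard: {motherboard_l2:.2f} cycles per access")
print(f"L2 in package:     {in_package_l2:.2f} cycles per access")
```

Even with hit rates held constant, cutting the cost of each trip to L2 lowers the average for every access, which is why a smaller but closer cache can compete with a larger, more distant one.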
Confusion at launch
These three factors in combination should have made for a killer CPU, but at first the Pentium Pro came across as a damp squib. Benchmarks put it ahead of the RISC competition by some yardsticks, but not others.
Floating point performance was good, and better than the existing Pentium processors, but it still wasn’t quite in the same league as the FPU performance of the fastest Digital and MIPS CPUs. More seriously, the P6’s pipelines had difficulty juggling 16-bit and 32-bit code, at a time when Windows itself and many applications used a mix of the two. This meant that the Pentium Pro only made sense to people who used Windows NT rather than the tried and tested Windows 3.1 or the new and shiny Windows 95.
It also meant that performance in many mainstream packages wasn’t necessarily going to be faster than on a Pentium CPU running at the same clock speed.
When journalists got their hands on the Pentium Pro towards the end of 1995, they found it a curious mixture of amazing performance in some applications and underwhelming speeds in others. Tests by the US magazine PC World concluded that the 200MHz Pentium Pro was just 8 per cent faster than the 200MHz Pentium in benchmarks, despite costing around US$200 more.
To add to the embarrassment, the Pentium Pro had its share of teething troubles. It was slow at writing to video memory, hobbling performance in games. In id Software’s Quake, for example, you could expect sub-Pentium frame rates unless you set it to write to system memory instead or installed a third-party utility, FASTVID. Doing so could double your frame rates at higher resolutions.
More seriously, a bug was found in the floating point unit, affecting the results when a large negative floating point number was stored into memory in an integer format.
The bug was tricky to repeat, and of little consequence in most cases. Windows and Microsoft Office were unaffected, while id Software’s John Carmack noted that the bug only manifested when storing 80-bit values ‘which almost nobody ever uses’. All the same, the bug helped to create the impression that the Pentium Pro had been rushed out before it was ready.
Sales of the Pentium Pro as a desktop CPU were relatively slow. It was outsold by the standard Pentium throughout its lifespan, particularly as Intel continued to develop the Pentium line through the Pentium MMX, reaching speeds of 233MHz and later even 300MHz in the mobile world. Yet Pentium Pro did meet Intel’s long-term aims of building market share in the server and workstation markets. By 1997, 97 per cent of servers under $10,000 had Intel CPUs, and the same applied to 50 per cent of all workstations. Not bad at all!
The P6 legacy
All the same, success in the server market isn’t what makes the Pentium Pro a landmark processor. What does is its lasting legacy. In May 1997, Intel combined the P6 architecture of the Pentium Pro with the MMX instructions of the Pentium MMX, creating the Pentium II. In some respects, it was actually a downgrade from the Pentium Pro, running the L2 cache at half the CPU’s clock speed, but Intel improved 16-bit performance and doubled the L1 cache to make the Pentium II the fastest x86 CPU of its time.
The Pentium III, which came out two years later, still used the P6 architecture, but added the new Streaming SIMD Extensions to accelerate floating point, 3D and multimedia operations. Pentium III should have been the P6 architecture’s swansong, as Intel moved to the NetBurst architecture for 2000’s Pentium 4 and 2005’s Pentium D.
But then something weird happened. Intel struggled to scale NetBurst’s performance, with the promised 10GHz clock speeds wrecked by issues with power and heat. Looking for another way forward, Intel returned to P6 with its Pentium M mobile CPUs, and Pentium M became the basis of the Core microarchitecture Intel still uses today.
That architecture has changed a lot over the years, adding new stages to the pipeline, along with support for more cores, more instructions and more cache. Yet at the centre you’ll still find the same ideas and fundamental principles introduced with the Pentium Pro. Instead of joining doomed architectures, such as the 64-bit Itanium or the 32-bit iAPX 432, Pentium Pro became the ultimate survivor, outlasting the 486, the Pentium, NetBurst and, well, just about everything else.
This article appears in Custom PC issue 230, which you can also download for free in PDF format.