Small Kernels Hit It Big (01/1994)
A microkernel is a tiny operating-system core that provides the foundation for modular, portable extensions. Every next-generation operating system will have one. However, there's plenty of disagreement about how to organize operating-system services relative to the microkernel. Questions include how to design device drivers to get the best performance while abstracting their functions from the hardware, whether to run nonkernel operations in kernel or user space, and whether to keep existing subsystem code (e.g., a legacy version of Unix) or to throw everything away and start from scratch. IBM, Microsoft, and Novell's Unix Systems Laboratories answer these questions differently; each company has strong opinions about how and why its approach will work best.
It was the Next computer's use of Mach that introduced many of us to the notion of a microkernel. In theory, its small privileged core, surrounded by user-mode services, would deliver unprecedented modularity and flexibility. In practice, that benefit was somewhat obscured by the monolithic BSD 4.3 operating-system server that Next wrapped around Mach. However, Mach did enable Next to supply message-passing and object-oriented services that manifest themselves to the end user as an elegant user interface with graphical support for network setup, system administration, and software development.
Then came Microsoft's Windows NT, which touted not only modularity but also portability as a key benefit of the microkernel approach. NT was built to run on Intel-, Mips-, and Alpha-based systems (and others to follow) configured with one or more processors. Because NT would have to run programs originally written for DOS, Windows, OS/2, and Posix-compliant systems, Microsoft exploited the modularity inherent in the microkernel approach by structuring NT so that it did not architecturally resemble any existing operating system. Instead, NT would support each layered operating system as a separate module or subsystem.
More recently, microkernel architectures have been announced by Novell/USL, the Open Software Foundation, IBM, Apple, and others. One prime NT competitor in the microkernel arena is Carnegie Mellon University's Mach 3.0, which both IBM and OSF have undertaken to commercialize. (Next still uses Mach 2.5 as the basis of NextStep, but it is looking closely at Mach 3.0.) Another is Chorus 3.0 from Chorus Systems, which USL has chosen as the foundation of its Unix offering. Sun's SpringOS, an object-oriented successor to Solaris, will use a microkernel, and the Taligent Operating Environment will rely on the same microkernel that IBM is developing for its Workplace OS. Clearly, there's a trend away from monolithic systems and toward the small-kernel approach. That's no surprise to QNX Software Systems and Unisys, two companies that have for years offered successful microkernel-based operating systems. QNX Software's QNX serves the real-time market, and Unisys' CTOS is strong in branch banking. Both systems exploit the modularity enabled by a microkernel foundation with excellent results.
Fueling the current microkernel frenzy is the recent fragmentation of the operating-system market. With no one vendor a clear winner in the operating-system sweepstakes, each needs to be able to support the others’ applications. AT&T tried this tack a few years ago with Unix System V release 4.0, by including support for the Berkeley and Xenix extensions. But while SVR4 has done well enough, it hasn’t been the grand unification of Unix for which AT&T (now Novell’s USL) had hoped. On the other hand, Microsoft’s NT seems to have succeeded— at least in this respect— by being the first to unify multiple subsystems capable of running Win32, Win16, DOS, OS/2, and Posix applications. IBM is responding with a portable successor to OS/2, the Workplace OS. Its truly modular operating-system architecture, with plug-and-play components and multiple operating-system personalities, may advance expectations still further.
Defining the Microkernel
A microkernel implements essential core operating-system functions. It's a foundation for less-essential system services and applications. Exactly which system services are nonessential and capable of being relegated to the periphery is a matter of debate among competing microkernel implementers. In general, services that were traditionally integral parts of an operating system— file systems, windowing systems, and security services— are becoming peripheral modules that interact with the kernel and each other.
When I first learned about operating systems, the layered approach used by Unix and its variants was the state of the art in operating-system design. Groups of operating-system functions— the file system, IPC (interprocess communications), and I/O and device management— were divided into layers. Each layer could communicate only with the one directly above or below it. Applications and the operating system itself communicated requests and responses up and down the ladder.
While this structured approach often worked well in practice, today it's increasingly thought of as monolithic because the entire operating system is bound together in the hierarchy of layers. You can't easily rip out one layer and swap in another because the interfaces between layers are many and diffuse. Adding features, or changing existing features, requires an intimate knowledge of the operating system, a lot of time, some luck, and the willingness to accept bugs as a result. As it became clear that operating systems had to last a long time and be able to incorporate new features, the monolithic approach began to show cracks. The initial problems vendors encountered when SVR4 shipped in 1990 illustrate this point.
The microkernel approach replaces the vertical stratification of operating-system functions with a horizontal one. Components above the microkernel communicate with one another directly, but their messages pass through the microkernel itself. The microkernel plays traffic cop. It validates messages, passes them between components, and grants access to hardware.
This arrangement makes microkernels well suited to distributed computing. When a microkernel receives a message from a process, it may handle it directly or pass the message to another process. Because the microkernel needn’t know whether the message comes from a local or remote process, the message-passing scheme offers an elegant foundation for RPCs (remote procedure calls). This flexibility comes at a price, however. Message passing isn’t nearly as fast as ordinary function calls, and its optimization is critical to the success of a microkernel-based operating system. For example, NT can, in some cases, replace message ports with higher-bandwidth shared-memory communications channels. While costly in terms of nonswappable kernel memory, this alternative can help make the message-passing model practical.
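To make the traffic-cop role concrete, here is a minimal sketch of what a message-passing client might look like. The msg_send/msg_receive calls and the port_t type are invented for illustration; they are not the API of any of the kernels discussed here.

    /* A minimal sketch of microkernel-style message passing.
       All names here are hypothetical. */
    typedef int port_t;                 /* kernel-managed message port */

    typedef struct {
        int  op;                        /* requested operation code */
        char data[256];                 /* operation-specific payload */
    } message_t;

    extern int msg_send(port_t dest, const message_t *m);
    extern int msg_receive(port_t from, message_t *m);

    /* Client side: send a request to a service and block for the
       reply. The kernel validates the message and routes it; the
       client never learns whether the server is local or remote. */
    int service_call(port_t server, message_t *req, message_t *reply)
    {
        if (msg_send(server, req) != 0)
            return -1;                  /* kernel rejected the message */
        return msg_receive(server, reply);
    }

Because the client names only a port, the same code works whether the server runs on the local machine or across a network; the kernel, not the application, resolves the destination.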
Portability, Extensibility, and Reliability
With all the processor-specific code isolated in the microkernel, the changes needed to run on a new processor are fewer and logically grouped together. Since the processor market seems more likely to fragment with competing designs than to converge on a single architecture, running an operating system on more than one processor may be the only way to leverage buyers' investment in hardware. Intel is still on top of the microprocessor hill, but IBM/Motorola/Apple, DEC, Mips, and Sparc International, among others, are making determined runs at its dominant position.
Extensibility is also a major goal of modern operating systems. While hardware can become obsolete in a few years, the useful life of most operating systems may be measured in decades. Whether the operating system is small like DOS or large like Unix, it will inevitably need to acquire features not in its design. For example, DOS now supports a disk-based file system, large hard disks, memory management, and most radically— Windows. Few, if any, of these extensions were envisioned when DOS 1.0 shipped.
Operating-system designers have learned their lesson and now build operating systems that make adding extensions manageable. There’s no alternative. With increasingly complex monolithic systems, it becomes difficult, if not impossible, to ensure reliability. The microkernel's limited set of well-defined interfaces enables orderly growth and evolution.
There’s also a need to subtract features. More users would flock to Unix or NT if these operating systems didn’t require 16 MB of memory and 70 MB or more of hard disk space. Microkernel does not necessarily mean small system. Layered services, such as file and windowing systems, will add bulk. Of course, not everyone needs C2 security or wants to do distributed computing. If important but market-specific features could be made optional, the base product would appeal to a wider variety of users. Martin McElroy, brand manager for Workplace OS at IBM’s Personal Systems Products division, says that IBM’s Mach implementation will eventually run the gamut from “palmtops to teraFLOPS.” The services riding on the microkernel can be customized to meet the needs of the platform and the market.
The microkernel approach can also help improve the overall quality of the computing environment. Systems like Unix, OSF/1, and NT require hundreds of thousands of lines of code and take years to mature. Programmers who write applications for these systems don’t have time to worry about undocumented APIs; they’ve got their hands full just learning about the hundreds of APIs that are documented. The learning curve for new operating-system calls is becoming so steep that no developer can reasonably expect to know and use them all.
The result is that no one can guarantee the correctness of code making use of several system-service APIs, and no one can guarantee even the correctness of the operating system itself. A small microkernel that offers a compact set of APIs (the OSF microkernel will have about 200, and the tiny QNX microkernel has just 14) improves the chances of producing quality code. This compact API is visible to the systems programmer only; the applications programmer must still wrestle with hundreds of calls. But it certainly enhances the value of microkernels such as IBM’s, which the company plans to license to OEMs for customized development.
What’s In and What’s Out?
As we have seen, the proper division of labor between the microkernel and its surrounding modules is a matter of debate. The general idea is to include only those features that absolutely need to run in supervisor mode and in privileged space. That typically means processor-dependent code (including support for multiple CPUs), some process management functions, interrupt management, and message-passing support.
Many microkernel designers include process scheduling, but IBM's implementation of Mach locates scheduling policy outside the microkernel, using the kernel only for process dispatch. IBM's approach separates policy from implementation, but it requires close collaboration between the external scheduler and the kernel-resident dispatcher.
Device drivers may be in-kernel, out-of-kernel, or somewhere in between. Some implementations (e.g., OSF's) locate device drivers in the microkernel. IBM and Chorus locate the device drivers outside of the microkernel but require that some driver code run in kernel space so that interrupts can be disabled and set. In NT, device drivers and other I/O functions run in kernel space but work with the kernel only to trap and pass interrupts.
IBM's Paul Giangarra, system architect for the Workplace OS, says that separating device drivers from the kernel enables dynamic configuration. But other operating systems (e.g., NetWare and OSF) achieve this effect without abstracting the devices from the kernel. While NT doesn’t permit dynamic configuration of device drivers, Lou Perazzoli, project leader for NT development, notes that its layered driver model was designed to support on-the-fly binding and unbinding of drivers. But the necessary support for this feature didn’t materialize in the first release of NT.
Dynamic configuration notwithstanding, there are other reasons to treat device drivers as user-mode processes. For example, a database might include its own device driver optimized for a particular style of disk access, but it can’t do this if drivers reside within the kernel. This approach also yields portability since device-driver functions can, in many cases, be abstracted away from the hardware.
Mach and the Workplace OS
IBM’s forthcoming Workplace OS uses a Mach 3.0 microkernel that IBM has extended (in cooperation with the OSF Research Institute) to support parallel-processing and real-time operations. This implementation counts five sets of features in its core design: IPC, virtual memory support, processes and threads, host and processor sets, and I/O and interrupt support. Giangarra refers to the Workplace OS microkernel as its hardware abstraction layer (not to be confused with NT's HAL, which is just the lowest slice of the NT microkernel). The file system, the scheduler, and network and security services appear in a layer above the microkernel. These are examples of what IBM calls personality-neutral services, or PNSes, because they're available to any of the individual operating-system personalities layered above them.
A key distinction between the IBM PNS layer and NT’s own service managers is that IBM’s PNS layer runs in user space, while the bulk of NT's services run in kernel space. IBM’s approach aims to let OEMs add or replace system services freely; NT's system services are intended to remain in place.
Perhaps the best way to describe the relationship of the kernel to the nonkernel processes is that the kernel understands how the hardware works and makes the hardware operation transparent to the processes that set and enforce operating-system policy. In IBM's case, process and thread management is a kernel function. However, only the process dispatcher actually resides in the kernel. The scheduler, which sets policy by checking priorities and ordering thread dispatching, is an out-of-kernel function.
This is an important distinction. Dispatching a thread to run requires hardware access, so it is logically a kernel function. But which thread is dispatched, Giangarra says, is irrelevant to the kernel. So the out-of-kernel scheduler makes decisions about thread priority and queuing discipline.
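A rough sketch of how such a split might look in code follows. The names are invented, and the highest-priority-first rule stands in for whatever policy a licensee might install.

    /* Hypothetical sketch of the policy/mechanism split. The
       dispatcher is the kernel's privileged mechanism; the
       scheduler is replaceable, out-of-kernel policy. */
    typedef int thread_id;

    extern void kernel_dispatch(thread_id t);  /* context switch: kernel only */

    #define MAX_THREADS 64
    struct thread_info {
        thread_id id;
        int       priority;
        int       runnable;
    } ready[MAX_THREADS];

    /* Out-of-kernel policy: pick a thread by any rule at all.
       Here, a trivial highest-priority-first scan. */
    void schedule(void)
    {
        int i, best = -1;
        for (i = 0; i < MAX_THREADS; i++)
            if (ready[i].runnable &&
                (best < 0 || ready[i].priority > ready[best].priority))
                best = i;
        if (best >= 0)
            kernel_dispatch(ready[best].id);  /* policy decided; kernel executes */
    }

Swapping in a real-time or otherwise specialized scheduler means replacing schedule(); kernel_dispatch() and the hardware knowledge behind it never change.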
The other microkernel implementations don’t relegate the scheduler to the periphery. Why would you want them to? In IBM's case, the company plans to license its microkernel to other vendors, who might need to swap the default scheduler for one that supports real-time scheduling or some specialized scheduling policy. NT, which embodies the notion of real-time priorities in its kernel-resident scheduler, does not currently expose these to the programmer. You cannot modify or replace the NT scheduler.
Memory management, like scheduling, is divided between the microkernel and a PNS. The kernel itself controls the paging hardware. The pager, operating outside the kernel, determines the page replacement strategy (i.e., it decides which pages to purge from memory to accommodate a page fetched from disk in response to a page fault). Like the scheduler, the pager is a replaceable component. IBM is providing a default pager to boot Workplace OS, but the primary paging mechanism will be integrated with the file system. The Workplace OS file system (like NT’s) unifies memory-mapped file I/O, caching, and virtual memory policies.
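As an illustration of how paging policy can live outside the kernel, here is a sketch of an external pager's fault handler. The upcall convention and the kernel_* helpers are invented, and a simple FIFO rule stands in for a real replacement strategy.

    /* Hypothetical sketch of an out-of-kernel pager. The kernel
       owns the paging hardware and upcalls this handler on a page
       fault; the replacement policy below is the pager's alone. */
    #define NFRAMES 1024
    static int fifo_queue[NFRAMES];  /* frame numbers, oldest first;
                                        assume filled 0..NFRAMES-1 at boot */
    static int head = 0;

    extern void kernel_write_back(int frame);            /* flush if dirty */
    extern void kernel_map_page(int frame, long vaddr);  /* MMU update     */

    void on_page_fault(long faulting_vaddr)
    {
        /* Policy: evict the oldest resident frame (FIFO). */
        int victim = fifo_queue[head];
        head = (head + 1) % NFRAMES;

        /* Mechanism: only the kernel touches disk and the MMU. */
        kernel_write_back(victim);
        kernel_map_page(victim, faulting_vaddr);

        /* The reused frame is now the youngest; requeue it. */
        fifo_queue[(head + NFRAMES - 1) % NFRAMES] = victim;
    }

Replacing FIFO with LRU, working sets, or a file-system-integrated pager changes only this process; the kernel's paging mechanism is untouched.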
PNSes can include not only low-level file system and device-driver services but also higher-level networking and even database services. Giangarra believes that locating such application-oriented services close to the microkernel will improve their efficiency by reducing the number of function calls and enabling the service to integrate its own device drivers.
Mach and OSF/1
The OSF's OSF/1 1.3, which will also incorporate Mach microkernel technology, includes virtually the same microkernel features as IBM's implementation. The code for this version of OSF/1 was frozen in December 1993 and is due to be distributed to OSF licensees in the second quarter of 1994. IBM is a member of the OSF, and the two organizations have been exchanging microkernel technologies. However, OSF’s approach differs from IBM's in important ways. OSF/1 was reworked to be able to call Mach for basic system services. Then the entire OSF/1 server system was placed on top of Mach and run in user space. What IBM divides into separate PNSes and layered personalities, OSF lumps into a single structure.
Why the monolithic Unix server riding on top of the microkernel? OSF/1 is mature and proven code, and the OSF says it wasn’t feasible to start from scratch. The amount of code reuse between OSF/1 1.3 and the previous version of OSF/1 is over 90 percent. On the other hand, the OSF is also rewriting parts of the Mach kernel in C++, to be able to provide better support for object management.
The net result is that OSF/1 1.3 is less modular than Workplace OS. But by reusing a substantial part of OSF/1, the OSF can ship a more or less complete microkernel-based operating system to its members ahead of the expected debut of the Workplace OS in late 1994. Note that it is precisely this configuration— the OSF/1 server running on Mach— that IBM currently demonstrates as the Unix personality of its Workplace OS.
The OSF’s goal is to let the Mach-plus-OSF/1-server combination run efficiently on massively parallel hardware systems. One of the active areas of study in the OSF Research Institute is to configure systems with dozens or hundreds of processors and to observe distributed operating-system behavior as the number of processors grows. The Mach microkernel will run on all processors, but the server— which provides file system, process management, and networking services— need run only on some.
According to Ira Goldstein, vice president of research and advanced development at the OSF Research Institute, future Mach-based versions of OSF/1 will be able to run the OSF/1 server system either in user space or kernel space, depending on the system administrator’s choice when configuring the system. Running the OSF/1 server in kernel space will improve performance, because procedure calls will replace message passing, and all server code will remain in memory. Running the server in user space makes it swappable, potentially freeing memory for user programs. Note that USL is planning the same sort of flexibility for its Chorus-based offering. Arthur Sabsevitz, chief scientist at USL, expects the same advantages that NetWare 4.0 developers currently enjoy. Services will be developed and tested in user space. Once debugged and deemed trustworthy, they can move to kernel space for best performance.
The OSF is still investigating the issue of where to locate device-driver support. Currently, drivers reside within the Mach microkernel. Goldstein says this approach should not preclude dynamic configuration of drivers. Since the OSF is working closely with IBM on microkernel issues, it will look at the IBM approach to device drivers when it receives the technology.
Is NT Really a Microkernel OS?
NT’s microkernel serves primarily to support a specific set of user environments on top of a portable base. Its concentration of machine-specific code in the microkernel makes NT relatively easy to port across diverse processors. NT is also extensible, but not in the same way IBM’s Workplace OS will be. Whereas IBM wants to license its microkernel separately, it is unlikely that Microsoft will attempt to unbundle NT’s microkernel. This is one reason why many observers now conclude that NT is not, in fact, a true microkernel in the same sense that Mach and Chorus are. These critics also note that NT does not rigorously exclude layered services from kernel space (although OSF/1 and Chorus/MIX aren’t religious on this point either) and that NT’s device drivers cooperate minimally with the kernel, preferring to interact directly with the underlying HAL.
NT applications talk to user-mode “environment subsystems” that are analogous to the Workplace OS’s personalities. Supporting these subsystems are the services provided by the NT executive, which runs in kernel space and does not swap to disk. Executive components include the object manager, the security monitor, the process manager, and the virtual memory manager. The executive, in turn, relies on lower-level services that the NT kernel (or microkernel, if you will) provides. Its services include scheduling threads (the basic unit of execution), handling interrupts and exceptions, synchronizing multiple processors, and recovering from system crashes. The kernel runs in privileged mode and is never paged out of memory. It can be preempted only to handle interrupts. The kernel rides on the HAL, which concentrates most hardware-specific code into a single location.
Lou Perazzoli says that NT’s design was driven by strong biases toward performance and networkability, as well as by the requirement to support a specific set of layered personalities. The resulting separation of function between kernel and nonkernel modules reflects these goals. For example, data transfers to the file system and across the network run faster in kernel space, so NT provides in-kernel buffering for the small (16 to 32 KB) reads and writes that typify client/server and distributed applications. Locating these I/O functions in the kernel may violate the academic purity of the NT microkernel, says Perazzoli, but it supports NT's design goals. Decisions regarding mechanism and policy were motivated by similarly pragmatic concerns. For example, Win32 support did not require a traditional process hierarchy, but other environment subsystems (e.g., OS/2 and Posix) did. The NT executive provides a set of process management services sufficient for the current set of NT personalities, and potentially for others that are similar but not yet supported (e.g., VMS). Radically different alternatives that would require modifying the executive are, however, beyond the reach of NT users.
Because executive components such as the process manager and the virtual memory manager run in kernel space (although they're not technically part of the kernel), some critics say NT is more monolithic than Microsoft likes to admit. However, while these executive-level resource managers do reside in kernel space, they nonetheless function as peers and communicate by passing messages just as the user-level subsystems do.
The NT model is object-based, even though not completely object-oriented. System resources such as processes, threads, and files are allocated and managed as objects; each object type exposes a set of attributes and methods.
User-visible resources, including windows, menus, and files, are also built on an object foundation. Because of their status as objects, these resources can be named, protected, and shared. NT distinguishes between kernel- and executive-level objects. Kernel objects include threads, events, interrupts, and queues. Executive objects, which executive resource managers create and manipulate, package the more basic kernel objects— adding, for example, names and security descriptors— and, in turn, pass them to user-mode subsystems.
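The two-level arrangement might be pictured with a pair of structures like these. The types and fields are invented for illustration; they are not NT's actual object formats.

    /* Sketch of kernel vs. executive objects. A bare kernel object
       carries only what the kernel needs; the executive wraps it
       with the name, protection, and handle bookkeeping that
       user-mode subsystems see. All names here are invented. */
    typedef enum { KOBJ_THREAD, KOBJ_EVENT,
                   KOBJ_INTERRUPT, KOBJ_QUEUE } kobj_type;

    typedef struct {
        kobj_type type;
        int       signaled;          /* synchronization state, for waits */
    } kernel_object;

    typedef struct {
        kernel_object *kobj;         /* the packaged kernel object         */
        char           name[64];     /* lets the object be found by name   */
        unsigned long  security;     /* stand-in for a security descriptor */
        int            handle_count; /* sharing via counted handles        */
    } executive_object;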
Interrupts and Device Drivers in NT
Like other microkernels, the NT kernel also handles interrupts and context switching. An interrupt is handled within the kernel and then dispatched to an ISR (interrupt service routine). The kernel uses an interrupt object to associate an interrupt level with an ISR; this arrangement conceptually separates the device drivers from the interrupt hardware. It also leads to a distinction between NT and most other microkernels in terms of the I/O subsystem. In Mach and in Chorus, device drivers reside above the kernel and access the hardware entirely through the kernel's services. In NT, the I/O manager, which includes file systems, device drivers, and networking support, generally bypasses the kernel and works directly with the HAL underneath the kernel. Kernel support is still required for interrupt processing, but in other respects, drivers work autonomously.
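An interrupt object might be sketched along these lines. The names and the table layout are invented, not NT's actual structures.

    /* Hypothetical sketch of an interrupt object: it binds an
       interrupt level to a driver's ISR so the driver never
       programs the interrupt hardware itself. */
    typedef void (*isr_fn)(void *context);

    typedef struct {
        int     level;      /* hardware interrupt level            */
        isr_fn  routine;    /* driver's interrupt service routine  */
        void   *context;    /* per-device state handed to the ISR  */
    } interrupt_object;

    static interrupt_object *vectors[32];  /* kernel-owned dispatch table */

    /* Driver side: register with the kernel, not the hardware. */
    void connect_interrupt(interrupt_object *iobj)
    {
        vectors[iobj->level] = iobj;
    }

    /* Kernel side: field the interrupt, then dispatch through the
       object; from here the driver services its device via the HAL. */
    void kernel_interrupt_entry(int level)
    {
        if (vectors[level] != NULL)
            vectors[level]->routine(vectors[level]->context);
    }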
Perazzoli says there are good reasons to design the device-driver interface this way. For example, IBM found that it could not accomplish all device-driver functions out-of-kernel and had to find a way to let parts of drivers run in kernel space. NT establishes an object-based link to device drivers for interrupt handling and dispatch and then lets the drivers work directly with their associated devices through the HAL. Nothing prevents applications vendors from writing specialized device drivers, Perazzoli notes, but these must be distinct from the application and must cooperate with the NT I/O subsystem. Is that a limitation? Perhaps not, in view of the impressive I/O performance NT has shown in benchmark tests.
AT&T and the Chorus Nucleus
The Chorus microkernel resembles IBM's and OSF's implementations of Mach in many respects. Like Mach, it takes a minimalist approach. Chorus includes support for distributed processors, multiple distributed operating-system servers (much like the Mach-OSF/1 combination), memory management, and interrupt handling. It can also communicate transparently with other instances of the Chorus microkernel, making it a good foundation for highly distributed systems.
There are several implementations of the Chorus nucleus microkernel. Chorus/MiX, the version of the Chorus operating system with Unix interfaces, includes separate versions for SVR3.2 and SVR4 compatibility. USL will offer Chorus/MiX V.4 as a microkernel implementation of SVR4. USL and Chorus Systems plan to work together to develop Chorus/MiX V.4 as the future direction of Unix. The figure "The Chorus/MiX Structure" shows how Chorus/MiX V.4 is configured on top of the nucleus microkernel. Chorus also supports an SCO-compatible implementation of Chorus/MiX for use specifically on PCs.
The Chorus nucleus does not include device drivers in the kernel. As with IBM's approach, device drivers work through the kernel to access hardware. According to Michel Gien, general manager and director of R&D for Chorus, this enables a higher-level component called the device manager to keep track of drivers dispersed throughout distributed systems.
On the Drawing Board
Sun, Apple, and Taligent are also moving toward a microkernel-based operating-system architecture for their respective platforms. None of these companies was willing to discuss its plans in any great detail, but all acknowledge that microkernel technology is a crucial ingredient of operating-system design.
Sun's SpringOS, which is still in the design and implementation phase, is incorporating a microkernel and making use of object extensions. While details are sketchy, it appears that SpringOS will use a large amount of existing Solaris code, much in the same way that the microkernel-based OSF/1 1.3 uses the existing OSF/1 server. Sun has not yet announced support for any of the independent microkernels, and it may be developing its own. Still less is known of Apple’s and Taligent's efforts. Although Apple will have the rights to use the Taligent Operating Environment, the company is also rumored to be developing a microkernel for the Mac System 7.
Microkernels Here and Now
QNX and CTOS are two mature microkernel operating systems that have been shipping for years. The 8-KB QNX microkernel handles only process scheduling and dispatch, IPC, interrupt handling, and low-level network services. It exports just 14 kernel calls. The compact kernel can fit entirely in the internal cache of some processors, such as the Intel 486.
A minimal QNX system can be built by adding a process manager, which creates and manages processes and process memory. To make a QNX system usable outside of an embedded or diskless environment, add a file system and device manager. These managers run outside of kernel space, so the kernel remains small. QNX Software claims that this message-passing system delivers performance at least comparable to that of traditional operating systems.
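The managers are ordinary processes built around a blocking send/receive/reply exchange. The sketch below uses invented signatures rather than the actual QNX kernel calls.

    /* Sketch of send/receive/reply IPC in the QNX style. The ipc_*
       names and signatures are invented for illustration. */
    typedef int proc_id;

    extern proc_id ipc_receive(void *msg, int len);        /* block for work   */
    extern int ipc_reply(proc_id client, const void *r, int len); /* unblock   */

    /* A file-system manager's main loop: a user-space process,
       not kernel code, yet it serves requests system-wide. */
    void fs_manager(void)
    {
        char request[512], answer[512];
        for (;;) {
            proc_id client = ipc_receive(request, sizeof request);
            /* ...interpret the request and perform the file operation... */
            ipc_reply(client, answer, sizeof answer);
        }
    }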
CTOS, introduced in 1980, was written for Convergent Technologies workstations, a family of Intel-based machines built to run in “cluster networks” linked by ordinary telephone wire. Now sold by Unisys, these CTOS-based machines were demonstrating the benefits of message-based distributed computing long before the term became fashionable. The tiny 4-KB CTOS microkernel concerns itself only with process scheduling and dispatch and message-based IPC. All other system services communicate with the microkernel and with each other through well-defined message interfaces.
Networking is integral to CTOS workstations and effectively transparent to applications, which do not need to know whether a request for service will be handled locally or remotely. The same message-based IPC transmits the request in either case. Building modular system services to handle such requests is straightforward. One practical result has been that CTOS applications running unattended in remote branch offices are easily controlled by central management tools.
The Microkernel Advantage
If you're charting the enterprise computing strategy for your organization, you've got to be excited about the trend toward microkernel-based operating systems. Increasingly, you will be able to match kernel-independent networking, security, database, and other services to your available hardware, and customize systems for individual users' needs.
Of course, end users don't care much about how operating systems work; they just want to run the applications that enable them to do their jobs. Will microkernels influence end-user computing? You bet. By abstracting application-level interfaces away from underlying operating systems, microkernels help ensure that an investment in applications will last for years to come, even as operating systems and processors come and go.
The full benefits of microkernels won't be apparent for years. It will take that long to field the operating systems and for useful add-on modules to appear. Some benefits (e.g., quality and robustness) may never be directly apparent to users. However, it's clear that microkernels are here to stay.
-Peter D. Varhol