Evolution of the PowerPC/SCI Prototype Interface

Glen D. Stone and Haakon Bryhni
Apple Computer, Inc.
One Infinite Loop
Cupertino, CA 95014

Abstract

IEEE Std 1596-1992 Scalable Coherent Interface (SCI) provides bus-like services on scalable interconnect topologies. As such, the SCI protocols are optimized for point-to-point connections, and present a number of design challenges when bridging SCI to a conventional multidrop bus protocol. We have implemented an interface bridging the Motorola 68040 processor bus to SCI using a non-coherent shared memory model. In the process of implementing the 68040 to SCI bridge, and in our current design of a PowerPC 601 to SCI bridge, we have identified a number of interface incompatibilities. These incompatibilities were described in a paper presented at the last SCIzzL conference. The purpose of this paper is to update the reader on the progress of our PowerPC-SCI interface, as some implementation details have changed. In addition, we document a proposal for SCI sub-block transfers that are used in the PowerPC 601 to SCI bridge.

1. Introduction

When using SCI as an interconnect between conventional processor buses, a number of implementation issues arise that are intimately related to the underlying system architecture. For the purposes of this paper, we assume a distributed computing environment using a non-coherent shared memory model. SCI is employed to directly map processor bus transactions between nodes; thus movement of data between systems is accomplished by simple processor load and store operations.

Section 2 discusses the problem of transferring data blocks using SCI where the data block size does not directly map to an SCI standard packet. We propose to use DMA hint bit information to better describe the data block contained in the SCI packet.

Section 3 describes the feature set that is to be implemented in our PowerPC 601 - SCI interface. The feature set has been somewhat reduced as compared to our previous paper [6].

2. SCI Sub-Block Transfers

The current SCI standard provides efficient support for selected-byte transactions (ranges of bytes within 16) as well as aligned burst transfers (64- and 256-byte transfers). However, the lack of 16-byte aligned transfers longer than 16 bytes has become a concern. The concern is evident for interface designers using 32-byte transfers based on their processor's cache line size, and for efficient transport of the 48-byte ATM payload in SCI/ATM interconnections.

Given this deficiency, there have been various proposals for redefining address bits from the address offset field in an SCI packet. The redefined bits are used to embed specialized op codes to better describe the data within the packet. Such changes are likely to be more than a short term "hack", and these capabilities are likely to be incorporated in a significant number of future designs. For that reason, we document our method of embedding op codes and its incorporation in our SCI interface.

We choose to redefine the DMA hint bits in the packet header. The redefinition requires that the DMA hint bits are not used as specified in the original SCI specification. The current definition of the DMA hints is somewhat confusing, and we believe that the smaller set of DMA hints we propose is sufficient.

The goal is to make the 64-byte and 256-byte transactions have a sub-block granularity of 16 and 64 bytes respectively, leaving 4 bits reserved for DMA transfer hints. The DMA transfer hints are left undefined, until bridge designs provide a better indication of how they should be used. The 16-byte address granularity benefits the following:

CPU - it is possible to read and write 32-byte data blocks, i.e. the cache line transfer size of the PowerPC 601 processor.

DMA - simplifies transfers involving unaligned end points (selected bytes or sub-blocks for the first and final, 64 or 256 for all others).

ATM - for transferring individual ATM packets into memory, fewer SCI transactions are required.

The next section details the redefinition of the DMA hint bits to support sub-block transfers.

2.1. Transaction encoding

The current SCI address encoding is shown in figure 2.1.

Figure 2.1 Current command encoding

The least-significant bits of the address (which currently represent hint bits) are re-encoded to support the selected sub-block packet granularity while maintaining compatibility with designs where all hint bits are 0, i.e. a standard SCI packet without DMA hints; see figure 2.2. A similar encoding scheme is proposed for the 256-byte transfers, illustrated in figure 2.3.

The e field specifies the functionality of the off (res in 256 byte packet format) and end fields. For transfers where e is 0, the off/res and end fields are concatenated to form a 4-bit DMA hint field and the transfer refers to a full block transfer. When the e field is 1, the off and end fields specify sub-block transfers.
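The interpretation of the e field can be sketched in a few lines of Python. This is only an illustration, not the bridge logic: the function name and return format are hypothetical, the fields are assumed to have already been extracted from the packet (the bit positions are defined by the figures and are not modeled), and placing off/res in the high-order half of the concatenated hint is an assumption.

```python
def interpret_hint_fields(e, off, end):
    """Interpret the e, off (res for 256-byte packets) and end fields.

    Assumes off and end are 2-bit values and that off/res forms the
    high-order half of the concatenated DMA hint (an assumption).
    """
    if not (e in (0, 1) and 0 <= off <= 3 and 0 <= end <= 3):
        raise ValueError("field out of range")
    if e == 0:
        # Full block transfer: off/res and end form a 4-bit DMA hint.
        return ("full_block", (off << 2) | end)
    # Sub-block transfer: off and end keep their sub-block meaning.
    return ("sub_block", off, end)
```

With all fields equal to 0 the function reports a full block transfer with hint 0, which corresponds to a standard SCI packet without DMA hints.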

Figure 2.2 Proposed command encoding for 64-byte transfers.

Figure 2.3 Proposed command encoding for 256-byte transfers.

A sub-block transfer starting address is computed by concatenating addressOffset and off to generate a 16-byte aligned address. The end field specifies the number of the last accessed sub-block, i.e. end = 2 indicates that sub-block 2 is the last sub-block to be accessed. The starting address for a 256-byte transaction is specified by the addressOffset value, which is already 64-byte aligned (as defined by SCI). No off field is required, and the field is defined as reserved for future use. For 64-byte transfers, end < off corresponds to a no-data transfer; for 256-byte transfers, end < off is valid and the critical sub-block is transferred first.

2.2. Transaction Examples

Consider all possible transfers using nread64. The possible data transfers of 16-byte sub-blocks are detailed in figure 2.4. In all cases, addressOffset is concatenated with off to form Start. Start specifies the first sub-block that is transferred, and end specifies the last sub-block that is transferred (unless end < off, in which case no data is transferred).

Figure 2.4 Summary of 64-byte transfers
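The 64-byte cases can also be enumerated programmatically. The following is a minimal sketch under stated assumptions: the function name is hypothetical, addressOffset is modeled as a plain 64-byte aligned integer, and concatenating addressOffset with off is expressed as adding off times 16 to that aligned address.

```python
SUB_BLOCK_BYTES = 16  # sub-block granularity of a 64-byte transaction

def sub_blocks_64(address_offset, off, end):
    """Return the byte addresses of the 16-byte sub-blocks transferred.

    address_offset is the 64-byte aligned block address (an integer here);
    off and end are the 2-bit sub-block numbers from the command encoding.
    """
    if address_offset % 64 != 0:
        raise ValueError("addressOffset must be 64-byte aligned")
    if end < off:
        return []  # end < off: no data is transferred
    start = address_offset + off * SUB_BLOCK_BYTES  # Start, 16-byte aligned
    return [start + i * SUB_BLOCK_BYTES for i in range(end - off + 1)]

# Enumerate every (off, end) combination for one example block.
for off in range(4):
    for end in range(4):
        blocks = sub_blocks_64(0x1000, off, end)
        print(off, end, len(blocks) * SUB_BLOCK_BYTES, [hex(a) for a in blocks])
```

In this model a 32-byte cache line corresponds to off = 0, end = 1 or off = 2, end = 3, and a 48-byte ATM payload to off = 0, end = 2.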

The 64-byte sub-block transfers within the 256-byte packet are slightly different, since off is ignored and wrapping is allowed (SCI allows critical sub-block first in 256-byte transfers). In all cases, addressOffset specifies the first sub-block that is transferred, and end specifies the last sub-block transferred. Note that the transfer wraps at the end of the block.
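The wrapping behavior can be illustrated with a short sketch. It assumes that the first sub-block number is taken from address bits 7 and 6 of the 64-byte aligned address offset, i.e. its position within the enclosing 256-byte block; the function name is hypothetical.

```python
def sub_blocks_256(address_offset, end):
    """Return byte addresses of the 64-byte sub-blocks, in transfer order.

    address_offset is 64-byte aligned; its position within the enclosing
    256-byte block selects the first sub-block (assumed: bits 7..6).
    end is the 2-bit number of the last sub-block. The transfer wraps
    at the end of the 256-byte block.
    """
    if address_offset % 64 != 0:
        raise ValueError("addressOffset must be 64-byte aligned")
    base = address_offset & ~0xFF          # start of the 256-byte block
    first = (address_offset >> 6) & 0x3    # first sub-block number
    count = ((end - first) % 4) + 1        # 1..4 sub-blocks, with wrap
    return [base + ((first + i) % 4) * 64 for i in range(count)]
```

For example, a start at sub-block 2 with end = 1 yields the order 2, 3, 0, 1, so the critical sub-block is delivered first and the remainder of the block follows after the wrap.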

3. The PowerPC 601 to SCI Interface

This section is an update of section 4 of our previous paper [6]. Some original parts of section 4 are included for background.

The PowerPC 601 is a 32 bit processor with separate 32 bit address and 64 bit data buses. The PowerPC 601 to SCI interface maps remote memory transactions generated by the PowerPC 601 to appropriate SCI transactions. The 601 generates 1, 2, 4, and 8 byte read/write transactions as well as 32 byte read/write bursts. The SCI transactions generated are readsb16/writesb16 and nread64/nwrite64. The 601 burst transactions are mapped to non-coherent SCI transactions using the packet encoding described in section 2.1.

Figure 2.5 Summary of 256-byte transfers

The 601 burst reads require the critical hexlet to be delivered first. Critical hexlet ordering requires the four bus beats to return octlets ordered as either 0, 1, 2, 3 or 2, 3, 0, 1. The SCI interface will reorder the data to support critical hexlet first. Future members of the 60X series will require support for critical octlet order as well.
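A minimal sketch of the critical hexlet ordering follows. It assumes a 32-byte aligned line holding octlets 0 through 3 (8 bytes each), so that bit 4 of the requested address selects the critical 16-byte hexlet; the function name is hypothetical.

```python
def burst_beat_order(address):
    """Return the octlet order for the four beats of a 32-byte burst read.

    The 32-byte line holds octlets 0..3 (8 bytes each), i.e. two hexlets
    of 16 bytes. The hexlet containing the requested address goes first.
    """
    critical_hexlet = (address >> 4) & 0x1   # bit 4 selects the hexlet
    return [0, 1, 2, 3] if critical_hexlet == 0 else [2, 3, 0, 1]
```

A request anywhere in the lower 16 bytes of the line gives 0, 1, 2, 3, and a request in the upper 16 bytes gives 2, 3, 0, 1.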

The PowerPC 601 SCI interface will accept the following SCI commands: readsb16, writesb16, nread64, nwrite64, and dmove64. A limitation of our PowerPC bus allows support of only one outstanding transaction. The interface also includes a DMA controller (section 3.2) and an SCI addressable interrupt register (section 3.3).

3.1. Address Translation

The PowerPC 601 32 bit addresses are translated to full 64 bit SCI addresses through a configurable address translation table. The translation table has 256 entries, which implies a maximum of 256 simultaneously addressable nodes if each entry is for a separate node. Multiple entries can be configured to address the same node at different offsets within the node's address space. Figure 3.1 illustrates the address translation mechanism. Addresses with the high order four bits equal to "E" are mapped to remote memories. There is no address translation/protection on incoming SCI transactions. The lower 32 bits of incoming SCI addresses are mapped directly to the local 601 32 bit address.
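The translation can be sketched as a table lookup. The text fixes only the "E" window, the 256-entry table, and the direct mapping of incoming addresses; everything else below is an assumption made for illustration, in particular that the eight address bits following the "E" nibble index the table, that the remaining 20 bits pass through as an offset, and that each entry holds a 64 bit base address. The function names are hypothetical.

```python
REMOTE_WINDOW = 0xE          # high-order four bits that select remote memory
INDEX_BITS = 8               # 256 table entries
OFFSET_BITS = 32 - 4 - INDEX_BITS   # assumed: remaining 20 bits pass through

def translate_outgoing(address, table):
    """Map a 32-bit 601 address to a 64-bit SCI address, or None if local.

    table is a list of 256 64-bit base addresses (hypothetical layout).
    """
    if (address >> 28) != REMOTE_WINDOW:
        return None                      # not in the remote window
    index = (address >> OFFSET_BITS) & 0xFF
    offset = address & ((1 << OFFSET_BITS) - 1)
    return (table[index] + offset) & 0xFFFFFFFFFFFFFFFF

def translate_incoming(sci_address):
    """Incoming SCI addresses: the lower 32 bits are used directly."""
    return sci_address & 0xFFFFFFFF
```

Because several entries may hold bases within the same node, the same lookup covers the case of one node addressed at different offsets.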

3.2. DMA

The DMA controller for the PowerPC 601 SCI interface uses a set of CSRs to describe the transfer. The DMA engine transfer is unidirectional; data is read from the local system memory and written to a remote SCI node. A key implementation objective is to provide reasonable functionality using a minimum of silicon real estate, while also minimizing the design and validation time. The DMA architecture supports a transfer size range of 64 bytes to just under 4 megabytes in aligned 64 byte increments.

To initiate a DMA transfer, the DMA requester loads the DMA information into the CSRs. The last CSR written signals the DMA controller to start the DMA operation. While the DMA engine is executing the transfer, a CSR can be queried to obtain the number of 64 byte blocks left to be transferred. The DMA engine will generate an interrupt when the transfer is completed.

There are 5 CSRs required for the DMA engine. Below is a description of each register:

The LocalStartAddress holds the starting address of local memory where data is to be read from.

Figure 3.1 32 to 64 bit address translation of the PowerPC - SCI interface.

The RemoteStartAddress holds the start address of the remote location for data to be written to. The 32 bit address will be translated using the previously presented 32 bit to 64 bit address translation mechanism. Thus, from a software point of view, the DMA controller and local processor operate within the same 32 bit physical address space.

The Size holds the number of 64 byte blocks to transfer. Valid sizes are 1 through 65535; a value of 0 indicates that the transfer is complete.

The Pace holds a 4 bit number that indicates a cycle count. The DMA engine will decrement the count to zero between the 64 byte transfers, thus pacing the transfers. Pacing the DMA engine can be a desirable feature if more important SCI traffic is required concurrently with an SCI DMA transfer.

The GoStop is the control for starting and stopping the DMA transfer.
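The five registers can be modeled as a small record, together with a helper that checks a transfer request against the stated limits. This is a sketch only: the helper name, the requirement that both addresses be 64 byte aligned, and the use of 1 as the GoStop start value are assumptions, and no register offsets or widths beyond those given in the text are implied.

```python
from dataclasses import dataclass

BLOCK_BYTES = 64
MAX_BLOCKS = 65535            # largest valid Size value

@dataclass
class DmaCsrs:
    local_start_address: int   # LocalStartAddress
    remote_start_address: int  # RemoteStartAddress (32-bit, translated later)
    size: int                  # Size: number of 64-byte blocks
    pace: int                  # Pace: 4-bit cycle count between blocks
    go_stop: int               # GoStop: written last to start the transfer

def program_dma(local, remote, nbytes, pace=0):
    """Build the CSR values for one transfer (hypothetical helper)."""
    if nbytes % BLOCK_BYTES or local % BLOCK_BYTES or remote % BLOCK_BYTES:
        raise ValueError("addresses and length must be 64-byte aligned")
    blocks = nbytes // BLOCK_BYTES
    if not 1 <= blocks <= MAX_BLOCKS:
        raise ValueError("Size must be 1 through 65535 blocks")
    if not 0 <= pace <= 0xF:
        raise ValueError("Pace is a 4-bit value")
    return DmaCsrs(local, remote, blocks, pace, go_stop=1)
```

With a maximum Size of 65535 blocks, the largest single transfer in this model is 65535 x 64 = 4,194,240 bytes.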

3.3. Interrupts

The PowerPC 601 SCI interface supports four interrupts from SCI. An SCI node can send an interrupt to another SCI node by sending a write packet to the receiving node's interrupt CSR. When any of the four bit positions of the interrupt CSR are written, the interface will generate an interrupt. The interface supports masking of interrupts through a corresponding 4 bit masking CSR.
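A toy model of the interrupt and masking CSRs is given below. The text does not state the mask polarity or whether written bits accumulate, so both are assumptions here: a set mask bit is taken to suppress the corresponding interrupt, and written bits are OR-ed into the pending value. The class and method names are hypothetical.

```python
class InterruptCsr:
    """Toy model of the 4-bit interrupt CSR and its 4-bit mask CSR."""

    def __init__(self):
        self.pending = 0   # interrupt CSR contents
        self.mask = 0      # assumed: a set mask bit suppresses that interrupt

    def sci_write(self, value):
        """Handle a remote SCI write; return True if an interrupt is raised."""
        bits = value & 0xF
        self.pending |= bits
        return (bits & ~self.mask & 0xF) != 0
```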

3.4. Other Interface Features

The PowerPC 601 SCI interface allows remote SCI accesses of the internal interface CSRs. A general purpose 32 bit CSR is included to help facilitate intra-node communication.

The interface contains extensive error logging facilities. The error logging can be used for debugging the interface and monitoring/logging errors of an operational interface.

4. Summary

Section 2 presented a new interpretation of the DMA hint bits for supporting sub-block transfers. The proposed encoding efficiently supports 32 byte cache line transfers, 48-byte ATM packet transfers, as well as other sub-block combinations. The encoding remains compatible with standard SCI packets in which all hint bits are 0, i.e. when the DMA hints of the original specification are not used.

The revised PowerPC 601 SCI interface presented in section 3 is in the process of being implemented.

5. Acknowledgments

The sub-block transfer encoding has been a collaborative effort between SCIzzL, Dolphin Interconnect Technology AS, Telenor Research and Apple Computer, Inc.

The RD24 group at CERN, Switzerland provided valuable input for our 68040 and PowerPC to SCI interface designs. David James is our SCI technical lead.

Cbus and NodeChip are trademarks of Dolphin Interconnect Technology AS. PowerPC is a trademark of International Business Machines Corporation. Macintosh Quadra is a trademark of Apple Computer, Inc.

References

[1] IEEE Std 1596-1992: "IEEE Standard for Scalable Coherent Interface (SCI)," August, 1993.

[2] Dolphin SCI, "Cbus Specification," Dolphin SCI Technology AS, August 1992.

[3] Dolphin SCI, "NodeChip Functional Specification," Dolphin SCI Technology AS, August 1992.

[4] Motorola, "MC68040 User's Manual," Motorola, Inc., 1989.

[5] Motorola, "PowerPC 601 RISC Microprocessor User's Manual," Motorola, Inc., 1993.

[6] Stone, G. and North, D., "Implementation Issues of Bridging SCI to a Conventional Processor Bus," Proceedings, 1st Int. Workshop on SCI-based High-Performance, Low-Cost Computing, Santa Clara, August 17-18, 1994.
