The Indispensable PC Hardware Book Your Hardware Questions Answered THIRD EDITION
Hans-Peter Messmer
ADDISON-WESLEY
Harlow, England • Reading, Massachusetts • Menlo Park, California • New York • Don Mills, Ontario • Amsterdam • Bonn • Sydney • Singapore • Tokyo • Madrid • San Juan • Milan • Mexico City • Seoul • Taipei
Part 1 Basics
This chapter outlines the basic components of a personal computer and various related peripherals as an introduction to the PC world. Though this chapter is intended for beginners, advanced users would also be better prepared for the later and more technically demanding parts of the book.
1 Main Components
1.1 The Computer and Peripherals
Personal computer (PC), by definition, means that users actually work with their own «personal» computer. This usually means IBM-compatible computers using the DOS, OS/2 or Windows (NT) operating system. Mainframe users may wonder what the difference is between a PC and a terminal: after all, a terminal also has a monitor, a keyboard and a small case like the PC, and looks much the same as that shown in Figure 1.1. The difference is that the PC contains a small but complete computer, with a processor (hidden behind the names 8086/8088, 80286 or i486, for example) and a floppy disk drive. This computer carries out data processing on its own: that is, it can process files, do mathematical calculations, and much more besides. A terminal, on the other hand, only establishes a connection to the actual computer (the mainframe). The terminal can't carry out data processing on its own, being more a monitor with poor input and output capabilities that can be located up to a few kilometres away from the actual computer. That a small PC is less powerful than a mainframe occupying a whole building seems obvious (although this has changed with the introduction of the Pentium), but that is only true today. One of the first computers (called ENIAC, developed between 1943 and 1946, which worked with tubes instead of transistors) occupied a large building, and consumed so much electricity that the whole data processing institute could be heated by the dissipated power! Nevertheless, ENIAC was far less powerful than today's PCs. Because PCs have to serve only one user, while mainframes are usually connected to more than 100 users (who are logged in to the mainframe), the impact of the lack of data processing performance in the PC is reduced, especially when using powerful Intel processors.
Another feature of PCs (or microcomputers in general) is their excellent graphics capabilities, which are a necessary prerequisite for user-friendly and graphics-oriented programs like Microsoft's Windows. In this respect, the PC is superior to its [...]
[...] of a fixed size called pages. In the case of the i386, a page size of 4 kbytes is defined (this size is fixed by processor hardware and cannot be altered). Thus the complete linear address space consists of one million pages. In most cases, only a few of them are occupied by data. The occupied pages are either in memory or swapped to the hard disk. Whether the i386 actually uses this paging mechanism is determined by the PG bit in the CR0 register. If the PG bit is set then paging is carried out; otherwise it is not. Because the operating system must manage and eventually swap or reload the pages, it is not sufficient to simply set the PG bit. As with the protection and task switch mechanisms, the i386 hardware only supports paging. The operating system must intercept the paging exceptions and service them accordingly, look for the swapped pages, reload them, etc. Thus a double address mapping is carried out in the i386. The segment and offset (virtual address) of a memory object are combined to form a linear address in the linear 4 Gbyte address space corresponding to one million pages. Then the linear address, which lies in one of these one million pages, is converted into a physical address, or an exception [...] and the associated interrupt 0eh. The three AVAIL-bits are available to the operating system for page management. All other bits are reserved. In the i386, segmenting and paging produce a double address conversion. Segmenting produces the base address of the segment from the virtual address in the segment:offset format with the help of the descriptor tables and the segment descriptors and, with the addition of the
Logical Memory Addressing and Memory Access
offset, forms the linear address in the large linear address space. Paging converts this linear address into a physical address of the usually much smaller physical address space or to a «page fault» exception. This procedure is shown in Figure 3.23. In most cases, the i386 can execute this conversion without a time delay, due to the segment descriptor cache register and the translation lookaside buffer.
[Figure 3.23: diagram of the address conversion. Labels: Selector and Offset (Logical (Virtual) Address), Descriptor Table, Linear Address, Physical Address.]
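The double address conversion described above can be sketched in a few lines of Python. This is only an illustrative model, not the processor's actual implementation: the dictionary-based `page_directory`, the function names and the example frame address are assumptions made for this sketch, while the split of the linear address into a 10-bit directory index, a 10-bit page index and a 12-bit offset follows from the 4 kbyte page size.

```python
PAGE_SIZE = 4096  # 4 kbytes, fixed by the i386 hardware


class PageFault(Exception):
    """Stands in for exception 14 (interrupt 0eh)."""


def linear_address(segment_base, offset):
    # Segmenting: base address taken from the segment descriptor plus the offset.
    return (segment_base + offset) & 0xFFFFFFFF


def split_linear(linear):
    # 10-bit directory index, 10-bit page index, 12-bit offset within the page.
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def physical_address(linear, page_directory):
    # page_directory is a hypothetical dict: DIR index -> {page index -> frame base}.
    directory, page, offset = split_linear(linear)
    table = page_directory.get(directory)
    if table is None or page not in table:
        raise PageFault(hex(linear))  # page table or page is not in memory
    return table[page] | offset


# Hypothetical mapping: linear page 1 lives in the physical frame at 0x50000.
page_directory = {0: {1: 0x0005_0000}}
lin = linear_address(0x0000_1000, 0x0234)
phys = physical_address(lin, page_directory)
print(hex(lin), hex(phys))
```

In this model an access to any linear address whose page is missing from the dictionaries raises `PageFault`, which corresponds to the point at which the operating system would have to reload the swapped page.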
3.8.3 Paging Exceptions
In addition to the previously mentioned exceptions, paging can lead to a further exception, namely:
- Page fault (exception 14): if, during the conversion of a linear to a physical address, the paging unit determines that the required page table or the page itself is externally stored (swapped out), or if the task that wishes to read the data in a page is only running at the user level while the page is identified as being at the supervisor level, then the i386 issues an interrupt 14 (0eh). The operating system can load the respective page or page table into memory, or can register an access error.
3.8.4 Test Registers TR6 and TR7 for Checking the TLB
Because the two test registers TR6 and TR7 (Figure 2.5) are only implemented for testing the TLB, I would like to discuss them here in connection with the TLB itself. If you are a little unsure about cache memory, then at this stage you should refer to Chapter 9.
Chapter 3
The TLB is implemented as a 4-way set-associative memory, and uses a random replacement strategy. It is an integrated part of the i386 chip, and does not require any external SRAM memory. Like other cache memories, each entry in the sets 0-3 consists of the 24-bit tag and the actual 12-bit cache data. Each set comprises eight entries, therefore the TLB can accept 32 entries, giving a total size equivalent to 144 bytes. 32 entries for each 4 kbyte page mean that the TLB manages 128 kbytes of memory. Each 24-bit tag stores the 20 most significant bits of the linear address (the DIR and page part), a validity bit and three attribute bits. The data entry contains the 12 most significant value bits of the associated physical address. If the physical address of the page frame is already stored in the TLB when a page table access occurs, the i386 uses it internally to produce physical 32-bit addresses. Otherwise, the processor first has to read the page table entry. Intel specify a 98% TLB hit rate, that is, 98% of all memory accesses for code and data refer to a page whose frame address is already contained in the TLB. Stated differently, in 98% of all cases paging does not slow down code and data accesses. The i386 has two test registers for testing the translation lookaside buffer, these being the test instruction register TR6 and the test data register TR7. Figure 3.24 shows their structure.
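The size figures quoted above can be checked with a little arithmetic. The following sketch simply recomputes them from the numbers given in the text (4 sets of 8 entries, a 24-bit tag and a 12-bit data field per entry); the variable names are chosen for this illustration only.

```python
SETS = 4               # sets 0-3 of the 4-way set-associative TLB
ENTRIES_PER_SET = 8
TAG_BITS = 24          # 20 address bits + validity bit + 3 attribute bits
DATA_BITS = 12         # data field width as quoted in the text
PAGE_SIZE_KBYTES = 4

entries = SETS * ENTRIES_PER_SET
total_bytes = entries * (TAG_BITS + DATA_BITS) // 8
covered_kbytes = entries * PAGE_SIZE_KBYTES

print(entries, total_bytes, covered_kbytes)
```

The tag width itself is the sum of its parts: 20 address bits, one validity bit and three attribute bits give 24 bits.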
Figure 3.24: Structure of the test registers TR6 and TR7.
To test the TLB, you must first disable paging, that is, the i386 must not perform any paging. The TLB test is sub-divided into two procedures: writing entries to the TLB, and reading TLB entries (performing a TLB lookup). The following text explains the test register entries in more detail. The C-bit in the test instruction register indicates whether a write to the TLB (C = 0) or a TLB lookup (C = 1) will be executed. The linear address represents the TLB tag field. A TLB entry is attached to this linear address during a write to the TLB, and the contents of TR6 and TR7 are transferred to the attached entry. During a TLB lookup, the test logic checks whether this linear address and the attribute bits agree with a TLB entry. If this is the case, then the remaining fields of TR6 and the test register TR7 are loaded from the matching entry in the TLB. During a write to the TLB, the physical address is transferred into the TLB entry that corresponds to the linear address. During a TLB lookup, the test logic loads this TR7 field with the value stored in the TLB. The meaning of PL varies depending on C. If a TLB entry is written (C = 0) and if PL = 1, then the TLB associated block to be written is determined by REP. If, however, PL is reset (PL = 0), an internal pointer in the paging unit selects the TLB block to be written. For a TLB lookup (C = 1), the PL value specifies whether the lookup has led to a hit (PL = 1) or not (PL = 0).
V is the TLB entry validity bit. A V-bit that is set shows that an entry is valid; a V-bit that has been reset shows that the entry is invalid. The remaining TR6 register bits correspond to the attribute bits of a page table entry. D, D̄ represent the dirty bit, U, Ū the user/supervisor bit and W, W̄ the write protection bit for the (or from the) corresponding TLB entry. The three bits are given in normal and complementary form (Table 3.7). In this way, for TLB lookups for example, it is possible to instruct the test logic to ignore the value of a bit during a lookup. If both the attribute bit and its complement are set (= 1), then the logic will always produce a hit for the corresponding attribute. In other words, the attribute value in this case is insignificant.

D, U, W   D̄, Ū, W̄   Effect on TLB lookup        Value in TLB after write
0         0          always miss for D, U, W     D, U, W undefined
0         1          hit for D, U, W = 0         D, U, W cleared (= 0)
1         0          hit for D, U, W = 1         D, U, W set (= 1)
1         1          always hit for D, U, W      D, U, W undefined

Table 3.7: Effects of the complementary bit pairs

Testing of the TLB is achieved by reading and writing the TR6 and TR7 test registers. To set a TLB entry for a test, you must first load TR7 with the values for the physical address, PL = 1 and REP. Subsequently, you have to write suitable values for the linear address and the attributes to TR6, and clear the instruction bit C. To perform a TLB lookup (that is, reading of a TLB entry), you must load TR6 with the necessary linear address and set C. Subsequently, you have to read both test registers TR6 and TR7. The PL-bit indicates whether the result is a hit (PL = 1) or not (PL = 0). In the case of a hit, the values in the test registers reflect the content of the respective TLB entry. However, if no hit occurs, the values are invalid.
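The lookup behaviour of the complementary bit pairs can be modelled as a small truth function. The sketch below is a software illustration of the lookup rules only, not of the test logic itself; the function names and the dictionary layout are assumptions made for this example.

```python
def attribute_matches(bit, complement, stored):
    """Lookup rule for one attribute (D, U or W) given its complementary pair.

    bit, complement: the pair written to TR6 (each 0 or 1)
    stored: the attribute value held in the TLB entry (0 or 1)
    """
    if bit and complement:
        return True            # both set: always a hit, the attribute is ignored
    if not bit and not complement:
        return False           # both cleared: always a miss
    return stored == bit       # otherwise the stored value must equal the bit


def lookup_hits(pairs, entry):
    # pairs and entry are hypothetical dicts keyed by "D", "U" and "W".
    return all(attribute_matches(*pairs[name], entry[name]) for name in "DUW")


entry = {"D": 1, "U": 0, "W": 1}
ignore_all = {"D": (1, 1), "U": (1, 1), "W": (1, 1)}
exact = {"D": (1, 0), "U": (0, 1), "W": (1, 0)}
wrong_dirty = {"D": (0, 1), "U": (0, 1), "W": (1, 0)}
print(lookup_hits(ignore_all, entry), lookup_hits(exact, entry), lookup_hits(wrong_dirty, entry))
```

Setting both bits of every pair is thus a convenient way to look up an entry by its linear address alone.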
3.9 i386 Virtual 8086 Mode - Myth and Reality
With the i386 a new operating mode was introduced which, as was the case for the 80286 protected mode, led to some confusion and a few myths. In the following sections I want to clarify the virtual mode. Together with the paging mechanisms, extensive capabilities for multitasking are available even for programs that are actually unable to use multitasking (what a contradiction!). The incentive for the implementation of the virtual mode came from the enormous variety of programs for PCs running under MS-DOS, which is a purely real mode operating system. Remember also that powerful i486 PCs with MS-DOS always operate in real mode, and therefore use only the first Mbyte of the 4096 Mbytes of the 32-bit address space. Moreover, neither the extensive protection mechanisms of the protected mode nor the i386 paging mechanisms are used. Therefore, the virtual mode now enables the execution of unaltered real mode applications, as well as the use of protected mode protection mechanisms. The virtual 8086 tasks may run in parallel with other virtual 8086 tasks or i386 tasks, that is, multitasking with programs not designed for multitasking is taking place. 8086 programs running with MS-DOS are not designed for multitasking because DOS as the operating system is not re-entrant. If a DOS function is interrupted by an interrupt, which in turn calls a DOS function, then the PC frequently crashes because after an IRET the original stack is destroyed. Because of this, the programming of resident (TSR) programs is very complicated. By means of the virtual mode and paging these problems can be avoided. The virtual 8086 mode is, for example, used by the DOS windows under Windows or OS/2. Alternatively, it is possible to use a proper protected mode operating system with the corresponding protected mode applications (for example, OS/2 and a database programmed for OS/2 right from the start).
3.9.1 Virtual Machines and Virtual 8086 Monitors
In virtual mode the i386 hardware and the virtual 8086 monitor constitute a so-called virtual machine. The virtual machine gives the program or user the impression that a complete computer is dedicated to its use alone, even if it is only a member in a multiuser environment. The complete computer, for example, has its own hard disk, its own tape drive, or its own set of files and programs. Actually, the system may generate and manage many of these virtual machines. The system divides the physically present hard disk into sections each dedicated to one of the virtual machines. Thus, the user has the impression that he/she is in possession of a complete hard disk. With a multiuser system such as UNIX/XENIX, many users can work on the computer. But every user has the impression that he/she is currently the only one. A mainframe succeeds with more than 1000 (!) such users. In a single-user system with multitasking such as OS/2, virtual machines are not generated for several users. Instead, the system generates its own virtual machine for each task. Because virtual machines are only serviced for a short time period, multitasking occurs. Thus, for example, a compiler, the printer spooler and an editor may run in parallel. Windows/386 and Windows 3.0 (and higher) on an i386, as well as the 32-bit version of OS/2, use such virtual machines on the basis of the virtual 8086 mode to execute real mode applications in parallel. Just a few more words on the virtual 8086 monitor. This monitor, of course, has nothing to do with the screen, but represents a system program (typically a part of the operating system kernel) that is used to produce and manage (thus monitor) the virtual 8086 tasks. The i386 hardware uses a TSS to produce a set of virtual i386 registers. Further to this, a virtual memory is formed corresponding to the first Mbyte of the linear address space.
A real mode program then has the impression that a normal 8086 is available. The processor then carries out physical instructions that affect these virtual registers and the virtual address space of the task. The purpose of the virtual 8086 monitor is essentially to supervise the external interfaces of the virtual machine: that is, of the virtual 8086 task. The interfaces are formed from interrupts, exceptions and I/O instructions, through which the virtual 8086 task can logically communicate with the other parts of the system. The exceptions and interrupts take on the role of physical interfaces (e.g. the control registers) in a normal 8086 PC. The monitor makes sure that the virtual 8086 machine fits into the extensive i386 system in protected mode, as a separate task without the possibility of interfering with other tasks, or with the i386 operating system itself.
3.9.2 Addresses in Virtual 8086 Mode
The main incompatibility between real and protected mode is the different address determination methods. In real mode the value of the segment register is simply multiplied by 16. In protected mode, however, the segment register is used as an index to a table, where the address of the segment is stored. Therefore, it is impossible, for example, to determine the linear address of a memory object immediately from the segment and offset register values. However, many real mode programs do exactly that, which makes it impossible to execute them properly in protected mode. If the i386 is in virtual 8086 mode and has to run 8086 programs, it must, for the reasons stated above, calculate the linear address of an object in the same way as the 8086 in real mode. Thus, in virtual 8086 mode, the i386 produces the linear address of an object by multiplying the value of the applicable segment register by 16 and then adding the offset, as in real mode. There is, however, a difference between the 8086 and the i386 in that the i386 makes use of 32-bit offset registers and the 8086 does not. Virtual 8086 tasks can produce linear 32-bit addresses through the use of an address size prefix. If the value in an i386 offset register in virtual 8086 mode exceeds 65 535 or ffffh, the i386 issues a pseudo protection exception, in that it generates the exception 0ch or 0dh but not the error code. As in the normal real mode, the combination of a segment ffffh with an offset of ffffh in virtual 8086 mode leads to an address 10ffefh, which is beyond the first Mbyte in the real mode address space. In virtual 8086 mode, too, the i386 can thus break through the 1 Mbyte barrier of real mode by almost 64 kbytes.
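The address calculation just described is easy to reproduce. The following is a minimal sketch; the function name is invented for this illustration, and the `ValueError` merely stands in for the pseudo protection exception (0ch or 0dh) that the i386 would generate for an offset above ffffh.

```python
def v86_linear_address(segment, offset):
    """Linear address as formed in real mode and in virtual 8086 mode."""
    if not 0 <= segment <= 0xFFFF:
        raise ValueError("segment register values are 16 bits wide")
    if not 0 <= offset <= 0xFFFF:
        # Stand-in for the pseudo protection exception (0ch or 0dh).
        raise ValueError("offset exceeds ffffh")
    return segment * 16 + offset


highest = v86_linear_address(0xFFFF, 0xFFFF)
print(hex(highest))               # highest address reachable in this way
print(highest - 0x100000 + 1)     # bytes reachable beyond the first Mbyte
```

The second value shows how far beyond the 1 Mbyte barrier the highest address lies: just short of 64 kbytes (65 536 bytes).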
3.9.3 Entering and Leaving Virtual 8086 Mode
The i386 can be easily switched into virtual 8086 mode; the VM (virtual mode) flag in the EFlag register is simply set. Note that additionally, the i386 must already be working in protected mode. A direct jump from real into virtual 8086 mode is not possible. The VM flag can only be set by code with the privilege level 0 (the operating system kernel), a task switch through an i386 TSS, or an IRET instruction that collects the EFlags with a set VM-bit from the stack. A task switch automatically loads the EFlags from the TSS of the newly started task. In this way, it is not necessary for the operating system itself to decide at each task switch whether the newly started task should be carried out in protected or virtual 8086 mode.
Instead, the operating system makes this decision only once, when the TSS for the task is created: it sets the VM flag in the EFlag entry of the TSS, thus ensuring that the task will always be carried out in virtual 8086 mode. An 80286 TSS, on the other hand, cannot change the most significant word in the EFlag register containing the VM flag, because its flag entry is reduced to 16 bits. The i386 quits virtual 8086 mode when the VM flag is reset. This is possible through the use of an i386 TSS, or through an interrupt or an exception that refers to a trap or an interrupt gate. The i386 then returns to the normal protected mode, in order to carry out other protected mode tasks. This is more clearly illustrated in Figure 3.25.

[Figure 3.25: Entering and leaving virtual 8086 mode. Labels: System (CPL=0...2) with the virtual 8086 monitor; Applications (CPL=3) with the 8086 program (virtual 8086 mode) and protected mode tasks; transitions: IRET with VM=1, interrupt/exception with VM=0, task switch with VM=1, task switch with VM=0.]

In virtual 8086 mode all 8086 registers are available and are expanded to 32 bits (through the use of a preceding E). Additionally, the new i386 registers such as FS, GS, the debug registers etc. are available. Further to this, a virtual 8086 task can use the new instructions that were implemented in the 80186/286/386 such as BOUND, LSS, etc. Even though the mode is called virtual 8086 mode, it would be more accurate to call it virtual i386 real mode.
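The role of the VM flag and the limitation of the 80286 TSS can be illustrated with a few bit operations. In this sketch the position of the VM flag (bit 17 of EFlags) is taken from the i386 architecture; the function names and the example flag values are assumptions made for the illustration.

```python
VM_FLAG = 1 << 17  # the VM flag lies in the most significant word of EFlags


def is_virtual_8086(eflags):
    return bool(eflags & VM_FLAG)


def load_from_i386_tss(tss_eflags):
    # An i386 TSS holds a full 32-bit EFlag entry, including the VM flag.
    return tss_eflags & 0xFFFFFFFF


def load_from_80286_tss(current_eflags, tss_flags):
    # An 80286 TSS holds only 16 flag bits: the most significant word of
    # EFlags, and therefore the VM flag, is left unchanged.
    return (current_eflags & 0xFFFF0000) | (tss_flags & 0xFFFF)


protected = 0x0000_0202
entered = load_from_i386_tss(VM_FLAG | 0x0202)
via_286 = load_from_80286_tss(protected, 0xFFFF)
print(is_virtual_8086(entered), is_virtual_8086(via_286))
```

Whatever value the 16-bit flag entry holds, a task switch through an 80286 TSS in this model never changes the mode, because the VM flag is out of its reach.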
3.9.4 Tasks in Virtual 8086 Mode
In addition to the hardware platform of the i386, it is necessary to use a number of software components to assemble a virtual machine that can execute an 8086 program in virtual 8086 mode. These are:
- the 8086 program itself (real mode),
- the virtual 8086 monitor (protected mode),
- operating system functions for the 8086 program (real or protected mode).
You can see that the i386 hardware only supports the virtual tasks; on its own it is not sufficient. The same applies to the normal protected mode. The i386 hardware supports a multitasking operating system through the use of automatic access checks, etc. in protected mode. That on its own is not sufficient: a lot more work has to be done by the system programmer to set up a stable system. Without protected mode, a multitasking operating system would still be possible in principle, but, without the hardware support, it would be much harder to program and keep stable (and it's better to say nothing at all about the performance). Let us look at the example of an editor for MS-DOS; this is an 8086 program. The virtual 8086 monitor is part of the protected mode operating system for the i386, because the monitor directly affects the system (for example, through the memory management). Further, the editor needs operating system functions to open and close files which, up to now, have been carried out by DOS.
The three parts together form a virtual 8086 task. It is managed by an i386 TSS. In this way, it is possible to call the virtual 8086 task like any other task formulated specifically for protected mode, through a task switch and the i386 TSS. The 8086 program can thus be embedded in a multitasking environment. The virtual 8086 monitor runs in protected mode with the privilege level PL = 0 and contains routines for loading the 8086 programs and for handling interrupts and exceptions. In comparison, an actual 8086 program has a privilege level of CPL = 3 (after all, it is only an application). In virtual 8086 mode, the first 10fff0h bytes of the i386 linear address space (out of 4 Gbytes, which equals 100 000 000h) are used by the 8086 program, as in real mode. Addresses outside this area cannot be used by the 8086 program. The addresses beyond 10ffefh are available to the virtual 8086 monitor, the i386 operating system and other software. The virtual 8086 task is now only missing the normal operating system functions of the 8086 operating system (in our case, above all, those of INT 21h). For the implementation of this, there are two strategies:
- The 8086 operating system runs as part of the 8086 program, that is, the 8086 program and MS-DOS form a unit. In this way, all of the necessary operating system functions are available.
- The i386 operating system emulates the 8086 operating system.
The advantage of the first option is that the previous real-mode operating system can be used in a nearly unaltered form. Every virtual 8086 task has its own MS-DOS copy (or the copy of another real-mode operating system) exclusively dedicated to it. Several different operating systems may therefore run in an i386 machine: the overall i386 operating system for protected mode programs as well as the various 8086 operating systems for 8086 programs in virtual 8086 mode.
But there remains a serious problem: the operating system functions of MS-DOS and other systems are called by means of interrupts; and interrupt handlers are very critical sections of the i386 operating system, which runs in protected mode. The way in which the problem is solved is described below. If several virtual 8086 tasks are to run in parallel, their coordination is easier if we use the second option above. In this case, the real-mode operating system of the 8086 tasks is emulated by calls to the i386 operating system in most cases. In protected mode, the I/O instructions are sensitive as to the value of the IOPL flag. A value of CPL above IOPL gives rise to an exception. In virtual 8086 mode, however, these I/O instructions are not sensitive to the IOPL flag. Protection of the I/O address space is carried out only by means of the I/O permission bit map. Instead, the instructions PUSHF, POPF, INTn, IRET, CLI and STI now respond to the IOPL value, as the instructions PUSHF, POPF, CLI and STI may alter flags. In the wider i386 environment, with possibly several virtual 8086 and protected mode tasks running, changing flags is the job of either the virtual 8086 monitor or the i386 operating system alone - and not the responsibility of antiquated and inferior MS-DOS programs. Because of the dependency of the INTn and IRET instructions on IOPL, the virtual 8086 monitor may intercept operating system calls from the 8086 program via interrupts. If the value of IOPL is lower than 3 (that is, lower than the CPL of the 8086 program), the monitor intercepts the interrupt. If the 8086 operating system is part of the 8086 program, the monitor hands the interrupt over to it. Call and result may eventually be adapted to the i386 environment. Alternatively, the monitor may emulate the function of the 8086 operating system concerned directly. Remember, however, that interrupts in real and protected mode appear very different. The 8086 program is usually written for an 8086 processor (or an i386 in real mode). The virtual 8086 task uses an interrupt vector table in real mode form with the interrupt vectors in the CS:IP format. The table begins at the linear address 0000h and contains 1 kbyte. However, in virtual 8086 mode the i386 does not use this real mode table directly; first, an INT instruction of the 8086 program calls the corresponding handler of the i386 operating system through the IDT. The i386 then quits the virtual 8086 mode. The flow of an interrupt in virtual 8086 mode can be seen in Figure 3.26.

[Figure 3.26: Flow of an interrupt in virtual 8086 mode. (a) 8086 operating system is part of the 8086 program; (b) 8086 operating system is emulated by the virtual 8086 monitor. Labels: System (CPL=0...2) with i386 OS routines; virtual 8086 application (CPL=3, virtual 8086 mode).]
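The IOPL rule for virtual 8086 mode described above can be condensed into a single predicate. This is a simplified model under stated assumptions: the mnemonic strings and the function name are chosen for this sketch, and only the rule from the text is modelled (the listed instructions are intercepted when IOPL is lower than the CPL of 3; I/O instructions are not IOPL sensitive here).

```python
# Instructions that respond to IOPL in virtual 8086 mode, as listed in the text.
IOPL_SENSITIVE_IN_V86 = {"PUSHF", "POPF", "INT", "IRET", "CLI", "STI"}
V86_CPL = 3  # an 8086 program in virtual 8086 mode always runs with CPL = 3


def monitor_intercepts(mnemonic, iopl):
    """True if the instruction leads to an exception that the monitor handles."""
    return mnemonic in IOPL_SENSITIVE_IN_V86 and iopl < V86_CPL


for instruction in ("INT", "CLI", "IN"):
    print(instruction, monitor_intercepts(instruction, iopl=0))
```

I/O instructions such as IN and OUT do not appear in the set because, in virtual 8086 mode, they are checked against the I/O permission bit map instead.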
As usual with an interrupt, the i386 stores the EFlags on the stack. Thus, the called handler knows by means of the stored flags whether or not the VM flag was set at the time of the interrupt, that is, whether or not the interrupt stopped a virtual 8086 task. If VM is set, the interrupt handler passes control to the virtual 8086 monitor. The monitor itself can handle the interrupt or again pass control to the interrupt handler of the 8086 program. For this purpose, the monitor checks the interrupt vector table of the interrupted 8086 task, to determine the entry point of the real mode interrupt handler. Through an IRET instruction (with the privilege level PL = 0 of the monitor), the i386 switches back into virtual 8086 mode and calls the handler. After the real mode interrupt handler has finished, the concluding IRET instruction of the real mode handler is again passed to the virtual 8086 monitor. After it has prepared the stacks for the return, a further IRET is issued. In this way, the i386 restarts the virtual 8086 program that was stopped by the interrupt. The interrupt is thus served in a roundabout way, via the IDT and the monitor, by the real mode handler of the 8086 task. As you can clearly see, this forces an interrupt to use a very extensive and, therefore, protracted procedure. In the Pentium, the process is more optimized. Many MS-DOS programs set and clear the IE interrupt flag, to control the operation of hardware interrupts in critical code sections. The i386 operating system, which is responsible for the whole machine, cannot tolerate such interference. It is the responsibility of the operating system alone to decide whether a hardware interrupt request should be serviced immediately, after a short time, or not at all. For this reason, in virtual 8086 mode the PUSHF, POPF, CLI and STI instructions also depend upon the IOPL, as they can directly (CLI and STI) or indirectly (PUSHF and POPF) change the IE flag.
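How a monitor might read the entry point of a real mode handler from the interrupt vector table of the interrupted task can be sketched as follows. The table layout (256 vectors of 4 bytes each, offset word followed by segment word) is the standard real mode format; the function name, the `memory` buffer and the example vector contents are hypothetical.

```python
import struct

VECTOR_SIZE = 4     # each vector holds IP (offset) and CS (segment), 2 bytes each
VECTOR_COUNT = 256  # 256 vectors of 4 bytes give the 1 kbyte table


def real_mode_handler(memory, vector):
    """Return (CS, IP, linear address) of the handler for the given vector.

    memory is a bytes-like image of the task's address space starting at 0000h.
    """
    ip, cs = struct.unpack_from("<HH", memory, vector * VECTOR_SIZE)
    return cs, ip, cs * 16 + ip


# Hypothetical table contents: INT 21h points to 0019h:1460h.
memory = bytearray(VECTOR_COUNT * VECTOR_SIZE)
struct.pack_into("<HH", memory, 0x21 * VECTOR_SIZE, 0x1460, 0x0019)

cs, ip, linear = real_mode_handler(memory, 0x21)
print(hex(cs), hex(ip), hex(linear))
```

A real monitor would additionally have to build a suitable stack frame before issuing the IRET that returns to virtual 8086 mode; that part is deliberately left out of this sketch.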
The monitor intercepts these instructions and processes them in such a way that they are compatible with the whole i386 system. The reason for the interference of the monitor when an IOPL dependent instruction is issued by an 8086 program is clear: the 8086 program is under the impression that these instructions should be executed as in real mode. A further critical area for multitasking systems is the I/O ports, because with their help, the processor can access the registers of hardware components. Here also, the i386 operating system cannot tolerate interference from an 8086 program in virtual 8086 mode. Many real mode programs, unfortunately, access the I/O ports directly; under MS-DOS, this is no great problem, the PC crashes regularly anyway. Under a multitasking system, on the other hand, most programs can only do this in a roundabout way through the operating system so that accesses to the I/O ports are coordinated. After all, a task containing errors should not affect other tasks, or other users in a multi-user system. In virtual mode, the I/O port access problem does not exist, as the system protects the critical ports through the use of the I/O permission bit map and not the IOPL flag. The I/O instructions are no longer IOPL sensitive even though the i386 is running in protected mode. In this way, the virtual 8086 monitor can, on the one hand, permit an 8086 program to directly access uncritical I/O ports. If, for example, an 8086 program targets a plug-in board programmed for controlling a robot, the registers would not be used by another program, and so for this port a conflict would never occur - the roundabout way through the i386 operating system would only cause delays and would not provide any additional protection. With the help of the permission bit map, on the other hand, critical ports such as the registers of a monitor screen control chip or a hard disk controller are protected against unauthorized and possibly erroneous accesses. The virtual 8086 monitor intercepts such accesses and handles them accordingly.
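The check against the I/O permission bit map can be modelled in a few lines. The sketch assumes the usual i386 convention that a cleared bit permits access to the corresponding port and that ports beyond the end of the map are treated as protected; the port numbers for the robot board and the function name are invented for this example.

```python
def port_access_allowed(bitmap, port, width=1):
    """Check every port touched by an access of `width` bytes.

    bitmap: bytes-like I/O permission bit map, one bit per port;
            a cleared bit (0) permits the access, a set bit (1) denies it.
    """
    for p in range(port, port + width):
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap) or bitmap[byte_index] & (1 << bit):
            return False  # protected port: the monitor intercepts the access
    return True


# Start with all 65 536 ports protected, then open the four ports of a
# hypothetical robot control board at 300h-303h.
bitmap = bytearray(b"\xff" * 8192)
for p in range(0x300, 0x304):
    bitmap[p // 8] &= ~(1 << (p % 8)) & 0xFF

print(port_access_allowed(bitmap, 0x300, width=2))  # robot board register
print(port_access_allowed(bitmap, 0x1F0))           # a protected port
```

Accesses that fail this check are the ones the virtual 8086 monitor gets to see; all others proceed directly, without any detour through the operating system.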
3.9.5 Paging and Virtual 8086 Mode
As long as only one task is running, no problems arise concerning the use of main memory. The first 10fff0h bytes of the linear address space are reserved for the virtual 8086 task. The operating system and all other tasks occupy the addresses above. But what happens with the video memory? Many 8086 programs directly write into the video RAM to speed up text and graphics output, and they prefer to do this from the upper left to the lower right corner for the whole screen. If several tasks running in parallel output data onto the screen, a hopeless confusion is the result. This is especially true if some of these tasks output data in text mode and others output it in graphics mode (and, very likely, using various graphics modes). These problems led to the development of the Presentation Manager for OS/2 and Windows, which supplies a unique interface to the screen for all programs. An older 8086 program, of course, doesn't know anything about this. The solution for this problem is paging; the address space reserved for the video RAM (see Figure 1.28) can be mapped to a video buffer in main memory. This video buffer forms a «virtual» screen. Then the virtual 8086 monitor can decide which data in the video buffer is actually transferred to which location in the real video RAM. It is thus possible to assign an 8086 program a window on the screen for data output. For this purpose, the data need not even be available in the same form. The virtual 8086 monitor, for example, may convert the data delivered by the 8086 program in text mode into a graphics representation suitable for output in a Windows window or OS/2 Presentation Manager window. In most cases, 8086 programs don't occupy all the memory that is available with MS-DOS. Thus, a further advantage of paging even with only one virtual 8086 task is that the unoccupied sections of the 10fff0h bytes can be used by other tasks.
Without paging, the memory chips at the free locations are not filled with data, yet are unavailable for the storage of data and code for other tasks. As already mentioned, the 8086 program of a virtual 8086 task occupies the lower 10fff0h bytes of linear address space. If several virtual 8086 tasks run in parallel, they must be swapped completely during a task switch to avoid collisions between the addresses and data of the various tasks. The only way to solve these problems is the i386 paging mechanism. Each of the various virtual 8086 tasks must map the lower 10fff0h bytes of linear address space to a different physical address area. This can be achieved by different page table entries for the first 10fff0h bytes, corresponding to 272 pages of 4 kbytes each, for the individual virtual 8086 tasks. Thus the individual virtual 8086 tasks, and therefore also the 8086 programs, are isolated from each other. Additionally, the other advantages of paging, for example, a large virtual address space and protection of the pages, can be used. If the 8086 programs do not alter the 8086 operating system (that is, they don't overwrite its code), then several 8086 programs may share one copy of the 8086 operating system and the ROM code. The program code always remains the same; the only differences between the 8086 operating systems of the individual tasks are the different values of the registers CS, EIP, SS, SP, DS, etc. These register values are stored in the TSS of the individual virtual 8086 tasks, and therefore define the state of the code for the individual tasks without the need for several copies of the operating system code. Therefore, a very economical use of the available memory is possible.
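The figure of 272 pages can be checked with a little arithmetic. The following Python sketch assumes the usual real mode address formation (segment times 16 plus offset); the function names and the base frame number 200h are hypothetical and serve only as an illustration of giving each task its own run of page frames.

```python
PAGE_SIZE = 4 * 1024  # i386 page size in bytes

def real_mode_linear(segment, offset):
    """Linear address an 8086 program forms from a segment:offset pair."""
    return (segment << 4) + offset

# Highest byte an 8086 program can reach: segment ffffh, offset ffffh.
highest = real_mode_linear(0xFFFF, 0xFFFF)
span = highest + 1                      # number of reachable bytes
pages = -(-span // PAGE_SIZE)           # ceiling division

def first_frame(task_index, base_frame=0x200):
    """Hypothetical layout: each task gets its own run of physical page frames."""
    return base_frame + task_index * pages

print(hex(span), pages)                 # 0x10fff0 272
print(first_frame(0), first_frame(1))   # 512 784
```

Each virtual 8086 task would then fill the first 272 entries of its page tables with its own frame numbers, so that identical linear addresses reach different physical memory.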
Logical Memory Addressing and Memory Access
In the case of a bug in an 8086 program, it is possible that the 8086 operating system is overwritten erroneously. Thus, an error in one program also affects the execution of another. This is a clear violation of the i386's protection philosophy! It can be avoided by marking the pages for the 8086 operating system in the page table as read-only (R/W = 0) so that any write attempt leads to an exception. If the attempt is a legal update of an 8086 operating system table, the monitor can carry out this job. If an illegal attempt to write occurs, the monitor aborts the 8086 program. Program errors can therefore be detected very quickly. Destruction caused by overwriting the 8086 program or an 8086 operating system that is exclusively dedicated to the task concerned is not very serious: only the erroneous program crashes. The virtual 8086 monitor, on the other hand, remains operable if one of the following precautions has been observed.
- The first 10fff0h bytes of the linear address space are exclusively reserved for the 8086 program and the 8086 operating system. Addresses beyond this section cannot be generated without issuing an exception. i386 protected mode applications and system routines are always loaded beyond the address 10fff0h.
- The pages of the virtual 8086 monitor are identified in the respective page table as supervisor pages (U/S = 0). As the 8086 program of a virtual 8086 task always runs with CPL = 3, that is, the user level, it cannot overwrite the virtual 8086 monitor. Such an attempt leads to an exception, which prompts the monitor to interrupt the erroneous 8086 program.
Do not expect too much from the 8086 compatibility of OS/2 or Windows. The problem with real mode applications under DOS is quite simply that many programmers apply their skills and knowledge to out-trick DOS to obtain the maximum possible performance from a PC. Such programs, for example, read the interrupt vector table and sometimes jump directly to the entry point of the handler, or they overwrite an interrupt vector directly using a MOV instruction instead of using a much slower DOS function call. Such contrived strategies do improve performance under DOS, but at the same time the i386 operating system or the virtual 8086 monitor is regularly being out-smarted. The PC itself is unwilling to execute such "expert programs" for compatibility. In spite of this, the virtual 8086 mode is a powerful instrument, particularly when used together with the paging mechanism of the i386, for embedding older real mode programs in a multitasking environment.
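The two protection bits mentioned above can be illustrated with a small Python sketch. It assumes the i386 page table entry layout (bit 0 = present, bit 1 = R/W, bit 2 = U/S) and, as a simplification, looks at a single entry only, whereas the processor combines the page directory and page table entries. The function name is hypothetical.

```python
P, RW, US = 0x1, 0x2, 0x4   # present, read/write and user/supervisor bits

def access_allowed(entry, cpl, write):
    """Return True if the access passes the page-level check (single-entry sketch)."""
    if not entry & P:
        return False            # page not present
    if cpl == 3:                # virtual 8086 programs always run at user level
        if not entry & US:
            return False        # supervisor page, for example the monitor
        if write and not entry & RW:
            return False        # read-only page, for example the shared 8086 OS
    return True                 # supervisor-level code is not restricted here

dos_page = P | US               # shared 8086 operating system: user, read-only
monitor_page = P | RW           # virtual 8086 monitor: supervisor page

print(access_allowed(dos_page, 3, write=False))      # True
print(access_allowed(dos_page, 3, write=True))       # False
print(access_allowed(monitor_page, 3, write=False))  # False
```

A refused access corresponds to the exception that hands control to the virtual 8086 monitor.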
4 Physical Memory Addressing and Memory Access The memory addressing described previously is of a purely logical nature. You can force the i386 to access the memory, for example, with the instructions MOV reg, mem or MOV mem, reg. Besides the memory space, the i386 (and, of course, all other 80x86 processors) has a so-called I/O address space, which is accessed via ports in units of one byte, word or double word by means of the machine instructions IN, OUT, etc. More details about this are given in Section 4.3. In reality, to read and write data, the i386 must be able to transmit and receive various control signals and, of course, it also needs power for its circuitry.
4.1 i386 Pin Connections The i386 control signals and supply voltages are exchanged and received over a total of 124 pins. The i386 is supplied in a Pin Grid Array (PGA) package with 132 interface pins. Due to the advances in integrating electronic circuits, the terminals of a chip often need more space than the chip itself. On its underside, the PGA has a corresponding number of contact pins. What is difficult to imagine is that the chip interfaces require more space than the i386 chip itself. Figure 4.1 schematically shows the i386 pinout.
In the following, I would like to introduce the i386 interfaces and signals. Using these, the i386 is able to communicate with its environment, that is, the main memory, various controllers and drivers. In the following list, the [...] of binary floating-point numbers by, for example, using 2's complement for the mantissa as well as for the exponent. Your imagination can run riot here, but whether the result is, first, sensible, second, useful and, third, generally acknowledged is another matter. The implementation of floating-point arithmetic by electronic circuitry is far more complicated than that of integers. For example, two floating-point numbers cannot be added without further investigation, because their exponents possibly do not coincide. The CPU must first check the exponents and equalize them before the correspondingly adapted mantissas can be added. For that reason, the electronic circuitry for floating-point arithmetic is frequently implemented on a separate coprocessor. The i386 only has an ALU for integer arithmetic, not one for floating-point operations. That is provided by the i387 coprocessor.
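The exponent equalization described above can be sketched in a few lines of Python. The sketch assumes base 2 and integer mantissas, and it rescales towards the smaller exponent so that the result stays exact; real hardware instead shifts the smaller operand to the right and rounds. The function name is hypothetical.

```python
def fp_add(m1, e1, m2, e2):
    """Add m1 * 2**e1 and m2 * 2**e2; returns (mantissa, exponent)."""
    # Equalize the exponents first: rescale the operand with the larger
    # exponent so that both mantissas refer to the same power of two.
    if e1 > e2:
        m1, e1 = m1 << (e1 - e2), e2
    elif e2 > e1:
        m2, e2 = m2 << (e2 - e1), e1
    # Only now can the adapted mantissas be added.
    return m1 + m2, e1

m, e = fp_add(3, 4, 5, 1)   # 3 * 16 + 5 * 2
print(m, e, m * 2 ** e)     # 29 1 58
```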
6.1.2 The Standard - IEEE Formats As is the case for integers and long integers, for floating-point numbers you have to reserve a certain number of bits. In principle, you are free to choose an [...] indicates that the i387 passes a signal to its respective pin. The pins are listed alphanumerically.
ADS (I) Pin K7
The i386 address strobe signal specifies, together with READY, when the i387 bus controller can read the W/R signal and the chip selection signals from the CPU. The i387 ADS pin is normally connected to the i386 ADS pin.
BUSY (O) Pin K2
If this signal is at a low level, the i387 is executing an ESC instruction.
CKM (I) Pin J11
When this pin is supplied with a high-level signal, the i387 operates in synchronized mode. This means that all components are operated by CPUCLK2, which also clocks the i386. When CKM is connected to ground, only the bus controller uses the external processor clock CLK2; all other components are clocked by NUMCLK2. In this case, an external clock generator is necessary for the i387.
CMD0 (I) Pin L8
The instruction signal indicates, during a read cycle, whether the control or status register (CMD0 at low level, therefore active) or the data register (CMD0 at high level, therefore inactive) will be read by the i386. If, during a write cycle, CMD0 is active, the i386 will transfer an ESC opcode; otherwise, it transfers a data value.
CPUCLK2 (I) Pin K10
The external CPU clock signal CLK2 is supplied to this pin. The i387 uses CLK2 for the timing of its bus controller logic. If CKM is at a high level, the i387 is synchronized with the CPU, and the other components are also clocked with this signal.
D31-D0 (I/O) Pins A2-A5, A7-A8, A10, B1, B3, B5-B6, B8-B11, C1-C2, C10, D1-D2, D10-D11, E10-E11, G1-G2, G10-G11, H1-H2, H10-H11
These 32 pins form the bidirectional data bus for data exchange between the CPU and the coprocessor.
ERROR (O) Pin L2
When ERROR is active, that is, this pin provides a low-level signal, an unmasked exception has been issued in the i387. Thus, ERROR reflects the state of bit ES in the status register.
NPS1, NPS2 (I) Pins L6, K6
If both numeric processor select signals are active, that is, NPS1 is at a low level and NPS2 at a high level, the i386 has detected an ESC instruction and activates the i387 so that the instruction can be transferred and executed. NPS1 is normally connected to the i386 M/IO pin so that the i387 can only be activated for I/O cycles. NPS2 is normally connected to the most significant address bit A31 so that the i387 can recognize I/O cycles.
NUMCLK2 (I) Pin K11
If CKM is at a low level, the i387 runs in asynchronous mode and uses this input signal to clock all components, with the exception of the bus controller, which always runs synchronized. The ratio of NUMCLK2 to CPUCLK2 must lie between 0.625 and 1.400.
PEREQ (O) Pin K1
An active processor extension request signal, that is, a high-level signal, indicates to the i386 that the i387 is either waiting for data from the CPU or is ready to transfer data to the CPU. Once all of the bytes have been transferred, the i387 deactivates the signal to inform the i386 that the complete data transfer has taken place.
READY (I) Pin K8
This pin receives the same signal as the READY pin of the i386, that is, it indicates whether the present bus cycle can be finished or whether the addressed unit requires more time. The i387 itself does not access the main memory or carry out I/O cycles (with the exception of data exchange with the CPU). The READY signal makes it possible for the i387 bus unit to observe all bus activities.
READYO (O) Pin L3
The i387 outputs the ready output signal at this pin so that write cycles can be finished after two clock cycles and read cycles after three clock cycles. READYO is issued when an i387 bus cycle has finished, as long as there are no additional wait cycles. This is necessary because different ESC instructions require different times before, during or after the transfer of operands.
RESETIN (I) Pin L10
A high-level signal at this pin causes an i387 internal reset. The i387 aborts all instructions and enters a defined state. For this to occur, RESETIN has to remain at a high level for a minimum of 40 NUMCLK2 cycles.
STEN (I) Pin L4
The status enable signal serves to select the coprocessor chip. When STEN is active, the i387 reacts to the signals M/IO, A31, A2, W/R, ADS, etc. from the i386. The following meanings apply to the combination (STEN NPS1 NPS2 CMD0 W/R) of coprocessor control signals:
(0xxxx)  i387 not selected
(11xxx)  i387 not selected
(1x0xx)  i387 not selected
(10100)  the i386 reads a control or status word from the i387
(10101)  the i386 writes an opcode into the i387
(10110)  the i386 reads data from the i387
(10111)  the i386 writes data to the i387
The level of the W/R signal determines whether the current I/O cycle with the i386 represents a write cycle (W/R high level) or a read cycle (W/R low level).
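The signal combinations listed for STEN, NPS1, NPS2, CMD0 and W/R amount to a small decoder. The following Python sketch is one way to express it; the function name is hypothetical, and each argument is the logic level at the pin (0 = low, 1 = high).

```python
def i387_bus_action(sten, nps1, nps2, cmd0, wr):
    """Decode the coprocessor control signals into the resulting bus action."""
    # The i387 is selected only with STEN high, NPS1 low and NPS2 high.
    if not (sten == 1 and nps1 == 0 and nps2 == 1):
        return "i387 not selected"
    actions = {
        (0, 0): "i386 reads control or status word",
        (0, 1): "i386 writes opcode",
        (1, 0): "i386 reads data",
        (1, 1): "i386 writes data",
    }
    return actions[(cmd0, wr)]

print(i387_bus_action(1, 0, 1, 0, 1))   # i386 writes opcode
print(i387_bus_action(1, 1, 1, 0, 1))   # i387 not selected
```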
Vcc (I) Pins A6, A9, B4, E1, F1, F10, J2, K5, L7
To supply the i387 with current, these pins are provided with the supply voltage (normally +5 V).
GND Pins B2, B7, C11, E2, F2, F11, J1, J10, L5
These pins are connected to ground (normally 0 V).
high Pins K3, L9
Both of these pins are pulled to a high level by means of a resistor, or are connected to Vcc.
free Pin K9
This pin is not connected to an element or to a potential. It is floating.
6.5 Structure and Function of the i387 The i387 makes additional registers, data types and instructions available to the CPU. This happens as a result of defined cooperation between the i386 and the i387 at the hardware level. For the assembler or machine code programmer, the combination of i386 and i387 appears as one single processor, the difference being that together they have a much higher capacity for performing mathematical calculations at speed than an i386 on its own. The i386/i387 combination can also be described as one processor on two chips. Figure 6.5 shows the internal structure of the i387. The i387 is divided into two main functional groups, the Control Unit (CU) and the Numeric Unit (NU). The numeric unit performs the mathematical calculations; the control unit reads and decodes the instructions, reads and writes memory operands using the i386, and executes the control instructions. The CU can dedicate itself to synchronization with the CPU, while the NU executes the difficult numeric work. On mathematical grounds, the exponent and mantissa of a floating-point number are subjected to differing operations during a calculation. For example, the multiplication of two floating-point numbers $\pm M_1 \cdot B^{\pm E_1}$ and $\pm M_2 \cdot B^{\pm E_2}$ leads to the addition of the exponents and the multiplication of the mantissas. Thus, the result is $(\pm M_1 \cdot B^{\pm E_1}) \cdot (\pm M_2 \cdot B^{\pm E_2}) = (\pm M_1 \cdot \pm M_2) \cdot B^{(\pm E_1) + (\pm E_2)}$. Therefore, the arithmetic logic unit of the NU is separated into exponent and mantissa parts. The interface between these two parts has the function of normalizing the result by increasing or reducing the exponent.
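The division of labour between the exponent and mantissa parts can be sketched in Python. The sketch assumes base 2, positive integer mantissas and a hypothetical mantissa width of 8 bits; signs and rounding are left out, so low-order bits shifted out during normalization are simply dropped.

```python
MANT_BITS = 8   # hypothetical mantissa width, for illustration only

def fp_mul(m1, e1, m2, e2):
    """Multiply m1 * 2**e1 by m2 * 2**e2; returns (mantissa, exponent)."""
    m = m1 * m2          # mantissa part: multiply the mantissas
    e = e1 + e2          # exponent part: add the exponents
    # Normalization: bring the mantissa back into range and
    # compensate by increasing the exponent.
    while m >= 1 << MANT_BITS:
        m >>= 1
        e += 1
    return m, e

print(fp_mul(3, 2, 5, -1))    # (15, 1), that is 12 * 2.5 = 30
print(fp_mul(200, 0, 3, 0))   # (150, 2), that is 600
```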
Figure 6.5: Internal i387 structure. The i387 has a control unit for driving the bus and for controlling the numeric unit. The numeric unit carries out all calculations with floating-point numbers in an exponent and a mantissa module. Unlike the i386, the i387 has a register stack instead of discrete registers.
The i386 and the i387 coprocessor differ significantly in their instructions: the i387 cannot process any i386 instructions, and vice versa. One might expect that there are now two different instruction streams, one with instructions for the CPU and the other with (mathematical) instructions for the coprocessor. This is, however, not the case. Instead, the instructions for the two processors are mixed in a single instruction stream. Coprocessor instructions differ from i386 instructions in that they always start with the bit sequence 11011 (= 27) and are identified as ESC instructions, whereas i386 instructions and prefixes start with a bit sequence other than 11011. In the i386, many individual general-purpose registers (EAX, EBX, etc.) as well as segment registers (CS, DS and SS) are available for access. The i387, though, has a register stack containing eight 80-bit registers (R0 to R7), as well as various control and status registers (Figure 6.6). Each of the eight data registers is subdivided into three bit groups corresponding to the temporary real format. The eight data registers are organized as a stack and not as individual registers. The 3-bit field TOP in the status word (Figure 6.7) indicates the current top register. TOP has similar qualities and tasks to those of the i386 stack pointer ESP. i387 instructions such as FLD (floating load and push) and FSTP (floating store and pop), which are similar to the i386 instructions PUSH and POP, reduce TOP by 1 and place a value in the respective register, or take the value off the applicable stack register and increase TOP by 1. As with the i386, the stack grows downwards to registers with smaller numbers. Most coprocessor instructions implicitly address the top register in the stack, that is, the register whose number is stored in the TOP field of the status register. With many i387 instructions, you can also specify a register explicitly. Note that the explicitly specified register is not absolute but relative to TOP.

Figure 6.6: i387 internal registers. Register stack: eight 80-bit registers, each divided into sign S (bit 79), exponent (bits 78-64) and mantissa (bits 63-0). Control and status registers: control, status and tag.

Status word (Figure 6.7):
B: busy bit (1 = NU busy, 0 = NU not busy)
C3-C0: condition code
TOP: stack pointer (top of stack); 000 = register 0 is TOP, 001 = register 1 is TOP, ..., 111 = register 7 is TOP
ES: error status (1 = unmasked exception occurred, 0 = no exception)
SP: stack flag (0 = invalid operation other than register stack overflow/underflow, 1 = register stack overflow/underflow)
PE: precision, UE: underflow, OE: overflow, ZE: division by zero, DE: denormalized operand, IE: invalid operation (1 = condition generated exception, 0 = no exception)
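How TOP moves during loads and stores can be modelled with a short Python sketch. The class and method names are hypothetical. The sketch assumes that the register numbers wrap around modulo 8, so that the first push after a reset (TOP = 0) lands in register 7, and it uses None to stand for a register that the tag word marks as empty.

```python
class RegisterStack:
    """Toy model of the eight i387 data registers R0 to R7."""

    def __init__(self):
        self.regs = [None] * 8   # None = register is empty
        self.top = 0             # TOP after a reset

    def push(self, value):
        """Like FLD: reduce TOP by 1, then store into the new top register."""
        new_top = (self.top - 1) % 8
        if self.regs[new_top] is not None:
            raise OverflowError("register stack overflow")
        self.top = new_top
        self.regs[new_top] = value

    def pop(self):
        """Like FSTP: take the value off the top register, then increase TOP by 1."""
        value = self.regs[self.top]
        if value is None:
            raise IndexError("register stack underflow")
        self.regs[self.top] = None
        self.top = (self.top + 1) % 8
        return value

stack = RegisterStack()
stack.push(1.5)
stack.push(2.5)
print(stack.top)     # 6
print(stack.pop())   # 2.5
print(stack.top)     # 7
```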
Calculating Semiconductor - The i387 Mathematical Coprocessor
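Example: FLD st(3) addresses the third register counted from the top register, but which of the registers R0 to R7 is actually accessed depends upon TOP. Because the stack grows downwards, st(3) lies three register numbers above TOP, and the register numbers wrap around from 7 to 0. If, for example, TOP is equal to 2, then register 5 is accessed; if TOP is equal to 5, register 0 is accessed. If the addressed register is empty, FLD st(3) is an invalid operation and the i387 reports an error.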
Within the i387, the data transfer between registers R0 to R7 is very fast because the coprocessor has an 84-bit data path and no format conversion is necessary within the i387 itself. Information relating to the current condition of the i387 is held in the status word. The coprocessor can write the status word to memory using the instructions FSTSW/FNSTSW (store status word). The i386 can then examine the status word to determine, for example, the cause of an exception. The i387 B-bit is retained only for compatibility with the 8087, and is always equal to the ES-bit. B therefore gives no information about the state of the numeric unit or of the BUSY pin. The i387 error status is indicated by the new ES-bit. If ES is set, then an unmasked exception has occurred, the cause of which is given by bits SP to IE. Bit SP is used to differentiate between invalid operations caused by register stack underflow or overflow and invalid operations with other causes. If SP is set, and an underflow or overflow of the register stack has therefore occurred, bit C1 differentiates between an overflow (C1 = 1) and an underflow (C1 = 0). For an interpretation of the condition codes C3-C0 of a comparison or similar operation, refer to Table 6.2. These are similar to the flags of the i386. Note that in this relationship, the i387 [...] than the 8086/87.

Instruction type   C3  C2  C1  C0   Meaning
compare, test      0   0   x   0    TOP > operand (instruction FTST)
                   0   0   x   1    TOP < operand (instruction FTST)
                   1   0   x   0    TOP = operand (instruction FTST)
                   1   1   x   1    TOP cannot be compared
investigate        0   0   0   0    valid, positive, denormalized
                   0   0   0   1    invalid, positive, exponent=0 (+NAN)
                   0   0   1   0    valid, negative, denormalized
                   0   0   1   1    invalid, negative, exponent=0 (-NAN)
                   0   1   0   0    valid, positive, normalized
                   0   1   0   1    infinite, positive (+∞)
                   0   1   1   0    valid, negative, normalized
                   0   1   1   1    infinite, negative (-∞)
                   1   0   0   0    zero, positive (+0)
                   1   0   0   1    not used
                   1   0   1   0    zero, negative (-0)
                   1   0   1   1    not used
                   1   1   0   0    invalid, positive, exponent=0 (+ denormalized)
                   1   1   0   1    not used
                   1   1   1   0    invalid, negative, exponent=0 (- denormalized)
                   1   1   1   1    not used

Table 6.2: i387 condition codes
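Once the status word has been stored with FSTSW/FNSTSW, the i386 side only has to pick the fields apart. The following Python sketch assumes the standard 80x87 status word layout (exception flags in bits 0-5, SP in bit 6, ES in bit 7, C0-C2 in bits 8-10, TOP in bits 11-13, C3 in bit 14, B in bit 15); the function names are hypothetical.

```python
def decode_status_word(sw):
    """Split a 16-bit status word into its fields (assumed 80x87 layout)."""
    low_flags = ["IE", "DE", "ZE", "OE", "UE", "PE", "SP", "ES"]
    fields = {name: (sw >> bit) & 1 for bit, name in enumerate(low_flags)}
    fields.update(
        C0=(sw >> 8) & 1, C1=(sw >> 9) & 1, C2=(sw >> 10) & 1,
        TOP=(sw >> 11) & 7, C3=(sw >> 14) & 1, B=(sw >> 15) & 1,
    )
    return fields

def compare_result(sw):
    """Interpret C3, C2 and C0 after a compare or test instruction."""
    f = decode_status_word(sw)
    meanings = {
        (0, 0, 0): "TOP > operand",
        (0, 0, 1): "TOP < operand",
        (1, 0, 0): "TOP = operand",
        (1, 1, 1): "TOP cannot be compared",
    }
    return meanings.get((f["C3"], f["C2"], f["C0"]), "undefined")

print(compare_result(0x4000))              # TOP = operand
print(decode_status_word(0x3800)["TOP"])   # 7
```

A status word with ES, SP and IE set and C1 cleared, for example 00c1h, would indicate a register stack underflow.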
6.10 i387 Reset The i387 performs an internal initialization when it is reset. This occurs as the result of a transition of the signal at the RESETIN pin from a low to a high level, and subsequently maintaining the signal at a high level for at least 40 NUMCLK2 cycles. The i387 requires 50 NUMCLK2 cycles for the initialization, during which it cannot accept any external instructions. The instruction FNINIT (coprocessor initialization) also puts the i387 internally back into a clearly defined start condition. Table 6.4 shows the different register values after a reset or the execution of an FNINIT instruction.

Register       Value      Meaning
control word   37fh       all exceptions are masked, precision 64 bits, rounding to nearest value
status word    xy00h 1)   all exceptions cleared, TOP=000b 2), condition code undefined, B=0
tag word       ffffh      all registers are empty
1) x=0 or x=4, 0≤y≤7
2) TOP points to register 7 (111b) after the first PUSH instruction; the stack grows downwards.

Table 6.4: i387 register contents after a Reset/FNINIT
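The control word value 37fh can be checked against the stated meaning by decoding its fields. The following Python sketch assumes the standard 80x87 control word layout (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11); the function name is hypothetical.

```python
def decode_control_word(cw):
    """Decode exception masks, precision and rounding (assumed 80x87 layout)."""
    precision = {0b00: 24, 0b10: 53, 0b11: 64}   # mantissa bits; 0b01 is reserved
    rounding = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "truncate"}
    return {
        "all_exceptions_masked": cw & 0x3F == 0x3F,
        "precision_bits": precision.get((cw >> 8) & 3),
        "rounding": rounding[(cw >> 10) & 3],
    }

print(decode_control_word(0x37F))
```

For 37fh the sketch reports that all exceptions are masked, a precision of 64 bits and rounding to the nearest value, in agreement with Table 6.4.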
7 i386 Processor Derivatives and Clones The above discussion dealt with the [...] only refers to the logical and functional structure of the processors, not the actual internal electronic structures. These are (with some limitations as far as microcode is concerned), at the circuit and manufacturing level, entirely new developments.
7.1 Cutting Down - The SX Variants of the Processors Shortly after introducing the i386 and i387 32-bit processors, Intel developed two "cut-down" variants of these chips: the i386SX and the accompanying i387SX coprocessor. Internally they are identical to their big brothers, with the exception of an altered bus unit. Thus the i386SX can run in real, protected and virtual 8086 modes; it comprises the same control, debug and test registers, and carries out demand paging if the paging unit is active. Internally, it has the same 32-bit registers as the i386, and can thus manage a virtual address space of 64 Tbytes per task, and may also execute 32-bit instructions. Moreover, the prefetch queue with its 16 bytes is the same size as in the i386. The i387SX recognizes the same instructions, has the same registers, and exchanges data with the CPU in the same way as the i387. Table 7.1 lists the main differences, which exclusively refer to the address space and data bus.
                         i386/i387      i386SX/i387SX
Address bus              32 bits        24 bits
Physical address space   4 Gbytes       16 Mbytes
I/O address space        64 kbytes      64 kbytes
Coprocessor port         800000fxh      8000fxh
Data bus                 32 bits        16 bits
Data transfer rate *)    80 Mbytes/s    25 Mbytes/s
Max clock speed          40 MHz         25 MHz
You can see that the "normal" processor variants are far more powerful. The i386 is currently available with clock frequencies up to 40 MHz, compared to 25 MHz for the i386SX. Together with a data bus that is twice as wide, the fastest i386 is therefore able to transfer more than three times as much data as the fastest i386SX. While the physical address space of 4 Gbytes seems as if it would be enough for decades, there are already PCs on the market that have a main memory of 16 Mbytes, that is, the maximum for the i386SX.
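The transfer rates in Table 7.1 follow from the clock frequency and the bus width. The following Python sketch assumes that the shortest bus cycle takes two processor clock cycles and that one Mbyte stands for 10^6 bytes; both are assumptions made for this illustration, and the function name is hypothetical.

```python
def peak_transfer_rate(clock_mhz, bus_bits, clocks_per_bus_cycle=2):
    """Peak data transfer rate in Mbytes/s for back-to-back bus cycles."""
    bytes_per_cycle = bus_bits // 8
    return clock_mhz * bytes_per_cycle / clocks_per_bus_cycle

i386_rate = peak_transfer_rate(40, 32)     # fastest i386
i386sx_rate = peak_transfer_rate(25, 16)   # fastest i386SX

print(i386_rate, i386sx_rate, i386_rate / i386sx_rate)   # 80.0 25.0 3.2
```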
PCs, and especially notebooks, with an i386SX base have been developed successfully because all the innovative features of paging and the virtual 8086 mode are available to the user, but the system hardware, because of the smaller bus width, is simpler and therefore cheaper. Only a 16-bit data bus and a 24-bit address bus are needed on the motherboard, as was already the case for a normal 80286 AT. The i386SX comes in a plastic quad flatpack package (PQFP) with 100 pins, and the i387SX in a PLCC package with 68 pins. Figure 7.1 shows the pin assignments of these two SX chips.
Figure 7.1: i386SX and i387SX terminals. With the SX variants (that is, the i386SX and i387SX) the external data bus is only 16 bits wide, although the data is processed internally with 32 bits, as is the case for the i386/i387. Because of the address bus reduction to 24 address lines, the i386SX/i387SX physical address space comprises only 16 Mbytes, compared to 4 Gbytes.

It is immediately noticeable that these processors have fewer connections when compared to their counterparts. In the following, the altered pins and signals that are not self-explanatory are briefly discussed.
BHE, BLE (O) Pins 19, 17
These bus high enable and bus low enable signals indicate which part (low-order and/or high-order) of the data bus supplies or accepts valid data. On an i386 this job is carried out by the pins BE0-BE3.
FLT (I) Pin 28
An active signal at this float pin (that is, a low-level signal) forces all i386SX pins that are able to output signals into an inactive state. This signal is largely applied for test purposes.
As for the i386, the i386SX can also carry out an internal self-test. It is issued in the same way and lasts for about $2^{20}$ clock cycles, that is, about 30 ms with a 20 MHz i386SX. After the self-test the DX register contains a signature. The eight high-order bits have the value 23h. The "3" characterizes the i386 family and the "2" indicates that this is the second member of the family, that is, the i386SX. The low-order byte indicates the i386SX version number. The i386SX and i387SX cooperate in the same way as the i386 and i387, thus the i386SX accesses the coprocessor with reserved I/O ports located outside the normal I/O address space. But because the i386SX has only 24 address lines instead of 32, the address line A23 instead of A31 is now activated. The I/O addresses for communication between CPU and coprocessor therefore comprise only three bytes (Table 7.2). The transfer protocol is the same as for the i386/i387 combination, but the data transfer for 32-bit operands lasts longer as more bus cycles must be carried out because of the 16-bit data bus.

i386SX I/O port   i387SX register
8000f8h           opcode register
8000fch           operand register
8000feh *)        operand register
*) used during the second bus cycle in a 32-bit operand transfer.

Table 7.2: Coprocessor port addresses
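The reserved coprocessor ports can be derived from the address line that selects the coprocessor. The following Python sketch is only an illustration: the function name is hypothetical, the low address bits f8h are taken from the opcode register entry in Table 7.2, and applying the same low bits to the i386 with A31 is an assumption based on the 800000fxh entry in Table 7.1.

```python
IO_SPACE = 64 * 1024   # size of the normal I/O address space in bytes

def coprocessor_port(select_line, low_bits):
    """Port address with the selecting address line (A31 or A23) set."""
    return (1 << select_line) | low_bits

i386_opcode_port = coprocessor_port(31, 0xF8)     # A31 selects the i387
i386sx_opcode_port = coprocessor_port(23, 0xF8)   # A23 selects the i387SX

print(hex(i386_opcode_port), hex(i386sx_opcode_port))   # 0x800000f8 0x8000f8
print(i386sx_opcode_port >= IO_SPACE)                   # True
```

Both addresses lie far beyond the 64 kbytes that IN and OUT instructions can reach, which is why ordinary programs cannot address these ports.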
7.2 386SL Chip Set - A Two-chip AT The steadily increasing integration density of transistors and other electronic elements does not only apply to RAM chips or the CPU (the number of integrated transistors rose from 30 000 on an 8086 to more than three million (!) for the P5 or Pentium). Another trend is that more and more single elements (for example, DMA controller, interrupt controller, timer, etc.) are integrated on one or two chips. PCs are therefore not only getting more compact, but the power consumption has also decreased significantly; an obvious advantage for notebooks. The provisional summit of this development is Intel's 386SL chip set, which consists of the 386SL CPU chip and the 82360SL I/O subsystem, and integrates nearly all CPU, support and peripheral chips on these two chips. Figure 7.2 schematically shows the internal structure of the 386SL and 82360SL.

The 386SL integrates an improved i386 CPU core, a cache controller and tag RAM, an ISA bus controller and a memory controller on a single chip. The CPU core is implemented in a static design, and may be clocked down to 0 MHz to save power. Presently, the 386SL is available in two versions for clock frequencies of 20 MHz and 25 MHz. The cache interface may be directly connected to a 16, 32 or 64 kbyte cache. The cache organization is programmable as direct mapped or as 2-way or 4-way set associative. The 20 MHz version may also be delivered without a cache controller and tag RAM. The memory controller accesses the main memory via a 24-bit address bus and a 16-bit data bus. To its environment, the 386SL CPU therefore behaves like an i386SX. Main memory can be organized with one-, two- or four-way interleaving, and the number of wait states is programmable in the 386SL. Additionally, the memory controller incorporates an EMS logic, which implements LIM-EMS 4.0. The controller may also carry out ROM BIOS shadowing. An external shadow logic is not required. The ISA bus controller generates all necessary signals for the ISA bus. In the 386SL sufficiently powerful driver circuits are integrated for this purpose so that no external buffers and drivers are necessary. This significantly simplifies the design of a PC. Additionally, a peripheral interface bus (PI bus) is implemented. The 386SL may exchange data with fast peripherals (for example, the integrated 82370SL VGA controller) or flash memories by means of this bus. For controlling the PI bus, the pins (and signals) PCMD, PM/IO, PRDY, PSTART and PW/R are implemented.

For a lower power consumption the 386SL can operate in power management mode in which, for example, the 386SL clock frequency can be decreased, or the clock can even be switched off. Power management is carried out by means of the non-maskable interrupt SMI, which has a higher priority. The handler of this interrupt is located in its own system power management RAM, which is separate from the ordinary main memory. The 386SL may access this memory only by means of the SMRAMCS signal. Thus, this concept is similar to that which AMD and Chips & Technologies follow with their low-power versions of the i386 (see Section 7.3).

On the 82360SL a real-time clock, a CMOS RAM, two 8254 timers with three channels each, two 8237A DMA controllers with improved page register 74LS612, a bidirectional parallel interface, two 8259A interrupt controllers, two serial interfaces (a UART 16450 and a support interface for a floppy controller), a keyboard controller and an IDE interface are integrated. Of significance is the additional power management control. This always activates only the currently driven unit, and thus reduces the power consumption significantly. Instead of the internal real-time clock, an external clock may also be used.

Figure 7.2: Internal structure of the 386SL and 82360SL
The 386SL chip set is already prepared for the integration of a VGA, and the 386SL outputs a special interface signal VGACS when it accesses the address space where the VGA is usually located. With the 386SL and 82360SL chips, some DRAMs and a highly integrated graphics controller (for example, the 82370SL VGA controller) you can build a complete AT. Thus the PW (personal workstation) on your wrist would seem to be only a short time away.
7.3 Processor Confusion by i386 Clones The passion for copying that has hit the PC market, and which is the reason for the enormous drop in prices over the past ten years, does not exclude the heart of a PC. Only a few years ago, Intel more or less held the CPU monopoly, but today enormous competition is taking place (up to the Pentium level) even there. It is not only coprocessors that are now manufactured by competitors, but also the CPUs. Users are happy with this, as prices seem to be in endless decline, and the CPU clones are not only equal to the Intel models, but often do better in terms of performance and power consumption. At this time there are two main microelectronics firms manufacturing Intel-compatible processors: AMD (Advanced Micro Devices) and Cyrix. The most important i386 clones are briefly discussed below.
7.3.1 AMD Processors AMD is currently the most important manufacturer in the i386 market segment other than Intel. AMD processors are not only cheaper, but in most cases also faster and, especially, far more power efficient than Intel’s models. These chip types are particularly suitable for notebooks. Currently, AMD offers three different DX models with a complete 32-bit data bus, as well as three SX models with a 16-bit data and a 24-bit address bus. Am386DXfAm386SX The Am386DX is an i386DXcompatible processor available for clock frequencies from 20 MHZ up to 40 MHz, that is, its clock rate and therefore the performance can be up to 20% higher than the original i386. The Am386SX is the well-known cut-down version, and has only a 16-bit data and a 24.bit address bus. Both processor types are strictly orientated to the Intel models, and have no significant differences in comparison. They come in a 132-pin PGA or a 132-terminal PQFP (Am386DX) and a loo-terminal I’QFP (Am386SX), respectively. Am386DXLIAm386SXL The i386DX and i386SX-compatible processors, respectively, are implemented with a special power-saving design (the *L,, in the processor name means low power). The Am386DXL/SXL consumes about 40% less power than the comparable Intel models. The DXL type may be clocked with between 20 and 40 MHz, and the SXL model with 25 MHz at most. The main feature of the Am386DXL/Am386SXL is the static design of its circuits. The clock rate may thus be decreased to 0 MHz, that is, the clock signal can effectively be switched off. The
i386 Processor Derivatives and Clones
processor does not lose its register contents in this case. With a dynamic circuit design, on the other hand, the clock frequency may not fall below a minimum value without causing a processor malfunction. If the external clock signal is switched off, then, for example, the DXL processor draws a current of only about 0.02 mA compared to 275 mA at 33 MHz. This means a reduction in power consumption by a factor of about 10 000(!). Battery-operated notebooks benefit greatly from this. Using the static processor design, a standby mode can readily be implemented. The BIOS need not save the processor registers, but can just switch off the clock signal after the lapse of a (usually programmable) time period within which no key press or other action has taken place. If, for example, you then press a key to reactivate the PC, the standby logic releases the clock signal. Then the CPU again operates without the need for the BIOS to restore the registers' contents. This simplifies the implementation of a power-saving standby mode significantly. The Am386DXL comes in a 132-pin PGA or a 132-terminal PQFP, the Am386SXL in a 100-terminal PQFP.
Am386DXLV/Am386SXLV
The Am386DXLV/Am386SXLV is directed uncompromisingly towards low power consumption. It operates with a supply voltage of only 3.3 V compared to the 5 V of an Am386DX/Am386DXL (hence the LV, for low voltage). The power consumption of CMOS chips is proportional to the square of the supply voltage. By decreasing the supply voltage from 5 V to 3.3 V, more than 50% of the power is saved. But at 3.3 V the chip may only be operated up to 25 MHz; to use the maximum 33 MHz a supply voltage of 5 V is necessary. The Am386DXLV/SXLV, too, is implemented in an entirely static design. Thus the external processor clock may be switched off and the Am386DXLV, for example, needs only 0.01 mA compared to 135 mA at 25 MHz. The Am386DXLV comes in a 132-terminal PQFP, the Am386SXLV in a 100-terminal PQFP. The terminals (except those for system management mode, etc.) provide and accept, respectively, the same signals as the Intel models. The following lists the additional signals and the assigned terminals.
FLT (I)
Pin 54 (Am386DXLV), 28 (Am386SXLV)
If a low-level signal is applied to the float pin, the Am386DXLV/SXLV floats all its bidirectional and output terminals.

Pin 58 (Am386DXLV), 29 (Am386SXLV)
If an active low-level signal is supplied to this I/O instruction break input, the Am386DXLV/SXLV suspends instruction execution until it receives an active signal at its READY input. Upon this it executes the next instruction. This function is mostly used in connection with system management mode. The Am386DXLV/SXLV can give a device accessed through an I/O port and currently in a power-down mode (such as an interface) the opportunity to reactivate itself. This reactivation process may take many more clock cycles than a usual I/O access with wait states.
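The power figures quoted for the low-power AMD parts can be checked with a few lines of arithmetic. The following is only a sketch in Python: it assumes that power scales purely with the square of the supply voltage at an unchanged clock rate, and the function name is made up for this illustration.

```python
def relative_cmos_power(v_new: float, v_old: float) -> float:
    """Return P_new / P_old, assuming CMOS power is proportional to V^2
    and that the clock frequency stays the same."""
    return (v_new / v_old) ** 2


# Supply voltage lowered from 5 V to 3.3 V (Am386DXLV/SXLV).
ratio = relative_cmos_power(3.3, 5.0)
print(f"power at 3.3 V relative to 5 V: {ratio:.2f}")
print(f"power saved: {(1 - ratio) * 100:.0f}%")

# Static design: supply current with the clock running vs. clock switched off.
dxl_factor = 275 / 0.02    # Am386DXL, 33 MHz vs. stopped clock
dxlv_factor = 135 / 0.01   # Am386DXLV, 25 MHz vs. stopped clock
print(f"Am386DXL reduction factor:  {dxl_factor:.0f}")
print(f"Am386DXLV reduction factor: {dxlv_factor:.0f}")
```

Under these assumptions the quadratic rule gives a ratio of about 0.44, so the saving is somewhat more than half, and both stopped-clock figures correspond to a reduction by a factor of more than 10 000.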
- The SLC model of the Cyrix 486 has, like the i386SX, only a 16-bit data bus and a 24-bit address bus. That is very narrow for a 486 chip; thus, the 486SLC is, despite the "486" in its name, more suited to notebooks.
- The burst mode for quickly filling the on-chip cache is not supported by the Cyrix 486DLC/SLC; therefore, cache line fills are significantly slower.
A further disadvantage of the 486DLC/SLC, namely the lack of inquiry cycles for the on-chip cache to establish cache consistency spanning several 486DLC/SLC CPUs, seems to be of a more theoretical nature. For such a multiprocessor system, which requires considerable safeguards to be carried out by the operating system, an engineer would surely use the presently most advanced high-end processor, such as the Pentium or the PowerPC. All the overhead of distributing the tasks over several 486DLC/SLC CPUs can be saved when only one Pentium is installed (which runs two pipelines at up to 99 MHz). The properties listed above, which argue against the term 486, are compensated for by some pure 486 characteristics, and even some improvements compared to the original Intel 486. These are:
- All instructions are executed in a pipeline, as is the case for the Intel …
Register   Call value   Return value
EAX        01h          identification value 1)
EBX                     reserved (=0)
ECX                     reserved (=0)
EDX                     feature flags 2)

1) bits 31..14: reserved (=00h)
   bits 13..12: type (00b=primary Pentium starting with 75 MHz, 01b=Pentium OverDrive, 10b=dual (secondary) Pentium starting with 75 MHz, 11b=reserved)
   bits 11..8:  processor family (05h for Pentium)
   bits 7..4:   model (01h for 60/66 MHz Pentium, 02h starting with 75 MHz Pentium)
   bits 3..0:   revision (stepping)
2) bits 31..9:  reserved (=00h)
   bit 8:       CMPXCHG8B (1=implemented, 0=not implemented)
   bit 7:       machine check exception (1=implemented, 0=not implemented)
   bit 6:       reserved
   bit 5:       model-specific registers (1=according to Pentium CPU, 0=others)
   bit 4:       time stamp counter (1=implemented, 0=not implemented)
   bit 3:       reserved
   bit 2:       I/O breakpoints (1=implemented, 0=not implemented)
   bit 1:       reserved
   bit 0:       on-chip FPU (1=yes, 0=no)

Table 11.7: Call and return values of CPUID
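The bit fields of Table 11.7 can be unpacked with simple shifts and masks. The sketch below is written in Python purely for illustration; it does not execute CPUID itself but decodes register values that are assumed to have been obtained elsewhere. The function names and the sample values are hypothetical.

```python
# Feature flag bits of EDX as listed in Table 11.7 (reserved bits omitted).
FEATURE_BITS = {
    0: "on-chip FPU",
    2: "I/O breakpoints",
    4: "time stamp counter",
    5: "model-specific registers",
    7: "machine check exception",
    8: "CMPXCHG8B",
}


def decode_identification(eax: int) -> dict:
    """Split the identification value returned in EAX into its fields."""
    return {
        "type": (eax >> 12) & 0x3,     # bits 13..12
        "family": (eax >> 8) & 0xF,    # bits 11..8
        "model": (eax >> 4) & 0xF,     # bits 7..4
        "stepping": eax & 0xF,         # bits 3..0
    }


def decode_feature_flags(edx: int) -> list:
    """Return the names of all features whose bit is set in EDX."""
    return [name for bit, name in sorted(FEATURE_BITS.items()) if (edx >> bit) & 1]


# Made-up sample values: family 05h, model 02h, stepping 5, all listed features set.
print(decode_identification(0x0525))
print(decode_feature_flags(0x01BF))
```

With the sample identification value the family field decodes to 05h and the model field to 02h, which Table 11.7 assigns to a Pentium starting with 75 MHz.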
Exception 18 - machine check exception: this exception indicates a hardware error of the Pentium itself. Details are given in Section 11.7.6.
11.4 The Pentium Bus
The three pipelines of the Pentium achieve an instruction throughput which would completely overload a simple memory bus. Thus, the Pentium bus is, without compromise, dedicated to a quick second-level cache. Frequently required instructions and data are held ready in the two on-chip caches. For the connection to the second-level cache (L2-cache), the Pentium's data bus has been widened to 64 bits, so that the on-chip caches can be reloaded and written back with sufficient speed.
Despite this, the Pentium can also address individual bytes, words, or double words through the byte enable signals BE7-BE0. Every bus cycle addresses memory through A31-A3 at quad word limits, thus in multiples of 8 (0, 8, 16, etc.). Accesses to memory objects which span such a boundary (so-called misaligned accesses) are split into two consecutive accesses by the Pentium.
The Pentium tries to carry out memory accesses as cache line fills or write-backs as far as possible. Only accesses to the I/O address space and non-cachable areas in the memory address
Chapter 11
space (such as register addresses for memory-mapped I/O) are carried out as single transfer cycles. The single transfer cycles described in Section 11.4.1 are more or less the exception. The burst mode, already known from the i486, has been extended to the writing of data (the i486 allows only burst read accesses). Together with the improved address pipelining, the maximum data transfer rate has been increased to 528 million bytes/s (at 66 MHz). Unlike memory accesses, accesses to the I/O address area have not been widened to 64 bits. The maximum width here is only 32 bits, as in the i386 or i486. The reason for this is that I/O accesses, in principle, do not pass through the on-chip data cache; instead, they are directly "switched through" from a 32-bit register to the bus. Of course, the Pentium can also address 8- or 16-bit ports. The I/O subsystem must decode the port address bits A2-A0 from the byte enable signals BE7-BE0.
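How an I/O subsystem might recover A2-A0 and the access width from the byte enables can be sketched as follows. This is an illustrative Python model rather than a description of any real decoder; it assumes that the byte enables are active at low level and that bit n of the argument holds the electrical level on BEn. The function name is hypothetical.

```python
def decode_byte_enables(be_levels: int) -> tuple:
    """Derive (A2-A0 offset, width in bytes) from the levels on BE7..BE0.

    Assumption: bit n of be_levels is the level on BEn, and a 0 (low level)
    means that byte lane n takes part in the transfer.
    """
    active = ~be_levels & 0xFF
    if active == 0:
        raise ValueError("no byte enable is active")
    offset = (active & -active).bit_length() - 1   # lowest active byte lane
    width = bin(active).count("1")                  # number of active lanes
    if active != ((1 << width) - 1) << offset:
        raise ValueError("active byte enables are not contiguous")
    return offset, width


# 16-bit port access on byte lanes 2 and 3: BE3 and BE2 low, all others high.
offset, width = decode_byte_enables(0b11110011)
quad_word_address = 0x0378 & ~0x7        # address on A31-A3 (sample value)
print(hex(quad_word_address | offset), width)
```

The port address is then the quad word address given by A31-A3 combined with the decoded offset; in this example the offset is 2 and the width is two bytes.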
11.4.1 Single Transfer Cycles
The single transfer read and single transfer write cycles are the two simplest memory access cycles of the Pentium. During their execution, data of 8, 16, 32 or 64 bits in size is transferred from memory to the Pentium, or from the Pentium to memory, respectively. Such Pentium bus cycles are not very different from those in the i386 or i486, despite the larger width of the data bus. For a single transfer the Pentium holds CACHE at a high level to indicate that no line fill should be carried out. Single transfer write cycles in the Pentium are also not very different from those in the i386 or i486. The CACHE signal is again inactive. Additionally, the Pentium supplies write data and the required parity bits. In single transfer mode (read and write), a data transfer without wait states requires at least two CLK cycles. With a data bus width of 64 bits, this leads to a maximum data transfer rate of 264 million bytes/s (at 66 MHz). This can be doubled in pipelined burst mode to 528 million bytes/s or 504 Mbytes/s.
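The transfer rates quoted for single transfer and pipelined burst mode follow directly from the bus clock and the bus width. A short Python sketch, assuming a bus clock of exactly 66 MHz and taking 1 Mbyte as 2^20 bytes (the function name is made up for this example):

```python
BUS_CLOCK_HZ = 66_000_000    # assumed bus clock of exactly 66 MHz
BUS_WIDTH_BYTES = 8          # 64-bit data bus


def peak_rate(clocks_per_transfer: int) -> int:
    """Peak data rate in bytes per second for a given number of CLK cycles per transfer."""
    return BUS_CLOCK_HZ * BUS_WIDTH_BYTES // clocks_per_transfer


single = peak_rate(2)   # single transfer: at least two CLK cycles
burst = peak_rate(1)    # pipelined burst: one CLK cycle per transfer

print(f"single transfer: {single / 1e6:.0f} million bytes/s")
print(f"pipelined burst: {burst / 1e6:.0f} million bytes/s")
print(f"pipelined burst: {burst / 2**20:.0f} Mbytes/s")
```

The difference between 528 and 504 is therefore only a matter of units: decimal millions of bytes in one case, binary Mbytes in the other.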
11.4.2 Burst Cycles
For the transfer of larger quantities of data, the Pentium implements a burst mode similar to that of the i486. Unlike the i486, the Pentium burst mode can also be used for write-back cycles. For the write-through cache of the i486 this is not necessary because each write access is switched through to the memory subsystem. But there are other differences, too. In principle, all cachable read cycles and all write-back cycles are carried out in burst mode. With the extension of the data bus to 64 bits, 32 bytes are transferred in a burst cycle with four bus cycles. They are contiguous and are aligned to 32-byte boundaries (this corresponds to a cache line of the two on-chip caches). A burst cycle is started, as in the i486, by a normal memory access which lasts for two clock cycles. Figure 11.16 shows the flow of the most important signals for a burst read cycle.
For a burst read cycle, the CACHE signal from the Pentium and the KEN signal from the memory subsystem also play an important role. The Pentium indicates to the subsystem, through an active CACHE signal with low level, that it wants to transfer the addressed object into the
Borrowing from the RISC World - The Superscalar Pentium
Figure 11.16: Burst read cycle without wait states and pipelining. In burst mode, the bus cycle for a 32-byte address area is reduced from two processor clock cycles to one cycle, starting with the second access. Thus, a cache line of the internal caches can be filled very quickly. The cycle shown is a 2-1-1-1 burst.
on-chip cache. If the KEN signal delivered by the memory subsystem is active, then the Pentium independently and automatically extends the single transfer to a cache line fill to store a complete data block with the addressed object in the on-chip cache.
As already mentioned, a burst cycle is limited to an address area which begins at a 32-byte boundary. With the first output address, therefore, the three other burst addresses are also implicitly known. The subsystem can independently calculate the other three burst addresses without needing to decode further address signals from the Pentium. This is much faster than before, and makes it possible to reduce the bus cycle to one clock cycle. The Pentium only sends the address and BEx signals during the first cycle; they are not changed during the subsequent three bus cycles (the i486, however, addresses the data within the 16-byte group further via the address signals A3-A2 and BE3-BE0).
In the Pentium the address sequence in burst mode is fixed dependent on the first address. This is necessary because the first address given out by the Pentium need not necessarily define a 32-byte boundary; it can lie anywhere in the address space. In the course of the first transfer cycle, KEN determines whether a single or burst transfer should be carried out. But by then the first quad word has already been transferred. The fixed address sequence shown in Table 11.8 is not cyclic, but has been chosen to support 2-way interleaving of DRAM memory. After the decoding of the start address, the external memory subsystem calculates the sequence of the addresses given, and then follows the sequence when memory is addressed. The Pentium receives the data without sending out any more addresses, and then distributes the data delivered by the subsystem according to the quad word entries in the applicable cache line. In the
First address output   Second    Third     Last
by the Pentium         address   address   address
0h                     8h        10h       18h
8h                     0h        18h       10h
10h                    18h       0h        8h
18h                    10h       8h        0h

Table 11.8: Address sequence for burst read cycles
i486, a burst cycle is explicitly initiated by a BLAST signal (which is not available in the Pentium) during the first cycle. KEN alone is not sufficient for a burst cycle. Moreover, a burst transfer for the i486 always starts at a 16-byte boundary. The write-back cache of the Pentium requires that burst mode can also be used for write transfers. Figure 11.17 shows the signal flow for such a write burst transfer.
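The fixed burst order of Table 11.8 can be reproduced with a single XOR: each row equals the first quad word offset XORed with 0h, 8h, 10h and 18h in turn. This is an observation about the table, not a statement about how the memory subsystem is actually built. A minimal Python sketch with a hypothetical function name:

```python
def pentium_burst_sequence(first_address: int) -> list:
    """Return the four quad word addresses of a burst in the order of Table 11.8."""
    line_base = first_address & ~0x1F     # start of the 32-byte cache line
    first_offset = first_address & 0x18   # quad word within the line (A4-A3)
    return [line_base | (first_offset ^ step) for step in (0x00, 0x08, 0x10, 0x18)]


# Print all four rows of the table for a cache line starting at address 0.
for start in (0x00, 0x08, 0x10, 0x18):
    print([f"{address:02x}h" for address in pentium_burst_sequence(start)])
```

Because consecutive addresses in every row alternate in bit A3, two memory banks selected by A3 are addressed in turn, which matches the 2-way interleaving mentioned in the text.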
[Figure 11.17: Write burst transfer. Signals shown: CLK, A31-A3, BE7-BE0, ADS, CACHE, W/R, KEN, BRDY, D63-D0 (write data), DP7-DP0]
Counter Mask: counting behaviour
INV: inversion of counting behaviour (0=no, 1=yes)
EN: enable, event select register 0 only (0=counter disabled, 1=counter enabled)
INT: interrupt on counter overflow (1=enabled, 0=disabled)
PC: pin control (0=signal upon count, 1=signal upon counter overflow)
E: edge trigger (0=disabled, 1=enabled)
OS: operating system flag (0=counting for CPL=0, 1, 2, 3; 1=counting for CPL=0 only)
USR: user flag (0=counting for CPL=0, 1, 2, 3; 1=counting for CPL=1, 2, 3 only)
Unit Mask: modification of counting behaviour
Event Select: event code according to Table 14.3

Figure 14.13: Event select register (model-specific registers 186h and 187h)
If OS is set the Pentium Pro counts an event only if CPL = 0, that is, the processor is running at the highest privilege level and executes system routines. The same applies to the USR bit: the Pentium Pro counts an event only if the processor is running at user level (CPL = 1, 2, 3) and carries out application programs or system-near routines. For some events you may define a unit mask which, for example, for events referring to caches, additionally provides the MESI state which is necessary to generate a counter pulse. Finally, the entry event select defines the event which is to be counted. The event codes are listed in Table 14.3. You can start a counter by writing the control information into the corresponding event select register with WRMSR. Immediately afterwards, the initialized counter starts counting. You can stop the counter by clearing all bits in the event select register, or by clearing the EN bit in the event select register 0. Note that the first possibility stops the counters individually, whereas the second method stops both counters simultaneously.
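The value that would be written to an event select register with WRMSR can be assembled from the individual fields. The following Python sketch only computes that value; it does not access any model-specific register. The bit positions are an assumption taken from Intel's published layout for this processor family (event select in bits 7..0, unit mask in bits 15..8, USR=16, OS=17, E=18, PC=19, INT=20, EN=22, INV=23, counter mask in bits 31..24), since the positions are not spelled out in the text above. The function name and the event code used in the example are purely illustrative.

```python
def make_event_select(event: int, unit_mask: int = 0, usr: bool = False,
                      os: bool = False, edge: bool = False, pc: bool = False,
                      interrupt: bool = False, enable: bool = False,
                      invert: bool = False, counter_mask: int = 0) -> int:
    """Assemble a 32-bit event select value (bit positions are an assumption)."""
    value = event & 0xFF                    # bits 7..0:   event select
    value |= (unit_mask & 0xFF) << 8        # bits 15..8:  unit mask
    value |= int(usr) << 16                 # USR: count at CPL = 1, 2, 3
    value |= int(os) << 17                  # OS:  count at CPL = 0
    value |= int(edge) << 18                # E:   edge trigger
    value |= int(pc) << 19                  # PC:  pin control
    value |= int(interrupt) << 20           # INT: interrupt on overflow
    value |= int(enable) << 22              # EN:  enable (register 0 only)
    value |= int(invert) << 23              # INV: invert counting behaviour
    value |= (counter_mask & 0xFF) << 24    # bits 31..24: counter mask
    return value


# Illustrative event code 79h, counted at all privilege levels, counters enabled.
start_value = make_event_select(0x79, usr=True, os=True, enable=True)
print(f"value for MSR 186h: {start_value:08x}h")

# Clearing EN in event select register 0 stops both counters.
stop_value = start_value & ~(1 << 22)
print(f"value to stop both counters: {stop_value:08x}h")
```

Writing zero to an individual event select register corresponds to the first method of stopping a counter described above; clearing only EN in register 0 corresponds to the second.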
14.4.5 Debug Support by Model-specific Registers
Apart from the usual debug registers DR7-DR0 (see Figure 2.10), the Pentium Pro also supports debugging with its model-specific registers. They allow backward observation of interrupts, exceptions and branches which, for example, may lead to task switches. This enables powerful debugging if some errors occur only in the context of task switches or operating system functions. The functions of the debug registers DR0-DR7 have already been described in Section 2.3.2, and the extensions of the Pentium which also hold for the Pentium Pro are listed in Section 11.3.1. Note that the Pentium Pro treats registers DR4 and DR5 as reserved registers and locks
Pure 32-bit Technology - The Pentium Pro
them against accesses if debug extensions are active (bit CR4.DE). Without active extensions these registers are mapped onto registers DR6 and DR7, that is, there is an aliasing, and the accesses to register pairs DR4/DR6 and DR5/DR7 have the same effects. The five new model-specific registers to support the debug function are listed in Table 14.4.
Model-specific register   MSR
Debug control register    1d9h
Branch source             1dbh
Branch target             1dch
Interrupt source          1ddh
Interrupt target          1deh

Table 14.4: Model-specific registers to support the debug function
With the debug control register (index 1d9h) you can configure the functions of the model-specific debug registers. Figure 14.14 shows the structure of this register.
TR: Trace Special Cycle (1=enabled, 0=disabled)
PB3..PB0: Performance Monitoring/Breakpoint Pins (1=PBx pins indicate breakpoints, 0=PBx pins indicate events)
BTF: Branch Trace Flag (1=single-step execution for branches, 0=single-step execution for instructions)
LBR: Last Branch/Interrupt/Exception (1=recording, 0=no recording)
If the TR bit is set, trace special cycles are enabled. This means that the Pentium Pro outputs source and target address for a branch, exception or interrupt onto the bus. External debug logic can use these addresses to keep track of program execution. Note that the entries in other model-specific debug registers (indices 1dbh-1deh) are invalid. The four bits PB3-PB0 control the behaviour of the performance monitoring/breakpoint pins PB3-PB0. A set PBx bit configures the corresponding pin as a breakpoint pin. The Pentium Pro then provides a pulse at this terminal upon a breakpoint match. If PBx is cleared then the assigned terminal indicates performance monitoring events. If you set bit BTF, the Pentium Pro modifies the meaning of the T flag in the EFlag register. With BTF set, a set T bit stops program execution after every branch and issues an exception 1 (single-step execution). For a cleared BTF and a set T bit, program execution is stopped after every instruction. Finally, a set LBR bit determines that the Pentium Pro stores the source and target addresses for all branches and exceptions/interrupts in the four corresponding model-specific registers. The debugger can read these addresses with an RDMSR instruction and keep track of branches and interrupts/exceptions.
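The flags of the debug control register can be combined into one value before it is written to MSR 1d9h. The Python sketch below only composes that value. The bit positions (LBR=0, BTF=1, PB0..PB3=2..5, TR=6) are an assumption based on Intel's published layout for this register, as the figure legend above names the fields without giving positions. The function name is hypothetical.

```python
def make_debug_control(lbr: bool = False, btf: bool = False,
                       pb: tuple = (False, False, False, False),
                       tr: bool = False) -> int:
    """Compose a debug control register value (bit positions are an assumption).

    pb holds the settings for PB0, PB1, PB2 and PB3 in that order;
    True configures the pin as a breakpoint pin.
    """
    value = int(lbr)                  # bit 0: record last branch/interrupt/exception
    value |= int(btf) << 1            # bit 1: single-step on branches
    for index, as_breakpoint in enumerate(pb):
        value |= int(as_breakpoint) << (2 + index)   # bits 2..5: PB0..PB3
    value |= int(tr) << 6             # bit 6: trace special cycles
    return value


# Record branches and single-step from branch to branch.
print(f"{make_debug_control(lbr=True, btf=True):02x}h")
# Emit trace special cycles on the bus instead.
print(f"{make_debug_control(tr=True):02x}h")
```

A debugger would typically choose one of the two variants shown, since according to the text the recorded entries in registers 1dbh-1deh are not valid while trace special cycles are enabled.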
Chapter 14
14.5 Reset and Power-on Configuration
As with the Pentium, the Pentium Pro can also carry out an initialization upon INIT (pin 01) in addition to an ordinary reset through the RESET signal. The Pentium Pro returns to real mode. Moreover, the segment and offset registers are initialized in a well-known manner, and the TLBs as well as the BTB are invalidated. Unlike a reset, here the internal caches, write buffers, model-specific registers and floating-point registers are not reset, but retain their values. The initialization values for a reset and an init are shown in Table 14.5.
Register               Reset           Init

ALU/integer registers
EAX, EBX, ECX          00000000h       00000000h
EDX                    identific. 1)   identific. 1)
EBP, ESP, ESI, EDI     00000000h       00000000h
EIP                    0000fff0h       0000fff0h
CS                     f000h 2)        f000h 2)
DS, ES, FS, GS, SS     0000h 3)        0000h 3)
EFlag                  00000002h       00000002h
CR0                    60000010h       4)
CR2, CR3, CR4          00000000h       00000000h
LDTR, TR               00000000h 5)    00000000h 5)
GDTR, IDTR             00000000h 6)    00000000h 6)
DR0-DR3                00000000h       00000000h
DR6                    ffff0ff0h       ffff0ff0h
DR7                    00000400h       00000400h
All others             uuuuuuuuh       uuuuuuuuh

FP registers
Control word           0040h           vvvvh
Status word            0000h           vvvvh
Tag word               5555h           vvvvh
FIP, FEA, FOP          00000000h       vvvvvvvvh
FCS, FDS               0000h           vvvvh
FSTACK                 0000000000h     vvvvvvvvvvh

Caches
Cache                  uuuuuuuuuuu
TLBs                   uuuuuuuuuuu
BTB, SDC               uuuuuuuuuu

u undefined
v unchanged
1) CPU identification 00000600h + model
2) base address = ffff0000h, limit = ffffh (segment descriptor cache register), access rights = 00000093h
3) base address = 00000000h, limit = ffffh (segment descriptor cache register), access rights = 00000093h
4) CD, NW unchanged, bit 4 = 1, all others = 0
5) selector = 0000h, base address = 00000000h, limit = ffffh, access rights = 00000082h
6) base address = 00000000h, limit = ffffh
The more complete Pentium Pro reset is issued by an active signal at the RESET terminal. If the signal returns to a high level afterwards, then the Pentium Pro strobes various signals to determine certain configurations. Other configurations can be defined by the configuration registers. Table 14.6 gives a brief summary of the possible options.

Option (configured by hardware)                   Signal
Tristate test mode                                FLUSH active
Built-in self test BIST                           INIT active
Detection of AERR                                 A8
Detection of BERR                                 A9
Detection of BINIT                                A10
Depth of in-order queue (1 or 8)                  A7 1)
Reset vector after power-on                       A6 2)
Functional redundancy checking                    A5
APIC cluster ID                                   A12, A11 3)
ID for bus arbitration                            BR0-BR3, A5 4)
Clock frequency ratio                             LINT1, LINT0, IGNNE, A20M 5)

Option (configured by software)                   Standard
Data bus error checking                           disabled
Parity for response signals RS2..RS0              disabled
Behaviour of AERR                                 disabled
Behaviour of BERR for initiator bus error         disabled
Behaviour of BERR for target bus error            disabled
Behaviour of BERR for internal initiator error    disabled
Behaviour of BINIT                                disabled
APIC mode                                         enabled

1) A7=0: 1, A7=1: 8
2) A6=0: 0ffff0h, A6=1: 0fffffff0h
3) 00b: APIC ID=3, 01b: APIC ID=2, 10b: APIC ID=1, 11b: APIC ID=0
4) … 11011: ID=2, 10111: ID=3, 10001: ID=0, 11100: ID=0, 11010: ID=2, 10110: ID=2
5) CPU kernel/BCLK: 0000b=2, 0001b=4, 0010b=3, 0100b=5/2, 0110b=7/2, 1111b=2, else=reserved
14.6 The Pentium Pro Bus
By integrating the L2-cache into the Pentium Pro package and supporting up to four Pentium Pros operating in parallel, the bus has changed significantly compared to that of the Pentium or its predecessors. The most important change is the implementation of an arbitration scheme which makes it resemble a peripheral bus (such as PCI) more than a CPU bus. Note also that the Pentium Pro has two more independent buses, namely for the boundary scan test and the APIC bus.
14.6.1 Bus Phases
Because the various processes on the Pentium Pro bus are much more decoupled than in its predecessors (the reasons being the support of multiprocessing and the dynamic execution of
the µ-ops) these processes are now called transactions. A transaction comprises all processes on the bus, including the corresponding signals, which refer to a single bus request (thus, for example, data read, a certain special cycle, etc.). Every bus transaction can have up to six phases; you may find a similar division of one bus cycle into several phases, for example, in the case of the SCSI bus. Figure 14.15 shows the timing of the various phases.
Figure 14.15: The timing of the various bus phases

Arbitration
If an agent wants to carry out a transaction then it must first get control of the bus. This is carried out with an arbitration phase. If the agent already has control, this phase is skipped. The control signals for this phase are BREQ3-BREQ0 (according to pins BR3-BR0), BPRI, BNR and LOCK.

Request
This phase comprises two BCLK cycles. The processor outputs all necessary request and address signals which characterize the transaction. ADS, REQ4-REQ0, A35-A3, AP1-AP0, ATTR7-ATTR0, DID7-DID0, BE7-BE0, EXF4-EXF0 and RP are the signals of this phase. The activation of ADS defines the beginning of a request phase.

Error
The error phase starts three BCLK cycles after the beginning of the request phase and indicates parity errors in the request phase. If an error occurs (signal AERR active) then the present transaction is aborted. After this error phase the Pentium Pro can also indicate errors by means of other signals: BINIT, BERR, IERR and FRCERR. They do not refer to the error phase.

Snoop
The snoop phase indicates the result of snoop cycles in the system caches (hit to cache lines/modified cache lines) and starts four clock cycles after the beginning of the request phase. The snoop signals are HIT, HITM and DEFER.

Response
This phase indicates the result of the transaction, that is, whether it was successful, or whether it had been terminated with an error. Valid responses are: normal data, implicit write-back, no
data, hardware error, deferred and repeat. Note that the occurrence of response and data phases depends on the speed of the accessed subsystem. There may be several wait cycles; this is indicated in Figure 14.15 by "+x". The control signals are RS2-RS0, RSP and TRDY.

Data
The data phase may already overlap with the response phase, that is, response and data phase coincide. In this phase, the data of a transaction is transferred. But it is not necessary for the initiator of a transaction to request the data phase; this may also be done by another agent which observes the transaction (for example, in the case of a hit to a modified cache line in another cache). I/O accesses always occur with a maximum width of 32 bits, that is, BE7-BE4 are, in the request phase, always at a low level. This phase comprises the signals DRDY, DBSY, DEP7-DEP0 and (of course) D63-D0.
Not every transaction need have all phases (a transaction without data transfer, for example, does not require a data phase). Moreover, various phases of various transactions may overlap (for example, in Figure 14.15 the error phase of the first transaction overlaps with the request phase of the second transaction). Figure 14.15 clearly shows that the Pentium Pro performs bus pipelining very intensively. The bus interface is sufficiently intelligent to handle such interleavings. You can configure the Pentium Pro so that it handles a maximum of either one or eight pending (that is, not yet completed) transactions (which have been issued by other bus agents and not necessarily by the Pentium Pro itself). The number of its own pending transactions is limited to four; more would be senseless because of the instruction pipeline stalls associated with pending bus transactions. You can see that the Pentium Pro performs not only a very deep instruction pipelining but also a complex transaction pipelining. All bus transactions are strictly observed in all bus agents by means of an in-order queue. Every agent has its own queue holding the same information. Only in this way is it possible for all agents to know, for example, that the data put onto the bus in cycle 13+x of Figure 14.15 belongs to transaction 1.
At first glance, the Pentium Pro bus seems to be quite slow (between arbitration and response phases there are 11 bus cycles). But remember that the L2-cache is already integrated into the same package and runs at up to 200 MHz, so that the external bus usually only accesses the DRAM main memory or even slower I/O devices, and the external bus is still clocked at an impressive frequency of 66 MHz. Additionally, a data phase may consist of a burst with four successive data transmissions (a total of 32 bytes). For the most widely used one-processor configuration (which is still the standard of today's PCs), the arbitration phase is unnecessary and never executed; all transactions start immediately with the request phase. This saves three more BCLK cycles.
14.6.2 Bus Arbitration
The Pentium Pro arbitration protocol implements two classes of agents: symmetrical and priority agents. Symmetrical agents have identical priorities; priority agents (as the name indicates) have priority over symmetrical agents in all arbitrations. The Pentium Pro bus supports a maximum
of five agents; four of them are symmetrical agents which perform arbitration through the signals BREQ3-BREQ0 and one is a priority agent which is assigned the signal BPRI. In a multiprocessor configuration a maximum of four Pentium Pros are supported; they form the symmetrical agents. Typically, there is also a priority agent which is formed by the memory subsystem or an I/O agent. The symmetrical agents are assigned a unique identification between 0 and 3; they hold a 2-bit ID. After a reset, the priority order is 0 → 1 → 2 → 3 → 0 → 1 → ..., that is, the symmetrical agent 0 may access the bus first. All symmetrical agents keep track of the agent which owned the bus last. For that purpose a 2-bit RID (rotating ID) is implemented which points to the agent with the presently lowest priority. This means that with the next arbitration the agent which wins is the next one after the last bus owner in the priority order, unless the priority agent also requests the bus. Every agent may perform only one transaction per request. Afterwards, it must return the ownership of the bus so that other agents can win, too. The physical interconnection of the request signals and the request terminals is shown in Figure 14.16.
[Figure 14.16: Interconnection of the bus request signals BREQ3-BREQ0 and the bus request terminals BR3-BR0]
Note the difference between the bus request terminals BR3-BR0 and the bus request signals BREQ3-BREQ0. The bus request terminals BR3-BR0 are interconnected in a cyclic manner, for example, BR0 is connected to BR3-BR2-BR1, BR1 is connected to BR0-BR3-BR2, etc. From all of these terminals only BR0 is implemented as an output pin; the remaining three (BR3-BR1) receive the bus request signals BREQ3-BREQ0. The BR0 terminal of agent 0 outputs the bus request signal BREQ0, the BR0 terminal of agent 1 the bus request signal BREQ1, and so on. The bus request signal BPRI of the priority agent acts directly on terminals BPRI of the four
symmetrical agents (Pentium Pro CPUs). In the course of a reset, the system controller activates signal BREQO so that agent 0 boots the system and initializes the other agents for multiprocessing. Figure 14.17 shows an example of a typical arbitration process.
Figure 14.17: Typical arbitration process (signals BCLK, BREQ0-BREQ3 and BPRI over the clock cycles t1 to t16)
First (t1), all request signals BREQ0-BREQ3 and BPRI are inactive, that is, no agent is requesting the bus. The rotating ID has the value 1, that is, the priority order is 2 → 3 → 0 → 1. In t2, agents 2 and 3 activate their request signals. Because of the priority order, agent 2 wins and gets the control of the bus; beginning with t4 it carries out a transaction (2). In t4, agent 2 disables its request signal BREQ2 to release control of the bus; the rotating ID now has the value 2. Agent 3 keeps its request signal active and is granted ownership of the bus. In t7, agent 3 starts its transaction and disables its request signal simultaneously. The rotating ID has a value of 3 and the priority order is 0 → 1 → 2 → 3. In t5, agent 2 again activates its request signal BREQ2 to carry out another transaction. But this request is ignored until agent 3 disables its request signal BREQ3 in t7 and a new arbitration cycle starts. Because agent 0 also requests the bus in t7, agent 2 has to wait first because agent 0 has a higher priority at that moment. The value of the rotating ID is changed to 0, and the priority order becomes 1 → 2 → 3 → 0. After agent 0 has disabled its request signal in t10, agent 2 can again get control of the bus because agent 1 does not output any request. BPRI is inactive all the time here. An activation would grant the ownership of the bus immediately to the priority agent as soon as the current bus owner disabled its request signal. Note that the transfer of bus control to the priority agent does not affect the value of the rotating ID and thus the priority order of the symmetrical agents. They only have to wait until the priority agent hands over control. The arbitration protocol of the Pentium Pro also implements so-called bus parking. This means that an agent which had control of the bus last need not perform a bus arbitration again when it wants to carry out a bus transaction and no other agent requests the bus.
Therefore, unnecessary arbitrations and delays are avoided. Additionally, no special handling of a one-processor configuration is necessary. The single Pentium Pro takes over control of the bus immediately after the reset by means of one single arbitration cycle and afterwards always accesses the bus directly.
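The rotating-ID scheme described in this section can be modelled in a few lines. The following Python sketch is a simplified illustration only: it ignores bus clock timing, LOCK and BNR, represents the priority agent by the string "P", and uses a hypothetical function name. It then replays the sequence of Figure 14.17.

```python
def arbitrate(rotating_id: int, requests: set, priority_request: bool = False) -> tuple:
    """Return (winner, new_rotating_id) for one arbitration.

    rotating_id points to the symmetrical agent with the lowest priority,
    that is, the last bus owner. The search therefore starts with the next
    agent in the cyclic order and reaches the last owner itself only at the
    end (bus parking).
    """
    if priority_request:
        return "P", rotating_id            # priority agent wins, RID unchanged
    for step in range(1, 5):
        candidate = (rotating_id + step) % 4
        if candidate in requests:
            return candidate, candidate    # the winner becomes the new RID
    return None, rotating_id               # nobody requests the bus


# Replay of Figure 14.17, starting with rotating ID = 1 (order 2, 3, 0, 1).
rid = 1
for pending in ({2, 3}, {3}, {0, 2}, {2}):
    winner, rid = arbitrate(rid, pending)
    print(f"requests {sorted(pending)}: agent {winner} wins, rotating ID = {rid}")
```

The replay yields the owners 2, 3, 0 and 2 in this order, as in the walkthrough above. A single Pentium Pro that always finds itself as the only requester wins every arbitration in this model, which corresponds to the bus parking case.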
There are two more important signals which affect arbitration: LOCK and BNR. An active LOCK signal indicates that the Pentium Pro performs an atomic access and must not be interrupted, although the process may be combined with several transactions (for example, the update of page table entries which issue a read-modify-write process). The block next request signal BNR indicates that at least one agent is present in the system which cannot accept or handle another transaction at the moment (for example, the transaction queue of this agent is full).
14.6.3 Deferred Transactions

The Pentium Pro bus implements so-called deferred transactions. This allows, for example, slow peripherals to perform a transaction in a deferred manner without slowing down later transactions. For example, accesses to a main memory with EDO RAM (access time 20 ns) or even an L3-cache (access time 12-15 ns) are much faster than an I/O cycle (delays up to 200 ns). If all transactions have to be completed exactly in the issued order (according to the in-order queue), such a slow transaction could slow down the execution of the whole program. Deferring allows the rearrangement of transactions according to the speed of the accessed subsystems. Transactions issued later (for example, the transfer of write data) are completed before the previously issued deferred transaction has terminated (for example, by transferring the value from an I/O status register).

The decisive signals for deferred transactions are DEN (Defer Enable), DEFER and DID7-DID0 (Defer ID). If the agent which issues the transaction (for example, the Pentium Pro) indicates that the transaction may be completed in a deferred manner, then it outputs a low-level DEN signal. If the transaction is actually completed in a deferred manner, the addressed agent must activate the DEFER signal in the snoop phase and return the status deferred in the following response phase. This removes the transaction concerned from the in-order queue. Because the transaction can then no longer be identified according to its place in this queue, the eight deferred ID signals DID7-DID0 now identify the transaction, that is, issuing and addressed agent ...

... of the write data. As the signal from the data input buffer is stronger than that from the memory cell concerned, the amplification of the write data gains the upper hand. The potential on the bit line pair of the selected memory cell reflects the value of the write data.
All other sense amplifiers amplify the data held in the memory cells so that after a short time potentials are present on all bit line pairs that correspond to the unchanged data and the new write data, respectively. These potentials are fetched as corresponding charges into the storage capacitors. Afterwards, the DRAM controller deactivates the row decoder, the column decoder and the data input buffer. The capacitors of the memory cells are disconnected from the bit lines and the write process is completed. As was the case for the data read, the precharge circuit sets the bit line pairs to a potential level Vcc/2 again, and the DRAM is ready for another memory cycle.

Besides the memory cell with one access transistor and one storage capacitor, there are other cell types with several transistors or capacitors. The structure of such cells is much more complicated, of course, and the integration of their elements gets more difficult because of their higher number. Such memory types are therefore mainly used for specific applications, for example, a so-called dual-port RAM where the memory cells have a transistor for reading and another transistor for writing data so that data can be read and written simultaneously. This is advantageous, for example, for video memories because the CPU can write data into the video RAM to set up an image without the need to wait for a release of the memory. On the other hand, the graphics hardware may continuously read out the memory to drive the monitor. For this purpose, VRAM chips have a parallel random access port used by the CPU for writing data into the video memory and, further, a very fast serial output port that clocks out a plurality of bits, for example a whole memory row. The monitor driver circuit can thus be supplied very quickly
and continuously with image data. The CRT controller need not address the video memory periodically to read every image byte, and the CPU need not wait for a horizontal or vertical retrace until it is allowed to read or write video data.

Instead of the precharge circuit, other methods can also be employed. For example, it is possible to install a dummy cell for every column in the memory cell array that holds only half of the charge which corresponds to a "1". Practically, this cell holds the value "1/2". The sense amplifiers then compare the potential read from the addressed memory cell with the potential of the dummy cell. The effect is similar to that of the precharge circuit. Here, too, a difference and not an absolute value is amplified.

It is not necessary to structure the memory cell array in a square form with an equal number of rows and columns and to use a symmetrical design with 2048 rows and 2048 columns. The designers have complete freedom in this respect. Internally, 4 Mbit chips often have 1024 rows and 4096 columns simply because the chip is longer than it is wide. In this case, one of the supplied row address bits is used as an additional (that is, 12th) column address bit internally. The ten row address bits select one of 2^10 = 1024 rows, but the 12 column address bits select one of 2^12 = 4096 columns. In high-capacity memory chips the memory cell array is also often divided into two or more subarrays. In a 4 Mbit chip eight subarrays with 512 rows and 1024 columns may be present, for example. One or more row address bits are then used as the subarray address; the remaining row and column address bits then only select a row or column within the selected subarray. The word and bit lines thus get shorter and the signals become stronger. But as a disadvantage, the number of sense amplifiers and I/O gates increases.
Such methods are usual, particularly in the new highly-integrated DRAMs, because the cells keep getting smaller and the capacitors therefore hold less charge, so that long bit lines attenuate the signal before it can reach the sense amplifier. Which concept a manufacturer implements for the various chips cannot be recognized from the outside. Moreover, these concepts are often kept secret so that competitors don't get an insight into their rivals' technologies.
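The internal reorganization of a 4 Mbit chip into 1024 rows and 4096 columns, as described above, can be illustrated with a short C sketch. It is an assumption for illustration only that the most significant of the eleven supplied row address bits becomes the twelfth column address bit; which bit a real chip uses is not stated here and, as noted, is usually not visible from the outside. The names cell_addr and remap_4mbit are hypothetical.

```c
#include <assert.h>

/* Internal cell position: 10-bit row, 12-bit column (1024 x 4096 array). */
struct cell_addr {
    unsigned row;
    unsigned col;
};

/* ext_row and ext_col are the 11-bit addresses supplied from outside
   (the 2048 x 2048 view of the chip). */
struct cell_addr remap_4mbit(unsigned ext_row, unsigned ext_col)
{
    struct cell_addr a;

    assert(ext_row < 2048u && ext_col < 2048u);

    a.row = ext_row & 0x3FFu;                  /* lower ten row bits      */
    a.col = ((ext_row >> 10) << 11) | ext_col; /* row bit 10 -> col bit 11 */
    return a;
}
```

Every external address pair still maps to exactly one internal cell, so the capacity of 2^22 cells is unchanged; only the shape of the array differs.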
19.1.3 Semiconductor Layer Structure

The following sections present the usual concepts for implementing DRAM memory cells. Integrated circuits are formed by layers of various materials on a single substrate. Figure 19.5 is a sectional view through such a layer structure of a simple DRAM memory cell with a plane capacitor. In the lower part of the figure, a circuit diagram of the memory cell is additionally illustrated.

The actual memory cell is formed between the field oxide films on the left and right sides. The field oxides separate and isolate the individual memory cells. The gate and the two n-doped regions source and drain constitute the access transistor of the memory cell. The gate is separated from the p-substrate by a so-called gate isolation or gate oxide film, and controls the conductivity of the channel between source and drain. The capacitor in its simplest configuration is formed by an electrode which is grounded. The electrode is separated by a dielectric isolation film from the p-substrate in the same way as the gate, so that the charge storage takes place below the isolation layer in the substrate. To simplify the interconnection of the memory cells as far as possible, the gate simultaneously forms a section of the word line and the drain is part of the bit line. If the word line W is selected by the row decoder, then the electric field below the gate that is part of the word line lowers the resistance value of the channel between source and drain. Capacitor charges may thus flow away through the source-channel-drain path to the bit line BL, which is connected to the n-drain. They generate a data signal on the bit line pair (BL and its complement), which in turn is sensed and amplified by the sense amplifier.

Figure 19.5: A typical DRAM cell. The access transistor of the DRAM cell generally consists of an MOS transistor. The gate of the transistor simultaneously forms the word line, and the drain is connected to the bit line. Charges that represent the stored information are held in the substrate in the region below the electrode.

A problem arising in connection with the higher integration of the memory cells is that the size of the capacitor, and thus its capacitance, decreases. Therefore, fewer and fewer charges can be stored between electrode and substrate. The data signals during a data read become too weak to ensure reliable operation of the DRAM. With the latest 4 Mbit chip the engineers therefore went over to a three-dimensional memory cell structure. One of the concepts used is shown in Figure 19.6, namely the DRAM memory cell with trench capacitor. In this memory cell type the information charges are no longer stored simply between two plane capacitor electrodes, but the capacitor has been enlarged into the depth of the substrate. The facing area of the two capacitor electrodes thus becomes much larger than is possible with an ordinary plane capacitor. The memory cell can be miniaturized and the integration density enlarged without decreasing the amount of charge held in the storage capacitor. The read-out signals are strong enough and the DRAM chip also operates very reliably at higher integration densities.

Unfortunately, the technical problems of manufacturing such tiny trenches are enormous. We must handle trench widths of about 1 µm at a depth of 3-4 µm here.
For manufacturing such small trenches completely new etching techniques had to be developed which are anisotropic, and therefore etch more in depth than in width. It was two years before this technology was reliably available. Also, doping the source and drain regions as well as the dielectric layer between the two capacitor electrodes is very difficult. Thus it is not surprising that only a few big companies in the world with enormous financial resources are able to manufacture these memory chips.

Figure 19.6: Trench capacitor for highest integration densities. To enhance the electrode area of the storage capacitor, the capacitor is built into the depth of the substrate. Thus the memory cells can move closer together without decreasing the stored charge per cell.

To enhance the integration density of memory chips, other methods are also possible and applied, for example folded bit line structures, shared sense amplifiers, and stacked capacitors. Lack of space prohibits an explanation of all these methods, but it is obvious that the memory chips which appear to be so simple from the outside accommodate many high-tech elements and methods. Without them, projects such as the 64 Mbit chip could not be realized.
19.1.4 DRAM Refresh

From Figure 19.5 you already know that the data is stored in the form of electrical charges in a capacitor. As is true for all technical equipment, this capacitor is not perfect, that is, it discharges over the course of time via the access transistor and its dielectric layer. Thus the charges and therefore also the data held get lost. The capacitor must be recharged periodically. Remember that during the course of a memory read or write a refresh of the memory cells within the addressed row is automatically carried out. Normal DRAMs must be refreshed every 1-16 ms, depending upon the type. Currently, three refresh methods are employed: RAS-only refresh, CAS-before-RAS refresh and hidden refresh. Figure 19.7 shows the course of the signals involved during these refresh types.
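Since every row must be refreshed once within the refresh period, the time available per row follows from a simple division. The following C sketch shows the arithmetic under the assumption that the row refreshes are spread evenly over the period; the function name refresh_interval_ns and the row counts used with it are illustrative values, not data for a particular chip.

```c
#include <assert.h>

/* Time between two successive row refreshes in nanoseconds, assuming the
   refreshes are distributed evenly over the whole refresh period.
   period_ms: refresh period of the chip (1-16 ms according to the text)
   rows:      number of rows that must be refreshed within that period   */
unsigned long refresh_interval_ns(unsigned long period_ms, unsigned long rows)
{
    assert(rows > 0);
    assert(period_ms <= 16);               /* keeps the product in 32 bits */

    return period_ms * 1000000UL / rows;   /* 1 ms = 1 000 000 ns */
}
```

For a hypothetical chip with 512 rows and a refresh period of 8 ms, one row has to be refreshed roughly every 15.6 µs.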
Figure 19.7: Three refresh types. (a) RAS-only refresh; (b) CAS-before-RAS refresh; (c) hidden refresh.
RAS-only Refresh

The simplest and most used method for refreshing a memory cell is to carry out a dummy read cycle. For this cycle the RAS signal is activated and a row address (the so-called refresh address) is applied to the DRAM, but the CAS signal remains disabled. The DRAM thus internally reads one row onto the bit line pairs and amplifies the read data. But because of the disabled CAS signal they are not transferred to the I/O line pair and thus not to the data output buffer. To refresh the whole memory an external logic or the processor itself must supply all the row addresses in succession. This refresh type is called RAS-only refresh. The disadvantage of this outdated refresh method is that an external logic, or at least a program, is necessary to carry out the DRAM refresh. In the PC this is done by channel 0 of the 8237 DMA chip, which is periodically activated by counter 1 of the 8253/8254 timer chip and issues a dummy read cycle. In an RAS-only refresh, several refresh cycles can be executed successively if the CPU or refresh control drives the DRAM chip accordingly.

CAS-before-RAS Refresh

Most modern DRAM chips additionally implement one or more internal refresh modes. The most important is the so-called CAS-before-RAS refresh. For this purpose, the DRAM chip has its own refresh logic with an address counter. For a CAS-before-RAS refresh, CAS is held low for a certain time period before RAS also drops (thus CAS-before-RAS). The on-chip refresh (that is, the internal refresh logic) is thus activated, and the refresh logic carries out an automatic internal refresh. The refresh address is generated internally by the address counter and the refresh logic, and need not be supplied externally. After every CAS-before-RAS refresh cycle, the internal address counter is incremented so that it indicates the new address to refresh. Thus it is sufficient if the memory controller "bumps" the DRAM from time to time to issue a refresh cycle. With the CAS-before-RAS refresh, several refresh cycles can also be executed in succession.

Hidden Refresh

Another elegant option is the hidden refresh. Here the actual refresh cycle is more or less ...

... drive into a transparent peripheral. Example:
A SCSI hard disk with 105 Mbytes storage capacity physically comprises six heads, an outer group with 831 cylinders and 35 sectors per track, as well as an inner group with 188 cylinders and 28 sectors per track. But to the BIOS and the user the drive appears as a hard disk with 12 heads, 1005 cylinders and 17 sectors per track.
But the BIOS extension is not only effective when installing a new hard disk drive. It is a very general and therefore also a very powerful concept. For example, the BIOS extension on EGA and (S)VGA adapters replaces the INT 10h of the standard BIOS to make the specialized and powerful functions of these graphics adapters available to the system. The BIOS on the motherboard recognizes only the simple and, for programmers, not very exciting functions of the monochrome and CGA adapters. On the other hand, EGA and (S)VGA boards support many functions on a hardware base (for example, scrolling of the screen contents or zooming of several sections). With older adapters these functions had to be programmed by software, and were thus quite slow. Very new graphics adapters with a graphics processor or a video RAM of more than 128 kbytes cannot be accessed conventionally. They need their own BIOS, as the interface to the PC system bus is incompatible with conventional adapters. The open and powerful concept of BIOS extensions enables the integration of such graphics adapters into a PC without any problems. As long as you remain on a BIOS level, no difficulties arise; the BIOS extension handles all the problems. That is, of course, no longer the case if you program the adapter directly using the registers. For graphics adapters, de facto standards concerning addresses and the meanings of the registers are not yet established to the same extent as is the case for floppy and hard disk controllers.

BIOS extensions for hard disks usually start at address c800:0000, those for graphics adapters at c000:0000. In most cases, you can alter the start address of ROM extensions by means of a jumper on the adapter board. This is useful if two ROMs disturb each other. Meanwhile, the drivers are often provided on disk, and are no longer integrated as a ROM extension. For DOS, they must be integrated as DEVICE=xx or as a TSR program.
It is not impossible that certain BIOS variants are more powerful than others. Particularly with new technical developments, it takes some time for a standard to become established. For example, no 8086/88 machine recognizes interrupt 15h, function 89h for switching the processor into protected mode, as the 8086/88 CPU is not capable of this. If you attempt to detect by software whether a BIOS supports certain functions, this can get quite laborious and error-prone. Depending upon the quality of the BIOS, it returns an error message, the registers are not changed, or the registers are filled with unpredictable values.
As already mentioned, compatibility on a register level among various interface and drive specifications is very rare. But within a certain type the controllers and floppy/hard disk drives fortunately get along with one another, in most cases. You may, for example, connect a no-name controller with an ST412 interface for RLL drives to an NEC RLL hard disk with an ST412 interface without problems, and access it by means of the AT system BIOS.
31.3.3 Translating and Zone Recording

You may know that the circumference of a circle rises with increasing radius. Applied to hard disks, this means that the length of a track gets larger the lower the cylinder number is, because the outermost cylinder with the largest diameter is assigned the number 0 and the innermost cylinder with the smallest diameter has the highest possible number. But the number of bytes that fit on a track depends upon the length of the track, that is, the circumference of the cylinder, and thus the cylinder number. It seems absurd to fill the large outer cylinders only with the same number of sectors as the smaller inner ones: the outer sectors would be elongated by this. To use the storage capacity of a disk best it would be more favourable, therefore, to raise the number of sectors per track with increasing cylinder diameter. However, this means that the fixed number of sectors per track must be given up. Accordingly, the BIOS functions become more complicated as you must additionally know for every hard disk how many sectors each cylinder accommodates. Thus, simpler unintelligent drives without their own controller are content with a fixed number of sectors per track throughout the whole disk.

High-quality and intelligent drives such as SCSI and AT bus hard disks, on the other hand, carry out such a conversion. These drives have their own controller which is fixed to the drive and is no longer inserted as an additional board into a bus slot. The connection to the host is established by a so-called host adapter, which is often but erroneously called a controller. The controller now knows which track accommodates how many sectors; but to the user the disk appears on a BIOS level homogeneously with a fixed number of sectors per track.
This means that the controller converts the indicated values for head, cylinder, sector of a BIOS call into the actual values for head, cylinder, sector on the hard disk without any external intervention. This process is completely hidden from the user. Usually you can't affect this translation even on a register level, but some IDE hard disk drives allow a variable translation.

In practice, a hard disk is usually divided into two or more zones which each have a fixed number of sectors per track. Thus the number of sectors per track changes only from zone to zone, and not from track to track. This is called zone recording. Because of the extensive translation, the logical format used by the BIOS has virtually nothing to do with the actual physical format. But this seems to be more complicated than it really is. A controller separated from the drive, as was usual a few years ago, must of course be able to control many different drives. But a controller that is integrated into the drive's case can best be adapted to this single hard disk on which it is mounted. The rapidly falling prices in the field of microelectronics now mean that every hard disk has its own controller without raising the costs too much.
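How such a translation might work can be sketched in C for the example drive mentioned earlier (physically six heads, 831 cylinders with 35 sectors per track and 188 cylinders with 28 sectors per track; logically 12 heads, 1005 cylinders and 17 sectors per track). The mapping rule used here is purely an assumption for illustration: the logical address is converted into a linear sector number, which is then counted through the zones from the outermost cylinder inwards. Real controllers are free to use any other rule, and all names in the sketch are hypothetical.

```c
#include <assert.h>

#define PHYS_HEADS   6u
#define LOG_HEADS    12u
#define LOG_SECTORS  17u
#define NUM_ZONES    2

struct chs  { unsigned cyl, head, sector; };   /* sector counts from 1 */
struct zone { unsigned cylinders, sectors_per_track; };

/* Zones of the example drive, outermost zone first. */
static const struct zone zones[NUM_ZONES] = { { 831u, 35u }, { 188u, 28u } };

/* Logical (BIOS) address -> linear sector number, counting from 0. */
unsigned long logical_to_linear(struct chs l)
{
    assert(l.head < LOG_HEADS && l.sector >= 1u && l.sector <= LOG_SECTORS);
    return ((unsigned long)l.cyl * LOG_HEADS + l.head) * LOG_SECTORS
           + (l.sector - 1u);
}

/* Linear sector number -> physical address.
   Returns 0 on success, -1 if n lies beyond the end of the disk. */
int linear_to_physical(unsigned long n, struct chs *p)
{
    unsigned first_cyl = 0u;
    int z;

    for (z = 0; z < NUM_ZONES; z++) {
        unsigned long spt     = zones[z].sectors_per_track;
        unsigned long per_cyl = PHYS_HEADS * spt;
        unsigned long in_zone = per_cyl * zones[z].cylinders;

        if (n < in_zone) {
            p->cyl    = first_cyl + (unsigned)(n / per_cyl);
            p->head   = (unsigned)((n % per_cyl) / spt);
            p->sector = (unsigned)(n % spt) + 1u;
            return 0;
        }
        n -= in_zone;                     /* skip this zone entirely */
        first_cyl += zones[z].cylinders;
    }
    return -1;
}
```

With these numbers the logical format covers 12 x 1005 x 17 = 205 020 sectors, slightly fewer than the 206 094 physical sectors, so every logical address has a physical counterpart under this assumed rule.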
But the BIOS (and also you as the programmer) want to know, of course, with which geometrical format you can now access the hard disk. For this purpose, function 08h determine drive parameters of interrupt INT 13h is also implemented. The function returns the geometric parameters that you can use for accessing the hard disk. The function call is carried out by the BIOS on the host adapter of the intelligent hard disk which has intercepted the already existing interrupt 13h at power-up. When you look at the hard disk functions of INT 13h in Appendix F, you can see that the cylinder number has only 10 bits. Only 1024 cylinders can thus be represented, while powerful ESDI or SCSI drives accommodate far more cylinders. The ESDI interface is designed for up to 4096 cylinders as standard. With translation, the BIOS can be altered in such a way that 4096 cylinders can actually be accessed. For this purpose, translating increases the virtual head number until the cylinder number has been decreased to 1024 or less. The increase of the head number together with the decrease of the cylinder number keeps the drive capacity constant, and makes it possible for the physical cylinders beyond 1024 to be accessed via the BIOS in the same format as before. Another note on translation and the formatting of hard disks. During the course of a low-level formatting the controller generates sectors and tracks on the data carrier. Depending upon the translation and the boundary between the areas of different sector numbers per track, different formatting parameters must be passed to the controller. The conversion of BIOS sectors and the physical sectors on disk within the controller does not depend on any rule from the hard disk used: there is no standard which determines how many sectors per track must be generated up to which track, etc. If you now try to format such a hard disk by means of direct register commands, then you will usually cause chaos, but nothing more. 
SCSI and IDE hard disks therefore always come preformatted, and can be successfully reformatted on a low level by the user only in some rare cases. Even professional utilities such as DiskManager, Spinrite or Norton Utilities usually assume a fixed format. These very popular and useful utilities for conventional hard disks cannot be employed here. It's best to refrain from such formatting attempts if you are not absolutely (really 100%) sure that you know everything about the translating of your hard disk, and you are also absolutely sure that your formatting routine is free of any bugs.

Also, be somewhat sceptical when interpreting benchmark results concerning, for example, the access time of a hard disk whose controller carries out translation. In many head positionings the logical head only "moves" one track but the physical one remains on the same track. On the other hand, it is also possible that the physical head is moved although the program accesses only a single logical track, for example to read a complete track. The access times and transfer rate determined may therefore differ from the actual properties of the investigated drive.
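The trick of trading cylinders for heads, described above for drives with more than 1024 cylinders, can be sketched as follows. The doubling and halving used here is only one conceivable scheme and an assumption of this sketch, as are the names geometry and translate_geometry; the limit on the head number that the BIOS can handle is not checked.

```c
#include <assert.h>

#define BIOS_MAX_CYLINDERS 1024UL   /* 10-bit cylinder number of INT 13h */

struct geometry {
    unsigned long cylinders;
    unsigned long heads;
    unsigned long sectors;          /* sectors per track */
};

/* Doubles the virtual head number and halves the cylinder number until
   the cylinder number fits into ten bits. An odd cylinder count is
   rounded down, so the logical format never exceeds the physical one. */
struct geometry translate_geometry(struct geometry phys)
{
    struct geometry log = phys;

    assert(phys.cylinders > 0 && phys.heads > 0 && phys.sectors > 0);

    while (log.cylinders > BIOS_MAX_CYLINDERS) {
        log.heads     *= 2;
        log.cylinders /= 2;
    }
    return log;
}
```

A hypothetical ESDI drive with 4096 cylinders, 4 heads and 17 sectors per track would appear with 1024 cylinders and 16 heads; the product of cylinders, heads and sectors, and thus the capacity, stays the same.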
31.4 Access via DOS and the BIOS

The variety of hard disk interfaces unavoidably gives rise to incompatibilities. Here the hierarchical access scheme DOS-BIOS-register comes to the rescue again, because you can access logical sectors by the two DOS interrupts 25h and 26h, and physical sectors via the BIOS interrupt 13h, without the need to take the interface actually used into account in detail. Of course, you achieve a higher execution speed with direct register programming, and you can further use the features of the interface concerned which are not accessible with the BIOS. The BIOS standard functions are oriented to the first XT and AT models, and cannot, in the nature of things, cope with the much more advanced ESDI, IDE or SCSI interfaces. For programs that are to be executed on as many PCs as possible and have to be compatible (unfortunately reduced to a common denominator), the DOS interrupts 25h and 26h as well as the BIOS interrupt 13h are indispensable.
31.4.1 DOS Interrupts 25h and 26h

With the DOS interrupts 25h and 26h (already presented in Chapter 7 in connection with floppy drives), you can read and write the logical sectors within the DOS partition. Thus you may access all data areas, including the boot sector and the FAT. The partition table as well as other partitions, however, remain unreachable. The calling procedure for the two interrupts in the case of sectors with numbers lower than 65 536 is the same as already discussed in this chapter. But with DOS 4.00 a new calling format was introduced to serve partitions comprising more than 65 536 sectors or 32 Mbytes. Figure 31.14 shows the two calling formats with and without the new parameter block.
Figure 31.14: INT 25h and INT 26h calling formats, for partitions up to 32 Mbytes or 65 536 sectors and for partitions larger than 32 Mbytes or 65 536 sectors (INT 25h: read one or more logical sectors; INT 26h: write one or more logical sectors). Notes from the figure: drive numbers are 00h = drive A:, 01h = drive B:, etc.; the error codes are listed in Table 31.7; the parameter block holds the first sector (4 bytes), the number of sectors (2 bytes) and the address of the read/write buffer (4 bytes).
When using interrupts 25h and 26h, note that they leave a status byte on the stack that you must remove with a POP instruction. If the intended sector cannot be read or written for some reason, then DOS sets the carry flag and returns an error code in register ax, indicating the cause of the failed read or write attempt. Table 31.7 lists the valid error codes.

Code   Error
01h    invalid command
02h    incorrect address mark
04h    sector not found
08h    DMA overflow
10h    CRC or ECC error
20h    controller error
40h    seek error
80h    drive not ready

Table 31.7: INT 25h and INT 26h error codes
The following sections discuss only the extended function call via the new parameter block for accessing sectors with a number beyond 65 535. The conventional calling format is discussed in Chapter 30. According to an entry ffffh in the register cx, DOS 4.00 and above recognizes that the extended calling format is to be used. The calling program must pass the offset and segment of the parameter block in the registers bx and ds for this purpose. The parameter block defines the first sector, the number of sectors, and the address of the read or write buffer for these sectors. With the conventional format these quantities are passed by registers cx, dx, bx and ds. Now the use of the extended form is explained using an example. Example:
Read three sectors beginning with sector 189063 from hard disk C: (language: Microsoft C 5.10).

#include <dos.h>      /* union REGS, struct SREGS, int86x, FP_OFF, FP_SEG */
#include <malloc.h>   /* _fmalloc */
#include <stdio.h>
#include <stdlib.h>

struct parm_block {
    unsigned long start_sector;
    unsigned num_of_sectors;
    char far *buffer;
};

main()
{
    union REGS inregs, outregs;
    struct SREGS segregs;
    static struct parm_block p_block;
    struct parm_block far *p = &p_block;   /* FP_OFF and FP_SEG need a far pointer */

    /* construct parameter block */
    p_block.start_sector = 189063;
    p_block.num_of_sectors = 3;
    p_block.buffer = (char far *) _fmalloc(1536);   /* buffer for three sectors */

    /* call function */
    inregs.h.al = 0x02;          /* drive C: */
    inregs.x.cx = 0xffff;        /* extended calling format */
    inregs.x.bx = FP_OFF(p);     /* transfer parameter block offset into bx */
    segregs.ds = FP_SEG(p);      /* transfer parameter block segment into ds */
    int86x(0x25, &inregs, &outregs, &segregs);   /* call interrupt, sectors are read into p_block.buffer */

    if ((outregs.x.cflag & 0x01) == 0x01) {      /* check whether carry is set */
        printf("\nError code: %x", outregs.x.ax);   /* display error code */
        exit(1);                                 /* abort with ERRORLEVEL 1 */
    }

    /* process read sectors */

    exit(0);
}
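The layout of the parameter block in memory can also be made explicit without any DOS-specific compiler features. The following portable C sketch packs the three fields byte by byte in the little-endian order used by the 80x86, with the far pointer stored as offset followed by segment. The function name pack_parm_block and the sample buffer address in the usage note are hypothetical and only serve as illustration.

```c
#include <assert.h>

#define PARM_BLOCK_SIZE 10   /* 4 + 2 + 4 bytes */

/* Packs the extended INT 25h/26h parameter block into ten bytes:
   bytes 0-3  first sector      (32 bits, low byte first)
   bytes 4-5  number of sectors (16 bits, low byte first)
   bytes 6-9  buffer address    (offset word, then segment word) */
void pack_parm_block(unsigned char out[PARM_BLOCK_SIZE],
                     unsigned long first_sector, unsigned num_sectors,
                     unsigned buf_offset, unsigned buf_segment)
{
    assert(first_sector <= 0xFFFFFFFFUL);
    assert(num_sectors <= 0xFFFFu);
    assert(buf_offset <= 0xFFFFu && buf_segment <= 0xFFFFu);

    out[0] = (unsigned char)(first_sector & 0xFF);
    out[1] = (unsigned char)((first_sector >> 8) & 0xFF);
    out[2] = (unsigned char)((first_sector >> 16) & 0xFF);
    out[3] = (unsigned char)((first_sector >> 24) & 0xFF);
    out[4] = (unsigned char)(num_sectors & 0xFF);
    out[5] = (unsigned char)((num_sectors >> 8) & 0xFF);
    out[6] = (unsigned char)(buf_offset & 0xFF);
    out[7] = (unsigned char)((buf_offset >> 8) & 0xFF);
    out[8] = (unsigned char)(buf_segment & 0xFF);
    out[9] = (unsigned char)((buf_segment >> 8) & 0xFF);
}
```

For the example above (first sector 189063, three sectors), the block starts with the bytes 87h e2h 02h 00h 03h 00h, followed by the offset and segment of the buffer.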
31.4.2 Hard Disk Functions of BIOS Interrupt 13h

This section briefly discusses the most important functions of INT 13h that you haven't already met in connection with the floppy drives, or which differ significantly from that case. All functions are summarized in Appendix F. When the BIOS has completed the requested function it indicates (by means of the carry flag) whether the operation could be carried out successfully. If the carry flag is set (CF = 1) then an error has occurred and register ah contains the error code. You will find all possible error codes and their meaning in Appendix F.2. Note that not all codes are valid for both floppy and hard disk; because of their more elaborate intelligence, hard disk controllers can classify the errors in more detail. Note also that for the hard disk functions the two most significant bits of the sector register cl represent the two most significant bits of a 10-bit cylinder number which is composed of the two cl bits mentioned and the eight bits of the cylinder register ch. Thus you may access a maximum of 1024 cylinders by means of the BIOS.

You must never call any of the functions listed below which refer to the formatting of a track or whole drive if you use an embedded controller or a drive which carries out translation. By doing this you would only disturb the internal management of the tracks and the translation, or even the bad-sector mapping of these intelligent drives. I cannot predict the consequences for every drive, but take into consideration that a logical track as you can access it via the BIOS may possibly be only part of a physical track, or may be distributed over two tracks or even two cylinders. If you attempt to format such a partial track or a divided track, then this can only fail, of course.

Function 05h - Format Track or Cylinder

Unlike floppy drives, in the case of hard disks the interleave is also of importance. You may adjust the interleave factor using the format buffer. It successively contains the track, head, sector number and sector size entries in the same way as they are written into the address field of the sector concerned of an ST412/506 hard disk. Note that you can only format a complete track, not individual sectors. The BIOS passes the controller the entries in the format buffer for every sector to format. If you don't count up the sector number by one from entry to entry, but arrange it in such a way that it corresponds to the intended interleave value, then you also achieve a corresponding sector shift (that is, interleaving) on the hard disk. Thus you may alter the interleave value for a track without any major problems, for example to determine the optimum interleave factor. Advice: if you own an already preformatted hard disk that carries out translation then you must never call this function.
The drive may behave completely unpredictably, especially if you alter the interleave value. The number of sectors per track with which the drive appears to the system, and therefore to your program, has nothing to do with the drive’s actual geometry. If you attempt to format a certain (logical) track with function 05h you may refer to some unintended location on the disk or even cross a cylinder boundary. You can imagine that you would cause confusion. Example:
Format track 0 of the first hard disk with interleave 3 (language: Microsoft C 5.10).
struct format_buffer
{
  unsigned char cyl;
  unsigned char head;
  unsigned char sector;
  unsigned char byte_p_sector;
};
struct format_buffer far f_buffer[17];     /* format buffer for 17 sectors */
struct format_buffer far *f_ptr = f_buffer; /* current entry in format buffer */
union REGS inregs, outregs;                /* register images for int86x(), see dos.h */
struct SREGS segregs;
int sector;

/* construct format buffer */
for (sector = 0; sector < 17; sector++)
{
  f_buffer[sector].cyl = 0;                /* cylinder 0 */
  f_buffer[sector].head = 0;               /* track 0 */
  f_buffer[sector].byte_p_sector = 0x02;   /* 512 bytes per sector */
}

/* set interleave 3:1 */
f_buffer[0].sector = 0;    f_buffer[1].sector = 6;    f_buffer[2].sector = 12;
f_buffer[3].sector = 1;    f_buffer[4].sector = 7;    f_buffer[5].sector = 13;
f_buffer[6].sector = 2;    f_buffer[7].sector = 8;    f_buffer[8].sector = 14;
f_buffer[9].sector = 3;    f_buffer[10].sector = 9;   f_buffer[11].sector = 15;
f_buffer[12].sector = 4;   f_buffer[13].sector = 10;  f_buffer[14].sector = 16;
f_buffer[15].sector = 5;   f_buffer[16].sector = 11;

inregs.h.al = 17;                          /* 17 sectors per track */
inregs.h.ch = 0x00;                        /* cylinder 0 */
inregs.h.dh = 0x00;                        /* head 0 according to track 0 */
inregs.h.dl = 0x80;                        /* first hard disk */
segregs.es = FP_SEG(f_ptr);                /* segment address of format buffer */
for (sector = 0; sector < 17; sector++)
{
  inregs.h.ah = 0x05;                      /* function 05h */
  inregs.h.cl = sector;
  inregs.x.bx = FP_OFF(f_ptr);             /* offset address of format buffer */
  int86x(0x13, &inregs, &outregs, &segregs); /* format sector by means of interrupt 13h */
  f_ptr++;                                 /* next entry in format buffer */
}
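The sector order written by hand into the format buffer above can also be computed. The following is only a minimal sketch in portable C, not part of the BIOS interface: the function name build_interleave_table and the limit MAX_SECTORS are hypothetical, and the rule used when a slot is already occupied (slide forward to the next free slot) is one common convention that the text does not prescribe.

```c
#include <stddef.h>

#define MAX_SECTORS 64   /* hypothetical upper limit for this sketch */

/* Fill order[] so that order[p] is the sector number to be written at
   physical position p of the track. Consecutive sector numbers end up
   'interleave' positions apart. Returns 0 on success, -1 on bad input. */
int build_interleave_table(int sectors, int interleave, unsigned char order[])
{
    int used[MAX_SECTORS] = {0};
    int pos = 0;
    int s;

    if (sectors < 1 || sectors > MAX_SECTORS || interleave < 1)
        return -1;

    for (s = 0; s < sectors; s++) {
        while (used[pos])                 /* slot taken: slide to the next free one */
            pos = (pos + 1) % sectors;
        order[pos] = (unsigned char)s;    /* sector s lives at physical slot pos */
        used[pos] = 1;
        pos = (pos + interleave) % sectors;
    }
    return 0;
}
```

Called with 17 sectors and an interleave of 3, the table starts 0, 6, 12, 1, 7, 13, which is the order used for f_buffer[0] to f_buffer[5] in the example.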
Function 06h - Format and Mark Bad Track
This function manages bad tracks within the frame of the bad-sector mapping if the track has more than one defective sector, and thus the whole track is unusable. The 06h function writes address marks onto the track whose flags indicate a defective track. The controller then skips this track for data recording and assigns an alternative track automatically. The function can be employed only for ST412/506 controllers and drives. Intelligent hard disks with an embedded controller and translation carry out the bad-sector mapping automatically. Unlike function 05h, you may pass the interleave value directly via register al. You don't need a format buffer; no rearrangement of the sector numbers in that buffer according to the interleave value is therefore required.
Chapter 31
Advice: never use this function if your drive has its own controller or carries out translation. The BIOS track length doesn’t coincide with the physical track length if translation is in force. Therefore, you would only mark part of the track as bad, or two partial tracks if a cylinder boundary is crossed. What your controller does with your drive in this case is unknown. Example:
Mark the track at drive 80h, cylinder 951, head 2 as bad with interleave 2.
inregs.h.al = 2;       /* interleave 2 */
inregs.h.ch = 183;     /* 8 low-order cylinder bits */
inregs.h.dh = 2;       /* head 2 */
inregs.h.dl = 0x80;    /* first hard disk */
for (sector=0;
sector specification. The 5 Mbits/s and 7.5 Mbits/s, respectively, also contain, besides the actual data bit, address marks, CRC and ECC bytes, as well as gap bits. With a sector length of about 575 bytes, MFM-encoded hard disks with an ST412/506 interface can thus accommodate 17 sectors and RLL-encoded hard disks 26 sectors per track. Although the ST412/506, strictly speaking, defines only the interface between drive and controller, but doesn’t make any assumption for the integration of controller and drive into the PC system, the following configuration has been established. The controller is located on a separate adapter card which is inserted into a bus slot; a maximum of two hard disk drives are connected to the controller by a control and data cable. The controller adapter card establishes a connection to the PC bus simultaneously; a host adapter is not required here. Unlike floppies, where the connection between drive and controller was effected by a single flat conductor cable, in the hard disk system the control and data signals run through separate cables. The wider cable with 34 wires is the control cable, and the narrow one with only 20 wires the data cable. Because some readers surely still have such an old disk somewhere, the cable signals are presented in brief in Figure 31.15. You can connect a maximum of two hard disks with the control cable. However, every hard disk requires its own data cable. Note also that for hard disks control cables which have twisted
Figure 31.15: ST412/506 interface control and signal cable. The drives are connected to the controller with a control cable with 34 wires, and a data cable with 20 wires.
wires between the plug in the middle and the end of the cable (similar to the cable for floppy drives) do exist. But, unlike floppies, wires 25-29 are twisted here. On the PC, the signals drive select 3 and drive select 4 are not used, so you may connect a maximum of two hard disks.
Of all hard disk interfaces used on the PC, the ST412/506 is the least "intelligent". It is a pure signal interface, thus the controller is unable to pass any command to the drive. The drive itself accommodates only the control circuitry for stabilizing the disk rotation and the head positioning. All other control functions are carried out by the controller itself, for example, interpretation of the commands from the PC system, the encoding and decoding of the read and write data, the generation of address marks, etc.
ST412/506 controllers and drives were used first in the XT, and later also in the AT. Because the XT BIOS was not designed as standard for the support of hard disks, all XT controllers must have their own BIOS with the hard disk functions of INT 13h. The start address of this BIOS extension is usually c8000h. The AT, on the other hand, supported hard disks from the first day, and the required routines are already implemented in the system BIOS at address f0000h. But there are other differences between XT and AT controllers with an ST412/506 interface:
- The XT controller uses DMA channel 3 for transferring data between sector buffer and main memory; in the AT, on the other hand, the BIOS carries out a programmed I/O by means of the port instructions IN and OUT without using any DMA channel.
- The XT controller employs IRQ5 for issuing a hardware interrupt; the AT controller IRQ14.
- The XT controller is accessed via the XT task file, the AT controller via the AT task file; the register assignment and addresses of these two task files are incompatible; drivers for XT hard disk controllers with an ST412/506 interface cannot be used for an AT controller.
- The commands for an XT controller always consist of a 6-byte command block to a single register; the AT controller, on the other hand, is programmed by means of single command bytes to several individual registers.
31.5.2 Connecting and Configuring ST412/506 Hard Disk Drives
The connection of ST412/506 hard disks by means of the control cable to the controller is carried out in the same way as with floppy drives. You must connect the first hard disk to the end of the control cable, and a second hard disk, if present, to the connector in the middle. If you are using a control cable without twisted wires, then configure drive C: as drive select 0 and drive D: as drive select 1 by means of the corresponding jumpers on the drives. If you are using a control cable with twisted wires then you may configure both hard disks as drive select 1, as was the case for floppy drives. Because of the exchange of the select signals, the intended disk is always enabled.
31.5.3 The ESDI Interface
ESDI was conceived by Maxtor in 1983 as a powerful and intelligent successor to the ST412/506 interface. The main problem of the long transfer distances between hard disk and data separator was solved, in that ESDI already integrates the data separator on the drive.
ESDI is designed for a transfer rate of up to 24 Mbits/s between drive and controller; typically 10-15 Mbits/s are achieved. ESDI hard disks use the RLL method for data encoding. Furthermore, an ESDI controller is intended for connecting up to seven ESDI drives, and may access hard disks with a maximum of 64 heads in four groups of 16 heads each, as well as a maximum of 4096 cylinders. The controller of its predecessor interface (ST412/506), on the other hand, allowed a maximum of only 16 heads and 1024 cylinders. An ESDI controller may also pass complete commands which are decoded and executed by the drive. On the other hand, the generation of address marks and synchronization patterns and the decoding of the NRZ data into parallel bit data for the PC system bus are carried out by the controller. Thus an ESDI controller is neither a pure controller which takes over all control functions, nor a host adapter which solely establishes a connection to the system bus; instead, it is something like an intermediate product between controller and host adapter. ESDI signals and ESDI commands will not be discussed here because the interface is already outdated.
For the connection of ESDI hard disks, in principle the same rules as for an ST412/506 drive apply. First you must configure the drives, that is, adjust their ESDI address. Because of the different uses of the cable wires and the binary encoding of the drive address on the control cable, no cables with twisted wires are available for ESDI to free you from this drive configuration. With ESDI you always need to assign every drive an ESDI address. However, it is not significant here which plug of the control cable you connect with which ESDI drive.
31.6 Drives with IDE, AT Bus or ATA Interface
Recently, a new hard disk interface standard was established for PCs which is overtaking the ST412/506 standard more and more: the so-called IDE or AT bus interface. IDE is the abbreviation for intelligent drive electronics - an indication that the connected drives are intelligent on their own. With the conventional controller-hard disk combination, the drive itself has only those electronic elements required to drive the motors and gates of the drive. The more extensive control for executing commands (for reading a sector, for example, a head seek, the reading of the encoded signals, the separation of data and clock signal, the transfer into main memory, etc. must be carried out) is taken over by the electronic equipment on a separate adapter, that is, the hard disk controller. Thus the drive itself is rather "stupid". A further disadvantage of this solution is that the still encoded signals must run from the drive via the data cable to the controller to be decoded there. The transfer path degrades the signals; a high data transfer rate between drive and controller fails because of the relatively long signal paths. Further, the exploding market for hard disk drives gave rise to a nearly infinite variety of drive geometries and storage capacities, so that a separate controller (which possibly comes from a third-party manufacturer) is simply overtaxed to serve all hard disk formats. The falling prices for electronic equipment during the past few years, in parallel with a remarkable performance enhancement, gave a simple solution: modern and powerful hard disk drives already integrate the controller, and it is no longer formed by a separate adapter card. The signal paths from disk to controller are thus very short, and the controller can be adapted in an optimized way to the hard disk it actually controls. The IDE and SCSI interfaces follow this method of integrating drive and controller into a single unit.
But SCSI has another philosophy in other aspects; details concerning SCSI are discussed in the next chapter. ESDI, as a middle course, integrates the data separator on the drive but the rest of the controller (for example, the sector buffer and drive control) is still formed on a separate adapter. The IDE interface (discussed in the following sections) lies, in view of its performance, between the conventional solution with a separate controller and an ST412/506 interface to the drive on the one side, and the SCSI and ESDI hard disks as high-end solutions on the other.
At the end of 1984, Compaq initiated the development of the IDE interface. Compaq was looking for an ST506 controller which could be directly mounted onto the drive and connected to the main system by means of simple circuitry. Together with hard disk manufacturers such as Western Digital, Imprimis and Seagate, the AT bus interface arose in a very short time. Too many cooks spoil the broth, and so in the beginning incompatibilities were present everywhere. To take remedial action several system, drive and software manufacturers founded an interest group called CAM (common access method), which elaborated a standard with the name ATA (AT attachment) in March 1989. Besides other properties, the command set for IDE drives was also defined. As well as the 8 commands with several subcommands already present on the AT controller, 19 new commands were added, which mainly refer to the drive control in view of low power consumption. For example, the sleep command for disabling the controller and switching off the drive if no access has been carried out for a while is one of these. Appendix H lists all the necessary and optional commands. Today, all manufacturers adhere to this specification, so that incompatibilities are (nearly) a thing of the past. You may use the terms AT bus, IDE and ATA synonymously.
An extension to the standard with a higher transfer rate, and also to drives with removable mediums (especially CD-ROM), is presently in preparation, and will be called Enhanced IDE; this seems to be a response to the triumph of SCSI. However, more flexibility and higher performance aren’t at all bad for IDE either.
31.6.1 The Physical CPU-Drive Interface
IDE is a further development of the AT controller with an ST506 interface so that the AT bus hard disks orient to the register set and the performance of such hard disks. Thus, IDE is a logical interface between system and hard disk, and accepts high-level commands (for example, read sector or format track). ESDI and ST412/506, on the other hand, are physical interfaces between controller and drive and refer, for example, to the control signals for the drive motors to move the head to a certain track. As with IDE the controller and hard disk form an inseparable unit, it is the job of every manufacturer to design the control of the drive and the transfer of the data. The definition of a physical interface is therefore obsolete.
The physical connection between the AT bus in the PC and the IDE interface of the drives (or better, the controllers on the drives) is established by a so-called host adapter. The motherboard plays the role of host here. The host adapter accommodates only a few buffers and decoder circuits, which are required to connect the IDE drives and the AT system bus. Newer motherboards already integrate these host adapters, otherwise they need a separate adapter card which is inserted into a bus slot. Many host adapters further have a floppy controller so that they are often called an AT bus controller. That's not correct as the controller is located immediately on the board of the drive; the adapter only establishes the connection between the drive and system
bus. To the system and you as a programmer, the AT bus drives appear to be the usual controllers and drives with an ST412/506 interface which had been operating in your PC up to now. Thus AT bus drives can be accessed by the routines of INT 13h implemented in the conventional AT BIOS. Unlike ESDI or SCSI hard disk drives, no BIOS extension is required.
For connecting the drives, only a single 40-wire flat conductor cable is used, with which you connect the host adapter and the drives. The IDE interface can serve a maximum of two drives, one of which must be the master, and the other the slave (adjust the jumper or DIP switch accordingly). The master drive is assigned address 0, the slave address 1. Table 31.10 lists the assignment of the 40 wires and the signals running on them. Pin 20 of the cable is locked to avoid a misinsertion of the plug. Most of the 40 IDE lines are grounded or can be directly connected to the AT system bus. This explains the name AT bus interface. Between host adapter and IDE drive there are only five signals, CS1Fx, CS3Fx, SPSYNC, DASP and PDIAG, which control the IDE drives and are not connected to the AT bus. The first two signals CS1Fx and CS3Fx are chip select signals generated by the host adapter to select the register group with the base address 1f0h
signals are not implemented in many older IDE models manufactured before the ATA
An optional but, nevertheless, important signal is IORDY. With a low level a drive can inform the CPU that it requires additional clock cycles for the current I/O cycle, for example, for reading the sector buffer or transferring the command code. The CPU then inserts wait states. But many IDE drives don't use this signal, and always fix the corresponding line at a high level.
For performance enhancement the IDE standard defines two more signals, which were not to be found on an ST506 controller in the original AT: DMARQ (DMA request) and DMACK (DMA acknowledge). In the AT, the data exchange between main memory and the controller's sector buffer was not carried out via a DMA channel, as was the case on the PC/XT, but by means of the CPU; a so-called programmed I/O (PIO) is executed. If, for example, a sector is to be read, then the sector data read into the sector buffer is repeatedly transferred via the data register into a CPU register by an IN instruction, and from there into main memory by a MOV instruction, until the sector buffer is empty. Thus the AT controller didn't carry out a DMA transfer, and therefore didn't provide any DMA control signals. Because the transfer rate between sector buffer and main memory is much higher with modern and powerful DMA chips (a factor of two can readily be achieved), and because the development of multitasking systems like OS/2 calls for relief from such "silly" data transfer operations, the two optional DMA control signals are
Pin   IDE signal   AT signal      Signal direction
1     RESET        RESET DRV 1)   host→drive
2     GND          -              -
3     DD7          SD7            bidirectional
4     DD8          SD8            bidirectional
5     DD6          SD6            bidirectional
6     DD9          SD9            bidirectional
7     DD5          SD5            bidirectional
8     DD10         SD10           bidirectional
9     DD4          SD4            bidirectional
10    DD11         SD11           bidirectional
11    DD3          SD3            bidirectional
12    DD12         SD12           bidirectional
13    DD2          SD2            bidirectional
14    DD13         SD13           bidirectional
15    DD1          SD1            bidirectional
16    DD14         SD14           bidirectional
17    DD0          SD0            bidirectional
18    DD15         SD15           bidirectional
19    GND          -              -
20    - 2)         -              -
21    DMARQ 3)     DRQx           drive→host
22    GND          -              -
23    DIOW         IOW            host→drive
24    GND          -              -
25    DIOR         IOR            host→drive
26    GND          -              -
27    IORDY 3)     IOCHRDY        drive→host
28    SPSYNC       -              drive→drive
29    DMACK 3)     DACKx          host→drive
30    GND          -              -
31    INTRQ        IRQx           drive→host
32    IOCS16       I/OCS16        drive→host
33    DA1          SA1            host→drive
34    PDIAG        -              drive→drive
35    DA0          SA0            host→drive
36    DA2          SA2            host→drive
37    CS1Fx        -              host→drive
38    CS3Fx        -              host→drive
39    DASP         -              drive→host
40    GND          -              -
DIOW: write data via I/O; DIOR: read data via I/O; IOCS16: 16-bit I/O transfer; CS1Fx: chip select for the register group at base address 1f0h
1) inverted signal of AT bus signal
2) pin locked to prevent incorrect insertion of plug
3) optional
Table 31.10: IDE interface cable layout
implemented in the new IDE standard. Some AT bus hard disks can be instructed by a software command or a jumper to use a DMA channel instead of PIO for exchanging data between sector buffer and main memory. But as the programmer, you must then take into account the preparations for carrying out such a DMA transfer.
The integration of the controllers on the drives makes it possible to integrate more intelligence into the hard disk control. This includes, for example, intelligent retries if an access has failed. It is especially important that many IDE drives carry out an automatic bad-sector remapping. Usually, you can mask defective sectors and cylinders during the course of a low-level formatting process via the defect list, and use error-free alternative sectors and tracks instead. But if, after such a low-level formatting, a sector or track is damaged, the mapping is no longer possible and the sector is lost for data recording. This becomes fiendish, especially in the case of creeping damage. The controller then needs more and more retries to access the sector concerned correctly. Because of the built-in retry routine, the operating system seldom notices anything, as the data is read or written correctly after several retries. But at some time the point is reached where even the retry routine is overtaxed, the sector is completely inaccessible, and all data is lost. Many IDE drives are much more clever: the controller reserves several sectors and tracks of the hard disk for later use during the course of bad-sector remapping. If the controller detects several failed accesses to a sector, but an access finally succeeds, then the data of the sector concerned is written into one of the reserved spare sectors and the bad sector is marked. Afterwards, the controller updates an internal table so that all future accesses to the damaged sector are diverted to the reserved one. The system, or you as its user, doesn't notice this procedure. The IDE drive carries out this remapping without any intervention, in the background.
The emergence of battery-powered laptops and notebooks gave rise to the need for power-saving drives. In a computer, powerful hard disks are one of the most power-consuming components, as they require strong current pulses for fast head seeks, and unlike floppy drives the hard disks are continuously running. Most specialist drives for portable computers can be switched off or disabled by software commands to minimize power consumption. Also, for the IDE hard disks according to the ATA standard such commands are optionally implemented. In the order of decreasing power consumption such hard disks can be operated in the active, idle, standby and sleep modes. Of course, it takes the longest time to "awaken" a drive from sleep into the active state. For this purpose the disk has to be accelerated from rest to the operating rpm, the head must be positioned, and the controller needs to be enabled.
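The internal table used for bad-sector remapping in this section can be pictured with a small model. This is only a sketch under stated assumptions: real drive firmware is proprietary and certainly differs, sectors are identified here by a plain linear number, and all names (remap_table, remap_add, remap_lookup, MAX_REMAPS) are hypothetical.

```c
#include <stddef.h>

#define MAX_REMAPS 8   /* hypothetical size of the internal table */

struct remap_entry { unsigned long bad; unsigned long spare; };

struct remap_table {
    struct remap_entry entry[MAX_REMAPS];
    size_t used;                 /* number of valid entries */
    unsigned long next_spare;    /* first reserved sector not yet handed out */
    unsigned long last_spare;    /* last reserved sector */
};

void remap_init(struct remap_table *t, unsigned long first_spare,
                unsigned long last_spare)
{
    t->used = 0;
    t->next_spare = first_spare;
    t->last_spare = last_spare;
}

/* Every access passes through here: a remapped sector is diverted to
   its spare, all other sectors are returned unchanged. */
unsigned long remap_lookup(const struct remap_table *t, unsigned long sector)
{
    size_t i;
    for (i = 0; i < t->used; i++)
        if (t->entry[i].bad == sector)
            return t->entry[i].spare;
    return sector;
}

/* Called after a sector needed several retries: assign the next spare.
   Returns 0 on success, -1 if no spare sector or table slot is left. */
int remap_add(struct remap_table *t, unsigned long bad)
{
    if (remap_lookup(t, bad) != bad)
        return 0;                          /* already remapped */
    if (t->used == MAX_REMAPS || t->next_spare > t->last_spare)
        return -1;
    t->entry[t->used].bad = bad;
    t->entry[t->used].spare = t->next_spare++;
    t->used++;
    return 0;
}
```

Because the lookup sits in front of every access, the host keeps using the old sector number and never sees the diversion, which is the behaviour described in the text.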
31.6.2 Features of IDE Hard Disk Drives
Intelligent drives with an embedded controller, the most powerful among all IDE hard disks, carry out a translation from logical to physical geometry. The high recording density allows drives with up to 50 sectors per track in the outer zone with a large radius. IDE hard disks run virtually exclusively with an interleave of 1:1. To reduce the average access time of the drives, some hard disks are equipped with a cache memory which accommodates at least two tracks, in most cases. Even if your PC is unable to stand an interleave value of 1:1 as the transfer via the slowly clocked AT bus is not fast enough, this is not a disaster. Because of the 1:1 interleave, the data is read very quickly into the controller cache which is acting as a buffer. The CPU fetches the data from the cache with the maximum transfer speed of the AT bus. An interleave value which is adjusted too low, therefore, has no unfavourable consequences as it would do without the cache.
For high-capacity IDE hard disks, the RLL encoding method is mainly used; simpler ones may also use the MFM method. High-performance IDE drives enable data transfer rates between drive and main memory of up to 5 Mbytes/s; a value which comes near the top of the practical values of SCSI. On average, transfer rates of more than 3 Mbytes/s are realistic for usual IDE drives. Thus they are located between the older ST412/506 controllers and the high-end SCSI solutions. The simpler interface electronics of the IDE host adapter and the support of the AT bus drives by the AT's on-board BIOS make it appear that the IDE hard disks are a rather good solution for personal computers in the region of medium performance.
An IDE interface manages a maximum of two drives. As long as the connected drive meets the IDE interface specification, the internal structure of the drive is insignificant. For example, it is possible to connect a powerful optical drive by means of an IDE interface. Usually, one would select an SCSI solution as this is more flexible in a number of ways than the AT bus. One restriction of IDE is the maximum cable length of 18" (46 cm); some manufacturers also allow up to 24" (61 cm). For larger systems which occupy several cabinets, this is too little, but for a personal computer even in a large tower case it is sufficient. These values are part of the IDE standard. Thus, it is not impossible that the cables may be longer; but the IDE standard does not guarantee this.
31.6.3 The AT Task File
The CPU accesses the controller of the IDE hard disk by means of several data and control registers, commonly called the AT task file. The address and assignment of these registers is identical to that of the hard disk controller with an ST506 interface in the IBM AT, but note that the registers are not compatible with the XT task file, or other interfaces such as ESDI or SCSI. The AT task file is divided into two register groups with port base addresses 1f0h and 3f0h. The following sections describe the registers of the AT task file and their meaning in more detail. Table 31.11 lists all the registers concerned.
The data register, which is the only 16-bit register of the AT task file, can be read or written by the CPU to transfer data between main memory and the controller. The original AT interface supported only programmed input/output via registers and ports, but no data transfer by means of DMA. The reading and writing is carried out in units of 16 bits; only the ECC bytes during the course of a read-long command are passed byte by byte. In this case, you must use the low-order byte of the register. Note that the data in the data register is only valid if the DRQ bit in the status register is set.
The CPU can only read the error register; it contains error information concerning the last active command if the ERR bit in the status register is set and the BSY bit in the status register is cleared; otherwise, the entries in the error register are not defined. Note that the meaning of this register differs for the diagnostics command. Figure 31.16 shows the structure of the error register.
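The programmed I/O transfer through the data register can be sketched as follows. This is an illustration only, not the BIOS routine: the port accesses are passed in as function pointers so that the logic can be exercised without hardware, and the simulated drive (sim_load_sector, sim_in8, sim_in16) as well as the poll limit are assumptions of this sketch. The mask values assume that BSY is bit 7, RDY bit 6 and DRQ bit 3 of the status register at 1f7h. On a real AT the two callbacks would wrap the IN instruction for ports 1f7h and 1f0h.

```c
#include <stddef.h>

#define PORT_DATA        0x1f0
#define PORT_STATUS      0x1f7
#define ST_BSY           0x80   /* assumed: bit 7 of the status register */
#define ST_RDY           0x40   /* assumed: bit 6 */
#define ST_DRQ           0x08   /* assumed: bit 3 */
#define WORDS_PER_SECTOR 256    /* 512 bytes in units of 16 bits */

typedef unsigned char  (*in8_fn)(unsigned short port);
typedef unsigned short (*in16_fn)(unsigned short port);

/* Wait until the drive is not busy and signals DRQ, then fetch one
   sector word by word. Returns 0 on success, -1 after max_polls polls. */
int pio_read_sector(in8_fn in8, in16_fn in16, unsigned short *dest,
                    long max_polls)
{
    unsigned char status;
    size_t i;

    for (;;) {
        status = in8(PORT_STATUS);
        if (!(status & ST_BSY) && (status & ST_DRQ))
            break;                      /* data register is now valid */
        if (--max_polls <= 0)
            return -1;
    }
    for (i = 0; i < WORDS_PER_SECTOR; i++)
        dest[i] = in16(PORT_DATA);
    return 0;
}

/* --- hypothetical simulated drive, for trying the routine out --- */
static unsigned short sim_buffer[WORDS_PER_SECTOR];
static size_t sim_pos;
static int sim_busy_polls;

void sim_load_sector(int busy_polls)
{
    size_t i;
    for (i = 0; i < WORDS_PER_SECTOR; i++)
        sim_buffer[i] = (unsigned short)i;
    sim_pos = 0;
    sim_busy_polls = busy_polls;       /* report BSY for this many polls */
}

unsigned char sim_in8(unsigned short port)
{
    if (port != PORT_STATUS)
        return 0xff;
    if (sim_busy_polls > 0) {
        sim_busy_polls--;
        return ST_BSY;
    }
    return (unsigned char)(sim_pos < WORDS_PER_SECTOR ? (ST_RDY | ST_DRQ) : ST_RDY);
}

unsigned short sim_in16(unsigned short port)
{
    if (port != PORT_DATA || sim_pos >= WORDS_PER_SECTOR)
        return 0xffff;
    return sim_buffer[sim_pos++];
}
```

The poll limit is a design choice of the sketch; without it, a drive that never raises DRQ would hang the loop.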
Register                     Address   Width [bit]   Read (R) Write (W)
data register                1f0h      16            R/W
error register               1f1h      8             R
precompensation              1f1h      8             W
sector count                 1f2h      8             R/W
sector number                1f3h      8             R/W
cylinder LSB                 1f4h      8             R/W
cylinder MSB                 1f5h      8             R/W
drive/head                   1f6h      8             R/W
status register              1f7h      8             R
command register             1f7h      8             W
alternate status register    3f6h      8             R
digital output register      3f6h      8             W
drive address                3f7h      8             R
Table 31.11: The AT task file
BBK: 1=sector marked as bad by host, 0=no error
UNC: 1=uncorrectable data error, 0=no or correctable data error
NID: 1=ID mark not found, 0=no error
ABT: command abort; 1=command aborted, 0=command executed
NTO: 1=track 0 not found, 0=no error
NDM: 1=data address mark not found, 0=no error
X: unused
Enhanced IDE only:
MC: 1=medium changed, 0=medium not changed
MCR: 1=medium change required, 0=no medium change required
Figure 31.16: Error register (1f1h)
A set NDM bit indicates that the controller hasn't found a data address mark on the data carrier. If NTO is set this means that after a corresponding command the drive was unable to position the read/write head above track 0. If the controller had to abort execution of the active command because of an error, the ABT bit is set. If the NID bit is equal to 1, the controller was unable to detect the ID address mark concerned on the data carrier. A set UNC bit shows that an uncorrectable data error has occurred; the data is invalid even after applying the ECC code. If BBK is equal to 1 then the CPU has earlier marked the sector concerned as bad; it can no longer be accessed. For supporting drives with removable volumes, enhanced IDE implements the (formerly reserved) MC and MCR bits. A set MC bit indicates that the volume in the drive has been changed, thus it corresponds to the disk change bit of the floppies. A set MCR bit shows that the user has requested a medium change, for example, by operating the eject key. The system must complete all running accesses and send a pulse or command to the drive actually to eject the volume.
The precompensation register (1f1h) is only implemented for compatibility reasons with the AT task file of the original AT. All data passed by the CPU is ignored. The IDE hard disk drives with an embedded controller process the precompensation internally without any intervention by the CPU.
The sector count register (1f2h) can be read and written by the CPU to define the number of sectors to be read, written or verified. If you pass the register a value of 0, then the hard disk carries out the command concerned for 256 sectors, and not for 0 sectors. After every transfer of a sector from or into main memory, the register value is decreased by one. Thus the register's contents, which can be read by an IN instruction, indicate the number of sectors still to be read, written or verified. Also, during the course of a formatting process, the controller decrements the register value. Note that the meaning of the register differs somewhat for the command set drive parameters.
The sector number register (1f3h) specifies the start sector for carrying out a command with disk access. After processing every sector the register contents are updated according to the executed command. Thus the register always indicates the last processed sector independently of whether the controller was able to complete the concerned command successfully or not.
The two registers cylinder MSB (1f5h) and cylinder LSB (1f4h) contain the most-significant (MSB) and least-significant byte (LSB) of the 10-bit cylinder number. The two most-significant bits are held by the register cylinder MSB, the eight least-significant ones by the register cylinder LSB. The six high-order bits of register cylinder MSB are ignored, thus the registers are able to represent cylinder numbers between 0 and 1023, as is also the case for the original AT. Because many IDE hard disks carry out a translation, the physical cylinders of the hard disk are not limited to this range. The physical drive geometry is then converted into a logical one, which has a maximum cylinder number of 1023. After processing of each sector, the contents of both registers are updated, thus the registers always indicate the current cylinder number. Some IDE drives, and especially hard disks corresponding to the enhanced IDE standard, also use the six high-order bits in the MSB cylinder register 1f5h. Therefore, a total of 65 536 cylinders (numbers 0 to 65 535) can be addressed at the most.
By means of the register drive/head (1f6h) you can determine the drive for which the command concerned is to be carried out. Furthermore, head defines the start head with which the disk access begins. Figure 31.17 shows the format of this register. The three most-significant bits always have the value 101b. The DRV bit defines the addressed drive, and the bits HD3-HD0 specify the number of that head with which the command concerned starts to execute. A maximum of 16 heads can therefore be accessed. IDE drives which can carry out a logical block addressing (LBA), additionally implement the L bit. If L equals 1, LBA is enabled for the present access.
The status register (1f7h) can only be read by the CPU, and contains status information concerning the last active command. The controller updates the status register after every command, or if an error occurs. Also, during the course of a data transfer between main memory and controller,
the register is updated to carry out handshaking. If the CPU reads the status register, any pending interrupt request (via IRQ14 in the PC) is cancelled automatically. Note that all bits of this register except BSY and all registers of the AT task file are invalid if the BSY bit is set in the status register. Figure 31.18 shows the structure of the register.
Bit:     7   6   5   4     3     2     1     0
         1   0   1   DRV   HD3   HD2   HD1   HD0
DRV: drive; 1=slave, 0=master
HD3-HD0: head number (binary); 0000=head 0, 0001=head 1, 0010=head 2, ..., 1111=head 15
Enhanced IDE only:
L: 1=LBA mode, 0=CHS mode
Figure 31.17: Drive/head register (1f6h).
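How the parameter registers 1f2h to 1f6h are filled for a CHS access can be sketched in a few lines of C. This is an illustration, not BIOS code: the structure chs_params and the function names are hypothetical, and the bit positions in the drive/head byte (pattern 101b in bits 7 to 5, DRV in bit 4, HD3-HD0 in bits 3 to 0) are an assumption read off the description of the drive/head register. Only the original 10-bit cylinder range is handled.

```c
/* Values to be written to the parameter registers of the AT task file. */
struct chs_params {
    unsigned char sector_count;    /* 1f2h, the value 0 stands for 256 sectors */
    unsigned char sector_number;   /* 1f3h */
    unsigned char cyl_lsb;         /* 1f4h, eight low-order cylinder bits */
    unsigned char cyl_msb;         /* 1f5h, two high-order cylinder bits */
    unsigned char drive_head;      /* 1f6h */
};

/* Returns 0 on success, -1 if a value does not fit its register. */
int encode_chs(unsigned drive, unsigned cylinder, unsigned head,
               unsigned sector, unsigned count, struct chs_params *out)
{
    if (drive > 1 || cylinder > 1023 || head > 15 ||
        sector > 255 || count < 1 || count > 256)
        return -1;

    out->sector_count  = (unsigned char)(count & 0xff);   /* 256 becomes 0 */
    out->sector_number = (unsigned char)sector;
    out->cyl_lsb       = (unsigned char)(cylinder & 0xff);
    out->cyl_msb       = (unsigned char)((cylinder >> 8) & 0x03);
    out->drive_head    = (unsigned char)(0xA0 | (drive << 4) | head);
    return 0;
}

/* Interpret a sector count register value read back from the drive. */
unsigned sectors_requested(unsigned char reg)
{
    return reg == 0 ? 256u : reg;
}
```

For cylinder 951 the function yields 183 for the LSB register and 3 for the MSB register, in line with the 8 low-order cylinder bits used in the function 06h example of Section 31.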
Bit   7    6    5    4    3    2     1    0
      BSY  RDY  WFT  SKC  DRQ  CORR  IDX  ERR

BSY:  busy; 1=drive is busy, 0=drive not busy
RDY:  ready; 1=drive is ready, 0=drive not ready
WFT:  write fault; 1=write fault, 0=no write fault
SKC:  head positioning (seek); 1=complete, 0=in progress
DRQ:  data; 1=can be transferred, 0=no data access possible
CORR: correctable data error; 1=data error, 0=no data error
IDX:  disk index; 1=disk index has just passed, 0=disk index did not pass
ERR:  error; 1=error register contains error information, 0=error register does not contain error information

Figure 31.18: Status register (1f7h).
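The bit assignment of the status register can be written down as a set of masks. The following is only a minimal sketch: the mask names and the two helper functions are hypothetical and chosen for illustration, and the status byte is passed in as a plain value, so no port access takes place.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit masks of the status register (1f7h), bit 7 down to bit 0. */
enum {
    ST_BSY  = 0x80,  /* drive is busy */
    ST_RDY  = 0x40,  /* drive is ready */
    ST_WFT  = 0x20,  /* write fault */
    ST_SKC  = 0x10,  /* seek complete */
    ST_DRQ  = 0x08,  /* data can be transferred */
    ST_CORR = 0x04,  /* data error was corrected */
    ST_IDX  = 0x02,  /* disk index has just passed */
    ST_ERR  = 0x01   /* error register holds error information */
};

/* While BSY is set, all other bits of the status register are invalid. */
static bool status_valid(uint8_t status)
{
    return (status & ST_BSY) == 0;
}

/* Hypothetical helper: may the data register be read or written now? */
static bool can_transfer(uint8_t status)
{
    return status_valid(status) && (status & ST_DRQ) != 0;
}

/* Hypothetical helper: is the drive ready to accept a command? */
static bool ready_for_command(uint8_t status)
{
    return status_valid(status) && (status & ST_RDY) != 0;
}
```

A status byte of 58h (RDY, SKC and DRQ set) therefore permits a data transfer, while d8h does not, because the set BSY bit invalidates all other bits.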
The BSY bit is set by the drive to indicate that it is currently executing a command. If BSY is set then no registers may be accessed except the digital output register. In most cases you would only get invalid information; under some circumstances you would disturb the execution of the active command. A set RDY bit shows that the drive has reached its operating speed and is ready to accept commands. If the speed variations of the spindle motor are beyond the tolerable range, for example because of an insufficient supply voltage, then the controller sets the RDY bit to 0. A set WFT bit indicates that the controller has detected a write fault. If the SKC bit is equal to 1, then the drive has completed the explicit or implicit head positioning. The drive clears the SKC bit immediately before a head seek. A set DRQ bit shows that the data register is ready for outputting or accepting data. If DRQ is equal to 0 then you may neither read data from the data register nor write data into it. The controller sets the CORR bit to inform the CPU that it has corrected data by means of the ECC bytes. Note that this error condition doesn't abort the reading of several sectors. When the beginning of the track passes below the read/write head of the drive, the controller sets the IDX bit for a short time. If the ERR bit is set, the error register contains additional error information.

The command register (1f7h) passes command codes; the CPU is only able to write to it. The command register is located at the same port address as the read-only status register. The original AT has eight commands in total, with several variations. The new IDE standard additionally defines some optional commands, but I want to restrict the discussion to the required command set which is already implemented on the IBM AT. The execution of a command starts immediately after you have written the command byte into the command register. Thus you have to pass all other required data to the corresponding registers before you start the command execution by writing the command byte. Table 31.12 lists the required IDE commands as well as the parameter registers that you must prepare for the corresponding commands.
Commands: calibrate drive, read sector, write sector, verify sector, format track, seek head, diagnostics, set drive parameters

Parameter registers:
SC: sector count
SN: sector number
CY: cylinder MSB and LSB
DR: drive (in register drive/head)
HD: head (in register drive/head)
xx: parameter necessary for corresponding command

Table 31.12: Command parameter registers
Besides the status register at port address 1f7h, an alternate status register is implemented at I/O address 3f6h. It has the same structure as the normal status register, and contains the same information. The only difference between them is that reading the alternate status register doesn't cancel a pending interrupt request via IRQ14. At the same port address 3f6h you also find the digital output register DOR; the CPU is only able to write to it. The DOR defines the controller's behaviour; its structure is shown in Figure 31.19.
SRST: system reset; 1=reset all connected drives, 0=accept command
IEN: interrupt enable; 1=IRQ14 always masked, 0=interrupt after every command

Figure 31.19: Digital output register (3f6h).
If you set the SRST bit you issue a reset for all connected drives. The reset state remains active as long as the bit is equal to 1. Once you clear the SRST bit again, the reset drives can accept a command. With the IEN bit you control the interrupt requests of the drives to the CPU. If IEN is cleared (that is, equal to 0) then an interrupt is issued via IRQ14 after every command carried out for one sector, or in advance of entering the result phase. If you set IEN to 1 then IRQ14 is always masked and the drives are unable to issue an interrupt. In this case, the CPU can only supervise the controller by polling. With the read-only drive address register (3f7h) you may determine which drive and which head are currently active and selected. Figure 31.20 shows the structure of this register.
WTGT: write gate; 0=write gate open, 1=write gate closed
HS3-HS0: currently active head as one's complement
DS1, DS0: currently selected drive

Figure 31.20: Drive address register (3f7h).
If the WTGT bit is cleared (that is, equal to 0), the write gate of the controller is open and the read/write head is currently writing data onto disk. The bits HS3-HS0 indicate the currently active head as one's complement. Similarly, the bits DS1 and DS0 determine the currently selected drive.
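Decoding the drive address register mainly means undoing the one's complement. The sketch below rests on assumptions that the text does not spell out: WTGT is taken to be bit 6, HS3-HS0 bits 5 to 2, and DS1 and DS0 bits 1 and 0, with a drive-select bit equal to 0 meaning that the drive is selected. The function names are hypothetical, and the register value is passed in rather than read from port 3f7h.

```c
#include <stdint.h>

/* Assumed layout of the drive address register (3f7h):
   bit 6 = WTGT, bits 5..2 = HS3-HS0, bit 1 = DS1, bit 0 = DS0. */

/* WTGT equal to 0 means the write gate is open (a write is in progress). */
static int write_gate_open(uint8_t reg)
{
    return (reg & 0x40) == 0;
}

/* The head number is stored as one's complement, so invert the 4-bit field. */
static unsigned active_head(uint8_t reg)
{
    return (~((unsigned)reg >> 2)) & 0x0fu;
}

/* Assumption: the drive-select bits are active low as well.
   Returns 0 or 1 for the selected drive, or -1 if neither bit is active. */
static int selected_drive(uint8_t reg)
{
    if ((reg & 0x01) == 0) return 0;
    if ((reg & 0x02) == 0) return 1;
    return -1;
}
```

Under these assumptions a register value of 72h decodes to drive 0, head 3, write gate closed, and 3dh decodes to drive 1, head 0, write gate open.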
31.6.4 IDE Interface Programming and Command Phases

The programming and execution of the commands for an IDE interface proceed in three phases, similar to a floppy controller or other hard disk interfaces:

- Command phase: the CPU prepares the parameter registers and passes the command code to start the execution.
- Data phase: for commands involving disk access, the drive positions the read/write heads and, where required, transfers the data between main memory and hard disk.
- Result phase: the controller provides status information for the executed command in the corresponding registers, and issues a hardware interrupt via IRQ14 (corresponding to INT 76h).
The controller's registers are written and read by the CPU via ports, but unlike the PC/XT, the IBM AT and all compatibles don't use the DMA controller for transferring the sector and format data between main memory and controller. Instead, this data transfer is also carried out by programmed I/O via CPU and data register. This means that the CPU writes sector and format data into, or reads them from, the data register in units of 16 bits. Only the ECC bytes are read and written in 8-bit portions via the low-order byte of the data register. To synchronize CPU and controller for a data exchange, the controller issues a hardware interrupt at various times via IRQ14:

- Read sector: the controller always activates IRQ14 when the CPU is able to read a sector, possibly together with the ECC bytes, from the sector buffer. Unlike all other commands, this command doesn't issue an interrupt at the beginning of the result phase, thus the number of hardware interrupts is the same as the number of read sectors.
- Write sector: the controller always activates IRQ14 when it expects sector data from the CPU. Note that the first sector is transferred immediately after issuing the command, and the controller doesn't issue an interrupt for this purpose. Furthermore, the controller activates, via IRQ14, a hardware interrupt at the beginning of the result phase. Thus the number of hardware interrupts coincides with the number of written sectors.
- All other commands: the controller issues a hardware interrupt via IRQ14 at the beginning of the result phase.
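The three interrupt rules above can be summarized as a small counting function. This is only an illustrative sketch: the enumeration and the function name are hypothetical and merely restate the rules of the list, without touching the controller.

```c
/* Hypothetical command classes, following the three cases of the list. */
enum ide_cmd_class { CMD_READ_SECTOR, CMD_WRITE_SECTOR, CMD_OTHER };

/* Number of IRQ14 interrupts raised for one command that handles
   the given number of sectors. */
static unsigned irq14_count(enum ide_cmd_class cmd, unsigned sectors)
{
    switch (cmd) {
    case CMD_READ_SECTOR:
        /* one interrupt per readable sector, none for the result phase */
        return sectors;
    case CMD_WRITE_SECTOR:
        /* no interrupt for the first sector, one for every further
           sector, plus one at the beginning of the result phase */
        return (sectors - 1u) + 1u;
    default:
        /* only the interrupt at the beginning of the result phase */
        return 1u;
    }
}
```

For a command writing four sectors, the handler is thus entered four times: three times to deliver sectors two to four, and once for the result phase.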
The interrupt handler for INT 76h, corresponding to IRQ14 in the PC, must therefore be able to determine whether the controller wants to output data, is expecting data, or whether an interrupt has occurred which indicates the beginning of a result phase. If you intend to program such a handler, use the status and error registers to determine the interrupt source. The IRQ14 request is cancelled as soon as the CPU reads the status register (1f7h). If IRQ14 is to remain active, you must read the status information via the alternate status register (3f6h). Note for your programming that the controller of the addressed drive starts command execution immediately after the CPU has written the command code into the command register. Thus you have to load all necessary parameter registers with the required values before you start command execution by passing the command code.

Appendix H lists all required controller commands for the IDE interface, and the three optional commands for identifying the controller as well as reading and writing the sector buffer. As an example, one command is discussed here in more detail: write four sectors beginning with cylinder 167, head 3, sector 7 with ECC bytes. The format for this command is shown in Figure 31.21.

AT task file register    Bit 7   6   5   4    3    2    1    0
Command (1f7h)               0   0   1   1    0    0    L    R
Sector count (1f2h)          number of sectors to write
Sector number (1f3h)         S7  S6  S5  S4   S3   S2   S1   S0
Cylinder LSB (1f4h)          C7  C6  C5  C4   C3   C2   C1   C0
Cylinder MSB (1f5h)          0   0   0   0    0    0    C9   C8
Drive/head (1f6h)            1   0   1   DRV  HD3  HD2  HD1  HD0

L: long; 1=with ECC bytes, 0=without ECC bytes
R: retry; 1=carry out retry procedure, 0=no retry procedure
sector count: number of sectors to be written onto disk
S7-S0: sector number (start sector)
C9-C0: cylinder number (start cylinder)
DRV: drive; 1=drive 1, 0=drive 0
HD3-HD0: head; 0000=head 0, ..., 1110=head 14, 1111=head 15

Figure 31.21: Write sector command.

If the L bit is set then the four ECC bytes are also supplied by the CPU and not generated internally by the controller. The ECC logic then doesn't carry out an ECC check. For a single sector you therefore have to pass 516 bytes. If L is equal to 0 then this means a normal write command. The CPU only passes the 512 data bytes, and the controller generates the four ECC bytes internally and writes them, together with the data bytes, onto disk. The R bit controls the internal retry logic of the controller. If R is set, then the controller carries out a built-in retry procedure if it detects a data or address error during the course of the command execution. Only if these retries are also unsuccessful does the controller abort the command and return an error code. If R is cleared, the controller aborts the command immediately without any retry if an error has occurred.

With sector count you may determine the number of sectors to be written onto disk. Possible values are between 0 and 255; a value of 0 writes 256 sectors onto disk. The sector number bits S7-S0 indicate the number of the start sector to be written first. If the number of sectors to write is larger than 1, the controller automatically counts up the sector number until it detects the end of the track. Afterwards, it proceeds with the next head, and if necessary with the next cylinder, until all sectors have been written or an error occurs. The values C9-C0 of the cylinder number define the start cylinder for the write process. The two bits C9 and C8 represent the two most significant bits of the 10-bit cylinder number. Using DRV you can select one of the two drives, and with HD3-HD0 the head of the drive for which the command is to be carried out.

Immediately after the command byte has been written, the controller starts the command execution, that is, the data phase. It sets the BSY bit in the status register to indicate that it is decoding the command and preparing the sector buffer for accommodating the 512 data bytes, as well as the four ECC bytes. When this is finished, the controller clears the BSY bit and sets the DRQ bit in the status register to inform the CPU that it now expects the sector data. The CPU first transfers the 512 data bytes word by word, and afterwards the four ECC bytes byte by byte. Once all 516 sector bytes have been passed, the controller sets the BSY bit again and clears the DRQ bit. Now it begins to write the data onto disk.
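The register values for the example command (four sectors starting at cylinder 167, head 3, sector 7, drive 0, with L and R set) follow directly from the bit layout of the write sector command. The following sketch packs them into a structure; the structure and the function name are hypothetical and only illustrate the encoding, and nothing is written to the ports.

```c
#include <stdint.h>

/* Hypothetical container for the AT task file values of one command. */
struct task_file {
    uint8_t sector_count;   /* 1f2h */
    uint8_t sector_number;  /* 1f3h */
    uint8_t cyl_lsb;        /* 1f4h */
    uint8_t cyl_msb;        /* 1f5h */
    uint8_t drive_head;     /* 1f6h */
    uint8_t command;        /* 1f7h */
};

/* Encode a write sector command according to the bit layout above. */
static struct task_file write_sector_task_file(unsigned cylinder, unsigned head,
                                               unsigned sector, unsigned sectors,
                                               unsigned drive, unsigned l_bit,
                                               unsigned r_bit)
{
    struct task_file tf;
    tf.sector_count  = (uint8_t)(sectors & 0xffu);         /* 256 is passed as 0 */
    tf.sector_number = (uint8_t)(sector & 0xffu);
    tf.cyl_lsb       = (uint8_t)(cylinder & 0xffu);        /* C7-C0 */
    tf.cyl_msb       = (uint8_t)((cylinder >> 8) & 0x03u); /* C9, C8 */
    tf.drive_head    = (uint8_t)(0xa0u | ((drive & 1u) << 4) | (head & 0x0fu));
    tf.command       = (uint8_t)(0x30u | ((l_bit & 1u) << 1) | (r_bit & 1u));
    return tf;
}
```

Calling write_sector_task_file(167, 3, 7, 4, 0, 1, 1) yields 04h, 07h, a7h, 00h, a3h and 33h for the registers 1f2h to 1f7h.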
Once the first sector has been written, the controller issues an interrupt 76h via IRQ14. The handler concerned now transfers the 516 bytes of the following sector via the data register to the controller in the same manner as described above. This process is repeated until all four sectors, together with their ECC bytes, have been written.

Example:
Write four sectors starting with cylinder 167, head 3, sector 7 together with ECC bytes onto master drive (language: Microsoft C 5.10).

    #include <conio.h>                       /* inp(), outp(), outpw() */
    #include <dos.h>                         /* _dos_getvect(), _dos_setvect() */

    unsigned int word_buffer[1024];          /* 4 sectors with 256 words each */
    unsigned char byte_buffer[16];           /* 4 sectors with 4 ECC bytes each */
    unsigned int *word_pointer;
    unsigned char *byte_pointer;
    int int_count;

    void interrupt far new_irq14();          /* handler for IRQ14 */

    main()
    { int word_count, byte_count;
      void (interrupt far *old_irq14)();

      word_pointer = word_buffer;            /* initialize */
      byte_pointer = byte_buffer;            /* pointer */
      init_buffers();                        /* initialize buffer */
      old_irq14 = _dos_getvect(0x76);        /* set new interrupt */
      _dos_setvect(0x76, new_irq14);         /* for IRQ14 */

      while ((inp(0x1f7) & 0x80) == 0x80);   /* wait until BSY in status register is cleared */

      outp(0x1f2, 0x04);                     /* register sector count: 4 sectors */
      outp(0x1f3, 0x07);                     /* register sector number: 7 */
      outp(0x1f4, 0xa7);                     /* register cylinder LSB: 167 */
      outp(0x1f5, 0x00);                     /* register cylinder MSB: 0 */
      outp(0x1f6, 0xa3);                     /* register drive/head: DRV=0, head=3 */
      outp(0x1f7, 0x33);                     /* register command: opcode=001100, L=1, R=1 */

      while ((inp(0x1f7) & 0x88) != 0x08);   /* wait until BSY in status register is cleared and DRQ is set */

      for (word_count = 0; word_count < 256; word_count++, word_pointer++) {
        outpw(0x1f0, *word_pointer);         /* transfer 256 words=512 bytes */
      }
      for (byte_count = 0; byte_count < 4; byte_count++, byte_pointer++) {
        outp(0x1f0, *byte_pointer);          /* transfer 4 ECC bytes */
      }
      ...

... Ethernet lines, thus two Ethernet LANs joined together. There are two types of repeater: local and remote. The maximum distance between two lines connected via a local repeater is 100 m; by using a remote repeater, on the other hand, a distance of up to 1000 m is possible. In the latter case, fibre optic cables are necessary for the repeater connections. Furthermore, a maximum of two such repeaters is possible (or stated another way: three Ethernet lines or yellow cables). For this reason, an Ethernet LAN is restricted to 300 stations and 1700 m, or 2500 m with a remote repeater. Only through the use of a gateway is a connection to another LAN possible (Ethernet or otherwise). In this way, in principle, networking knows no limits, except that at some point the distribution and transfer of data would collapse.
33.3.2 CheaperNet or Thin Ethernet

This Ethernet LAN uses a thinner and, therefore, cheaper BNC cable. Owing to its simpler and more cost-effective installation, it is most frequently used inside buildings. Here, the connection to a station is not achieved through a transceiver, but by a simple BNC T-connector which is directly connected to the adapter card. It is not possible to use extension cables (do not, under any circumstances, attempt this!). The thinner cable only permits an Ethernet line of up to 185 m (as opposed to 500 m in thick Ethernet), and a maximum of only 30 stations (instead of 100) can be connected together. There must also be a minimum of 0.5 m between two T-connectors. Apart from this, CheaperNet is identical to thick Ethernet with respect to transfer performance and functionality. In addition to the previously described 15-pin D-sub socket located at the rear, most Ethernet LAN adapters also have a connection for the BNC T-connector, and so can be used with both thick and thin Ethernet.
33.3.3 Fast Ethernet

The newest development in the area of Ethernet is Fast Ethernet. The main reason for this new standard is the ten times higher transfer rate of 100 Mbits/s. This leaves token ring far behind, and is equal to FDDI, although the dimensions of a Fast Ethernet LAN are much more limited (FDDI allows up to 200 km compared to just a few kilometres for Fast Ethernet). One disadvantage for the upgrade of an Ethernet LAN is that completely different cabling is required because of the increased signal frequencies. Thus many adapters combine Ethernet (with 10 Mbits/s) and Fast Ethernet. Depending on the adapter to which a connection is established, the standard Ethernet or the new Fast Ethernet interface is enabled. Fast Ethernet LANs require hubs.
33.4 Token Ring

As already explained, the IBM token ring uses a logical ring structure, but also, at least in part, a physical star topology. For this, so-called ring line distributors or concentrators (sometimes also known as Multi Station Access Units, or MSAUs) are connected together in a physical ring. The stations themselves are then connected to these distributors in a physical star-shaped topology, but the logical ring structure remains intact. This concept, which may seem somewhat bewildering at first, is shown in Figure 33.2. As you can see, at first only the ring line distributors (concentrators) are connected together in the form of a ring: the output RO (Ring Output) of a concentrator is connected to the input RI (Ring Input) of the next concentrator. From each of these concentrators there are lines to the associated stations. This physically conforms to a star topology, because each station is exclusively connected to the concentrator and not to another station. Nevertheless, through simple but intelligent switching within the concentrator, a ring structure is achieved.

Every concentrator contains eight connections for stations, not all of which need be used. Usually, a switch (mechanical relay) for every connection is located within the concentrator which, by means of a signal, knows whether a station is connected to the corresponding socket and if it is switched on. In a true ring structure, a switched-off station would break the continuity of the ring. This problem is solved by the token ring concentrators in an intelligent way: a socket which is not connected to a switched-on station is simply short-circuited. You can see this in Figure 33.2. Even if the line between the concentrator and the station is damaged and no data can be transferred, the ring line distributor interprets this as a missing station and connects the corresponding contacts together. For this purpose, the applicable station checks the connection to the concentrator. Only if the connection is fully functioning does it send a so-called phantom voltage, which activates a relay in the concentrator and switches in the loop to the station. In this way, the station is integrated into the ring. Thus, the ring line distributor breaks open the star-shaped ring topology and permits a new station to be flexibly integrated into the ring, or an inactive station to be removed from the ring. Even though the connections from the concentrator to the stations are laid out in the form of a star, both the token and the data are nevertheless transported around the ring. A further advantage gained by using a concentrator is that stations can also be connected or removed while the system is in operation; it is not necessary to bring down and then restart the network.
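The bypass behaviour of a concentrator can be modelled in a few lines. In the following sketch, which is only an illustration with hypothetical names, each of the eight sockets carries a flag that says whether the station supplies its phantom voltage; sockets without phantom voltage are short-circuited and therefore skipped when the ring order is built.

```c
#include <stddef.h>

#define PORTS_PER_CONCENTRATOR 8

/* Builds the order in which the token passes the sockets of one
   concentrator. phantom_voltage[p] is nonzero if the station at socket p
   is connected, switched on and has tested its line successfully.
   The indices of the stations switched into the ring are stored in
   ring[]; the return value is their number. */
static size_t build_ring(const int phantom_voltage[], size_t ports,
                         size_t ring[])
{
    size_t n = 0;
    for (size_t p = 0; p < ports; p++) {
        if (phantom_voltage[p]) {
            ring[n++] = p;   /* relay switches the loop to the station in */
        }
        /* otherwise the socket stays short-circuited and is passed by */
    }
    return n;
}
```

With stations active at sockets 0, 2, 3 and 7, the token visits exactly these four sockets in this order; switching station 2 off simply removes it from the list without interrupting the ring.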
Figure 33.2 (labels: station; switched off or connection interrupted)
The lobe cable, which is directly connected to the token ring adapter of a PC, can be a maximum of 2.5 m in length. The distance between the PC and the concentrator, on the other hand, is limited to a maximum of 100 m. For the remaining 97.5 m, for example from the ring line distributor to a network socket, a better protected cable must be used. The lobe cable only represents the connection between the socket and the PC itself. Without further amplification, the distance between two ring line distributors must not exceed 200 m. Using line amplifiers, the maximum distance can be increased to 750 m. If low-damping fibre optic cables are used as the transfer medium, the concentrators can even be situated up to 2 km apart. A token ring can integrate up to 33 ring line distributors; in total, a maximum of 264 stations can be networked together in a ring.

The IBM token ring is currently available with transfer rates of 4 Mbits/s and 16 Mbits/s. This is comparable to standard Ethernet. However, with its collision-free operation and more equal handling of all stations, the 16 Mbits/s token ring clearly leaves Ethernet behind in actual operation. Note, however, that the bits for the token, receiver and sender address, etc. are also contained within the 4 Mbits/s and 16 Mbits/s. The effective useful data rates are then approximately 0.5 Mbytes/s and 1.8 Mbytes/s, respectively.
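The station limit is simply the product of the number of ring line distributors and their eight sockets, and the line rates give an upper bound for the useful data rate. The short sketch below only restates this arithmetic; the names are hypothetical, and 1 Mbit is taken as 1000 kbits.

```c
enum { MAX_DISTRIBUTORS = 33, PORTS_PER_DISTRIBUTOR = 8 };

/* Maximum number of stations in one ring. */
static unsigned max_stations(void)
{
    return MAX_DISTRIBUTORS * PORTS_PER_DISTRIBUTOR;
}

/* Raw line rate in kbytes/s for a rate given in Mbits/s. Token and
   address bits are included, so useful data can never exceed this. */
static unsigned raw_kbytes_per_second(unsigned mbits_per_second)
{
    return mbits_per_second * 1000u / 8u;
}
```

This gives 264 stations, and raw rates of 500 kbytes/s and 2000 kbytes/s; the quoted useful rate of about 1.8 Mbytes/s for the 16 Mbits/s ring lies below the raw bound, as expected.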
33.5 FDDI

In this last section, I would briefly like to delve into FDDI, a LAN at the higher end of the performance scale. In FDDI, or Fibre Distributed Data Interface, the transfer medium is quite obvious: fibre optics. FDDI represents a powerful further development of token ring, with a transfer capacity of up to 100 Mbits/s and a maximum length of between 100 and 200 km, for the networking of approximately 500-1000 stations. Here, the stations can be separated from one another by up to 2 km. The ring is formed by a two-veined graded-index (for distances of less than 2 km) or mono-mode (distances up to 20 km) fibre optic cable. In accordance with ANSI recommendations regarding FDDI standardization, data transfer takes place at 1300 nm in the near-infrared range; here, the fibre optic cables have an especially low signal damping. The two-veined fibre optic cable forms two rings: a primary ring and a secondary ring. Initially, the plan is to use the secondary ring only as a safety backup for the network (a so-called backup ring). In principle, however, the secondary ring could also transfer data in normal operation; the FDDI bandwidth would then be doubled. There are three classes of station defined for FDDI:

- A class: stations with four connections for fibre optic cables, that is, an input and output connection for both the primary and secondary rings. In this way, A stations can be installed directly into the ring.
- B class: stations with only two fibre optic connections, that is, only one input and one output connection. For this reason, B stations cannot be installed directly into the ring; instead they must be integrated into the ring via a concentrator (ring line distributor).
- C class: these stations correspond to the normal token ring concentrators and provide the single-veined connection to the B stations. Like the A stations, they contain four fibre optic connections and are installed directly into the ring.

Both A and C stations include a station manager (STM). This is a combined hardware/software component which detects line errors between the stations and can act on them accordingly. If the STM discovers a connection error between A and C stations (which are located directly within the ring), then it automatically switches to the secondary ring. In a token ring, such a disruption has a fatal effect; the ring is broken and the network is paralysed. FDDI can also detect line errors to and from B stations. However, they cannot be rectified because there is no secondary line between the C concentrator and the B station. Then, like the ring line distributor in a normal token ring, the STM simply bridges the connection between the applicable B station and the C concentrator.
34 Keyboards and Mice

This chapter discusses the most common and most important input devices for PCs - the keyboard and the mouse.

34.1 The Keyboard

Despite all the ... until the end of time? Depending upon whether you use a keyboard with American, British or some other language assignment, some control, shift or other keys may be named differently. Furthermore, in the literature you will sometimes find different names for the same key, for example the enter or CR keys. Therefore, Table 34.1 lists some different names for these keys. In the following I will only use the names given in the first column of Table 34.1.

Name                     Alternative names
enter key                CR key
control key (Ctrl)
alternative key (Alt)
shift key (Shift)
shift-lock key           caps-lock
cursor up
cursor down
cursor left
cursor right
insert
delete
cursor home              clear-home
end
page up
page down
system request

Table 34.1: Alternative key names
34.1.1 Structure and Functioning of Intelligent and Less Intelligent Keyboards

The keyboard and the accompanying keyboard interface, especially those for the AT and today's widely used MF II keyboard (multifunction keyboard), are more complex devices than they seem from the outside. Contrary to the widely held opinion, every keyboard has a keyboard chip, even the previous model of the less intelligent PC/XT keyboard with the 8048. The chip in the keyboard case supervises a so-called scan matrix, formed of crossing lines. At each crossing a small switch is located. If you press the key then the switch is closed. The microprogram of the keyboard chip is intelligent enough to detect a bouncing of the keys. Bouncing is the phenomenon whereby the accompanying switch is first closed when a key is pressed, then the switch reopens and closes again. The reason for this behaviour is the spring reaction force of the key switches. The chip must be able to distinguish such fast bouncing from an intentional and slower double key press by the user. Figure 34.1 shows a scheme for the principal structure of a keyboard and the accompanying keyboard interface in the PC.
Figure 34.1: Structure of keyboard and keyboard interface.

The keyboard chip regularly checks the status of the scan matrix to determine the open or closed state of the switches. For this purpose, it activates the X lines successively and individually, and detects from which Y terminal it receives a signal. By means of these X and Y coordinates, the newly pressed or released switch (that is, the newly pressed or released key) is unambiguously identified. The keyboard chip determines whether a key (and, if so, which one) has been pressed or released, and it writes a corresponding code into a keyboard-internal buffer (details concerning these make and break codes are discussed in Section 34.1.2). Afterwards, the keyboard transmits the code as a serial data stream via the connection cable to the keyboard interface in the PC. Figure 34.2 shows the structure of the accompanying SDU for the data transfer, as well as the assignment of the keyboard jack on your PC. The line keyboard clock transfers the data clock signal for the data exchange with the keyboard interface on the motherboard. Thus the transfer is carried out synchronously, unlike the UART 8250. Activating the signal keyboard reset on some interfaces gives rise to a keyboard initialization. Via the line keyboard data, the data is exchanged between the keyboard and the keyboard interface in the PC.

SDU bit   0      1    2    3    4    5    6    7    8    9    10
          START  DB0  DB1  DB2  DB3  DB4  DB5  DB6  DB7  PAR  STOP

START: start bit (always equal 0)
DB0-DB7: data bits 0 to 7
PAR: parity bit (always odd parity)
STOP: stop bit (always equal 1)

Figure 34.2: Keyboard SDU.
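How a scan code is wrapped into the 11-bit SDU can be illustrated with a short sketch. The function names are hypothetical, and the frame is held in an integer whose bit i stands for SDU position i, which is a representation chosen here for illustration only; the real transfer is serial and clocked by the keyboard clock line.

```c
#include <stdint.h>

/* Odd parity: the parity bit is chosen so that the eight data bits
   plus the parity bit contain an odd number of ones. */
static unsigned odd_parity_bit(uint8_t data)
{
    unsigned ones = 0;
    for (unsigned i = 0; i < 8; i++) {
        ones += (data >> i) & 1u;
    }
    return (ones % 2u == 0u) ? 1u : 0u;
}

/* Builds the 11-bit SDU. Bit i of the result is SDU position i:
   0 = START, 1..8 = DB0..DB7, 9 = PAR, 10 = STOP. */
static uint16_t sdu_encode(uint8_t data)
{
    uint16_t frame = 0;                            /* START is always 0 */
    frame |= (uint16_t)((uint16_t)data << 1);      /* DB0..DB7 */
    frame |= (uint16_t)(odd_parity_bit(data) << 9);/* PAR */
    frame |= (uint16_t)(1u << 10);                 /* STOP is always 1 */
    return frame;
}

/* Checks START, STOP and parity; returns 1 and stores the data byte
   if the frame is well formed, otherwise returns 0. */
static int sdu_decode(uint16_t frame, uint8_t *data)
{
    uint8_t d = (uint8_t)((frame >> 1) & 0xffu);
    if ((frame & 1u) != 0u) return 0;
    if (((frame >> 10) & 1u) != 1u) return 0;
    if (((frame >> 9) & 1u) != odd_parity_bit(d)) return 0;
    *data = d;
    return 1;
}
```

A receiver built this way rejects any frame in which a single bit has been flipped, because either the parity or one of the fixed START and STOP bits no longer matches.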
In a PC/XT the keyboard interface is essentially formed of a simple serial interface that only accepts the serial data stream from the keyboard. Here no data transfer to the keyboard is possible. The 8048 in the PC/XT keyboard is therefore not prepared to accept data from the keyboard interface in the PC. Thus the PC/XT keyboard cannot be programmed. Upon receipt of a code from the keyboard, the interface issues a hardware interrupt via IRQ1 corresponding to INT 09h, and provides the data at port A of the 8255 PPI.

In the AT, instead of the primitive serial interface a keyboard controller has been installed. In older ATs you will find the 8042 chip, in newer ones the 8741 or 8742 (or compatible) chip. Thus the keyboard interface became intelligent, and is able to do more than simply accept a serial data stream and issue an interrupt. The keyboard controller can be programmed, for example, to disable the keyboard. Moreover, a bidirectional data transfer between keyboard and keyboard controller is possible here; thus the keyboard controller can transfer data to the keyboard. The keyboard chip's microcode is therefore prepared for receiving control commands through which you may, for example, set the repetition rate of the keyboard. Details concerning the programming of the AT or MF II keyboards are discussed in Section 34.1.5. In IBM's PS/2 models an additional mouse port for a PS/2 mouse is integrated into the keyboard interface. A brief description of the PS/2 mouse interface is given in Section 34.2.4.

34.1.2 Scan Codes - A Keyboard Map

You may have wondered how a keyboard with a British keyboard layout can be connected to a Taiwanese PC without the PC always mixing Chinese and English. The reason is quite simple: every key is assigned a so-called scan code that identifies it. For the scan code one byte is sufficient, as even the extensive MF II keyboards have a maximum of 102 keys.
Only once the keyboard driver is effective is this position value converted into a character. Of course, this need not always be an ASCII code, as some keys have no ASCII code at all. Additionally, it is required that, depending upon the pressed SHIFT keys, various characters are output if you press, for example, the key 7: without SHIFT you get the digit 7, with SHIFT the character /, and with the ALT Gr key pressed the bracket { is output.

On the PC/XT keyboard the individual keys are simply enumerated continuously. The principle of key enumeration and scan codes has been kept for the AT and MF II keyboards; only some new keys were added and the layout changed so that some keys are shifted. Figure 34.3 shows the layouts of these keyboards, together with the scan codes assigned to the individual keys. If you press a key (also a "silent" key such as Ctrl) then the keyboard first generates a so-called make-code, which is equal to the scan code of the pressed key, and transfers this make-code to the PC's keyboard interface. There a hardware interrupt INT 09h is usually issued via IRQ1 and the handler fetches the make-code from the keyboard interface. The handler routine processes the code differently, depending upon whether SHIFT (which can affect a following key press), a function or a control key such as HOME, or a normal key such as A, has been pressed.

Example: press SHIFT first and afterwards C without releasing SHIFT.
         transferred make-codes: 42 (SHIFT) and 46 ('C')
Note that here uppercase and lowercase are not distinguished. Only the keyboard driver combines the two make-codes into one ASCII code for the character C. Further, a repetition function is implemented in every keyboard that continuously repeats and transfers the make-code of the pressed key to the keyboard interface so that you don't need, for example, to press and release the A key 80 times if you want to fill a whole line with the character a. On a PC/XT keyboard the repetition rate is fixed and equal to 10 characters/s; on AT and MF II keyboards you can program the rate with values between 2 and 30 characters/s.

If, on the other hand, you release a pressed key then the keyboard generates a so-called break-code, which is transferred to the keyboard interface in the same way as the make-code. Also in this case, the interface issues an interrupt INT 09h via IRQ1, and calls the handler of the keyboard driver. The break-code is simply the scan code with a set bit 7, that is, the most significant bit is equal to 1. Thus the break-code is equal to the make-code plus 128. From the break-code, the handler can determine that a key hasn't been pressed but released, and also which key has been released. When a SHIFT key is released, its effect is cancelled for the following characters. Now lowercase characters instead of uppercase characters are output again.

Example: the keys SHIFT and C of the above example are released in the opposite order.
         transferred break-codes: 174 (=46+128, corresponding to C) and 170 (=42+128, corresponding to SHIFT)
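The handling of make-codes and break-codes can be sketched with a tiny state machine. This is only an illustration under simplifying assumptions: the names are hypothetical, only the left SHIFT key (scan code 42) and the C key (scan code 46) from the examples are handled, and the function receives the code as a parameter instead of fetching it from the keyboard interface.

```c
#include <stdint.h>

#define SC_SHIFT 42   /* scan code of SHIFT used in the examples */
#define SC_C     46   /* scan code of the C key */

/* Hypothetical driver state: is SHIFT currently held down? */
struct kbd_state {
    int shift_down;
};

/* A break-code is the scan code with bit 7 set (make-code plus 128). */
static int is_break_code(uint8_t code)
{
    return (code & 0x80) != 0;
}

/* Strips bit 7 to obtain the scan code of the key concerned. */
static uint8_t scan_code_of(uint8_t code)
{
    return (uint8_t)(code & 0x7f);
}

/* Processes one code; returns the resulting character, or 0 if the
   code produces no character (SHIFT itself, or any break-code). */
static char kbd_handle(struct kbd_state *st, uint8_t code)
{
    uint8_t key = scan_code_of(code);

    if (key == SC_SHIFT) {
        st->shift_down = !is_break_code(code);   /* track SHIFT state */
        return 0;
    }
    if (is_break_code(code)) {
        return 0;                                /* key released */
    }
    if (key == SC_C) {
        return st->shift_down ? 'C' : 'c';
    }
    return 0;                                    /* other keys omitted */
}
```

Feeding the codes 42 and 46 yields the character C; the break-codes 174 and 170 produce no character, and a following make-code 46 yields a lowercase c because SHIFT has been released.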
Compared to the PC/XT keyboard, on the AT keyboard only the SysReq key with scan code 84 was added. All other keys were assigned the same make/break-codes as before, but some control keys are located at another place, and the numerical keypad was implemented as a separate block. Compatibility thus remains, even on the hardware level, as the only difference from the outside is the new SysReq key, and new components do not give rise to any incompatibility with older programs.

Figure 34.3: Scan codes of PC/XT, AT and MF II keyboards. The PC/XT keyboard has 83 keys with scan codes from 01 to 83. The AT keyboard additionally has a SysReq key. The MF II keyboard has several new control keys in separate blocks, and three LEDs to indicate the status of the shift keys.

The situation becomes somewhat more complicated with the new MF II keyboard. This keyboard not only has a completely different layout (for example, the function keys are no longer arranged on the left-hand side but on the top), but it has been extended by the two function keys F11/F12 as well as separated control keys. On a PC/XT and AT keyboard the use of the numerical keypad with enabled NumLock is rather ponderous if the cursor has to be moved simultaneously. For this reason, IBM implemented the control keys on its extended or MF II keyboard in a separate control block between the alphanumerical and numerical keypads. Moreover, the keyboard has been extended by a second Ctrl and Alt key, as well as PRINT and PAUSE keys.

If these new keys were assigned exactly the same scan code as the former keys with the same function, then a program could not distinguish whether you have pressed, for example, the left or right Alt key. But with DOS this is of significance as, by means of the right Alt key (Alt Gr), you have access to the third keyboard level with characters such as [, etc. For this purpose the new keys must differ from the former ones. A completely new scan code, however, gives rise to the problem that older programs that access the keyboard directly (for example, former versions of the BASIC interpreter) cannot detect and process the new keys. The engineers had (again) a good idea: if you press or release one of the new MF II keys then the precede byte e0h or e1h is output first, followed by the make or break-code. Make and break-codes thus remain the same compared to the former AT keyboards. The precede byte e1h is output if the PAUSE key is operated, the precede byte e0h for all other new keys of the MF II keyboard. Thus the keyboard driver can distinguish, for example, between the left and right Alt keys.

The MF II keyboard additionally attempts to imitate and behave like the AT keyboard. This means that the new control keys, whose equivalents are also present in the numerical keypad, output other make and break-code sequences if the NumLock function is enabled. With a disabled NumLock function you have to press only the intended control key. Example:
Cursor left with disabled NumLock function.
make-break sequence: 4bh (cursor make) cbh (cursor break)
If, on the other hand, the NumLock function is enabled, then you have to press the SHIFT key first, and afterwards the intended control key, to avoid outputting a number because of the enabled NumLock function. Example:
Cursor left with enabled NumLock function.
make-break sequence: 2ah (SHIFT make) 4bh (cursor make) cbh (cursor break) aah (SHIFT break)
To simulate the AT keyboard the MF II keyboard therefore outputs another make-break sequence if the NumLock function is enabled. If you press and release the key cursor left in the separate control block with the NumLock function disabled, then the MF II keyboard outputs the make-break sequence e0h 4bh e0h cbh. The two precede bytes e0h in front of the actual scan codes indicate that you have pressed a new key of the MF II keyboard. If you press the same cursor key with the NumLock function enabled, the chip in the MF II keyboard automatically generates the sequence e0h 2ah e0h 4bh e0h cbh e0h aah. The same applies, of course, for the other control keys.

Another special role is played by the PAUSE key. On the PC/XT and AT keyboard a program is paused by pressing Ctrl + NumLock. The MF II keyboard therefore supplies the following make-break sequence: e1h 1dh 45h e1h 9dh c5h. e1h characterizes the new MF II key, 1dh and 9dh are the make and break-code, respectively, for Ctrl, and 45h and c5h the make and break-code, respectively, for NumLock. Even if you keep the PAUSE key pressed the complete make-break sequence is output.

If you program a keyboard driver you have, in principle, every freedom to assign a certain character to a scan code, depending upon the pressed shift keys, etc. However, there is no effect if you remove the individual key caps from the switches and rearrange them; even if you have arranged the keys in alphabetical order beginning in the upper left corner of your keyboard, the first key on the MF II keyboard will not return an "a" but the same character as before. A keyboard driver uses an internal conversion table to assign the keyboard's scan codes an ASCII code, and thus a character, or to carry out certain functions. Using another conversion table, for example, a Spanish keyboard can easily be connected to a PC. The technical structure and the passed scan codes remain the same, but the keyboard driver converts them to another ASCII code.

Also operating a shift-lock key such as NumLock is processed only by software: the driver sets an internal indicator that indicates the status and enables the LEDs with a certain command. Only on the MF II keyboard is an internal circuit really switched if the NumLock key is operated so that the keyboard actually outputs corresponding make-break sequences if you operate a control key in the separate control block. After a keyboard reset the NumLock function is always enabled. Every operation and the following switching of the internal circuit is registered by the keyboard driver via the issued interrupt, so that the BIOS-internal NumLock indicator always corresponds to the NumLock state of the keyboard, until you disable IRQ1 once, press the NumLock key, and reactivate IRQ1 again. The internal keyboard status and the NumLock indicator of the keyboard driver are then complementary.
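How a driver might use the precede byte e0h to tell old and new keys apart can be sketched in C as follows. This is a simplified illustration, not the code of a real keyboard driver: the types and names are hypothetical, and the e1h sequence of the PAUSE key is deliberately not handled.

```c
#include <stdint.h>

/* One decoded key event (hypothetical structure for this sketch). */
typedef struct {
    uint8_t scan;      /* scan code with bit 7 stripped                */
    int     released;  /* 1 if the byte was a break-code               */
    int     extended;  /* 1 if the byte was preceded by e0h (new key)  */
} key_event;

/* Decoder state: remembers whether the previous byte was e0h. */
typedef struct {
    int saw_e0;
} decoder;

/* Feeds one byte from the keyboard into the decoder. Returns 1 and fills
   *ev when a complete key event is available, 0 if only a precede byte
   has been consumed. */
int decode_byte(decoder *d, uint8_t byte, key_event *ev)
{
    if (byte == 0xE0) {
        d->saw_e0 = 1;
        return 0;
    }
    ev->scan     = (uint8_t)(byte & 0x7F);
    ev->released = (byte & 0x80) != 0;
    ev->extended = d->saw_e0;
    d->saw_e0    = 0;
    return 1;
}
```

Fed with the sequence e0h 4bh, the decoder reports a pressed cursor-left key with the extended indicator set, whereas a plain 4bh reports the corresponding key of the numerical keypad.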
34.1.3 Keyboard Access Via DOS

For accessing the keyboard seven functions of the DOS interrupt INT 21h are available: 01h, 06h, 07h, 08h, 0ah, 0bh and 3fh. You will find a list containing the calling formats as well as the returned characters in Appendix J. Note that these functions don't access the keyboard itself, but only read and write a 32-byte buffer in the BIOS data area. Accordingly, they are inflexible and less powerful if you want to use the complete function palette of modern MF II keyboards. The first six keyboard functions are relics from the CP/M era, and always serve the standard input device only. If you erroneously make, for example, the printer at the serial interface COM1 the standard input device by using the redirection "program.exe < COM1", then you will wait until the end of time. Your printer is unable to output any character, and because of the redirection you have "disabled" the keyboard. The keyboard hits are registered and the characters are written into the keyboard buffer; DOS doesn't pass them to the program, but waits for a character from the printer. The only way out is the 3-finger input Ctrl-Alt-Del.

Significantly better is the 3fh function (read file/device), which uses the concept of handles. The standard input/output device is denoted by the reserved name CON (for console) with DOS, and is assigned the handle 0 as standard. We have already met the 3fh function, for example when reading a character from the serial interface. Thus you can see the power of handles in connection with device drivers: for an access to files, interfaces and the keyboard, a single function is sufficient.
The functions of INT 21h differ in how they process an input character. In the so-called raw mode, which the functions 06h, 07h, 0ah as well as function 3fh (if configured accordingly) use, the control characters ^C, ^P, etc. are not processed accordingly, but simply passed to the calling program. But the functions 08h and 3fh correspondingly set up, on the other hand, interpret these control characters and, for example, call INT 23h for a program abortion if ^C is input.

Example: buffered character input with a maximum of 80 characters (function 0ah; language: Microsoft C 5.10).

    #include <dos.h>       /* union REGS, int86(), FP_OFF() */
    #include <malloc.h>    /* malloc() */

    char *ch_input(void)
    {
      char *buffer, *string;
      union REGS inregs, outregs;

      buffer = (char *) malloc(82);      /* provide buffer */
      *buffer = 80;                      /* first byte indicates maximum number of characters */
      inregs.h.ah = 0x0a;                /* function 0ah */
      inregs.x.dx = FP_OFF(buffer);      /* buffer offset; segment already in DS */
      int86(0x21, &inregs, &outregs);    /* call function */
      string = buffer + 2;               /* pointer string to beginning of string */
                                         /* further: *buffer=80, *(buffer+1)=length of input string */
      return (string);                   /* return pointer to input string */
    }
Consult Appendix J or a good DOS reference for details concerning the various DOS functions.
34.1.4 Keyboard Access Via the BIOS

The BIOS writes the characters passed by the keyboard into a temporary buffer called the keyboard buffer, which as standard starts at address 40:1e, has 32 bytes, and thus ends at address 40:3d. Every character is stored in the buffer as a 2-byte value whose high-order byte represents the scan code and whose low-order byte indicates the ASCII code. Thus the buffer can temporarily store 16 characters. All input characters are first accepted by INT 09h (the handler of IRQ1), which determines the ASCII code from the scan code by means of a conversion table and writes both codes into the keyboard buffer afterwards.

Structure and Organization of the Keyboard Buffer
The keyboard buffer is organized as a ring buffer managed by two pointers. The pointer values are stored in the BIOS data area at addresses 40:1a and 40:1c (see Table 34.2). The write pointer indicates the next free write position in the keyboard buffer, where the character input next will be stored. The read pointer refers to the character in the keyboard buffer to be read first, that is, to the character that will be passed to a program next. Because of the ring organization it may be that the value of the read pointer is higher than that of the write pointer. In this case, all words between the read pointer and the physical end of the buffer at 40:3d, as well as the characters between the physical beginning of the buffer at 40:1e and the write pointer, are valid characters from the keyboard. However, all words between the write and the read pointers are empty, and may accept further characters from the keyboard.
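The ring organization can be modelled in a few lines of C. The following is only a sketch under simplifying assumptions: the structure and function names are hypothetical, the buffer lives in an ordinary array instead of the BIOS data area, and one slot is always left unused so that a full buffer can be told apart from an empty one (read and write pointers equal).

```c
#include <stdint.h>

#define KB_START 0x1E  /* offset of the first buffer byte in segment 0040h   */
#define KB_END   0x3E  /* offset just past the last buffer byte (40:3d)      */

typedef struct {
    uint16_t read_ptr;   /* value as stored at 40:1a                         */
    uint16_t write_ptr;  /* value as stored at 40:1c                         */
    uint16_t words[16];  /* high byte = scan code, low byte = ASCII code     */
} kb_buffer;

/* Moves a pointer to the next word and wraps around at the physical end. */
static uint16_t advance(uint16_t p)
{
    p += 2;
    return (p == KB_END) ? KB_START : p;
}

void kb_init(kb_buffer *b)
{
    b->read_ptr = b->write_ptr = KB_START;
}

/* Stores one character; returns 0 if the buffer is full (the BIOS would beep). */
int kb_put(kb_buffer *b, uint8_t scan, uint8_t ascii)
{
    uint16_t next = advance(b->write_ptr);
    if (next == b->read_ptr)
        return 0;
    b->words[(b->write_ptr - KB_START) / 2] = (uint16_t)((scan << 8) | ascii);
    b->write_ptr = next;
    return 1;
}

/* Fetches the oldest character; returns 0 if the buffer is empty. */
int kb_get(kb_buffer *b, uint16_t *word)
{
    if (b->read_ptr == b->write_ptr)
        return 0;
    *word = b->words[(b->read_ptr - KB_START) / 2];
    b->read_ptr = advance(b->read_ptr);
    return 1;
}
```

With this model, kb_put() accepts 15 characters in a row and rejects the 16th, which corresponds to the usable capacity of the 32-byte buffer.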
Address  Size      Content                      Bit  Meaning
40:17    byte      first shift status byte      7    insert mode active
                                                6    shift lock mode active
                                                5    NumLock mode active
                                                4    scroll active
                                                3    Alt key pressed
                                                2    Ctrl key pressed
                                                1    left shift key pressed
                                                0    right shift key pressed
40:18    byte      second shift status byte     7    insert key pressed
                                                6    shift lock key pressed
                                                5    NumLock key pressed
                                                4    scroll key pressed
                                                3    pause mode active
                                                2    SysReq key pressed
                                                1    left Alt key pressed
                                                0    left Ctrl key pressed
40:19    byte      alternative keyb. input
40:1a    word      read pointer                      points to character in buffer next to be read
40:1c    word      write pointer                     points to next free location in buffer
40:1e    32 bytes  keyboard buffer                   16 characters, but only 15 are used
40:80    word      begin of keyboard buffer          offset in segment 0040h
40:82    word      end of keyboard buffer            offset in segment 0040h
40:96    byte      keyboard status byte         7    ID code is read
                                                6    last character was ID code
                                                5    activate NumLock when reading ID and extended code
                                                4    MF II keyboard installed
                                                3    right Alt key pressed
                                                2    right Ctrl key pressed
                                                1    last code equal e0h
                                                0    last code equal e1h
40:97    byte      general keyboard status      7    error keyboard data
                                                6    LEDs are updated
                                                5    ACK sent back
                                                4    ACK received
                                                3    reserved
                                                2    shift LED: 1=on, 0=off
                                                1    NumLock LED: 1=on, 0=off
                                                0    scroll LED: 1=on, 0=off

Table 34.2: BIOS storage area and keyboard
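The bit assignment of the first shift status byte at 40:17 translates directly into bit masks. The sketch below uses hypothetical names; that a pressed shift key reverses an active shift lock is an assumption about typical driver behaviour, not something stated in the table.

```c
#include <stdint.h>

/* Bit masks for the first shift status byte at 40:17 (see Table 34.2). */
enum {
    KB_RIGHT_SHIFT      = 0x01,
    KB_LEFT_SHIFT       = 0x02,
    KB_CTRL             = 0x04,
    KB_ALT              = 0x08,
    KB_SCROLL_ACTIVE    = 0x10,
    KB_NUMLOCK_ACTIVE   = 0x20,
    KB_SHIFTLOCK_ACTIVE = 0x40,
    KB_INSERT_ACTIVE    = 0x80
};

/* Returns 1 if letters should be delivered in uppercase.
   Assumption: a pressed shift key inverts an active shift lock. */
int uppercase_active(uint8_t status)
{
    int shift = (status & (KB_LEFT_SHIFT | KB_RIGHT_SHIFT)) != 0;
    int lock  = (status & KB_SHIFTLOCK_ACTIVE) != 0;
    return shift ^ lock;
}
```

A program that has read the status byte (for example, from the BIOS data area) could pass it to uppercase_active() to decide which ASCII code belongs to a letter key.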
From the buffer organization it is apparent that the keyboard buffer is empty if the beginning and the end of the stored characters coincide, that is, if the read and write pointers are equal. On the other hand, the buffer is full if the write pointer refers to the character that precedes the character to which the read pointer points. If you press a further key a short beep sounds. This means that the keyboard buffer has 32 bytes but because of its organization is only able to accommodate 15 characters with 2 bytes each. If one were to exhaust the full capacity of 16 characters with 2 bytes each, then the write pointer would refer to the same character as the read pointer, that is, read and write pointers would coincide. But this is, as already mentioned, characteristic of an empty keyboard buffer. Figure 34.4 shows this behaviour. Thus you can recognize an empty keyboard buffer simply by the fact that the values for the read and write pointers coincide, while a full buffer is present if the write pointer refers to the character immediately preceding the character referred to by the read pointer. Upon every keyboard hit the keyboard controller issues an interrupt 09h via IRQ1, which accepts the scan code of the character and converts it into an ASCII code if this is possible. Afterwards, scan code and ASCII code are written to that location in the keyboard buffer to
which the write pointer refers, and the write pointer is updated to the next character position in the buffer. If you read a character using the BIOS functions discussed below the function passes the character referred to by the read pointer to the calling program and updates the read pointer. The character is thus logically removed from the buffer, although ASCII and scan code are still physically held by the buffer. A program can, of course, write back one character into the keyboard buffer by writing the character in front of the word which is referred to by the read pointer, and updating the read pointer accordingly. The written character is passed to a program in advance of the already stored characters. Alternatively, a character may also be written behind the already present characters, and the write pointer is updated afterwards to point to the next location in the buffer. The character written in this way is passed to a program once all characters present have been transferred.

Keyboard Status and BIOS Data Area

In the BIOS data area, besides the keyboard buffer and the pointers for the beginning and the end of the buffer, several bytes are also stored which indicate the keyboard's status. Table 34.2 indicates the use of the BIOS data area as far as the keyboard is concerned. The bytes 40:17 and 40:18 refer to the keyboard status for the PC/XT and the AT keyboard. Because of several shift keys being present on the extended MF II keyboard, additional status bytes are required, for example to distinguish the left and right Alt keys. This and other status information is held in bytes 40:96 and 40:97. With the words 40:80 and 40:82, DOS or another program can define an alternative keyboard buffer, which the BIOS then uses instead of the buffer starting at 40:1e. Note that the buffer address is limited to segment 0040h. The alternative buffer may exceed a size of 32 bytes, corresponding to 16 characters.

Functions 4fh and 85h of BIOS Interrupt 15h

Starting with the AT, and on the PS/2, two functions have been implemented in the INT 15h system interrupt which are effective before the input character is written into the keyboard buffer. The handler of the hardware interrupt 09h corresponding to IRQ1, which accepts a character from the output buffer of the keyboard controller, internally calls the function 4fh of INT 15h for every character. The function therefore forms a hook for the keyboard input. This is carried out as follows:
    MOV ah, 4fh    ; load function number into ah
    MOV al, scan   ; load scan code of key into al
    STC            ; set carry flag
    INT 15h        ; call interrupt 15h, function 4fh
Normally the handler for INT 15h, function 4fh consists of a simple IRET instruction. Thus the scan code in al remains unchanged, and INT 09h writes it together with the corresponding ASCII code into the keyboard buffer in the BIOS data area. But the situation becomes more interesting if you intercept INT 15h, function 4fh; now you can alter the passed scan code and fool the PC into recognizing an X for a U. For this purpose, you only have to load the al register with the new scan code 45 (corresponding to X) if a scan code 22 indicating U is passed during the call. The handler fragment of the following example carries out this process explicitly:
Example: replace U (scan code 22) by an X (scan code 45).

             CMP ah, 4fh    ; check whether function 4fh is called
             JNE further    ; other function, therefore jump
             CMP al, 16h    ; check whether scan code is equal 22=16h
             JNE return     ; return if scan code is not equal 22
             MOV al, 2dh    ; load new scan code 45=2dh into al
    return:  IRET
    further: ...            ; something else
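The effect of such a hook can be modelled in C without touching any interrupt vectors. In the following sketch the function hook_4fh() is hypothetical: its argument stands for the al register, and its return value stands for the carry flag that INT 09h inspects after the call (1 means the scan code is processed further).

```c
#include <stdint.h>

/* Models an intercepted INT 15h, function 4fh.
   *scan plays the role of al; the return value plays the role of the
   carry flag (1 = INT 09h continues to process the scan code). */
int hook_4fh(uint8_t *scan)
{
    if (*scan == 0x16)   /* scan code 22: key U  */
        *scan = 0x2D;    /* scan code 45: key X  */
    return 1;            /* carry stays set, as after the plain IRET */
}
```

Like the assembler fragment, the sketch only replaces the make-code; the break-code 96h of the U key passes through unchanged.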
A more serious application would be to replace the period in the numeric keypad of the MF II keyboard by a comma, which is the decimal sign in some languages (for example, German). But you have to carry out more checks to confirm that the user really has operated the period key in the numeric block, and not the Del key in the separate control block, with the same scan code but the precede byte e0h. Function 4fh of INT 15h can also suppress a character. For this purpose, you must simply clear the carry flag and return with a far RET 2 instruction, or manipulate the carry flag on the stack before you issue an IRET instruction. (Remember that an INT instruction pushes the flags onto the stack, and that an IRET instruction reloads the flags from the stack into the flag register; see Section 3.6.1.) INT 09h then ignores the key hit and doesn't write any code into the keyboard buffer.

Another keyboard hook is the function 85h of INT 15h, which the handler of INT 09h calls if you press or release the SysReq key on an AT keyboard or Alt+SysReq on an MF II keyboard. The standard routine comprises a simple IRET instruction, and the keyboard driver normally ignores the key hit. But you may intercept the call, for example to open a window as a consequence of the SysReq hit, which allows an access to a resident program with system commands. Most pop-up programs (for example, Sidekick) don't use function 85h of INT 15h, but intercept INT 09h to supervise the keyboard before the handler of INT 09h processes the input character. A certain key combination (for example, Ctrl+Alt+F1) then gives rise to the activation of the TSR program. The reason for this strategy is that the PC/XT doesn't have a SysReq key, and thus the BIOS doesn't implement a call to INT 15h, function 85h. Actually, SysReq is aimed at use in a multitasking operating system to switch between various applications.

If the handler of INT 09h detects that you have operated the key combination Ctrl-Break, then it calls interrupt 1bh. The BIOS initializes the accompanying handler to a simple IRET instruction so that Ctrl-Break is ignored. But DOS and application programs have the opportunity to install their own routine, and may intercept and process a Ctrl-Break accordingly. Note that Ctrl-C is intercepted only on a DOS level, where it gives rise to a call of INT 23h, whereas Ctrl-Break is intercepted on a BIOS level. With the entry BREAK = ON in config.sys, you instruct DOS to replace the simple IRET instruction by its own handler. Two further reserved key combinations recognized by INT 09h are Ctrl-Alt-Del for a warm boot and Print or Shift-Print for printing the screen contents. With Ctrl-Alt-Del the handler of INT 09h calls interrupt INT 19h (load bootstrap); with Print, INT 05h is issued.
Functions of BIOS Interrupt 16h

A much better keyboard control than the DOS functions is offered by the BIOS interrupt INT 16h, which provides eight keyboard functions. All functions of interrupt 16h are listed in Appendix J. In principle, you may determine the same values that the BIOS functions return (that is, scan code, ASCII code and shift status) by directly accessing the keyboard buffer or the keyboard status byte in the BIOS data area. This way is much faster than via INT 16h, but you have to take care, and you lose compatibility to a significant extent. Nearly all BIOS manufacturers comply with the formats indicated in Appendix J. The functions 10h, 11h and 12h have been implemented in the newer BIOS versions (since the end of 1985) to support the new function and control keys of the extended MF II keyboard. Usually, the BIOS functions return an ASCII value of 0 if a function or control key has been pressed, for example a cursor key, F1 or HOME. Example:
Read character by means of INT 16h, function 00h; assume that key A has been pressed.

    MOV ah, 00h    ; execute function 00h, that is, read character
    INT 16h        ; issue interrupt

Result: ah=30 (scan code for key A); al=97 (ASCII code for a)

Read character by means of INT 16h, function 00h; assume that key HOME in the separate control block of an MF II keyboard has been pressed.

    MOV ah, 00h    ; execute function 00h, that is, read character
    INT 16h        ; issue interrupt

Result: ah=71 (scan code of key HOME); al=00 (characterizes function and control keys which are not assigned an ASCII code)
Thus, function 00h doesn't distinguish between the operation of a "normal" function or control key and the operation of a new function or control key of the MF II keyboard. But if you use function 10h for the extended keyboard instead of function 00h, then the interrupt returns an indicator e0h in the al register, which indicates the operation of an extended key. Example:

Read character by means of INT 16h, function 10h; assume that the key HOME in the separate control block of an MF II keyboard has been pressed.

    MOV ah, 10h    ; execute function 10h, that is, read character
    INT 16h        ; issue interrupt

Result: ah=71 (scan code of key HOME); al=e0h (indicator for a separate function or control key of the MF II keyboard which is not assigned an ASCII code)
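A program has to interpret the value returned in ax accordingly. The following C sketch shows one possible way; the names are hypothetical, and the additional check that ah differs from 0 when al equals e0h is an assumption, made so that an ordinary character with code e0h is not mistaken for an extended key.

```c
#include <stdint.h>

typedef enum { KEY_ASCII, KEY_FUNCTION, KEY_EXTENDED } key_class;

/* Scan code as returned in ah. */
uint8_t scan_of(uint16_t ax)
{
    return (uint8_t)(ax >> 8);
}

/* Classifies the ax value returned by INT 16h, function 00h or 10h. */
key_class classify_key(uint16_t ax)
{
    uint8_t al = (uint8_t)(ax & 0xFF);
    if (al == 0x00)
        return KEY_FUNCTION;               /* no ASCII code assigned        */
    if (al == 0xE0 && scan_of(ax) != 0)
        return KEY_EXTENDED;               /* new key of the MF II keyboard */
    return KEY_ASCII;
}
```

For the three examples above, ax=1e61h is classified as an ASCII key, ax=4700h as a function or control key, and ax=47e0h as an extended key.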
The only key that you cannot access even with the extended functions is the PAUSE key. This key is already intercepted by the handler of INT 09h (corresponding to IRQ1) and converted to an endless program loop. When you press another key the CPU leaves this loop and continues program execution. On a PS/2 and some ATs you can set both the repetition rate at which the keyboard transfers characters if a key is kept pressed, as well as the delay until the first character repetition occurs, using function 03h of the BIOS interrupt 16h.

Example: set a repetition rate of 20 characters/s and a delay of 500 ms.

    MOV ah, 03h    ; load function number 03h into ah
    MOV al, 05h    ; subfunction 05h (set rate and delay), expected by most AT and PS/2 BIOS versions
    MOV bl, 04h    ; 20 characters/s
    MOV bh, 01h    ; 500 ms delay
    INT 16h        ; call function
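The values 04h and 01h are not arbitrary. The sketch below decodes them with the encoding commonly documented for the typematic setting of AT keyboards (repetition period = (8 + A) x 2^B x 4.17 ms with A = bits 2-0 and B = bits 4-3; delay = (code + 1) x 250 ms). This formula is an assumption taken from that general documentation and not from this section, and the function names are hypothetical.

```c
/* Repetition rate in characters/s for a 5-bit rate code (value in bl).
   Assumed encoding: period = (8 + A) * 2^B * 4.17 ms. */
double repeat_rate(unsigned code)
{
    unsigned a = code & 0x07u;          /* bits 2-0 */
    unsigned b = (code >> 3) & 0x03u;   /* bits 4-3 */
    return 1.0 / ((8u + a) * (1u << b) * 0.00417);
}

/* Delay in ms until the first repetition for a 2-bit delay code (value in bh). */
unsigned repeat_delay_ms(unsigned code)
{
    return ((code & 0x03u) + 1u) * 250u;
}
```

Under this assumption the rate code 04h yields about 20 characters/s, and the extreme codes 00h and 1fh yield about 30 and 2 characters/s, which matches the programmable range mentioned for AT and MF II keyboards.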
34.1.5 Programming the Keyboard Directly via Ports

As already mentioned, you can program the AT and MF II keyboard similar to other peripheral devices. On the PC/XT keyboards this is not possible because this model doesn't implement a keyboard controller able to transfer commands and data to the keyboard. Here, all transfers proceed in one direction; only the keyboard transfers scan codes to the keyboard interface on the motherboard. Thus the following description focuses mainly on the AT and MF II keyboards. With PS/2 models you can also access the PS/2 mouse via the keyboard controller. Figure 34.5 shows the scheme for an AT, MF II or PS/2 keyboard controller with a mouse.

Figure 34.5: AT, MF II or PS/2 keyboard controller.
Registers and Ports

For directly programming the AT and MF II keyboards the two port addresses 60h and 64h are available. Using these you may access the input buffer, the output buffer and the control register of the keyboard controller. Table 34.3 lists the addresses of the corresponding registers. The PC/XT keyboard is only able to transfer the scan codes via port address 60h and to issue a hardware interrupt. Note that the SW1 bit in port A of the 8255 must be set for this purpose. The following descriptions refer exclusively to AT and MF II keyboards. Using the status register you may determine the current state of the keyboard controller. The structure of the read-only status register is shown in Figure 34.6. You can read the status register by a simple IN instruction referring to the port address 64h.

Port   Register           Read (R) / Write (W)
60h    output buffer      R
60h    input buffer       W
64h    control register   W
64h    status register    R

Bit 7  PARE: parity error of the last byte from keyboard/auxiliary device (PS/2 only); 1=last byte with parity error, 0=last byte without parity error
Bit 6  TIM: general time-out; 1=error, 0=no time-out error
Bit 5  AUXB: output buffer for auxiliary device (PS/2 only); 1=holds data for auxiliary device, 0=holds keyboard data
Bit 4  KEYL: keyboard lock status; 1=keyboard free, 0=keyboard locked
Bit 3  C/D: command/data; 1=command byte written via port 64h, 0=data byte written via port 60h
Bit 2  SYSF: system flag; 1=self-test successful, 0=power-on reset
Bit 1  INPB: input buffer status; 1=CPU data in input buffer, 0=input buffer empty
Bit 0  OUTB: output buffer state; 1=keyboard controller data in output buffer, 0=output buffer empty

Figure 34.6: Status register (64h).
The PARE bit indicates whether a parity error has occurred during the course of transferring the last SDU from the keyboard or the auxiliary device (beginning with PS/2). If TIM is set then the keyboard or mouse didn’t respond to a request within the defined time period, that is, a timeout error occurred. In both cases, you should request the data byte once more using the controller command Resend (see below). The AUXB bit shows whether a data byte from the mouse is available in the output buffer. If OUTB is set, a data byte from the keyboard is available in the output buffer. When the CPU reads the byte from the output buffer, AUXB or OUTB, respectively, is cleared automatically. Before you read the output buffer using an IN instruction you should always check (according to OUTB or AUXB) whether or not the controller has already transferred a byte into the output buffer. This may take some time, for example if you carry out a keyboard self-test and wait for the result byte. The keyboard is unable to transfer another character via the input port to the keyboard controller before the CPU has read the last passed character from the output buffer. Inversely, the INPB bit indicates whether a character is still in the input buffer of the keyboard controller, or whether the CPU can pass another. The C/D bit shows whether the last written byte was a command byte that has been transferred by the CPU
via the port address 64h, or a data byte that the CPU has written via the port address 60h. KEYL and SYSF, finally, indicate whether the keyboard is locked or not, and whether the self-test could be completed successfully.

Example: read status register.

    IN al, 64h     ; the IN instruction referring to port address 64h transfers
                   ; the contents of the status register into al
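Once the status byte has been read, its bits can be evaluated with simple masks. The following C sketch uses the bit positions of the status register as described above; the constant and function names are hypothetical.

```c
#include <stdint.h>

/* Bit masks of the keyboard controller's status register (port 64h). */
enum {
    ST_OUTB = 0x01,  /* output buffer holds a byte for the CPU       */
    ST_INPB = 0x02,  /* input buffer still holds a byte from the CPU */
    ST_SYSF = 0x04,  /* self-test successful                         */
    ST_CD   = 0x08,  /* last write was a command (port 64h)          */
    ST_KEYL = 0x10,  /* 1 = keyboard free, 0 = keyboard locked       */
    ST_AUXB = 0x20,  /* byte in output buffer comes from auxiliary   */
    ST_TIM  = 0x40,  /* time-out error                               */
    ST_PARE = 0x80   /* parity error                                 */
};

/* 1 if a byte is waiting in the output buffer and may be read from port 60h. */
int can_read_output(uint8_t status)
{
    return (status & ST_OUTB) != 0;
}

/* 1 if the input buffer is empty, so the CPU may write to port 60h or 64h. */
int can_write_input(uint8_t status)
{
    return (status & ST_INPB) == 0;
}

/* 1 if the last transfer failed and the byte should be requested again. */
int transfer_error(uint8_t status)
{
    return (status & (ST_PARE | ST_TIM)) != 0;
}
```

A status value of 15h, for example, means: byte available in the output buffer, self-test successful, keyboard not locked, no error.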
You access the write-only control register (Figure 34.7) by an OUT instruction referring to the port address 64h. The keyboard controller interprets every byte you pass in this way as a command. Note that commands for the keyboard are written via the input buffer, that is, by an OUT instruction with the keyboard command code referring to the port address 60h. Table 34.4 lists the commands valid for the keyboard controller.
C7-C0: command bits 7-0
Figure 34.7: Control register (64h).

Example: disable keyboard.

    start: IN al, 64h      ; read status byte
           TEST al, 02h    ; check whether input buffer is full
           JNZ start       ; some byte still in the input buffer
           MOV al, 0adh    ; code of controller command "disable keyboard"
           OUT 64h, al     ; disable keyboard

Note that afterwards you have no opportunity to input something via the keyboard: even Ctrl-Alt-Del doesn't work any more.
Using the input and output buffer you can transfer data to the keyboard controller, as well as pass commands and data to the keyboard, and you can receive data from the keyboard controller or the keyboard itself. The structure of these two buffers is illustrated in Figure 34.8. You may access the input buffer with an OUT instruction, referring to the port address 60h, if the INPB bit of the status register is cleared. Via the input buffer, those data bytes are transferred to the keyboard controller which belong to a controller command issued in advance via port address 64h.

Example: write byte 01h into the output port.

             MOV al, 0d1h    ; code for the controller command "write output port"
             OUT 64h, al     ; pass command via the control register to the keyboard controller
    wait_in: IN al, 64h      ; read status register
             TEST al, 02h    ; check whether input buffer is full
             JNZ wait_in     ; input buffer full thus wait
             MOV al, 01h     ; data byte 01h for the controller command
             OUT 60h, al     ; pass data byte via the input buffer
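The same handshake can be expressed in C. Because port access differs from compiler to compiler, the sketch below routes all accesses through two hypothetical callbacks and supplies a small simulated controller (fake_kbc) so that the logic can be tried out without hardware; none of these names belong to a real library.

```c
#include <stdint.h>

/* Hypothetical port-access callbacks; on real hardware they would wrap
   the IN and OUT instructions. */
typedef struct {
    uint8_t (*in)(void *ctx, uint16_t port);
    void    (*out)(void *ctx, uint16_t port, uint8_t value);
    void    *ctx;
} port_io;

/* Polls the status register until INPB (bit 1) is cleared. */
static void wait_input_empty(const port_io *io)
{
    while (io->in(io->ctx, 0x64) & 0x02) {
        /* controller has not yet fetched the previous byte */
    }
}

/* Sends a controller command via port 64h followed by its data byte via port 60h. */
void kbc_command_with_data(const port_io *io, uint8_t cmd, uint8_t data)
{
    wait_input_empty(io);
    io->out(io->ctx, 0x64, cmd);
    wait_input_empty(io);
    io->out(io->ctx, 0x60, data);
}

/* Simulated controller: reports a full input buffer for a few status reads
   after every write. */
typedef struct {
    int     busy_reads;
    uint8_t last_cmd;
    uint8_t last_data;
} fake_kbc;

uint8_t fake_in(void *ctx, uint16_t port)
{
    fake_kbc *k = (fake_kbc *)ctx;
    if (port == 0x64 && k->busy_reads > 0) {
        k->busy_reads--;
        return 0x02;   /* INPB set */
    }
    return 0x00;
}

void fake_out(void *ctx, uint16_t port, uint8_t value)
{
    fake_kbc *k = (fake_kbc *)ctx;
    if (port == 0x64)
        k->last_cmd = value;
    else
        k->last_data = value;
    k->busy_reads = 3;
}
```

Calling kbc_command_with_data() with the command d1h and the data byte 01h on the simulated controller reproduces the sequence of the assembler example.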
Command   Name                                      Description
a7h       disable auxiliary device                  disables the auxiliary device
a8h       enable auxiliary device                   enables the auxiliary device
a9h       check interface to auxiliary device       checks the interface to auxiliary device and stores the check code in the output buffer (00h=no error, 01h=clock line low, 02h=clock line high, 03h=data line low, 04h=data line high, ffh=no auxiliary device)
aah       self-test                                 the keyboard controller executes a self-test and writes 55h into the output buffer if no error is detected
abh       check keyboard interface                  the keyboard controller checks the keyboard interface and writes the result into the output buffer (00h=no error, 01h=clock line low, 02h=clock line high, 03h=data line low, 04h=data line high, ffh=general error)
adh       disable keyboard                          disables the keyboard
aeh       enable keyboard                           enables the keyboard
c0h       read input port                           reads input port and transfers the data into the output buffer
c1h       read out input port (low)                 reads bit 3-0 of input port repeatedly and transfers the data into bit 7-4 of status register until INPB in the status register is set
c2h       read out input port (high)                reads bit 7-4 of input port repeatedly and transfers the data into bit 7-4 of status register until INPB in the status register is set
d0h       read output port                          reads output port and transfers the data into the output buffer
d1h       write output port                         writes the following data byte into the output port
d2h       write keyboard output buffer              writes the following data byte into the output buffer and clears AUXB in the status register
d3h       write output buffer of auxiliary device   writes the following data byte into the output buffer and sets AUXB in the status register
d4h       write auxiliary device                    writes the following data byte into the auxiliary device
e0h       read test input port                      the keyboard controller reads its test input and writes T0 into bit 0 and T1 into bit 1 of output buffer
f0h-ffh   send pulses to output port                pulls low bits 3-0 of output port corresponding to low nibble 00h to 0fh of command for 6 ms

Table 34.4: Controller commands (AT, PS/2)
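The result bytes of the test commands in Table 34.4 can be turned into readable messages as sketched below. The function names are hypothetical; the codes are those given in the table for the command check keyboard interface (abh) and for the self-test (aah).

```c
#include <stdint.h>

/* Interprets the result byte of controller command abh (check keyboard interface). */
const char *interface_test_result(uint8_t code)
{
    switch (code) {
    case 0x00: return "no error";
    case 0x01: return "clock line low";
    case 0x02: return "clock line high";
    case 0x03: return "data line low";
    case 0x04: return "data line high";
    case 0xFF: return "general error";
    default:   return "unknown";
    }
}

/* The self-test (command aah) reports success with the byte 55h. */
int self_test_passed(uint8_t code)
{
    return code == 0x55;
}
```

A diagnostic program would issue the command, wait for OUTB in the status register, read the byte from port 60h and hand it to one of these functions.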
D7-D0: data bits 7-0
Figure 34.8: Input and output buffer (60h).

In the same way, you pass control commands to the keyboard by writing the code of the intended keyboard command into the input buffer of the keyboard controller. The keyboard controller then transfers the command byte to the keyboard, which in turn interprets and executes it. You will find a list of all keyboard commands and their interpretation in the next section.
The keyboard controller writes all data that the CPU has requested by means of a controller command into the output buffer. If you have pressed a key then the keyboard passes the scan code in the form of an SDU to the keyboard controller, which extracts the scan code byte and writes it into the output buffer. In both cases the keyboard controller issues (via IRQ1) a hardware interrupt corresponding to INT 09h if it has received a byte from the keyboard and written it into the output buffer. The handler of this hardware interrupt can then fetch the character by means of an IN instruction referring to the address 60h, determine the corresponding ASCII code, and put both into the keyboard buffer of the BIOS data area or process the return codes for ACK, etc. accordingly. Details concerning the transfer of scan codes are discussed below.

The input and output ports of the keyboard controller not only establish a connection to the keyboard or (in the case of PS/2) a mouse, but also control other gate chips in the PC or output status information from other devices. Don't confuse the input and output ports with the input and output buffers. Figure 34.9 shows the structure of the input port, Figure 34.10 that of the output port.
KBLK: keyboard lock (1 = keyboard not locked, 0 = keyboard locked)
C/M: colour/monochrome (1 = monochrome, 0 = colour)
AUXD: input data from auxiliary device (PS/2 only)
KBDI: input data from keyboard (keyboard data in)
reserved: value undefined

Figure 34.9: Input port.
Bit 7  KBDO: output data to keyboard
Bit 6  KCLK: keyboard clock
Bit 5  AUXB: output buffer of auxiliary device full (PS/2 only)
Bit 4  OUTB: output buffer full
Bit 3  ACLK: auxiliary device clock (PS/2 only)
Bit 2  AXDO: output data to auxiliary device (PS/2 only)
Bit 1  GA20: gate for A20 (1 = on, A20 enabled; 0 = off)
Bit 0  SYSR: processor reset (1 = execute reset, 0 = no reset)

Figure 34.10: Output port.
You can read the input port by passing the keyboard controller the command read input port via port 64h. The keyboard controller then transfers the contents of the input port to the output buffer, from which you may read the byte with an IN instruction referring to port 60h. The most significant KBLK bit indicates whether the keyboard is locked or not. On the first ATs, the user had to set a switch to inform the system about the installed graphics adapter type (colour or monochrome) for booting. The C/M bit indicates the corresponding switch position if your PC is still equipped with such a switch; otherwise this value is not defined, or provides the same information as is stored in the CMOS RAM. Using the two AUXD and KBDI bits you can read the serial data stream of the mouse (PS/2 only) and the keyboard.

Example: read the input port.

MOV al, 0c0h      ; load command code "read input port"
OUT 64h, al       ; output command to keyboard controller
wait:
IN al, 64h        ; read status register of keyboard controller
TEST al, 01h      ; check whether byte is available in the output buffer
JZ wait           ; wait until byte is available
IN al, 60h        ; read input port byte from the keyboard controller's output buffer into al
The output port of the keyboard controller not only supervises the keyboard via the KBDO and KCLK bits (and on a PS/2 also the mouse via the AXDO and ACLK bits), but can additionally lock address line A20 of the 80286 and above via the GA20 bit to emulate the 8086/88 address wrap-around. If you want to access the 64 kbytes of the high-memory area above the 1 Mbyte border, you must set bit GA20. HIMEM.SYS does this automatically when you access this storage area.

As you know, you don't have the option on the 80286 to switch the processor back to real mode by simply clearing the PE flag. This is possible on the 80286 only by a processor reset, which must be carried out by hardware. For this purpose, the SYSR bit is implemented in the keyboard controller's output port. If you set SYSR to 1 then the 80286 carries out a processor reset. But the start routine of the AT BIOS recognizes (according to the shutdown status byte in the CMOS RAM) whether a boot process is in progress, or whether a program has issued a processor reset via SYSR to switch the 80286 back to real mode. In the latter case, the BIOS start routine returns control immediately to the calling program, for example to the HIMEM.SYS or RAMDRIVE.SYS drivers, which access extended memory to store and read data there. The OUTB and AUXB bits indicate whether a byte is available in the output buffer for the keyboard or the mouse (PS/2 only).

Example: issue processor reset.

MOV al, 0d0h      ; load command code "read output port"
OUT 64h, al       ; output command to keyboard controller
wait:
IN al, 64h        ; read status register
TEST al, 01h      ; check whether byte is available in the output buffer
JZ wait           ; wait until byte is available
IN al, 60h        ; read output port byte from the controller's output buffer into al
OR al, 01h        ; set bit SYSR
MOV ah, al        ; save the new output port value
MOV al, 0d1h      ; load command code "write output port"
OUT 64h, al       ; output command to keyboard controller
MOV al, ah        ; restore the new output port value
OUT 60h, al       ; write output port: issue processor reset
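The bit handling for the output port can also be written as a few C helpers. This is only a sketch: the names `OP_*`, `op_with_a20` and `op_a20_enabled` are hypothetical, and the bit positions are an assumption read off the order of the entries in Figure 34.10 (SYSR as bit 0 agrees with the `OR al, 01h` of the reset example). The helpers only compute the new byte; reading and writing the port still goes through the controller commands d0h and d1h.

```c
#include <stdint.h>

/* Bit masks of the keyboard controller's output port (assumed from Figure 34.10). */
enum {
    OP_SYSR = 0x01,  /* bit 0: processor reset */
    OP_GA20 = 0x02,  /* bit 1: gate for A20 */
    OP_AXDO = 0x04,  /* bit 2: output data to auxiliary device (PS/2 only) */
    OP_ACLK = 0x08,  /* bit 3: auxiliary device clock (PS/2 only) */
    OP_OUTB = 0x10,  /* bit 4: output buffer full */
    OP_AUXB = 0x20,  /* bit 5: output buffer of auxiliary device full (PS/2 only) */
    OP_KCLK = 0x40,  /* bit 6: keyboard clock */
    OP_KBDO = 0x80   /* bit 7: output data to keyboard */
};

/* Returns the output port value with GA20 set (enable != 0) or cleared;
   all other bits are left unchanged. */
uint8_t op_with_a20(uint8_t port, int enable)
{
    return enable ? (uint8_t)(port | OP_GA20) : (uint8_t)(port & ~OP_GA20);
}

/* Returns 1 if GA20 is set in the given output port value, otherwise 0. */
int op_a20_enabled(uint8_t port)
{
    return (port & OP_GA20) != 0;
}
```

A program would read the port with command d0h, pass the byte through `op_with_a20`, and write the result back with command d1h.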
Receiving Keyboard Characters

If you didn't issue a keyboard command for which the keyboard returns some data bytes, you will only receive the make and break codes of the keys pressed or released. The keyboards have a small buffer memory, which usually holds about 20 bytes. Depending upon which keys you operate, the buffer can therefore accommodate more or fewer key operations, as the scan code for a new MF II key occupies far more bytes than that for an ordinary key such as "A". If the internal keyboard buffer overflows because the CPU doesn't read data from the keyboard controller's output buffer (for example, if IRQ1 in the 8259A PIC is disabled), then the keyboard places the value 00h or ffh into the internal buffer to indicate the overflow condition. Table 34.5 shows the return codes of the keyboard.
Code      Meaning
00h, ffh  overflow error or key error
41abh     keyboard ID of MF II keyboard
aah       BAT complete code
eeh       echo after echo command
fah       ACK
fch       BAT error
feh       resend request
01h-58h   make and break codes of keys
Table 34.5: Keyboard return codes

If the output buffer of the keyboard controller is empty (that is, bit OUTB in the status register is cleared), then the keyboard transfers a scan code (or return code) from the internal buffer as a serial bit stream to the keyboard controller. The controller in turn places the character into its output buffer, sets bit OUTB in the status register, and issues a hardware interrupt via IRQ1 (corresponding to INT 09h). On a PC/XT keyboard the handler should first set the SW1 bit of port B so that port A really contains the scan code from the keyboard and not the data from the configuration DIP switches. As is the case for the other keyboards, the scan code can then be fetched by a simple IN instruction referring to the port address 60h. The character is thereby removed from the output buffer, so that the keyboard may pass the next scan code from its internal buffer.

The following example illustrates the principle of character passing between the keyboard controller and the CPU. To disable IRQ1, bit 1 in the IMR of the 8259A PIC is set. If you try this small program you can clearly see the number of make and break codes the MF II keyboard generates for SHIFT and other keys if you operate one of the new MF II keys.

Example:
detect passed scan codes and display them on the screen until ESC is pressed.

#include <stdio.h>
#include <stdlib.h>
#include <conio.h>   /* assumes a DOS compiler whose conio.h provides inp() and outp() */

int main(void)
{ int status, scan_code;

  outp(0x21, 0x02);                        /* lock IRQ1 */
  for (;;) {                               /* endless loop for reading characters */
    for (;;) {                             /* wait until character is available in output buffer */
      status = inp(0x64);                  /* read status register */
      if ((status & 0x01) == 0x01) break;  /* leave wait loop if character in output buffer */
    }
    scan_code = inp(0x60);                 /* read scan code from output buffer */
    printf("\t%d", scan_code);             /* output scan code in tab-steps */
    if (scan_code == 0x01) break;          /* leave endless loop if ESC is pressed */
  }
  outp(0x21, 0x00);                        /* release IRQ1 */
  exit(0);
}
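A program that reads bytes in this way has to separate return codes from scan codes. The following is a minimal sketch of such a classification based on Table 34.5; the type `kb_code_class` and the function `kb_classify` are hypothetical names, not part of any BIOS or driver interface. The two ID bytes abh and 41h are deliberately not treated specially here, because they are only meaningful directly after an identify command.

```c
#include <stdint.h>

/* Classes of bytes the keyboard can return (after Table 34.5). */
typedef enum {
    KB_OVERFLOW,    /* 00h or ffh: overflow error or key error */
    KB_BAT_OK,      /* aah: BAT complete code */
    KB_ECHO,        /* eeh: echo after echo command */
    KB_ACK,         /* fah: ACK */
    KB_BAT_ERROR,   /* fch: BAT error */
    KB_RESEND,      /* feh: resend request */
    KB_SCAN_CODE    /* everything else is treated as part of a make or break code */
} kb_code_class;

/* Maps one byte fetched from port 60h to its class. */
kb_code_class kb_classify(uint8_t code)
{
    switch (code) {
    case 0x00:
    case 0xff: return KB_OVERFLOW;
    case 0xaa: return KB_BAT_OK;
    case 0xee: return KB_ECHO;
    case 0xfa: return KB_ACK;
    case 0xfc: return KB_BAT_ERROR;
    case 0xfe: return KB_RESEND;
    default:   return KB_SCAN_CODE;
    }
}
```

In the loop above, `kb_classify(scan_code)` could be used to print only scan codes and to report the other classes separately.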
Commands for the Keyboard

The AT and MF II keyboards implement several commands that you pass via the input buffer of the keyboard controller. But note that, for example, the command turn on/off LEDs is meaningless for an AT keyboard, as this keyboard doesn't have any LEDs. Table 34.6 summarizes all keyboard commands for the AT and MF II keyboard.
Code  Command                    Description
edh   turn on/off LEDs           turns on/off the MF II keyboard LEDs
eeh   echo                       returns a byte eeh
f0h   set/identify scan codes    sets one of three scan code sets and identifies the present scan code set
f2h   identify keyboard          identifies the keyboard (ACK=AT, ACK+abh+41h=MF II)
f3h   set repetition rate/delay  sets repetition rate and delay of keyboard
f4h   enable                     enables the keyboard
f5h   standard/disable           sets the standard values and disables the keyboard
f6h   standard/enable            sets the standard values and enables the keyboard
feh   resend                     the keyboard transfers the last transmitted character once more to the keyboard controller
ffh   reset                      executes an internal keyboard reset and afterwards the BAT

Table 34.6: Keyboard commands (AT, PS/2)
The following briefly discusses the most important commands. Generally, the keyboard returns an ACK corresponding to fah after every command except echo and resend. Every character from the keyboard to the controller issues an interrupt via IRQ1. Normally, the keyboard drivers process only codes between 0 and 127. Thus, ACK and all other return messages are ignored by the keyboard driver. Only if you suppress the generation of INT 09h, for example by masking IRQ1 in the IMR of the 8259A PIC, can you actually detect the return messages; otherwise the interrupt snatches the return byte away from you, as the keyboard controller issues the interrupt immediately after receiving the byte.

- Turn On/Off the LEDs (edh): after passing the command the keyboard responds with an ACK to the controller, aborts scanning the scan matrix, and waits for the indicator byte from the controller, which you must also pass to the controller via the input buffer. Figure 34.11 shows the structure of the indicator byte.
CPSL: LED for CapsLock or ShiftLock (1 = on, 0 = off)
NUML: LED for NumLock (1 = on, 0 = off)
SCRL: LED for ScrollLock (1 = on, 0 = off)

Figure 34.11: Indicator byte.

Example: switch on LED for NumLock, switch off all others.

MOV al, 0edh      ; load command code for turning on/off the LEDs
OUT 60h, al       ; output command
wait:
IN al, 64h        ; read status register
TEST al, 02h      ; check whether input buffer is empty
JNZ wait          ; input buffer full, thus wait
MOV al, 02h       ; indicator byte: NumLock on, all others off
OUT 60h, al       ; switch on LED for NumLock
- Echo (eeh): this command checks the transfer path and the command logic of the keyboard. As soon as the keyboard has received the command it returns the same response byte eeh, corresponding to an echo, back to the keyboard controller.

- Set/Identify Scan Codes (f0h): this command selects one of three alternate scan code sets of the MF II keyboard; 01h, 02h, and 03h are valid. The standard setup is the scan code set 02h. After outputting the command the keyboard responds with an ACK, and waits for the transfer of the option byte. The values 01h, 02h, and 03h select sets 1, 2, or 3; a value of 00h instructs the keyboard to return, besides the ACK, another byte to the keyboard controller upon receiving the option byte, which specifies the active scan code set.

- Identify Keyboard (f2h): this command identifies the connected keyboard. A PC/XT keyboard without a controller doesn't respond in any way, that is, a time-out error occurs. An AT keyboard returns only an ACK, but an MF II keyboard returns an ACK followed by the two bytes abh and 41h, which are the low and high bytes of the MF II ID word 41abh.
Example: identify keyboard.

/* timeout_wait() is assumed to be defined elsewhere; it waits long enough for the keyboard to answer. */
int keyb_ident(void)
{ int status, code, ret_code;   /* function returns indicator: 0=PC/XT, 1=AT, 2=MF II, 3=error */

  outp(0x21, 0x02);             /* lock IRQ1 */
  outp(0x60, 0xf2);             /* output command */
  timeout_wait();               /* wait loop for ACK */
  status = inp(0x64);           /* read status register */
  if ((status & 0x01) != 0x01) {
    ret_code = 0;               /* no ACK from keyboard -> PC/XT keyboard */
  }
  else {
    code = inp(0x60);           /* fetch character */
    if (code != 0xfa) {
      ret_code = 3;             /* error */
    }
    else {
      timeout_wait();           /* wait loop for 1st ID byte */
      status = inp(0x64);       /* read status register */
      if ((status & 0x01) != 0x01) {
        ret_code = 1;           /* no ID byte from keyboard -> AT keyboard */
      }
      else {
        code = inp(0x60);       /* fetch 1st ID byte */
        if (code != 0xab) {
          ret_code = 3;         /* error */
        }
        else {
          timeout_wait();       /* wait loop for 2nd ID byte */
          status = inp(0x64);   /* read status register */
          if ((status & 0x01) != 0x01) {
            ret_code = 3;       /* no 2nd ID byte from keyboard -> error */
          }
          else {
            code = inp(0x60);   /* fetch 2nd ID byte */
            if (code != 0x41) {
              ret_code = 3;     /* error */
            }
            else {
              ret_code = 2;     /* MF II keyboard */
            }
          }
        }
      }
    }
  }
  outp(0x21, 0x00);             /* release IRQ1 */
  return(ret_code);             /* return keyboard identifier */
}
- Set Repetition Rate/Delay (f3h): with this command you may set the repetition rate as well as the delay of an AT or MF II keyboard. After outputting the command the keyboard returns an ACK and waits for the data byte, which you may pass via the input buffer to the keyboard. Figure 34.12 shows the structure of the data byte.

Bit 7:     0
Bits 6-5:  Delay: delay [ms]
           00=250ms  01=500ms  10=750ms  11=1000ms
Bits 4-0:  Rate: repetition rate [characters/s]
           00000=30.0  01000=15.0  10000=7.5  11000=3.7
           00001=26.7  01001=13.3  10001=6.7  11001=3.3
           00010=24.0  01010=12.0  10010=6.0  11010=3.0
           00011=21.8  01011=10.9  10011=5.5  11011=2.7
           00100=20.0  01100=10.0  10100=5.0  11100=2.5
           00101=18.5  01101=9.2   10101=4.6  11101=2.3
           00110=17.1  01110=8.5   10110=4.3  11110=2.1
           00111=16.0  01111=8.0   10111=4.0  11111=2.0

Figure 34.12: Repetition rate and delay.

Example: set 30 characters/s and 250 ms delay.

/* timeout_wait() is assumed to be defined elsewhere; it waits long enough for the keyboard to answer. */
int max_rate(void)
{ int status, ret_code;   /* routine for maximum keyboard rate; return code: 0=o.k., -1=error */

  outp(0x21, 0x02);       /* lock IRQ1 */
  outp(0x60, 0xf3);       /* output command */
  timeout_wait();         /* wait loop for ACK */
  status = inp(0x64);     /* read status register */
  if ((status & 0x01) != 0x01) {
    ret_code = -1;        /* no ACK from keyboard -> error */
  }
  else {
    outp(0x60, 0x00);     /* output data byte */
    ret_code = 0;         /* everything o.k. */
  }
  outp(0x21, 0x00);       /* release IRQ1 */
  return(ret_code);       /* return code */
}
- Resend (feh): if an error has occurred during the course of transferring data between the keyboard and keyboard controller, you can instruct the keyboard with this command to pass the last character once again.

- Reset (ffh): this command carries out an internal self-test of the keyboard. After receiving the command byte the keyboard first outputs an ACK for this purpose. The keyboard controller must respond by raising the data and clock line to the keyboard to a high level for at least 500 µs. Afterwards, the keyboard carries out the in-built BAT (basic assurance test). Upon the BAT's completion, it transfers a code aah (test passed) or fch (keyboard error) to the controller. You can set the data and clock line to a high level using bits 6 and 7 in the output port.
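The data bytes that follow the commands edh and f3h can be assembled with two small helpers. This is a sketch under assumptions: the function names are hypothetical, and the bit positions of the indicator byte (ScrollLock bit 0, NumLock bit 1, CapsLock bit 2) are inferred from the order in Figure 34.11 and from the NumLock example, which sends 02h. The layout of the rate/delay byte follows Figure 34.12.

```c
#include <stdint.h>

/* Indicator byte for command edh (Figure 34.11).
   A nonzero argument switches the corresponding LED on. */
uint8_t kb_led_byte(int scroll, int num, int caps)
{
    return (uint8_t)((scroll ? 0x01 : 0) | (num ? 0x02 : 0) | (caps ? 0x04 : 0));
}

/* Data byte for command f3h (Figure 34.12).
   delay_code: 0..3 for 250, 500, 750, 1000 ms.
   rate_code:  0..31, where 0 selects 30.0 characters/s and 31 selects 2.0 characters/s.
   Bit 7 is always 0; out-of-range arguments are truncated to the field width. */
uint8_t kb_typematic_byte(unsigned delay_code, unsigned rate_code)
{
    return (uint8_t)(((delay_code & 0x03u) << 5) | (rate_code & 0x1fu));
}
```

For instance, `kb_typematic_byte(0, 0)` gives the data byte 00h used in the max_rate example, and `kb_led_byte(0, 1, 0)` gives the 02h of the NumLock example.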
34.2 Mice and Other Rodents

For a long time, mice have been indispensable for using programs on Apple computers. But on IBM PCs the mouse made its debut only once Windows came onto the market, as handling Windows with the usual keyboard and without a mouse is quite ponderous. In programs that allow operation both by hotkeys and by a mouse, well-trained users work faster if they use only the hotkeys (that is, the keyboard) when selecting menu items. Thus, mice are surely not the ultimate solution, but they are at least very useful for graphics-oriented applications or drawing programs. Some mice can confidently be called rats if they exceed a weight of one pound or the required space for their movement is at least as large as your desk! In the following, however, we only discuss mice.
34.2.1 Structure and Function

A mouse is structured quite simply. The central part is a steel ball, coated with rubber or plastic, which rotates as the mouse is moved. This movement is transmitted to two small rollers perpendicular to each other, which convert the mouse movement in the X and Y directions into a rotation of two disks with holes. These disks alternately close or open a photosensor assembly when rotated, that is, when the mouse is moved. Thus the number of interruptions and releases of the photosensor assembly is an unambiguous measure of the amount of the mouse's movement in the X and Y directions, and the number of these interruptions and releases per second specifies the speed of this movement. Such mechanical mice are the most widely used today. A newer concept is the so-called optical mouse, where sensors on the bottom detect the mouse's movement on a specially patterned mouse pad. A special patterning is required so that the mouse's logic can determine the direction and speed; a normal mouse pad would only confuse the mouse.

All mice further have two or three buttons. Originally, Microsoft intended three buttons as the standard, but actually implemented only two on its own (Microsoft) mice. Many compatible mice therefore also only have two buttons. The information as to how far the mouse has moved and which buttons have been pressed or released is passed to the PC via a cable or an infrared beam.
34.2.2 Mouse Driver and Mouse Interface

Most mice are connected to the serial interface. Via the various control lines, the mouse is then supplied with energy; wireless infrared mice need a battery, of course. When you move your mouse or press or release a button, the mouse generally passes a mouse data packet to the interface, which in turn issues an interrupt. For handling this interrupt a mouse driver is needed, which intercepts the interrupt for the corresponding serial interface, reads the mouse data packet, and updates internal values that concern the current button status as well as the mouse's position. Moreover, the mouse driver provides a software interface via mouse interrupt 33h for interpreting these internal values. You will find a list of all INT 33h functions in Appendix J.

The mouse driver is not only responsible for servicing the interface interrupt and providing the mentioned values, but also for moving the mouse pointer over the screen. This pointer seems to follow the mouse's movement. To clear up a common misconception: you don't move the mouse pointer over the screen using the mouse itself; you only continuously issue interrupts when moving the mouse, during the course of which the amount of mouse movement is passed to the interface. The mouse driver detects these positional signals from the mouse and converts them into a movement of the mouse pointer on-screen. For this purpose, it deletes the mouse pointer at the current location, writes the old screen contents at this location again, reads the screen contents at the new location, and overwrites the location with the mouse pointer.

For the mouse driver you can choose from three options: hardware and software mouse pointer in text mode, as well as a graphics mouse pointer in graphics mode. You may define the type and shape of the mouse pointer by means of functions 09h and 0ah of INT 33h. The hardware mouse pointer is nothing more than the conventional cursor that the mouse driver moves on-screen according to the mouse's movements. For the software mouse pointer you can select any character; as a standard an inverted space character is defined. Thus in text mode the mouse pointer always has the size of one character, corresponding to a video memory word of two bytes (attribute and character code). The functions of INT 33h for reading the mouse pointer position return the position in units of pixels, that is, in the case of a character box with 8 * 16 pixels for a VGA adapter, the X coordinate takes the values 0, 8, 16, ..., 472. You must first divide these quantities by the corresponding dimension of the character box to determine the row and column of the mouse pointer in text mode.

The mouse pointer is not simply output on-screen, but combined bit by bit via the so-called screen and cursor masks of the mouse pointer with the video memory word at the mouse pointer's location:

new video memory word = (old word AND screen mask) XOR cursor mask
The combination is carried out in two steps. First, the mouse driver forms the AND value of the old video memory word and the screen mask. Thus, using the screen mask you can clear individual bits in the video memory word. Second, the XOR value of the AND result and the cursor mask is formed.

Example: old word "A" corresponding to ASCII code 41h with attribute 01h, thus old word is equal to 4101h; screen mask 4040h, cursor mask 0f1fh.

new word = (4101h AND 4040h) XOR 0f1fh
         = 4000h XOR 0f1fh
         = 4f1fh

Resulting mouse pointer: character 4fh corresponding to "O", attribute equal to 1fh.
Thus, with the cursor mask you define the character and colour of the mouse pointer. Table 34.7 shows the combination table for this.

Screen bit  Screen mask bit  Cursor mask bit  Resulting bit
bit         0                0                0
bit         0                1                1
bit         1                0                bit (unchanged)
bit         1                1                bit (inverted)

Table 34.7: Combining screen bit, screen and cursor mask
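The two-step combination of Table 34.7 fits into a single C expression. The sketch below uses a hypothetical function name, `mouse_pointer_word`, and simply reproduces the AND/XOR rule for a 16-bit video memory word; it does not call the mouse driver.

```c
#include <stdint.h>

/* Combines the old video memory word with the screen and cursor masks:
   new word = (old word AND screen mask) XOR cursor mask. */
uint16_t mouse_pointer_word(uint16_t old_word, uint16_t screen_mask, uint16_t cursor_mask)
{
    return (uint16_t)((old_word & screen_mask) ^ cursor_mask);
}
```

With the values of the example above, `mouse_pointer_word(0x4101, 0x4040, 0x0f1f)` yields 4f1fh. A screen mask of ffffh together with a cursor mask of 0000h leaves the word unchanged, and a screen mask of 0000h replaces it completely by the cursor mask.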
You may clear, set, leave unchanged, or invert individual bits in the video memory word. Figure 34.13 shows the structure of the video memory word for a character in text mode. In Section 34.2.3 an example for using the screen and cursor mask is discussed.

BLNK: blink (1 = on, 0 = off)
BAK2-BAK0: background colour (from present palette)
INT: intensity (1 = high intensity, 0 = normal intensity)
FOR2-FOR0: foreground colour (from present palette)
CHR7-CHR0: character code

Figure 34.13: Video memory word structure for a character in text mode.

In graphics mode the mouse pointer is represented similarly. Also in this case, the mouse driver first forms the AND value of the present screen bit and the screen mask, and afterwards the XOR value of this result and the cursor mask when displaying the mouse pointer. In graphics mode, one pixel is assigned one or more bits; if you are defining the mouse pointer you thus have to note the number of bits per pixel. Furthermore, the mouse pointer size here is always 16 * 16 pixels, and as standard an arrow is defined, which you may alter by means of function 09h. An example of this is discussed in the next section.

Many mouse drivers have problems with a Hercules card in graphics mode. The 720 * 348 resolution, differing from the geometry of standard IBM adapters, as well as the character box of 9 * 14 pixels, often gives rise to a phenomenon whereby the individual points of the graphics mouse pointer are widely spread over the whole screen, and therefore don't form a coherent mouse pointer.

Besides mice connected to a serial interface, there are so-called bus mice, which come with an adapter for a bus slot. They access the PC system bus directly, and therefore don't occupy a serial interface. The structure and functioning are largely the same as for conventional mice connected to a serial interface; INT 33h provides the same results. On the PS/2 the mouse was integrated into the system from the beginning, so that here, besides a keyboard connector, a connection for a PS/2 mouse is also implemented as standard.
34.2.3 Programming the Mouse

For programming and interrogating the mouse the functions of mouse interrupt 33h are available. INT 33h is a software interface to the mouse driver to determine the position of the mouse pointer and the number of

16 are also valid. In particular, the segment addresses and the division into screen pages, as well as RAM banks, remain the same in these modes.

Text Mode (Modes 0-3, 7)

The character codes are stored in memory layer 0 and the accompanying attributes in memory layer 1 of the VGA video RAM, as is the case on the EGA. The VGA address transformation logic combines the actually parallel storage layers in such a way that the organization and structure of the video RAM, as well as the address calculation, are identical for the CPU to those on a CGA or MDA. Unlike the EGA, the VGA can also carry out MDA's monochrome text mode 7 with an enhanced resolution of 720 * 400 pixels, corresponding to a character matrix of 9 * 16 pixels.

Graphics Mode (Modes 4-6, 13-19)

The VGA carries out CGA graphics modes 4-6 and EGA graphics modes 13-16 in the same way as on the original adapters. The organization and structure of the video RAM, as well as the address calculation, are identical to the CGA and EGA in these compatible modes.
Additionally, the VGA implements three new VGA modes: graphics modes 17, 18 and 19. VGA mode 17 serves mainly for compatibility with the graphics adapter of the PS/2 model 30, the MCGA (multi colour graphics array). The bits for the individual pixels are located only in layer 0; the three layers 1, 2 and 3 are not used. Every pixel is assigned one bit, that is, two colours may be displayed. In VGA mode 17, 80 bytes per line (50h bytes) are required (640 pixels/8 pixels per byte). Every screen page comprises 40 kbytes (a000h bytes). The address of the byte with the pixel in line i, column j (i = 0 to 479, j = 0 to 639) is therefore:

address(i,j) = a0000h + 50h*i + INT(j/8)

In VGA mode 18 the four bits of a pixel are distributed to the four memory layers, as is the case for the EGA. For setting one pixel with the intended colour you must therefore additionally address the four layers besides the byte. In the high-resolution VGA mode 18 with 16 different colours, 80 bytes (50h bytes) per line are also required (640 pixels/8 pixels per byte). Every screen page comprises 40 kbytes (a000h bytes). The address of the byte with the pixel in line i, column j (i = 0 to 479, j = 0 to 639) is therefore:

address(i,j) = a0000h + 50h*i + INT(j/8)

In VGA mode 19 with 256 colours per pixel, the video RAM is again organized very simply as a linear array, in which one byte corresponds to one pixel. The byte value specifies the colour of the pixel. The bits are not distributed to various memory layers here. This mode requires 320 bytes (140h bytes) per line, corresponding to 320 pixels/1 pixel per byte. The single screen page thus comprises 64 kbytes (10000h bytes), but only 64 000 bytes are actually used. The remaining 1536 bytes remain free. The address of the pixel in line i, column j (i = 0 to 199, j = 0 to 319) then is:

address(i,j) = a0000h + 140h*i + j
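The address calculations for the three VGA modes can be sketched in C as follows. The function names are hypothetical. The sketch assumes that lines are stored one after another (80 bytes per line in modes 17 and 18, 320 bytes per line in mode 19) and returns offsets relative to a0000h; the bit mask helper additionally assumes that the leftmost pixel of a byte is bit 7.

```c
#include <stdint.h>

/* Modes 17 and 18: 640 pixels per line, 8 pixels per byte, 80 (50h) bytes per line.
   Returns the offset of the byte holding the pixel, relative to a0000h. */
uint32_t vga_planar_offset(unsigned line, unsigned column)
{
    return (uint32_t)80 * line + column / 8u;
}

/* Bit mask of the pixel within that byte (assumption: leftmost pixel is bit 7). */
uint8_t vga_planar_mask(unsigned column)
{
    return (uint8_t)(0x80u >> (column & 7u));
}

/* Mode 19: one byte per pixel, 320 (140h) bytes per line.
   Returns the offset of the pixel's byte, relative to a0000h. */
uint32_t vga_mode19_offset(unsigned line, unsigned column)
{
    return (uint32_t)320 * line + column;
}
```

The last pixel of mode 18 (line 479, column 639) ends up at offset 38 399, the last byte inside the 40 kbyte page, and the last pixel of mode 19 (line 199, column 319) at offset 63 999, in agreement with the 64 000 bytes mentioned above.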
35.6.6 SVGA

SVGA is upwards compatible with CGA, EGA and VGA, so that all features used there remain unchanged here. In particular, the segment addresses and the splitting into monitor pages and video RAM banks remain the same in these modes.

Text Mode

Organization and layout of the video RAM, and calculation of character addresses and their attributes are, for the CPU, identical to VGA. Only the page in SVGA text mode, due to the expanded resolution of up to 132 * 60 characters, is correspondingly larger.

Graphics Mode (Modes 4-6, 13-19)

Owing to its upwards compatibility, SVGA can perform CGA, EGA and VGA graphics modes unchanged. The organization and layout of the video RAM, and calculation of the addresses in these compatible modes, are identical to CGA, EGA and VGA, respectively. In the original SVGA modes, as in VGA mode 19, the video RAM is organized very simply as a linear byte field, in which a nibble (16 colours) or a byte (256 colours) corresponds to a pixel. The nibble or byte value indicates the colour of the point on the screen. Depending on the resolution, this mode requires up to 1280 bytes per line, or 1.25 Mbytes for each page (1280 * 1024 pixels with 256 colours). In mode 105h (1024 * 768 pixels with 256 colours), with 1024 bytes (400h bytes) per line, the offset of the byte with the pixel in line i, column j (i = 0 to 767, j = 0 to 1023) in video segment a000h would thus be:

offset(i,j) = 400h*i + j

For the last point of the screen (i = 767, j = 1023), together with the segment a000h, this already lies beyond 1M; thus it covers the entire address area reserved for the system and the additional BIOS. In SVGA the problem is solved by the bank structure of the video RAM. Each 64 kbyte sized segment is addressed with the help of a so-called bank select register. You can see its layout in Figure 35.24. The four least significant bits A19-A16 select the bank, which is mapped into the 64 kbyte window at a000:0000h. In other words, A19-A16 represent the four most significant bits of a 20-bit address. In this way, up to 1 Mbyte of video RAM can be addressed. You can address the bank select register at port 3d5h, after you have selected it with the index 35h through the index port 3d4h.
Figure 35.24: The bank select register of the SVGA.

Example: select the bank 0bh=1011b through the bank select register.

mov dx, 3d4h        ; load the I/O address of the index port to dx
mov al, 35h         ; index of the bank select register
out dx, al          ; write the index 35h into the index port, that is,
                    ; select the bank select register at the data port 3d5h
mov dx, 3d5h        ; load the I/O address of the data port to dx
in al, dx           ; read old value
and al, 11110000b   ; A19-A16 reset, leave reserved bits unchanged
or al, 00001011b    ; A19-A16 set to 1011b, leave reserved bits unchanged
out dx, al          ; write to the bank select register
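Which bank a given pixel needs follows directly from its linear offset in the video RAM. The sketch below makes this explicit for mode 105h; the type `svga_location` and the function `svga_locate_105h` are hypothetical names, and the calculation assumes 1024 bytes per line with the lines stored one after another. The upper bits of the linear offset give the value for A19-A16 in the bank select register, the lower 16 bits the offset inside the 64 kbyte window at a000:0000h.

```c
#include <stdint.h>

/* Position of a pixel in banked video RAM. */
typedef struct {
    uint8_t  bank;    /* value for bits A19-A16 of the bank select register */
    uint16_t offset;  /* offset inside the 64 kbyte window at segment a000h */
} svga_location;

/* Mode 105h (1024 * 768 pixels, 256 colours): one byte per pixel,
   1024 bytes per line (assumption: lines are stored consecutively). */
svga_location svga_locate_105h(unsigned line, unsigned column)
{
    uint32_t linear = (uint32_t)1024 * line + column;
    svga_location loc;
    loc.bank = (uint8_t)(linear >> 16);          /* 64 kbyte bank number */
    loc.offset = (uint16_t)(linear & 0xffffu);   /* position inside the bank */
    return loc;
}
```

The last pixel of the screen (line 767, column 1023) lies in bank 0bh at offset ffffh, which is exactly the bank selected in the example above.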
As before, as many bits are assigned to a pixel as are necessary for the selected depth of colour. Originally (and strictly in accordance with VESA SVGA), only 256 different colours could be represented at the same time. Here, every pixel is assigned a byte. HighColor mode with 256k colours, and TrueColor mode with 16M colours, require 18 and 24 bits respectively for one pixel. They follow each other sequentially in the video RAM. Even with very large video memories (1 Mbyte and more), the video RAM is organized as a linear byte field, in which 64 kbyte or 128 kbyte sized windows are mapped into the address area at a0000h (the standard video segment), as with expanded memory.

A further method of opening a window in the large video RAM at a000:0000h is provided by the subfunctions 01h and 05h of INT 10h, function 4fh (see Appendix K.4). The subfunction 01h returns, in a buffer at offset 04h, the units in which this window can be shifted by subfunction 05h. With subfunction 05h, you can then select the start address of the window within the video RAM. It then appears at the usual video address a000:0000h in the CPU address area. Due to the combination of a window granularity and a 16-bit window address in granularity units, a video RAM of almost any size is possible. If you select the maximum video window size of 128 kbytes (17 address bits) between a000:0000h and b000:ffffh as granularity, then a video RAM of 8 Gbytes (33 address bits) is possible with the 64k window addresses in DX (16 address bits).

Example: in the CPU address area at a000:0000h, a window in the video RAM should be opened at the address 768k; the granularity is 1 kbyte.

mov ax, 4f05h   ; transfer function code to ax
mov bx, 0000h   ; bh=00h: set window address, bl=00h: first window
mov dx, 300h    ; window address in units of 1 kbyte (300h=768d)
int 10h         ; call function through SVGA interrupt 10h
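The value that has to be loaded into DX is simply the start address of the window divided by the granularity. A minimal sketch, with the hypothetical function name `vbe_window_position` and both quantities given in bytes:

```c
#include <stdint.h>

/* Window position for DX of INT 10h, function 4f05h:
   start address of the window in video RAM, expressed in granularity units.
   Returns 0 if the granularity is 0, which would otherwise divide by zero. */
uint16_t vbe_window_position(uint32_t address, uint32_t granularity)
{
    if (granularity == 0)
        return 0;
    return (uint16_t)(address / granularity);
}
```

For the example above, an address of 768 kbytes and a granularity of 1 kbyte give 300h. With a granularity of 64 kbytes the same function returns the number of the 64 kbyte segment, for example 2 for the address 20000h.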
35.6.7 Accelerators

For accelerators, direct access to the video memory is either not possible (original 8514/A modes), or unnecessary due to the existing graphics functions. The correct interface to the adapter is the supplied Windows or OS/2 driver. It takes advantage of the graphics accelerator functions. To date, no binding standard has been developed in the area of accelerators (due to the Windows driver interface at a software level, this is also unnecessary), so BIOS and adapter manufacturers are free to do as they please, even when the same graphics accelerator is used. For this reason, a generalized overview regarding the programming of accelerators is not possible. I would like to add that with an S3 adapter, you can access the video RAM in exactly the same way as is the case with SVGAs. Thus, here also, a bank select register is available at port 3d4h (index register; for bank select register: index 35h) and 3d5h (data register); with their help you can map a 64 kbyte sized segment of the video RAM into the CPU address area at a0000h. As most accelerators are compatible with SVGA, you can naturally also use the VGA and SVGA functions of the BIOS listed in Appendix K. As hardware-related software, these functions naturally make full use of the accelerators.
35.6.8 Summary Table 35.16 lists together the segments and offsets the characters or pixels in row YW, column cl of the screen page pg in video RAM.
Mode      Segment  rw 1)    cl 2)    pg 3)  Layers  Offset
0, 1      b800h    0-24     0-39     0-7    -       800h*pg + 50h*rw + 02h*cl
2, 3      b800h    0-24     0-79     0-3    -       1000h*pg + a0h*rw + 02h*cl
4, 5      b800h    0-199    0-319    0      -       2000h*(rw MOD 2) + 50h*INT(rw/2) + INT(cl/4)
6         b800h    0-199    0-639    0      -       2000h*(rw MOD 2) + 50h*INT(rw/2) + INT(cl/8)
7         b000h    0-24     0-79     -      -       a0h*rw + 02h*cl
13        a000h    0-199    0-319    0-7    0-3     2000h*pg + 28h*rw + INT(cl/8)
14        a000h    0-199    0-639    0-3    0-3     4000h*pg + 50h*rw + INT(cl/8)
15        a000h    0-349    0-639    0-1    0, 2    8000h*pg + 50h*rw + INT(cl/8)
16        a000h    0-349    0-639    0-1    0-3     8000h*pg + 50h*rw + INT(cl/8)
17        a000h    0-479    0-639    -      -       50h*rw + INT(cl/8)
18        a000h    0-479    0-639    -      0-3     50h*rw + INT(cl/8)
19        a000h    0-199    0-319    -      -       140h*rw + cl
100       a000h    0-399    0-639    -      -       280h*rw + cl
101       a000h    0-479    0-639    -      -       280h*rw + cl
102       a000h    0-599    0-799    -      -       190h*rw + cl
103       a000h    0-599    0-799    -      -       320h*rw + cl
104       a000h    0-767    0-1023   -      -       200h*rw + cl
105       a000h    0-767    0-1023   -      -       400h*rw + cl
106       a000h    0-1023   0-1279   -      -       280h*rw + cl
107       a000h    0-1023   0-1279   -      -       500h*rw + cl
HGC G 4)  b000h    0-347    0-719    0-1    -       8000h*pg + 2000h*(rw MOD 4) + 90*INT(rw/4) + INT(cl/8)

1) line number
2) column number
3) screen page number
4) graphics mode of the Hercules card

Table 35.16: Addresses in video RAM

35.7 Graphics Processor Versus Local Bus
The two concepts of graphics adapters with graphics processors and local bus have the same target: to increase the speed of picture formation. Here the similarity ends. While a local bus should enable the large quantities of data produced by the CPU to be transferred to the video RAM, graphics processors attempt to do exactly the opposite, namely to transfer only a few commands and parameters. They describe the content of the screen area which should be drawn. Because the production of the large amounts of data necessary for the screen rewrite is performed on the adapter, the quantity of data to be transferred is typically reduced by a factor of 100 and, therefore, the transfer time is considerably less. Modern graphics adapters with a PCI or VLB interface often combine both concepts. Here, the accelerator can be regarded as a simple graphics processor.

With the local bus concept, as before, the CPU must calculate the values for each pixel and so is constantly interrupted by operations such as the periodic timer interrupt, memory refreshing, mouse movement, etc. A relatively protracted task switch in a multitasking operating system such as OS/2 or Windows NT, for example, can slow down the sensitive picture formation. In addition, no intelligent CPU is necessary for the production of the pixel data; a simple but highly specialized and, therefore, faster chip can accomplish this just as well. Also, with local bus the CPU is loaded with quite trivial jobs, thus blocking more complicated calculations which require a higher intelligence.

Although in the beginning many graphics adapters for local bus systems included an external 32-bit interface, they addressed the video RAM internally with only 16 or even 8 bits. The performance values were correspondingly disappointing. The reason was that manufacturers wanted to use the better known graphics chips available at the time, for example the ET4000 from Tseng with its 16-bit technology. For ISA adapters it was entirely sufficient, but when installed on a local bus graphics adapter it proved to be a restriction. In the meantime, it has been replaced by 32-bit, 64-bit and even 128-bit successors which make full use of the advantages of a 32-bit wide local bus.

Lastly, the computing speed of the CPU limits picture formation. For example, not just repeated MOV instructions are necessary for the output of a square, but comparisons and conditional jump instructions are also necessary to define the borders of the square. The switching between different memory levels or banks also slows down the picture formation. Graphics processors and also accelerators can relieve the CPU of much of this monotonous work. To produce a square filled with a uniform colour, for example, it is only necessary to fill a seemingly large area of the video RAM with a uniform data pattern. To this end, the CPU transfers only the coordinates of two corner points and the corresponding colour value. Owing to very high clock speeds in such special processors (60 MHz and more), simple graphics patterns such as lines, squares and circles are produced very quickly. Here, a lower data transfer rate between the CPU and the graphics adapter does not play a significant role, because the graphics processor requires much longer to produce the pixel data than the data transfer to the graphics processor takes.

Graphics processors have their first real advantage in multitasking operating systems, when task switches, refresh cycles of the main memory or hardware interrupts occur. While the CPU handles these requests, the graphics processor forms the picture in the background, undisturbed, and in parallel with the CPU operation.

A further problem is that the video RAM is accessible only during specific short time slots. The RAM chips on the adapter can only be either read or written at any one time. If the CRTC chip of the adapter continuously reads the video RAM during the formation of a line, an access by the CPU is not possible even with a local bus. The processor must wait for the next line or screen return.
For this, the adapter simply deactivates the RDY bus signal for as long as necessary, causing the CPU to insert the associated number of wait cycles. In principle, this also affects adapters with graphics processors. Only here, the accesses by the CRTC controller and the graphics processor can be coordinated such that the graphics processor, for example, writes to a RAM bank of the video memory which the CRTC controller is not currently reading for the formation of the current raster line. This is not possible with an external CPU, because the CPU does not know which address is currently being read and therefore which addresses are not accessible. Here there is also a clear advantage for adapters with graphics processors.

A way around this problem is the use of dual port memory in the video RAM. Here, the video memory is blocked only during the refresh cycles; the reading of data by the CRTC controller no longer blocks the access. When the local bus was introduced, an ISA graphics adapter with dual port memory was frequently as good in some benchmark tests as local bus graphics adapters containing normal DRAM components. This, however, has changed drastically now. The combination of powerful accelerators with fast VRAM chips and the PCI bus has left all ISA products far behind. Section 35.8 discusses a typical example of a modern graphics accelerator: S3's Trio64V+.

The last, frequently neglected, but nonetheless deciding factor concerns the driver for a graphics adapter, which has to tie into the Windows or OS/2 system and represent a clearly defined programming interface. While the programming of such a driver is quite simple for less intelligent graphics cards (after all, only a very limited selection of adapter functions is available), the more intelligent accelerator cards have considerably higher requirements. For example, the programmer must weigh up whether or not it is worthwhile making full use of the graphics functions of the accelerator chip for a specific function, or using the conventional method of direct access to the video RAM. Thus, it is not surprising that the quality of the drivers supplied frequently varies, even though this point affects performance by a factor of at least two.
35.8 A Modern Graphics Accelerator - Trio64V+
The Trio64V+ is a typical example of a modern graphics accelerator. The following is a short list of its most important characteristics:
- 64-bit data bus between memory and graphics chip,
- 24-bit TrueColor RAMDAC with up to 135 MHz pixel rate,
- streams processor for overlapping various input sources,
- support of MPEG chips (S3 Scenic/MX2) through a dedicated bus (Scenic Highway),
- 1 to 4 Mbytes DRAM or EDO video RAM,
- CRT control registers as in the 6845,
- full VGA support in VGA-compatible modes.
35.8.1 Terminals and Signals
A short description of the various signal groups which connect the chip to the outside world may be helpful in understanding the function of the Trio64V+. Figure 35.25 shows the pin layout. The chip comes in a PQFP with 208 terminals and is obviously much more complex than the 6845 (Figure 35.4). Because of lack of space all signals are discussed in groups.

PCI Interface
The signals AD31-AD0, C/BE3-C/BE0, DEVSEL, FRAME, IRDY, IDSEL, INTA, PAR, RESET, SCLK, STOP and TRDY form the interface to the PCI bus. The meaning of these signals is given in Section 24.11. Because the Trio64V+ does not represent a PCI busmaster, the control signals for bus arbitration, among them LOCK, PHLD and PHLDA, are not implemented. Table 35.17 shows the PCI header information to identify the Trio64V+ against the BIOS.

Clock Signals
XIN, XOUT (External In, External Out)
For the use of an external clock generator, the XIN terminal is supplied with the clock signal; for the use of an external oscillator crystal, this crystal is connected between XIN and XOUT.

Video RAM Interface
The signals of this interface control the video DRAMs. In addition to the standard page mode DRAMs, the Trio64V+ can also use EDO DRAMs. The following lists briefly the control signals of the DRAM interface.

CAS7-CAS0 (O)
These eight column address strobe signals indicate the transfer of valid column addresses. If the feature connector is enabled, some of these signals remain deactivated.
Figure 35.25: Terminal layout of the Trio64V+.

Register         Offset  Width   Value  Comment
Vendor           00h     1 word  5333h  vendor ID for S3
Device           02h     1 word  8811h  device ID for Trio 1)
Subclass code    0ah     1 byte  00h    VGA device 2)
Base class code  0bh     1 byte  03h    video controller 2)

1) all Trios have this ID; a further differentiation is done by means of a chip ID register which is not part of the PCI header
2) see Table 24.3

Table 35.17: PCI header information of the Trio64V+
MA8-MA0 (O)
The nine memory address signals carry the nine address bits of the row (RASx active) or column address (CASx active). Together they form an 18-bit address which can access 256 kbytes. If we add the eight CASx signals (corresponding to three address bits) and the two RASx signals (corresponding to one address bit), we get a maximum addressable video RAM of 2^(18+3+1) = 2^22 bytes = 4 Mbytes.

RAS1-RAS0 (O)
The two row address strobe signals indicate valid row addresses.

WE (O)
For an active write enable signal WE the issued memory access is a write access.

OE1-OE0 (O)
The two output enable signals OE cause the memory chips to output data. Depending on the presence of fast page mode (OE1 always high) or EDO DRAM, and depending on the activation of the feature connector (OE1 = OE0), the DRAM memory chips are controlled according to their capabilities for data output. OE1 is multiplexed with RAS1 (note that in fast or hyper page mode RAS1 is only required for a page change).

PD63-PD0 (O)
The 64 display memory pixel data signals form the 64-bit data bus to the video RAM. Therefore, the Trio64V+ is a 64-bit graphics controller; it processes the pixel data stored in video RAM in sections of 8 bytes and thus has an external data bus with the same width as, for example, the Pentium or Pentium Pro. Parts of the high-order dword PD63-PD32 are used by the feature connector.

Video Interface
AR, AG, AB (O)
The three Analog Red, Green, Blue terminals provide the RGB signal for the monitor.

BLANK, VFCBLANK (I/O, I/O)
The signals blank the screen or receive a signal from the feature connector to blank the screen. VFCBLANK is the BLANK signal for Trio64V-compatible VAFC operation (without 0
*) binary coded decimal

Function 03h - Set Time (Real-time Clock)
This function sets the time of the real-time clock chip MC146818.

Register  Call value                     Return value
AH        03h                            00h
CL        minute*)
CH        hour*)
DL        daylight saving (1=yes, 0=no)
DH        second*)
Carry                                    error if <> 0

*) binary coded decimal

Function 04h - Read Date (Real-time Clock)
This function reads the date from the CMOS RAM in the real-time clock chip MC146818.

Register  Call value  Return value
AH        04h
CL                    year*)
CH                    century*)
DL                    day*)
DH                    month*)
Carry                 error if <> 0

*) binary coded decimal

Function 05h - Set Date (Real-time Clock)
This function sets the date in the CMOS RAM of the real-time clock chip MC146818.

Register  Call value  Return value
AH        05h
CL        year*)
CH        century*)
DL        day*)
DH        month*)
Carry                 error if <> 0

*) binary coded decimal

Function 06h - Set Alarm Time (Real-time Clock)
This function sets the alarm time of the real-time clock chip MC146818. If the alarm time is reached, the MC146818 issues an interrupt 4ah. Before setting a new alarm time you have to clear an active alarm time via function 07h.

Register  Call value  Return value
AH        06h
CL        minute*)
CH        hour*)
DH        second*)
Carry                 error if <> 0

*) binary coded decimal

Function 07h - Clear Alarm Time (Real-time Clock)
This function clears an active alarm time and has to be called before setting a new alarm time.

Register  Call value  Return value
AH        07h
Carry                 error if <> 0
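Because the real-time clock functions pass all time and date values in binary coded decimal, a calling program has to convert them first. The following Python sketch shows the conversion and, as an example, the register values for function 05h (Set Date). The helper names are hypothetical and exist only for this illustration; the register assignment follows the table of function 05h above.

```python
def to_bcd(value):
    """Encode a number 0..99 as one packed BCD byte (e.g. 59 -> 59h)."""
    if not 0 <= value <= 99:
        raise ValueError("packed BCD byte holds 0..99 only")
    return ((value // 10) << 4) | (value % 10)


def from_bcd(byte):
    """Decode one packed BCD byte (e.g. 23h -> 23)."""
    high, low = byte >> 4, byte & 0x0F
    if high > 9 or low > 9:
        raise ValueError("not a valid packed BCD byte")
    return high * 10 + low


def set_date_registers(year, month, day):
    """Register values for INT 1ah, function 05h (Set Date)."""
    return {
        "AH": 0x05,
        "CH": to_bcd(year // 100),  # century
        "CL": to_bcd(year % 100),   # year within the century
        "DH": to_bcd(month),
        "DL": to_bcd(day),
    }


print({name: f"{value:02x}h" for name, value in set_date_registers(1996, 3, 17).items()})
```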
E.2 Wait Functions 83h and 86h of BIOS Interrupt INT 15h

Function 83h - Set or Clear Wait Time Interval
If AL = 00h this function sets the high bit of a byte in main memory at a user-defined address when the programmed time interval has expired. After a call to this function, the calling program continues at once. After expiry of the wait time interval, the real-time clock issues an interrupt. The wait time interval has to be specified in units of one microsecond, but because of the usually programmed real-time clock frequency of 1024 Hz the actual time resolution is about 976 µs, that is, 1/1024 Hz. If AL = 01h the active wait time is disabled.

Subfunction 00h - Set Wait Time Interval

Register  Call value                 Return value
AH        83h                        00h
AL        00h                        register B of MC146818
CX        time interval (high) 1)
DX        time interval (low) 1)
BX        offset of target byte 2)
ES        segment of target byte 2)
Carry                                error if <> 0

1) in µs
2) bit 7 of target byte will be set after expiry of time interval

Subfunction 01h - Clear Wait Time Interval

Register  Call value  Return value
AH        83h
AL        01h

Function 86h - Wait Until Time Interval Has Elapsed
This function suspends execution of the calling program until the programmed time interval has elapsed. Afterwards, the program execution continues. The wait time interval must be specified in units of one microsecond, but because of the usually programmed real-time clock frequency of 1024 Hz, the actual time resolution is about 976 µs, that is, 1/1024 Hz.

Register  Call value               Return value
AH        86h
CX        time interval (high) 1)
DX        time interval (low) 1)
Carry                              error if <> 0

1) in µs
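The 32-bit wait interval of functions 83h and 86h has to be split into the CX (high word) and DX (low word) registers. The following Python sketch shows this split and estimates how many periods of the 1024 Hz real-time clock an interval spans. The function names are hypothetical, and rounding up to whole clock periods is an assumption made for illustration; the exact rounding behaviour depends on the BIOS.

```python
RTC_FREQUENCY_HZ = 1024  # usually programmed periodic rate of the MC146818


def wait_registers(interval_us):
    """Split a wait interval in microseconds into (CX, DX)."""
    if not 0 <= interval_us <= 0xFFFFFFFF:
        raise ValueError("interval must fit into 32 bits")
    return (interval_us >> 16) & 0xFFFF, interval_us & 0xFFFF


def clock_periods(interval_us):
    """Number of 1/1024 s periods covering the interval (rounded up, an assumption)."""
    return -(-interval_us * RTC_FREQUENCY_HZ // 1_000_000)


cx, dx = wait_registers(1_000_000)  # one second
print(f"CX={cx:04x}h DX={dx:04x}h periods={clock_periods(1_000_000)}")
```

An interval of 1000 µs, for instance, already spans two clock periods, which illustrates why delays much shorter than about one millisecond cannot be resolved with these functions.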
F BIOS Interrupt INT 13h
If you are using the BIOS interrupt 13h, which is available for floppy drives as well as for hard disk drives, you should observe the following rules:
- With functions that refer to hard disk drives, bits 6 and 7 of the sector register CL represent bits 8 and 9 of a 10-bit cylinder number; the remaining eight bits of the cylinder number are passed in CH; therefore cylinder numbers 0-1023 are possible.
- For read, verify, or write operations you have to provide a buffer which is large enough to accommodate all sectors to be read, compared, or written.
- The drive count starts with 00h; for hard disk drives, additionally bit 7 is set so that here the drive number count starts with 80h.
- The error codes are returned in the AH register and simultaneously stored in the BIOS data area at 40:41h (floppy disk) and 40:74h (hard disk), respectively.
- The first floppy drive A: has the drive number 00h, the second drive the number 01h. The first hard disk is assigned number 80h, the second number 81h.
- For every sector to be read, verified, or written you have to provide 512 bytes. To read three sectors, for example, a buffer comprising 1536 bytes is required. If you have to format four sectors, for example, you have to pass four format buffers.
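The rules for the cylinder number and the buffer size can be sketched in a few lines of Python. The function names are hypothetical helpers for this illustration; the sector range 1-63 is an assumption that follows from the six bits which remain in CL once bits 6 and 7 are used for the cylinder number.

```python
SECTOR_BYTES = 512


def pack_cylinder_sector(cylinder, sector):
    """Return (CH, CL) for a 10-bit cylinder number and a sector number."""
    if not 0 <= cylinder <= 1023:
        raise ValueError("cylinder must be 0..1023")
    if not 1 <= sector <= 63:  # assumed range of the 6-bit sector field
        raise ValueError("sector must be 1..63")
    ch = cylinder & 0xFF                             # low eight cylinder bits
    cl = (((cylinder >> 8) & 0x03) << 6) | sector    # cylinder bits 8, 9 in bits 6, 7
    return ch, cl


def unpack_cylinder_sector(ch, cl):
    """Inverse of pack_cylinder_sector: return (cylinder, sector)."""
    return ((cl & 0xC0) << 2) | ch, cl & 0x3F


def buffer_size(sectors):
    """Bytes needed to read, verify or write the given number of sectors."""
    return sectors * SECTOR_BYTES


ch, cl = pack_cylinder_sector(300, 5)
print(f"CH={ch:02x}h CL={cl:02x}h buffer for 3 sectors: {buffer_size(3)} bytes")
```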
F.1 The Functions

Function 00h - Initialize (Floppy/Hard Disk)
This function initializes the floppy controller and the drive and, if necessary, aborts the current operation. Upon completion of this function, controller and drive are in a well-defined state.

Register  Call value  Return value
AH        00h         error code 1)
DL        drive 2)
Carry                 error if <> 0

1) see F.2
2) floppy disk drives: 00h

Function 01h - Read Status (Error Code) of Last Floppy or Hard Disk Operation (Floppy/Hard Disk)
This function determines the termination status of the last hard disk or floppy drive operation. The status code returned in register AH has the same format as immediately after termination of an operation. The function is useful if you don't want to determine the status immediately upon completion of an operation, and the content of AH with the status byte has already been destroyed by other instructions.

Register  Call value  Return value
AH        01h         error code 1)
DL        drive 2)
Carry                 error if <> 0

1) see F.2
2) floppy disk drives: 00h
Function 02h - Read Sectors (Floppy/Hard Disk)
One or more sectors are read from floppy/hard disk into the read buffer. The buffer must be large enough to accommodate all read sectors. If that is not the case, function 02h overwrites data in main memory, and a system crash is the result.

Register  Call value                 Return value
AH        02h                        error code*)
AL        number of sectors to read
CH        track/cylinder
CL        sector
DH        head
DL        drive
ES        segment of read buffer
BX        offset of read buffer
Carry                                error if <> 0

*) see F.2

Function 03h - Write Sectors (Floppy/Hard Disk)
This function writes one or more sectors from the write buffer in main memory onto the floppy or hard disk. The buffer contains all data to be written. Note that the data transfer is carried out with 512 byte blocks only. If your buffer is only partially filled with write data, function 03h transfers the other, unintended data onto disk, too, until all sectors programmed via the AL register are written.

Register  Call value                  Return value
AH        03h                         error code*)
AL        number of sectors to write
CH        track/cylinder
CL        sector
DH        head
DL        drive
ES        segment of write buffer
BX        offset of write buffer
Carry                                 error if <> 0

*) see F.2

Function 04h - Verify Sectors (Floppy/Hard Disk)
This function compares the contents of the verify buffer in main memory with the contents of one or more sectors on the floppy or hard disk, or determines whether one or more sectors can be found and read, and whether they return a valid CRC code. In the last case no data is compared.

Register  Call value                   Return value
AH        04h                          error code*)
AL        number of sectors to verify
CH        track/cylinder
CL        sector
DH        head
DL        drive
ES        segment of verify buffer
BX        offset of verify buffer
Carry                                  error if <> 0

*) see F.2
Function 05h - Format Track or Cylinder (Floppy/Hard Disk)
This function formats the sectors of one track or one cylinder. On an AT you have to fix the medium type with function 17h or 18h first. For the formatting operation, a format buffer is necessary which contains the format information for every sector to format. If you want to format several sectors in one instance, the format buffer must be large enough to accommodate the format information for all sectors. The controller writes the information in the format buffer into the ID field of the respective sector, and uses it to determine the correct sector afterwards when reading or writing data.

Register  Call value                   Return value
AH        05h                          error code 1)
AL        number of sectors per track
CH        track/cylinder
CL        sector number
DH        head
DL        drive
ES        segment of format buffer 2)
BX        offset of format buffer 2)
Carry                                  error if <> 0

1) see F.2
2) see F.4
Function 06h - Format and Mark Track Bad (Hard Disk)
This function marks a track with more than one bad sector entirely as bad so that this track is not used for further data recording. The function is only valid for an XT hard disk controller.

Register  Call value  Return value
AH        06h         error code*)
AL        interleave
CH        cylinder
CL        sector
DH        head
DL        drive
Carry                 error if <> 0

*) see F.2

Function 07h - Format Drive (Hard Disk)
This function formats the drive beginning with the specified start cylinder. The function is only valid for an XT hard disk controller.

Register  Call value  Return value
AH        07h         error code*)
AL        interleave
CH        cylinder
CL        sector
DH        head
DL        drive
Carry                 error if <> 0

*) see F.2
Function 08h - Determine Drive Parameters (Floppy Drive)
This function determines the geometric parameters of a floppy drive. The data are extracted from a BIOS table and reflect the geometry of the installed drive, but not that of the inserted data medium.

Register  Call value  Return value
AH        08h         error code 1)
BH                    0
BL                    drive type 2)
CH                    number of cylinders - 1
CL                    sectors per track - 1
DH                    number of heads - 1
DL        drive       number of drives
ES                    parameter table segment
DI                    parameter table offset
Carry                 error if <> 0

1) see F.2
2) 0=hard disk, 1=360 kbyte, 2=1.2 Mbyte, 3=720 kbyte, 4=1.44 Mbyte

Function 08h - Determine Drive Parameters (Hard Disk)
This function determines the geometric parameters of a hard disk drive.

Register  Call value  Return value
AH        08h         error code*)
AL                    0
CH                    number of cylinders - 1
CL                    sectors per track - 1
DH                    number of heads - 1
DL        drive       number of drives
ES                    parameter table segment
DI                    parameter table offset
Carry                 error if <> 0

*) see F.2
Function 09h - Specify Drive Parameters (Hard Disk)
This function specifies and adapts the geometric parameters of a hard disk drive. The respective parameters are stored in a table (see F.3) whose far address is held by the pseudo-interrupt vectors 41h and 46h, respectively. After a call to this function, the BIOS uses the values stored in the respective table.

Register  Call value  Return value
AH        09h         error code*)
DL        drive       number of drives
Carry                 error if <> 0

*) see F.2

Function 0Ah - Extended Read (Hard Disk)
This function reads one or up to 127 sectors together with their ECC check bytes from the hard disk into the read buffer in main memory. The controller's ECC logic does not carry out any ECC correction, but transfers the data as it is read from disk. You can then check whether the controller's ECC logic has calculated the ECC bytes correctly when writing the sector.

Register  Call value                 Return value
AH        0Ah                        error code 1)
AL        number of sectors to read
CH        cylinder
CL        sector
DH        head
DL        drive
ES        read buffer segment 2)
BX        read buffer offset 2)
Carry                                error if <> 0

1) see F.2
2) the read buffer must comprise 516 bytes for each sector to be read (512 sector bytes plus 4 check bytes)

Function 0Bh - Extended Write (Hard Disk)
This function writes one or up to 127 sectors together with their ECC check bytes from the write buffer in main memory onto the hard disk. The controller's ECC logic does not generate ECC bytes on its own, but writes the passed ECC bytes without any change into the ECC field of the sector. You can then, for example, generate intentionally incorrect ECC data for checking the ECC function of the controller in a subsequent read operation.

Register  Call value                  Return value
AH        0Bh                         error code 1)
AL        number of sectors to write
CH        cylinder
CL        sector
DH        head
DL        drive
ES        write buffer segment 2)
BX        write buffer offset 2)
Carry                                 error if <> 0

1) see F.2
2) the write buffer must comprise 516 bytes (512 sector bytes plus 4 check bytes) for each sector to be written; the check bytes are not calculated by the controller, but are written directly from the buffer
Function 0Ch - Seek (Hard Disk)
This function moves the read/write head to a certain track or cylinder and activates it.

Register  Call value  Return value
AH        0Ch         error code*)
CX        cylinder
DH        head
DL        drive
Carry                 error if <> 0

*) see F.2

Function 0Dh - Hard Disk Reset (Hard Disk)
This function resets the addressed drive.

Register  Call value  Return value
AH        0Dh         error code*)
DL        drive
Carry                 error if <> 0

*) see F.2

Function 0Eh - Read Buffer (Hard Disk)
This function transfers 512 bytes from the controller's sector buffer into the read buffer in main memory. No data is read from the volume. The function mainly checks the data path between controller and main memory.

Register  Call value           Return value
AH        0Eh                  error code*)
DL        drive
ES        read buffer segment
BX        read buffer offset
Carry                          error if <> 0

*) see F.2

Function 0Fh - Write Buffer (Hard Disk)
This function transfers 512 bytes from the write buffer in main memory into the controller's sector buffer. No data is written onto the volume. The function mainly checks the data path between controller and main memory.

Register  Call value            Return value
AH        0Fh                   error code*)
DL        drive
ES        write buffer segment
BX        write buffer offset
Carry                           error if <> 0

*) see F.2
Function 10h - Test Drive Ready (Hard Disk)
This function determines whether the hard disk is ready, and if not, determines the error status.

Register  Call value  Return value
AH        10h         error code*)
DL        drive
Carry                 error if <> 0

*) see F.2

Function 11h - Calibrate Drive (Hard Disk)
This function moves the read/write head to track 0. You can use this function, for example, for recalibrating the drive after a seek error. This is required especially for hard disk drives with a stepper motor.

Register  Call value  Return value
AH        11h         error code*)
DL        drive
Carry                 error if <> 0

*) see F.2

Function 12h - Check Controller RAM (Hard Disk)
This function checks the controller RAM, and investigates controller errors.

Register  Call value  Return value
AH        12h         error code*)
DL        drive
Carry                 error if <> 0

*) see F.2

Function 13h - Drive Diagnostics (Hard Disk)
The controller checks the drive and determines the error status, if necessary.

Register  Call value  Return value
AH        13h         error code*)
DL        drive
Carry                 error if <> 0

*) see F.2
Function 15h - Determine Drive/DASD Type (Floppy Drive, AT and PS/2 only)
This function investigates which kind of data volume the addressed drive uses (DASD = Direct Access Storage Device). You can determine whether or not the drive recognizes a disk change, and how many 512 byte blocks or sectors the volume comprises. If the function has been completed successfully, the AH register contains a drive type indicator. Additionally, the two CX and DX registers hold the high and low-order word, respectively, of a 32-bit quantity, which indicates the number of data blocks or sectors on the volume.

Register  Call value  Return value
AH        15h         type*)
DL        drive
CX                    data blocks/sectors (high word)
DX                    data blocks/sectors (low word)
Carry                 error if <> 0

*) 00h=no drive installed, 01h=drive without connection for disk change, 02h=drive with connection for disk change, 03h=hard disk
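The two words returned in CX and DX can be combined into the 32-bit block count as in the following short Python sketch. The function name is a hypothetical helper; the block size of 512 bytes is taken from the description above.

```python
def dasd_capacity(cx, dx, block_size=512):
    """Combine CX (high word) and DX (low word) into (block count, size in bytes)."""
    blocks = ((cx & 0xFFFF) << 16) | (dx & 0xFFFF)
    return blocks, blocks * block_size


blocks, size = dasd_capacity(0x0001, 0x0000)
print(blocks, "blocks,", size // 1024, "kbytes")
```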
Function 16h - Determine Disk Change (Floppy Drive, AT and PS/2 only)
This function determines, via line 34 of the interface cable, whether the disk has been changed. For that purpose the BIOS reads an internal controller register.

Register  Call value  Return value
AH        16h         change flag*)
DL        drive
Carry                 error if <> 0

*) 00h=no change, 01h=invalid drive number, 06h=disk changed

Function 17h - Fix Floppy Disk Format (Floppy Drive)
This function fixes the controller-drive data transfer rate by means of the disk format. This is necessary, for example, to use a 360 kbyte floppy disk in a 1.2 Mbyte high density drive. A PC/XT BIOS supports 320/360 kbyte drives only, so that here this function is not required.

Register  Call value      Return value
AH        17h             type or error code 1)
AL        disk format 2)
Carry                     error if <> 0

1) drive type or see F.2
2) 1=320/360 kbyte disk in 320/360 kbyte drive
   2=320/360 kbyte disk in 1.2 Mbyte drive
   3=1.2 Mbyte disk in 1.2 Mbyte drive
   4=720 kbyte disk in 720 kbyte drive

Function 18h - Fix Floppy Disk Format (Floppy Drive)
This function fixes the disk type for formatting. This is necessary, for example, to format a 360 kbyte floppy disk in a 1.2 Mbyte high density drive.

Register  Call value         Return value
AH        18h                type or error code 1)
CH        number of tracks
CL        sectors per track
DL        drive number
DI                           parameter table offset 2)
ES                           parameter table segment 2)
Carry                        error if <> 0

1) 00h=no error, 0ch=medium unknown, 80h=no disk in drive
2) see F.5

Function 19h - Park Read/Write Heads (Hard Disk)
This function moves the read/write heads to a certain cylinder and parks them.

Register  Call value  Return value
AH        19h         error code*)
DL        drive
Carry                 error if <> 0

*) see F.2
F.2 Error Codes

Error code (AH value)  Meaning
00h                    no error
01h                    invalid function number
02h                    address mark not found
03h                    disk write-protected
04h                    sector not found
05h                    unsuccessful reset
07h                    erroneous initialization
08h                    DMA overflow
09h                    DMA segment overflow
10h                    read error
11h                    data read error, ECC correction successful
20h                    controller error
40h                    track not found
80h                    no drive response
BBh                    BIOS error
FFh                    unknown error
Floppy
Valid for Hard disk
yes yes yes
yes yes yes
yes
yes yes yes no
Valid for
no no no yes yes yes no
yes yes yes no no
yes yes yes
F.3 Hard Disk Drive Parameter Table

[Figure: byte layout of the hard disk drive parameter table for the XT controller and for the AT controller]

XT controller: number of cylinders (word), number of heads (byte), precompensation, max. ECC burst length (byte), control byte 1) (byte), standard time-out value 2) (byte), time-out value for formatting 2) (byte), time-out value for drive checks 2) (byte), reserved (4 bytes).
1) bit 0..2: drive option; bit 3..5: null; bit 6: 1=ECC retries disabled; bit 7: 1=seek retries disabled
2) in timer ticks

AT controller: 00h number of cylinders (word), 02h number of heads (byte), 03h reserved (word), precompensation, max. ECC burst length, control byte 1), 0ch landing zone for head parking (word), 0eh number of sectors per track, 0fh reserved.
1) bit 0..2: reserved; bit 3: 1=more than 8 heads; bit 4: reserved; bit 5: 1=defect list at MaxCylinder+1; bit 6: 1=ECC retries disabled; bit 7: 1=seek retries disabled
F.4 Format Buffer

Offset  Size  Contents
00h     byte  track of sector to format
01h     byte  head of sector to format
02h     byte  sector number
03h     byte  number of bytes per sector*)

*) 0=128, 1=256, 2=512, 3=1024
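Since one such four-byte entry is needed for every sector of the track, the complete format buffer for function 05h can be assembled as in the following Python sketch. The function name and its parameters are hypothetical; numbering the sectors consecutively from 1 is an assumption, as a real format program may choose an interleaved order.

```python
# Size codes of the format buffer: bytes per sector -> code
SIZE_CODES = {128: 0, 256: 1, 512: 2, 1024: 3}


def format_buffer(track, head, sectors_per_track, bytes_per_sector=512):
    """Build the format buffer for one track: four bytes per sector."""
    code = SIZE_CODES[bytes_per_sector]
    buffer = bytearray()
    # Assumption: sectors are numbered 1..n without interleave.
    for sector in range(1, sectors_per_track + 1):
        buffer += bytes([track, head, sector, code])
    return bytes(buffer)


buf = format_buffer(track=39, head=1, sectors_per_track=9)
print(len(buf), buf[:8].hex(" "))
```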
F.5 Floppy Disk Parameter Table
The parameter table is located in the DOS data area at address 50:22h.

Offset  Size  Content
00h     byte  first specification byte 1)
01h     byte  second specification byte 2)
02h     byte  number of timer pulses until drive motor is off
03h     byte  number of bytes per sector 3)
04h     byte  sectors per track 4)
05h     byte  gap length in bytes 5)
06h     byte  data length in bytes 6)
07h     byte  gap length for formatting 7)
08h     byte  fill byte for formatting 8)
09h     byte  head settle time after seek [ms] 9)
0Ah     byte  motor start time in 1/8 seconds

1) bit 7..4: step rate [ms]; bit 3..0: head unload time [ms]
2) bit 7..1: head load time [ms]; bit 0: 0=data transfer via DMA, 1=data transfer not via DMA
3) 0=128, 1=256, 2=512, 3=1024
4) 08h=8 sectors/track, 09h=9 sectors/track, 0fh=15 sectors/track, 12h=18 sectors/track
5) 1bh for 1.2 Mbyte and 1.44 Mbyte, 2ah else
6) don't care, mostly ffh
7) 50h for 360 kbyte/720 kbyte, 54h for 1.2 Mbyte, 6ch for 1.44 Mbyte
8) standard: f6h corresponding to '÷'
9) standard: 0fh

Transfer rates:
250 kbit/s:
- 360 kbyte 5 1/4" floppy in 360 kbyte drive
- 720 kbyte 3 1/2" floppy in 1.44 Mbyte drive
- 720 kbyte 3 1/2" floppy in 720 kbyte drive
300 kbit/s:
- 720 kbyte 3 1/2" floppy in 720 kbyte drive
- 360 kbyte 5 1/4" floppy in 1.2 Mbyte drive
500 kbit/s:
- 1.2 Mbyte 5 1/4" floppy in 1.2 Mbyte drive
- 1.44 Mbyte 3 1/2" floppy in 1.44 Mbyte drive
G Floppy Disk Controllers G.l The Commands The specifications cylinder, head, sector number, and sector size are called the sector identification.
Before you can transfer a command byte or read a status byte in the result phase, you must read bit MRQ in the main status register to determine whether the data register is ready to receive or supply a byte. All command and status bytes are transferred via the data register (port 3f7h or 377h). The transfer of the read data or data to be written between main memory and controller is normally done via DMA; for this you have to program the DMA controller before the transfer of a command. Read and write commands concern all sectors from the start sector up to the end of the track; you can abort the read or write operation earlier by setting the count value of the DMA controller such that the DMA chip issues a TC signal after the desired number of sectors, or by setting the command byte track length/max. sector number to a value which indicates the last sector to be handled. If you set the multiple track bit M the controller executes the specified command not only for the programmed head but for the other head (i.e. for the opposite disk side) too; after the end of the track corresponding to tl& programmed head, the controller continues with the beginning of the track on the other disk side. After command completion the status registers ST0 to ST3 contain status information which helps you to confirm the correct execution of the command, or to determine the cause of an error. In advance of a read, write, or format operation, you first have to fix the drive format. The commands are divided into data transfer commands, control commands and extended commands, which are available on an AT or E/2.
G.1.1 List of Valid Commands
- Data transfer commands
read sector
read deleted sector
write sector
write deleted sector
read complete track
format track
- Control commands
read identification
calibrate drive
check interrupt status
fix drive data
check drive status
seek
invalid command
- Extended commands
verify
determine controller version
seek relative
register summary
G.1.2 Data Transfer Commands
Read Sector (x6h)
This command reads one or more sectors with a valid data address mark from disk and transfers the data into main memory.
Command Phase
Bit      7    6    5    4    3    2    1    0
Byte 0   M    F    S    0    0    1    1    0
Byte 1   x    x    x    x    x    HD   DR1  DR0
Byte 2   Cylinder
Byte 3   Head
Byte 4   Sector Number
Byte 5   Sector Size
Byte 6   Track Length/Max. Sector Number
Byte 7   Length of GAP 3
Byte 8   Data Length
M: multi-track operation: 1=carry out cylinder operation, 0=carry out single-track operation
F: FM or MFM recording method: 1=MFM (standard), 0=FM
S: skip mode: 1=skip deleted data address marks, 0=do not skip
HD: head number (always equal to head address in byte 3)
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
cylinder, head, sector number: address of first sector to read
sector size: 0=128 bytes, 1=256 bytes, 2=512 bytes, ..., 7=16 kbytes
track length/max. sector number: number of sectors per track or max. sector number for which the command shall be carried out
length of GAP 3: standard value=42, minimal value=32 (5¼") or standard value=27 (3½")
data length: length of data to read in bytes (only valid if sector size=00), else equal ffh
Result Phase
Byte 0   ST0
Byte 1   ST1
Byte 2   ST2
Byte 3   Cylinder
Byte 4   Head
Byte 5   Sector Number
Byte 6   Sector Size
ST0, ST1, ST2: status register 0 to 2 (see G.2)
cylinder, head, sector number, sector size: sector identification according to Table G.1
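As an illustration of the command phase of read sector, the nine command bytes can be assembled as shown below. This is a minimal sketch; the function name, the parameter names and the default values (18 sectors per track, GAP 3 length 27) are chosen for the example only.

```python
def read_sector_command(drive, head, cylinder, sector, size_code=2,
                        last_sector=18, gap3=27, multi_track=False,
                        mfm=True, skip_deleted=False, data_length=0xFF):
    """Return the 9 command bytes of the read sector command (x6h)."""
    opcode = 0x06                     # low bits 0 0 1 1 0
    if multi_track:
        opcode |= 0x80                # M bit
    if mfm:
        opcode |= 0x40                # F bit, 1 = MFM (standard)
    if skip_deleted:
        opcode |= 0x20                # S bit
    return bytes([
        opcode,
        ((head & 1) << 2) | (drive & 3),  # x x x x x HD DR1 DR0
        cylinder,
        head,
        sector,
        size_code,                    # 2 = 512 bytes
        last_sector,                  # track length/max. sector number
        gap3,
        data_length,                  # ffh unless size_code is 0
    ])


def sector_size_bytes(size_code):
    """Convert the sector size code (0..7) into bytes: 0 = 128, ..., 7 = 16 kbytes."""
    return 128 << size_code
```

For example, `read_sector_command(0, 1, 5, 1)` addresses drive A, head 1, cylinder 5, sector 1 with MFM recording and 512-byte sectors.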
Read Deleted Sector (xch)
This command reads one sector with a deleted data address mark from disk and transfers the data into main memory. Sectors with a correct data address mark cannot be accessed by means of this command.
Command Phase
M: multi-track operation: 1=carry out cylinder operation, 0=carry out single-track operation
F: FM or MFM recording method: 1=MFM (standard), 0=FM
HD: head number (always equal to head address in byte 3)
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
cylinder, head, sector number: address of first sector to read
sector size: 0=128 bytes, 1=256 bytes, 2=512 bytes, ..., 7=16 kbytes
track length/max. sector number: number of sectors per track or max. sector number for which the command shall be carried out
length of GAP 3: standard value=42, minimal value=32 (5¼") or standard value=27 (3½")
data length: length of data to read in bytes (only valid if sector size=00), else equal ffh
Result Phase
ST0, ST1, ST2: status register 0 to 2 (see G.2)
cylinder, head, sector number, sector size: sector identification according to Table G.1
Write Sector (x5h)
This command transfers the data to be written from main memory to the controller, and writes one or more sectors with valid data address marks onto the disk.
Command Phase
M: multi-track operation: 1=carry out cylinder operation, 0=carry out single-track operation
F: FM or MFM recording method: 1=MFM (standard), 0=FM
HD: head number (always equal to head address in byte 3)
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
cylinder, head, sector number: address of first sector to write
sector size: 0=128 bytes, 1=256 bytes, 2=512 bytes, ..., 7=16 kbytes
track length/max. sector number: number of sectors per track or max. sector number for which the command shall be carried out
length of GAP 3: standard value=42, minimal value=32 (5¼") or standard value=27 (3½")
data length: length of data to write in bytes (only valid if sector size=00), else equal ffh
Result Phase
ST0, ST1, ST2: status register 0 to 2 (see G.2)
cylinder, head, sector number, sector size: sector identification according to Table G.1
Write Deleted Sector (x9h)
This command transfers the data to be written from main memory to the controller, and writes one or more sectors onto disk. Simultaneously, the data address mark of the sector concerned is deleted so that this sector can be accessed only by the read deleted sector command.
Command Phase
M: multi-track operation: 1=carry out cylinder operation, 0=carry out single-track operation
F: FM or MFM recording method: 1=MFM (standard), 0=FM
HD: head number (always equal to head address in byte 3)
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
cylinder, head, sector number: address of first sector to write
sector size: 0=128 bytes, 1=256 bytes, 2=512 bytes, ..., 7=16 kbytes
track length/max. sector number: number of sectors per track or max. sector number for which the command shall be carried out
length of GAP 3: standard value=42, minimal value=32 (5¼") or standard value=27 (3½")
data length: length of data to write in bytes (only valid if sector size=00), else equal ffh
Result Phase
Byte 0   ST0
Byte 1   ST1
Byte 2   ST2
Byte 3   Cylinder
Byte 4   Head
Byte 5   Sector Number
Byte 6   Sector Size
ST0, ST1, ST2: status register 0 to 2 (see G.2)
cylinder, head, sector number, sector size: sector identification according to Table G.1
Read Track (x2h)
This command reads the data of one complete track, starting with the first sector after the index address mark (IDAM), sector by sector without attention to the logical sector number which is given in the ID address mark. The track is regarded as a contiguous data block, and multi-track operations are not allowed; the command is limited to one single disk side. The read operation starts as soon as a signal on the IDX line indicates the passing of the index hole, that is, the beginning of the track. Make sure that the read buffer in main memory is large enough to accommodate all sectors of the track contiguously. The sector specification in the command phase is ignored.
Command Phase
F: FM or MFM recording method: 1=MFM (standard), 0=FM
S: skip mode: 1=skip deleted data address marks, 0=do not skip
HD: head number (always equal to head address in byte 3)
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
cylinder, head, sector number: address of first sector to read, but sector number is ignored here
sector size: 0=128 bytes, 1=256 bytes, 2=512 bytes, ..., 7=16 kbytes
track length/max. sector number: number of sectors per track or max. sector number for which the command shall be carried out
length of GAP 3: standard value=42, minimal value=32 (5¼") or standard value=27 (3½")
data length: length of data to read in bytes (only valid if sector size=00), else equal ffh
Result Phase
ST0, ST1, ST2: status register 0 to 2 (see G.2)
cylinder, head, sector number, sector size: sector identification according to Table G.1
Format Track (xdh)
This command formats one track. For each sector of the track to be formatted you have to provide a 4-byte format buffer which holds the sector identification of the corresponding sector (see Figure G.1). Make sure that you provide a sufficiently large and contiguous format buffer for all sectors of the track. Before issuing the command you have to program the DMA controller so that the controller can read the format buffer data successively via DMA channel 2. Alternatively, you can transfer the format data by means of interrupt-driven data exchange; the controller issues a hardware interrupt before formatting each sector. The handler may then transfer the 4-byte format information for the sector to be formatted next. The formatting starts after the drive has indicated the beginning of the track by providing a signal on the line IDX at the time the index hole passes through the photosensor. The sectors are formatted continuously until the drive again indicates the passage of the index hole by a signal on the IDX line. For the formatting process the length of GAP 3 is larger than is the case for reading or writing data. The bytes cylinder, head, sector number, and sector size don't have any meaning in the result phase here, but you have to read them out before you can program a new command.
Command Phase
F: FM or MFM recording method: 1=MFM (standard), 0=FM
HD: head number (always equal to head address in byte 3)
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
sector size: 0=128 bytes, 1=256 bytes, 2=512 bytes, ..., 7=16 kbytes
track length: number of sectors per track
length of GAP 3: standard value=80 (5¼") or 84 (3½")
fill byte: byte to fill the data area of the sectors (standard=0f6h corresponding to "÷")
Result Phase
ST0, ST1, ST2: status register 0 to 2 (see G.2)
cylinder, head, sector number, sector size: invalid values, but have to be read in advance of a new command
Offset   Contents
00h      Track
01h      Head
02h      Sector Number
03h      Sector Size 1)
1) 0=128 bytes, 1=256 bytes, 2=512 bytes, ..., 7=16 kbytes
Figure G.1: Format buffer for one sector.
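The format buffer of Figure G.1 has to be repeated for every sector of the track. A minimal sketch, assuming that the sectors are numbered consecutively from 1 without interleaving (the function name is chosen for the example):

```python
def format_buffer(cylinder, head, sectors_per_track, size_code=2):
    """Build the contiguous format buffer: 4 bytes (track, head, sector
    number, sector size code) for each sector of the track."""
    buffer = bytearray()
    for sector in range(1, sectors_per_track + 1):
        buffer += bytes([cylinder, head, sector, size_code])
    return bytes(buffer)
```

A track with 18 sectors therefore needs a format buffer of 72 bytes, which the DMA controller (or the interrupt handler) passes to the floppy controller four bytes at a time.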
G.1.3 Control Commands
Read Sector Identification (xah)
This command reads the sector identification of the first ID address mark which the controller is able to detect. Thus you can determine the current position of the read/write head. If the controller cannot read any ID address mark between two pulses on the IDX line (that is, after a complete disk revolution), it issues an error message. The bytes cylinder, head, sector number and sector size in the result phase characterize the read sector identification.
Command Phase
Bit      7    6    5    4    3    2    1    0
Byte 0   0    F    0    0    1    0    1    0
Byte 1   x    x    x    x    x    HD   DR1  DR0
F: FM or MFM recording method: 1=MFM (standard), 0=FM
HD: head number (always equal to head address in byte 3)
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
Result Phase
ST0, ST1, ST2: status register 0 to 2 (see G.2)
cylinder, head, sector number, sector size: sector identification read
Calibrate Drive (x7h)
This command moves the read/write head to cylinder 0. If a seek error occurred in the course of a sector access you can move the head to an absolute cylinder to calibrate the drive again. The command doesn't implement a result phase, but issues an interrupt after completion. Immediately afterwards you should use the command check interrupt status to determine the status information of the calibration operation. The controller executes the command by setting the DIR signal to 0, passing the drive 79 step pulses at most, and checking the signal TRK0 of the drive after each step pulse. If the signal is active (that is, the head is on track 0), the controller sets bit SE in status register 0 and aborts the command. If the signal TRK0 is not active even after 79 step pulses, the controller sets bits SE and EC in status register 0 and terminates the command. To calibrate the drive you may have to issue several calibration commands. That's especially true for floppy drives which handle more than 80 tracks. After completion of the command you should always determine, by means of the command check interrupt status, whether the head is correctly positioned over track 0. After power-up a calibration command is necessary to initialize the head position correctly.
Command Phase
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
Check Interrupt Status (x8h)
This command returns status information about the controller state in the result phase if the controller has issued an interrupt. Interrupts are issued:
- at the beginning of the result phase of the commands read sector, read deleted sector, write sector, write deleted sector, read track, format track, read sector identification, verify
- after completion of the following commands without the result phase: calibrate drive, seek, seek relative
- for data exchange between main memory and controller, when interrupt-driven data exchange is effective and the controller doesn't use DMA.
The command resets the interrupt signal and determines the source of the interrupt via status register ST0. If you issue the command and no interrupt is pending, the status register ST0 returns a value 80h corresponding to the message invalid command.
Command Phase
Bit      7    6    5    4    3    2    1    0
Byte 0   0    0    0    0    1    0    0    0
Result Phase
Byte 0   ST0
Byte 1   Current Cylinder
ST0: status register 0 (see G.2)
current cylinder: current position of read/write head
Fix Drive Data (x3h)
With this command you pass the controller mechanical control data for the connected drives. Note that the effective values are also dependent on the selected data transfer rate. With a PC/XT controller the values are fixed, because the data transfer rate cannot be programmed and doesn’t vary in this case. The command doesn’t have a result phase.
Command Phase
NDM: non-DMA mode: 0=data transfer via DMA, 1=data transfer not via DMA
Step Rate [ms]
Entry    Data Transfer Rate
         1M     500k   300k   250k
0h       8.0    16     26.7   32
1h       7.5    15     25.0   30
2h       7.0    14     23.3   28
...
eh       1.0    2      3.3    4
fh       0.5    1      1.7    2
Head Unload Time [ms]
Head Load Time [ms]
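The step rate values listed above follow a regular pattern: at 500 kbit/s the entry n corresponds to 16-n milliseconds, and the value scales inversely with the data transfer rate. The following sketch reproduces the step rate column entries under this assumption; the formula is inferred from the listed values and the function name is chosen for the example.

```python
def step_rate_ms(entry, rate_kbit=500):
    """Step rate in ms for a step rate entry (0h..fh) at the given data
    transfer rate in kbit/s, inferred from the values in the table."""
    if not 0 <= entry <= 0xF:
        raise ValueError("step rate entry must be in the range 0h..fh")
    return (16 - entry) * 500.0 / rate_kbit
```

This also shows why the effective values depend on the selected data transfer rate: the same entry yields half the time at 1 Mbit/s and twice the time at 250 kbit/s.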
Check Drive Status (x4h)
In the result phase the command provides status information concerning the state of the connected drives.
Command Phase
HD: head number
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
Result Phase
ST3: status register 3 with drive information (see G.2)
Park Read/Write Head (xfh)
This command moves the read/write head to the park cylinder. For command execution the controller compares the current cylinder number with the programmed number, sets the direction signal (DIR) for the drive accordingly, and issues step pulses until both cylinder numbers coincide. The command has no result phase; you should therefore verify the head position immediately after command completion with the check interrupt status command.
Command Phase
HD: head number
DR1, DR0: drive: 00=drive 0 (A), 01=drive 1 (B), 10=drive 2 (C), 11=drive 3 (D)
cylinder: cylinder where the head should be moved to
Invalid Command
If you specify an invalid opcode, the controller switches to a standby state and sets bit 7 of status register ST0. The same applies if you issue a check interrupt status command and no interrupt is pending.
Command Phase
Result Phase
Byte 0   ST0
ST0: status register 0 with entry 80h (see G.2)
G.1.4 Extended Commands
Verify (x16h)
This command reads one or more sectors with valid data address marks from disk, calculates the CRC check sum, and compares the calculated and the read CRC values to check the internal consistency of the data. The command therefore behaves like a read command without data transfer to main memory. Thus the command cannot be aborted by a TC signal from the DMA controller. On the other hand, you must set bit EC to ~.
By means of the ID fields, sectors or complete tracks can be marked as bad. After the formatting, the controller clears the BSY bit and issues a hardware interrupt 76h via IRQ14. For a format operation in native mode you must therefore know the physical drive geometry very well. On a drive with zone recording, for example, it is absolutely necessary that you know all the borders of the individual zones and the number of sectors of each zone. You must transfer this number to the sector count register.
When formatting in translation mode the controller writes only the sector data filled with the byte values 6ch; the ID marks are not changed. That is actually not a real formatting operation, because the structure of the volume is not changed. For example, it is not possible to adjust the interleave value in this way. For the number of sectors per track you have to specify the logical sector number per track in this case; the borders of the zone recording are insignificant. With other values most controllers respond with an error message ID mark not found and abort the formatting operation. In some cases the controller may behave unpredictably. Low-level formatting of IDE hard disks in translation mode is therefore very critical, and the normally very useful tools such as DiskManager or PCTools are of no value. Overwriting all sectors with 512 bytes of 6ch has the same effect as a format operation in translation mode.
Command Phase
Cylinder LSB (1f4h): C7 C6 C5 C4 C3 C2 C1 C0
C9-C0: cylinder number (10-bit binary number)
DRV: drive: 1=slave, 0=master
HD3-HD0: head number (binary number): 0000=head 0, 0001=head 1, 0010=head 2, ..., 1111=head 15
Result Phase
NDM: 1=data address mark not found, 0=no error
ABT: command abortion: 1=command aborted, 0=command completed
NID: 1=ID mark not found, 0=no error
C9-C0, S7-S0, HD3-HD0: sector identification of sector formatted last
DRV: drive: 1=slave, 0=master
BSY: busy: 1=drive is busy, 0=drive not busy
RDY: ready: 1=drive ready, 0=not ready
DRQ: data: 1=can be transferred, 0=no data access possible
ERR: error: 1=error register contains additional error information
x: unused, invalid
Seek (7xh)
This command moves the read/write heads to the programmed track and selects the addressed head. Immediately after transfer of the command code, the controller sets the BSY bit and executes the seek. If the seek is completed correctly, the controller clears the BSY bit, sets the SKC bit, and issues a hardware interrupt 76h via IRQ14. Note that the disk need not be formatted for carrying out the command correctly. In translation mode the passed logical cylinder number is converted to a physical cylinder number, and the head is moved to the physical cylinder.
Command Phase
Cylinder LSB (1f4h): C7 C6 C5 C4 C3 C2 C1 C0
C9-C0: cylinder number (10-bit binary number)
DRV: drive: 0=master, 1=slave
HD3-HD0: head number (binary number): 0000=head 0, 0001=head 1, 0010=head 2, ..., 1111=head 15
x: unused, invalid
Result Phase
NDM: 1=data address mark not found, 0=no error
NT0: 1=track 0 not found, 0=no error
ABT: command abortion: 1=command aborted, 0=command completed
NID: 1=ID mark not found, 0=no error
C9-C0, HD3-HD0: track identification of sector where head is moved to
DRV: drive: 1=slave, 0=master
BSY: busy: 1=drive is busy, 0=drive not busy
RDY: ready: 1=drive ready, 0=not ready
SKC: seek: 1=complete, 0=in progress
DRQ: data: 1=can be transferred, 0=no data access possible
ERR: error: 1=error register contains additional error information
x: unused, invalid
Diagnostics (90h)
This command starts the controller-internal diagnostics routine to check the controller electronics. The CPU can issue this command if the BSY bit is cleared. The RDY bit concerns only the drives, and is insignificant for the diagnostics command because only the controller and not the mechanical drives are checked. Note that the diagnostic information is returned in the error register (1f1h). The meaning of the individual error bits differs from the normal case; furthermore, after the completion of the diagnostics command the ERR bit in the status register (1f7h) is always equal to 0. The seven low-order bits in the error register contain a binary diagnostics code for the master drive; the high-order bit indicates a summary error code for the slave drive.
Command Phase
Result Phase
slave diagnostics code: 0=slave o.k. or slave not present, 1=error of slave in at least one diagnostics function
master diagnostics code (binary):
1=master drive o.k.
2=formatting circuit error in master drive
3=buffer error in master drive
4=ECC logic error in master drive
5=microprocessor error in master drive
6=interface circuit error in master drive
BSY: busy: 1=drive is busy, 0=drive not busy
DRQ: data: 1=can be transferred, 0=no data access possible
x: unused, invalid
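The split of the error register after a diagnostics command into a 7-bit master code and a 1-bit slave summary can be illustrated with a short decoding routine. The function and dictionary names are chosen for the example; reading the error register itself is left to the caller.

```python
MASTER_CODES = {
    1: "master drive o.k.",
    2: "formatting circuit error in master drive",
    3: "buffer error in master drive",
    4: "ECC logic error in master drive",
    5: "microprocessor error in master drive",
    6: "interface circuit error in master drive",
}


def decode_diagnostics(error_register):
    """Split the error register value into (master message, slave error flag)."""
    slave_error = bool(error_register & 0x80)   # high-order bit: slave summary
    master_code = error_register & 0x7F         # seven low-order bits
    return MASTER_CODES.get(master_code, "unknown code"), slave_error
```

A value of 01h thus means that the master is in order and the slave is in order or not present, while 81h indicates a working master and a faulty slave.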
Set Drive Parameters (91h)
This command sets the logical geometry of the addressed drive. In the sector count register you specify the number of logical sectors per logical track, and in the drive/head register you specify the number of logical heads of the drive. In translation mode the translation logic of the controller then translates the logical geometry to the real physical geometry of the drive. In translation mode, the drive uses this logical geometry to carry out those commands involving a disk access. The number of logical cylinders of the drive is an automatic result of the request that the number of all logical sectors cannot be larger than the physical sectors actually present. A change of the logical geometry of a hard disk, where data is already stored, inevitably results in a complete data loss as the changed geometry destroys the logical structure of the file system.
Command Phase
DRV: drive: 0=master, 1=slave
HD3-HD0: number of logical heads of the drive
Result Phase
BSY: busy: 1=drive is busy, 0=drive not busy
RDY: ready: 1=drive ready, 0=not ready
DRQ: data: 1=can be transferred, 0=no data access possible
x: unused, invalid
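The remark on set drive parameters that the number of logical cylinders results automatically from the other values can be made concrete with a small calculation. A minimal sketch, assuming that the total number of physical sectors of the drive is known; the function name is chosen for the example.

```python
def logical_cylinders(physical_sectors, logical_heads, logical_sectors_per_track):
    """Largest number of logical cylinders for which the logical geometry
    does not address more sectors than are physically present."""
    sectors_per_cylinder = logical_heads * logical_sectors_per_track
    return physical_sectors // sectors_per_cylinder
```

With 200 000 physical sectors and a logical geometry of 16 heads and 63 sectors per track this yields 198 logical cylinders; the remaining 416 sectors cannot be addressed with this geometry.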
H.1.3 Optional Commands
The following three commands are supported by most of the IDE hard disk drives, although they are optional commands and were not implemented in the original AT controller.
Read Sector Buffer (e4h)
This command reads out the contents of the controller's sector buffer. Immediately after you have passed the command code, the controller sets the BSY bit and prepares the buffer for a read operation by the CPU. Afterwards, the BSY bit is cleared, the DRQ bit is set, and the controller issues a hardware interrupt 76h via IRQ14. The CPU now can read the sector buffer and transfer the data to main memory. The command serves mainly for checking the data path between controller and main memory. You can use the command in self-programmed diagnostic routines to determine the source of drive faults.
Command Phase
AT Task File Register   Bit 7    6    5    4    3    2    1    0
Command (1f7h)              1    1    1    0    0    1    0    0
Drive/Head (1f6h)           1    0    1    DRV  x    x    x    x
DRV: drive: 1=slave, 0=master
Result Phase
AT Task File Register   Bit 7    6    5    4    3    2    1    0
Status (1f7h)               BSY  RDY  x    x    DRQ  x    x    x
BSY: busy: 1=drive is busy, 0=drive not busy
RDY: ready: 1=drive ready, 0=not ready
DRQ: data: 1=can be transferred, 0=no data access possible
x: unused, invalid
Write Sector Buffer (e8h)
This command writes data into the controller's sector buffer. Immediately after passing the command code the controller sets the BSY bit and prepares the buffer for a write operation by the CPU. Afterwards, the BSY bit is cleared, the DRQ bit is set, and the controller issues a hardware interrupt 76h via IRQ14. The CPU can now transfer data from main memory to the sector buffer. The command serves mainly for checking the data path between controller and main memory. You can use the command in self-programmed diagnostic routines to determine the source of drive faults.
Command Phase
AT Task File Register   Bit 7    6    5    4    3    2    1    0
Command (1f7h)              1    1    1    0    1    0    0    0
Drive/Head (1f6h)           1    0    1    DRV  x    x    x    x
DRV: drive: 1=slave, 0=master
Result Phase
AT Task File Register   Bit 7    6    5    4    3    2    1    0
Status (1f7h)               BSY  RDY  x    x    DRQ  x    x    x
BSY: busy: 1=drive is busy, 0=drive not busy
RDY: ready: 1=drive ready, 0=not ready
DRQ: data: 1=can be transferred, 0=no data access possible
x: unused, invalid
Identify Drive (ech)
This command reads parameters and other information from the addressed drive. Immediately after you have passed the command code, the controller sets the BSY bit, loads the information into the sector buffer, and prepares the buffer for a read operation by the CPU. Afterwards, the BSY bit is cleared, the DRQ bit is set, and the controller issues a hardware interrupt 76h via IRQ14. The interrupt handler has to read all 256 data words of 16 bits each out of the sector buffer. The structure of the 512-byte information is shown in Table H.1.
Command Phase
AT Task File Register   Bit 7    6    5    4    3    2    1    0
Command (1f7h)              1    1    1    0    1    1    0    0
Drive/Head (1f6h)
DRV: drive: 1=slave, 0=master
Result Phase
AT Task File Register   Bit 7    6    5    4    3    2    1    0
Status (1f7h)               BSY  RDY  x    x    DRQ  x    x    x
BSY: busy: 1=drive is busy, 0=drive not busy
RDY: ready: 1=drive ready, 0=not ready
DRQ: data: 1=can be transferred, 0=no data access possible
x: unused, invalid
Word 1)   Meaning
00        configuration 2)
01        number of physical cylinders
02        reserved
03        number of heads
04        number of unformatted bytes per physical track
05        number of unformatted bytes per sector
06        number of physical sectors per track
07-09     reserved for manufacturer
10-19     ASCII serial number
20        buffer type (01h=one-way, 02h=bidirectional, 03h=cache buffer)
21        buffer size/512
22        number of ECC bytes which are transferred in read/write-long operation
23-26     ASCII identification of controller firmware
27-46     ASCII model number
47        bit 0..7: number of sectors between two interrupts (multiple sector reads/writes only), bit 8..15: reserved
48        bit 0: 1=32-bit I/O, 0=no 32-bit I/O, bit 7..1: reserved
49        bit 0..7: reserved, bit 8: 1=DMA, 0=no DMA, bit 9: 1=LBA, 0=no LBA
50        reserved
51        bit 0..7: reserved, bit 8..15: PIO cycle time (0=600 ns, 1=380 ns, 2=240 ns, 3=180 ns)
52        bit 0..7: reserved, bit 8..15: DMA cycle time (0=960 ns, 1=380 ns, 2=240 ns, 3=150 ns)
53        reserved
54        number of logical cylinders
55        number of logical heads
56        number of logical sectors per logical track
57-58     bytes per logical sector
59        bit 0..7: number of sectors between two interrupts, bit 8..15: reserved
60-61     sectors addressable in LBA mode
62        single DMA: bit 0..7=supported modes, bit 8..15=active mode
63        multiple DMA: bit 0..7=supported modes, bit 8..15=active mode
64-127    reserved
128-159   manufacturer
160-255   reserved
1) 16-bit words
2) bit structure:
0         reserved
1         1=hard-sectored drive
2         1=soft-sectored
3         1=RLL/ARLL format
4         1=head switch delay=15 µs
5         1=power-down mode implemented
6         1=hard disk
7         1=removable storage device drive (usually CD-ROM)
8         1=internal data transfer rate < 5 Mbits/s
9         1=5 Mbits/s < data transfer rate < 10 Mbits/s
10        1=data transfer rate > 10 Mbits/s
11        1=rotation deviation > 0.5% (notebook)
12-15     reserved
Table H.1: Sector buffer identification information
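How the 256 words of Table H.1 may be evaluated is sketched below for some of the numerical entries. The sketch assumes that the 512 bytes have already been read from the sector buffer, that the words are stored in little-endian byte order, and that word 60 holds the low-order half of the LBA sector count; the function name and the dictionary keys are chosen for the example.

```python
import struct


def parse_identify(buffer):
    """Extract some geometry entries of Table H.1 from the 512-byte
    identification information (256 words of 16 bits each)."""
    if len(buffer) != 512:
        raise ValueError("the identification information is 512 bytes long")
    words = struct.unpack("<256H", buffer)      # assumption: little-endian words
    return {
        "physical_cylinders": words[1],
        "heads": words[3],
        "physical_sectors_per_track": words[6],
        "dma": bool(words[49] & 0x0100),        # word 49, bit 8
        "lba": bool(words[49] & 0x0200),        # word 49, bit 9
        "logical_cylinders": words[54],
        "logical_heads": words[55],
        "logical_sectors_per_track": words[56],
        "lba_sectors": words[60] | (words[61] << 16),  # assumption: low word first
    }
```

The ASCII entries (serial number, firmware identification, model number) are left out on purpose, because their byte order within each word is not specified in the table.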
H.1.4 Optional IDE Commands
In the following table you will find all IDE commands, together with the hex command codes, which are optional according to the newest IDE interface specification.
Command                                   Command code
check for active, idle, standby, sleep    98 e5
identify drive                            ec
idle                                      97 e3
idle immediate                            95 e1
read sector buffer                        e4
read sector with DMA (with retry)         c8
read sector with DMA (without retry)      c9
read multiple sectors                     c4
set features                              ef
set multiple mode                         c6
set sleep mode                            99 e6
set standby mode                          96 e2
standby immediate                         94 e0
write sector buffer                       e8
write sector with DMA (with retry)        ca
write sector with DMA (without retry)     cb
write multiple sectors                    c5
write same sector                         e9
write verification                        3c
acknowledge medium change                 db
lock drive door                           de
unlock drive door                         df
available for manufacturer                9a, c0-c3, 80-8f, f5-f
reserved                                  all other codes
H.2 SCSI Commands
All reserved fields have to be set to 0. LSB characterizes the least significant byte, MSB the most significant byte of a multiple-byte quantity. The command codes are uniformly 6, 10 or 12 bytes long. A command is executed in the following phases: transfer of the command code from the initiator to the target in the command phase → transfer of the parameters and/or data from the initiator to the target in the data-out phase → transfer of the result data from the target to the initiator in the data-in phase → transfer of the status from the target to the initiator in the status phase → transfer of messages from the target to the initiator in the message phase.
LUN indicates the logical unit within one target or one logical unit which is connected to the target. Examples are two hard disks (LUNs) which are connected to one SCSI controller (target). SCSI manages all drives by means of so-called logical blocks which are contiguous and equal in size. It is the job of the target to convert the logical block address, for example in the case of a hard disk, into physical cylinder, head, and sector numbers. The error codes are provided at two levels: as a status key which indicates the error group; and the status code with a detailed error description.
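The conversion of a logical block address into cylinder, head, and sector numbers, which the target carries out internally, can be illustrated for the simplest case. The sketch assumes a drive with a fixed number of sectors per track, blocks numbered from 0 and sectors numbered from 1; a real target additionally has to deal with zone recording and defect management.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a logical block address into (cylinder, head, sector)
    for a drive with constant geometry."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1     # sector numbers start at 1
```

Because the blocks are contiguous, block 63 on a drive with 16 heads and 63 sectors per track is the first sector under the second head of cylinder 0.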
H.2.1 Summary of Listed Commands
Detailed are only the required and the most important optional SCSI commands for disk drives (hard disks). An extensive discussion of all SCSI device classes would go far beyond the scope of this book. If you are interested in programming scanners, CD-ROMs and other devices, I have to direct you to the (now widely available) specialized literature on these topics. For disk drives only 6- and 10-byte commands are available.
- 6-byte commands
test unit ready (00h)
rezero unit (01h)
request sense (03h)
format unit (04h)
reassign blocks (07h)
read (08h)
write (0ah)
seek (0bh)
inquiry (12h)
mode select (15h)
reserve (16h)
release (17h)
mode sense (1ah)
start/stop (1bh)
send diagnostic (1dh)
- 10-byte commands
read capacity (25h)
read (28h)
write (2ah)
seek (2bh)
write and verify (2eh)
verify (2fh)
read defect data (37h)
write buffer (3bh)
read buffer (3ch)
read long (3eh)
write long (3fh)
change definition (40h)
mode select (55h)
mode sense (5ah)
The following table lists all SCSI-II commands for the ten device classes disk drive, tape drive, printer, processor device, WORM, CD-ROM, scanner, optical storage device, media changer and communication device. The column with the detailed SCSI commands is emboldened.
1202
Appendix H
Command
Length”
Code
Class” DD TD Pr PD WO CD S C OS MC Co
Test Unit Ready RewindlRezero Unit Request Sense Format/Format Unit Read Block Limits Reassign Blocks Read Write Seek Read Reverse Write FilemarkGynchronize Buffer Space Inquiry Venfy Recover Buffered Data Mode Select Reserve Release
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
OOh Olh 03h 04h 05h 07h 08h Oah Obh Ofh 10h llh 12h 13h 14h 15h 16h 17h 18h 19h lah lbh lch ldh leh 24h 25h 26h 28h 2ah 2bh 2ch 2dh 2eh 2fh 30h 32h 33h 34h 35h 36h 37h 38h 39h 3ah 3bh 3ch 3dh 3eh 3fh
x x x x x 0 X M- 0 x x x x x X-O--
COPY Erase Mode Sense Load/Unload/Scan/Stop/Start Receive Diagnostic Results Send Diagnostic Present/Allow Medium Removal Set Window Get Window Read Capacity Read Write Seek Erase Read Updated Block Write and Verify Verify Search Data High Search Data Equal Search Data Low Set Limits Synchronize Cache Lock/Unlock Cache Read Defect Data Medium Scan Compare Copy and Venfy Write Buffer Read Buffer Update Block Read Long Write Long
x xx x x 0 -0 0 x xx x x --O--
M X M M M M--MO M M - 0 --OM0 X MO 0 0 - 0 M X 0 xxx0 0 -0 MX 0 - 0 - O O-OMMOMMMM--MM-_ _ _ M X O M O M X M M M M--x x x x x x x x x x M O M M M M--MOOMMM---0 x 0 - 0 0 00 0 0 x x x - x xxxox x x - x xxxo0 0 00 0 0 00 - M X M M M M--0 x 0 - 0 0 00 0 0 0 0 o0 0 00 - 0 0 00 0 0 00 0 0 x x x x x x x x x x 00--o 0 -0 0 M__-MMX__- - -- O_ X-_-X X-X_x - - - x x -x 0 o - - o M-_-M - _ - O O - _ - O
x x x 0 - o x 0 -0 0 _-O__-O_ _ --O__
o - - - o 0 -0 0 -0 o - - - o 0 -0 0 -0 0 -0 0-_--
0 0 0 0 0 0 0 -
-0 -0 -0 -0 -0 -0 -0 - O
-
-
0 0 0 0 0 0
- - o - 0 00 - ooo-0 00 0 0 o o o -0 - -
0 -0 o---o
0 -0 - --(_
0 0 0 0
0 0 0 0
00 00 00 00 -
Hard Disk Drive Controllers
1203
Command                   Code   Length*)
Change Definition         40h    10
Write Same                41h    10
Read Sub-Channel          42h    10
Read TOC                  43h    10
Read Header               44h    10
Play Audio                45h    10
Play Audio MSF            47h    10
Play Audio Track Index    48h    10
Play Track Relative       49h    10
Pause/Resume              4bh    10
Mode Select               55h    10
Mode Sense                5ah    10
Move Medium/Play Audio    a5h    12
Exchange Medium           a6h    12
Read                      a8h    12
Play Track Relative       a9h    12
Write                     aah    12
Erase                     ach    12
Write and Verify          aeh    12
Verify                    afh    12
Set Limits                b3h    12
Request Volume Address    b5h    12
Read Defect Data          b7h    12
[Class**) columns DD TD Pr PD WO CD Sc OS MC Co: the per-device-class entries are illegible.]
*) length of the SCSI command in bytes (6-byte, 10-byte, or 12-byte command)
**) DD=disk drive, TD=tape drive, Pr=printer, PD=processor device, WO=WORM, CD=CD-ROM, Sc=scanner, OS=optical storage, MC=media changer, Co=communication device
X=required, O=optional, M=manufacturer-specific
H.2.2 6-byte Commands
Test Unit Ready (00h)
With this command you can determine whether the addressed target drive is ready. If so, the target completes the command with the status "everything o.k.". A subsequent request sense command then returns only "no status". Note that the status key is valid only after an extended request sense command, which you use to determine the cause of a not-ready state of the addressed drive.
LUN: logical unit number 0 to 7
F:   flag
     1=return messages with flag, if L=1
     0=messages without flag
L:   link
     1=linked commands
     0=single commands
Rezero Unit (01h)
This command moves the target back to the zero position, which usually means the beginning of the drive. On a hard disk, the read/write heads are moved to track 0.

Byte   Bit 7  6  5  4  3  2  1  0
0          0  0  0  0  0  0  0  1
1          LUN       Reserved
2          Reserved
3          Reserved
4          Reserved
5          Reserved            F  L

LUN: logical unit number 0 to 7
F:   flag
     1=return messages with flag, if L=1
     0=messages without flag
L:   link
     1=linked commands
     0=single commands
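As a minimal sketch of how such a 6-byte command could be assembled in software, the following Python helper fills in the command code, the LUN and the flag and link bits. The function name build_cdb6 is hypothetical. The bit positions are an assumption based on the usual SCSI layout (LUN in bits 7-5 of byte 1, F in bit 1 and L in bit 0 of byte 5). Rejecting F=1 without L=1 is a design choice of this sketch, since the flag is only meaningful for linked commands.

```python
TEST_UNIT_READY = 0x00  # command code from the text
REZERO_UNIT = 0x01      # command code from the text


def build_cdb6(opcode, lun=0, flag=0, link=0):
    """Assemble a 6-byte SCSI command block (hypothetical helper).

    Assumed layout: byte 0 = command code, byte 1 bits 7-5 = LUN,
    bytes 2-4 = reserved (zero), byte 5 bit 1 = F, bit 0 = L.
    """
    if not 0 <= lun <= 7:
        raise ValueError("LUN must be in the range 0 to 7")
    if flag and not link:
        # The flag is only evaluated for linked commands (L=1).
        raise ValueError("flag=1 requires link=1")
    cdb = bytearray(6)            # reserved bytes stay zero
    cdb[0] = opcode & 0xFF
    cdb[1] = (lun & 0x07) << 5
    cdb[5] = ((flag & 1) << 1) | (link & 1)
    return bytes(cdb)
```

For example, build_cdb6(REZERO_UNIT, lun=3) yields the bytes 01h 60h 00h 00h 00h 00h.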
Request Sense (03h)
This command instructs the target to return status data about the last executed command to the initiator. The target ends the transfer of the status data when all available bytes have been transmitted to the initiator, or when the allocation length is exhausted. Note that the status data is valid only if the preceding command ended with a check status message, and only as long as the target has not received any further command.
The target transfers the status data to the initiator during a data-in phase. The status data consists of an 8-byte header and additional status bytes that depend on the preceding command and the error. Only with a sufficient allocation length can you be sure that the target transfers all status bytes (specify a value of 255 here). Whether the target transfers additional status bytes in the extended form, and if so how many, depends upon the entry additional status length in byte 7 of the status data. Table H.2 shows an example of the status in which the physical location of the error is indicated. The returned status information is very extensive, so it is not detailed here.
LUN: logical unit number 0 to 7
Allocation Length: number of bytes which the initiator reserves for the target's status data
     0=4 status bytes (SCSI-I)
     0=0 status bytes (SCSI-II)
     1...255: number of status bytes to transfer
F:   flag
     1=return messages with flag, if L=1
     0=messages without flag
L:   link
     1=linked commands
     0=single commands
Header
VAL:                      valid
                          1=logical block address (bytes 3-6) valid
                          0=LBA not valid
Class:                    error class; for extended status equal to 7
Error Code:               for extended status equal to 0
Status Key:               error group, see H.2.4
Logical Block Address:    identification of the block where the error occurred
Additional Status Length: number of additional status bytes
Additional Status Bytes (Example)
Bytes 8 to 20, in order: Command Dependent (four bytes), Additional Status Code, Extended Status Code, FRU, Status Code Dependent, Status Code Dependent, Retries, Physical Cylinder (MSB), Physical Cylinder (LSB), Physical Head, Physical Sector
Table H.2: Status
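The header fields listed above can be decoded with a few bit operations. The following Python sketch is illustrative only: the function name parse_sense_header is hypothetical, and the exact bit positions are assumptions based on the usual extended status layout (VAL in bit 7 of byte 0, class in bits 6-4, error code in bits 3-0, status key in the low four bits of byte 2). The logical block address in bytes 3-6 (most significant byte first) and the additional status length in byte 7 follow the description in the text.

```python
def parse_sense_header(data):
    """Decode the 8-byte header of extended status data (hypothetical helper)."""
    if len(data) < 8:
        raise ValueError("the status header needs at least 8 bytes")
    return {
        "valid": bool(data[0] & 0x80),             # VAL: LBA field is valid
        "error_class": (data[0] >> 4) & 0x07,      # 7 for extended status
        "error_code": data[0] & 0x0F,              # 0 for extended status
        "status_key": data[2] & 0x0F,              # error group, see H.2.4
        "logical_block_address": int.from_bytes(data[3:7], "big"),
        "additional_length": data[7],              # number of bytes that follow
    }
```

A caller would typically check that error_class is 7 and error_code is 0 before interpreting the remaining fields, and then read additional_length more bytes.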
Format Unit (04h)
This command formats the whole drive by writing all ID and sector data fields. You must specify the block size and the geometric drive parameters, such as sectors per track, etc., in advance by a mode select command

2) 00h=real mode, 01h=16:16 protected mode, 02h=16:32 protected mode, 03h=00:32 protected mode
3) 00h=SS provides information about code and data in the buffer (Get)
   01h=SS provides information about additional data in the buffer (Get)
   02h=SS accepts an array with pointers to additional data in the buffer (Set)
4) subfunction=00h: number of additional data areas (Get)
   subfunction=01h: amount of information about additional data areas (Get)
   subfunction=02h: number of pointers to additional data areas (Set)
5) Buffer structure:

Subfunction=00h:
Offset   Size    Content
00h      dword   linear 32-bit base address of the code segment
04h      dword   code segment limit
08h      dword   offset of entry point
0ch      dword   linear 32-bit base address of the data segment
10h      dword   data segment limit
14h      dword   offset of data area

Subfunction=01h (one entry for each additional data segment):
Offset   Size    Content
00h      dword   linear 32-bit base address of the data segment
04h      dword   data segment limit
08h      dword   offset of data area

Subfunction=02h (one entry for each additional data segment):
Offset   Size    Content
00h      dword   32-bit offset
04h      dword   selector
08h      dword   reserved
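To illustrate the buffer layout for subfunction 00h, the following Python sketch unpacks the six dwords into named fields. The names SSAddrInfo and parse_ss_addr_buffer are hypothetical, and little-endian byte order is an assumption that matches the x86 processors of the PC.

```python
import struct
from collections import namedtuple

# Field order follows the offsets 00h to 14h of the subfunction 00h buffer.
SSAddrInfo = namedtuple(
    "SSAddrInfo",
    "code_base code_limit entry_offset data_base data_limit data_offset",
)


def parse_ss_addr_buffer(buf):
    """Unpack six little-endian dwords (24 bytes) from the start of buf."""
    return SSAddrInfo(*struct.unpack_from("<6I", buf, 0))
```

The entries for subfunctions 01h and 02h consist of three dwords each and could be unpacked in the same way with the format "<3I", advancing the offset by 12 bytes per entry.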
INT 1ah, Function a1h - GetAccessOffsets
In a buffer, this function provides the offsets of an adapter-specific access routine for PCMCIA cards which allow access to the card memory only through registers, that is, I/O ports (the usual method is mapping windows into the system memory). The calling program must supply a buffer.

Register   Call value               Return value
AH         a1h                      error code 1)
AL         adapter
BH         mode 2)
CX         number of offsets 3)
DI         offset of buffer         number of offsets 4)
ES         segment of buffer
Carry                               error if <> 0

1) see M.3
2) 00h=real mode, 01h=16:16 protected mode, 02h=16:32 protected mode, 03h=00:32 protected mode
3) requested number of offsets
4) available number of offsets
INT 1ah, Function aeh - VendorSpecific
A call of this function leads in a defined way to a vendor-specific function. Vendors are allowed to implement the function in any way. With the exception of AH, AL and Carry, the use of all registers is vendor-specific, too.

Register   Call value   Return value
AH         aeh          error code 1)
AL         adapter
Carry                   error if <> 0
M.3 Error Codes

Code   Name            Description
00h    SUCCESS         function completed successfully
01h    BAD_ADAPTER     invalid adapter address
02h    BAD_ATTRIBUTE   invalid attribute
03h    BAD_BASE        invalid base address of system memory
04h    BAD_EDC         invalid EDC generator
06h    BAD_IRQ         invalid IRQ level
07h    BAD_OFFSET      invalid PCMCIA card offset
08h    BAD_PAGE        invalid page
09h    READ_FAILURE    error while reading
0ah    BAD_SIZE        invalid size
0bh    BAD_SOCKET      invalid socket
0dh    BAD_TYPE        invalid window or interface type
0eh    BAD_VCC         invalid Vcc level index
0fh    BAD_VPP         invalid Vpp1 or Vpp2 level index
11h    BAD_WINDOW      invalid window
12h    WRITE_FAILURE   error while writing
14h    NO_CARD         no PCMCIA card in the socket
15h    BAD_FUNCTION    invalid function
16h    BAD_MODE        mode not supported
17h    BAD_SPEED       invalid speed
18h    BUSY            socket or PCMCIA card busy
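A caller of the socket services typically evaluates the carry flag first and only then looks up the error code returned in AH. The following Python sketch shows this evaluation under the assumption that the two values have already been read from the registers. The names SS_ERROR_NAMES and describe_result are hypothetical.

```python
# Socket services error codes as listed in M.3.
SS_ERROR_NAMES = {
    0x00: "SUCCESS", 0x01: "BAD_ADAPTER", 0x02: "BAD_ATTRIBUTE",
    0x03: "BAD_BASE", 0x04: "BAD_EDC", 0x06: "BAD_IRQ",
    0x07: "BAD_OFFSET", 0x08: "BAD_PAGE", 0x09: "READ_FAILURE",
    0x0A: "BAD_SIZE", 0x0B: "BAD_SOCKET", 0x0D: "BAD_TYPE",
    0x0E: "BAD_VCC", 0x0F: "BAD_VPP", 0x11: "BAD_WINDOW",
    0x12: "WRITE_FAILURE", 0x14: "NO_CARD", 0x15: "BAD_FUNCTION",
    0x16: "BAD_MODE", 0x17: "BAD_SPEED", 0x18: "BUSY",
}


def describe_result(carry, ah):
    """Return the symbolic result of a socket services call.

    carry: value of the carry flag after the call (non-zero means error)
    ah:    value returned in register AH (the error code)
    """
    if carry == 0:
        return "SUCCESS"
    return SS_ERROR_NAMES.get(ah, "unknown error %02xh" % ah)
```

Codes that do not appear in the table, such as 05h, are reported as unknown rather than guessed.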
M.4 PCMCIA Card Services Summarized
The Card Services provide a low-level, system-oriented interface for PCMCIA slots; calls to the card services are therefore system-dependent. When several processes run in a system, the card services administer the PCMCIA accesses to avoid access conflicts. For DOS in real mode (or a real-mode ROM BIOS), the card services are called through INT 1ah, function afh (AH = afh).
M.4.1 Card Services Functions

Code   Name                          Description
00h    CloseMemory                   closes a memory card area
01h    CopyMemory                    copies data of a PCMCIA card
02h    DeregisterClient              removes a client from the list of registered clients
03h    GetClientInfo                 provides information about a client
04h    GetConfigurationInfo          returns the configuration of a socket/PCMCIA card
05h    GetFirstPartition             returns information about the first partition of the card in a socket
06h    GetFirstRegion                returns information about the first region of the card in the socket
07h    GetFirstTuple                 returns the first tuple of the specified type
08h    GetNextPartition              returns information about the next partition of the card
09h    GetNextRegion                 returns information about the next region of the card
0ah    GetNextTuple                  returns the next tuple of the specified type
0bh    GetCardServicesInfo           returns CS information (number of logical sockets, vendor etc.)
0ch    GetStatus                     returns the present status of a PCMCIA card and the socket
0dh    GetTupleData                  returns the content of the last passed tuple
0eh    GetFirstClient                returns the first client handle of the registered clients
0fh    RegisterEraseQueue            registers the erase queue of a client being serviced by the card services
10h    RegisterClient                registers a client for service by the card services
11h    ResetCard                     resets the PCMCIA card in a socket
12h    MapLogSocket                  maps a logical socket under card services to the physical adapter and socket values under socket services
13h    MapLogWindow                  maps a window handle under card services to the physical adapter and window values under socket services
14h    MapMemPage                    maps a memory area of a PCMCIA card to a page in a window
15h    MapPhySocket                  maps physical adapter and socket values under socket services to a logical socket under card services
16h    MapPhyWindow                  maps physical adapter and window values under socket services to a window handle under card services
17h    ModifyWindow                  modifies the attributes or access speed of a window
18h    OpenMemory                    opens a memory card area
19h    ReadMemory                    reads data from a PCMCIA card via a memory handle
1ah    RegisterMTD                   registers a memory technology driver (MTD)
1bh    ReleaseIO                     releases the previously requested I/O addresses
1ch    ReleaseIRQ                    releases previously requested IRQs
1dh    ReleaseWindow                 releases a previously requested system memory block
1eh    ReleaseConfiguration          resets the socket configuration to memory-only interface
1fh    RequestIO                     requests I/O addresses for a socket
20h    RequestIRQ                    requests an IRQ for a socket
21h    RequestWindow                 requests the mapping of a system memory block to a memory area of a PCMCIA card
22h    RequestSocketMask             requests a callback upon a socket status change (event)
23h    ReturnSSEntry                 returns the entry point into socket services
24h    WriteMemory                   writes data via a memory handle onto a PCMCIA card
25h    DeregisterEraseQueue          removes a previously registered erase queue
26h    CheckEraseQueue               informs about new queue entries
27h    ModifyConfiguration           modifies a socket and PCMCIA card configuration
28h    RegisterTimer                 registers a timer for issuing a callback (events)
29h    SetRegion                     sets the properties of a PCMCIA card area
2ah    GetNextClient                 returns the client handle for the next registered client
2bh    ValidateCIS                   validates the card information structure (CIS) of a PCMCIA card
2ch    RequestExclusive              requests the exclusive use of a PCMCIA card in a socket
2dh    ReleaseExclusive              releases the exclusive use of a card in a socket
2eh    GetEventMask                  returns the bit map mask for issuing an event
2fh    ReleaseSocketMask             releases the previously defined event mask for a socket
30h    RequestConfiguration          configures the PCMCIA card in a socket
31h    SetEventMask                  changes the event mask
32h    AddSocketServices             adds a new SS handler below the socket service level
33h    ReplaceSocketServices         replaces an existing socket service handler by a new one
34h    VendorSpecific                (vendor-dependent)
35h    AdjustResourceInfo            reads or adjusts the available resources
36h    AccessConfigurationRegister   accesses a PCMCIA configuration register
M.4.2 Events
Events are reported by the socket services to the clients. Usually, events are status changes of a socket or the inserted card.

Code   Event                   Description
01h    BATTERY_DEAD            battery dead, data lost
02h    BATTERY_LOW             battery low, data still o.k.
03h    CARD_LOCK               mechanical lock has locked the inserted card
04h    CARD_READY              RDY/BSY signal has changed from busy to ready
05h    CARD_REMOVAL            card has been removed from a PCMCIA socket
06h    CARD_UNLOCK             mechanical lock has released the inserted card
07h    EJECTION_COMPLETE       card has been ejected from the socket by an automatic ejection device
08h    EJECTION_REQUEST        card should be ejected from the socket by an automatic ejection device
09h    INSERTION_COMPLETE      card has been inserted into the socket by an automatic insertion device
0ah    INSERTION_REQUEST       card should be inserted into the socket by an automatic insertion device
0bh    PM_RESUME               power management should power-up socket and card
0ch    PM_SUSPEND              power management should power-down socket and card
0dh    EXCLUSIVE_COMPLETE      client has been granted exclusive access to a PCMCIA card
0eh    EXCLUSIVE_REQUEST       client attempts to get exclusive access to a PCMCIA card
0fh    RESET_PHYSICAL          hardware reset for a PCMCIA card in a socket
10h    RESET_REQUEST           client has requested a hardware reset for a PCMCIA card in a socket
11h    CARD_RESET              hardware reset for the card in a socket completed
14h    CLIENT_INFO             client should return information
15h    TIMER_EXPIRED           timer expired
16h    SS_UPDATED              socket support via socket services has been changed
17h    WRITE_PROTECT           write-protect status for the PCMCIA card which is inserted in the socket has changed
40h    CARD_INSERTION          a PCMCIA card has been inserted
80h    RESET_COMPLETE          reset in the background complete
81h    ERASE_COMPLETE          erase in the background complete
82h    REGISTRATION_COMPLETE   registration in the background complete
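A client usually reacts only to a handful of these events and ignores the rest. The following Python sketch shows one possible way to dispatch an event code to a client-supplied handler. The names EVENT_NAMES and dispatch_event are hypothetical, the callback mechanism itself is not modelled, and the first code of the table is read as 01h.

```python
# Event codes as listed in M.4.2.
EVENT_NAMES = {
    0x01: "BATTERY_DEAD", 0x02: "BATTERY_LOW", 0x03: "CARD_LOCK",
    0x04: "CARD_READY", 0x05: "CARD_REMOVAL", 0x06: "CARD_UNLOCK",
    0x07: "EJECTION_COMPLETE", 0x08: "EJECTION_REQUEST",
    0x09: "INSERTION_COMPLETE", 0x0A: "INSERTION_REQUEST",
    0x0B: "PM_RESUME", 0x0C: "PM_SUSPEND",
    0x0D: "EXCLUSIVE_COMPLETE", 0x0E: "EXCLUSIVE_REQUEST",
    0x0F: "RESET_PHYSICAL", 0x10: "RESET_REQUEST", 0x11: "CARD_RESET",
    0x14: "CLIENT_INFO", 0x15: "TIMER_EXPIRED", 0x16: "SS_UPDATED",
    0x17: "WRITE_PROTECT", 0x40: "CARD_INSERTION",
    0x80: "RESET_COMPLETE", 0x81: "ERASE_COMPLETE",
    0x82: "REGISTRATION_COMPLETE",
}


def dispatch_event(code, handlers):
    """Call the handler registered for an event code, if there is one.

    handlers maps event names to callables taking the event code.
    Returns True if a handler was called, False otherwise.
    """
    name = EVENT_NAMES.get(code)
    if name is None or name not in handlers:
        return False
    handlers[name](code)
    return True
```

A client interested only in card insertion and removal would pass a dictionary with the two keys CARD_INSERTION and CARD_REMOVAL; all other events are then ignored.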
M.4.3 Error Codes

Code   Name                   Description
00h    SUCCESS                function completed successfully
01h    BAD_ADAPTER            adapter invalid
02h    BAD_ATTRIBUTE          attribute invalid
03h    BAD_BASE               system memory base invalid
04h    BAD_EDC                EDC generator invalid
05h    -                      reserved
06h    BAD_IRQ                IRQ level invalid
07h    BAD_OFFSET             PCMCIA card offset invalid
08h    BAD_PAGE               page invalid
09h    READ_FAILURE           error while reading
0ah    BAD_SIZE               size invalid
0bh    BAD_SOCKET             socket invalid
0ch    -                      reserved
0dh    BAD_TYPE               window or interface type invalid
0eh    BAD_VCC                Vcc level index invalid
0fh    BAD_VPP                Vpp1 or Vpp2 level index invalid
10h    -                      reserved
11h    BAD_WINDOW             window invalid
12h    WRITE_FAILURE          error while writing
13h    -                      reserved
14h    NO_CARD                no PCMCIA card in socket
15h    UNSUPPORTED_FUNCTION   function not supported
16h    UNSUPPORTED_MODE       mode not supported
17h    BAD_SPEED              speed invalid
18h    BUSY                   socket or PCMCIA card busy
19h    GENERAL_FAILURE        undefined error occurred
1ah    WRITE_PROTECTED        medium write-protected
1bh    BAD_ARG_LENGTH         function argument length invalid
1ch    BAD_ARGS               one or more function arguments invalid
1dh    CONFIGURATION_LOCKED   configuration already locked
1eh    IN_USE                 resource already in use
1fh    NO_MORE_ITEMS          no more of the requested items
20h    OUT_OF_RESOURCE        no more resources
21h    BAD_HANDLE             handle invalid