1 INTRODUCTION
Level n: Virtual machine Mn, with machine language Ln
    Programs in Ln are either interpreted by an interpreter running on a lower machine, or are translated to the machine language of a lower machine
…
Level 3: Virtual machine M3, with machine language L3
Level 2: Virtual machine M2, with machine language L2
    Programs in L2 are either interpreted by interpreters running on M1 or M0, or are translated to L1 or L0
Level 1: Virtual machine M1, with machine language L1
    Programs in L1 are either interpreted by an interpreter running on M0, or are translated to L0
Level 0: Actual computer M0, with machine language L0
    Programs in L0 can be directly executed by the electronic circuits
Figure 1-1. A multilevel machine.
Level 5: Problem-oriented language level
    Translation (compiler)
Level 4: Assembly language level
    Translation (assembler)
Level 3: Operating system machine level
    Partial interpretation (operating system)
Level 2: Instruction set architecture level
    Interpretation (microprogram) or direct execution
Level 1: Microarchitecture level
    Hardware
Level 0: Digital logic level
Figure 1-2. A six-level computer. The support method for each level is indicated below it (along with the name of the supporting program).
*JOB, 5494, BARBARA
*XEQ
*FORTRAN
FORTRAN program
*DATA
Data cards
*END
Figure 1-3. A sample job for the FMS operating system.
Year  Name               Made by          Comments
1834  Analytical Engine  Babbage          First attempt to build a digital computer
1936  Z1                 Zuse             First working relay calculating machine
1943  COLOSSUS           British gov't    First electronic computer
1944  Mark I             Aiken            First American general-purpose computer
1946  ENIAC I            Eckert/Mauchly   Modern computer history starts here
1949  EDSAC              Wilkes           First stored-program computer
1951  Whirlwind I        M.I.T.           First real-time computer
1952  IAS                Von Neumann      Most current machines use this design
1960  PDP-1              DEC              First minicomputer (50 sold)
1961  1401               IBM              Enormously popular small business machine
1962  7094               IBM              Dominated scientific computing in the early 1960s
1963  B5000              Burroughs        First machine designed for a high-level language
1964  360                IBM              First product line designed as a family
1964  6600               CDC              First scientific supercomputer
1965  PDP-8              DEC              First mass-market minicomputer (50,000 sold)
1970  PDP-11             DEC              Dominated minicomputers in the 1970s
1974  8080               Intel            First general-purpose 8-bit computer on a chip
1974  CRAY-1             Cray             First vector supercomputer
1978  VAX                DEC              First 32-bit superminicomputer
1981  IBM PC             IBM              Started the modern personal computer era
1985  MIPS               MIPS             First commercial RISC machine
1987  SPARC              Sun              First SPARC-based RISC workstation
1990  RS6000             IBM              First superscalar machine
Figure 1-4. Some milestones in the development of the modern digital computer.
Memory
Control unit
Arithmetic logic unit
Input
Output Accumulator
Figure 1-5. The original von Neumann machine.
CPU
Memory
Console terminal
Paper tape I/O
Other I/O
Omnibus
Figure 1-6. The PDP-8 omnibus.
Property                          Model 30  Model 40  Model 50  Model 65
Relative performance              1         3.5       10        21
Cycle time (nsec)                 1000      625       500       250
Maximum memory (KB)               64        256       256       512
Bytes fetched per cycle           1         2         4         16
Maximum number of data channels   3         3         4         6
Figure 1-7. The initial offering of the IBM 360 product line.
[Plot: number of transistors (logarithmic scale, 1 to 100,000,000) versus year (1965 to 1995). The data points are the memory chip sizes 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, and 64M.]
Figure 1-8. Moore's law predicts a 60 percent annual increase in the number of transistors that can be put on a chip. The data points given in this figure are memory sizes, in bits.
Type                        Price ($)  Example application
Disposable computer         1          Greeting cards
Embedded computer           10         Watches, cars, appliances
Game computer               100        Home video games
Personal computer           1K         Desktop or portable computer
Server                      10K        Network server
Collection of Workstations  100K       Departmental minisupercomputer
Mainframe                   1M         Batch data processing in a bank
Supercomputer               10M        Long range weather prediction
Figure 1-9. The current spectrum of computers available. The prices should be taken with a grain (or better yet, a metric ton) of salt.
Chip         Date     MHz      Transistors  Memory  Notes
4004         4/1971   0.108    2,300        640     First microprocessor on a chip
8008         4/1972   0.108    3,500        16 KB   First 8-bit microprocessor
8080         4/1974   2        6,000        64 KB   First general-purpose CPU on a chip
8086         6/1978   5-10     29,000       1 MB    First 16-bit CPU on a chip
8088         6/1979   5-8      29,000       1 MB    Used in IBM PC
80286        2/1982   8-12     134,000      16 MB   Memory protection present
80386        10/1985  16-33    275,000      4 GB    First 32-bit CPU
80486        4/1989   25-100   1.2M         4 GB    Built-in 8K cache memory
Pentium      3/1993   60-233   3.1M         4 GB    Two pipelines; later models had MMX
Pentium Pro  3/1995   150-200  5.5M         4 GB    Two levels of cache built in
Pentium II   5/1997   233-400  7.5M         4 GB    Pentium Pro plus MMX
Figure 1-10. The Intel CPU family. Clock speeds are measured in MHz (megahertz) where 1 MHz is 1 million cycles/sec.
[Plot: transistors per chip (logarithmic scale, 1 to 10M) versus year of introduction (1970 to 1998). The points are the 4004, 8008, 8080, 8086, 8088, 80286, 80386, 80486, Pentium, Pentium Pro, and Pentium II, with a trend line labeled Moore's law.]
Figure 1-11. Moore’s law for CPU chips.
2 COMPUTER SYSTEMS ORGANIZATION
Central processing unit (CPU)
Control unit Arithmetic logical unit (ALU)
I/O devices
Registers
…
…
Main memory
Disk
Printer
Bus
Figure 2-1. The organization of a simple computer with one CPU and two I/O devices.
A+B
A
Registers
B
A
B
ALU input register ALU input bus
ALU
A+B
ALU output register
Figure 2-2. The data path of a typical von Neumann machine.
public class Interp {
    static int PC;                  // program counter holds address of next instr
    static int AC;                  // the accumulator, a register for doing arithmetic
    static int instr;               // a holding register for the current instruction
    static int instr_type;          // the instruction type (opcode)
    static int data_loc;            // the address of the data, or -1 if none
    static int data;                // holds the current operand
    static boolean run_bit = true;  // a bit that can be turned off to halt the machine

    public static void interpret(int memory[], int starting_address) {
        // This procedure interprets programs for a simple machine with instructions having
        // one memory operand. The machine has a register AC (accumulator), used for
        // arithmetic. The ADD instruction adds an integer in memory to the AC, for example.
        // The interpreter keeps running until the run bit is turned off by the HALT instruction.
        // The state of a process running on this machine consists of the memory, the
        // program counter, the run bit, and the AC. The input parameters consist
        // of the memory image and the starting address.
        PC = starting_address;
        while (run_bit) {
            instr = memory[PC];                       // fetch next instruction into instr
            PC = PC + 1;                              // increment program counter
            instr_type = get_instr_type(instr);       // determine instruction type
            data_loc = find_data(instr, instr_type);  // locate data (-1 if none)
            if (data_loc >= 0)                        // if data_loc is -1, there is no operand
                data = memory[data_loc];              // fetch the data
            execute(instr_type, data);                // execute instruction
        }
    }

    private static int get_instr_type(int addr) { ... }
    private static int find_data(int instr, int type) { ... }
    private static void execute(int type, int data) { ... }
}
Figure 2-3. An interpreter for a simple computer (written in Java).
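The helper bodies in Figure 2-3 are elided, so the fetch-decode-execute loop cannot be run as printed. The following is a minimal runnable sketch of the same loop. The instruction encoding (opcode times 1000 plus address) and the four opcodes HALT, LOAD, ADD, and STORE are assumptions made up for this illustration; they are not taken from the figure.

```java
final class TinyInterp {
    // Hypothetical encoding (an assumption, not from Figure 2-3):
    // instruction = opcode * 1000 + operand address.
    static final int HALT = 0, LOAD = 1, ADD = 2, STORE = 3;

    /** Runs the program in memory from startingAddress; returns the final accumulator. */
    static int interpret(int[] memory, int startingAddress) {
        int pc = startingAddress;  // program counter
        int ac = 0;                // accumulator
        boolean runBit = true;     // cleared by HALT
        while (runBit) {
            int instr = memory[pc];                            // fetch next instruction
            pc = pc + 1;                                       // increment program counter
            int type = instr / 1000;                           // determine instruction type
            int dataLoc = (type == HALT) ? -1 : instr % 1000;  // locate data (-1 if none)
            switch (type) {                                    // execute
                case HALT:  runBit = false; break;
                case LOAD:  ac = memory[dataLoc]; break;
                case ADD:   ac = ac + memory[dataLoc]; break;
                case STORE: memory[dataLoc] = ac; break;
                default: throw new IllegalStateException("unknown opcode " + type);
            }
        }
        return ac;
    }

    public static void main(String[] args) {
        // Program: LOAD 10; ADD 11; STORE 12; HALT. Data lives at addresses 10 and 11.
        int[] memory = new int[16];
        memory[0] = 1010; memory[1] = 2011; memory[2] = 3012; memory[3] = 0;
        memory[10] = 7; memory[11] = 35;
        System.out.println(interpret(memory, 0));  // accumulator after HALT
        System.out.println(memory[12]);            // value written by STORE
    }
}
```

Unlike the figure, this sketch keeps the machine state in local variables rather than static fields, which makes repeated runs independent of each other.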
(a)
S1: Instruction fetch unit
S2: Instruction decode unit
S3: Operand fetch unit
S4: Instruction execution unit
S5: Write back unit

(b)
Time  1  2  3  4  5  6  7  8  9  …
S1:   1  2  3  4  5  6  7  8  9
S2:      1  2  3  4  5  6  7  8
S3:         1  2  3  4  5  6  7
S4:            1  2  3  4  5  6
S5:               1  2  3  4  5
Figure 2-4. (a) A five-stage pipeline. (b) The state of each stage as a function of time. Nine clock cycles are illustrated.
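The occupancy pattern in part (b) follows a simple rule: with one instruction entering per cycle, stage s holds instruction t - s + 1 during cycle t. A small sketch of that rule, assuming an ideal pipeline with no stalls (the class and method names are invented for this example):

```java
final class PipelineTrace {
    /**
     * Instruction number occupying the given stage (1-based) during the given
     * clock cycle (1-based), or 0 if the stage is still empty.
     * Assumes an ideal pipeline: one instruction issued per cycle, no stalls.
     */
    static int instructionAt(int stage, int cycle) {
        int n = cycle - stage + 1;
        return n >= 1 ? n : 0;
    }

    /** Formats one row of the timing diagram, using "." for an idle stage. */
    static String row(int stage, int cycles) {
        StringBuilder sb = new StringBuilder("S" + stage + ":");
        for (int t = 1; t <= cycles; t++) {
            int n = instructionAt(stage, t);
            sb.append(' ').append(n == 0 ? "." : String.valueOf(n));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int s = 1; s <= 5; s++) {
            System.out.println(row(s, 9));  // nine clock cycles, as in the figure
        }
    }
}
```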
S1
Instruction fetch unit
S2
S3
S4
S5
Instruction decode unit
Operand fetch unit
Instruction execution unit
Write back unit
Instruction decode unit
Operand fetch unit
Instruction execution unit
Write back unit
Figure 2-5. Dual five-stage pipelines with a common instruction fetch unit.
S4 ALU
ALU S1
S2
S3
Instruction fetch unit
Instruction decode unit
Operand fetch unit
S5 LOAD
Write back unit
STORE
Floating point
Figure 2-6. A superscalar processor with five functional units.
Control unit Broadcasts instructions
8 × 8 Processor/memory grid Processor Memory
Figure 2-7. An array processor of the ILLIAC IV type.
Local memories
Shared memory CPU
CPU
CPU
CPU
Shared memory CPU
CPU
CPU
CPU
Bus (a)
Bus (b)
Figure 2-8. (a) A single-bus multiprocessor. (b) A multicomputer with local memories.
(a) 12 cells of 8 bits each (addresses 0 to 11)
(b) 8 cells of 12 bits each (addresses 0 to 7)
(c) 6 cells of 16 bits each (addresses 0 to 5)
Figure 2-9. Three ways of organizing a 96-bit memory.
Computer          Bits/cell
Burroughs B1700   1
IBM PC            8
DEC PDP-8         12
IBM 1130          16
DEC PDP-15        18
XDS 940           24
Electrologica X8  27
XDS Sigma 9       32
Honeywell 6180    36
CDC 3600          48
CDC Cyber         60
Figure 2-10. Number of bits per cell for some historically interesting commercial computers.
(a) Big endian
Address   Bytes in the 32-bit word
0          0   1   2   3
4          4   5   6   7
8          8   9  10  11
12        12  13  14  15

(b) Little endian
Bytes in the 32-bit word   Address
 3   2   1   0             0
 7   6   5   4             4
11  10   9   8             8
15  14  13  12             12
Figure 2-11. (a) Big endian memory. (b) Little endian memory.
[Diagram: four panels, (a) through (d), each showing the 32-bit words at addresses 0, 4, 8, 12, and 16 of a personnel record. The record holds the characters J I M, S M I T, H (padded with zero bytes) followed by integer fields, one of which is 21. Panels (a) and (b) are labeled Big endian and Little endian; the arrows between panels are labeled "Transfer from big endian to little endian" and "Transfer and swap".]
Figure 2-12. (a) A personnel record for a big endian machine. (b) The same record for a little endian machine. (c) The result of transferring the record from a big endian to a little endian. (d) The result of byte-swapping (c).
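The point of panels (c) and (d) can be reproduced in a few lines: copying a big endian integer byte by byte to a little endian machine changes its value, and a blanket byte swap repairs the integer but reverses any character data in the same record. The sketch below uses `java.nio.ByteBuffer` to model the two byte orders. The sample values (the integer 21 and the four characters "JIM ") are illustrative assumptions, not a transcription of the figure.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class EndianDemo {
    /** Interprets the first four bytes as a 32-bit integer in the given byte order. */
    static int readWord(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt(0);
    }

    /** Reverses the four bytes of one word, as an indiscriminate byte swap would. */
    static byte[] swapWord(byte[] w) {
        return new byte[] { w[3], w[2], w[1], w[0] };
    }

    public static void main(String[] args) {
        // The integer 21 as a big endian machine stores it: 00 00 00 15 (hex).
        byte[] age = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(21).array();

        // Transferred byte by byte and read on a little endian machine: wrong value.
        System.out.println(readWord(age, ByteOrder.LITTLE_ENDIAN));

        // Swapping the bytes of the word repairs the integer...
        System.out.println(readWord(swapWord(age), ByteOrder.LITTLE_ENDIAN));

        // ...but the same swap applied to a word of characters reverses the text.
        byte[] name = "JIM ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(swapWord(name), StandardCharsets.US_ASCII));
    }
}
```

The takeaway is that a correct conversion needs to know which words are integers and which are character strings; no single uniform transformation handles both.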
Word size  Check bits  Total size  Percent overhead
8          4           12          50
16         5           21          31
32         6           38          19
64         7           71          11
128        8           136         6
256        9           265         4
512        10          522         2
Figure 2-13. Number of check bits for a code that can correct a single error.
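[Venn diagrams of three intersecting circles A, B, and C. (a) The four data bits 1, 1, 0, 0 written in the intersection regions. (b) One parity bit added to each circle. (c) A changed bit in region AC, marked Error.]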
Figure 2-14. (a) Encoding of 1100. (b) Even parity added. (c) Error in AC.
Memory word 1111000010101110

Bit position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
Bit value:     0  0  1  0  1  1  1  0  0  0  0  0  1  0  1  1  0  1  1  1  0

Parity bits are at positions 1, 2, 4, 8, and 16.
Figure 2-15. Construction of the Hamming code for the memory word 1111000010101110 by adding 5 check bits to the 16 data bits.
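Both the check-bit counts of Figure 2-13 and the codeword of Figure 2-15 can be recomputed mechanically. The sketch below assumes the standard single-error-correcting construction: r check bits are enough for m data bits when m + r + 1 <= 2^r, parity bits sit at the power-of-two positions (numbered from 1 at the left), and parity bit p gives even parity over every position whose number has bit p set. Class and method names are invented for the example.

```java
final class Hamming {
    /** Smallest r with m + r + 1 <= 2^r: check bits needed to correct one error in m data bits. */
    static int checkBits(int m) {
        int r = 0;
        while (m + r + 1 > (1 << r)) {
            r++;
        }
        return r;
    }

    /** Encodes a string of '0'/'1' data bits into a Hamming codeword using even parity. */
    static String encode(String data) {
        int n = data.length() + checkBits(data.length());
        int[] code = new int[n + 1];            // positions 1..n; index 0 is unused
        int next = 0;
        for (int pos = 1; pos <= n; pos++) {
            if (Integer.bitCount(pos) != 1) {   // not a power of two: a data position
                code[pos] = data.charAt(next++) - '0';
            }
        }
        for (int p = 1; p <= n; p <<= 1) {      // parity positions 1, 2, 4, 8, ...
            int parity = 0;
            for (int pos = 1; pos <= n; pos++) {
                if ((pos & p) != 0) {
                    parity ^= code[pos];        // code[p] is still 0 here, so only data bits count
                }
            }
            code[p] = parity;
        }
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos <= n; pos++) {
            sb.append(code[pos]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int m = 8; m <= 512; m *= 2) {
            System.out.println(m + " data bits need " + checkBits(m) + " check bits");
        }
        System.out.println(encode("1111000010101110"));
    }
}
```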
Main memory
CPU
Cache
Bus
Figure 2-16. The cache is logically between the CPU and main memory. Physically, there are several possible places it could be located.
4-MB memory chip
Connector
Figure 2-17. A single inline memory module (SIMM) holding 32 MB. Two of the chips control the SIMM.
Registers Cache
Main memory
Magnetic disk
Tape
Optical disk
Figure 2-18. A five-level memory hierarchy.
Intersector gap
1 sector: Preamble, 4096 data bits, ECC
Track width is 5–10 microns
Width of 1 bit is 0.1 to 0.2 microns
Direction of disk rotation
Direction of arm motion
Read/write head
Disk arm
Figure 2-19. A portion of a disk track. Two sectors are illustrated.
Read/write head (1 per surface) Surface 7 Surface 6 Surface 5 Surface 4 Surface 3 Direction of arm motion Surface 2 Surface 1 Surface 0
Figure 2-20. A disk with four platters.
Parameters        LD 5.25″  HD 5.25″  LD 3.5″  HD 3.5″
Size (inches)     5.25      5.25      3.5      3.5
Capacity (bytes)  360K      1.2M      720K     1.44M
Tracks            40        80        80       80
Sectors/track     9         15        9        18
Heads             2         2         2        2
Rotations/min     300       360       300      300
Data rate (kbps)  250       500       250      500
Type              Flexible  Flexible  Rigid    Rigid
Figure 2-21. Characteristics of the four kinds of floppy disks.
Name                Data bits  Bus MHz  MB/sec
SCSI-1              8          5        5
SCSI-2              8          5        5
Fast SCSI-2         8          10       10
Fast & wide SCSI-2  16         10       20
Ultra SCSI          16         20       40
Figure 2-22. Some of the possible SCSI parameters.
(a) RAID level 0 (four drives)
    Strip 0   Strip 1   Strip 2   Strip 3
    Strip 4   Strip 5   Strip 6   Strip 7
    Strip 8   Strip 9   Strip 10  Strip 11

(b) RAID level 1 (four drives plus four backup drives holding the same strips)
    Strip 0   Strip 1   Strip 2   Strip 3   |  Strip 0   Strip 1   Strip 2   Strip 3
    Strip 4   Strip 5   Strip 6   Strip 7   |  Strip 4   Strip 5   Strip 6   Strip 7
    Strip 8   Strip 9   Strip 10  Strip 11  |  Strip 8   Strip 9   Strip 10  Strip 11

(c) RAID level 2 (seven drives)
    Bit 1   Bit 2   Bit 3   Bit 4   Bit 5   Bit 6   Bit 7

(d) RAID level 3 (five drives)
    Bit 1   Bit 2   Bit 3   Bit 4   Parity

(e) RAID level 4 (four drives plus one parity drive)
    Strip 0   Strip 1   Strip 2   Strip 3   P0-3
    Strip 4   Strip 5   Strip 6   Strip 7   P4-7
    Strip 8   Strip 9   Strip 10  Strip 11  P8-11

(f) RAID level 5 (parity rotated across five drives)
    Strip 0   Strip 1   Strip 2   Strip 3   P0-3
    Strip 4   Strip 5   Strip 6   P4-7      Strip 7
    Strip 8   Strip 9   P8-11     Strip 10  Strip 11
    Strip 12  P12-15    Strip 13  Strip 14  Strip 15
    P16-19    Strip 16  Strip 17  Strip 18  Strip 19
Figure 2-23. RAID levels 0 through 5. Backup and parity drives are shown shaded.
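The parity strips in levels 3 through 5 are the bitwise exclusive OR of the data strips in the same row, which is what lets a single failed drive be rebuilt from the survivors. A minimal sketch under that assumption follows; the strip contents are made-up sample bytes and the class name is invented for the example.

```java
import java.util.Arrays;

final class RaidParity {
    /** Bitwise XOR of equally sized blocks. Used both to compute parity and to rebuild a lost strip. */
    static byte[] xorAll(byte[]... blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= block[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical contents of strips 0 to 3 (sample data only).
        byte[] s0 = {1, 2, 3};
        byte[] s1 = {4, 5, 6};
        byte[] s2 = {7, 8, 9};
        byte[] s3 = {10, 11, 12};

        byte[] p03 = xorAll(s0, s1, s2, s3);        // parity strip P0-3

        // Suppose the drive holding strip 2 fails: XOR the survivors with the parity.
        byte[] rebuilt = xorAll(s0, s1, s3, p03);
        System.out.println(Arrays.equals(rebuilt, s2));
    }
}
```

The same routine serves both purposes because XOR is its own inverse: XORing the parity with all but one strip cancels everything except the missing strip.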
Spiral groove
Pit Land
2K block of user data
Figure 2-24. Recording structure of a Compact Disc or CD-ROM.
Symbols of 14 bits each
42 symbols make 1 frame
Frames of 588 bits, each containing 24 data bytes
98 frames make 1 sector
Mode 1 sector (2352 bytes):
    Preamble  16 bytes
    Data      2048 bytes
    ECC       288 bytes
Figure 2-25. Logical data layout on a CD-ROM.
Printed label Protective lacquer Reflective gold layer Dye layer
Dark spot in the dye layer burned by laser when writing
1.2 mm Polycarbonate Direction of motion
Photodetector
Substrate
Lens Prism Infrared laser diode
Figure 2-26. Cross section of a CD-R disk and laser (not to scale). A silver CD-ROM has a similar structure, except without the dye layer and with a pitted aluminum layer instead of a gold layer.
Polycarbonate substrate 1 0.6 mm Single-sided disk
Semireflective layer
Aluminum reflector
Adhesive layer
Aluminum reflector
0.6 mm Single-sided disk
Polycarbonate substrate 2
Figure 2-27. A double-sided, dual layer DVD disk.
Semireflective layer
SCSI controller Sound card
Modem
Card cage
Edge connector
Figure 2-28. Physical structure of a personal computer.
Monitor
CPU
Memory
Video controller
Keyboard
Floppy disk drive
Hard disk drive
Keyboard controller
Floppy disk controller
Hard disk controller
Bus
Figure 2-29. Logical structure of a simple personal computer.
Memory bus
SCSI bus
SCSI scanner
SCSI disk
Sound card
Main memory
PCI bridge
CPU cache
SCSI controller
Printer controller
Video controller
ISA bridge
Network controller PCI bus
Modem
ISA bus
Figure 2-30. A typical modern PC with a PCI bus and an ISA bus. The modem and sound card are ISA devices; the SCSI controller is a PCI device.
Horizontal scan Grid Screen
Electron gun
Spot on screen Vacuum Vertical deflection plate
Vertical retrace Horizontal retrace (a)
(b)
Figure 2-31. (a) Cross section of a CRT. (b) CRT scanning pattern.
Liquid crystal Rear glass plate Rear electrode
Rear polaroid
Front glass plate Front electrode Front polaroid
y Dark
z
Bright
Light source
Notebook computer (b) (a)
Figure 2-32. (a) The construction of an LCD screen. (b) The grooves on the rear and front plates are perpendicular to one another.
Character
Attribute Analog video signal
CPU
Main memory
Video board A2B2C2
Monitor Video RAM
ABC
Bus
Figure 2-33. Terminal output on a personal computer.
CPU
Serial I/O card Memory UART RS-232-C connector
Terminal
Telephone line (analog) ABC ABC
Modem
Modem Keyboard
Some signals: Protective ground (1) Transmit (2) Receive (3) Request to send (4) Clear to send (5) Data set ready (6) Common return (7) Carrier detect (8) Data terminal ready (20)
Figure 2-34. Connection of an RS-232-C terminal to a computer. The numbers in parentheses in the list of signals are the pin numbers.
Pointer controlled by mouse Window
Menu
Cut Paste Copy
Mouse buttons Mouse
Rubber ball
Figure 2-35. A mouse being used to point to menu items.
(a)
(b)
Figure 2-36. (a) The letter ‘‘A’’ on a 5 × 7 matrix. (b) The letter ‘‘A’’ printed with 24 overlapping needles.
Rotating octagonal mirror
Laser
Drum sprayed and charged Light beam strikes drum Drum
Toner Scraper Discharger Heated rollers Blank paper
Stacked output
Figure 2-37. Operation of a laser printer.
(a)
(b)
(c)
(d)
(e)
(f)
Figure 2-38. Halftone dots for various gray scale ranges. (a) 0–6. (b) 14–20. (c) 28–34. (d) 56–62. (e) 105–111. (f) 161–167.
[Waveforms: voltage versus time for the bit sequence 0 1 0 0 1 0 1 1 0 0 0 1 0 0. (a) Two voltage levels, V1 and V2. (b) High amplitude and low amplitude. (c) High frequency and low frequency. (d) Phase change.]
Figure 2-39. Transmission of the binary number 01001011000100 over a telephone line bit by bit. (a) Two-level signal. (b) Amplitude modulation. (c) Frequency modulation. (d) Phase modulation.
ISDN terminal Digital bit pipe T
U NT1
ISDN telephone
ISDN terminal
ISDN alarm
Customer's equipment
ISDN exchange
To carrier's internal network
Carrier's equipment
Figure 2-40. ISDN for home use.
3 THE DIGITAL LOGIC LEVEL
+VCC +VCC +VCC Vout V1
Collector
Vout
Vout Vin
V2
V1
V2
Emitter
Base (a)
(b)
(c)
Figure 3-1. (a) A transistor inverter. (b) A NAND gate. (c) A NOR gate.
(a) NOT
A  X
0  1
1  0

(b) NAND, (c) NOR, (d) AND, (e) OR
A  B   NAND  NOR  AND  OR
0  0   1     1    0    0
0  1   1     0    0    1
1  0   1     0    0    1
1  1   0     0    1    1
Figure 3-2. The symbols and functional behavior for the five basic gates.
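(a)
A  B  C  M
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1

(b) Circuit: gates 1 to 3 are inverters producing $\overline{A}$, $\overline{B}$, and $\overline{C}$; gates 4 to 7 are AND gates computing $\overline{A}BC$, $A\overline{B}C$, $AB\overline{C}$, and $ABC$; gate 8 is an OR gate combining the four terms into M.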
Figure 3-3. (a) The truth table for the majority function of three variables. (b) A circuit for (a).
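The circuit in (b) is a direct sum-of-products reading of the truth table: one AND term for each row where M is 1, all ORed together. A short sketch that evaluates that expression and regenerates the M column (class and method names are invented for the example):

```java
final class Majority {
    /** Sum-of-products form: one AND term per truth-table row in which M is 1. */
    static boolean majority(boolean a, boolean b, boolean c) {
        return (!a && b && c)     // row 011
            || (a && !b && c)     // row 101
            || (a && b && !c)     // row 110
            || (a && b && c);     // row 111
    }

    /** The M column of the truth table, rows ordered ABC = 000 through 111. */
    static String truthColumn() {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < 8; row++) {
            boolean a = (row & 4) != 0;
            boolean b = (row & 2) != 0;
            boolean c = (row & 1) != 0;
            sb.append(majority(a, b, c) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(truthColumn());
    }
}
```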
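[Circuits: (a) NOT built from a single NAND gate, or a single NOR gate, with both inputs tied to A. (b) AND (output AB) built from NAND gates only and from NOR gates only. (c) OR (output A + B) built from NAND gates only and from NOR gates only.]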
Figure 3-4. Construction of (a) NOT, (b) AND, and (c) OR gates using only NAND gates or only NOR gates.
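The NAND half of this construction is easy to check exhaustively. The sketch below builds NOT, AND, and OR from a single `nand` primitive and compares each against Java's built-in operators for every input combination; the NOR versions follow by the dual argument and are omitted. The class and method names are invented for the example.

```java
final class NandOnly {
    static boolean nand(boolean a, boolean b) {
        return !(a && b);
    }

    /** NOT: a NAND gate with both inputs tied together. */
    static boolean not(boolean a) {
        return nand(a, a);
    }

    /** AND: a NAND gate followed by a NAND-based inverter. */
    static boolean and(boolean a, boolean b) {
        return not(nand(a, b));
    }

    /** OR: invert both inputs, then NAND them (De Morgan's law). */
    static boolean or(boolean a, boolean b) {
        return nand(not(a), not(b));
    }

    /** Exhaustively compares the NAND-only gates with the built-in operators. */
    static boolean allMatch() {
        boolean[] values = { false, true };
        for (boolean a : values) {
            if (not(a) != !a) return false;
            for (boolean b : values) {
                if (and(a, b) != (a && b)) return false;
                if (or(a, b) != (a || b)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatch());
    }
}
```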
(a) AB + AC
A  B  C   AB  AC  AB + AC
0  0  0   0   0   0
0  0  1   0   0   0
0  1  0   0   0   0
0  1  1   0   0   0
1  0  0   0   0   0
1  0  1   0   1   1
1  1  0   1   0   1
1  1  1   1   1   1

(b) A(B + C)
A  B  C   A  B + C  A(B + C)
0  0  0   0  0      0
0  0  1   0  1      0
0  1  0   0  1      0
0  1  1   0  1      0
1  0  0   1  0      0
1  0  1   1  1      1
1  1  0   1  1      1
1  1  1   1  1      1
Figure 3-5. Two equivalent functions. (a) AB + AC. (b) A(B + C).
Name              AND form                                         OR form
Identity law      1A = A                                           0 + A = A
Null law          0A = 0                                           1 + A = 1
Idempotent law    AA = A                                           A + A = A
Inverse law       $A\overline{A} = 0$                              $A + \overline{A} = 1$
Commutative law   AB = BA                                          A + B = B + A
Associative law   (AB)C = A(BC)                                    (A + B) + C = A + (B + C)
Distributive law  A + BC = (A + B)(A + C)                          A(B + C) = AB + AC
Absorption law    A(A + B) = A                                     A + AB = A
De Morgan's law   $\overline{AB} = \overline{A} + \overline{B}$    $\overline{A + B} = \overline{A}\,\overline{B}$
Figure 3-6. Some identities of Boolean algebra.
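(a) NAND: $\overline{AB} = \overline{A} + \overline{B}$
(b) NOR: $\overline{A + B} = \overline{A}\,\overline{B}$
(c) AND: $AB = \overline{\overline{A} + \overline{B}}$
(d) OR: $A + B = \overline{\overline{A}\,\overline{B}}$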
Figure 3-7. Alternative symbols for some gates: (a) NAND. (b) NOR. (c) AND. (d) OR.
(a)
A  B  XOR
0  0  0
0  1  1
1  0  1
1  1  0

[Circuits (b), (c), and (d): three gate-level implementations with inputs A and B.]
Figure 3-8. (a) The truth table for the XOR function. (b)-(d) Three circuits for computing it.
(a)              (b)          (c)
A   B   F        A  B  F      A  B  F
0V  0V  0V       0  0  0      1  1  1
0V  5V  0V       0  1  0      1  0  1
5V  0V  0V       1  0  0      0  1  1
5V  5V  5V       1  1  1      0  0  0
Figure 3-9. (a) Electrical characteristics of a device. (b) Positive logic. (c) Negative logic.
VCC 14
13
12
11
10
9
8
1
2
3
4
5
6
7
Notch
GND
Figure 3-10. An SSI chip containing four gates.
Pin 8
D0 D1 D2 D3 F
D4 D5 D6 D7 A A B B C C
A
B
C
Figure 3-11. An eight-input multiplexer circuit.
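Functionally, the multiplexer routes exactly one of D0 through D7 to the output F, chosen by the control lines A, B, and C. The sketch below models it as eight AND terms feeding one OR, which is how such a circuit is normally built; treating A as the most significant control bit is an assumption of this example. It also shows the usual trick of tying the data inputs to constants so that the multiplexer computes an arbitrary three-variable function, here the majority function.

```java
final class Mux8 {
    /**
     * Eight-input multiplexer: F = D[i] for the i selected by control lines A, B, C.
     * Modeled as eight AND terms (data input AND its select condition) feeding one OR.
     * Assumption: A is the most significant control bit, so i = 4A + 2B + C.
     */
    static boolean mux(boolean[] d, boolean a, boolean b, boolean c) {
        boolean f = false;
        for (int i = 0; i < 8; i++) {
            boolean selected = (a == ((i & 4) != 0))
                            && (b == ((i & 2) != 0))
                            && (c == ((i & 1) != 0));
            f = f || (d[i] && selected);
        }
        return f;
    }

    /** Data inputs tied to constants so that the multiplexer outputs the majority of A, B, C. */
    static final boolean[] MAJORITY_WIRING =
        { false, false, false, true, false, true, true, true };

    public static void main(String[] args) {
        for (int row = 0; row < 8; row++) {
            boolean a = (row & 4) != 0, b = (row & 2) != 0, c = (row & 1) != 0;
            System.out.println(row + " -> " + mux(MAJORITY_WIRING, a, b, c));
        }
    }
}
```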
VCC
D0
D0
D1
D1
D2
D2
D3
F
D4
D3
D5
D5
D6
D6
D7
D7
A B C (a)
F
D4
A B C (b)
Figure 3-12. (a) An MSI multiplexer. (b) The same multiplexer wired to compute the majority function.
D0
D1
A
B
A
D2
A
D3
B
D4
B C
C C
D5
D6
D7
Figure 3-13. A 3-to-8 decoder circuit.
EXCLUSIVE OR gate
A0 B0
A1 B1
A2 B2
A3 B3
A = B
Figure 3-14. A simple 4-bit comparator.
A
If this fuse is blown, B is not an input to AND gate 1.
B
12 × 2 = 24 input signals
L
24 input lines
0
1
49
0
1 6 outputs If this fuse is blown, AND gate 1 is not an input to OR gate 5.
50 input lines
5
Figure 3-15. A 12-input, 6-output programmable logic array. The little squares represent fuses that can be burned out to determine the function to be computed. The fuses are arranged in two matrices: the upper one for the AND gates and the lower one for the OR gates.
D0
D1
D2
D3
D4
D5
D6
D7
S0
S1
S2
S3
S4
S5
S6
S7
C
Figure 3-16. A 1-bit left/right shifter.
(a)
A  B  Sum  Carry
0  0  0    0
0  1  1    0
1  0  1    0
1  1  0    1

(b) Circuit: an Exclusive OR gate on A and B produces Sum; an AND gate on A and B produces Carry.
Figure 3-17. (a) Truth table for 1-bit addition. (b) A circuit for a half adder.
(a)
A  B  Carry in  Sum  Carry out
0  0  0         0    0
0  0  1         1    0
0  1  0         1    0
0  1  1         0    1
1  0  0         1    0
1  0  1         0    1
1  1  0         0    1
1  1  1         1    1

(b) Circuit inputs: A, B, Carry in. Outputs: Sum, Carry out.
Figure 3-18. (a) Truth table for full adder. (b) Circuit for a full adder.
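The truth table reduces to two equations: Sum is the exclusive OR of the three inputs, and Carry out is 1 when both A and B are 1 or when exactly one of them is 1 and Carry in is 1. The sketch below implements those equations and chains eight full adders, each carry out feeding the next carry in, to form a ripple-carry adder. The 8-bit width and all names are choices made for this example.

```java
final class RippleAdder {
    /** One-bit full adder. Returns {sum, carryOut}; all values are 0 or 1. */
    static int[] fullAdder(int a, int b, int carryIn) {
        int sum = a ^ b ^ carryIn;
        int carryOut = (a & b) | ((a ^ b) & carryIn);
        return new int[] { sum, carryOut };
    }

    /**
     * Adds two 8-bit values by rippling the carry through eight full adders,
     * starting at bit 0. Bit 8 of the result holds the final carry out.
     */
    static int add8(int x, int y) {
        int result = 0;
        int carry = 0;
        for (int i = 0; i < 8; i++) {
            int[] r = fullAdder((x >> i) & 1, (y >> i) & 1, carry);
            result |= r[0] << i;
            carry = r[1];
        }
        return result | (carry << 8);
    }

    public static void main(String[] args) {
        System.out.println(add8(200, 100));
        System.out.println(add8(255, 1));
    }
}
```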
Logical unit
Carry in
AB INVA A ENA B ENB
A+B
Output
B Sum
Enable lines
F0
Full adder
F1
Decoder
Carry out
Figure 3-19. A 1-bit ALU.
F1 F0
A7 B7
A6 B6
A5 B5
A4 B4
A3 B3
A2 B2
A1 B1
A0 B0
1-bit ALU
1-bit ALU
1-bit ALU
1-bit ALU
1-bit ALU
1-bit ALU
1-bit ALU
1-bit ALU
O7
O6
O5
O4
O3
O2
O1
O0
Carry in
Carry out
Figure 3-20. Eight 1-bit ALU slices connected to make an 8-bit ALU. The enables and invert signals are not shown for simplicity.
INC
C1
Delay
C2
(a)
(b)
A B C (c)
Figure 3-21. (a) A clock. (b) The timing diagram for the clock. (c) Generation of an asymmetric clock.
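[Circuits (a) and (b): two cross-coupled NOR gates with inputs S and R, both 0, and complementary outputs; the latch is shown storing 0 in (a) and 1 in (b).]

(c)
A  B  NOR
0  0  1
0  1  0
1  0  0
1  1  0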
Figure 3-22. (a) NOR latch in state 0. (b) NOR latch in state 1. (c) Truth table for NOR.
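The two stable states can be reproduced by simulating a pair of cross-coupled NOR gates and letting the feedback settle. The sketch below assumes the usual wiring: one gate takes S and Q and drives the complementary output, the other takes R and the complementary output and drives Q. The gates are evaluated one after the other for a few rounds, and the input combination S = R = 1 is deliberately not exercised.

```java
final class SrLatch {
    private boolean q = false;     // output Q, starts in state 0
    private boolean qBar = true;   // complementary output

    /**
     * Applies inputs S and R, then re-evaluates the two cross-coupled NOR gates
     * a few times so the feedback loop can settle. Returns the settled Q.
     * Assumed wiring: qBar = NOR(S, Q) and Q = NOR(R, qBar).
     */
    boolean apply(boolean s, boolean r) {
        for (int i = 0; i < 3; i++) {
            qBar = !(s || q);
            q = !(r || qBar);
        }
        return q;
    }

    public static void main(String[] args) {
        SrLatch latch = new SrLatch();
        System.out.println(latch.apply(true, false));   // set
        System.out.println(latch.apply(false, false));  // hold
        System.out.println(latch.apply(false, true));   // reset
        System.out.println(latch.apply(false, false));  // hold
    }
}
```

With S = R = 0 the outputs simply reproduce themselves, which is why the circuit remembers whichever of S or R was asserted last.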
S Q Clock Q R
Figure 3-23. A clocked SR latch.
D Q
Q
Figure 3-24. A clocked D latch.
d ∆
a
b
b AND c d
c
(a)
c
b
a Time (b)
Figure 3-25. (a) A pulse generator. (b) Timing at four points in the circuit.
D Q
Q
Figure 3-26. A D flip-flop.
D
Q
CK
(a)
D
Q
CK
(b)
D
Q
CK
(c)
Figure 3-27. D latches and flip-flops.
D
Q
CK
(d)
VCC 13
14
12
11
10
D
Q
2
Q
CK Q PR
CK Q PR
1
8
CLR
CLR D
9
3
4
5
6
7 GND
(a) VCC 20
19
Q
2
D
17
D
16
15
Q
Q
14
D
13
D
12
CK CLR
CK CLR
CK CLR
CLR CK
CLR CK
CLR CK
CLR CK
D
3
Q
4
D
Q
5
6
D
7
Q
8
11
Q
CK CLR
Q
1
18
D
9
10 GND
(b)
Figure 3-28. (a) Dual D flip-flop. (b) Octal flip-flop.
Data in I2 I1 I0 Write gate
Word 0 select line
A1 A0
Word 1 select line
Word 2 select line
D Q
D Q
D Q
CK
CK
CK
D Q
D Q
D Q
CK
CK
CK
D Q
D Q
D Q
CK
CK
CK
D Q
D Q
D Q
CK
CK
CK
Word 0
Word 1
Word 2
Word 3
CS • RD
CS O1
RD
O2 O3 OE
Output enable = CS • RD • OE
Figure 3-29. Logic diagram for a 4 × 3 memory. Each row is one of the four 3-bit words. A read or write operation always reads or writes a complete word.
Data in
Data out
Control (a)
(b)
(c)
(d)
Figure 3-30. (a) A noninverting buffer. (b) Effect of (a) when control is high. (c) Effect of (a) when control is low. (d) An inverting buffer.
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 A17 A18
512K × 8 Memory chip (4 Mbit)
D0 D1 D2 D3 D4 D5 D6 D7
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
4096K × 1 Memory chip D (4 Mbit)
RAS CAS
CS WE OE
CS WE OE
(a)
(b)
Figure 3-31. Two ways of organizing a 4-Mbit memory chip.
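The address pin counts in the two organizations follow from the number of cells. A 512K × 8 chip has 2^19 cells and therefore needs 19 address lines (A0 to A18). A 4096K × 1 chip has 2^22 cells, but when the row and column halves of the address are sent one after the other over the same pins (strobed by RAS and CAS), 11 pins (A0 to A10) are enough. A small sketch of that arithmetic, with invented class and method names:

```java
final class ChipPins {
    /** Address bits needed to select one of `cells` cells; cells must be a power of two. */
    static int addressBits(long cells) {
        return 63 - Long.numberOfLeadingZeros(cells);  // log base 2 for a power of two
    }

    /** Address pins needed when row and column addresses share the same pins. */
    static int multiplexedPins(int addressBits) {
        return (addressBits + 1) / 2;
    }

    public static void main(String[] args) {
        int wide = addressBits(512L * 1024);      // 512K x 8 organization
        int narrow = addressBits(4096L * 1024);   // 4096K x 1 organization
        System.out.println(wide + " address pins");
        System.out.println(narrow + " address bits on " + multiplexedPins(narrow) + " pins");
    }
}
```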
Type    Category     Erasure       Byte alterable  Volatile  Typical use
SRAM    Read/write   Electrical    Yes             Yes       Level 2 cache
DRAM    Read/write   Electrical    Yes             Yes       Main memory
ROM     Read-only    Not possible  No              No        Large volume appliances
PROM    Read-only    Not possible  No              No        Small volume equipment
EPROM   Read-mostly  UV light      No              No        Device prototyping
EEPROM  Read-mostly  Electrical    Yes             No        Device prototyping
Flash   Read/write   Electrical    No              No        Film for digital camera
Figure 3-32. A comparison of various memory types.
Addressing Data Bus control
Bus arbitration Coprocessor
Typical MicroProcessor
Status
Interrupts
Symbol for clock signal
Miscellaneous
Φ +5v
Symbol for electrical ground
Power is 5 volts
Figure 3-33. The logical pinout of a generic CPU. The arrows indicate input signals and output signals. The short diagonal lines indicate that multiple pins are used. For a specific CPU, a number will be given to tell how many.
CPU chip Buses Registers
Memory bus
Bus controller
I/O bus
ALU
On-chip bus
Memory
Disk
Modem
Figure 3-34. A computer system with multiple buses.
Printer
Master       Slave        Example
CPU          Memory       Fetching instructions and data
CPU          I/O device   Initiating data transfer
CPU          Coprocessor  CPU handing instruction off to coprocessor
I/O          Memory       DMA (Direct Memory Access)
Coprocessor  CPU          Coprocessor fetching operands from CPU
Figure 3-35. Examples of bus masters and slaves.
20-Bit address 20-Bit address
Control
20-Bit address Control 8088
80286
4-Bit address 80386 Control 8-Bit address
4-Bit address
Control
Control Control (a)
(b)
(c)
Figure 3-36. Growth of an address bus over time.
Read cycle with 1 wait state T1 Φ
T2
T3
TAD
ADDRESS
Memory address to be read
TDS DATA
Data TM
MREQ
TMH
TML TRH
RD TDH
TRL WAIT
Time (a)
Symbol  Parameter                                   Min  Max  Unit
TAD     Address output delay                        -    11   nsec
TML     Address stable prior to MREQ                6    -    nsec
TM      MREQ delay from falling edge of Φ in T1     -    8    nsec
TRL     RD delay from falling edge of Φ in T1       -    8    nsec
TDS     Data setup time prior to falling edge of Φ  5    -    nsec
TMH     MREQ delay from falling edge of Φ in T3     -    8    nsec
TRH     RD delay from falling edge of Φ in T3       -    8    nsec
TDH     Data hold time from negation of RD          0    -    nsec
(b)
Figure 3-37. (a) Read timing on a synchronous bus. (b) Specification of some critical times.
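The specification in part (b) fixes how long the memory has to respond. The following is a minimal sketch of that arithmetic, not a definitive model. It assumes a 25-nsec clock period (a 40-MHz bus, which is not stated in the figure) and assumes the data are sampled on the falling edge of Φ in T3, that is, 2.5 cycles after the start of T1. The class and method names are hypothetical.

```java
// Sketch: worst-case time budget for the memory in the read cycle of Fig. 3-37.
// Assumptions (not in the figure): 25-nsec cycle, data sampled at the falling edge in T3.
final class ReadTimingBudget {
    // Time from a valid address until the data must be on the bus.
    // The address appears at most tAdMax after the start of T1;
    // the data must be stable tDsMin before the falling edge in T3 (2.5 cycles in).
    static double addressToDataNs(double cycleNs, double tAdMax, double tDsMin) {
        return 2.5 * cycleNs - tAdMax - tDsMin;
    }

    // Time from MREQ/RD being asserted until the data must be on the bus.
    // Both strobes are asserted at most tStrobeMax after the falling edge in T1,
    // which is two full cycles before the sampling edge.
    static double strobeToDataNs(double cycleNs, double tStrobeMax, double tDsMin) {
        return 2.0 * cycleNs - tStrobeMax - tDsMin;
    }

    public static void main(String[] args) {
        System.out.println(addressToDataNs(25.0, 11.0, 5.0)); // 46.5
        System.out.println(strobeToDataNs(25.0, 8.0, 5.0));   // 37.0
    }
}
```

Under these assumptions a memory chip needs to deliver data within about 37 nsec of being selected; a slower chip would have to assert WAIT to insert further wait states.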
ADDRESS
Memory address to be read
MREQ
RD
MSYN
DATA
Data
SSYN
Figure 3-38. Operation of an asynchronous bus.
Bus request Bus grant Arbiter Bus grant may or may not be propagated along the chain
1
2
3
4
5
3
4
5
I/O devices (a)
Arbiter
Bus request level 1 Bus request level 2 Bus grant level 2 Bus grant level 1
1
2 (b)
Figure 3-39. (a) A centralized one-level bus arbiter using daisy chaining. (b) The same arbiter, but with two levels.
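The one-level daisy chain of part (a) can be sketched in a few lines. This is an illustration under simple assumptions: requests are sampled at one instant, device 0 is the one physically closest to the arbiter, and all names are hypothetical.

```java
// Sketch: one-level daisy-chained bus arbitration as in Fig. 3-39(a).
// Device 0 is assumed to be closest to the arbiter and so has the highest priority.
final class DaisyChain {
    // requests[i] is true if device i is asserting the shared bus request line.
    // Returns the index of the device that keeps the grant, or -1 if nobody asked.
    static int grant(boolean[] requests) {
        boolean anyRequest = false;
        for (boolean r : requests) {
            anyRequest |= r;              // the request line is a wired-OR
        }
        if (!anyRequest) {
            return -1;                    // the arbiter issues no grant
        }
        for (int i = 0; i < requests.length; i++) {
            if (requests[i]) {
                return i;                 // this device absorbs the grant
            }
            // a device that did not request passes the grant down the chain
        }
        return -1;                        // not reached: some device requested
    }

    public static void main(String[] args) {
        boolean[] requests = {false, true, false, true, false};
        System.out.println(grant(requests)); // 1: the requester nearest the arbiter wins
    }
}
```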
Bus request Busy +5v Arbitration line
In Out
In Out
In Out
In Out
In Out
1
2
3
4
5
Figure 3-40. Decentralized bus arbitration.
T1
T2
T3
T4
T5
T6
Data
Data
Data
Φ
ADDRESS
DATA
Memory address to be read
Count
Data
MREQ RD WAIT BLOCK
Figure 3-41. A block transfer.
T7
INT INTA
CPU
RD WR A0 CS
8259A Interrupt controller
D0-D7
IR0 IR1 IR2 IR3 IR4 IR5 IR6 IR7
Clock Keyboard
Disk
+5 v
Figure 3-42. Use of the 8259A interrupt controller.
Printer
14.0 cm
Pentium II SEC cartridge
512 KB unified L2 cache
Pentium II processor
6.3 cm
16 KB level 1 instruction cache
To local bus
16 KB level 1 data cache
Contact
1.6 cm
Figure 3-43. The Pentium II SEC package.
Bus arbitration
Request
BPRI# LOCK# Misc# A# ADS# REQ# Parity#
Error
Misc#
Snoop
Misc#
Response
RS# TRDY# Parity#
Data
D# DRDY# DBSY# Parity#
RESET# 3 Interrupts
33 5
5 3
VID 4
5 3
Compatibility 11
Pentium II CPU
Diagnostics 3
3
Initialization 2 Power management
64 7 Miscellaneous 8
27
35
Φ Power
Figure 3-44. Logical pinout of the Pentium II. Names in upper case are the official Intel names for individual signals. Names in mixed case are groups of related signals or signal descriptions.
Bus cycle T1
T2
T3
T4
T5
T6
T7
T8
T9
Req
Error
Snoop
Resp
Data
Req
Error
Snoop
Resp
Data
Req
Error
Snoop
Resp
Req
Error
Snoop
Resp
Req
Error
Snoop
Req
Error
Snoop
Req
Error
T10
T11
T12
Φ Transaction 1 2 3 4 5 6 7
Data Data Resp
Data Resp
Snoop
Data Resp
Data
Figure 3-45. Pipelining requests on the Pentium II’s memory bus.
Pin 1 Index
Figure 3-46. The UltraSPARC II CPU chip.
18
Tag address Tag valid
Level 2 cache tags
Bus arbitration
5
Memory address
35
Address parity 25
Tag data
4
Tag parity
Address valid UltraSPARC II CPU Wait
20
Data address Reply
Data address valid Level 2 cache data
UPA interface to main memory
4
Level 1 caches 128
Data
16
Parity
5 Control
UDB II memory buffer
Memory data
128
Memory ECC
16
Figure 3-47. The main features of the core of an UltraSPARC II system.
Programmable I/O lines
16
MicroJava 701 CPU Level 1 caches
PCI bus
Flash PROM
I
Main memory
D Memory bus
Figure 3-48. A microJava 701 system.
Motherboard
PC bus connector
PC bus
Plug-in Contact board Chips
CPU and other chips
New connector for PC/AT
Edge connector
Figure 3-49. The PC/AT bus has two components, the original PC part and the new part.
Local bus
Cache bus
Level 2 cache
Memory bus
PCI bridge
CPU
Main memory PCI bus
SCSI
USB
ISA bridge
IDE disk
Graphics adaptor
Available PCI slot
Monitor Mouse
Modem
Keyboard
ISA bus
Sound card
Printer
Available ISA slot
Figure 3-50. Architecture of a typical Pentium II system. The thicker buses have more bandwidth than the thinner ones.
PCI device
PCI device
PCI device
Figure 3-51. The PCI bus uses a centralized bus arbiter.
GNT#
REQ#
GNT#
REQ#
GNT#
REQ#
GNT#
REQ#
PCI arbiter
PCI device
Signal   Lines  Master  Slave  Description
CLK      1      -       -      Clock (33 MHz or 66 MHz)
AD       32     ×       ×      Multiplexed address and data lines
PAR      1      ×       -      Address or data parity bit
C/BE     4      ×       -      Bus command/bit map for bytes enabled
FRAME#   1      ×       -      Indicates that AD and C/BE are asserted
IRDY#    1      ×       -      Read: master will accept; write: data present
IDSEL    1      ×       -      Select configuration space instead of memory
DEVSEL#  1      -       ×      Slave has decoded its address and is listening
TRDY#    1      -       ×      Read: data present; write: slave will accept
STOP#    1      -       ×      Slave wants to stop transaction immediately
PERR#    1      -       -      Data parity error detected by receiver
SERR#    1      -       -      Address parity error or system error detected
REQ#     1      -       -      Bus arbitration: request for bus ownership
GNT#     1      -       -      Bus arbitration: grant of bus ownership
RST#     1      -       -      Reset the system and all devices
(a)

Signal   Lines  Master  Slave  Description
REQ64#   1      ×       -      Request to run a 64-bit transaction
ACK64#   1      -       ×      Permission is granted for a 64-bit transaction
AD       32     ×       -      Additional 32 bits of address or data
PAR64    1      ×       -      Parity for the extra 32 address/data bits
C/BE#    4      ×       -      Additional 4 bits for byte enables
LOCK     1      ×       -      Lock the bus to allow multiple transactions
SBO#     1      -       -      Hit on a remote cache (for a multiprocessor)
SDONE    1      -       -      Snooping done (for a multiprocessor)
INTx     4      -       -      Request an interrupt
JTAG     5      -       -      IEEE 1149.1 JTAG test signals
M66EN    1      -       -      Wired to power or ground (66 MHz or 33 MHz)
(b)
Figure 3-52. (a) Mandatory PCI bus signals. (b) Optional PCI bus signals.
Bus cycle Read
T1
Idle
T2
T3
T4
Write
T5
T6
T7
Φ Turnaround AD C/BE#
Address Read cmd
Data Enable
Address
Data
Write cmd
Enable
FRAME# IRDY# DEVSEL# TRDY#
Figure 3-53. Examples of 32-bit PCI bus transactions. The first three cycles are used for a read operation, then an idle cycle, and then three cycles for a write operation.
Time (msec) 1
0
2
3
Idle Frame 1
Frame 0
Frame 2
Packets from root
Packets from root SOF
SOF
IN
DATA ACK
Frame 3
SOF
SOF OUT DATA ACK From device
Data packet from device
SYN PID PAYLOAD CRC
SYN PID PAYLOAD CRC
Figure 3-54. The USB root hub sends out frames every 1.00 msec.
8
CS A0-A1
2 8255A Parallel I/O chip
WR RD RESET D0-D7
8
8
8 Figure 3-55. An 8255A PIO chip.
Port A
Port B
Port C
RAM at address 8000H
PIO at FFFCH
, ,
EPROM at address 0
0
4K 8K 12K 16K 20K 24K 28K 32K 36K 40K 44K 48K 52K 56K 60K 64K
Figure 3-56. Location of the EPROM, RAM, and PIO in our 64K address space.
A0 Address bus A15
CS
CS
2K × 8 EPROM
2K × 8 RAM
CS PIO
(a) A0 Address bus A15
CS
CS
2K × 8 EPROM
2K × 8 RAM
CS PIO
(b)
Figure 3-57. (a) Full address decoding. (b) Partial address decoding.
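The difference between the two decoding schemes can be made concrete with the memory map of Fig. 3-56 (a 2K EPROM at 0, a 2K RAM at 8000H, and a 4-byte PIO at FFFCH). The sketch below is a hedged illustration with hypothetical names. The full decoder looks at every relevant high-order address bit. The partial decoder is one possible choice, assumed here, that examines only A15 and A14, which is cheaper but makes each chip respond to many aliased addresses.

```java
// Sketch: full versus partial address decoding for the map of Fig. 3-56.
// The partial scheme (A15 and A14 only) is an assumed example, not the only option.
final class AddressDecode {
    enum Chip { EPROM, RAM, PIO, NONE }

    // Full decoding: each chip answers only to its own address range.
    static Chip full(int addr) {
        addr &= 0xFFFF;
        if ((addr >>> 11) == 0b00000) return Chip.EPROM; // 0000H to 07FFH
        if ((addr >>> 11) == 0b10000) return Chip.RAM;   // 8000H to 87FFH
        if ((addr >>> 2) == 0x3FFF) return Chip.PIO;     // FFFCH to FFFFH
        return Chip.NONE;
    }

    // Partial decoding: only the top two address bits are examined.
    static Chip partial(int addr) {
        addr &= 0xFFFF;
        boolean a15 = (addr & 0x8000) != 0;
        boolean a14 = (addr & 0x4000) != 0;
        if (!a15) return Chip.EPROM;          // any address of the form 0xxx...
        return a14 ? Chip.PIO : Chip.RAM;     // 11xx... selects PIO, 10xx... selects RAM
    }

    public static void main(String[] args) {
        System.out.println(full(0x0800));     // NONE: just past the EPROM
        System.out.println(partial(0x0800));  // EPROM: an alias under partial decoding
    }
}
```

The price of partial decoding is that the unused parts of the address space can no longer be given to other devices without changing the decoder.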
4 THE MICROARCHITECTURE LEVEL
MAR
To and from main memory
Memory control registers
MDR
PC
MBR
SP
LV
Control signals Enable onto B bus
CPP
Write C bus to register TOS
OPC C bus
B bus H A
ALU control
B
6
N Z
ALU
Shifter
Shifter control 2
Figure 4-1. The data path of the example microarchitecture used in this chapter.
F0  F1  ENA  ENB  INVA  INC  Function
0   1   1    0    0     0    A
0   1   0    1    0     0    B
0   1   1    0    1     0    NOT A
1   0   1    1    0     0    NOT B
1   1   1    1    0     0    A + B
1   1   1    1    0     1    A + B + 1
1   1   1    0    0     1    A + 1
1   1   0    1    0     1    B + 1
1   1   1    1    1     1    B − A
1   1   0    1    1     0    B − 1
1   1   1    0    1     1    −A
0   0   1    1    0     0    A AND B
0   1   1    1    0     0    A OR B
0   1   0    0    0     0    0
1   1   0    0    0     1    1
1   1   0    0    1     0    −1
Figure 4-2. Useful combinations of ALU signals and the function performed.
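The combinations in Fig. 4-2 can be checked against a small software model of the ALU. This is a sketch under stated assumptions: the encoding of F0 and F1 (00 = AND, 01 = OR, 10 = NOT B, 11 = sum) is assumed rather than taken from this figure, INC is modeled as a carry-in that affects only the adder, and the names are hypothetical.

```java
// Sketch: a 32-bit model of the ALU control lines used in Fig. 4-2.
// Assumed function encoding: F0F1 = 00 AND, 01 OR, 10 NOT B, 11 A + B (+ carry-in).
final class Mic1Alu {
    static int alu(int f0, int f1, int ena, int enb, int inva, int inc, int a, int b) {
        int x = (ena == 1) ? a : 0;   // ENA gates the A input
        if (inva == 1) {
            x = ~x;                   // INVA inverts the (gated) A input
        }
        int y = (enb == 1) ? b : 0;   // ENB gates the B input
        switch ((f0 << 1) | f1) {
            case 0:  return x & y;
            case 1:  return x | y;
            case 2:  return ~y;
            default: return x + y + inc;   // INC supplies the carry into the adder
        }
    }

    public static void main(String[] args) {
        int a = 5, b = 12;
        System.out.println(alu(1, 1, 1, 1, 1, 1, a, b)); // B - A = 7
        System.out.println(alu(1, 1, 0, 1, 1, 0, a, b)); // B - 1 = 11
        System.out.println(alu(1, 1, 1, 0, 1, 1, a, b)); // -A = -5
        System.out.println(alu(1, 1, 0, 0, 1, 0, a, b)); // -1
    }
}
```

The subtraction rows rely on two's complement arithmetic: inverting A and adding 1 through the carry-in yields −A, and inverting a disabled (all-zero) A input yields −1.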
Registers loaded instantaneously from C bus and memory on rising edge of clock
Shifter output stable
Cycle 1 starts here
Clock cycle 1
∆w
∆x
Set up signals to drive data path Drive H and B bus
∆y
Clock cycle 2
New MPC used to load MIR with next microinstruction here
∆z
ALU and shifter
MPC available here
Propagation from shifter to registers
Figure 4-3. Timing diagram of one data path cycle.
32-Bit MAR (counts in words) Discarded 0 0
32-Bit address bus (counts in bytes)
Figure 4-4. Mapping of the bits in MAR to the address bus.
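The mapping in Fig. 4-4 amounts to a left shift by two bits. A minimal sketch, with hypothetical names:

```java
// Sketch: MAR holds a word address; the address bus carries a byte address (Fig. 4-4).
final class MarMapping {
    // Shifting left by 2 multiplies by 4: the two high-order MAR bits are discarded
    // and the two low-order bits of the bus address are always 0.
    static int byteAddress(int mar) {
        return mar << 2;
    }

    public static void main(String[] args) {
        System.out.println(byteAddress(1)); // 4
        System.out.println(byteAddress(2)); // 8
    }
}
```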
Field  Bits  Signals
Addr   9     NEXT_ADDRESS
JAM    3     JMPC, JAMN, JAMZ
ALU    8     SLL8, SRA1, F0, F1, ENA, ENB, INVA, INC
C      9     H, OPC, TOS, CPP, LV, SP, PC, MDR, MAR
Mem    3     WRITE, READ, FETCH
B      4     B bus

B bus registers: 0 = MDR, 1 = PC, 2 = MBR, 3 = MBRU, 4 = SP, 5 = LV, 6 = CPP, 7 = TOS, 8 = OPC, 9-15 none
Figure 4-5. The microinstruction format for the Mic-1.
Memory control signals (rd, wr, fetch) 3 4 4-to-16 Decoder
MAR MDR
MPC
9
PC O
8
MBR SP
512 × 36-Bit control store for holding the microprogram
8
LV
JMPC
CPP
Addr
J
ALU
C
MIR M B
TOS JAMN/JAMZ
OPC H
B bus
2 1-bit flip–flop
N
6 ALU control
High bit
ALU
Control signals Enable onto B bus
Z Shifter C bus
2 Write C bus to register
Figure 4-6. The complete block diagram of our example microarchitecture, the Mic-1.
Address
Addr
JAM
0x75
0x92
001
Data path control bits JAMZ bit set
…
0x92
…
0x192
One of these will follow 0x75 depending on Z
Figure 4-7. A microinstruction with JAMZ set to 1 has two potential successors.
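The choice between the two successors in Fig. 4-7 can be sketched as follows. The sketch assumes that the ALU condition flip-flops N and Z are combined with JAMN and JAMZ and ORed into the high-order bit of the 9-bit NEXT_ADDRESS; the JMPC bit is left out, and the names are hypothetical.

```java
// Sketch: selecting the next microinstruction address with JAMN/JAMZ (Fig. 4-7).
// JMPC is deliberately omitted from this illustration.
final class NextAddress {
    static int nextMpc(int nextAddress, boolean jamn, boolean jamz, boolean n, boolean z) {
        int addr = nextAddress & 0x1FF;            // 9-bit address field
        if ((jamn && n) || (jamz && z)) {
            addr |= 0x100;                         // force the high-order bit to 1
        }
        return addr;
    }

    public static void main(String[] args) {
        // The microinstruction at 0x75 has NEXT_ADDRESS = 0x92 and JAMZ set.
        System.out.println(Integer.toHexString(nextMpc(0x92, false, true, false, true)));  // 192
        System.out.println(Integer.toHexString(nextMpc(0x92, false, true, false, false))); // 92
    }
}
```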
SP LV SP
LV SP LV
a3 a2 a1 (a)
108 104 100
b4 b3 b2 b1 a3 a2 a1
c2 c1 b4 b3 b2 b1 a3 a2 a1
(b)
(c)
SP
LV
d5 d4 d3 d2 d1 a3 a2 a1 (d)
Figure 4-8. Use of a stack for storing local variables. (a) While A is active. (b) After A calls B. (c) After B calls C. (d) After C and B return and A calls D.
, , , SP
SP
LV
a2 a3 a2 a1
(a)
LV
a3 a2 a3 a2 a1
(b)
SP
LV
a2 + a3 a3 a2 a1 (c)
SP LV
a3 a2 a2 + a3 (d)
Figure 4-9. Use of an operand stack for doing an arithmetic computation.
Current Operand Stack 3
SP
Current Local Variable Frame 3 LV Local Variable Frame 2 Constant Pool
Local Variable Frame 1
Method Area
CPP
Figure 4-10. The various parts of the IJVM memory.
PC
Hex   Mnemonic            Meaning
0x10  BIPUSH byte         Push byte onto stack
0x59  DUP                 Copy top word on stack and push onto stack
0xA7  GOTO offset         Unconditional branch
0x60  IADD                Pop two words from stack; push their sum
0x7E  IAND                Pop two words from stack; push Boolean AND
0x99  IFEQ offset         Pop word from stack and branch if it is zero
0x9B  IFLT offset         Pop word from stack and branch if it is less than zero
0x9F  IF_ICMPEQ offset    Pop two words from stack; branch if equal
0x84  IINC varnum const   Add a constant to a local variable
0x15  ILOAD varnum        Push local variable onto stack
0xB6  INVOKEVIRTUAL disp  Invoke a method
0x80  IOR                 Pop two words from stack; push Boolean OR
0xAC  IRETURN             Return from method with integer value
0x36  ISTORE varnum       Pop word from stack and store in local variable
0x64  ISUB                Pop two words from stack; push their difference
0x13  LDC_W index         Push constant from constant pool onto stack
0x00  NOP                 Do nothing
0x57  POP                 Delete word on top of stack
0x5F  SWAP                Swap the two top words on the stack
0xC4  WIDE                Prefix instruction; next instruction has a 16-bit index
Figure 4-11. The IJVM instruction set. The operands byte, const, and varnum are 1 byte. The operands disp, index, and offset are 2 bytes.
Stack after INVOKEVIRTUAL Caller's LV Caller's PC Space for caller's local variables
Stack before INVOKEVIRTUAL Pushed parameters
Caller's local variable frame
Parameter 3 Parameter 2 Parameter 1 OBJREF Previous LV Previous PC
SP
Caller's local variables Parameter 2 Parameter 1 Link ptr (a)
SP
Stack base after INVOKEVIRTUAL
Stack base before INVOKEVIRTUAL LV
Parameter 3 Parameter 2 Parameter 1 Link ptr Previous LV Previous PC Caller's local variables Parameter 2 Parameter 1 Link ptr (b)
Figure 4-12. (a) Memory before executing INVOKEVIRTUAL. (b) After executing it.
LV
Stack before IRETURN Return value Previous LV Previous PC
SP
Caller's local variables Parameter 3 Parameter 2 Parameter 1 Link ptr Previous LV Previous PC Caller's local variable frame
Caller's local variables Parameter 2 Parameter 1 Link ptr (a)
Stack base before IRETURN LV
Stack after IRETURN Return value Previous LV Previous PC
Stack base after IRETURN
SP
Caller's local variables Parameter 2 Parameter 1 Link ptr
LV
(b)
Figure 4-13. (a) Memory before executing IRETURN. (b) After executing it.
i = j + k;
if (i == 3)
  k = 0;
else
  j = j − 1;
(a)

 1      ILOAD j        // i = j + k
 2      ILOAD k
 3      IADD
 4      ISTORE i
 5      ILOAD i        // if (i == 3)
 6      BIPUSH 3
 7      IF_ICMPEQ L1
 8      ILOAD j        // j = j − 1
 9      BIPUSH 1
10      ISUB
11      ISTORE j
12      GOTO L2
13 L1:  BIPUSH 0       // k = 0
14      ISTORE k
15 L2:
(b)

0x15 0x02
0x15 0x03
0x60
0x36 0x01
0x15 0x01
0x10 0x03
0x9F 0x00 0x0D
0x15 0x02
0x10 0x01
0x64
0x36 0x02
0xA7 0x00 0x07
0x10 0x00
0x36 0x03
(c)
Figure 4-14. (a) A Java fragment. (b) The corresponding Java assembly language. (c) The IJVM program in hexadecimal.
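The hexadecimal program of part (c) is short enough to run on a toy interpreter. The following is a minimal sketch, not the microarchitecture of this chapter: it handles only the seven opcodes that the fragment uses, takes branch offsets as signed 16-bit values relative to the address of the branch opcode (which is consistent with the offsets 0x000D and 0x0007 in the listing), and uses hypothetical names. Local variables 1, 2, and 3 hold i, j, and k, as the operand bytes in part (c) indicate.

```java
// Sketch: a toy interpreter for the IJVM subset used in Fig. 4-14(c).
final class MiniIjvm {
    static final int[] PROGRAM = {
        0x15, 0x02, 0x15, 0x03, 0x60, 0x36, 0x01,   // i = j + k
        0x15, 0x01, 0x10, 0x03, 0x9F, 0x00, 0x0D,   // if (i == 3) goto L1
        0x15, 0x02, 0x10, 0x01, 0x64, 0x36, 0x02,   // j = j - 1
        0xA7, 0x00, 0x07,                           // goto L2
        0x10, 0x00, 0x36, 0x03                      // L1: k = 0
    };

    // Signed 16-bit branch offset stored in the two bytes after the opcode.
    static int offset(int[] code, int pc) {
        return (short) ((code[pc + 1] << 8) | code[pc + 2]);
    }

    // Executes code until pc runs off the end; locals[varnum] holds the variables.
    static int[] run(int[] code, int[] locals) {
        int[] stack = new int[16];
        int sp = 0;
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc];
            switch (op) {
                case 0x10: stack[sp++] = (byte) code[pc + 1]; pc += 2; break;   // BIPUSH
                case 0x15: stack[sp++] = locals[code[pc + 1]]; pc += 2; break;  // ILOAD
                case 0x36: locals[code[pc + 1]] = stack[--sp]; pc += 2; break;  // ISTORE
                case 0x60: { int b = stack[--sp], a = stack[--sp];              // IADD
                             stack[sp++] = a + b; pc += 1; break; }
                case 0x64: { int b = stack[--sp], a = stack[--sp];              // ISUB
                             stack[sp++] = a - b; pc += 1; break; }
                case 0x9F: { int b = stack[--sp], a = stack[--sp];              // IF_ICMPEQ
                             pc += (a == b) ? offset(code, pc) : 3; break; }
                case 0xA7: pc += offset(code, pc); break;                       // GOTO
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(op));
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        int[] locals = run(PROGRAM, new int[] {0, 0, 1, 2});   // j = 1, k = 2
        System.out.println(locals[1] + " " + locals[2] + " " + locals[3]); // 3 1 0
    }
}
```

With j = 1 and k = 2 the sum is 3, so the branch is taken and k is cleared; with any other sum, j is decremented instead.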
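After instruction  Stack contents (top of stack first)
 0                 (empty)
 1                 j
 2                 k, j
 3                 j+k
 4                 (empty)
 5                 i
 6                 3, i
 7                 (empty)
 8                 j
 9                 1, j
10                 j−1
11                 (empty)
12                 (empty)
13                 0
14                 (empty)
15                 (empty)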
Figure 4-15. The stack after each instruction of Fig. 4-14(b).
DEST = H
DEST = SOURCE
DEST = NOT H
DEST = NOT SOURCE
DEST = H + SOURCE
DEST = H + SOURCE + 1
DEST = H + 1
DEST = SOURCE + 1
DEST = SOURCE − H
DEST = SOURCE − 1
DEST = −H
DEST = H AND SOURCE
DEST = H OR SOURCE
DEST = 0
DEST = 1
DEST = −1
Figure 4-16. All permitted operations. Any of the above operations may be extended by adding ‘‘ …

… 0);
// Close the files.
CloseHandle(inhandle);
CloseHandle(outhandle);
Figure 6-40. A program fragment for copying a file using the Windows NT API functions. This fragment is in C because Java hides the low-level system calls and we are trying to expose them.
API function         UNIX     Meaning
CreateDirectory      mkdir    Create a new directory
RemoveDirectory      rmdir    Remove an empty directory
FindFirstFile        opendir  Initialize to start reading the entries in a directory
FindNextFile         readdir  Read the next directory entry
MoveFile             -        Move a file from one directory to another
SetCurrentDirectory  chdir    Change the current working directory
Figure 6-41. The principal Win32 API functions for directory management. The second column gives the nearest UNIX equivalent, when one exists.
Standard MS-DOS information File name name Security
MFT entry for one file MFT header
Master file table
Figure 6-42. The Windows NT master file table.
Data
A
A
A
A
Original process
A
Children of A
A
Grandchildren of A
Figure 6-43. A process tree in UNIX.
Thread call            Meaning
pthread_create         Create a new thread in the caller’s address space
pthread_exit           Terminate the calling thread
pthread_join           Wait for a thread to terminate
pthread_mutex_init     Create a new mutex
pthread_mutex_destroy  Destroy a mutex
pthread_mutex_lock     Lock a mutex
pthread_mutex_unlock   Unlock a mutex
pthread_cond_init      Create a condition variable
pthread_cond_destroy   Destroy a condition variable
pthread_cond_wait      Wait on a condition variable
pthread_cond_signal    Release one thread waiting on a condition variable
Figure 6-44. The principal POSIX thread calls.
7 THE ASSEMBLY LANGUAGE LEVEL
                              Programmer-years to   Program execution
                              produce the program   time in seconds
Assembly language             50                    33
High-level language           10                    100
Mixed approach before tuning
  Critical 10%                1                     90
  Other 90%                   9                     10
  Total                       10                    100
Mixed approach after tuning
  Critical 10%                6                     30
  Other 90%                   9                     10
  Total                       15                    40
Figure 7-1. Comparison of assembly language and high-level language programming, with and without tuning.
Label     Opcode  Operands  Comments
FORMULA:  MOV     EAX,I     ; register EAX = I
          ADD     EAX,J     ; register EAX = I + J
          MOV     N,EAX     ; N = I + J
I         DD      3         ; reserve 4 bytes initialized to 3
J         DD      4         ; reserve 4 bytes initialized to 4
N         DD      0         ; reserve 4 bytes initialized to 0
(a)

Label     Opcode  Operands  Comments
FORMULA   MOVE.L  I, D0     ; register D0 = I
          ADD.L   J, D0     ; register D0 = I + J
          MOVE.L  D0, N     ; N = I + J
I         DC.L    3         ; reserve 4 bytes initialized to 3
J         DC.L    4         ; reserve 4 bytes initialized to 4
N         DC.L    0         ; reserve 4 bytes initialized to 0
(b)

Label     Opcode  Operands            Comments
FORMULA:  SETHI   %HI(I),%R1          ! R1 = high-order bits of the address of I
          LD      [%R1+%LO(I)],%R1    ! R1 = I
          SETHI   %HI(J),%R2          ! R2 = high-order bits of the address of J
          LD      [%R2+%LO(J)],%R2    ! R2 = J
          NOP                         ! wait for J to arrive from memory
          ADD     %R1,%R2,%R2         ! R2 = R1 + R2
          SETHI   %HI(N),%R1          ! R1 = high-order bits of the address of N
          ST      %R2,[%R1+%LO(N)]
I:        .WORD   3                   ! reserve 4 bytes initialized to 3
J:        .WORD   4                   ! reserve 4 bytes initialized to 4
N:        .WORD   0                   ! reserve 4 bytes initialized to 0
(c)
Figure 7-2. Computation of N = I + J. (a) Pentium II. (b) Motorola 680x0. (c) SPARC.
Pseudoinstr  Meaning
SEGMENT      Start a new segment (text, data, etc.) with certain attributes
ENDS         End the current segment
ALIGN        Control the alignment of the next instruction or data
EQU          Define a new symbol equal to a given expression
DB           Allocate storage for one or more (initialized) bytes
DW           Allocate storage for one or more (initialized) 16-bit halfwords
DD           Allocate storage for one or more (initialized) 32-bit words
DQ           Allocate storage for one or more (initialized) 64-bit double words
PROC         Start a procedure
ENDP         End a procedure
MACRO        Start a macro definition
ENDM         End a macro definition
PUBLIC       Export a name defined in this module
EXTERN       Import a name from another module
INCLUDE      Fetch and include another file
IF           Start conditional assembly based on a given expression
ELSE         Start conditional assembly if the IF condition above was false
ENDIF        End conditional assembly
COMMENT      Define a new start-of-comment character
PAGE         Generate a page break in the listing
END          Terminate the assembly program
Figure 7-3. Some of the pseudoinstructions available in the Pentium II assembler (MASM).
MOV EAX,P
MOV EBX,Q
MOV Q,EAX
MOV P,EBX

MOV EAX,P
MOV EBX,Q
MOV Q,EAX
MOV P,EBX
(a)

SWAP  MACRO
      MOV EAX,P
      MOV EBX,Q
      MOV Q,EAX
      MOV P,EBX
      ENDM

      SWAP

      SWAP
(b)
Figure 7-4. Assembly language code for interchanging P and Q twice. (a) Without a macro. (b) With a macro.
Item                                                Macro call          Procedure call
When is the call made?                              During assembly     During execution
Is the body inserted into the object program        Yes                 No
  every place the call is made?
Is a procedure call instruction inserted into       No                  Yes
  the object program and later executed?
Must a return instruction be used after the         No                  Yes
  call is done?
How many copies of the body appear in the           One per macro call  1
  object program?
Figure 7-5. Comparison of macro calls with procedure calls.
MOV EAX,P
MOV EBX,Q
MOV Q,EAX
MOV P,EBX

MOV EAX,R
MOV EBX,S
MOV S,EAX
MOV R,EBX
(a)

CHANGE  MACRO P1, P2
        MOV EAX,P1
        MOV EBX,P2
        MOV P2,EAX
        MOV P1,EBX
        ENDM

        CHANGE P, Q

        CHANGE R, S
(b)
Figure 7-6. Nearly identical sequences of statements. (a) Without a macro. (b) With a macro.
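Expanding a macro with parameters is essentially textual substitution of actual parameters for formal ones. The sketch below illustrates the idea for the CHANGE macro of part (b); it is not how any particular assembler is implemented. It substitutes whole identifiers in a single pass over each body line, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: expanding a macro body by replacing formal parameters with actual ones.
final class MacroExpander {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static List<String> expand(List<String> body, List<String> formals, List<String> actuals) {
        Map<String, String> bindings = new HashMap<>();
        for (int i = 0; i < formals.size(); i++) {
            bindings.put(formals.get(i), actuals.get(i));
        }
        List<String> out = new ArrayList<>();
        for (String line : body) {
            Matcher m = IDENT.matcher(line);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                // Replace an identifier only if it is a formal parameter.
                String actual = bindings.getOrDefault(m.group(), m.group());
                m.appendReplacement(sb, Matcher.quoteReplacement(actual));
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList("MOV EAX,P1", "MOV EBX,P2", "MOV P2,EAX", "MOV P1,EBX");
        List<String> formals = Arrays.asList("P1", "P2");
        // CHANGE R, S
        for (String line : expand(body, formals, Arrays.asList("R", "S"))) {
            System.out.println(line);
        }
    }
}
```

Substituting in one pass per line, rather than one formal at a time, avoids replacing text that an earlier substitution has just inserted (for example in a call such as CHANGE P2, P1).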
Label      Opcode  Operands  Comments                     Length  ILC
MARIA:     MOV     EAX,I     EAX = I                      5       100
           MOV     EBX,J     EBX = J                      6       105
ROBERTA:   MOV     ECX,K     ECX = K                      6       111
           IMUL    EAX,EAX   EAX = I * I                  2       117
           IMUL    EBX,EBX   EBX = J * J                  3       119
           IMUL    ECX,ECX   ECX = K * K                  3       122
MARILYN:   ADD     EAX,EBX   EAX = I * I + J * J          2       125
           ADD     EAX,ECX   EAX = I * I + J * J + K * K  2       127
STEPHANY:  JMP     DONE      branch to DONE               5       129
Figure 7-7. The instruction location counter (ILC) keeps track of the address where the instructions will be loaded in memory. In this example, the statements prior to MARIA occupy 100 bytes.
Symbol    Value  Other information
MARIA     100
ROBERTA   111
MARILYN   125
STEPHANY  129
Figure 7-8. A symbol table for the program of Fig. 7-7.
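The symbol table follows directly from the instruction lengths: the ILC starts at 100 and advances by the length of each statement, and every label is entered with the ILC value in effect when it is seen. A minimal sketch with hypothetical names, using the labels and lengths of Fig. 7-7:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: building the symbol table of Fig. 7-8 from the statement lengths of Fig. 7-7.
final class PassOneIlc {
    // labels[i] is the label on statement i (or null); lengths[i] is its size in bytes.
    static Map<String, Integer> symbolTable(String[] labels, int[] lengths, int start) {
        Map<String, Integer> table = new LinkedHashMap<>();
        int ilc = start;
        for (int i = 0; i < lengths.length; i++) {
            if (labels[i] != null) {
                table.put(labels[i], ilc);   // a label gets the current ILC value
            }
            ilc += lengths[i];               // then the ILC moves past the statement
        }
        return table;
    }

    public static void main(String[] args) {
        String[] labels = {"MARIA", null, "ROBERTA", null, null, null, "MARILYN", null, "STEPHANY"};
        int[] lengths = {5, 6, 6, 2, 3, 3, 2, 2, 5};
        System.out.println(symbolTable(labels, lengths, 100));
        // {MARIA=100, ROBERTA=111, MARILYN=125, STEPHANY=129}
    }
}
```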
Opcode  First operand  Second operand  Hexadecimal opcode  Instruction length  Instruction class
AAA     -              -               37                  1                   6
ADD     EAX            immed32         05                  5                   4
ADD     reg            reg             01                  2                   19
AND     EAX            immed32         25                  5                   4
AND     reg            reg             21                  2                   19
Figure 7-9. A few excerpts from the opcode table for a Pentium II assembler.
public static void pass_one( ) {
  // This procedure is an outline of pass one of a simple assembler.
  boolean more_input = true;                      // flag that stops pass one
  String line, symbol, literal, opcode;           // fields of the instruction
  int location_counter, length, value, type;      // misc. variables
  final int END_STATEMENT = -2;                   // signals end of input

  location_counter = 0;                           // assemble first instruction at 0
  initialize_tables( );                           // general initialization

  while (more_input) {                            // more_input set to false by END
    line = read_next_line( );                     // get a line of input
    length = 0;                                   // # bytes in the instruction
    type = 0;                                     // which type (format) is the instruction
    opcode = null;                                // stays null for comment lines

    if (line_is_not_comment(line)) {
      symbol = check_for_symbol(line);            // is this line labeled?
      if (symbol != null)                         // if it is, record symbol and value
        enter_new_symbol(symbol, location_counter);
      literal = check_for_literal(line);          // does line contain a literal?
      if (literal != null)                        // if it does, enter it in table
        enter_new_literal(literal);

      // Now determine the opcode type. -1 means illegal opcode.
      opcode = extract_opcode(line);              // locate opcode mnemonic
      type = search_opcode_table(opcode);         // find format, e.g. OP REG1,REG2
      if (type < 0)                               // if not an opcode, is it a pseudoinstruction?
        type = search_pseudo_table(opcode);
      switch(type) {                              // determine the length of this instruction
        case 1: length = get_length_of_type1(line); break;
        case 2: length = get_length_of_type2(line); break;
        // other cases here
      }
    }

    write_temp_file(type, opcode, length, line);  // useful info for pass two
    location_counter = location_counter + length; // update loc_ctr
    if (type == END_STATEMENT) {                  // are we done with input?
      more_input = false;                         // if so, perform housekeeping tasks
      rewind_temp_for_pass_two( );                // like rewinding the temp file
      sort_literal_table( );                      // and sorting the literal table
      remove_redundant_literals( );               // and removing duplicates from it
    }
  }
}
Figure 7-10. Pass one of a simple assembler.
public static void pass_two( ) {
  // This procedure is an outline of pass two of a simple assembler.
  boolean more_input = true;                      // flag that stops pass two
  String line, opcode;                            // fields of the instruction
  int location_counter, length, type;             // misc. variables
  final int END_STATEMENT = -2;                   // signals end of input
  final int MAX_CODE = 16;                        // max bytes of code per instruction
  byte code[ ] = new byte[MAX_CODE];              // holds generated code per instruction

  location_counter = 0;                           // assemble first instruction at 0

  while (more_input) {                            // more_input set to false by END
    type = read_type( );                          // get type field of next line
    opcode = read_opcode( );                      // get opcode field of next line
    length = read_length( );                      // get length field of next line
    line = read_line( );                          // get the actual line of input

    if (type != 0) {                              // type 0 is for comment lines
      switch(type) {                              // generate the output code
        case 1: eval_type1(opcode, length, line, code); break;
        case 2: eval_type2(opcode, length, line, code); break;
        // other cases here
      }
    }

    write_output(code);                           // write the binary code
    write_listing(code, line);                    // print one line on the listing
    location_counter = location_counter + length; // update loc_ctr
    if (type == END_STATEMENT) {                  // are we done with input?
      more_input = false;                         // if so, perform housekeeping tasks
      finish_up( );                               // odds and ends
    }
  }
}
Figure 7-11. Pass two of a simple assembler.
(a)
Symbol    Value   Hash code
Andy      14025   0
Anton     31253   4
Cathy     65254   5
Dick      54185   0
Erik      47357   6
Frances   56445   3
Frank     14332   3
Gerrit    32334   4
Hans      44546   4
Henri     75544   2
Jan       17097   5
Jaco      64533   6
Maarten   23267   0
Reind     63453   1
Roel      76764   7
Willem    34544   6
Wiebren   34344   1

(b)
Hash table entry   Linked table (symbol, value)
0                  Andy 14025 -> Maarten 23267 -> Dick 54185
1                  Reind 63453 -> Wiebren 34344
2                  Henri 75544
3                  Frances 56445 -> Frank 14332
4                  Hans 44546 -> Gerrit 32334 -> Anton 31253
5                  Jan 17097 -> Cathy 65254
6                  Jaco 64533 -> Willem 34544 -> Erik 47357
7                  Roel 76764
Figure 7-12. Hash coding. (a) Symbols, values, and the hash codes derived from the symbols. (b) Eight-entry hash table with linked lists of symbols and values.
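A symbol table organized as in Fig. 7-12(b) might be sketched as follows. This is a minimal illustration, not the book's implementation: the class ChainedSymbolTable and its methods are hypothetical, and the hash function (sum of the character codes modulo the table size) is an assumption chosen for simplicity, so it will not necessarily reproduce the hash codes listed in Fig. 7-12(a).

```java
final class ChainedSymbolTable {
    // One node of a bucket's linked list: a symbol and its value.
    private static final class Entry {
        final String symbol;
        int value;
        Entry next;

        Entry(String symbol, int value, Entry next) {
            this.symbol = symbol;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] buckets;   // the hash table proper; each slot heads a list

    ChainedSymbolTable(int size) {
        buckets = new Entry[size];
    }

    // Assumed hash function: sum of character codes modulo the table size.
    int hash(String symbol) {
        int sum = 0;
        for (int i = 0; i < symbol.length(); i++) {
            sum += symbol.charAt(i);
        }
        return sum % buckets.length;
    }

    // Enters a symbol, or updates its value if it is already present.
    void enter(String symbol, int value) {
        int h = hash(symbol);
        for (Entry e = buckets[h]; e != null; e = e.next) {
            if (e.symbol.equals(symbol)) {
                e.value = value;
                return;
            }
        }
        buckets[h] = new Entry(symbol, value, buckets[h]);   // link at head of list
    }

    // Returns the symbol's value, or null if the symbol is not in the table.
    Integer lookup(String symbol) {
        for (Entry e = buckets[hash(symbol)]; e != null; e = e.next) {
            if (e.symbol.equals(symbol)) {
                return e.value;
            }
        }
        return null;
    }
}
```

A lookup touches only the one list selected by the hash code, so with n symbols spread evenly over k entries the average list holds about n/k symbols.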
[Diagram: Source procedures 1, 2, and 3 each pass through the translator to produce object modules 1, 2, and 3; the linker combines the three object modules into an executable binary program.]
Figure 7-13. Generation of an executable binary program from a collection of independently translated source procedures requires using a linker.
Object module A (addresses 0 to 400): BRANCH TO 200; MOVE P TO X; CALL B
Object module B (addresses 0 to 600): BRANCH TO 300; MOVE Q TO X; CALL C
Object module C (addresses 0 to 500): BRANCH TO 200; MOVE R TO X; CALL D
Object module D (addresses 0 to 300): BRANCH TO 200; MOVE S TO X
Figure 7-14. Each module has its own address space, starting at 0.
(a) Positioned in the binary image, before relocation and linking:
Object module A (100 to 500): BRANCH TO 200; MOVE P TO X; CALL B
Object module B (500 to 1100): BRANCH TO 300; MOVE Q TO X; CALL C
Object module C (1100 to 1600): BRANCH TO 200; MOVE R TO X; CALL D
Object module D (1600 to 1900): BRANCH TO 200; MOVE S TO X

(b) After linking and relocation:
Object module A (100 to 500): BRANCH TO 300; MOVE P TO X; CALL 500
Object module B (500 to 1100): BRANCH TO 800; MOVE Q TO X; CALL 1100
Object module C (1100 to 1600): BRANCH TO 1300; MOVE R TO X; CALL 1600
Object module D (1600 to 1900): BRANCH TO 1800; MOVE S TO X
Figure 7-15. (a) The object modules of Fig. 7-14 after being positioned in the binary image but before being relocated and linked. (b) The same object modules after linking and after relocation has been performed. Together they form an executable binary program, ready to run.
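The address arithmetic that turns Fig. 7-15(a) into Fig. 7-15(b) can be sketched as below. This is a simplified illustration under stated assumptions: the class LinkerSketch and its methods are hypothetical, modules are packed back to back starting at a given base address, and each called procedure is assumed to begin at address 0 of its own module.

```java
final class LinkerSketch {
    // Load address of each module when the modules are placed back to back,
    // the first one starting at 'base'.
    static int[] loadAddresses(int[] lengths, int base) {
        int[] start = new int[lengths.length];
        int next = base;
        for (int i = 0; i < lengths.length; i++) {
            start[i] = next;
            next += lengths[i];
        }
        return start;
    }

    // Relocation: an address that was relative to the start of module m
    // is increased by that module's load address.
    static int relocate(int addressInModule, int[] start, int m) {
        return addressInModule + start[m];
    }

    // Linking: a call to another module becomes a call to the address at
    // which that module was loaded (assumes the entry point is at offset 0).
    static int resolveCall(int[] start, int calledModule) {
        return start[calledModule];
    }
}
```

With module lengths 400, 600, 500, and 300 and a base of 100, the modules load at 100, 500, 1100, and 1600, so BRANCH TO 200 in module A becomes BRANCH TO 300 and CALL B becomes CALL 500, as in Fig. 7-15(b).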
Parts of an object module, from its beginning to its end:
1. Identification
2. Entry point table
3. External reference table
4. Machine instructions and constants
5. Relocation dictionary
6. End of module
Figure 7-16. The internal structure of an object module produced by a translator.
Object module A (400 to 800): BRANCH TO 300; MOVE P TO X; CALL 500
Object module B (800 to 1400): BRANCH TO 800; MOVE Q TO X; CALL 1100
Object module C (1400 to 1900): BRANCH TO 1300; MOVE R TO X; CALL 1600
Object module D (1900 to 2200): BRANCH TO 1800; MOVE S TO X
Figure 7-17. The relocated binary program of Fig. 7-15(b) moved up 300 addresses. Many instructions now refer to an incorrect memory address.
(a) Before EARTH is called. The procedure segment contains CALL EARTH, CALL FIRE, CALL AIR, CALL WATER, CALL EARTH, and CALL WATER, each using indirect addressing through the linkage segment. The linkage segment holds one indirect word per external procedure (EARTH, AIR, FIRE, WATER). Each indirect word contains an invalid address and is followed by the name of the procedure, stored as a character string.
(b) After EARTH has been called and linked. The indirect word for EARTH now holds the address of EARTH; the indirect words for AIR, FIRE, and WATER still hold invalid addresses.
Figure 7-18. Dynamic linking. (a) Before EARTH is called. (b) After EARTH has been called and linked.
User process 1
User process 2
DLL Header A B C D
Figure 7-19. Use of a DLL file by two processes.
8 PARALLEL COMPUTER ARCHITECTURES
[Diagram labels: P = CPU; Shared memory]
Figure 8-1. (a) A multiprocessor with 16 CPUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different CPU.
[Diagram labels: P = CPU; M = private memory; message-passing interconnection network]
Figure 8-2. (a) A multicomputer with 16 CPUs, each with its own private memory. (b) The bit-map image of Fig. 8-1 split up among the 16 memories.
[Diagram: in each of panels (a), (b), and (c), Machine 1 and Machine 2 each have four layers, from top to bottom: Application, Language run-time system, Operating system, Hardware. The shared memory spans the two machines at the hardware layer in (a), at the operating system layer in (b), and at the language run-time system layer in (c).]
Figure 8-3. Various layers where shared memory can be implemented. (a) The hardware. (b) The operating system. (c) The language runtime system.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
Figure 8-4. Various topologies. The heavy dots represent switches. The CPUs and memories are not shown. (a) A star. (b) A complete interconnect. (c) A tree. (d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.
Input port
CPU 1
Output port A
B
C
D
End of packet
Middle of packet
Four-port switch
CPU 2
Front of packet
Figure 8-5. An interconnection network in the form of a four-switch square grid. Only two of the CPUs are shown.
CPU 1
Entire packet
Input port
Four-port switch
Output port
A
B
A
B
A
B
C
D
C
D
C
D
CPU 2 Entire packet
Entire packet (a)
(b)
(c)
Figure 8-6. Store-and-forward packet switching.
CPU 1 B
C
D
,
,
A
CPU 3
CPU 2
Four-port switch
Input port Output buffer
CPU 4
Figure 8-7. Deadlock in a circuit-switched interconnection network.
[Plot: speedup (0 to 60) versus number of CPUs (0 to 60). Curves, from highest to lowest: linear speedup (dotted), N-body problem, Awari, skyline matrix inversion.]
Figure 8-8. Real programs achieve less than the perfect speedup indicated by the dotted line.
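(a) With 1 CPU active, the program takes time T: an inherently sequential part (fraction f, time fT) followed by a potentially parallelizable part (fraction 1 − f, time (1 − f)T).
(b) With n CPUs active during the parallelizable part, the sequential part still takes fT, and the parallelizable part takes (1 − f)T/n.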
Figure 8-9. (a) A program has a sequential part and a parallelizable part. (b) Effect of running part of the program in parallel.
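The times in Fig. 8-9 give a speedup of T / (fT + (1 − f)T/n), which simplifies to n / (1 + (n − 1)f). A small sketch of that formula follows; the class and method names are made up for illustration.

```java
final class Amdahl {
    // Speedup on n CPUs of a program whose inherently sequential fraction is f.
    // Derived from Fig. 8-9: T / (f*T + (1 - f)*T/n) = n / (1 + (n - 1)*f).
    static double speedup(double f, int n) {
        return n / (1.0 + (n - 1) * f);
    }
}
```

For f = 0 the speedup is the full factor n; for f = 0.1 and n = 10 it is 10/1.9, a little over 5, which shows how quickly even a small sequential part limits the gain.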
CPU
Bus (a)
(b)
(c)
(d)
Figure 8-10. (a) A 4-CPU bus-based system. (b) A 16-CPU bus-based system. (c) A 4-CPU grid-based system. (d) A 16-CPU grid-based system.
P1 P1
P2
Work queue
P3
P1
P2
P3 P1
Synchronization point
P1
P3
P5
P4
P2 P2
P2
P6
P3 P7
P8
Process
Synchronization point P9
(a)
(b)
(c)
(d)
Figure 8-11. Computational paradigms. (a) Pipeline. (b) Phased computation. (c) Divide and conquer. (d) Replicated worker.
Physical (hardware)   Logical (software)   Examples
Multiprocessor        Shared variables     Image processing as in Fig. 8-1
Multiprocessor        Message passing      Message passing simulated with buffers in memory
Multicomputer         Shared variables     DSM, Linda, Orca, etc. on an SP/2 or a PC network
Multicomputer         Message passing      PVM or MPI on an SP/2 or a network of PCs
Figure 8-12. Combinations of physical and logical sharing.
Instruction streams   Data streams   Name   Examples
1                     1              SISD   Classical Von Neumann machine
1                     Multiple       SIMD   Vector supercomputer, array processor
Multiple              1              MISD   Arguably none
Multiple              Multiple       MIMD   Multiprocessor, multicomputer
Figure 8-13. Flynn’s taxonomy of parallel computers.
Parallel computer architectures
  SISD (Von Neumann)
  SIMD
    Vector processor
    Array processor
  MISD (?)
  MIMD
    Multiprocessors (shared memory)
      UMA
        Bus
        Switched
      COMA
      NUMA
        CC-NUMA
        NC-NUMA
    Multicomputers (message passing)
      MPP
        Grid
        Hypercube
      COW
Figure 8-14. A taxonomy of parallel computers.
Input vectors
Vector ALU
Figure 8-15. A vector ALU.
Operation                 Examples
A_i = f1(B_i)             f1 = cosine, square root
Scalar = f2(A)            f2 = sum, minimum
A_i = f3(B_i, C_i)        f3 = add, subtract
A_i = f4(scalar, B_i)     f4 = multiply B_i by a constant
Figure 8-16. Various combinations of vector and scalar operations.
Step   Name                  Values
1      Fetch operands        1.082 × 10^12 − 9.212 × 10^11
2      Adjust exponent       1.082 × 10^12 − 0.9212 × 10^12
3      Execute subtraction   0.1608 × 10^12
4      Normalize result      1.608 × 10^11
Figure 8-17. Steps in a floating-point subtraction.
                     Cycle
Step                 1        2        3        4        5        6        7
Fetch operands       B1, C1   B2, C2   B3, C3   B4, C4   B5, C5   B6, C6   B7, C7
Adjust exponent               B1, C1   B2, C2   B3, C3   B4, C4   B5, C5   B6, C6
Execute operation                      B1 + C1  B2 + C2  B3 + C3  B4 + C4  B5 + C5
Normalize result                                B1 + C1  B2 + C2  B3 + C3  B4 + C4
Figure 8-18. A pipelined floating-point adder.
A
B
S
64 24-Bit holding registers for addresses
8 24-Bit address registers
ADD
8 64-Bit scalar registers
64 64-Bit holding registers for scalars
8 64-Bit vector registers
ADD
ADD
ADD
BOOLEAN
MUL
BOOLEAN
SHIFT
RECIP.
SHIFT
MUL Address units
64 Elements per register
T
POP. COUNT Scalar integer units
Scalar/vector floating-point units
Vector integer units
Figure 8-19. Registers and functional units of the Cray-1.
CPU 2 Write 200 1
Write 100
x
Read 2x
Read 2x
3
W100
W100
W200
W200
R3 = 100
R4 = 200
R3 = 200
W200
W100
R3 = 200
R4 = 200
R3 = 100
R4 = 200
R3 = 200
R4 = 100
R4 = 200
R4 = 200
R3 = 100
(b)
(c)
(d)
4 (a)
Figure 8-20. (a) Two CPUs writing and two CPUs reading a common memory word. (b) - (d) Three possible ways the two writes and four reads might be interleaved in time.
Write
CPU A
1A
CPU B
1B
2A
CPU C
1C
1D 1E
2B
2C
3A
3B
1F
3C
Synchronization point Time
Figure 8-21. Weakly consistent memory uses synchronization operations to divide time into sequential epochs.
2D
CPU
CPU
M
Shared memory
Private memory
Shared memory CPU
CPU
M
CPU
CPU
Cache Bus (a)
(b)
(c)
Figure 8-22. Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.
M
Action       Local request               Remote request
Read miss    Fetch data from memory
Read hit     Use data from local cache
Write miss   Update data in memory
Write hit    Update cache and memory     Invalidate cache entry
Figure 8-23. The write-through cache coherence protocol. The empty boxes indicate that no action is taken.
[Diagram: CPU 1, CPU 2, and CPU 3, each with a cache, share a bus with memory. State of block A after each step:]
(a) CPU 1 reads block A: Exclusive in CPU 1.
(b) CPU 2 reads block A: Shared in CPU 1 and CPU 2.
(c) CPU 2 writes block A: Modified in CPU 2.
(d) CPU 3 reads block A: Shared in CPU 2 and CPU 3.
(e) CPU 2 writes block A: Modified in CPU 2.
(f) CPU 1 writes block A: Modified in CPU 1.
Figure 8-24. The MESI cache coherence protocol.
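The state changes of Fig. 8-24 can be traced with a small state machine. The sketch below is a deliberate simplification and not a full MESI implementation: it tracks a single block, ignores the data itself and the write-back to memory, and the class MesiSketch with its read and write methods is hypothetical.

```java
import java.util.Arrays;

final class MesiSketch {
    enum State { INVALID, SHARED, EXCLUSIVE, MODIFIED }

    final State[] cache;   // state of the one tracked block in each CPU's cache

    MesiSketch(int cpus) {
        cache = new State[cpus];
        Arrays.fill(cache, State.INVALID);
    }

    // CPU 'cpu' reads the block.
    void read(int cpu) {
        if (cache[cpu] != State.INVALID) {
            return;                               // read hit: no state change
        }
        boolean othersHaveIt = false;
        for (int i = 0; i < cache.length; i++) {
            if (i != cpu && cache[i] != State.INVALID) {
                // A holder that snoops the read drops to Shared
                // (a Modified holder would first write the block back).
                cache[i] = State.SHARED;
                othersHaveIt = true;
            }
        }
        cache[cpu] = othersHaveIt ? State.SHARED : State.EXCLUSIVE;
    }

    // CPU 'cpu' writes the block: every other copy is invalidated.
    void write(int cpu) {
        for (int i = 0; i < cache.length; i++) {
            if (i != cpu) {
                cache[i] = State.INVALID;
            }
        }
        cache[cpu] = State.MODIFIED;
    }
}
```

Replaying steps (a) through (f) of the figure with indices 0, 1, and 2 standing for CPU 1, CPU 2, and CPU 3 gives the same sequence of states as the figure.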
111
110
101
100
011
010
001
000
Memories Crosspoint switch is open
000 001
CPUs
010
(b)
011
Crosspoint switch is closed
100 101 110 111 (c)
Closed crosspoint switch
Open crosspoint switch (a)
Figure 8-25. (a) An 8 × 8 crossbar switch. (b) An open crosspoint. (c) A closed crosspoint.
16 × 16 Crossbar switch (Gigaplane-XB) Transfer unit is 64-byte cache block Board had 4 GB + 4 CPUs
UltraSPARC CPU
… 0
1
2
1-GB memory module
14
15
Four address buses for snooping
Figure 8-26. The Sun Enterprise 10000 symmetric multiprocessor.
A
X
B
Y (a)
Module
Address
Opcode
(b)
Figure 8-27. (a) A 2 × 2 switch. (b) A message format.
Value
3 Stages CPUs
Memories
000 001
1A
2A
000
3A
b
b
010
1B
2B
b
010
3B
011
011 b
100 1C
100 3C
2C
101 110 111
001
101 a
a 1D
a
2D
a
3D
Figure 8-28. An omega switching network.
110 111
CPU Memory
MMU
Local bus
CPU Memory
Local bus
CPU Memory
Local bus
CPU Memory
Local bus
System bus
Figure 8-29. A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design.
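(a) Nodes 0 through 255, each with a CPU, memory, and directory on a local bus, joined by an interconnection network.
(b) Address fields: Node (8 bits), Block (18 bits), Offset (6 bits).
(c) Directory entries numbered 0 to 2^18 − 1. Entries 4 through 0 are shown with bits 0, 0, 1, 0, 0; entry 2 holds node number 82.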
Figure 8-30. (a) A 256-node directory-based multiprocessor. (b) Division of a 32-bit memory address into fields. (c) The directory at node 36.
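Splitting an address into the three fields of Fig. 8-30(b) is a matter of shifts and masks. The sketch below assumes the node number occupies the 8 high-order bits, followed by the 18-bit block number and the 6-bit offset, in the order the figure lists them; the class and method names are illustrative only.

```java
final class DirectoryAddress {
    // Bits 31..24: node number (8 bits).
    static int node(int address) {
        return (address >>> 24) & 0xFF;
    }

    // Bits 23..6: block (cache line) number within the node (18 bits).
    static int block(int address) {
        return (address >>> 6) & 0x3FFFF;
    }

    // Bits 5..0: byte offset within the 64-byte block (6 bits).
    static int offset(int address) {
        return address & 0x3F;
    }

    // Builds an address from its three fields.
    static int compose(int node, int block, int offset) {
        return (node << 24) | (block << 6) | offset;
    }
}
```

For example, an address composed from node 36, block 4, and offset 8 decodes back to those three values, and the hardware at node 36 would consult entry 4 of its directory.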
Intercluster interface CPU with cache
Intercluster bus (nonsnooping) Memory
D
0
1
D
4
5
D
8
12
9
D
13
D
D
2
D
D
6
D
D
10
D
D
14
Local bus (snooping)
3
7
11
15
D
D
D
D Directory
Cluster
(a)
Cluster Block This is the directory for cluster 13. This bit tells whether cluster 0 has block 1 of the memory homed here in any of its caches.
0 1 2 3 4 5 6 7 8 9…
3 2 1 0
State 15
Uncached, shared, modified
(b)
Figure 8-31. (a) The DASH architecture. (b) A DASH directory.
Quad board with 4 Pentium Pros and up to 4 GB of RAM Snooping bus interface Directory controller
32-MB cache RAM Directory
Data pump
IQ board
SCI ring
RAM
CPU
Figure 8-32. The NUMA-Q multiprocessor.
Local memory table at home node
Bits 6 7 13 Back State Tag 219-1
6 Fwd
Back State
Tag
Fwd
Back State
Tag
Fwd
0 Node 4 cache directory
Node 9 cache directory
Node 22 cache directory
Figure 8-33. SCI chains all the holders of a given cache line together in a doubly-linked list. In this example, a line is shown cached at three nodes.
CPU
Node
Memory
…
…
Local interconnect
Disk and I/O
…
Local interconnect
Communication processor High-performance interconnection network
Figure 8-34. A generic multicomputer.
Disk and I/O
Network
Disk
Tape
GigaRing
Alpha
Shell
Node
Mem
Alpha
Mem
Control + E registers
Control + E registers
Commun. processor
Commun. processor
Alpha
…
Full-duplex 3D torus
Figure 8-35. The Cray Research T3E.
Mem
Control + E registers Commun. processor
Kestrel board
64-Bit local bus
38
PPro
PPro
64 MB
I/O
NIC
PPro
PPro
64 MB
I/O
NIC
32 2
64-Bit local bus
(a)
(b)
Figure 8-36. The Intel/Sandia Option Red system. (a) The kestrel board. (b) The interconnection network.
CPU group
CPU group
CPU group
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7
1
1
4
2
3 5
Time
3
9
5
8 9
6
7
3 6
2
6 8
4
9
5
4
1 7
2
8
7
(a)
(b)
(c)
Figure 8-37. Scheduling a COW. (a) FIFO. (b) Without head-of-line blocking. (c) Tiling. The shaded areas indicate idle CPUs.
CPU
CPU
CPU Backplane
Packet going east
Packet going west
(a)
Line card Ethernet (b)
Figure 8-38. (a) Three computers on an Ethernet. (b) An Ethernet switch.
Switch
CPU 1
2
3
4
Cell 7
5 Packet
6
Port
8
Virtual circuit
9
11
10
12
ATM switch
13 14
15 16
Figure 8-39. Sixteen CPUs connected by four ATM switches. Two virtual circuits are shown.
Globally shared virtual memory consisting of 16 pages 0
0
1
2
2
5
9
3
4
5
6
1
3
8
10
CPU 0
7
8
6
9
10 11 12 13 14 15
4
7
12
14
CPU 1
2
9
10
5
1
3
6
8
CPU 0
13
15 Memory
CPU 2
CPU 3
Network
(a)
0
11
4
7
12
14
CPU 1
11
13
CPU 2
15
CPU 3
(b)
0
2
9
10 CPU 0
5
1
3
8
10
6
CPU 1
4
7
12
14 CPU 2
11
13
15
CPU 3
(c)
Figure 8-40. A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. (a) The initial situation. (b) After CPU 0 references page 10. (c) After CPU 1 references page 10, here assumed to be a read-only page.
("abc", 2, 5)
("matrix-1", 1, 6, 3.14)
("family", "is sister", Carolyn, Elinor)
Figure 8-41. Three Linda tuples.
Object implementation stack;
  top: integer;                               # storage for the stack
  stack: array [integer 0..N-1] of integer;

  operation push(item: integer);              # function returning nothing
  begin
    stack[top] := item;                       # push item onto the stack
    top := top + 1;                           # increment the stack pointer
  end;

  operation pop( ): integer;                  # function returning an integer
  begin
    guard top > 0 do                          # suspend if the stack is empty
      top := top - 1;                         # decrement the stack pointer
      return stack[top];                      # return the top item
    od;
  end;

begin
  top := 0;                                   # initialization
end;
Figure 8-42. A simplified ORCA stack object, with internal data and two operations.
A BINARY NUMBERS
1
Digits:  d_n … d_2 d_1 d_0 . d_−1 d_−2 d_−3 … d_−k
Places:  … 100's place, 10's place, 1's place . .1's place, .01's place, .001's place …

$$\text{Number} = \sum_{i=-k}^{n} d_i \times 10^{i}$$

Figure A-1. The general form of a decimal number.
Binary: 11111010001
  1 × 2^10 + 1 × 2^9 + 1 × 2^8 + 1 × 2^7 + 1 × 2^6 + 0 × 2^5 + 1 × 2^4 + 0 × 2^3 + 0 × 2^2 + 0 × 2^1 + 1 × 2^0
  1024 + 512 + 256 + 128 + 64 + 0 + 16 + 0 + 0 + 0 + 1

Octal: 3721
  3 × 8^3 + 7 × 8^2 + 2 × 8^1 + 1 × 8^0
  1536 + 448 + 16 + 1

Decimal: 2001
  2 × 10^3 + 0 × 10^2 + 0 × 10^1 + 1 × 10^0
  2000 + 0 + 0 + 1

Hexadecimal: 7D1
  7 × 16^2 + 13 × 16^1 + 1 × 16^0
  1792 + 208 + 1
Figure A-2. The number 2001 in binary, octal, and hexadecimal.
Decimal   Binary         Octal   Hex
0         0              0       0
1         1              1       1
2         10             2       2
3         11             3       3
4         100            4       4
5         101            5       5
6         110            6       6
7         111            7       7
8         1000           10      8
9         1001           11      9
10        1010           12      A
11        1011           13      B
12        1100           14      C
13        1101           15      D
14        1110           16      E
15        1111           17      F
16        10000          20      10
20        10100          24      14
30        11110          36      1E
40        101000         50      28
50        110010         62      32
60        111100         74      3C
70        1000110        106     46
80        1010000        120     50
90        1011010        132     5A
100       1100100        144     64
1000      1111101000     1750    3E8
2989      101110101101   5655    BAD
Figure A-3. Decimal numbers and their binary, octal, and hexadecimal equivalents.
Example 1
  Hexadecimal:  1948.B6
  Binary:       0001100101001000.101101100
  Octal:        14510.554

Example 2
  Hexadecimal:  7BA3.BC4
  Binary:       0111101110100011.101111000100
  Octal:        75643.5704

Figure A-4. Examples of octal-to-binary and hexadecimal-to-binary conversion.
Quotients   Remainders
1492
746         0
373         0
186         1
93          0
46          1
23          0
11          1
5           1
2           1
1           0
0           1

Reading the remainders from the bottom up: 10111010100 (binary) = 1492 (decimal)
Figure A-5. Conversion of the decimal number 1492 to binary by successive halving, starting at the top and working downward. For example, 93 divided by 2 yields a quotient of 46 and a remainder of 1, written on the line below it.
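The successive-halving procedure of Fig. A-5 translates directly into a loop. The sketch below is illustrative only; the class and method names are invented here, and the input is assumed to be nonnegative.

```java
final class ToBinary {
    // Converts a nonnegative integer to a binary string by repeated halving.
    // Each remainder is the next bit, produced from least to most significant.
    static String toBinary(int n) {
        if (n == 0) {
            return "0";
        }
        StringBuilder bits = new StringBuilder();
        while (n > 0) {
            bits.append(n % 2);   // remainder column of Fig. A-5
            n /= 2;               // quotient column of Fig. A-5
        }
        return bits.reverse().toString();   // read the remainders bottom-up
    }
}
```

Because the remainders come out low-order bit first, the string is reversed at the end, which corresponds to reading the remainder column of the figure from the bottom up.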
Binary number: 1 0 1 1 1 0 1 1 0 1 1 1

Bit   Computation
1     1 + 2 × 1499 = 2999   (result)
1     1 + 2 × 749 = 1499
1     1 + 2 × 374 = 749
0     0 + 2 × 187 = 374
1     1 + 2 × 93 = 187
1     1 + 2 × 46 = 93
0     0 + 2 × 23 = 46
1     1 + 2 × 11 = 23
1     1 + 2 × 5 = 11
1     1 + 2 × 2 = 5
0     0 + 2 × 1 = 2
1     1 + 2 × 0 = 1         (start here)
Figure A-6. Conversion of the binary number 101110110111 to decimal by successive doubling, starting at the bottom. Each line is formed by doubling the one below it and adding the corresponding bit. For example, 749 is twice 374 plus the 1 bit on the same line as 749.
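The successive-doubling procedure of Fig. A-6 is equally short in code. This is a sketch with invented names; it assumes the string contains only the characters 0 and 1 and that the value fits in an int.

```java
final class FromBinary {
    // Converts a binary string to an integer by repeated doubling,
    // processing the leftmost (most significant) bit first.
    static int fromBinary(String bits) {
        int result = 0;
        for (int i = 0; i < bits.length(); i++) {
            // Double the running value and add the next bit.
            result = 2 * result + (bits.charAt(i) - '0');
        }
        return result;
    }
}
```

The running values for 101110110111 are 1, 2, 5, 11, 23, 46, 93, 187, 374, 749, 1499, and 2999, the same numbers that appear in the figure from the bottom line to the top.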
N         N             −N            −N            −N            −N
decimal   binary        signed mag.   1's compl.    2's compl.    excess 128
1         00000001      10000001      11111110      11111111      01111111
2         00000010      10000010      11111101      11111110      01111110
3         00000011      10000011      11111100      11111101      01111101
4         00000100      10000100      11111011      11111100      01111100
5         00000101      10000101      11111010      11111011      01111011
6         00000110      10000110      11111001      11111010      01111010
7         00000111      10000111      11111000      11111001      01111001
8         00001000      10001000      11110111      11111000      01111000
9         00001001      10001001      11110110      11110111      01110111
10        00001010      10001010      11110101      11110110      01110110
20        00010100      10010100      11101011      11101100      01101100
30        00011110      10011110      11100001      11100010      01100010
40        00101000      10101000      11010111      11011000      01011000
50        00110010      10110010      11001101      11001110      01001110
60        00111100      10111100      11000011      11000100      01000100
70        01000110      11000110      10111001      10111010      00111010
80        01010000      11010000      10101111      10110000      00110000
90        01011010      11011010      10100101      10100110      00100110
100       01100100      11100100      10011011      10011100      00011100
127       01111111      11111111      10000000      10000001      00000001
128       Nonexistent   Nonexistent   Nonexistent   10000000      00000000
Figure A-7. Negative 8-bit numbers in four systems.
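The four encodings of −N in Fig. A-7 can each be computed with one line of arithmetic on 8-bit values. The following sketch uses invented names and assumes N is in the range noted beside each method; it does not check the range.

```java
final class NegativeEncodings {
    // Formats the low 8 bits of v as a string of exactly eight binary digits.
    private static String bits8(int v) {
        String s = Integer.toBinaryString(v & 0xFF);
        return "00000000".substring(s.length()) + s;
    }

    // Signed magnitude: set the sign bit, keep the magnitude. 1 <= n <= 127.
    static String signedMagnitude(int n) {
        return bits8(0x80 | n);
    }

    // One's complement: invert every bit of n. 1 <= n <= 127.
    static String onesComplement(int n) {
        return bits8(~n);
    }

    // Two's complement: invert every bit and add 1, i.e. -n. 1 <= n <= 128.
    static String twosComplement(int n) {
        return bits8(-n);
    }

    // Excess 128: store (-n) + 128 as an unsigned number. 1 <= n <= 128.
    static String excess128(int n) {
        return bits8(128 - n);
    }
}
```

Note that for every N the excess 128 pattern equals the two's complement pattern with its leftmost bit inverted, which the table also shows.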
Addend    0     0     1     1
Augend   +0    +1    +0    +1
Sum       0     1     1     0
Carry     0     0     0     1
Figure A-8. The addition table in binary.
Decimal      1's complement      2's complement
   10          00001010            00001010
+ (−3)         11111100            11111101
   +7        1 00000110          1 00000111
             carry 1             discarded
               00000111
Figure A-9. Addition in one's complement and two's complement.
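The two additions of Fig. A-9 differ only in what happens to the carry out of the leftmost bit. A minimal sketch on 8-bit values follows; the class and method names are illustrative, and overflow detection is left out.

```java
final class ComplementAdd {
    // One's complement addition: a carry out of the leftmost bit is
    // added back into the rightmost bit (end-around carry).
    static int addOnesComplement(int a, int b) {
        int sum = (a & 0xFF) + (b & 0xFF);
        if (sum > 0xFF) {
            sum = (sum & 0xFF) + 1;
        }
        return sum & 0xFF;
    }

    // Two's complement addition: a carry out of the leftmost bit is discarded.
    static int addTwosComplement(int a, int b) {
        return ((a & 0xFF) + (b & 0xFF)) & 0xFF;
    }
}
```

Adding 00001010 to the one's complement pattern 11111100, or to the two's complement pattern 11111101, yields 00000111 in both cases, that is, 10 + (−3) = +7.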
B FLOATING-POINT NUMBERS
1
1  Negative overflow               (below −10^100)
2  Expressible negative numbers    (−10^100 to −10^−100)
3  Negative underflow              (between −10^−100 and 0)
4  Zero                            (0)
5  Positive underflow              (between 0 and 10^−100)
6  Expressible positive numbers    (10^−100 to 10^100)
7  Positive overflow               (above 10^100)
Figure B-1. The real number line can be divided into seven regions.
Digits in fraction   Digits in exponent   Lower bound   Upper bound
3                    1                    10^−12        10^9
3                    2                    10^−102       10^99
3                    3                    10^−1002      10^999
3                    4                    10^−10002     10^9999
4                    1                    10^−13        10^9
4                    2                    10^−103       10^99
4                    3                    10^−1003      10^999
4                    4                    10^−10003     10^9999
5                    1                    10^−14        10^9
5                    2                    10^−104       10^99
5                    3                    10^−1004      10^999
5                    4                    10^−10004     10^9999
10                   3                    10^−1009      10^999
20                   3                    10^−1019      10^999
Figure B-2. The approximate lower and upper bounds of expressible (unnormalized) floating-point decimal numbers.
Example 1: Exponentiation to the base 2

Unnormalized:  sign 0, exponent 1010100, fraction .0000000000011011
  Sign +; excess 64 exponent is 84 − 64 = 20; fraction is 1 × 2^−12 + 1 × 2^−13 + 1 × 2^−15 + 1 × 2^−16
  Value = 2^20 (1 × 2^−12 + 1 × 2^−13 + 1 × 2^−15 + 1 × 2^−16) = 432

To normalize, shift the fraction left 11 bits and subtract 11 from the exponent.

Normalized:    sign 0, exponent 1001001, fraction .1101100000000000
  Sign +; excess 64 exponent is 73 − 64 = 9; fraction is 1 × 2^−1 + 1 × 2^−2 + 1 × 2^−4 + 1 × 2^−5
  Value = 2^9 (1 × 2^−1 + 1 × 2^−2 + 1 × 2^−4 + 1 × 2^−5) = 432

Example 2: Exponentiation to the base 16

Unnormalized:  sign 0, exponent 1000101, fraction .0000 0000 0001 1011
  Sign +; excess 64 exponent is 69 − 64 = 5; fraction is 1 × 16^−3 + B × 16^−4
  Value = 16^5 (1 × 16^−3 + B × 16^−4) = 432

To normalize, shift the fraction left 2 hexadecimal digits, and subtract 2 from the exponent.

Normalized:    sign 0, exponent 1000011, fraction .0001 1011 0000 0000
  Sign +; excess 64 exponent is 67 − 64 = 3; fraction is 1 × 16^−1 + B × 16^−2
  Value = 16^3 (1 × 16^−1 + B × 16^−2) = 432
Figure B-3. Examples of normalized floating-point numbers.
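The two examples in Figure B-3 can be reproduced with a short Python sketch. It assumes the layout shown in the figure (a sign bit, a 7-bit excess 64 exponent, and a 16-bit fraction) and a radix that is a power of 2. The function names `decode` and `normalize` are hypothetical, and exponent underflow during normalization is not handled.

```python
from fractions import Fraction


def decode(sign, exp_field, frac_bits, radix=2):
    """Exact value of a word with an excess-64 exponent and a binary fraction string."""
    fraction = Fraction(int(frac_bits, 2), 2 ** len(frac_bits))
    value = fraction * Fraction(radix) ** (exp_field - 64)
    return -value if sign else value


def normalize(exp_field, frac_bits, radix=2):
    """Shift the fraction left one radix digit at a time until its leading digit is nonzero."""
    digit_bits = radix.bit_length() - 1      # 1 bit per digit for radix 2, 4 bits for radix 16
    if int(frac_bits, 2) == 0:
        return exp_field, frac_bits          # zero has no normalized form
    while frac_bits[:digit_bits] == "0" * digit_bits:
        frac_bits = frac_bits[digit_bits:] + "0" * digit_bits
        exp_field -= 1                       # one digit shift costs one exponent step
    return exp_field, frac_bits


if __name__ == "__main__":
    word = "0000000000011011"
    print(decode(0, 0b1010100, word, radix=2))    # 432
    print(normalize(0b1010100, word, radix=2))    # (73, '1101100000000000')
    print(decode(0, 0b1000101, word, radix=16))   # 432
    print(normalize(0b1000101, word, radix=16))   # (67, '0001101100000000')
```

Using `Fraction` keeps every intermediate value exact, so the check that normalization leaves the value unchanged does not depend on floating-point rounding.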
(a) Sign (1 bit) | Exponent (8 bits) | Fraction (23 bits)
(b) Sign (1 bit) | Exponent (11 bits) | Fraction (52 bits)
Figure B-4. IEEE floating-point formats. (a) Single precision. (b) Double precision.
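The field widths in Figure B-4 can be inspected directly by reinterpreting a float's bytes as an unsigned integer. The following Python sketch uses the standard `struct` module. The function names `fields_single` and `fields_double` are hypothetical, and the big-endian format codes are chosen only so that the sign bit ends up as the most significant bit of the integer.

```python
import struct


def fields_single(x):
    """Return (sign, exponent, fraction) of x stored as an IEEE single (1 + 8 + 23 bits)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


def fields_double(x):
    """Return (sign, exponent, fraction) of x stored as an IEEE double (1 + 11 + 52 bits)."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    return word >> 63, (word >> 52) & 0x7FF, word & ((1 << 52) - 1)


if __name__ == "__main__":
    print(fields_single(1.0))    # (0, 127, 0)
    print(fields_single(-2.0))   # (1, 128, 0)
    print(fields_double(1.0))    # (0, 1023, 0)
```

The stored exponent of 1.0 is 127 in single precision and 1023 in double precision, which is the excess notation at work: the true exponent 0 plus the bias.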
Item                           Single precision          Double precision
Bits in sign                   1                         1
Bits in exponent               8                         11
Bits in fraction               23                        52
Bits, total                    32                        64
Exponent system                Excess 127                Excess 1023
Exponent range                 -126 to +127              -1022 to +1023
Smallest normalized number     2^-126                    2^-1022
Largest normalized number      approx. 2^128             approx. 2^1024
Decimal range                  approx. 10^-38 to 10^38   approx. 10^-308 to 10^308
Smallest denormalized number   approx. 10^-45            approx. 10^-324
Figure B-5. Characteristics of IEEE floating-point numbers.
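Most entries in Figure B-5 follow from just two parameters, the number of exponent bits and the number of fraction bits. The Python sketch below derives them, assuming the usual IEEE conventions that the all-zeros and all-ones exponent fields are reserved and that a denormalized number has an implied leading 0. The function name `characteristics` and the dictionary keys are hypothetical.

```python
from fractions import Fraction


def characteristics(exp_bits, frac_bits):
    """Derive the Figure B-5 quantities from the two field widths."""
    bias = 2 ** (exp_bits - 1) - 1                 # 127 or 1023
    min_exp = 1 - bias                             # stored exponent field 1
    max_exp = (2 ** exp_bits - 2) - bias           # stored field all ones is reserved
    return {
        "total_bits": 1 + exp_bits + frac_bits,
        "bias": bias,
        "exponent_range": (min_exp, max_exp),
        "smallest_normalized": Fraction(2) ** min_exp,
        # Only the last fraction bit set, with no implied leading 1.
        "smallest_denormalized": Fraction(2) ** (min_exp - frac_bits),
    }


if __name__ == "__main__":
    for name, widths in [("single", (8, 23)), ("double", (11, 52))]:
        c = characteristics(*widths)
        print(name, c["total_bits"], c["bias"], c["exponent_range"],
              float(c["smallest_denormalized"]))
```

The smallest denormalized values come out as 2^-149 and 2^-1074, which are the exact quantities behind the table's approximate 10^-45 and 10^-324.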
Type           Sign bit   Exponent        Fraction
Normalized     ±          0 < Exp < Max   Any bit pattern
Denormalized   ±          0               Any nonzero bit pattern
Zero           ±          0               0
Infinity       ±          1 1 1…1         0
Not a number   ±          1 1 1…1         Any nonzero bit pattern
Figure B-6. IEEE numerical types.
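The five cases in Figure B-6 amount to a test on the exponent field followed by a test on the fraction field. A minimal Python sketch for single precision is given below. The function name `classify_single` is hypothetical, and the value is first rounded to single precision by `struct.pack`.

```python
import struct


def classify_single(x):
    """Name the Figure B-6 type of x when stored as an IEEE single-precision number."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (word >> 23) & 0xFF     # 8-bit exponent field
    fraction = word & 0x7FFFFF         # 23-bit fraction field
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:               # exponent field of all ones
        return "infinity" if fraction == 0 else "not a number"
    return "normalized"                # 0 < Exp < Max


if __name__ == "__main__":
    for value in (1.0, 0.0, -0.0, 1e-45, float("inf"), float("nan")):
        print(value, classify_single(value))
```

The sign bit plays no part in the classification, which is why every row of the figure shows it as ±.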